1. Limitations of Basic RAG, and Advanced RAG
Basic RAG (retrieval-augmented generation) has clear weaknesses: low retrieval precision, no multi-hop reasoning, and poor handling of complex queries. Advanced RAG addresses these with techniques such as query rewriting, reranking, and knowledge-graph augmentation, lifting RAG from simple lookup to deep question answering. LlamaIndex is a go-to framework for building advanced RAG systems, offering a rich set of index structures and retrieval strategies.
2. LlamaIndex Core Architecture
Core components:
- Document/Node: documents and their chunks
- Index: indexes (vector / keyword / knowledge graph)
- Retriever: retrievers
- ResponseSynthesizer: response synthesizer
- QueryEngine: query engine
- Tool/Agent: tools and agents
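To make the division of labor concrete, here is a toy, pure-Python sketch of how these components fit together. These are stand-in classes invented for illustration, not the real LlamaIndex APIs:

```python
# Toy stand-ins for the pipeline: Document -> Nodes -> Index
# -> Retriever -> ResponseSynthesizer -> QueryEngine.

def split_into_nodes(document: str, chunk_size: int = 40) -> list[str]:
    """Document -> Nodes: naive fixed-size chunking."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

class KeywordIndex:
    """Index: here just a container of nodes."""
    def __init__(self, nodes: list[str]):
        self.nodes = nodes

class Retriever:
    """Retriever: rank nodes by query-word overlap, keep top_k."""
    def __init__(self, index: KeywordIndex, top_k: int = 2):
        self.index, self.top_k = index, top_k
    def retrieve(self, query: str) -> list[str]:
        words = set(query.lower().split())
        scored = sorted(self.index.nodes,
                        key=lambda n: len(words & set(n.lower().split())),
                        reverse=True)
        return scored[:self.top_k]

def synthesize(query: str, nodes: list[str]) -> str:
    """ResponseSynthesizer: combine retrieved context into an answer (stub)."""
    return f"Answer to '{query}' based on {len(nodes)} chunk(s)."

def query_engine(index: KeywordIndex, query: str) -> str:
    """QueryEngine: glues retriever and synthesizer together."""
    return synthesize(query, Retriever(index).retrieve(query))

nodes = split_into_nodes("microservices split an application into small independent services "
                         "that communicate over the network")
print(query_engine(KeywordIndex(nodes), "what are microservices"))
```

Every advanced technique in the rest of this article swaps out or augments exactly one of these stages.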
3. Environment Setup

```bash
pip install llama-index llama-index-llms-openai
pip install llama-index-embeddings-openai
pip install llama-index-graph-stores-nebula
```

```python
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
```
4. Basic RAG vs. Advanced RAG

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Basic RAG
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is a microservice architecture?")
print(response)

# Advanced RAG: with post-retrieval reranking
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(top_n=3, model="cross-encoder/ms-marco-MiniLM-L-2-v2")
query_engine = index.as_query_engine(
    similarity_top_k=10,          # retrieve 10 candidates first
    node_postprocessors=[rerank]  # then rerank and keep the top 3
)
response = query_engine.query("What are the core differences between microservices and a monolith?")
print(response)
```
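The retrieve-wide-then-rerank pattern can be illustrated in plain Python: a cheap first-stage score shortlists 10 candidates, then a more expensive second-stage score (standing in for the cross-encoder) reorders them and keeps 3. The documents and scoring functions below are invented for illustration:

```python
# Two-stage retrieval: cheap score over everything, expensive rerank
# score only over the shortlist.

def first_pass_score(query: str, doc: str) -> float:
    """Cheap proxy score: fraction of query words present in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)

def rerank_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: also rewards an exact phrase match."""
    return first_pass_score(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)

def retrieve_and_rerank(query: str, docs: list[str],
                        first_k: int = 10, final_k: int = 3) -> list[str]:
    shortlist = sorted(docs, key=lambda d: first_pass_score(query, d),
                       reverse=True)[:first_k]
    return sorted(shortlist, key=lambda d: rerank_score(query, d),
                  reverse=True)[:final_k]

docs = [f"doc {i} about caching" for i in range(12)] + ["microservices vs monolith tradeoffs"]
top = retrieve_and_rerank("microservices vs monolith", docs)
print(top[0])  # microservices vs monolith tradeoffs
```

The key property is cost: the expensive scorer only ever sees `first_k` documents, which is why a large `similarity_top_k` plus a small `top_n` is affordable.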
5. Query Rewriting: HyDE

```python
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# HyDE: have the LLM generate a hypothetical document first,
# then retrieve using that hypothetical document instead of the raw query
hyde = HyDEQueryTransform(include_original=True)
query_engine = index.as_query_engine()
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

# Compare the two
question = "How do you design a high-concurrency system?"
normal = query_engine.query(question)
hyde_result = hyde_query_engine.query(question)
print("Plain RAG:", normal)
print("HyDE RAG:", hyde_result)
```
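The intuition behind HyDE can be shown with a toy bag-of-words "embedding": a short question shares few terms with a relevant passage, while a hypothetical answer (which an LLM would generate; hard-coded here for illustration) shares many, so retrieving with the hypothetical text scores higher:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

passage = ("high concurrency systems use load balancing caching message queues "
           "and horizontal scaling to handle traffic")

question = "how to design a high concurrency system"
# In real HyDE this text comes from the LLM; hard-coded here.
hypothetical = ("a high concurrency system design uses load balancing caching "
                "message queues and horizontal scaling")

sim_q = cosine(embed(question), embed(passage))
sim_h = cosine(embed(hypothetical), embed(passage))
print(f"question vs passage:     {sim_q:.3f}")
print(f"hypothetical vs passage: {sim_h:.3f}")  # higher -> better retrieval
```

Real embeddings capture semantics rather than literal term overlap, but the mechanism is the same: the hypothetical document lives in the same "answer space" as the corpus, so it lands closer to the right passages.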
6. Multi-Hop Reasoning: Sub-Question Decomposition

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Build a separate index per document collection
# (example paths; adjust to your own data layout)
sql_docs = SimpleDirectoryReader("data/sql").load_data()
java_docs = SimpleDirectoryReader("data/java").load_data()
sql_index = VectorStoreIndex.from_documents(sql_docs)
java_index = VectorStoreIndex.from_documents(java_docs)

sql_tool = QueryEngineTool(
    query_engine=sql_index.as_query_engine(),
    metadata=ToolMetadata(name="sql_docs", description="Documents on SQL optimization")
)
java_tool = QueryEngineTool(
    query_engine=java_index.as_query_engine(),
    metadata=ToolMetadata(name="java_docs", description="Documents on Java performance tuning")
)

# Sub-question decomposition engine
sub_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[sql_tool, java_tool])

# A complex query is automatically decomposed into sub-queries
response = sub_engine.query(
    "How do I optimize database query performance in a Java application, "
    "considering both the Java side and the SQL side?"
)
print(response)
```
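Internally the engine runs a decompose → route → synthesize loop. A stripped-down sketch with everything stubbed out (in the real engine an LLM performs the decomposition and the final synthesis; here both are hard-coded):

```python
# Stubbed sub-question pipeline: decompose -> route -> synthesize.

TOOLS = {
    "sql_docs":  lambda q: f"[sql_docs answer to: {q}]",
    "java_docs": lambda q: f"[java_docs answer to: {q}]",
}

def decompose(question: str) -> list[tuple[str, str]]:
    """In the real engine an LLM produces (tool, sub-question) pairs
    based on each tool's description. Hard-coded here."""
    return [
        ("java_docs", "How do I tune JDBC connection pooling in Java?"),
        ("sql_docs",  "How do I add indexes to speed up slow queries?"),
    ]

def sub_question_query(question: str) -> str:
    # Route each sub-question to its tool, collect partial answers.
    partials = [TOOLS[tool](sub_q) for tool, sub_q in decompose(question)]
    # Synthesis step: the real engine asks an LLM to merge the partials.
    return " ".join(partials)

answer = sub_question_query("How do I optimize DB query performance in a Java app?")
print(answer)
```

The tool `description` fields in the real API matter precisely because they are what the decomposition LLM reads when deciding the routing.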
7. Knowledge-Graph RAG

```python
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

# Build a knowledge-graph index
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=5,
    include_embeddings=True
)

# Query the knowledge graph (supports multi-hop relational reasoning)
kg_query_engine = kg_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5
)
response = kg_query_engine.query(
    "What is the complete flow of Spring Boot auto-configuration, "
    "and which core annotations are involved?"
)
print(response)
```
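The multi-hop capability rests on extracted (subject, relation, object) triplets. A toy graph store and breadth-first traversal makes the mechanism visible; the triplets below are a simplified rendering of the Spring Boot auto-configuration chain:

```python
# Toy knowledge-graph store: (subject, relation, object) triplets
# plus a breadth-first multi-hop traversal.

triplets = [
    ("@SpringBootApplication", "includes", "@EnableAutoConfiguration"),
    ("@EnableAutoConfiguration", "imports", "AutoConfigurationImportSelector"),
    ("AutoConfigurationImportSelector", "reads", "auto-configuration metadata files"),
]

def neighbors(entity: str) -> list[tuple[str, str]]:
    """All (relation, object) pairs whose subject is `entity`."""
    return [(rel, obj) for subj, rel, obj in triplets if subj == entity]

def multi_hop(start: str, hops: int) -> list[str]:
    """Collect entities reachable from `start` within `hops` steps."""
    frontier, seen = [start], []
    for _ in range(hops):
        nxt = []
        for e in frontier:
            for rel, obj in neighbors(e):
                seen.append(obj)
                nxt.append(obj)
        frontier = nxt
    return seen

print(multi_hop("@SpringBootApplication", 2))
```

A plain vector index can only surface chunks that mention the query terms; the graph can walk from `@SpringBootApplication` two hops out to entities the question never named, which is exactly what multi-hop questions need.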
8. Hybrid Retrieval: Vector + Keyword + Knowledge Graph

```python
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # pip install llama-index-retrievers-bm25

# Vector retriever
vector_retriever = index.as_retriever(similarity_top_k=5)

# Keyword (BM25) retriever
keyword_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore,
    similarity_top_k=5
)

# Fusion retriever (Reciprocal Rank Fusion)
fusion_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, keyword_retriever],
    num_queries=3,  # generate 3 rewritten queries
    similarity_top_k=10,
    mode="reciprocal_rerank"
)
nodes = fusion_retriever.retrieve("How do distributed transactions guarantee consistency?")
for node in nodes:
    print(f"Score: {node.score:.4f} | {node.text[:80]}")
```
九、与Spring Boot集成
@Service
public class AdvancedRAGService {
private final RestTemplate restTemplate = new RestTemplate();
@Value("${llama-index.service.url}")
private String llamaServiceUrl;
public String query(String question, String mode) {
Map<String, Object> body = Map.of(
"question", question,
"mode", mode, // basic / hyde / sub_question / kg
"top_k", 5
);
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_JSON);
HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);
ResponseEntity
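On the Python side, the service this Spring Boot client calls just needs to map the `mode` field to one of the query engines built earlier. The JSON contract below mirrors the request body above but is otherwise an assumption, and the engines are stubbed so the routing logic stands alone; in practice you would expose `handle_query` behind FastAPI or Flask:

```python
import json

# Stub engines standing in for the real query engines built earlier.
ENGINES = {
    "basic":        lambda q, k: f"basic({q})",
    "hyde":         lambda q, k: f"hyde({q})",
    "sub_question": lambda q, k: f"sub_question({q})",
    "kg":           lambda q, k: f"kg({q})",
}

def handle_query(raw_body: str) -> str:
    """Parse the JSON body sent by the Spring Boot client and dispatch by mode."""
    body = json.loads(raw_body)
    engine = ENGINES.get(body.get("mode", "basic"))
    if engine is None:
        return json.dumps({"error": f"unknown mode: {body['mode']}"})
    answer = engine(body["question"], body.get("top_k", 5))
    return json.dumps({"answer": answer})

resp = handle_query('{"question": "What is RAG?", "mode": "hyde", "top_k": 5}')
print(resp)  # {"answer": "hyde(What is RAG?)"}
```

Keeping the mode names identical on both sides (`basic` / `hyde` / `sub_question` / `kg`) means the Java client never needs to change when a new engine is added behind an existing mode.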
10. Evaluation and Optimization

```python
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")

# Faithfulness: is the answer grounded in the retrieved context?
faith_eval = FaithfulnessEvaluator(llm=llm)
# Relevancy: does the answer actually address the question?
rel_eval = RelevancyEvaluator(llm=llm)

# Batch evaluation
questions = ["What is RAG?", "How do I choose a vector database?", "How is a knowledge graph built?"]
for q in questions:
    response = query_engine.query(q)
    faith_result = faith_eval.evaluate_response(query=q, response=response)
    rel_result = rel_eval.evaluate_response(query=q, response=response)
    print(f"Q: {q}")
    print(f"  Faithfulness: {faith_result.passing}")
    print(f"  Relevancy: {rel_result.passing}")
```
11. Best Practices
- Chunking: choose a chunk_size suited to the document type (typically 256-1024)
- Hybrid retrieval: fusing vector and keyword retrieval usually beats either alone
- Reranking: retrieve with a large top_k, then rerank down to a small set for higher precision
- Knowledge graphs: use knowledge-graph augmentation for multi-hop reasoning scenarios
- Continuous evaluation: monitor quality with faithfulness and relevancy metrics
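The chunk_size trade-off is easy to see with a naive word-window splitter. LlamaIndex's own splitters are token- and sentence-aware; this sketch only illustrates how size and overlap change the number and granularity of chunks:

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of `chunk_size` words, sharing `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)
            if words[i:i + chunk_size]]

text = " ".join(f"w{i}" for i in range(100))  # a 100-word document
small = chunk_words(text, chunk_size=10)             # many small, precise chunks
large = chunk_words(text, chunk_size=50)             # few large, context-rich chunks
overlapped = chunk_words(text, chunk_size=10, overlap=2)
print(len(small), len(large), len(overlapped))  # 10 2 13
```

Smaller chunks retrieve precisely but can strand an answer across chunk boundaries; overlap mitigates that at the cost of some index bloat, which is why tuning per document type matters.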
12. Summary
Advanced RAG is a comprehensive upgrade over basic RAG. With query rewriting (HyDE), reranking, sub-question decomposition, knowledge-graph augmentation, and hybrid retrieval, you can build a genuinely usable enterprise-grade question-answering system. LlamaIndex provides the complete toolchain, and paired with Spring Boot it can be brought to production quickly.
---
Author: 弥烟袅绕
Source: https://www.cnblogs.com/czlws/p/19863106/llamaindex-advanced-rag-knowledge-graph-tutorial