南极磁场 posted on 2026-4-14 09:02:00

LlamaIndex Advanced RAG in Practice: From Retrieval Augmentation to Knowledge Graph Question Answering

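<h2>1. The Limits of RAG and Advanced RAG</h2>
<p>Basic RAG (retrieval-augmented generation) has clear weaknesses: retrieval is often imprecise, it lacks multi-hop reasoning (chaining facts found in different passages), and it struggles with complex, multi-part queries. Advanced RAG adds techniques such as query rewriting, reranking, and knowledge-graph augmentation, moving RAG from simple lookup toward deeper question answering. LlamaIndex is a widely used framework for building such systems and offers a rich set of index structures and retrieval strategies.</p>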
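<h2>2. LlamaIndex Core Architecture</h2>
<pre><code>Core components:
- Document/Node: source documents and the chunks (nodes) they are split into
- Index: an index over the nodes (vector / keyword / knowledge graph)
- Retriever: fetches the nodes relevant to a query from an index
- ResponseSynthesizer: turns the query plus the retrieved nodes into an answer via the LLM
- QueryEngine: end-to-end pipeline that combines a retriever and a response synthesizer
- Tool/Agent: wraps query engines as tools that an agent can call</code></pre>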
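<p>The sketch below makes the division of labour concrete. It mirrors these roles with toy Python classes and has no dependencies. It is not LlamaIndex code. The class and function names are hypothetical. The "index" scores chunks by word overlap instead of embeddings. The "synthesizer" simply joins the retrieved text, where a real one would prompt an LLM.</p>

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Node role: one chunk of a source document."""
    doc_id: str
    text: str


def split_document(doc_id: str, text: str, chunk_words: int = 8) -> list:
    """Document -> Nodes: fixed-size word chunks (toy splitter)."""
    words = text.split()
    return [Node(doc_id, " ".join(words[i:i + chunk_words]))
            for i in range(0, len(words), chunk_words)]


class ToyIndex:
    """Index role: holds nodes and scores them by shared words (stand-in for embeddings)."""
    def __init__(self, nodes):
        self.nodes = nodes

    def score(self, query: str, node: Node) -> int:
        return len(set(query.lower().split()) & set(node.text.lower().split()))


class ToyRetriever:
    """Retriever role: returns the top_k highest-scoring nodes from an index."""
    def __init__(self, index: ToyIndex, top_k: int = 2):
        self.index, self.top_k = index, top_k

    def retrieve(self, query: str) -> list:
        ranked = sorted(self.index.nodes, key=lambda n: self.index.score(query, n), reverse=True)
        return ranked[:self.top_k]


class ToySynthesizer:
    """ResponseSynthesizer role: a real implementation would send query + nodes to an LLM."""
    def synthesize(self, query: str, nodes: list) -> str:
        return " | ".join(n.text for n in nodes)


class ToyQueryEngine:
    """QueryEngine role: retriever and synthesizer wired into one call."""
    def __init__(self, retriever: ToyRetriever, synthesizer: ToySynthesizer):
        self.retriever, self.synthesizer = retriever, synthesizer

    def query(self, question: str) -> str:
        return self.synthesizer.synthesize(question, self.retriever.retrieve(question))


# Usage: one document, split into three nodes of at most 8 words, queried end to end
nodes = split_document("d1", "microservices split an application into small independently "
                             "deployable services a monolith ships the whole application as "
                             "one deployable unit")
engine = ToyQueryEngine(ToyRetriever(ToyIndex(nodes), top_k=1), ToySynthesizer())
print(engine.query("what are microservices"))
```

<p>Each piece can be swapped independently, which is the point of the real architecture: a better retriever or synthesizer drops in without touching the rest.</p>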
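<h2>3. Environment Setup</h2>
<pre><code>pip install llama-index llama-index-llms-openai
pip install llama-index-embeddings-openai
pip install sentence-transformers            # required by SentenceTransformerRerank
pip install llama-index-graph-stores-nebula  # only needed if you use NebulaGraph as the graph store</code></pre>
<pre><code>import os
# Placeholder key: in practice, export OPENAI_API_KEY in your shell rather than hard-coding it
os.environ["OPENAI_API_KEY"] = "your-api-key"</code></pre>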
<h2>4. Basic RAG vs. Advanced RAG</h2>
<pre><code>from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Basic RAG: load files from ./data, embed them, answer with default top-k retrieval
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is a microservice architecture?")
print(response)

# Advanced RAG: add retrieval post-processing (cross-encoder reranking)
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(top_n=3, model="cross-encoder/ms-marco-MiniLM-L-2-v2")
query_engine = index.as_query_engine(
    similarity_top_k=10,            # first retrieve 10 candidates by vector similarity
    node_postprocessors=[rerank],   # then rerank them and keep the top 3
)
response = query_engine.query("What is the core difference between microservices and a monolithic architecture?")
print(response)</code></pre>
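<p>Over-retrieving and then reranking works because the two stages do different jobs. A cheap first stage favours recall. A slower, more precise second stage then reorders only the short candidate list. The plain-Python sketch below shows that shape under two assumptions. Word overlap stands in for vector similarity. A hypothetical scorer that rewards matching word pairs stands in for the cross-encoder. Only the structure (a wide top_k, then a narrow top_n) corresponds to the LlamaIndex code above.</p>

```python
def cheap_score(query: str, text: str) -> int:
    """Stage-1 stand-in for vector similarity: number of shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def bigrams(words: list) -> set:
    return set(zip(words, words[1:]))


def precise_score(query: str, text: str) -> int:
    """Stage-2 stand-in for a cross-encoder (hypothetical): matching word pairs
    weigh far more than isolated shared words, so word order now matters."""
    q, t = query.lower().split(), text.lower().split()
    return 10 * len(bigrams(q) & bigrams(t)) + cheap_score(query, text)


def retrieve_then_rerank(query: str, corpus: list, top_k: int = 10, top_n: int = 3) -> list:
    # Stage 1: cheap scoring over the whole corpus, keep a wide candidate set
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:top_k]
    # Stage 2: expensive scoring on the candidates only, keep the best few
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:top_n]


corpus = [
    "database index tuning for slow queries",
    "java garbage collection tuning",
    "slow java startup",
    "query plans explain slow database queries",
    "tuning index for database queries",
]
print(retrieve_then_rerank("slow database queries", corpus, top_k=4, top_n=2))
```

<p>Stage 1 alone ranks "database index tuning for slow queries" first, tied on shared words with another document. The second stage demotes it because none of its word pairs match the query's.</p>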
<h2>5. Query Rewriting: HyDE</h2>
<pre><code>from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# HyDE: the LLM first writes a hypothetical answer document, which is then used for retrieval
hyde = HyDEQueryTransform(include_original=True)  # also keep the original query for retrieval
query_engine = index.as_query_engine()
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

# Compare the two
question = "How do you design a high-concurrency system?"
normal = query_engine.query(question)
hyde_result = hyde_query_engine.query(question)
print("Plain RAG:", normal)
print("HyDE RAG:", hyde_result)</code></pre>
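<p>The intuition behind HyDE is that a drafted answer tends to share vocabulary with real answer passages, while the bare question often does not. The toy sketch below demonstrates this with bag-of-words vectors and a stub function in place of the LLM. The names and data are hypothetical. Summing the draft and question vectors is one simple way to emulate <code>include_original=True</code>, not necessarily how LlamaIndex combines them.</p>

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def plain_retrieve(question: str, corpus: list) -> str:
    q_vec = embed(question)
    return max(corpus, key=lambda doc: cosine(q_vec, embed(doc)))


def hyde_retrieve(question: str, corpus: list, llm, include_original: bool = True) -> str:
    draft = llm(question)                    # 1. let the LLM write a hypothetical answer
    q_vec = embed(draft)                     # 2. embed the draft instead of the question
    if include_original:
        q_vec = q_vec + embed(question)      #    optionally mix the original question back in
    return max(corpus, key=lambda doc: cosine(q_vec, embed(doc)))  # 3. search with that vector


def stub_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call; always returns the same draft."""
    return "caching message queues and rate limiting absorb traffic spikes"


corpus = [
    "how to design a logo",
    "caching and message queues absorb traffic spikes and rate limiting protects services",
]
question = "how to design a high concurrency system"
print("plain:", plain_retrieve(question, corpus))
print("hyde: ", hyde_retrieve(question, corpus, stub_llm))
```

<p>The plain query matches the logo document on filler words. The drafted answer pulls retrieval toward the passage that actually discusses handling load.</p>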
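<h2>6. Multi-Hop Reasoning: Sub-Question Decomposition</h2>
<pre><code>from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Build a separate index per document set
# (the directory paths are hypothetical; point them at your own data)
sql_docs = SimpleDirectoryReader("data/sql").load_data()
java_docs = SimpleDirectoryReader("data/java").load_data()
sql_index = VectorStoreIndex.from_documents(sql_docs)
java_index = VectorStoreIndex.from_documents(java_docs)

sql_tool = QueryEngineTool(
    query_engine=sql_index.as_query_engine(),
    metadata=ToolMetadata(name="sql_docs", description="Documents on SQL optimization")
)
java_tool = QueryEngineTool(
    query_engine=java_index.as_query_engine(),
    metadata=ToolMetadata(name="java_docs", description="Documents on Java performance tuning")
)

# Sub-question engine: the LLM uses the tool descriptions to route each sub-question
sub_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[sql_tool, java_tool])

# A complex query is automatically split into sub-queries, answered per tool, then merged
response = sub_engine.query(
    "How can I optimize database query performance in a Java application, considering both the Java side and the SQL side?"
)
print(response)</code></pre>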
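<p>Under the hood the pattern is fan-out and merge, in three steps:</p>
<ol>
<li>Generate sub-questions, each tagged with the tool that should answer it.</li>
<li>Run each sub-question against its tool.</li>
<li>Synthesize one answer from all the sub-answers.</li>
</ol>
<p>The plain-Python sketch below shows only that control flow. In <code>SubQuestionQueryEngine</code>, an LLM produces the sub-questions from the tool descriptions, and another LLM call writes the final answer. Here the decomposer, the tools, and the merge step are hypothetical stubs.</p>

```python
from typing import Callable, Dict, List, Tuple

Engine = Callable[[str], str]


def sub_question_query(question: str,
                       decompose: Callable[[str], List[Tuple[str, str]]],
                       tools: Dict[str, Engine],
                       synthesize: Callable[[str, List[Tuple[str, str]]], str]) -> str:
    """Fan a complex question out to (tool, sub-question) pairs, then merge the answers."""
    qa_pairs = []
    for tool_name, sub_q in decompose(question):
        if tool_name not in tools:
            raise ValueError(f"decomposer chose unknown tool {tool_name!r}")
        qa_pairs.append((sub_q, tools[tool_name](sub_q)))  # route to the named tool
    return synthesize(question, qa_pairs)                  # final step sees every sub-answer


def stub_decompose(question: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the LLM question generator."""
    return [("java_docs", "How should JDBC access be tuned in Java?"),
            ("sql_docs", "How can slow SQL queries be sped up?")]


stub_tools = {
    "java_docs": lambda q: "reuse pooled connections and batch statements",
    "sql_docs": lambda q: "add suitable indexes and avoid SELECT *",
}


def join_answers(question: str, pairs: List[Tuple[str, str]]) -> str:
    """Hypothetical merge step; a real engine would ask the LLM to write the answer."""
    return "; ".join(answer for _, answer in pairs)


print(sub_question_query("How do I speed up database access in a Java app?",
                         stub_decompose, stub_tools, join_answers))
```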
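<h2>7. Knowledge Graph RAG</h2>
<pre><code>from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

# Build a knowledge graph index: the LLM extracts (subject, relation, object) triplets per chunk.
# The graph store is supplied through a StorageContext.
# (Newer LlamaIndex releases recommend PropertyGraphIndex over KnowledgeGraphIndex.)
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=5,   # extract at most 5 triplets from each chunk
    include_embeddings=True     # embed triplets so embedding/hybrid retrieval works
)

# Knowledge graph query (supports multi-hop relation traversal)
kg_query_engine = kg_index.as_query_engine(
    include_text=True,              # also pass the source text chunks to the LLM
    response_mode="tree_summarize",
    embedding_mode="hybrid",        # keyword-based plus embedding-based triplet lookup
    similarity_top_k=5
)

response = kg_query_engine.query(
    "What is the complete Spring Boot auto-configuration flow, and which core annotations are involved?"
)
print(response)</code></pre>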
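<p>Multi-hop retrieval over a graph amounts to walking relations outward from the entities named in the question. The sketch below does this over an in-memory list of triplets, using a breadth-first search with a hop limit. The triplets are hand-written sample data about Spring Boot, not output from <code>KnowledgeGraphIndex</code>. The traversal is a simplification of what a graph store query does.</p>

```python
from collections import defaultdict, deque


def build_graph(triplets):
    """Adjacency list: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for subj, rel, obj in triplets:
        graph[subj].append((rel, obj))
    return graph


def multi_hop(graph, start: str, max_depth: int = 2) -> list:
    """Breadth-first walk from `start`; returns every relation path of at most max_depth hops."""
    paths = []
    queue = deque([(start, start, 0)])  # (current entity, path text so far, hops taken)
    seen = {start}
    while queue:
        entity, path, depth = queue.popleft()
        if depth == max_depth:
            continue  # hop limit reached, do not expand further
        for rel, obj in graph.get(entity, []):
            new_path = f"{path} -{rel}-> {obj}"
            paths.append(new_path)
            if obj not in seen:  # avoid revisiting entities (cycles)
                seen.add(obj)
                queue.append((obj, new_path, depth + 1))
    return paths


triplets = [
    ("@SpringBootApplication", "includes", "@EnableAutoConfiguration"),
    ("@SpringBootApplication", "includes", "@ComponentScan"),
    ("@EnableAutoConfiguration", "imports", "AutoConfigurationImportSelector"),
    ("AutoConfigurationImportSelector", "loads", "auto-configuration classes"),
    ("auto-configuration classes", "filtered by", "@ConditionalOnClass"),
]
graph = build_graph(triplets)
for p in multi_hop(graph, "@SpringBootApplication", max_depth=2):
    print(p)
```

<p>Raising <code>max_depth</code> reaches facts that no single triplet connects to the starting entity. Multi-hop reasoning needs exactly this, and flat chunk retrieval tends to miss it.</p>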
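<h2>8. Hybrid Retrieval: Vector + Keyword + Knowledge Graph</h2>
<pre><code>from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # pip install llama-index-retrievers-bm25

# Vector retriever
vector_retriever = index.as_retriever(similarity_top_k=5)
# Keyword retriever: BM25 over the nodes stored in the index's docstore
# (retriever_mode="keyword" is not a VectorStoreIndex retriever option, so use BM25 instead)
keyword_retriever = BM25Retriever.from_defaults(
    docstore=index.docstore,
    similarity_top_k=5
)
# A knowledge graph retriever, e.g. kg_index.as_retriever(), can be added to the list below

# Fusion retriever (Reciprocal Rank Fusion)
fusion_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, keyword_retriever],
    num_queries=3,   # the original query plus 2 LLM-generated rewrites (1 disables rewriting)
    similarity_top_k=10,
    mode="reciprocal_rerank"
)

nodes = fusion_retriever.retrieve("How do distributed transactions guarantee consistency?")
for node in nodes:
    print(f"Score: {node.score:.4f} | {node.text[:80]}")</code></pre>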
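<p>Reciprocal Rank Fusion only looks at positions, never at raw scores. That is why it can merge keyword (e.g. BM25) scores and cosine similarities, which live on different scales. Each document's fused score is the sum of 1 / (k + rank) over every list it appears in.</p>
<p>The sketch below is a standalone implementation of that formula. It uses 1-based ranks and the conventional k = 60. The document ids are hypothetical. LlamaIndex's internal implementation may differ in details such as the rank offset.</p>

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60,
                           top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused if top_n is None else fused[:top_n]


vector_hits = ["saga", "2pc", "tcc"]      # hypothetical ids, best first
keyword_hits = ["2pc", "outbox", "saga"]
for doc_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{doc_id}: {score:.5f}")
```

<p>Here "2pc" ends up first even though the vector retriever ranked it second, because both lists place it near the top. The large k keeps any single list from dominating.</p>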
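<h2>9. Integrating with Spring Boot</h2>
<pre><code>import java.util.Map;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Calls the Python (LlamaIndex) service over HTTP; the base URL comes from configuration
@Service
public class AdvancedRAGService {
    private final RestTemplate restTemplate = new RestTemplate();

    @Value("${llama-index.service.url}")
    private String llamaServiceUrl;

    public String query(String question, String mode) {
        Map&lt;String, Object&gt; body = Map.of(
            "question", question,
            "mode", mode,          // basic / hyde / sub_question / kg
            "top_k", 5
        );
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        HttpEntity&lt;Map&lt;String, Object&gt;&gt; entity = new HttpEntity&lt;&gt;(body, headers);
        ResponseEntity&lt;Map&gt; resp = restTemplate.exchange(
            llamaServiceUrl + "/query", HttpMethod.POST, entity, Map.class);
        Map&lt;?, ?&gt; result = resp.getBody();
        if (result == null || result.get("answer") == null) {
            throw new IllegalStateException("RAG service returned no answer");
        }
        return result.get("answer").toString();
    }
}</code></pre>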
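<p>The Java client assumes a Python service that accepts <code>POST /query</code> with <code>question</code>, <code>mode</code> and <code>top_k</code>, and replies with a JSON object containing <code>answer</code>. The article does not show that side, so here is a minimal standard-library sketch of the contract. The per-mode engines are placeholder lambdas (hypothetical). In a real service, each would call the matching query engine from the earlier sections, and a web framework would usually replace <code>http.server</code>.</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical placeholders: each entry would call the matching LlamaIndex query engine
# (basic, HyDE, sub-question, knowledge graph) built in the earlier sections.
ENGINES: Dict[str, Callable[[str, int], str]] = {
    "basic": lambda q, k: f"[basic top_k={k}] {q}",
    "hyde": lambda q, k: f"[hyde top_k={k}] {q}",
    "sub_question": lambda q, k: f"[sub_question top_k={k}] {q}",
    "kg": lambda q, k: f"[kg top_k={k}] {q}",
}


def handle_query(payload) -> Tuple[int, dict]:
    """Validate the JSON body sent by AdvancedRAGService and dispatch on `mode`."""
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    question = payload.get("question")
    mode = payload.get("mode", "basic")
    top_k = payload.get("top_k", 5)
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "question is required"}
    if not isinstance(mode, str) or mode not in ENGINES:
        return 400, {"error": f"unknown mode: {mode}"}
    if not isinstance(top_k, int) or top_k < 1:
        return 400, {"error": "top_k must be a positive integer"}
    return 200, {"answer": ENGINES[mode](question, top_k)}


class QueryHandler(BaseHTTPRequestHandler):
    """Exposes handle_query at POST /query, the path the Java client calls."""

    def do_POST(self):
        if self.path != "/query":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or undecodable bytes
            status, body = 400, {"error": "invalid JSON"}
        else:
            status, body = handle_query(payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


# To serve locally (blocks): HTTPServer(("127.0.0.1", 8000), QueryHandler).serve_forever()
```

<p>Keeping validation in <code>handle_query</code>, separate from the HTTP plumbing, lets the contract be tested without starting a server.</p>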
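<h2>10. Evaluation and Optimization</h2>
<pre><code>from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

# Judge LLM used by the evaluators (temperature 0 for more repeatable verdicts)
llm = OpenAI(temperature=0)

# Faithfulness: is the answer grounded in the retrieved context?
faith_eval = FaithfulnessEvaluator(llm=llm)
# Relevancy: do the answer and retrieved context actually address the question?
rel_eval = RelevancyEvaluator(llm=llm)

# Batch evaluation over the query_engine built earlier
questions = ["What is RAG?", "How should I choose a vector database?", "How is a knowledge graph built?"]
for q in questions:
    response = query_engine.query(q)
    faith_result = faith_eval.evaluate_response(query=q, response=response)
    rel_result = rel_eval.evaluate_response(query=q, response=response)
    print(f"Q: {q}")
    print(f"Faithfulness: {faith_result.passing}")
    print(f"Relevancy: {rel_result.passing}")</code></pre>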
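<p>Continuous evaluation becomes actionable when the per-question <code>passing</code> flags are rolled up into pass rates and compared against a target. The sketch below assumes the flags have been collected into simple records. The record type, the 0.8 threshold, and the verdicts in the usage example are made up for illustration.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalRecord:
    """One evaluated question (hypothetical type holding the two passing flags),
    e.g. faithful=bool(faith_result.passing), which counts a missing verdict as a failure."""
    question: str
    faithful: bool
    relevant: bool


def summarize(records: List[EvalRecord], threshold: float = 0.8) -> dict:
    """Roll per-question verdicts up into pass rates and flag metrics below threshold."""
    if not records:
        raise ValueError("no evaluation records")
    n = len(records)
    rates = {
        "faithfulness": sum(r.faithful for r in records) / n,
        "relevancy": sum(r.relevant for r in records) / n,
    }
    return {
        "rates": rates,
        "alerts": [name for name, rate in rates.items() if rate < threshold],
        "failing_questions": [r.question for r in records if not (r.faithful and r.relevant)],
    }


# Made-up verdicts for illustration
records = [
    EvalRecord("q1", True, True),
    EvalRecord("q2", True, False),
    EvalRecord("q3", True, True),
    EvalRecord("q4", True, False),
    EvalRecord("q5", False, True),
]
print(summarize(records))
```

<p>Tracking these rates per release shows whether a change such as a new chunk size or reranker actually helped.</p>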
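<h2>11. Best Practices</h2>
<ol>
<li><strong>Chunking strategy</strong>: choose a chunk_size suited to the document type, typically 256-1024 tokens</li>
<li><strong>Hybrid retrieval</strong>: fusing vector and keyword retrieval usually outperforms either one alone</li>
<li><strong>Reranking</strong>: retrieve with a large top_k, then rerank and keep a small top_n for higher precision</li>
<li><strong>Knowledge graphs</strong>: for multi-hop reasoning, consider knowledge graph augmentation (sub-question decomposition is another option)</li>
<li><strong>Continuous evaluation</strong>: track Faithfulness and Relevancy over time to monitor answer quality</li>
</ol>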
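<p>Chunk size usually comes with a second knob: the overlap between neighbouring chunks. Overlap lets text near a boundary keep some of its context in both chunks. LlamaIndex's <code>SentenceSplitter</code> exposes both knobs as <code>chunk_size</code> and <code>chunk_overlap</code>. The fixed-window sketch below illustrates only the arithmetic. Unlike <code>SentenceSplitter</code>, it splits on whitespace tokens and ignores sentence boundaries. The tiny sizes keep the output readable.</p>

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Sliding-window chunking: each chunk repeats the last `overlap` tokens of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


tokens = "a b c d e f g h i j".split()
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

<p>Larger overlap costs more chunks (and embeddings) for the same text, so it is usually kept to a small fraction of chunk_size.</p>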
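<h2>12. Summary</h2>
<p>Advanced RAG is a broad upgrade over basic RAG. It combines query rewriting (HyDE), reranking, sub-question decomposition, knowledge graph augmentation, and hybrid retrieval. Together, these make it possible to build question-answering systems that hold up in enterprise use. LlamaIndex provides the full toolchain. Exposed behind an HTTP service, it can be brought into a Spring Boot application quickly.</p>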

</div>
<div id="MySignature" role="contentinfo">
   

---

🔗 For more technical articles, follow the author: 弥烟袅绕

📚 Original article: https://www.cnblogs.com/czlws/p/19863106/llamaindex-advanced-rag-knowledge-graph-tutorial<br><br>