In LlamaIndex, what are nodes and query engines, and how is RAG exposed as a tool to an agent?

In LlamaIndex a Node is a chunk of a source document with metadata and relationships, indexed for retrieval; a query engine wraps an index to take a natural-language query, retrieve relevant nodes, and synthesize an answer. RAG-as-a-tool wraps a query engine in a QueryEngineTool so an agent can call it like any other tool, deciding when to retrieve from that knowledge source as part of its reasoning loop.
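The Python sketch below shows how the three pieces fit together without any framework. `Node`, `KeywordQueryEngine`, and `Tool` are hypothetical stand-ins, not LlamaIndex classes. Retrieval is plain word overlap instead of embeddings, and "synthesis" just joins the retrieved text instead of calling an LLM. The node texts are made-up examples. In LlamaIndex itself, the wrapper that exposes a query engine to an agent is `QueryEngineTool`. The sketch only shows the general shape: nodes feed a query engine, and the engine's `query` method becomes a named, described callable that an agent could choose to invoke.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical stand-in: a chunk of a source document plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    source_doc_id: str = ""


class KeywordQueryEngine:
    """Hypothetical query engine: retrieve nodes, then 'synthesize' an answer.

    Retrieval ranks nodes by word overlap with the query (a real engine would
    use embeddings); synthesis joins the retrieved text (a real one calls an LLM).
    """

    def __init__(self, nodes: list[Node], top_k: int = 1):
        self.nodes = nodes
        self.top_k = top_k

    def retrieve(self, query: str) -> list[Node]:
        query_words = set(query.lower().split())
        ranked = sorted(
            self.nodes,
            key=lambda n: len(query_words & set(n.text.lower().split())),
            reverse=True,
        )
        return ranked[: self.top_k]

    def query(self, query: str) -> str:
        return " ".join(n.text for n in self.retrieve(query))


@dataclass
class Tool:
    """Hypothetical tool wrapper: an agent sees only the name and description."""
    name: str
    description: str
    fn: Callable[[str], str]

    def __call__(self, arg: str) -> str:
        return self.fn(arg)


nodes = [
    Node("refunds are accepted within 30 days", {"section": "refunds"}, "policy.pdf"),
    Node("shipping takes 5 business days", {"section": "shipping"}, "policy.pdf"),
]
engine = KeywordQueryEngine(nodes, top_k=1)
policy_tool = Tool(
    name="policy_docs",
    description="Answers questions about the refund and shipping policy.",
    fn=engine.query,
)
print(policy_tool("how many days for refunds"))
```

Running it prints the refunds node. That node shares two words with the query ("days", "refunds"), while the shipping node shares only one. An agent typically decides whether to call a tool from its name and description, so the description should say which knowledge source sits behind the tool.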

What is Retrieval-Augmented Generation (RAG) and how does a basic RAG pipeline work?

RAG augments an LLM by retrieving relevant documents from an external knowledge store at query time and feeding them into the prompt as grounding context. A basic pipeline chunks and embeds documents into a vector store, retrieves the top-k most similar chunks for a query, and the LLM generates an answer conditioned on them, reducing hallucination and keeping knowledge current.
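The pipeline can be sketched end to end in a few functions. This is a minimal, dependency-free illustration that rests on several assumptions:

- A bag-of-words count vector stands in for a real embedding model.
- A Python list stands in for the vector store.
- The documents are made-up examples.

The last step builds the grounded prompt that would be sent to the LLM; the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of at most `size` words (no overlap)."""
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Ground the LLM by placing the retrieved chunks in the prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Indexing (offline): chunk and "embed" made-up documents into the store.
docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping to Europe takes about five business days.",
    "Support is available by email on weekdays.",
]
store = [(c, embed(c)) for doc in docs for c in chunk(doc)]

# Query time: retrieve the top-k chunks and build the grounded prompt.
query = "Within how many days are refunds accepted?"
prompt = build_prompt(query, retrieve(query, store, k=1))
print(prompt)  # this string would be sent to the LLM
```

Swapping `embed` for a real embedding model and `store` for a vector database changes the retrieval quality. The shape of the pipeline stays the same.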

What is Retrieval-Augmented Generation (RAG) and why is it used?

RAG couples a retrieval step (fetching relevant documents from an external store) with a generative model, so the LLM can answer questions about knowledge it was never trained on. It addresses stale knowledge and reduces hallucination without retraining the model. The pattern is preferred when the knowledge base changes frequently or contains proprietary data.

How do you evaluate the quality of an LLM or RAG system?

Evaluation splits into retrieval quality (did we fetch the right chunks?) and generation quality (did the model use them correctly?). Key metrics are context precision/recall for retrieval and faithfulness plus answer relevance for generation. Frameworks like RAGAS automate LLM-as-judge scoring; human annotation anchors the ground truth.
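The retrieval-side metrics can be illustrated with simplified, label-based versions. They assume you already know which chunk IDs are relevant for a query; the IDs below are hypothetical. RAGAS and similar frameworks compute their own variants, usually with an LLM judge instead of hand labels, and their exact formulas differ. Generation metrics such as faithfulness and answer relevance typically need an LLM judge and are not shown.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant (did we fetch noise?)."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunks that were retrieved (did we miss anything?)."""
    if not relevant:
        return 0.0  # convention chosen here: nothing to recall
    return len(relevant & set(retrieved)) / len(relevant)


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Rank-aware precision: rewards placing relevant chunks near the top."""
    hits, total = 0, 0.0
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            hits += 1
            total += hits / rank  # precision@rank, counted only at relevant ranks
    return total / hits if hits else 0.0


# Hypothetical labels for one query: which chunk IDs a human marked relevant.
retrieved = ["c1", "c7", "c3"]
relevant = {"c1", "c3", "c9"}
print(context_precision(retrieved, relevant))  # 2 of 3 retrieved are relevant
print(context_recall(retrieved, relevant))     # 2 of 3 relevant were retrieved
print(average_precision(retrieved, relevant))  # (1/1 + 2/3) / 2
```

Precision and recall ignore ordering. The rank-aware score drops when a relevant chunk sits below an irrelevant one, which matters when only the top few chunks fit in the prompt.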

LlamaParse — document parsing — Agentic AI

Here’s a RAG failure mode nobody warns you about: your retrieval is tuned, your prompts are great, and the answers are still garbage — because the documents were mangled the moment you loaded them. Real-world PDFs are full of tables, multiple columns, headers, and scanned pages, and naive text extraction turns all of that into scrambled soup. LlamaParse exists to fix the very first step.

Why naive extraction fails

A basic PDF text extractor reads the raw text stream, which has no idea about visual structure. The result:

Tables collapse — rows and columns flatten into a meaningless run of numbers with no association between a label and its value.
Columns interleave — a two-column page gets read straight across, splicing unrelated sentences together.
Reading order breaks — headers, footnotes, and captions land wherever they happen to sit in the byte stream.
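The column problem is easy to reproduce in miniature. The sketch below is hypothetical. It represents a two-column page as two lists of visual lines and contrasts a layout-blind read straight across the page with a column-aware read. Real extractors work on positioned text in the PDF content stream, so this models only the symptom, not any particular library.

```python
from itertools import zip_longest

# Hypothetical two-column page: each list holds one column's visual lines, top to bottom.
left_column = ["Plans renew yearly", "unless cancelled."]
right_column = ["Refunds take", "5 business days."]


def naive_read(left: list[str], right: list[str]) -> str:
    """Layout-blind: read each visual line straight across both columns."""
    rows = [
        f"{left_line} {right_line}".strip()
        for left_line, right_line in zip_longest(left, right, fillvalue="")
    ]
    return " ".join(rows)


def column_aware_read(left: list[str], right: list[str]) -> str:
    """Layout-aware: finish the left column, then read the right one."""
    return " ".join(left + right)


print(naive_read(left_column, right_column))
print(column_aware_read(left_column, right_column))
```

The naive version splices "Refunds take" into the middle of the renewal sentence. The column-aware version keeps both sentences intact.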

Garbage in, garbage chunks, garbage retrieval.

Figure: the same table parsed two ways. Naive extraction scrambles label↔value pairs; VLM parsing preserves them as clean markdown.

How LlamaParse is different

LlamaParse treats each page as an image and uses a vision-language model to read it the way a person would — seeing the table grid, the column boundaries, the heading hierarchy — and emits clean markdown (with real markdown tables). That markdown is what you then split into Nodes and index. Because the structure survives, a chunk like the Enterprise row keeps its price and its refund window together, so retrieval can actually answer “what’s the Enterprise refund window?”
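The markdown table matters downstream, as the following sketch shows. It assumes the parser emitted a standard markdown table: a header row, a `|---|` separator, and body rows. Each body row then becomes its own chunk with the header labels attached, so the label↔value pairing survives chunking. The table contents and the `table_rows_to_chunks` helper are made up for illustration and are not part of LlamaParse.

```python
# Made-up table in the shape a layout-aware parser might emit as markdown.
PRICING_MD = """\
| Plan | Price | Refund window |
|---|---|---|
| Starter | $10/mo | 14 days |
| Enterprise | $99/mo | 30 days |
"""


def split_cells(line: str) -> list[str]:
    """'| a | b |' -> ['a', 'b']"""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def table_rows_to_chunks(md: str) -> list[str]:
    """Hypothetical helper: one chunk per body row, with header labels attached.

    Assumes a standard markdown table: header row, separator row, body rows.
    """
    lines = [ln for ln in md.splitlines() if ln.strip().startswith("|")]
    header = split_cells(lines[0])
    chunks = []
    for line in lines[2:]:  # skip the header and the |---| separator
        row = split_cells(line)
        chunks.append("; ".join(f"{label}: {value}" for label, value in zip(header, row)))
    return chunks


chunks = table_rows_to_chunks(PRICING_MD)
enterprise = next(c for c in chunks if "Enterprise" in c)
print(enterprise)
```

Each chunk can now answer "what's the Enterprise refund window?" on its own, because the plan, the column label, and the value travel together.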

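```python
# Assumes `pip install llama-cloud-services llama-index` and LLAMA_CLOUD_API_KEY
# set in the environment. VectorStoreIndex uses OpenAI embeddings and an OpenAI LLM
# by default, so OPENAI_API_KEY is needed too unless you configure other models.
from llama_cloud_services import LlamaParse
from llama_index.core import VectorStoreIndex

# Parse a gnarly PDF into clean markdown documents
docs = LlamaParse(result_type="markdown").load_data("pricing.pdf")

# Then the usual LlamaIndex pipeline, but now on clean, structured text
index = VectorStoreIndex.from_documents(docs)
answer = index.as_query_engine().query("What's the Enterprise refund window?")
print(answer)   # answerable because the table structure survived parsing
```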
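The "split into Nodes" step between parsing and indexing can also be sketched without the library. LlamaIndex provides its own markdown-aware node parser (`MarkdownNodeParser`). The hypothetical `split_markdown_by_heading` below is only a dependency-free stand-in for the idea. It starts a new node at each markdown heading and records the heading and source file as metadata, so a whole table stays inside one node. It assumes headings are lines beginning with `#` and that the document has no fenced code blocks; inside those, `#` could start a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex Node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_markdown_by_heading(md: str, source: str) -> list[Node]:
    """Start a new node at each markdown heading; keep the heading as metadata.

    Assumes headings are lines starting with '#' and there are no fenced code
    blocks (where '#' could begin a comment rather than a heading).
    """
    nodes: list[Node] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            nodes.append(Node(body, {"source": source, "heading": heading}))

    for line in md.splitlines():
        if line.startswith("#"):
            flush()
            heading = line.lstrip("#").strip()
            buffer = []
        else:
            buffer.append(line)
    flush()
    return nodes


# Made-up markdown, shaped like parser output for a pricing PDF.
PARSED_MD = """\
# Pricing
| Plan | Price | Refund window |
|---|---|---|
| Enterprise | $99/mo | 30 days |

# Support
Email support on weekdays.
"""

nodes = split_markdown_by_heading(PARSED_MD, source="pricing.pdf")
for node in nodes:
    print(node.metadata["heading"], "->", node.text.splitlines()[0])
```

Splitting on structure rather than a fixed character count is what keeps the Enterprise row next to its column labels inside a single node.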

In one breath

Naive PDF text extraction reads the raw byte stream with no sense of layout, so tables collapse, columns interleave, and reading order breaks — garbage chunks before retrieval even runs.
LlamaParse treats each page as an image and uses a vision-language model to read it like a person, emitting clean markdown (real markdown tables).
Because structure survives, a chunk keeps its label↔value pairs together — the Enterprise row keeps its price and refund window — so retrieval can actually answer.
Parsing quality is the silent ceiling on RAG: for corpora heavy in tables, multi-column layouts, or scans, fixing ingestion usually pays off more than chunking or re-ranking tweaks.
Fix the first step first: clean parse → split into Nodes → index → query.

Quick check


Q1. Why does naive PDF text extraction hurt RAG quality?

Q2. How does LlamaParse produce clean output?

Q3. If your RAG answers are poor and your documents are table-heavy PDFs, what should you fix first?

That completes the LlamaIndex cluster. Next, a different production framework: the OpenAI Agents SDK and its handoffs + guardrails model.
