LlamaParse — document parsing
Naive PDF text extraction destroys tables, columns, and layout — and poisons your RAG. LlamaParse uses vision models to turn messy documents into clean, LLM-ready markdown.
What you'll learn
- Why naive PDF extraction wrecks tables, columns, and reading order
- How VLM-based parsing produces clean, structured markdown
- Why parsing quality is the silent ceiling on RAG quality
Before you start
Here’s a RAG failure mode nobody warns you about: your retrieval is tuned, your prompts are great, and the answers are still garbage — because the documents were mangled the moment you loaded them. Real-world PDFs are full of tables, multiple columns, headers, and scanned pages, and naive text extraction turns all of that into scrambled soup. LlamaParse exists to fix the very first step.
Why naive extraction fails
A basic PDF text extractor reads the raw text stream, which records where characters are drawn but has no notion of tables, columns, or headings. The result:
- Tables collapse — rows and columns flatten into a meaningless run of numbers with no association between a label and its value.
- Columns interleave — a two-column page gets read straight across, splicing unrelated sentences together.
- Reading order breaks — headers, footnotes, and captions land wherever they happen to sit in the byte stream.
Garbage in, garbage chunks, garbage retrieval.
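To make the failure concrete, here is a minimal Python sketch built on a simplifying assumption. A PDF content stream can store text runs in any drawing order, and this toy models one such order (column by column). It also models a two-column page read straight across. The plan names, prices, and refund windows are made-up illustration data. `naive_column_stream`, `read_straight_across`, and `to_markdown_table` are hypothetical helpers, not part of any library.

```python
# Hypothetical pricing table (made-up values, for illustration only).
TABLE = [
    ["Plan", "Price", "Refund window"],
    ["Pro", "$49", "14 days"],
    ["Enterprise", "$499", "30 days"],
]


def naive_column_stream(rows: list[list[str]]) -> str:
    """Emit cells column by column, one order a raw text stream might use.

    Labels and values end up far apart, so 'Enterprise' is no longer
    attached to '30 days'.
    """
    n_cols = len(rows[0])
    return " ".join(row[c] for c in range(n_cols) for row in rows)


def read_straight_across(left: list[str], right: list[str]) -> str:
    """Read a two-column page line by line across both columns."""
    return " ".join(f"{left_line} {right_line}" for left_line, right_line in zip(left, right))


def to_markdown_table(rows: list[list[str]]) -> str:
    """Render rows as a markdown table so each row keeps its own cells."""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(naive_column_stream(TABLE))
    print(read_straight_across(
        ["Naive extraction reads", "the raw text stream."],
        ["LlamaParse reads the", "rendered page."],
    ))
    print(to_markdown_table(TABLE))
```

In the first printed line, "Enterprise" and "30 days" are separated by the rest of the table. The second line splices the two columns into nonsense. In the markdown version, every value sits in the same row as its label.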
How LlamaParse is different
LlamaParse treats each page as an image and uses a vision-language model to read it the way a person would. It sees the table grid, the column boundaries, and the heading hierarchy, and it emits clean markdown (with real markdown tables). That markdown is what you then split into Nodes and index. Because the structure survives, a chunk holding the Enterprise row of a pricing table keeps that plan's price and refund window together, so retrieval can actually answer “What’s the Enterprise refund window?”
```python
from llama_cloud_services import LlamaParse
from llama_index.core import VectorStoreIndex

# LlamaParse is a hosted service and reads its key from LLAMA_CLOUD_API_KEY.
# VectorStoreIndex defaults to OpenAI embeddings and LLM (needs OPENAI_API_KEY).

# Parse a gnarly PDF into clean markdown documents
docs = LlamaParse(result_type="markdown").load_data("pricing.pdf")

# Then the usual LlamaIndex pipeline, but now on clean, structured text
index = VectorStoreIndex.from_documents(docs)
answer = index.as_query_engine().query("What's the Enterprise refund window?")
print(answer)  # now answerable, because the table structure survived parsing
```
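Why does surviving structure matter at chunking time? The following is a minimal sketch. It assumes the parser produced the made-up markdown in `PARSED_MD`. The splitter starts a new chunk only at headings, so it never cuts a table. As a result, the chunk holding the Enterprise row also holds the header row that says which cell is the refund window. `split_by_heading` and `table_row` are hypothetical helpers written for this illustration. In a real pipeline you would reach for one of LlamaIndex's markdown-aware node parsers (for example `MarkdownNodeParser`) instead.

```python
from typing import Optional

# Made-up parser output for illustration; not real LlamaParse output.
PARSED_MD = """# Pricing

Plans are billed monthly.

| Plan | Price | Refund window |
| --- | --- | --- |
| Pro | $49 | 14 days |
| Enterprise | $499 | 30 days |

## Support

Support is available by email and chat."""


def split_by_heading(md: str) -> list[str]:
    """Start a new chunk at every markdown heading; tables are never cut."""
    chunks, current = [], []
    for line in md.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks


def table_row(chunk: str, first_cell: str) -> Optional[dict[str, str]]:
    """Return the table row whose first cell matches, keyed by the header."""
    rows = [line for line in chunk.splitlines() if line.startswith("|")]
    if len(rows) < 3:  # need a header, a separator, and at least one row
        return None

    def cells(line: str) -> list[str]:
        return [c.strip() for c in line.strip().strip("|").split("|")]

    header = cells(rows[0])
    for row in rows[2:]:  # skip the header and the --- separator
        values = cells(row)
        if values[0] == first_cell:
            return dict(zip(header, values))
    return None


if __name__ == "__main__":
    for chunk in split_by_heading(PARSED_MD):
        row = table_row(chunk, "Enterprise")
        if row is not None:
            print(row["Refund window"])  # prints "30 days"
```

A fixed-size character splitter could instead cut between the header row and the Enterprise row. That would leave a chunk of bare values with nothing saying which one is the refund window.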
Next
That completes the LlamaIndex cluster. Next, a different production framework: the OpenAI Agents SDK and its handoffs + guardrails model.
Practice this in an interview
In LlamaIndex a Node is a chunk of a source document with metadata and relationships, indexed for retrieval; a query engine wraps an index to take a natural-language query, retrieve relevant nodes, and synthesize an answer. RAG-as-a-tool wraps a query engine in a QueryEngineTool so an agent can call it like any other tool, deciding when to retrieve from that knowledge source as part of its reasoning loop.
RAG augments an LLM by retrieving relevant documents from an external knowledge store at query time and feeding them into the prompt as grounding context. A basic pipeline chunks and embeds documents into a vector store, retrieves the top-k most similar chunks for a query, and the LLM generates an answer conditioned on them, reducing hallucination and keeping knowledge current.
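The retrieve-then-generate loop above fits in a few lines if you swap in a toy stand-in for the embedding model. This sketch assumes bag-of-words term counts as "embeddings" and ranks by cosine similarity. A real pipeline would use a learned embedding model and a vector store, and the final LLM call is left out here. The chunks and the helper names (`embed`, `cosine`, `top_k`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the best k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Put the retrieved chunks into the prompt as grounding context."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    chunks = [
        "The Enterprise plan refund window is 30 days.",
        "The Pro plan refund window is 14 days.",
        "Support is available by email and chat.",
    ]
    question = "What is the Enterprise refund window?"
    print(build_prompt(question, top_k(question, chunks, k=1)))
```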
RAG couples a retrieval step — fetching relevant documents from an external store — with a generative model so the LLM can answer questions about knowledge it was never trained on. It solves the stale-knowledge and hallucination problems without retraining. The pattern is preferred when the knowledge base changes frequently or contains proprietary data.
Evaluation splits into retrieval quality (did we fetch the right chunks?) and generation quality (did the model use them correctly?). Key metrics are context precision/recall for retrieval and faithfulness plus answer relevance for generation. Frameworks like RAGAS automate LLM-as-judge scoring; human annotation anchors the ground truth.
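Once you have labelled which chunks are relevant to a question, you can compute the two retrieval metrics directly. The following is a minimal sketch using set-based definitions. Precision is the share of retrieved chunks that are relevant, and recall is the share of relevant chunks that were retrieved. RAGAS's versions rely on LLM judgments, and its context precision also rewards ranking relevant chunks first. Treat this as the underlying idea, not a reimplementation. The chunk IDs are made up.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunk IDs that are relevant (set-based, rank ignored)."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk_id in retrieved if chunk_id in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of relevant chunk IDs that were retrieved."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


if __name__ == "__main__":
    retrieved = ["c1", "c2", "c3", "c4"]  # top-4 from the retriever (made up)
    relevant = {"c1", "c3", "c7"}         # human-labelled ground truth (made up)
    print(context_precision(retrieved, relevant))  # 0.5
    print(context_recall(retrieved, relevant))     # 2/3, about 0.667
```

Low precision with high recall suggests the retriever returns too much noise. The reverse means relevant chunks are being missed, and badly parsed documents can be one cause.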