datarekha

LlamaIndex: indexes, query engines & retrievers

LlamaIndex is the data framework for LLM apps — it turns documents into a queryable index through swappable components: loaders, node parsers, indexes, retrievers, and response synthesizers.

7 min read Beginner Agentic AI Lesson 20 of 42

What you'll learn

  • The LlamaIndex pipeline — Documents, Nodes, Index, Retriever, Synthesizer
  • Why it splits an offline index build from online querying
  • How every stage is a swappable component you configure

Before you start

If LangChain and LangGraph are about orchestrating agents, LlamaIndex is about feeding them — it’s the data framework that turns your messy documents into something an LLM can query. It’s one of the three frameworks the major agent courses teach, and in 2026 it has grown well beyond “a RAG library” into the leading platform for document agents. This lesson is the on-ramp.

The LlamaIndex vocabulary

LlamaIndex’s whole design is that every step of RAG is a named, swappable component. Learn the five nouns and you can read any LlamaIndex codebase:

  • Documents — raw content loaded from a source (PDF, web page, database) by a loader / reader.
  • Nodes — the chunks a node parser splits Documents into, each with text + metadata. (LlamaIndex calls chunks “Nodes.”)
  • Index — a structure built over the Nodes for fast lookup. The default, VectorStoreIndex, embeds each Node and stores the vectors.
  • Retriever — given a query, fetches the most relevant Nodes from the Index.
  • Response Synthesizer — composes the final answer from the retrieved Nodes (and a Query Engine wraps retriever + synthesizer into one .query() call).

Run the pipeline and watch the offline build hand off to the online query:

The five lines that do it

The reason LlamaIndex is beginner-friendly: the defaults wire all of that together for you.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs").load_data()   # loader → Documents
index = VectorStoreIndex.from_documents(documents)        # parse → embed → index
query_engine = index.as_query_engine()                    # retriever + synthesizer
response = query_engine.query("What's our refund window?") # retrieve → synthesize
print(response)            # grounded answer
print(response.source_nodes)  # the Nodes it used — citations for free

Five lines, and you have grounded RAG. But because each piece is a component, you can swap any of them — a smarter node parser, a hybrid retriever, a re-ranking synthesizer — without rewriting the rest.

Quick check

Quick check

0/3
Q1In LlamaIndex, what is a 'Node'?
Q2What does a Query Engine combine?
Q3Why does LlamaIndex split an offline index build from online querying?

Next

Modern LlamaIndex builds everything — including agents — on event-driven Workflows. And for the messy-PDF problem that breaks naive loaders, see LlamaParse.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Practice this in an interview

All questions
In LlamaIndex, what are nodes and query engines, and how is RAG exposed as a tool to an agent?

In LlamaIndex a Node is a chunk of a source document with metadata and relationships, indexed for retrieval; a query engine wraps an index to take a natural-language query, retrieve relevant nodes, and synthesize an answer. RAG-as-a-tool wraps a query engine in a QueryEngineTool so an agent can call it like any other tool, deciding when to retrieve from that knowledge source as part of its reasoning loop.

How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

How do you evaluate the quality of an LLM or RAG system?

Evaluation splits into retrieval quality (did we fetch the right chunks?) and generation quality (did the model use them correctly?). Key metrics are context precision/recall for retrieval and faithfulness plus answer relevance for generation. Frameworks like RAGAS automate LLM-as-judge scoring; human annotation anchors the ground truth.

What is tool use or function calling in LLMs, and how do you design good tools for an agent?

Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.

Related lessons

Explore further

Skip to content