In LlamaIndex, what are nodes and query engines, and how is RAG exposed as a tool to an agent?

In LlamaIndex a Node is a chunk of a source document with metadata and relationships, indexed for retrieval; a query engine wraps an index to take a natural-language query, retrieve relevant nodes, and synthesize an answer. RAG-as-a-tool wraps a query engine in a QueryEngineTool so an agent can call it like any other tool, deciding when to retrieve from that knowledge source as part of its reasoning loop.

How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

What is vectorless retrieval (PageIndex), and when would you use it over a vector database?

Vectorless retrieval skips embeddings entirely: it organizes a document into a hierarchical tree (titles, summaries, page ranges) and the LLM reasons a path down it — root to chapter to section — then reads only the chosen section to answer. It is structure-aware and explainable, but it spends an LLM call at each hop, so it suits a small number of well-structured documents. A vector database is the opposite trade: one millisecond ANN lookup that scales to millions of chunks but is flat and blind to document structure. Use vectors for large, messy corpora and speed; use PageIndex for bounded structured docs where the answer is found by reasoning about where it lives; combine them by shortlisting with vectors then navigating within a document.

How do you evaluate the quality of an LLM or RAG system?

Evaluation splits into retrieval quality (did we fetch the right chunks?) and generation quality (did the model use them correctly?). Key metrics are context precision/recall for retrieval and faithfulness plus answer relevance for generation. Frameworks like RAGAS automate LLM-as-judge scoring; human annotation anchors the ground truth.

LlamaIndex: indexes, query engines & retrievers — Agentic AI

If LangChain and LangGraph are about orchestrating agents, LlamaIndex is about feeding them — it’s the data framework that turns your messy documents into something an LLM can query. It’s one of the three frameworks the major agent courses teach, and in 2026 it has grown well beyond “a RAG library” into the leading platform for document agents. This lesson is the on-ramp.

The LlamaIndex vocabulary

LlamaIndex’s whole design is that every step of RAG is a named, swappable component. Learn the five nouns and you can read any LlamaIndex codebase:

Documents — raw content loaded from a source (PDF, web page, database) by a loader / reader.
Nodes — the chunks a node parser splits Documents into, each with text + metadata. (LlamaIndex calls chunks “Nodes.”)
Index — a structure built over the Nodes for fast lookup. The default, VectorStoreIndex, embeds each Node and stores the vectors.
Retriever — given a query, fetches the most relevant Nodes from the Index.
Response Synthesizer — composes the final answer from the retrieved Nodes (and a Query Engine wraps retriever + synthesizer into one .query() call).

The five nouns split across two phases — an expensive offline build that hands its index to a cheap online query:

The five lines that do it

The reason LlamaIndex is beginner-friendly: the defaults wire all of that together for you.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs").load_data()   # loader → Documents
index = VectorStoreIndex.from_documents(documents)        # parse → embed → index
query_engine = index.as_query_engine()                    # retriever + synthesizer
response = query_engine.query("What's our refund window?") # retrieve → synthesize
print(response)            # grounded answer
print(response.source_nodes)  # the Nodes it used — citations for free

Five lines, and you have grounded RAG. But because each piece is a component, you can swap any of them — a smarter node parser, a hybrid retriever, a re-ranking synthesizer — without rewriting the rest.

# A miniature LlamaIndex, to feel the component pipeline (no install needed).
class Node:
    def __init__(self, text, meta): self.text, self.meta = text, meta

# 1) Loader -> Documents  2) Node parser -> Nodes
doc = "Enterprise plans: full refund within 30 days. Starter: non-refundable."
nodes = [Node(s.strip(), {"id": i}) for i, s in enumerate(doc.split(".")) if s.strip()]

# 3) Index = Nodes + a (mock) keyword retriever
def retrieve(query, nodes, k=1):
    q = set(query.lower().split())
    scored = [(len(q & set(n.text.lower().split())), n) for n in nodes]
    scored.sort(key=lambda x: -x[0])
    return [n for _, n in scored[:k]]

# 4) Query engine = retrieve + synthesize
def query(q):
    hits = retrieve(q, nodes)
    ctx = " ".join(n.text for n in hits)
    return f"[answer grounded in]: {ctx}", [n.meta for n in hits]

ans, sources = query("enterprise refund")
print(ans)
print("source nodes:", sources)

[answer grounded in]: Enterprise plans: full refund within 30 days
source nodes: [{'id': 0}]

Trace the five components: one Document becomes two Nodes (split on .), the keyword “index” scores each Node against the query, the retriever returns the best one, and the query engine synthesizes an answer plus the source Node’s metadata — citations for free. Swap any piece (a smarter parser, a vector retriever, a re-ranking synthesizer) and the rest is unchanged.

In one breath

LlamaIndex is the data framework — it turns messy documents into a queryable index through five named, swappable components.
Learn the five nouns: Documents (loaded), Nodes (chunks), Index (embedded structure), Retriever (fetches Nodes), Response Synthesizer (composes the answer) — and a Query Engine wraps retriever + synthesizer behind .query().
It splits an offline build (load → parse → embed → index, expensive, once) from online querying (retrieve → synthesize, cheap, per question).
Five default lines get you grounded RAG with source_nodes for free, yet every stage is swappable without rewriting the rest.
It’s not LangChain-or-LlamaIndex — many stacks use LlamaIndex for retrieval, LangGraph for the agent loop.

Quick check

0/3

Q1In LlamaIndex, what is a 'Node'?

Q2What does a Query Engine combine?

Q3Why does LlamaIndex split an offline index build from online querying?

Modern LlamaIndex builds everything — including agents — on event-driven Workflows. And for the messy-PDF problem that breaks naive loaders, see LlamaParse.

LlamaIndex: indexes, query engines & retrievers

What you'll learn

Before you start

The LlamaIndex vocabulary

The five lines that do it

In one breath

Quick check

Quick check

Next

Sign in to track your progress

Practice this in an interview

Related lessons

Explore further