What are embeddings and why are they central to modern deep learning?

An embedding is a dense, learned vector representation of a discrete or high-dimensional object — a word, image, user, product — in a continuous low-dimensional space. Proximity in embedding space reflects semantic or behavioural similarity, making embeddings a universal interface between raw data and neural networks.
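To make "proximity reflects similarity" concrete, here is a minimal sketch that compares vectors by cosine similarity. The three-dimensional vectors and the helper names are made up for illustration; they are not the output of any trained model, and real embeddings typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings; the values are invented for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, table):
    """Return the other key whose vector is closest to `word`."""
    return max((w for w in table if w != word),
               key=lambda w: cosine_similarity(table[word], table[w]))

print(most_similar("cat", embeddings))  # prints: dog
```

Cosine similarity is used here because it ignores vector length and compares direction only, which is a common choice for embedding lookups.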

What is Retrieval-Augmented Generation (RAG) and how does a basic RAG pipeline work?

RAG augments an LLM by retrieving relevant documents from an external knowledge store at query time and feeding them into the prompt as grounding context. A basic pipeline chunks and embeds documents into a vector store, retrieves the top-k most similar chunks for a query, and passes them to the LLM, which generates an answer conditioned on them. This reduces hallucination and keeps knowledge current.
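The pipeline stages can be sketched in plain Python. This is a toy under loud assumptions: `embed` is a word-count vector standing in for a learned embedding model, the "vector store" is a list, and no LLM is called; the sketch stops at the assembled prompt. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Stand-in embedding: sparse word counts (NOT a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_store(documents):
    """The 'vector store': a list of (chunk_text, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(store, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, contexts):
    """Place the retrieved chunks in the prompt as grounding context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context_block}\n\nQuestion: {query}")

documents = [
    "PageRank scores a node by the steady-state probability of a random surfer.",
    "A knowledge graph stores facts as subject, predicate, object triples.",
]
store = build_store(documents)
question = "How does a knowledge graph store facts?"
top = retrieve(store, question, k=1)
print(build_prompt(question, top))
```

In a real system the prompt would then be sent to the LLM, and `embed` would be replaced by an embedding model so that retrieval matches on meaning rather than shared words.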

What is Retrieval-Augmented Generation (RAG) and why is it used?

RAG couples a retrieval step, which fetches relevant documents from an external store, with a generative model so the LLM can answer questions about knowledge it was never trained on. It mitigates the stale-knowledge and hallucination problems without retraining. The pattern is preferred when the knowledge base changes frequently or contains proprietary data.

In LlamaIndex, what are nodes and query engines, and how is RAG exposed as a tool to an agent?

In LlamaIndex a Node is a chunk of a source document with metadata and relationships, indexed for retrieval; a query engine wraps an index to take a natural-language query, retrieve relevant nodes, and synthesize an answer. RAG-as-a-tool wraps a query engine in a QueryEngineTool so an agent can call it like any other tool, deciding when to retrieve from that knowledge source as part of its reasoning loop.

Graphs for ML & Knowledge Graphs — DSA

What you'll learn

How knowledge graphs store facts as subject-predicate-object triples, and why LLMs lean on them

What node2vec does — random walks treated as sentences, then word2vec on top

The message-passing idea behind Graph Neural Networks — each node aggregates its neighbours

How PageRank's random surfer turns the adjacency structure into a score of importance

Every tool in this chapter — BFS, DFS, adjacency lists, edge walks — arrived as a way to solve graph puzzles. It turns out those same plain primitives sit underneath some of the most valuable ML systems running today. This lesson maps the graph ideas you already hold onto their machine-learning counterparts.

Knowledge graphs: facts as triples

A knowledge graph stores the world as triples — (subject, predicate, object). The nodes are entities, and each edge carries a typed label, which is richer than a plain adjacency list:

Three triples. In graph terms this is a directed, labelled multigraph — and you already know how to traverse it.

A language model generates fluent text but has no reliable grip on specific facts — which drug interacts with which, who currently runs a company. GraphRAG (Graph Retrieval-Augmented Generation) grounds it: instead of fetching a flat list of text chunks, the system does a BFS out to two or three hops around the query’s entities and hands the model that structured neighbourhood. The model receives verifiable, related facts rather than ambiguous prose — and the retrieval step is exactly the “print everything within k hops” traversal you already wrote.
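The k-hop retrieval described above can be sketched as a BFS over a labelled adjacency list. The triples below are invented for illustration (they are not the lesson's own example), and `build_index` and `k_hop_facts` are hypothetical helper names, not part of any GraphRAG library.

```python
from collections import defaultdict, deque

# Hypothetical triples, invented for illustration.
triples = [
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
    ("aspirin", "treats", "headache"),
    ("thrombosis", "is_a", "vascular_disease"),
]

def build_index(triples):
    """Labelled adjacency list: entity -> list of (neighbour, triple).

    Each triple is indexed from both ends so the walk can also reach facts
    in which the entity appears as the object.
    """
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((o, (s, p, o)))
        index[o].append((s, (s, p, o)))
    return index

def k_hop_facts(index, start, k):
    """BFS from `start`; collect every triple met while expanding nodes at depth < k."""
    depth = {start: 0}
    queue = deque([start])
    facts, seen = [], set()
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand past the k-hop frontier
        for neighbour, fact in index.get(node, []):
            if fact not in seen:
                seen.add(fact)
                facts.append(fact)
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return facts

index = build_index(triples)
for s, p, o in k_hop_facts(index, "aspirin", 2):
    print(f"({s}, {p}, {o})")
```

With k = 2 the walk returns the three facts within two hops of "aspirin" and leaves out the "vascular_disease" fact, which sits three hops away. The returned triples are what would be serialised into the model's prompt.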

Node embeddings: word2vec for graphs

Word2vec learns a vector per word by predicting a word from the words around it, or the surrounding words from the word (the skip-gram variant that DeepWalk and node2vec use). DeepWalk and node2vec take that idea to graphs: run many random walks, each a probabilistic stroll along the adjacency list, and treat each walk as a sentence of node IDs. DeepWalk picks every step uniformly at random; node2vec adds two parameters, p and q, that bias each step towards staying close to the previous node or exploring outward. Train word2vec on those sentences, and nodes that keep showing up near each other land close in vector space. Now you can measure similarity between nodes, cluster them, or feed the vectors to a classifier without touching the raw graph again. The walk is just a randomised neighbour traversal of the structure you already built.
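The walk-generation half of this recipe fits in a few lines. The sketch below uses uniform random walks (the DeepWalk style, without node2vec's biasing) and stops at the corpus of "sentences"; training word2vec on them is left out because it needs an external library. The function names and parameter values are illustrative assumptions.

```python
import random

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
         "D": ["B", "C"], "E": ["C"]}

def random_walk(graph, start, length, rng):
    """One uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:      # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Build the corpus: several walks from every node, in shuffled order."""
    rng = random.Random(seed)   # seeded so runs are repeatable
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks

walks = generate_walks(graph, walks_per_node=3, length=5)
print(len(walks), "walks, for example:", walks[0])
```

Each inner list plays the role of a sentence and each node ID the role of a word, so the list of walks can be handed to any word2vec implementation that accepts tokenised sentences.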

Graph Neural Networks: message passing

A GNN turns “aggregate your neighbours” into a learnable step. Each node starts with a feature vector, and one round of message passing goes: every node sends its vector to its neighbours; each node aggregates what it receives (mean, sum, or max); each node updates itself from its old vector plus that aggregate. Stack k rounds, and a node’s vector carries information from everything within k hops, the same radius idea as BFS depth. Here is the simplest possible layer, run twice by hand: plain mean-aggregation over scalar features, which for brevity skips the update step and simply replaces each node’s value with the mean of its neighbours:

graph = {"A": ["B","C"], "B": ["A","D"], "C": ["A","D","E"], "D": ["B","C"], "E": ["C"]}
features = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 5.0, "E": 4.0}

def message_pass(graph, feat):
    return {node: sum(feat[n] for n in nbrs) / len(nbrs)      # mean of neighbours
            for node, nbrs in graph.items()}

f1 = message_pass(graph, features)
f2 = message_pass(graph, f1)
print("round 0:", features)
print("round 1:", {k: round(v, 3) for k, v in f1.items()})
print("round 2:", {k: round(v, 3) for k, v in f2.items()})

round 0: {'A': 1.0, 'B': 3.0, 'C': 2.0, 'D': 5.0, 'E': 4.0}
round 1: {'A': 2.5, 'B': 3.0, 'C': 3.333, 'D': 2.5, 'E': 2.0}
round 2: {'A': 3.167, 'B': 2.5, 'C': 2.333, 'D': 3.167, 'E': 3.333}

After one round, A holds the average of B and C (its direct neighbours); after two, A’s value has been touched by D and E as well — nodes two hops away. In a real GNN the aggregation is a learned weighted sum with a nonlinearity, but the traversal logic is exactly what you see here.
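A step closer to a real layer keeps the node's own value, weights the two terms, and applies a nonlinearity. The sketch below does this on the same scalar features; the weights `w_self`, `w_nbr` and `bias` are hypothetical fixed numbers chosen for illustration, whereas a trained GNN would learn them (as matrices acting on feature vectors) by gradient descent.

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

def gnn_layer(graph, feat, w_self, w_nbr, bias):
    """One message-passing round on scalar features.

    new[v] = relu(w_self * old[v] + w_nbr * mean(old[u] for u in nbrs(v)) + bias)
    """
    new = {}
    for node, nbrs in graph.items():
        agg = sum(feat[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        new[node] = relu(w_self * feat[node] + w_nbr * agg + bias)
    return new

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
         "D": ["B", "C"], "E": ["C"]}
features = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 5.0, "E": 4.0}

# Hypothetical weights; a real model would learn these.
h1 = gnn_layer(graph, features, w_self=0.5, w_nbr=0.5, bias=0.0)
print({k: round(v, 3) for k, v in h1.items()})
```

Setting `w_self=0.0`, `w_nbr=1.0` and `bias=0.0` reduces this layer to plain neighbour averaging, since every value in this example is non-negative and passes through the ReLU unchanged.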

PageRank: importance flows along links

PageRank scores a node by imagining a random surfer who, at each step, follows a random outgoing edge with probability d (the damping factor, usually 0.85) or teleports to a uniformly random node otherwise. The teleport keeps the surfer from getting trapped in dead ends and closed loops, so importance cannot pool there indefinitely. The steady-state probability of being at a node is its score, computed by repeating one update, PR(v) = (1−d)/N + d · Σ PR(u)/out_degree(u), summed over v’s in-neighbours u, until the numbers stop moving (power iteration, usually 20-50 rounds). “Dangling” nodes, which have no outgoing edges, contribute nothing to that sum, so implementations usually spread their score evenly over all nodes to keep the total at 1. In adjacency-list terms, each round is a single O(V + E) pass over the edges.
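The update rule can be run directly on an adjacency list. The sketch below is a minimal power iteration under two assumptions that go beyond the formula: every edge target also appears as a key in the graph, and the score held by nodes without outgoing edges is shared evenly among all nodes each round (one common convention). The graph `web` and the tolerance values are illustrative.

```python
def pagerank(graph, d=0.85, tol=1e-8, max_iter=200):
    """Power iteration on an adjacency list {node: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}          # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption of this sketch: dangling nodes share their score evenly.
        dangling = sum(pr[u] for u in nodes if not graph[u])
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:                        # one pass over every edge: O(V + E)
            if graph[u]:
                share = d * pr[u] / len(graph[u])
                for v in graph[u]:
                    new[v] += share
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:                        # the numbers have stopped moving
            break
    return pr

# Illustrative graph: C is linked from A, B and D; nothing links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

C comes out on top because three nodes link to it, and D, which nothing links to, keeps only the teleport share (1−d)/N.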

Practice

Quick check

Q1. In GraphRAG, why retrieve a subgraph instead of flat text chunks?

Q2. In node2vec, what makes an embedding capture a node's neighbourhood?

Q3. A GNN has 3 message-passing layers. After all 3, a node's vector reflects information from how far away?
