datarekha

Graphs for ML & Knowledge Graphs

How the graph abstractions from this chapter — nodes, edges, BFS, adjacency — quietly power knowledge graphs, node embeddings, GNNs, and GraphRAG in production AI systems.

7 min read Intermediate Data Structures & Algorithms Lesson 23 of 32

What you'll learn

  • How knowledge graphs encode facts as subject–predicate–object triples and why that structure lets LLMs reason over structured data
  • What node2vec / DeepWalk do — random walks treated as sentences, then word2vec-style training to produce node embeddings
  • The message-passing intuition behind Graph Neural Networks — each node aggregates its neighbors, stacking k layers reaches k hops
  • How PageRank's random-surfer model defines node importance through power iteration on the adjacency structure

Before you start

Every algorithm in this chapter — BFS, DFS, adjacency lists, edge traversal — was introduced as a tool for solving graph puzzles. It turns out those same primitives are the foundation of some of the most commercially valuable ML systems running today. This lesson maps the DSA concepts you already know onto their ML counterparts.

Knowledge graphs: facts as triples

A knowledge graph stores the world as a set of triples: (subject, predicate, object).

(Paris,        capital_of,     France)
(Marie Curie,  won,            Nobel Prize in Physics)
(Aspirin,      treats,         Headache)

Nodes are entities (people, places, concepts, drugs). Edges are typed relations — the predicate gives the edge a label, which is richer than an unweighted adjacency list. In graph terms, you already know this structure: it is a directed, labeled multigraph.

Why LLMs care

A large language model generates text from statistical patterns. It has no reliable way to know whether a specific drug–drug interaction exists, or who the current CEO of a company is. Grounding the model in a knowledge graph solves this.

GraphRAG (Graph Retrieval-Augmented Generation) is the pattern: instead of retrieving a flat list of text chunks, the system traverses a subgraph around the query entities and passes that structured context to the model. BFS from a query node out to depth 2 or 3 retrieves the relevant neighborhood. The model gets structured, verifiable facts rather than potentially contradicting prose.

The retrieval step is exactly neighborhood traversal on an adjacency structure — the same operation as printing all nodes reachable within k hops.

Node embeddings: word2vec for graphs

Word2vec learns vector representations of words by training a shallow neural network to predict a word from its context words in a sentence. The insight behind DeepWalk and node2vec is: generate “sentences” over the graph using random walks, then apply word2vec.

The procedure in plain terms:

  1. For each node, run many random walks of length L, following edges at random (node2vec biases the walk toward BFS-like or DFS-like behavior via two hyperparameters).
  2. Treat each walk as a sentence, each node ID as a token.
  3. Train a skip-gram model: predict which nodes appear near a given node across all walks.
  4. The hidden-layer weights become the node embeddings.

Nodes that are structurally close or share similar neighborhoods end up with similar vectors. You can now compute cosine similarity between nodes, cluster them, or feed the embeddings into a downstream classifier — all without ever looking at the raw graph.

The key connection: a random walk is a probabilistic traversal of the adjacency list. It uses the same data structure you have already built.

Graph Neural Networks: message passing

A GNN generalizes the idea of “aggregate your neighbors” into a learnable operation.

Each node starts with a feature vector (e.g., a one-hot encoding, or an embedding). One round of message passing works like this:

  1. Every node sends its current vector to each of its neighbors.
  2. Each node collects all received vectors and aggregates them — mean, sum, or max.
  3. Each node updates its own vector using its current vector and the aggregated result (through a learned linear layer + activation).

Stack k such layers and the node’s vector encodes information from all nodes within k hops. A node 3 hops away influences the representation after 3 rounds — exactly the same radius concept as BFS depth.

This is what makes GNNs powerful for molecular graphs (atoms as nodes, bonds as edges), social graphs, and recommendation graphs. The architecture mirrors the graph’s structure directly.

PageRank assigns a score to every node by modeling a random surfer: at each step, the surfer follows a random outgoing edge with probability d (the damping factor, typically 0.85) or teleports to a random node with probability 1 - d. The teleportation term is necessary because nodes with no outgoing edges — “dangling” nodes — would otherwise drain rank out of the graph entirely and the scores would not converge to a meaningful distribution.

The steady-state probability of being at a node is its PageRank score. The iterative update rule for node v is:

PR(v) = (1 - d) / N  +  d * sum( PR(u) / out_degree(u)  for u in in_neighbors(v) )

Running this update for all nodes repeatedly — a technique called power iteration, where you apply the same linear rule until scores stop changing — converges in roughly 20–50 iterations on most graphs. High-in-degree nodes from high-PageRank sources receive more of the “importance flow.”

In adjacency-list terms: for each node, iterate over its in-neighbors, accumulate their current scores divided by their out-degree, scale by d, add the teleport term. One round is O(V + E) — a single pass over the edge list.

The message-passing kernel, live

This playground runs one round of neighbor-feature aggregation on a small graph. Each node has a one-dimensional feature. The update rule is: new feature = mean of neighbors’ features. That is the simplest possible GNN layer.

Watch the values converge toward the graph’s average over rounds. In a real GNN the aggregation is a learned weighted sum and the update applies a nonlinear transformation — but the traversal logic is identical to what you see here.

Quick check

Quick check

0/3
Q1In GraphRAG, why is a subgraph retrieved instead of flat text chunks?
Q2In node2vec, what makes a node embedding encode structural neighborhood information?
Q3A GNN has 3 message-passing layers. A node's updated representation after all 3 layers aggregates information from how many hops away?

Practice this in an interview

All questions
What are embeddings and why are they central to modern deep learning?

An embedding is a dense, learned vector representation of a discrete or high-dimensional object — a word, image, user, product — in a continuous low-dimensional space. Proximity in embedding space reflects semantic or behavioural similarity, making embeddings a universal interface between raw data and neural networks.

What is Retrieval-Augmented Generation (RAG) and why is it used?

RAG couples a retrieval step — fetching relevant documents from an external store — with a generative model so the LLM can answer questions about knowledge it was never trained on. It solves the stale-knowledge and hallucination problems without retraining. The pattern is preferred when the knowledge base changes frequently or contains proprietary data.

When should you use deep learning vs classical machine learning?

Deep learning wins when data is abundant, inputs are unstructured (images, text, audio), and features are hard to engineer by hand. Classical ML wins on structured tabular data, small datasets, and when interpretability or training speed matter.

What metrics should you monitor for a production ML model, and at what layer?

Production ML monitoring spans four layers: data quality (schema, distributions, null rates), model behaviour (prediction drift, confidence calibration), operational health (latency, error rate, throughput), and business KPIs (conversion, revenue impact). Each layer has different owners and different alert thresholds.

Sign in to track your progress

Completed lessons, your XP, level, and streak save to your account — it's free and takes a few seconds.

Explore further

Related lessons

Skip to content