Graph Theory & Representations
A graph is just vertices joined by edges. Learn the vocabulary, the handshake fact, and the two ways to store a graph — the bedrock for every BFS/DFS question.
What you'll learn
- A graph G = (V, E); directed vs undirected; degree, in-degree and out-degree
- The handshake fact: the sum of all degrees equals twice the number of edges
- Paths, cycles, connectivity, DAGs, and a tree as a connected acyclic graph with V - 1 edges
- Adjacency matrix (O(V squared) space, O(1) edge check) vs adjacency list (O(V + E) space, best for sparse graphs)
Before you start
A graph is the simplest structure for relationships: a set of vertices (nodes) joined by edges (connections). Friendships, road networks, task dependencies, web links — all graphs. So are the dependency graph (DAG) behind every Spark or Airflow pipeline and the data behind graph neural networks. Before any traversal algorithm makes sense, you need the vocabulary and the two standard ways to store a graph in memory. That is this lesson.
Vertices, edges, and direction
Formally a graph is a pair G = (V, E), where V is the set of vertices and E is
the set of edges. Two flavours:
- Undirected — an edge
{u, v}is symmetric: ifuconnects tov, thenvconnects tou. Facebook friendships. - Directed (a digraph) — an edge
(u, v)is an arrow fromutovand says nothing about the reverse direction. Twitter follows.
Degree counts a vertex’s edges. In an undirected graph the degree of a vertex is the number of edges touching it. In a directed graph you split this into in-degree (arrows arriving) and out-degree (arrows leaving).
Paths, cycles, connectivity, DAGs, trees
- A path is a sequence of vertices where consecutive ones are joined by an edge.
- A cycle is a path that returns to its start (and repeats no edge).
- A graph is connected if there is a path between every pair of vertices.
- A DAG (directed acyclic graph) is a directed graph with no directed cycle — the structure behind task scheduling and the only graph that can be topologically sorted (next lesson).
- A tree is a connected acyclic graph. A key fact: a tree on
Vvertices has exactlyV − 1edges — add one more edge and you create a cycle; remove one and it disconnects.
Two ways to store a graph
How you store a graph decides which operations are cheap. The two standard choices:
- Adjacency matrix — a
V × Vgrid where cell(i, j) = 1if an edge joinsiandj, else 0.O(V²)space, but checking “is there an edgei-j?” isO(1). For an undirected graph the matrix is symmetric. - Adjacency list — for each vertex, store a list of its neighbours.
O(V + E)space, which is far smaller when the graph is sparse (few edges). Listing a vertex’s neighbours is direct; an edge-existence check costs up toO(degree).
Explore the trade-off live — click an edge and watch it light up in both the list and the matrix, and toggle directed/undirected to see the matrix symmetry change:
How GATE asks this
Expect a NAT or MCQ. Common patterns: “given the degree sequence, how many
edges?” (use the handshake fact), “how many edges in a complete graph on n vertices?”
(n(n−1)/2), “how much space does an adjacency matrix need?” (O(V²)), and
terminology checks on DAGs and trees. These are fast marks if the definitions are
reflexes.
Worked example
Take an undirected graph on 4 vertices with edge set
{ (1,2), (1,3), (1,4), (2,4) } — that is 4 edges (the graph drawn above).
Step 1 — degrees. Count edges at each vertex:
deg(1) = 3 (joins 2, 3, 4)
deg(2) = 2 (joins 1, 4)
deg(3) = 1 (joins 1)
deg(4) = 2 (joins 1, 2)
Step 2 — verify the handshake fact. Sum of degrees = 3 + 2 + 1 + 2 = 8, and
2 × |E| = 2 × 4 = 8. They match.
Step 3 — representations. The adjacency matrix is the symmetric grid above; the
adjacency list is 1 → 2,3,4, 2 → 1,4, 3 → 1, 4 → 1,2.
Step 4 — a complete graph. If these 4 vertices were all mutually connected, the
edge count would be C(4, 2) = 4·3/2 = 6. Our graph has only 4 of those 6 possible
edges, so it is not complete.
Quick check
Quick check
Practice this in an interview
All questionsScan every cell. When you find a '1' that hasn't been visited, increment the island count and immediately flood-fill all connected land cells (DFS or BFS) so they won't be counted again. The total number of floods equals the number of islands.
Use a queue (deque) to process nodes layer by layer. At each step, snapshot the current queue length to know exactly how many nodes belong to the current level, drain those, then enqueue their children. The result is a list of lists without any depth-tracking variable.
An embedding is a dense, learned vector representation of a discrete or high-dimensional object — a word, image, user, product — in a continuous low-dimensional space. Proximity in embedding space reflects semantic or behavioural similarity, making embeddings a universal interface between raw data and neural networks.