The Relational Model
A relation is just a table — rows are tuples, columns are attributes. Get this vocabulary right and the rest of the DBMS block falls into place.
What you'll learn
- A relation is a table: rows are tuples, columns are attributes
- Schema vs instance: the column definitions vs the actual rows right now
- Degree counts columns (schema); cardinality counts rows (instance)
- Set semantics — pure relational algebra removes duplicate tuples
Before you start
You’ve probably used Excel — every row is a record, every column is a field. A
relational database is just that, taken seriously: rules about what a column
can hold, what makes a row unique, and how tables link up. Before any SQL or
algebra, get the four nouns straight: relation, tuple, attribute, schema.
This vocabulary isn’t just exam trivia — every pandas DataFrame, feature table,
and SQL warehouse you’ll touch as a data engineer is a relation underneath, and
“degree vs cardinality” is exactly the “columns vs rows” distinction you reach
for daily.
The four words
A relation is a table. Each row is a tuple. Each column is an attribute. The schema is the column definitions (names + types); the instance is the actual rows sitting in the table right now.
- Relation — a table.
- Tuple — a row (one record).
- Attribute — a column (a named field).
- Schema
R(A1, A2, …, An)— the column definitions, fixed by design. - Instance — the current set of tuples; changes every insert/delete.
- Degree — number of attributes (columns). A schema property.
- Cardinality — number of tuples (rows). An instance property.
One more rule: pure relational algebra uses set semantics — every tuple is
distinct, duplicates collapse to one. (SQL is the loose cousin that allows
duplicates unless you say DISTINCT; for the relational-model questions, think
sets.)
How GATE asks this
Two patterns. First, vocabulary checks — an MCQ that gives a small table and asks for degree or cardinality, or asks which statement about schema vs instance is true. Second, embedded vocabulary — later questions on relational algebra or normalization assume you already know what “degree of R” means. Both are easy marks if the four words are reflexive.
Worked example
Take Students(id, name, age, dept) and suppose 50 students are currently
enrolled.
- Degree = number of attributes in the schema = 4. Doesn’t change when you insert or delete rows — it’s set by the schema.
- Cardinality = number of tuples in the current instance = 50. Drops to 49 if one student leaves; rises to 51 if one joins.
If the question becomes “after deleting 3 rows, what is the new degree?”, the answer is still 4 — the schema didn’t change.
Quick check
Quick check
Practice this in an interview
All questionsOLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies on a composite key. 3NF removes transitive dependencies — every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization trades update anomalies for read performance, and is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
Data warehouses favor denormalization — wide, flat tables that trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is negligible, so denormalized star schemas consistently outperform normalized models for analytical workloads.
A primary key uniquely identifies each row in a table and implicitly creates a unique index; it cannot be NULL. A foreign key in a child table references the primary key of a parent table and enforces referential integrity — the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.