What is the difference between OLTP and OLAP workloads, and how does that drive database design choices?

OLTP systems handle many small, latency-sensitive transactions that read and write a few rows at a time, so they are optimized for fast point lookups and row-level locking. OLAP systems run infrequent but wide analytical queries over millions of rows, so they benefit from columnar storage, bulk scans, and denormalized schemas that minimize joins.
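The two access patterns can be contrasted in a minimal sketch. It assumes a hypothetical `orders` table in an in-memory SQLite database; the table, its columns, and the data are illustrative only. SQLite is a row store, so this shows the shape of each query, not the performance of a columnar engine.

```python
import sqlite3

# Hypothetical orders table, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
rows = [(i, "north" if i % 2 else "south", float(i)) for i in range(1, 1001)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# OLTP-style: touch one row, located through the primary key.
point_lookup = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (42,)
).fetchone()

# OLAP-style: scan every row, read only two columns, then aggregate.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)
```

The first query touches a single row. The second reads every row but needs only `region` and `amount`, which is the case columnar storage is built for.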

What are 1NF, 2NF, and 3NF, and when would you intentionally denormalize?

1NF eliminates repeating groups and requires atomic column values. 2NF further removes partial dependencies, where a non-key column depends on only part of a composite key. 3NF removes transitive dependencies, where a non-key column depends on another non-key column: every non-key column must depend on the key, the whole key, and nothing but the key. Denormalization accepts redundancy, and with it the risk of update anomalies, in exchange for read performance; it is appropriate when the read path dominates and write correctness can be enforced at the application layer or with materialized views.
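A transitive dependency and its 3NF fix can be sketched in plain Python. The rows, names, and the `dept -> dept_head` dependency are hypothetical, chosen only to show the shape of the decomposition.

```python
# Hypothetical rows: student_id -> dept, and dept -> dept_head
# (dept_head depends on the key only through dept: a transitive dependency).
enrolled = [
    {"student_id": 1, "dept": "CS", "dept_head": "Rao"},
    {"student_id": 2, "dept": "CS", "dept_head": "Rao"},
    {"student_id": 3, "dept": "EE", "dept_head": "Iyer"},
]

# 3NF decomposition: move the dept -> dept_head fact into its own relation.
students = [{"student_id": r["student_id"], "dept": r["dept"]} for r in enrolled]
departments = {r["dept"]: r["dept_head"] for r in enrolled}

# Changing a department head is now one write instead of one per student.
departments["CS"] = "Menon"

# Joining on dept recovers the wide (denormalized) shape for reads.
rejoined = [{**s, "dept_head": departments[s["dept"]]} for s in students]
```

In the original wide table the same change would have to be applied to every CS row, and missing one is exactly the update anomaly that normalization prevents.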

Should you normalize or denormalize tables in a data warehouse, and why?

Data warehouses usually favor denormalization: wide, flat tables or star schemas with denormalized dimensions, which trade storage for query simplicity and performance. Normalization (splitting tables to eliminate redundancy) reduces storage but multiplies join hops, increasing query complexity and optimizer cost. In columnar warehouses with compression, the storage cost of redundancy is usually small, so denormalized star schemas typically outperform fully normalized models for analytical workloads.

What is the difference between a primary key and a foreign key, and what guarantees do they provide?

A primary key uniquely identifies each row in a table and cannot be NULL; most engines enforce it with a unique index. A foreign key in a child table references the primary key (or another unique key) of a parent table and enforces referential integrity: the database rejects inserts or updates that reference a non-existent parent row, and rejects parent deletes that would orphan child rows unless a cascade rule is defined.
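These guarantees can be observed directly with Python's built-in `sqlite3` module. This is a minimal sketch: the `dept` and `student` tables and the `is_rejected` helper are hypothetical, and SQLite enforces foreign keys only once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " dept_id INTEGER NOT NULL REFERENCES dept(dept_id))"
)
conn.execute("INSERT INTO dept VALUES (1, 'CS')")
conn.execute("INSERT INTO student VALUES (100, 1)")


def is_rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Child row pointing at a parent that does not exist.
orphan_insert_rejected = is_rejected("INSERT INTO student VALUES (101, 99)")
# Second parent row reusing an existing primary-key value.
duplicate_key_rejected = is_rejected("INSERT INTO dept VALUES (1, 'EE')")
# Deleting a parent that student 100 still references (no cascade rule defined).
parent_delete_rejected = is_rejected("DELETE FROM dept WHERE dept_id = 1")
```

All three statements are refused. Declaring the foreign key with `ON DELETE CASCADE` would instead let the parent delete succeed and remove the dependent child rows.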

The Relational Model — GATE DA

The last chapter ended by asking what durable, query-able shape could replace the structures that vanish when a program exits — and named the answer: rows and columns, taken seriously. You have probably used a spreadsheet, where every row is a record and every column is a field. A relational database is exactly that idea made rigorous: precise rules about what a column may hold, what makes a row unique, and how tables link to one another. Before any SQL or algebra, get the four nouns straight: relation, tuple, attribute, schema. This vocabulary is not just exam trivia — every pandas DataFrame, feature table, and SQL warehouse you will touch as a data engineer is a relation underneath, and “degree vs cardinality” is exactly the “columns vs rows” distinction you reach for daily.

The four words

A relation is a table. Each row is a tuple. Each column is an attribute. The schema is the column definitions (names + types); the instance is the actual rows sitting in the table right now.

Degree = column count (a property of the schema). Cardinality = row count (a property of the current instance).

Relation — a table.
Tuple — a row (one record).
Attribute — a column (a named field).
Schema R(A1, A2, …, An) — the column definitions, fixed by design.
Instance — the current set of tuples; changes every insert/delete.
Degree — number of attributes (columns). A schema property.
Cardinality — number of tuples (rows). An instance property.
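The vocabulary above maps onto a few lines of Python. In this sketch a `NamedTuple` class stands in for the schema and a `set` of its instances stands in for the relation instance; the `Student` class and its data are hypothetical.

```python
from typing import NamedTuple


class Student(NamedTuple):
    # The schema: attribute names and types, fixed at design time.
    id: int
    name: str
    dept: str


# The instance: the set of tuples present right now (hypothetical data).
instance = {
    Student(1, "Asha", "CS"),
    Student(2, "Ravi", "EE"),
}

degree = len(Student._fields)  # number of attributes, read from the schema
cardinality = len(instance)    # number of tuples, read from the instance

# An insert changes the instance, never the schema.
instance.add(Student(3, "Meera", "CS"))
cardinality_after_insert = len(instance)
degree_after_insert = len(Student._fields)
```

Degree is computed from the class definition alone, while cardinality needs the data, which mirrors the schema versus instance split.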

One more rule that separates the theory from SQL: pure relational algebra uses set semantics — every tuple is distinct, and duplicates collapse to one. (SQL is the loose cousin that allows duplicate rows unless you say DISTINCT; for the relational-model questions, think sets.)
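The difference between SQL's duplicate-keeping behaviour and set semantics shows up in a short sketch using an in-memory SQLite database. The `enrolled` table and its rows are hypothetical, and the table deliberately declares no key so that a duplicate row can be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so SQL will store the same row twice.
conn.execute("CREATE TABLE enrolled (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO enrolled VALUES (?, ?)",
    [("Asha", "CS"), ("Asha", "CS"), ("Ravi", "EE")],
)

# Plain SELECT keeps duplicates; DISTINCT collapses them.
bag_rows = conn.execute("SELECT name, dept FROM enrolled").fetchall()
set_rows = conn.execute("SELECT DISTINCT name, dept FROM enrolled").fetchall()

# A Python set models the relational view directly: duplicates collapse.
as_relation = set(bag_rows)
```

Here `bag_rows` holds three rows while `set_rows` and `as_relation` hold two, which is the count a relational-model question expects.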

How GATE asks this

Two patterns. First, vocabulary checks — an MCQ that gives a small table and asks for degree or cardinality, or asks which statement about schema vs instance is true. Second, embedded vocabulary — later questions on relational algebra or normalization quietly assume you already know what “degree of R” means. Both are easy marks once the four words are reflexive.

Worked example

Take Students(id, name, age, dept) and suppose 50 students are currently enrolled.

Degree = number of attributes in the schema = 4. It does not change when you insert or delete rows — it is set by the schema.
Cardinality = number of tuples in the current instance = 50. Drops to 49 if one student leaves; rises to 51 if one joins.

If the question becomes “after deleting 3 rows, what is the new degree?”, the answer is still 4 — the schema did not change, only the instance did.
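The worked example can be replayed against a real table as a quick check. This sketch uses an in-memory SQLite database with placeholder rows; the `degree` and `cardinality` helpers are hypothetical names, with degree read from SQLite's `PRAGMA table_info` and cardinality from `COUNT(*)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, dept TEXT)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [(i, f"student{i}", 20, "CS") for i in range(1, 51)],  # 50 placeholder rows
)


def degree():
    # PRAGMA table_info returns one row per column of the table.
    return len(conn.execute("PRAGMA table_info(Students)").fetchall())


def cardinality():
    return conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]


before = (degree(), cardinality())
conn.execute("DELETE FROM Students WHERE id <= 3")  # delete 3 rows
after = (degree(), cardinality())
```

`before` is `(4, 50)` and `after` is `(4, 47)`: the delete moved only the second number.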

A question to carry forward

The relational model tells you what a table is once you have one — but it is silent on where the tables come from. Faced with a real business — students, the courses they take, the departments that offer them — how do you decide what deserves to be its own table, what is merely a column, and how the tables should connect? Sketching that before writing a single CREATE TABLE is its own discipline. Here is the thread onward: how do you turn a tangle of real-world things and the relationships between them into a clean set of relations — and what happens to a “many-to-many” relationship when it has to become rows and columns?

In one breath

A relation = a table; a tuple = a row; an attribute = a column. The schema = column definitions (design-time); the instance = the current rows.
Degree = number of columns (a schema property — unchanged by insert/delete). Cardinality = number of rows (an instance property — changes with every insert/delete).
Mnemonic: Degree = Definition (columns); cardinality = count of “cards” (rows).
Pure relational algebra uses set semantics — duplicate tuples collapse to one (SQL keeps duplicates unless DISTINCT).
Projection keeps chosen columns and can shrink cardinality (duplicates among the kept columns merge).
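The last point, projection under set semantics, can be sketched in a few lines. The `project` function, the schema, and the rows are hypothetical; a relation is modelled as a Python set of tuples so that duplicates merge automatically.

```python
def project(relation, schema, keep):
    """Projection under set semantics: keep the named attributes, merge duplicates."""
    positions = [schema.index(attr) for attr in keep]
    return {tuple(row[p] for p in positions) for row in relation}


schema = ("id", "name", "dept")
relation = {
    (1, "Asha", "CS"),
    (2, "Ravi", "EE"),
    (3, "Meera", "CS"),
}

# Dropping the key lets rows coincide, so cardinality can shrink.
by_dept = project(relation, schema, ["dept"])

# Keeping the key keeps every row distinct, so cardinality is unchanged.
by_id_dept = project(relation, schema, ["id", "dept"])
```

Projecting onto `dept` leaves two tuples out of three because both CS rows merge; projecting onto `id` and `dept` keeps all three. The result can never have more tuples than the input.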

Practice

Quick check

Q1. Recall: which statements about the relational model are TRUE? (select all that apply)

Q2. Recall: which of the following are valid descriptions of a 'schema' (as opposed to an 'instance')? (select all that apply)

Q3. Trace: a relation Employees(emp_id, name, salary, dept_id, joining_date) has 1200 rows. What is its degree? (integer)

Q4. Trace: the relation Courses(course_id, title, credits) currently contains 80 tuples. 12 new courses are inserted and 5 are deleted. What is the cardinality afterwards? (integer)

Q5. Apply: a relation Books(isbn, title, author, year, price) starts empty, then 10 rows are inserted (all distinct), then 3 of them are deleted. Schema is unchanged. What are (degree, cardinality)?

Q6. Create: a relation R has degree 6 and cardinality 100. After projecting onto 4 of its attributes, the result R' has at most how many tuples in pure relational algebra?
