4 chapters · 18 lessons
PySpark
Understand the cluster, the DAG, the Catalyst optimizer, and the shuffle that ate your job. Learn PySpark the way platform engineers actually run it.
- 01 Big Data Background (4 lessons)
  - 01 What 'big data' actually means
  - 02 Hadoop overview
  - 03 HDFS architecture
  - 04 The Spark ecosystem
- 02 Spark Architecture (4 lessons)
  - 05 Driver & executors
  - 06 Jobs → stages → tasks
  - 07 DAG & lazy evaluation
  - 08 Shuffles
- 03 DataFrames (5 lessons)
  - 09 DataFrame intro
  - 10 Schemas
  - 11 Joins in Spark
  - 12 Window functions
  - 13 Pandas UDFs
- 04 Internals & Optimization (5 lessons)
  - 14 Catalyst optimizer
  - 15 Reading execution plans
  - 16 Adaptive Query Execution
  - 17 Partitioning
  - 18 Skew & salting