What is the difference between ETL and ELT, and when should you choose each?
ETL transforms data before loading it into the destination, which was necessary when warehouses were expensive and compute-constrained. ELT loads raw data first and transforms inside the warehouse, leveraging cheap cloud compute and making raw data available for reprocessing. ELT is the default in modern cloud stacks; ETL still makes sense when you must mask sensitive fields before they ever land in the warehouse.
How to think about it
ETL (Extract → Transform → Load) and ELT (Extract → Load → Transform) both move data from source systems into an analytical destination. The difference is where the transformation happens and therefore who owns it.
ETL — transform before landing
A dedicated transformation engine (SSIS, Informatica, Spark job) reads source data, reshapes it, and writes only the clean result to the warehouse. The warehouse never sees raw data.
When it still wins:
- PII / PHI must be masked or tokenized before touching a cloud warehouse governed by a different compliance boundary.
- The destination is a legacy on-prem warehouse with metered storage and slow compute.
- Transformations require ML inference that cannot run inside SQL.
ELT — land first, transform in-warehouse
Tools like Fivetran or Airbyte replicate raw tables into a staging schema. dbt then builds curated marts on top using the warehouse’s own compute.
-- dbt model: marts/orders_daily.sql
SELECT
DATE_TRUNC('day', created_at) AS order_date,
SUM(amount_usd) AS revenue_usd,
COUNT(*) AS order_count
FROM {{ ref('stg_orders') }}
GROUP BY 1
Advantages:
- Raw data is preserved; you can re-run transforms without re-extracting.
- Transformation logic lives in version-controlled SQL, not locked in a GUI tool.
- Warehouse auto-scaling absorbs compute spikes cheaply.