Reverse ETL

What you'll learn

Why the warehouse's best data is useless stuck in a BI dashboard

What reverse ETL is — syncing modeled tables back out to operational SaaS tools

Why every sync is an idempotent, incremental upsert keyed by an external id

How it differs from ETL/ELT and CDC — direction, target, and source of truth

Everything so far points one way: ETL, ELT, and CDC all move data into the warehouse, where you model it, version it, and store it in columnar form. And there it sits: joined across every source, enriched with scores and segments, more complete than any single app’s database.

The problem is who can act on it. A sales rep lives in Salesforce; a support agent in Zendesk; a marketer in the ad platforms. None of them open your BI dashboard. Reverse ETL is the last mile — it syncs modeled data back out of the warehouse into the operational tools where people actually do their jobs.

The same data, sent back the other way — out to where people work, not just in for analysis.

The last-mile problem

You build a beautiful churn model that joins product usage, support tickets, and billing into one health_score per account — exactly what a Customer Success Manager needs. But it is a column in a warehouse table, and the CSM never opens the warehouse; they open Salesforce. So the score sits unused while the company keeps churning customers it could have saved. That gap — great data on one side, the people who would act on it on the other — is what reverse ETL (also called operational analytics or data activation) closes.

A sync is the mirror image of ETL: the warehouse is the source, and a SaaS tool’s API is the destination. You write a SELECT against a modeled “gold” table, map its columns to the destination’s fields (health_score → Health_Score__c), and push the rows, always as an idempotent upsert keyed by an external id, run incrementally. Those two properties are the discipline. Upserting by a stable external id means a re-run overwrites the matching record in place rather than duplicating it, and pushing only the rows that changed since the last run keeps you under the API’s rate limit.
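To make those two properties concrete, here is a minimal sketch, not how any particular tool implements a sync. It assumes a hypothetical gold table named gold_account_health with an updated_at column used as a watermark, and it replaces the real SaaS API with an in-memory FakeCrm class. The table name, the column names, and the FIELD_MAP mapping are all illustrative assumptions.

```python
import sqlite3

# Hypothetical column-to-field mapping (warehouse column -> destination field).
FIELD_MAP = {"health_score": "Health_Score__c"}


class FakeCrm:
    """In-memory stand-in for a SaaS API; records are keyed by external id."""

    def __init__(self):
        self.records = {}
        self.calls = 0

    def upsert(self, external_id, fields):
        # Overwrite-or-create by key: repeating the call leaves the same state.
        self.calls += 1
        self.records.setdefault(external_id, {}).update(fields)


def sync(conn, crm, watermark):
    """Push rows changed after `watermark`; return the new watermark."""
    rows = conn.execute(
        "SELECT account_id, health_score, updated_at FROM gold_account_health "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for account_id, health_score, updated_at in rows:
        fields = {FIELD_MAP["health_score"]: health_score}
        crm.upsert(account_id, fields)
        watermark = max(watermark, updated_at)
    return watermark


# A tiny in-memory "warehouse" so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gold_account_health "
    "(account_id TEXT PRIMARY KEY, health_score INTEGER, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO gold_account_health VALUES (?, ?, ?)",
    [("acct-1", 82, 100), ("acct-2", 35, 105)],
)

crm = FakeCrm()
wm = sync(conn, crm, watermark=0)    # first run pushes both rows
wm = sync(conn, crm, watermark=wm)   # nothing changed, so nothing is pushed
conn.execute(
    "UPDATE gold_account_health SET health_score = 28, updated_at = 110 "
    "WHERE account_id = 'acct-2'"
)
wm = sync(conn, crm, watermark=wm)   # only acct-2 is pushed again
```

Replaying the whole table from watermark 0 would cost extra API calls but would leave the destination with the same two records, which is what makes a failed run safe to retry.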

Why it is its own discipline

	ETL / ELT / CDC	Reverse ETL
Direction	sources → warehouse	warehouse → SaaS tools
Destination	a database you control	a third-party API with rate limits
Source of truth	the operational apps	the warehouse model

That “destination is an API” row is what sets reverse ETL apart. You cannot just INSERT into Salesforce — you call its REST API, with rate limits, required fields, validation rules, and its own notion of identity. Dedicated tools (Hightouch, Census) exist to handle exactly that: batching, retries, field mapping, and identity resolution against dozens of finicky APIs. The mechanics are simple — a SELECT and an upsert; the hard, valuable work happened upstream, in the modeling that produced a trustworthy health_score in the first place. Reverse ETL just makes that work reach the people who can use it.
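The batching and retry work mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the behavior of Hightouch, Census, or any real API client: RateLimited, push_batches, and the send_batch callback are hypothetical names, and the batch size and delays are placeholder values rather than any destination's actual limits.

```python
import time


class RateLimited(Exception):
    """Raised by the hypothetical client when the API says 'too many requests'."""


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def push_batches(rows, send_batch, batch_size=200, max_attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Send rows in batches, retrying a rate-limited batch with exponential backoff."""
    sent = 0
    for batch in chunked(rows, batch_size):
        for attempt in range(max_attempts):
            try:
                send_batch(batch)
                sent += len(batch)
                break
            except RateLimited:
                if attempt == max_attempts - 1:
                    raise  # give up after the final attempt
                sleep(base_delay * 2 ** attempt)
    return sent


# Demo with a sender that is rate limited exactly once, on its second call.
attempts = []
delays = []


def flaky_send(batch):
    attempts.append(len(batch))
    if len(attempts) == 2:
        raise RateLimited()


rows = [{"external_id": f"acct-{i}"} for i in range(5)]
sent = push_batches(rows, flaky_send, batch_size=2, sleep=delays.append)
```

Retrying a whole batch is only safe because each row is an upsert keyed by an external id: if part of a batch had already landed before the failure, sending it again overwrites those records instead of duplicating them.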

Practice

Quick check

Q1. What problem does reverse ETL specifically solve?

Q2. Why is every reverse-ETL sync an idempotent upsert keyed by an external id, run incrementally?

Q3. TRANSFER: A team's accurate 'propensity to buy' model has had zero business impact for six months; it runs nightly into a warehouse table. Most likely fix?

Questions about this lesson

What is reverse ETL?

Reverse ETL syncs modeled data from the warehouse back out to operational SaaS tools — Salesforce, Slack, ad platforms — where people actually work. It is the mirror image of ETL: the warehouse is the source and a third-party API is the destination. It closes the last-mile gap so that scores and segments built in the warehouse reach the point of action instead of staying trapped in dashboards.

How is reverse ETL different from ETL?

ETL and CDC move data from operational sources into the warehouse; reverse ETL moves modeled data from the warehouse out to operational tools. The destination is a rate-limited API rather than a database you control, the warehouse is the source of truth, and the unit of work is an idempotent upsert into a SaaS object keyed by an external id.

What tools are used for reverse ETL?

The dedicated platforms are Hightouch and Census, plus a growing set of warehouse-native features. They exist to handle the hard part — batching, retries, field mapping, and identity resolution against dozens of finicky destination APIs — so a sync is reliable and idempotent rather than a brittle one-off integration.
