datarekha

The cloud — AWS, Azure & GCP for ML

Three providers, three hundred service names, one mental model. What 'the cloud' actually is, the four primitives you'll touch, a Rosetta stone across AWS/Azure/GCP, and why it bills you by the second — and for leaving.

11 min read Intermediate MLOps Lesson 22 of 28

What you'll learn

  • What 'the cloud' actually is — renting computers by the second instead of buying them
  • The four primitives every data/ML person touches: compute, object storage, managed ML, and a warehouse
  • A Rosetta stone mapping the same service across AWS, Azure, and GCP — learn categories, not catalogs
  • The managed-vs-serverless spectrum, and the cost model (pay-per-second + egress) that surprises everyone

A junior engineer spun up a GPU instance to fine-tune a model on Friday, got it working, closed the laptop, and went home. The instance kept running all weekend — nobody told it to stop. Monday’s surprise was a $900 line item for two days of doing nothing. Nobody was malicious or careless; they just hadn’t internalised the one fact that governs the cloud: you are renting, and the meter runs whether you’re using it or not.

The cloud feels impossibly large — AWS alone lists hundreds of services — but the part a data or ML person actually touches is small, and it’s the same small part on all three big providers. This lesson is the map.

What “the cloud” actually is

Strip away the branding and the cloud is one idea: instead of buying computers, you rent someone else’s by the second. That swap changes everything downstream.

  • Capex → opex. No up-front purchase of servers (capital expenditure); you pay as you go (operating expenditure). A startup can rent a $30,000 GPU box for an afternoon at a few dollars an hour.
  • Elastic. Need 100 machines for an hour, then zero? You can have exactly that. Some services even scale to zero — you pay nothing when no request is in flight.
  • Managed. The provider runs the hard parts — replacing dead disks, patching the OS, replicating your data across buildings — so a tiny team can run infrastructure that used to need a department.

The flip side is the rental trap: the meter never sleeps. A forgotten GPU, or a job that quietly copies a terabyte across regions, bills you all the same. Cost awareness is a cloud skill, not an afterthought.
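One common guard against the forgotten GPU is an idle check that stops a machine after a sustained stretch of low utilization. The following is a minimal sketch of that decision only. The function name, the 5% threshold, and the 30-minute window are hypothetical choices for illustration; a real setup would read utilization from the provider's monitoring service and call its stop API, neither of which is shown here.

```python
from typing import Sequence


def should_stop(
    utilization: Sequence[float],
    threshold: float = 0.05,  # hypothetical: below 5% counts as idle
    window: int = 30,         # hypothetical: 30 consecutive idle samples
) -> bool:
    """Return True if the last `window` samples are all below `threshold`.

    `utilization` holds one GPU-utilization reading per minute in [0, 1],
    oldest first. Fewer than `window` samples never triggers a stop.
    """
    if len(utilization) < window:
        return False
    return all(u < threshold for u in utilization[-window:])


busy_then_idle = [0.9] * 10 + [0.01] * 30   # trained, then sat idle
still_training = [0.9] * 40                  # busy the whole time

print(should_stop(busy_then_idle))   # True
print(should_stop(still_training))   # False
```

Requiring a full window of idle samples is a deliberate choice: it avoids stopping a machine during a short pause between training steps.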

Three providers, one mental model

There are three providers you’ll meet: AWS (Amazon — the biggest, the most services, the default for startups), Azure (Microsoft — the enterprise default, and the home of the hosted OpenAI models), and GCP (Google — strongest in data and ML, home of BigQuery and Vertex AI). They compete hard, which means they mostly mirror each other.

The trap beginners fall into is trying to memorise the catalog. Don’t. Learn the categories; the names are just translations. Here’s the Rosetta stone for the services you’ll actually use:

What it is                       | AWS       | Azure            | GCP
Rent a server (a VM)             | EC2       | Virtual Machines | Compute Engine
Run code, no server (serverless) | Lambda    | Functions        | Cloud Functions / Run
Object storage (the “bucket”)    | S3        | Blob Storage     | Cloud Storage (GCS)
Managed Kubernetes               | EKS       | AKS              | GKE
Managed ML platform              | SageMaker | Azure ML         | Vertex AI
Data warehouse                   | Redshift  | Synapse          | BigQuery
Hosted LLM API                   | Bedrock   | Azure OpenAI     | Vertex AI (Gemini)
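The table above can be treated as a small lookup structure: find the row that contains a service name, then read across to another provider's column. A sketch of that idea follows. The service names come from the table; the category keys, the `ROSETTA` name, and the `translate` helper are made up for illustration.

```python
# Category -> provider -> service name, copied from the table above.
# The category keys are hypothetical labels, not official terms.
ROSETTA = {
    "vm": {"aws": "EC2", "azure": "Virtual Machines", "gcp": "Compute Engine"},
    "serverless": {"aws": "Lambda", "azure": "Functions", "gcp": "Cloud Functions / Run"},
    "object_storage": {"aws": "S3", "azure": "Blob Storage", "gcp": "Cloud Storage (GCS)"},
    "managed_kubernetes": {"aws": "EKS", "azure": "AKS", "gcp": "GKE"},
    "managed_ml": {"aws": "SageMaker", "azure": "Azure ML", "gcp": "Vertex AI"},
    "warehouse": {"aws": "Redshift", "azure": "Synapse", "gcp": "BigQuery"},
    "hosted_llm": {"aws": "Bedrock", "azure": "Azure OpenAI", "gcp": "Vertex AI (Gemini)"},
}


def translate(service: str, target: str) -> str:
    """Return the target provider's equivalent of a service name.

    Finds the first category (row) whose names include `service`,
    then reads across to the `target` provider (column).
    """
    for names in ROSETTA.values():
        if service in names.values():
            return names[target]
    raise KeyError(f"unknown service: {service}")


print(translate("Vertex AI", "aws"))  # SageMaker
print(translate("S3", "gcp"))         # Cloud Storage (GCS)
```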

Read a row, not a column. “Where do I put my files?” is object storage — S3, Blob, or GCS depending on which house you’re in, but the same idea: a near-infinite, cheap, durable key→blob store you reach over HTTP.

The four things you’ll actually touch

Out of the hundreds of services, four cover the vast majority of data/ML work:

  1. Compute — a machine to run code on. Rent a whole VM (EC2 / Azure VM / Compute Engine) when you want control, or go serverless (Lambda / Functions / Cloud Run) to just hand over a function and let the platform run and scale it.
  2. Object storage — the bucket (S3 / Blob / GCS). Your datasets, model artifacts, and logs live here. Cheap, durable, accessed by key. This is the backbone; almost everything reads from and writes to it.
  3. Managed ML — SageMaker / Azure ML / Vertex AI. Training jobs, notebooks, a model registry, and one-click endpoints, without you running Kubernetes by hand (it’s there, just hidden — see Just enough Kubernetes for an ML engineer).
  4. Data warehouse — Redshift / Synapse / BigQuery. Where analytics SQL runs over billions of rows. (Databricks runs across all three as a cloud-neutral option.)
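To make the "accessed by key" idea from item 2 concrete, here is a toy in-memory model of a bucket. It is not any provider's API: real object stores are HTTP services reached through an SDK or CLI, and the `ToyBucket` class and its method names are invented for this sketch. It shows only the semantics: a flat mapping from key to bytes, where "folders" are a naming convention applied by prefix.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: a flat mapping from key to bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        # Raises KeyError if the key does not exist.
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        # There are no real directories: listing just filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket()
bucket.put("datasets/train/0001.jpg", b"image-bytes-1")
bucket.put("datasets/train/0002.jpg", b"image-bytes-2")
bucket.put("models/v1/model.pkl", b"weights")

print(bucket.list("datasets/"))
print(bucket.get("models/v1/model.pkl"))
```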

A spectrum, not a switch

The deeper intuition: cloud services sit on a spectrum of how much the provider runs for you. More managed means less operations work and less control — and usually a higher price per unit of compute, traded for not needing an ops team.

[Figure: "How much does the provider run for you?" A spectrum in four steps, left to right: Virtual machine (rent a bare box; you manage everything), Containers (you package the app; K8s scales it), Managed ML (training + serving; provider runs infra), Serverless (just a function; scales to zero). Left end: more control, more ops. Right end: less ops, pay-per-use.]

The same job can run anywhere on this line. Move right to delete operations work; move left when you need control or cheaper bulk compute.

There’s no single right answer. A research team training big models lives on the left (raw VMs with GPUs, maximum control). A two-person startup serving a model lives on the right (a managed endpoint or serverless, zero ops). Most teams sit in the middle on the managed ML platforms.
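The trade-off between an always-on machine and pay-per-use can be sketched as a break-even calculation. Both hourly rates below are hypothetical placeholders, not quoted prices; the only point carried over from the text is that the pay-per-use option costs more per unit of compute but bills nothing while idle.

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical rates, chosen only to illustrate the shape of the trade-off.
VM_RATE_PER_HOUR = 0.10        # always-on VM, billed whether busy or idle
SERVERLESS_RATE_PER_BUSY_HOUR = 0.18  # pay-per-use, billed only while busy


def always_on_cost(hourly_rate: float) -> float:
    """Monthly cost of a machine that never stops."""
    return hourly_rate * HOURS_PER_MONTH


def pay_per_use_cost(rate_per_busy_hour: float, utilization: float) -> float:
    """Monthly cost when billed only for the busy fraction of the month."""
    return rate_per_busy_hour * HOURS_PER_MONTH * utilization


def break_even_utilization(vm_rate: float, rate_per_busy_hour: float) -> float:
    """Busy fraction above which the always-on machine becomes cheaper."""
    return vm_rate / rate_per_busy_hour


for utilization in (0.05, 0.50, 0.90):
    vm = always_on_cost(VM_RATE_PER_HOUR)
    serverless = pay_per_use_cost(SERVERLESS_RATE_PER_BUSY_HOUR, utilization)
    cheaper = "pay-per-use" if serverless < vm else "always-on"
    print(f"{utilization:.0%} busy: VM ${vm:.2f}, pay-per-use ${serverless:.2f} -> {cheaper}")

print(f"break-even at {break_even_utilization(VM_RATE_PER_HOUR, SERVERLESS_RATE_PER_BUSY_HOUR):.0%} busy")
```

Under these assumed rates, a lightly used service favours the right of the spectrum and a heavily used one favours the left, which matches the research-team and two-person-startup examples above.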

It bills by the second — and for leaving

Two cost facts cause most surprise bills. First, compute is billed for every second it is running, so an idle-but-on machine is pure waste. Second, storing data is cheap, but moving data out of the cloud (egress) is not: providers charge per gigabyte you download, and that is the fee people forget until the invoice arrives. Run the numbers:

The GPU left running costs about $2,200/month; the same box stopped when idle costs about $400, so auto-stop alone saves roughly $1,800. Storing half a terabyte costs $11.50 a month, but downloading it once costs $45, about 4× a month of storage. “The cloud charges you to leave” is a real design constraint: keep compute and the data it reads in the same region, and turn things off.
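The figures above can be reproduced with a short back-of-envelope calculation. The rates in this sketch are assumptions chosen to match those figures (a $3.00/hour GPU, 6 hours of real use on 22 working days, $0.023 per GB-month of storage, $0.09 per GB of egress); they are not quoted prices, and actual rates vary by provider, region, and instance type.

```python
HOURS_PER_MONTH = 730          # average hours in a month
GPU_RATE_PER_HOUR = 3.00       # assumed on-demand GPU rate, USD
ACTIVE_HOURS = 6 * 22          # assumed: 6 h/day of real use, 22 working days

STORAGE_RATE_PER_GB_MONTH = 0.023  # assumed object-storage rate, USD
EGRESS_RATE_PER_GB = 0.09          # assumed internet egress rate, USD
DATA_GB = 500                      # half a terabyte

# Compute: always on vs. stopped whenever idle.
always_on = GPU_RATE_PER_HOUR * HOURS_PER_MONTH
auto_stop = GPU_RATE_PER_HOUR * ACTIVE_HOURS
savings = always_on - auto_stop

# Data: one month of storage vs. one full download.
storage = STORAGE_RATE_PER_GB_MONTH * DATA_GB
egress = EGRESS_RATE_PER_GB * DATA_GB

print(f"GPU always on:   ${always_on:,.2f}/month")
print(f"GPU auto-stopped: ${auto_stop:,.2f}/month")
print(f"Saved:           ${savings:,.2f}/month")
print(f"Store 500 GB:    ${storage:,.2f}/month")
print(f"Download 500 GB: ${egress:,.2f} once")
print(f"Egress / storage: {egress / storage:.1f}x")
```

Changing `ACTIVE_HOURS` or the two data rates shows how sensitive the bill is to idle time and to how often data leaves the cloud.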

Quick check

Q1. You need to store 2 TB of training images that your jobs read from repeatedly. Which primitive is that?
Q2. A teammate says 'we should use Vertex AI but our company is an AWS shop.' What's the equivalent service on AWS?
Q3. Your monthly cloud bill jumped, but your compute usage is flat. Your app serves model files to users worldwide from a bucket in one region. What's the most likely culprit?

Next

You now have the ground the whole stack runs on. The rest of the Platform & Infrastructure chapter — Kubernetes, Kubeflow, feature stores — is what you build on top of these primitives once one managed endpoint stops being enough.

Practice this in an interview

How do you attribute and control ML spend across teams and models (FinOps for ML)?

Apply FinOps to ML by tagging every workload (training jobs, endpoints, GPU pools) by team, model, and environment so cost is attributable, then track unit-economics metrics like cost per prediction or per training run rather than just total spend. Set budgets and alerts, identify idle GPUs and overprovisioned endpoints, and enforce guardrails like autoscaling and instance-type policies. The goal is continuous visibility and accountability so teams optimize cost without killing experimentation.

What metrics should you monitor for a production ML model, and at what layer?

Production ML monitoring spans four layers: data quality (schema, distributions, null rates), model behaviour (prediction drift, confidence calibration), operational health (latency, error rate, throughput), and business KPIs (conversion, revenue impact). Each layer has different owners and different alert thresholds.

Walk me through the full ML lifecycle from problem definition to model retirement.

The ML lifecycle spans eight phases: problem framing, data collection and validation, feature engineering, training and experimentation, offline evaluation, deployment, production monitoring, and retirement or retraining. Each phase has distinct owners, artefacts, and failure modes that an MLOps practice must systematise.

How would you reduce the cost of serving an ML or LLM model in production without hurting quality?

Work top-down: start at the model layer with quantization, distillation, or routing cheaper models for easy requests, since model choices drive every downstream cost. Then optimize the runtime with batching, caching, and techniques like prompt caching for LLMs, and finally match infrastructure to the load using autoscaling on queue depth and spot or batch capacity. Track cost per token or per prediction alongside latency percentiles and accuracy so optimizations never silently degrade quality.
