The cloud — AWS, Azure & GCP for ML
Three providers, three hundred service names, one mental model. What 'the cloud' actually is, the four primitives you'll touch, a Rosetta stone across AWS/Azure/GCP, and why it bills you by the second — and for leaving.
What you'll learn
- What 'the cloud' actually is — renting computers by the second instead of buying them
- The four primitives every data/ML person touches: compute, object storage, managed ML, and a warehouse
- A Rosetta stone mapping the same service across AWS, Azure, and GCP — learn categories, not catalogs
- The spectrum from raw VMs through managed platforms to serverless, and the cost model (pay-per-second plus egress) that surprises everyone
A junior engineer spun up a GPU instance to fine-tune a model on Friday, got it working, closed the laptop, and went home. The instance kept running all weekend — nobody told it to stop. Monday’s surprise was a $900 line item for two days of doing nothing. Nobody was malicious or careless; they just hadn’t internalised the one fact that governs the cloud: you are renting, and the meter runs whether you’re using it or not.
The cloud feels impossibly large — AWS alone lists hundreds of services — but the part a data or ML person actually touches is small, and it’s the same small part on all three big providers. This lesson is the map.
What “the cloud” actually is
Strip away the branding and the cloud is one idea: instead of buying computers, you rent someone else’s by the second. That swap changes everything downstream.
- Capex → opex. No up-front purchase of servers (capital expenditure); you pay as you go (operating expenditure). A startup can rent a $30,000 GPU box for an afternoon for a few dollars.
- Elastic. Need 100 machines for an hour, then zero? You can have exactly that. Some services even scale to zero — you pay nothing when no request is in flight.
- Managed. The provider runs the hard parts — replacing dead disks, patching the OS, replicating your data across buildings — so a tiny team can run infrastructure that used to need a department.
The flip side is the rental trap: the meter never sleeps. A forgotten GPU, or a job that quietly copies a terabyte across regions, bills you all the same. Cost awareness is a cloud skill, not an afterthought.
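The rental trap can be made concrete with a small idle check. This is a minimal sketch, not a provider API: the `Instance` class, the utilisation samples, the 5% threshold, and the hourly rate are all hypothetical, with the rate chosen only so that 48 idle hours reproduces the $900 weekend from the opening story.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical record of one rented machine (not a real cloud SDK type)."""
    name: str
    hourly_rate_usd: float          # assumed price, not a real quote
    gpu_util_samples: list[float]   # recent utilisation readings, 0.0 to 1.0


def is_idle(inst: Instance, threshold: float = 0.05, window: int = 6) -> bool:
    """Return True when the last `window` samples all sit below `threshold`.

    With fewer than `window` samples we return False, so a machine that
    has only just started is never flagged.
    """
    recent = inst.gpu_util_samples[-window:]
    return len(recent) == window and all(u < threshold for u in recent)


def wasted_cost(inst: Instance, idle_hours: float) -> float:
    """The meter runs regardless of use: cost is simply rate times hours."""
    return inst.hourly_rate_usd * idle_hours


if __name__ == "__main__":
    # Busy on Friday afternoon, then nothing for the rest of the samples.
    box = Instance("finetune-gpu", 18.75, [0.9, 0.8, 0.01, 0.0, 0.02, 0.0, 0.01, 0.0])
    if is_idle(box):
        print(f"{box.name} looks idle; 48 more hours would cost "
              f"${wasted_cost(box, 48):.2f}")
```

In practice the same rule would be fed by the provider's monitoring metrics and would trigger a stop rather than a print; the point is only that "idle" has to be defined and checked by something other than a person's memory.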
Three providers, one mental model
There are three providers you’ll meet: AWS (Amazon — the biggest, the most services, the default for startups), Azure (Microsoft — the enterprise default, and the home of the hosted OpenAI models), and GCP (Google — strongest in data and ML, home of BigQuery and Vertex AI). They compete hard, which means they mostly mirror each other.
The trap beginners fall into is trying to memorise the catalog. Don’t. Learn the categories; the names are just translations. Here’s the Rosetta stone for the services you’ll actually use:
| What it is | AWS | Azure | GCP |
|---|---|---|---|
| Rent a server (a VM) | EC2 | Virtual Machines | Compute Engine |
| Run code, no server (serverless) | Lambda | Functions | Cloud Functions / Run |
| Object storage (the “bucket”) | S3 | Blob Storage | Cloud Storage (GCS) |
| Managed Kubernetes | EKS | AKS | GKE |
| Managed ML platform | SageMaker | Azure ML | Vertex AI |
| Data warehouse | Redshift | Synapse | BigQuery |
| Hosted LLM API | Bedrock | Azure OpenAI | Vertex AI (Gemini) |
Read a row, not a column. “Where do I put my files?” is object storage — S3, Blob, or GCS depending on which house you’re in, but the same idea: a near-infinite, cheap, durable key→blob store you reach over HTTP.
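One way to internalise "read a row, not a column" is to treat the table as a lookup keyed by category. The sketch below copies the service names from the table above; the category keys (`"vm"`, `"object_storage"`, and so on) and the `translate` helper are illustrative names, not anything the providers define.

```python
# Each entry is one row of the Rosetta stone: category -> provider -> service.
ROSETTA = {
    "vm":             {"aws": "EC2",       "azure": "Virtual Machines", "gcp": "Compute Engine"},
    "serverless":     {"aws": "Lambda",    "azure": "Functions",        "gcp": "Cloud Functions / Run"},
    "object_storage": {"aws": "S3",        "azure": "Blob Storage",     "gcp": "Cloud Storage (GCS)"},
    "kubernetes":     {"aws": "EKS",       "azure": "AKS",              "gcp": "GKE"},
    "managed_ml":     {"aws": "SageMaker", "azure": "Azure ML",         "gcp": "Vertex AI"},
    "warehouse":      {"aws": "Redshift",  "azure": "Synapse",          "gcp": "BigQuery"},
    "hosted_llm":     {"aws": "Bedrock",   "azure": "Azure OpenAI",     "gcp": "Vertex AI (Gemini)"},
}


def category_of(service: str) -> str:
    """Find which row (category) a service name belongs to."""
    for category, row in ROSETTA.items():
        if service in row.values():
            return category
    raise KeyError(f"unknown service: {service}")


def translate(service: str, to_provider: str) -> str:
    """Translate a service name into its counterpart on another provider."""
    return ROSETTA[category_of(service)][to_provider]


if __name__ == "__main__":
    print(category_of("S3"))            # object_storage
    print(translate("S3", "gcp"))       # Cloud Storage (GCS)
    print(translate("BigQuery", "aws")) # Redshift
```

The lookup goes through the category first on purpose: once you know a service is "object storage", the name on any other provider is a one-step translation.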
The four things you’ll actually touch
Out of the hundreds of services, four cover the vast majority of data/ML work:
- Compute — a machine to run code on. Rent a whole VM (EC2 / Azure VM / Compute Engine) when you want control, or go serverless (Lambda / Functions / Cloud Run) to just hand over a function and let the platform run and scale it.
- Object storage — the bucket (S3 / Blob / GCS). Your datasets, model artifacts, and logs live here. Cheap, durable, accessed by key. This is the backbone; almost everything reads from and writes to it.
- Managed ML — SageMaker / Azure ML / Vertex AI. Training jobs, notebooks, a model registry, and one-click endpoints, without you running Kubernetes by hand (it’s there, just hidden — see Just enough Kubernetes for an ML engineer).
- Data warehouse — Redshift / Synapse / BigQuery. Where analytics SQL runs over billions of rows. (Databricks runs across all three as a cloud-neutral option.)
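Because object storage is the backbone of the other three, it helps to see how little there is to the model. The following is a toy in-memory stand-in, assuming nothing beyond "a key→blob store": `ToyBucket` and its method names are made up for illustration and are not any provider's SDK. It shows that keys are flat strings and that "folders" are usually just a shared key prefix.

```python
class ToyBucket:
    """An in-memory imitation of a bucket: a flat map from string keys to bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Writing to an existing key replaces the whole object.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # Objects are fetched whole, by exact key.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # There are no real directories here: listing a "folder" is a prefix filter.
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    bucket = ToyBucket()
    bucket.put("datasets/train.csv", b"id,label\n1,0\n")
    bucket.put("models/v1/weights.bin", b"\x00\x01")
    bucket.put("models/v2/weights.bin", b"\x02\x03")
    print(bucket.list_keys("models/"))
    print(bucket.get("datasets/train.csv").decode())
```

A real bucket adds durability, access control, and an HTTP interface on top, but the put/get/list-by-prefix shape is what a training job or pipeline actually relies on.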
A spectrum, not a switch
The deeper intuition: cloud services sit on a spectrum of how much the provider runs for you. More managed means less operations work and less control — and usually a higher price per unit of compute, traded for not needing an ops team.
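Picture a line with raw VMs on the left, managed platforms in the middle, and serverless on the right. The same job can run anywhere on this line: move right to delete operations work; move left when you need control or cheaper bulk compute.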
There’s no single right answer. A research team training big models lives on the left (raw VMs with GPUs, maximum control). A two-person startup serving a model lives on the right (a managed endpoint or serverless, zero ops). Most teams sit in the middle on the managed ML platforms.
It bills by the second — and for leaving
Two cost facts cause most surprise bills. First, compute is billed by the second it’s running, so an idle-but-on machine is pure waste. Second, storing data is cheap, but moving data out of the cloud (egress) is not — providers charge per gigabyte you download, which is the fee people forget until the invoice. Run the numbers:
A GPU left running all month costs about $2,200; the same box stopped when idle costs about $400, so auto-stop alone saves roughly $1,800. Storing half a terabyte costs about $11.50 a month, but downloading it once costs about $45, roughly 4× a month of storage. “The cloud charges you to leave” is a real design constraint: keep compute and the data it reads in the same region, and turn things off.
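The figures above can be reproduced with a few lines of arithmetic. Every rate in this sketch is an assumption picked to land on those round numbers (a $3.00/hour GPU, 730 hours in a month, 132 active hours, $0.023 per GB-month of storage, $0.09 per GB of egress); real prices vary by provider, region, and instance type, so check the current price list before relying on them.

```python
HOURS_PER_MONTH = 730               # average hours in a calendar month
GPU_RATE_USD_PER_HOUR = 3.00        # assumed on-demand GPU price
ACTIVE_HOURS_PER_MONTH = 132        # assumed real usage: 6 h/day x 22 working days
STORAGE_USD_PER_GB_MONTH = 0.023    # assumed object-storage price
EGRESS_USD_PER_GB = 0.09            # assumed internet egress price


def compute_cost(rate_per_hour: float, hours: float) -> float:
    """Compute is billed for time running, whether or not it does useful work."""
    return rate_per_hour * hours


def storage_cost(gb: float, rate: float = STORAGE_USD_PER_GB_MONTH) -> float:
    """Monthly cost of keeping `gb` gigabytes in a bucket."""
    return gb * rate


def egress_cost(gb: float, rate: float = EGRESS_USD_PER_GB) -> float:
    """One-off cost of moving `gb` gigabytes out of the cloud."""
    return gb * rate


always_on = compute_cost(GPU_RATE_USD_PER_HOUR, HOURS_PER_MONTH)
auto_stopped = compute_cost(GPU_RATE_USD_PER_HOUR, ACTIVE_HOURS_PER_MONTH)
stored = storage_cost(500)
downloaded = egress_cost(500)

if __name__ == "__main__":
    print(f"GPU always on:     ${always_on:,.2f}/month")
    print(f"GPU auto-stopped:  ${auto_stopped:,.2f}/month")
    print(f"Auto-stop saves:   ${always_on - auto_stopped:,.2f}/month")
    print(f"Store 500 GB:      ${stored:,.2f}/month")
    print(f"Download 500 GB:   ${downloaded:,.2f} once")
    print(f"Egress vs storage: {downloaded / stored:.1f}x")
```

Changing `ACTIVE_HOURS_PER_MONTH` is the quickest way to see how sensitive the bill is to idle time: the compute line scales directly with hours left running, while the storage line barely moves.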
Next
You now have the ground the whole stack runs on. The rest of the Platform & Infrastructure chapter — Kubernetes, Kubeflow, feature stores — is what you build on top of these primitives once one managed endpoint stops being enough.
Practice this in an interview
Apply FinOps to ML by tagging every workload (training jobs, endpoints, GPU pools) by team, model, and environment so cost is attributable, then track unit-economics metrics like cost per prediction or per training run rather than just total spend. Set budgets and alerts, identify idle GPUs and overprovisioned endpoints, and enforce guardrails like autoscaling and instance-type policies. The goal is continuous visibility and accountability so teams optimize cost without killing experimentation.
Production ML monitoring spans four layers: data quality (schema, distributions, null rates), model behaviour (prediction drift, confidence calibration), operational health (latency, error rate, throughput), and business KPIs (conversion, revenue impact). Each layer has different owners and different alert thresholds.
The ML lifecycle spans eight phases: problem framing, data collection and validation, feature engineering, training and experimentation, offline evaluation, deployment, production monitoring, and retirement or retraining. Each phase has distinct owners, artefacts, and failure modes that an MLOps practice must systematise.
Work top-down: start at the model layer with quantization, distillation, or routing cheaper models for easy requests, since model choices drive every downstream cost. Then optimize the runtime with batching, caching, and techniques like prompt caching for LLMs, and finally match infrastructure to the load using autoscaling on queue depth and spot or batch capacity. Track cost per token or per prediction alongside latency percentiles and accuracy so optimizations never silently degrade quality.