datarekha

AutoML in practice

Let a tool search models, hyperparameters, and preprocessing for you. What AutoML does well as a baseline and prototyping accelerator — and why it's no substitute for understanding the fundamentals.

6 min read · Intermediate · Machine Learning · Lesson 25 of 33

What you'll learn

  • What AutoML automates — model search, tuning, ensembling, preprocessing
  • The main tools (AutoGluon, FLAML) and where each fits
  • When AutoML is the right move, and its real limits

Before you start

You’ve now learned to pick models, engineer features, tune hyperparameters, and ensemble. AutoML automates that whole loop — it searches over models, preprocessing, and hyperparameters, then stacks the best into an ensemble, often beating a hand-built pipeline on tabular data. The skill in 2026 isn’t avoiding AutoML; it’s knowing when to reach for it and how to read what it produces.

What AutoML actually automates

A good tabular AutoML system runs the pipeline you’d build by hand, automatically:

raw data → auto-preprocess (encode · impute · scale) → model + HP search (trees · linear · NN) → ensemble best → final model
AutoML automates preprocessing, model + hyperparameter search, and ensembling — the same loop you’d build by hand.
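To make that loop concrete, here is a minimal hand-built version of it, sketched with scikit-learn on synthetic data. It is not how AutoGluon or FLAML work internally. The dataset, the two candidate models, and the tiny search grids are all assumptions chosen so the example runs in a few seconds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "raw data": 500 rows, 8 numeric features, ~5% missing.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def preprocess():
    """Step 1, auto-preprocess: impute missing values, then scale."""
    return [SimpleImputer(strategy="median"), StandardScaler()]


# Step 2, model + hyperparameter search: one small grid per model family.
candidates = {
    "linear": (
        make_pipeline(*preprocess(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "forest": (
        make_pipeline(*preprocess(), RandomForestClassifier(random_state=0)),
        {
            "randomforestclassifier__n_estimators": [50, 100],
            "randomforestclassifier__max_depth": [None, 5],
        },
    ),
}

best = {}
for name, (pipeline, grid) in candidates.items():
    search = GridSearchCV(pipeline, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    best[name] = search
    print(f"{name}: cv accuracy {search.best_score_:.3f} with {search.best_params_}")

# Step 3, ensemble the best: stack the tuned pipelines under a simple meta-model.
stack = StackingClassifier(
    estimators=[(name, s.best_estimator_) for name, s in best.items()],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"stacked ensemble test accuracy: {test_accuracy:.3f}")
```

A real AutoML system does the same three steps with far more model families, smarter search, and a time budget, which is why it is hard to match by hand.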

The tools

  • AutoGluon: a consistent top performer on tabular benchmarks. A full fit() takes about three lines of code, and it routinely tops tabular AutoML comparisons by aggressively stacking diverse models. The go-to for a strong baseline fast.
  • FLAML (Microsoft): optimized for finding good models with low compute; great when you’re time- or cost-constrained.
  • Cloud AutoML: SageMaker Autopilot, Vertex AI, and Azure AutoML wrap the same idea in managed infrastructure and deployment.
# AutoGluon: a strong tabular baseline in three lines.
# Assumes train_df and test_df are pandas DataFrames that both contain the "target" column.
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label="target").fit(train_df, time_limit=600)  # search for up to 600 seconds
leaderboard = predictor.leaderboard(test_df)   # every model it trained, scored on test_df and ranked
# It auto-encoded features, tried trees/linear/NN, and stacked the best.

The limits — why fundamentals still matter

  • It can’t engineer domain features. AutoML searches over models, not ideas. The groupby-aggregation or ratio that wins the problem has to come from you.
  • It’s a black box by default. You still need interpretability, calibration, and fairness checks — AutoML optimizes a metric, not trustworthiness.
  • Leakage in, leakage out. If your data has leakage, AutoML will happily exploit it and report a fantastic, fake score. It can’t protect you from a badly-framed problem.
  • Cost and opacity. A long search burns compute, and the stacked ensemble it produces can be slow and hard to debug in production.
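Because AutoML will not flag leakage for you, a cheap pre-flight check before any run is worth having. The sketch below flags any single column whose correlation with the label is suspiciously high. The churn data, the column names, and the 0.95 threshold are all hypothetical, and this heuristic only catches blatant single-column leaks, not subtler ones such as leakage across time.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(columns, label, threshold=0.95):
    """Return the names of columns whose |correlation| with the label exceeds threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, label)) > threshold
    ]


# Hypothetical churn dataset with one deliberately leaky column.
random.seed(0)
n = 200
churned = [random.randint(0, 1) for _ in range(n)]
columns = {
    "monthly_spend": [random.gauss(50, 10) + 5 * c for c in churned],
    "support_tickets": [random.randint(0, 5) + c for c in churned],
    # Leaky: a refund is only recorded after the customer has already churned.
    "refund_issued": [float(c) for c in churned],
}

flagged = suspicious_features(columns, churned)
print(flagged)  # ['refund_issued']
```

Any flagged column deserves one question before you train: would this value actually be known at prediction time?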

That’s the real lesson: AutoML raises the floor (a strong baseline is now cheap) but not the ceiling — the framing, features, and judgment that separate good ML from bad are exactly the things it can’t automate.
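For a sense of what a groupby-aggregation or ratio feature looks like in practice, here is a small sketch in plain Python. The transaction rows and feature names are hypothetical. The idea it encodes, that an amount matters relative to the customer's usual spend, is domain knowledge you supply rather than something a model search discovers.

```python
from collections import defaultdict

# Hypothetical transaction rows.
transactions = [
    {"customer": "a", "amount": 20.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "a", "amount": 250.0},
    {"customer": "b", "amount": 400.0},
    {"customer": "b", "amount": 500.0},
]

# Groupby-aggregation: mean amount per customer.
totals = defaultdict(float)
counts = defaultdict(int)
for row in transactions:
    totals[row["customer"]] += row["amount"]
    counts[row["customer"]] += 1
customer_mean = {c: totals[c] / counts[c] for c in totals}

# Ratio feature: how unusual is this amount for this particular customer?
features = [
    {
        **row,
        "customer_mean": customer_mean[row["customer"]],
        "amount_vs_mean": row["amount"] / customer_mean[row["customer"]],
    }
    for row in transactions
]

for row in features:
    print(row["customer"], row["amount"], round(row["amount_vs_mean"], 2))
```

A 250.0 purchase is 2.5 times customer a's average, while b's larger 400.0 purchase is below b's average. On real data, compute aggregates like this from training rows only, or the feature itself can leak information.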

Quick check

Q1. What does a tabular AutoML tool like AutoGluon automate?
Q2. What's the single biggest thing AutoML can't do for you?
Q3. Your AutoML run reports 99.5% accuracy on a hard problem. What should you suspect first?

Next

That completes the Machine Learning track. To take these models to production — serving, monitoring, versioning, and testing — head into the MLOps section.


Practice this in an interview

What is AutoML, what does it automate, and where does it fall short?

AutoML automates parts of the ML pipeline such as data preprocessing, feature engineering, model selection, hyperparameter tuning, and sometimes neural architecture search, lowering the barrier to building models. It falls short on problem framing, data quality, domain feature engineering, careful evaluation against leakage, fairness, and deployment concerns, which still need human expertise. It's best as an accelerator and strong baseline generator, not a replacement for an ML engineer.

How do you attribute and control ML spend across teams and models (FinOps for ML)?

Apply FinOps to ML by tagging every workload (training jobs, endpoints, GPU pools) by team, model, and environment so cost is attributable, then track unit-economics metrics like cost per prediction or per training run rather than just total spend. Set budgets and alerts, identify idle GPUs and overprovisioned endpoints, and enforce guardrails like autoscaling and instance-type policies. The goal is continuous visibility and accountability so teams optimize cost without killing experimentation.

Walk me through the full ML lifecycle from problem definition to model retirement.

The ML lifecycle spans eight phases: problem framing, data collection and validation, feature engineering, training and experimentation, offline evaluation, deployment, production monitoring, and retirement or retraining. Each phase has distinct owners, artefacts, and failure modes that an MLOps practice must systematise.

How does CI/CD for ML differ from standard software CI/CD, and what stages should an ML pipeline include?

ML CI/CD must validate not just code correctness but also model quality — automated retraining triggers, data validation, model evaluation gates, and canary deployment checks that standard software pipelines have no equivalent for. A regression in model AUC is as much a deployment failure as a 500 error.
