About datarekha
datarekha is a free, open, interactive tutorial site for data engineering, machine learning, and AI. The goal is simple: cover what you'll actually use in industry, in the order a new hire learns it, and give you a runnable sandbox for every concept.
What's here
- Python — from print("hello") to async LLM apps, Pydantic, and FastAPI.
- Data stack — NumPy, Pandas, Matplotlib, Seaborn.
- SQL & PySpark — analytics-grade SQL, warehouse dialects, Spark internals.
- ML & Deep Learning — scikit-learn, XGBoost, PyTorch, Hugging Face.
- MLOps — Docker, CI/CD, MLflow, serving, monitoring.
- Generative & Agentic AI — LLMs in practice, RAG, LangChain, LangGraph, Microsoft Agent Framework, Google ADK.
How it's built
Static site generated with Astro, styled with Tailwind CSS v4, content authored in MDX. Interactive widgets are React islands that hydrate only when needed — the rest of each page ships zero JavaScript, so pages load almost instantly. Python runs in your browser via Pyodide; SQL via SQLite compiled to WebAssembly. Both are loaded lazily so the landing pages stay fast.
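Lazy loading like this usually follows a load-once pattern. The TypeScript below is a minimal sketch of that pattern, not datarekha's actual code. The names lazy and loadFakeRuntime are hypothetical, and the fake loader stands in for an expensive download such as Pyodide or a SQLite WebAssembly build.

```typescript
// Minimal load-once helper (hypothetical name): the factory runs only on the
// first call, and every later caller receives the same promise.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err: unknown) => {
        // Forget a failed load so the next caller can retry the download.
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical stand-in for an expensive runtime download and instantiation.
let loadCount = 0;
async function loadFakeRuntime(): Promise<{ run: (code: string) => string }> {
  loadCount += 1;
  return { run: (code: string) => `ran: ${code}` };
}

const getRuntime = lazy(loadFakeRuntime);

// Nothing is loaded until a widget actually asks for the runtime.
console.log(loadCount); // 0
const first = getRuntime();
const second = getRuntime();
console.log(loadCount); // 1
console.log(first === second); // true
```

Caching the promise rather than the resolved value means two widgets that request the runtime at the same moment still trigger a single download.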
Contributing
Lessons are .mdx files. To add one, drop a file at src/content/lessons/<section>/<slug>.mdx and add it to src/data/sections.ts. PRs welcome.
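Adding a lesson touches two places, the content file and the registry, and they have to agree. The real src/data/sections.ts isn't shown here, so the TypeScript below is a hypothetical sketch of what such a registry might look like. The SectionEntry and LessonEntry types, their field names, and the python/hello-world entry are invented for illustration. It derives the path each registered lesson is expected to live at, following the src/content/lessons/<section>/<slug>.mdx convention.

```typescript
// Hypothetical registry shape; the real src/data/sections.ts may differ.
interface LessonEntry {
  slug: string; // must match the file name <slug>.mdx
  title: string;
}

interface SectionEntry {
  id: string; // must match the <section> directory name
  title: string;
  lessons: LessonEntry[];
}

// Hypothetical example entry, for illustration only.
const sections: SectionEntry[] = [
  {
    id: "python",
    title: "Python",
    lessons: [{ slug: "hello-world", title: "Hello, world" }],
  },
];

// Path a registered lesson is expected to live at.
function lessonPath(sectionId: string, slug: string): string {
  return `src/content/lessons/${sectionId}/${slug}.mdx`;
}

// Every content path the registry expects to exist.
function expectedLessonPaths(registry: SectionEntry[]): string[] {
  return registry.flatMap((section) =>
    section.lessons.map((lesson) => lessonPath(section.id, lesson.slug)),
  );
}

console.log(expectedLessonPaths(sections).join("\n"));
// prints src/content/lessons/python/hello-world.mdx
```

Comparing that list against the files on disk is one way a contributor could catch a lesson that was added but never registered.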