What is tool use or function calling in LLMs, and how do you design good tools for an agent?

Function calling lets an LLM output a structured request to invoke an external function with arguments, which the runtime executes and feeds back, enabling agents to act in the world. Good tool design uses clear names and descriptions, minimal well-typed parameters, narrow single-purpose scope, least privilege, and informative error messages so the model can choose and call them reliably.

What is an AI agent, and how does it differ from a single LLM call?

An agent is an LLM placed in a loop where it reasons, chooses and calls tools or actions, observes the results, and repeats until a goal is met, rather than producing one response and stopping. The key differences are autonomy, tool use, memory and state, and multi-step control flow driven by the model's own decisions.

How do function/tool calling and LLM agents work at a high level?

Tool calling extends the LLM's output space to include structured function invocations. The model emits a JSON object naming a tool and its arguments; the runtime executes the tool and feeds the result back as a new message. An agent is a loop that repeats this cycle — observe, think, act — until the task is complete or a stopping condition is met.

What types of memory do agents use, and what is context engineering and compaction?

Agents use short-term memory (the working context window) and long-term memory stored in vector databases or files, often split into episodic, semantic, and procedural memory. Context engineering is the discipline of curating what goes into the limited context window, and compaction summarizes or prunes older history so the agent retains key information without overflowing the window or degrading from too much noise.

AGENTS.md, Skills & Tool Calls — Agentic AI

You already know how an agent calls a tool — it emits a function name plus JSON arguments and your code runs it. This lesson is the layer above tools. It answers a question the tools lesson doesn’t: when you have a whole project’s worth of conventions, procedures, and actions, where does each piece live, and what does it cost you in context?

The answer is three distinct mechanisms that are easy to conflate because two of them are “just Markdown.” Here is the one-line mental model to carry through the whole lesson:

Instructions you always carry • a manual you pull off the shelf when relevant • a button you press to make something happen.

1. AGENTS.md — always-on instructions

The official site (agents.md) calls it “a README for agents: a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project.” It is a plain Markdown file at the root of the repository, and it complements your README.md rather than replacing it — the human README is for humans; AGENTS.md holds the agent-facing detail (build steps, tests, conventions) that would clutter it.

Two properties define it:

It is vendor-neutral. The site frames it as “a simple, open format” and says: “Rather than introducing another proprietary file, we chose a name and format that could work for anyone.” It was introduced by OpenAI (it’s the format Codex reads — the Codex docs say plainly, “Codex reads AGENTS.md files before doing any work”), but it is now read by 20-plus agents — Cursor, Gemini CLI, GitHub Copilot, Jules, Devin, goose, Windsurf and more — across 60,000+ open-source projects. In December 2025 it became a founding project of the Linux Foundation’s Agentic AI Foundation, which is about as concrete as “vendor-neutral” gets.
There are no required fields. “AGENTS.md is just standard Markdown. Use any headings you like.” Common sections are project overview, build/test commands, code style, and testing instructions — but nothing is mandatory.
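To make "always-on" concrete, here is a minimal sketch of how a harness might fold the file into every session. The function name `build_system_prompt`, the heading it inserts, and the `make test` command in the demo file are all hypothetical; real agents each have their own loading logic.

```python
from pathlib import Path
import tempfile


def build_system_prompt(repo_root: Path, base_prompt: str) -> str:
    """Prepend the repo's AGENTS.md (if present) to the session prompt.

    Hypothetical helper: it only illustrates that the file is read
    unconditionally, before any task-specific decision is made.
    """
    agents_file = repo_root / "AGENTS.md"
    if not agents_file.is_file():
        return base_prompt
    instructions = agents_file.read_text(encoding="utf-8").strip()
    return f"{base_prompt}\n\n# Project instructions (AGENTS.md)\n{instructions}"


# Demo with a throwaway repository containing a tiny AGENTS.md.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text(
        "## Testing\nRun `make test` before committing.\n",  # made-up content
        encoding="utf-8",
    )
    prompt = build_system_prompt(root, "You are a coding agent.")
    print(prompt)
```

Because the file is plain Markdown with no required fields, the sketch does no parsing at all: whatever the file says is carried verbatim.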

The proprietary cousins are the same idea under a different filename: Claude Code’s CLAUDE.md, Cursor’s rules (.cursor/rules), Windsurf’s .windsurfrules. They all do the job of persistent project instructions loaded every session. (Claude Code’s docs even note it reads CLAUDE.md, not AGENTS.md, and recommend bridging the two with an @AGENTS.md import or a symlink so both tools read one source of truth — vivid proof that these are the “same idea, different filename.”)

2. Skills — on-demand knowledge the model loads itself

A Skill is a folder containing a SKILL.md file that “packages instructions, metadata, and optional resources (scripts, templates) that [the agent] uses automatically when relevant.” Where AGENTS.md is always-on, a skill is conditionally activated. Two things make a skill a skill:

Required YAML frontmatter — exactly two fields. Every SKILL.md begins with name and description:

---
name: pdf-form-filler
description: Fill out PDF forms by mapping field names to values and flattening the result. Use when the user asks to complete, fill in, or populate a PDF form.
---

The constraints (from the official docs): name is at most 64 characters, lowercase letters / numbers / hyphens only; description is non-empty and at most 1024 characters. The single most important authoring rule is that the description must say both what the skill does and when to use it — because the description is the only thing the model sees at first, and it’s what the model matches the task against to decide whether to trigger the skill.
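The constraints above are easy to check mechanically. The following is a rough sketch, not an official validator: `parse_frontmatter` and `validate_skill` are hypothetical helpers, and the parser only handles flat `key: value` lines rather than full YAML.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}")  # lowercase letters, digits, hyphens
MAX_DESCRIPTION_CHARS = 1024


def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the first two `---` lines."""
    lines = skill_md.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


def validate_skill(fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty means valid)."""
    problems = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    if not description:
        problems.append("description must be non-empty")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description must be at most 1024 characters")
    return problems


EXAMPLE = """---
name: pdf-form-filler
description: Fill out PDF forms by mapping field names to values and flattening the result. Use when the user asks to complete, fill in, or populate a PDF form.
---
"""

print(validate_skill(parse_frontmatter(EXAMPLE)))
print(validate_skill({"name": "PDF Filler", "description": ""}))
```

A length check cannot tell you whether the description states both what the skill does and when to use it; that part still needs a human read.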

Skills are model-invoked. This is the crux of the distinction from AGENTS.md. The model itself decides to load a skill. If it judges a skill relevant to the current task, it reads the full SKILL.md into context. There is no human flipping a switch and no “always on” — it’s a runtime decision driven by the description. (Some products also let you trigger a skill by name, but automatic, description-based triggering is the defining behaviour.)

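The mechanism that makes this cheap is progressive disclosure, which loads a skill in three levels: the metadata (name and description), the SKILL.md body, and any bundled resources such as scripts and templates.

Only the ~100-token name + description sits in context until the model triggers the skill; then the body loads, and bundled resources are read only as needed. (Token figures are the docs’ rough guides, not hard limits.)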

This is why “won’t installing 30 skills bloat my context window?” is a misconception: 30 skills cost ~30 small descriptions until one fires. The big body and any bundled scripts only enter context on demand — and a script’s source never enters context at all, only its output.
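The cost argument can be sketched in a few lines. Everything here is illustrative: the `Skill` class, the 30 generated skills, and the use of character counts as a crude stand-in for tokens are assumptions, not how any particular product measures context.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full SKILL.md instructions, loaded only when triggered


def metadata_index(skills: list[Skill]) -> str:
    """What sits in context for every installed skill: name + description."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def context_for(skills: list[Skill], triggered: set[str]) -> str:
    """Metadata for all skills, plus the body of each triggered skill."""
    parts = [metadata_index(skills)]
    parts += [s.body for s in skills if s.name in triggered]
    return "\n\n".join(parts)


# Thirty made-up skills, each with a short description and a long body.
skills = [
    Skill(
        name=f"skill-{i}",
        description=f"Does task {i}. Use when the user asks for task {i}.",
        body="step-by-step instructions " * 200,
    )
    for i in range(30)
]

idle = context_for(skills, triggered=set())
active = context_for(skills, triggered={"skill-7"})
print("idle chars:", len(idle))
print("chars with one skill triggered:", len(active))
```

The idle context grows with the number of descriptions, not with the size of the bodies, which is the whole point of the design.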

Skills work across Claude Code (~/.claude/skills/ for personal, .claude/skills/ for project), the Claude API, and claude.ai; Anthropic ships prebuilt ones (pptx, xlsx, docx, pdf) and open-sources more. The SKILL.md folder convention is starting to spread beyond Anthropic — OpenAI’s Codex documents an Agent Skills concept too — so treat it as an Anthropic-originated pattern that is becoming a shared one, while hedging the exact cross-vendor details.

3. Tool calls — runtime actions

The bottom layer is the one the tools lesson covers in depth, so we’ll be brief and only nail the distinction. A tool call (function calling) is when the model, mid-generation, decides it needs a capability and emits a structured request: a tool name plus JSON arguments. Crucially, the model does not run the function. As OpenAI’s docs put it, a tool call is “a special kind of response we can get from the model if it… determines that… it needs to call one of the tools we made available to it.” Your application code (or, for a server-side tool, the provider’s infrastructure) executes it and feeds the result back; then the model continues. A tool is declared with a name, a description of when to use it, and a JSON Schema for its arguments.
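A minimal sketch of that division of labour follows. The `get_weather` tool, the `TOOLS` registry, and the message shape are hypothetical and provider-neutral; real APIs differ in field names, and the argument check here is far short of full JSON Schema validation.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in for a real lookup; returns canned text."""
    return f"Sunny in {city}"


# Declaration: name, description of when to use it, JSON Schema for arguments.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather. Use when the user asks "
                       "about the weather in a named city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": get_weather,
    }
}


def tool_message(name: str, content: str) -> dict:
    """Shape of the result fed back to the model (assumed format)."""
    return {"role": "tool", "name": name, "content": content}


def execute_tool_call(call: dict) -> dict:
    """Run one model-emitted call in application code, never in the model."""
    name = call["name"]
    tool = TOOLS.get(name)
    if tool is None:
        return tool_message(name, f"error: unknown tool '{name}'")
    try:
        args = json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        return tool_message(name, f"error: arguments are not valid JSON ({exc.msg})")
    if not isinstance(args, dict):
        return tool_message(name, "error: arguments must be a JSON object")
    missing = [key for key in tool["parameters"]["required"] if key not in args]
    if missing:
        return tool_message(name, f"error: missing arguments {missing}")
    return tool_message(name, tool["handler"](**args))


# What the model emits is only data; this code is what actually acts.
model_output = {"name": "get_weather", "arguments": '{"city": "Oslo"}'}
result = execute_tool_call(model_output)
print(result)
```

Errors are returned as tool results rather than raised, so the model can read them and retry with corrected arguments.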

MCP tools are the same primitive, just standardized so any client can discover and invoke tools exposed by any server. The MCP spec states verbatim that tools are “model-controlled, meaning that the language model can discover and invoke tools automatically.” Clients discover them with a tools/list request and invoke them with tools/call. MCP’s three primitives actually sharpen the “who decides?” axis that this whole lesson turns on:

MCP primitive	Who controls it
Tools	model-controlled — the LLM chooses to call them
Resources	application-controlled — the client app supplies the data
Prompts	user-controlled — the user picks the template

So an MCP tool surfaces to the model exactly like a native function call. “MCP is a different thing from tool calling” is a misconception: MCP just standardizes how external servers expose those calls.

How they compose — and when to reach for each

These are layers, not competitors. They stack, and each can point down to the one below:

AGENTS.md sets the standing rules, a Skill supplies the just-in-time playbook, and Tools are the hands that act.

The composition is real and documented: Claude Code’s docs tell you to move a multi-step procedure out of CLAUDE.md and into a skill rather than bloating the always-on file; a skill body, in turn, commonly tells the agent which tools or MCP servers to call and in what order. So the decision rule is about frequency and kind, not preference:

Reach for	When
AGENTS.md / CLAUDE.md	A fact that must hold in every session — build/test commands, project layout, conventions, “always do X.” Keep it short; the Claude Code docs suggest under ~200 lines because longer files reduce adherence.
A Skill	A repeatable, specialized procedure needed only sometimes — “how we fill our compliance PDF,” “our release checklist.” The docs say to move “task-specific instructions that don’t need to be in context all the time” here.
A Tool / MCP	The agent must do something or fetch live data — take an action, hit an API, query a DB, read/write files.
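The table reduces to two questions, which can be written down as a toy function. This is only a mnemonic under the lesson's own framing; `choose_layer` and its parameter names are invented here.

```python
def choose_layer(*, performs_action: bool, needed_every_session: bool) -> str:
    """Map the two questions from the table onto a layer.

    performs_action: must the agent do something or fetch live data?
    needed_every_session: must this hold on every task, not just some?
    """
    if performs_action:
        return "Tool / MCP"
    if needed_every_session:
        return "AGENTS.md / CLAUDE.md"
    return "Skill"


# Commit-message convention: pure instruction, needed on every task.
print(choose_layer(performs_action=False, needed_every_session=True))
# Release checklist: a procedure needed only sometimes.
print(choose_layer(performs_action=False, needed_every_session=False))
# Querying a database: an action, regardless of how often it happens.
print(choose_layer(performs_action=True, needed_every_session=False))
```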

A clean way to remember the failure modes, too: if your instruction file is 3,000 lines, you’ve probably stuffed skills’ worth of procedures into the always-on layer. If you find yourself pasting the same procedure into chat every few days, that’s a skill waiting to be written. And if the agent “knows” what to do but can’t actually make the change, you’re missing a tool.

In one breath

Three different layers, easy to conflate because two are “just Markdown”: AGENTS.md = instructions, Skill = knowledge, tool call = action.
AGENTS.md (vendor-neutral) and its proprietary cousins such as CLAUDE.md are always-on standing rules loaded every session; they shape behaviour but don’t enforce it (hard blocks are hooks/permissions).
A Skill (SKILL.md, name + description frontmatter) is on-demand knowledge the model itself loads when a task matches its description; progressive disclosure keeps only the ~100-token metadata in context until it triggers.
A tool call is a runtime action — the model emits a name + JSON args, your code (or an MCP server) executes it; MCP tools are the same primitive, just standardized.
Reach by frequency/kind: every-session fact → AGENTS.md; sometimes-needed procedure → Skill; do-something / fetch-live-data → Tool.

This taxonomy is the foundation for the rest of the agent-engineering lessons: the agent harness is what wires these layers together into a loop, and agent protocols are how separate agents expose capability to each other.

Quick check

Q1. An agent needs to know your team's commit-message convention on every single task. Which mechanism is the right home for it?

Q2. You install 25 skills. What is the context-window cost before any of them is triggered?

Q3. Transfer: a SKILL.md for 'fill our quarterly compliance PDF' triggers. Its body says to call read_file on the template, then write_file on the result. In this single request, which layer EXECUTES the file write, and what decided to load the skill?

You now have the taxonomy: always-on instructions, on-demand knowledge, runtime actions. The next question is what runs the loop that loads AGENTS.md, matches a task to a skill, and dispatches tool calls — that’s the agent harness.
