# LLM Agents Ecosystem Handbook
A practical operating manual for building, evaluating, securing, and shipping modern LLM agent systems.
Modern agents are not "a prompt + a tool." They are systems: identity, memory, skills, tools, MCP integrations, guardrails, observability, evals, and a provider strategy. This handbook teaches the whole stack and ships templates, blueprints, runnable adapters, and curated examples you can adopt today.
## What's in this repo
A curated, opinionated, production-oriented handbook in seven parts:
- Concepts – Agent OS, identity, memory, skills, MCP, safety, observability: every layer of the modern agent stack
- Provider ecosystem – adapters + docs for 24+ LLM providers (frontier APIs, fast inference, marketplaces, enterprise clouds, specialty, local runtimes), with a router for fallback chains
- Skills ecosystem – design guide, taxonomy, maturity model, security checklist, and a curated skill catalog
- Prompt engineering – agent prompt patterns, instruction hierarchy, context engineering, prompt-injection defense
- Coding-agent workflows – for Claude Code, Cursor, Codex, Aider, Cline, and custom runtimes: repo instructions, prompts, review checklist, safe refactoring
- Design docs – agent / technical design docs, ADR guide, design reviews, rollout plans, the DESIGN.md machine-readable spec
- Curated catalog – 100+ existing agent skeletons, framework comparisons, evaluation tools, tutorials, preserved and improved
## Who this is for
| You are… | Start at |
|---|---|
| New to agents | docs/beginners_guide.md → agent_os/README.md |
| Building a production agent | blueprints/ → checklists/production_readiness_checklist.md |
| Picking / wiring providers | providers/README.md → providers/provider_matrix.md |
| Comparing frameworks | docs/framework_comparison.md |
| Adding memory / RAG | memory/ → tutorials/rag_tutorials |
| Adding MCP | mcp/ → mcp/mcp_security.md |
| Designing Skills | skills/ → skills/skill_design_guide.md |
| Working with coding agents | coding_agents/ → coding_agents/prompts/ |
| Writing better prompts | prompt_engineering/ |
| Designing & rolling out | design_docs/ |
| Hardening safety/evals | safety/ → evals/ |
| Coding agent reading this repo | llms.txt → llm_wiki/index.md |
## Modern Agent Stack
| Layer | Purpose | Where in this repo |
|---|---|---|
| Model / Provider | LLM choice + abstraction + routing | providers/ |
| Orchestration | Agent loops, planning, handoffs | docs/framework_comparison.md, blueprints/ |
| Tool | Function calling and external actions | agent_os/mcp_layer.md |
| MCP | Standardized external context and tools | mcp/ |
| Memory | Durable user/project/semantic memory | memory/ |
| Skills | Reusable, progressive-loading workflows | skills/ |
| Identity | Personality, mission, refusal style | agent_os/agent_identity.md, templates/ |
| Prompt | System prompt design, instruction hierarchy, defenses | prompt_engineering/ |
| Safety | Guardrails, approvals, policy | safety/ |
| Observability | Tracing, spans, cost, latency, evals | observability/, evals/ |
| Deployment | Shipping agents to production | design_docs/rollout_plan.md |
| Coding-agent harness | Claude Code, Cursor, Codex, Aider, Cline | coding_agents/ |
📖 Deep dive: agent_os/README.md

## Provider ecosystem
The handbook ships an LLMProvider abstraction with 24+ providers across six families. Most providers go through a single OpenAI-compatible code path; specialty / local providers are first-class.
| Provider type | Examples | Best for |
|---|---|---|
| Frontier APIs | OpenAI, Anthropic, Google Gemini | Reasoning, tool use, production agents |
| Fast inference | Groq, Cerebras, SambaNova | Low-latency workloads |
| Marketplaces | OpenRouter, Together, Fireworks, DeepInfra | Model choice and routing |
| Enterprise clouds | Azure OpenAI, AWS Bedrock, Vertex AI | Compliance, governance |
| Specialty | xAI, Perplexity, Mistral, Cohere, DeepSeek, Hugging Face, Replicate, NVIDIA NIM, MiniMax | Domain-specific |
| Local runtimes | Ollama, LM Studio, vLLM, llama.cpp | Privacy, cost control, offline dev |
Quick start:

```python
from utilities import get_provider
from utilities.provider_router import ProviderRouter

messages = [{"role": "user", "content": "Summarize MCP."}]

# Use any single provider
out = get_provider("groq").chat(messages, model="llama-3.1-8b-instant")

# Or route by task class with fallback
router = ProviderRouter()
out = router.chat(messages, task_class="cheap")  # Groq → DeepSeek → Together → OpenRouter
```
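The fallback chain above boils down to a try-in-order loop. The `FallbackRouter` below is an illustrative sketch of that pattern, not the repo's actual `ProviderRouter` implementation; the names (`chain`, `chat` callables) are assumptions.

```python
# Hypothetical sketch of the fallback pattern behind a provider router:
# try each provider in order and fall through to the next on failure.
class FallbackRouter:
    def __init__(self, chain):
        # chain: ordered list of (provider_name, chat_callable) pairs
        self.chain = chain

    def chat(self, messages):
        errors = {}
        for name, chat in self.chain:
            try:
                return chat(messages)
            except Exception as exc:  # a real router would catch narrower errors
                errors[name] = exc
        raise RuntimeError(f"all providers failed: {list(errors)}")
```

A production router would add per-provider timeouts, retry budgets, and cost-aware ordering per task class.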
📖 providers/README.md • providers/provider_matrix.md • providers/router_patterns.md • providers/local_models.md
## Repository map
```text
.
├── README.md • llms.txt • llms-full.txt
├── agent_os/ – the Agent OS concept, layers, workspace examples
├── providers/ – 24+ provider docs + adapters + router patterns
├── templates/ – AGENTS.md / SOUL.md / MEMORY.md / SKILL.md / DESIGN_DOC / ADR / …
├── skills/ – design guide + taxonomy + maturity model + curated catalog + 4 examples
├── memory/ – memory taxonomy, distillation, security, examples
├── mcp/ – MCP basics, architecture, security, server catalog, examples
├── prompt_engineering/ – agent prompt patterns, instruction hierarchy, defenses
├── coding_agents/ – Claude Code, Cursor, Codex, workflows, prompts, review
├── design_docs/ – agent + technical design docs, ADR guide, design.md spec
├── safety/ – guardrails, approvals, prompt injection, secure checklist
├── observability/ – tracing, spans, cost/latency, dashboards
├── evals/ – eval design, regression / tool / memory / MCP / safety / prompt
├── blueprints/ – production architectures by use case
├── examples/ – end-to-end runnable agent workspaces
├── checklists/ – agent design, prod readiness, MCP security, …
├── llm_wiki/ – LLM-friendly index, glossary, matrices, wiki pattern
├── docs/ – framework comparison, best practices, beginners' guide
├── tutorials/ – RAG, memory, fine-tuning, chat-with-X
├── utilities/ – LLMProvider + router + provider_config
├── agents/ – 100+ curated agent skeletons (preserved)
├── complete_apps/, web_apps/, notebooks/, datasets/, design/, resources/, scripts/, tests/, ecosystem/
└── .github/ – issue / PR templates
```
## Skills ecosystem
A curated, in-repo catalog plus a clear taxonomy and maturity model:
- skills/skill_design_guide.md – write trigger descriptions the model actually picks
- skills/skill_vs_tool_vs_mcp.md – when to use which
- skills/skill_taxonomy.md – domains, tags, risk
- skills/skill_maturity_model.md – experimental → production
- skills/skill_packaging.md – ship a portable skill
- skills/skill_validation.md – lint / smoke / eval
- skills/awesome_skills_catalog.md – broader ecosystem map
- skills/catalog/ – index + per-domain skills
- skills/examples/ – four full reference skills
Curated skills shipped: research-summarizer, repo-auditor, mcp-security-reviewer, agent-memory-curator, api-design-reviewer, pr-summarizer, adr-writer, incident-postmortem, sprint-planner, dataset-profiler.
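As a rough illustration of the kind of lint check a skill validator performs, here is a minimal frontmatter parser and validator. The field names (`name`, `description`) are assumptions for the sketch, not the catalog's actual schema.

```python
# Illustrative sketch, not the repo's validator: parse a SKILL.md-style
# '---'-delimited frontmatter block and check the fields a model needs
# in order to pick the skill.

def parse_frontmatter(text):
    """Extract key: value pairs from a '---'-delimited frontmatter block."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("missing frontmatter")
    meta = {}
    for line in lines[1:]:
        if line == "---":
            break
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def validate_skill(meta):
    """Return (ok, missing_fields) for the minimal trigger-relevant fields."""
    missing = [k for k in ("name", "description") if not meta.get(k)]
    return (len(missing) == 0, missing)
```

A fuller validator would also smoke-test bundled scripts and run the skill against a small eval set, as skill_validation.md describes.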
## Prompt engineering
A dedicated section, agent-focused:
- prompt_engineering/agent_prompt_patterns.md
- prompt_engineering/system_prompt_design.md
- prompt_engineering/instruction_hierarchy.md
- prompt_engineering/context_engineering.md
- prompt_engineering/tool_use_prompting.md
- prompt_engineering/planning_and_reflection.md
- prompt_engineering/memory_prompting.md
- prompt_engineering/prompt_injection_defense.md
- prompt_engineering/prompt_eval_methods.md
- prompt_engineering/anti_patterns.md
Templates: SYSTEM_PROMPT, AGENT_PROMPT. Checklist: agent_prompt_checklist.
## Use this repo with coding agents
The handbook is itself a great surface for coding agents. Drop your favorite tool (Claude Code, Cursor, Codex, Aider, Cline) into the repo:
- llms.txt gives the agent an index in 30 seconds
- coding_agents/ has tool-specific notes + prompts
- coding_agents/prompts/ – repo audit, modernization, feature, bugfix, provider expansion, docs update, release review
- templates/CODING_AGENT_TASK.md.template – task contract template
- templates/REPO_MODERNIZATION_PROMPT.md.template – multi-phase modernization
The guidance is tool-neutral: same AGENTS.md, same workflows, regardless of harness.
## Design docs
Agent + technical design docs, ADRs, reviews, rollouts, and the DESIGN.md machine-readable spec for design tokens:
- design_docs/agent_design_doc.md
- design_docs/technical_design_doc.md
- design_docs/adr_guide.md
- design_docs/design_review.md
- design_docs/rollout_plan.md
- design_docs/design_md_spec.md
- design_docs/examples/ – research / MCP / memory / provider-router worked examples
Templates: DESIGN_DOC, ADR.
## Frameworks at a glance
| Framework | Best for | Lang | MCP | Tracing |
|---|---|---|---|---|
| OpenAI Agents SDK | Production agents | Py / JS | ✅ | ✅ built-in |
| LangGraph | Stateful, branching graphs | Py / JS | ✅ | ✅ LangSmith |
| CrewAI | Role-based teams | Py | ✅ | ⚠️ via partners |
| AutoGen (AG2) | Event-driven multi-agent + HITL | Py | ⚠️ partial | ✅ |
| LlamaIndex Workflows | Data-heavy / RAG-first | Py / TS | ✅ | ✅ |
| Pydantic AI | Type-safe, FastAPI-native | Py | ✅ | ✅ Logfire |
| Smolagents | Code-execution mini-agents | Py | ⚠️ | basic |
| Semantic Kernel | .NET / enterprise / Azure | C# / Py / Java | ✅ | ✅ |
| DSPy | Programmatic prompt optimization | Py | ❌ | ❌ |
| Strands Agents | Provider-agnostic, OpenTelemetry | Py | ✅ | ✅ OTEL |
| Vercel AI SDK | App-layer agents in Next.js | TS / JS | ✅ | ✅ |
| Google ADK | Gemini / Vertex hierarchical tools | Py | ✅ | ✅ |
📖 Full comparison + decision tree: docs/framework_comparison.md. Capability tags are hedged; verify against current upstream docs.
## Skills, MCP, and Memory in one minute
- Skills are reusable, model-loaded workflows (SKILL.md + scripts + references). Use when a task is repeatable, multi-step, and benefits from progressive disclosure. → skills/
- MCP (Model Context Protocol) is a standard for exposing tools/context to any agent. Use when integrations should be reusable (GitHub, filesystem, browser, internal APIs). → mcp/
- Memory is durable state across runs (MEMORY.md, vector stores, decision logs). → memory/
A useful rule of thumb:
| If the thing is… | Use |
|---|---|
| A repeatable workflow with steps and references | Skill |
| An external system with tools to call | MCP server |
| State that should outlive the current run | Memory |
| A single function the model needs once | Plain tool |
📖 Decision matrix: skills/skill_vs_tool_vs_mcp.md

## Guardrails & safety
Production agents need risk-tiered tool controls and human approval gates for high-impact actions.
| Risk level | Examples | Approval |
|---|---|---|
| Low | read-only search, summarization | none |
| Medium | drafting files, creating tickets | sometimes |
| High | sending email, modifying repos, running shell | required |
| Critical | deleting data, spending money, changing permissions | always + audit |
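The tiers above can be enforced with a small gate in the tool-execution path. This is a hedged sketch assuming each tool call arrives tagged with a risk level; the function and policy names are illustrative, not the handbook's API.

```python
# Illustrative approval gate: anything above "low" risk must pass an approver
# callback before the tool runs. Policy names mirror the risk table above.
APPROVAL_POLICY = {
    "low": "auto",
    "medium": "ask",    # "sometimes": ask, the approver may auto-grant
    "high": "ask",      # always ask a human
    "critical": "ask",  # always ask, and audit-log the decision
}

def run_tool(tool, args, risk, approve):
    """Run tool(**args) only if the policy allows it or `approve` grants it."""
    policy = APPROVAL_POLICY.get(risk, "ask")  # unknown risk: fail safe
    if policy != "auto" and not approve(tool.__name__, args, risk):
        raise PermissionError(f"{tool.__name__} ({risk} risk) denied")
    return tool(**args)
```

In production the approver would be a human-in-the-loop UI or an escalation queue, and every critical-tier decision would be written to an audit log.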
📖 safety/README.md • safety/prompt_injection.md • safety/secure_agent_checklist.md

## Observability & evals
You cannot ship what you cannot measure. The handbook ships:
- A tracing primer (observability/tracing.md) and span model (observability/spans.md)
- Cost / latency / failure analysis playbooks
- Eval design + datasets (evals/): regression, tool-call, memory, MCP, safety, prompt evals
- A curated guide to evaluation frameworks – Promptfoo, DeepEval, Ragas, Langfuse, Phoenix, TruLens, LangSmith, MLflow
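To make the span idea concrete, here is a minimal context manager that times each step of an agent run and records whatever cost metadata the caller attaches. It is an illustrative sketch, not the span schema in observability/spans.md.

```python
# Minimal span recorder: time a step and collect caller-supplied attributes
# (model, token counts, cost). A real system would export these via OTEL.
import time
from contextlib import contextmanager

SPANS = []  # in-memory sink; a real exporter would ship these to a backend

@contextmanager
def span(name, **attrs):
    start = time.perf_counter()
    record = {"name": name, "attrs": attrs}
    try:
        yield record  # the caller may add attrs (tokens, cost) during the step
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)
```

Usage: `with span("llm_call", model="llama-3.1-8b-instant") as s: s["attrs"]["output_tokens"] = 42`. Nesting spans per tool call gives you the trace tree the tracing primer describes.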
## Templates (copy-paste ready)
| File | Purpose |
|---|---|
| AGENTS.md | Repo-specific agent instructions |
| SOUL.md | Identity, voice, values, refusal style |
| MEMORY.md | Durable project + user memory index |
| USER.md | User profile and preferences |
| TOOLS.md | Allowed/restricted/approval-gated tools |
| SKILL.md | Skill spec with progressive loading |
| MCP_SERVER.md | Documenting an MCP integration |
| SYSTEM_PROMPT.md | Long-lived system prompt |
| AGENT_PROMPT.md | Per-task / per-session prompt |
| DESIGN_DOC.md | Agent / technical design doc |
| ADR.md | Architecture Decision Record |
| EVAL_PLAN.md | What you'll evaluate and how |
| GUARDRAILS.md | Policy, refusals, escalation |
| HUMAN_APPROVAL_POLICY.md | Who approves what |
| CODING_AGENT_TASK.md | Task contract for coding agents |
| REPO_MODERNIZATION_PROMPT.md | Multi-phase modernization |
| AGENT_RELEASE_CHECKLIST.md | Ship/no-ship gate |
## Merged knowledge areas (1.0.1)
This release merged seven external projects into the handbook. Each was adapted (not bulk-copied) into the structure above:
| Source theme | Lives in |
|---|---|
| Skills catalog + taxonomy patterns | skills/ → taxonomy, maturity, packaging, validation, awesome catalog |
| Personal-wiki / self-maintaining KB | llm_wiki/wiki_pattern.md, docs/llm_readable_docs.md |
| Agent prompt research patterns | prompt_engineering/ |
| Production coding-agent prompts + workflows | coding_agents/ → prompts, workflows, review |
| Machine-readable design specs | design_docs/design_md_spec.md, templates/DESIGN_DOC.md.template |
| ADRs + design reviews | design_docs/adr_guide.md, design_docs/design_review.md |
📖 Full migration plan: MIGRATION_AND_PROVIDER_EXPANSION_PLAN.md

## Supported LLM providers
The utilities/llm_provider.py module exposes a single LLMProvider interface (and a backwards-compatible complete() function). Switch via LLM_PROVIDER without touching agent code; route automatically with ProviderRouter.
24+ providers across frontier / fast / marketplace / enterprise / specialty / local. See:
- providers/provider_matrix.md – capability comparison
- providers/env_vars.md – every variable
- .env.example – copy-and-fill
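The LLM_PROVIDER switch reduces to an env-driven registry lookup. A hedged sketch of that pattern follows; `get_provider`'s real signature in utilities/llm_provider.py may differ, and the registry names here are assumptions.

```python
# Illustrative env-driven provider registry: register provider factories by
# name, then resolve the active one from the LLM_PROVIDER environment variable.
import os

REGISTRY = {}  # provider name -> zero-arg factory returning a provider object

def register(name):
    def deco(factory):
        REGISTRY[name] = factory
        return factory
    return deco

def get_provider(name=None):
    """Resolve a provider by explicit name, else by LLM_PROVIDER, else a default."""
    name = name or os.environ.get("LLM_PROVIDER", "openai")
    try:
        return REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(REGISTRY)}")
```

Because agent code only ever calls `get_provider()`, swapping providers is a one-line environment change rather than a code change.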
## Contributing
Contributions are very welcome β new examples, framework updates, fixes, and translations all help. Start with:
- CONTRIBUTING.md – workflow, scope, quality bar
- .github/PULL_REQUEST_TEMPLATE.md
- .github/ISSUE_TEMPLATE/
- checklists/open_source_quality_checklist.md
## Roadmap & changelog
- ROADMAP.md – what's next
- CHANGELOG.md – what shipped
## License
MIT – see LICENSE.
## Maintainer
Curated & maintained by Sayed Allam (oxbshw). If this handbook helped you ship, please ⭐ the repo and open a PR with what you learned along the way.