Agent OS MCP/RAG Server
=======================

.. note::
   **🤖 AI-Assisted Development Infrastructure**

   This is the infrastructure that powers AI-assisted development on the HoneyHive Python SDK. It also demonstrates dogfooding: HoneyHive's own tracing is used to observe AI development workflows.

Overview
--------

The Agent OS MCP/RAG server is a Model Context Protocol (MCP) server that provides AI coding assistants (like Cursor) with intelligent access to our development standards, workflows, and architectural patterns.

**What Problem Does This Solve?**

Traditional AI coding assistants face three major challenges:

1. **Context Overload**: Reading entire 50KB standard files when they only need 5KB
2. **Workflow Violations**: Skipping critical phases (e.g., jumping to coding without planning)
3. **No Observability**: Can't trace what standards AI is actually using or how decisions are made

**Our Solution:**

- **90% Context Reduction**: RAG engine with semantic search (50KB → 5KB)
- **Phase Gating**: Workflow engine prevents AI from skipping steps
- **Full Observability**: HoneyHive tracing on all AI development operations

What is Agent OS?
-----------------

`Agent OS <https://buildermethods.com/agent-os>`_ is a spec-driven development methodology created by **Brian Casel (Builder Methods)**. It provides a structured approach to AI-assisted software development through three layers of context stored as markdown files:

**Layer 1: Standards (``~/.agent-os/standards/``)**
   Your tech stack, code style, and best practices that apply across all projects.

**Layer 2: Product (``.agent-os/product/``)**
   Mission, roadmap, architecture decisions, and product-specific context.

**Layer 3: Specs (``.agent-os/specs/YYYY-MM-DD-feature-name/``)**
   Individual feature specifications with requirements, technical design, and task breakdowns.

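As a rough sketch of how the three layers map onto the filesystem, the paths above can be resolved as follows. The helper names (``layer_paths``, ``spec_dir_name``) and the slug rule for feature names are assumptions made for illustration; they are not part of Agent OS itself.

```python
from datetime import date
from pathlib import Path


def spec_dir_name(created: date, feature_name: str) -> str:
    """Build a YYYY-MM-DD-feature-name directory name (slug rule is assumed)."""
    slug = "-".join(feature_name.lower().split())
    return f"{created.isoformat()}-{slug}"


def layer_paths(project_root: Path, home: Path) -> dict[str, Path]:
    """Return the directory that holds each context layer."""
    return {
        # Layer 1: standards shared across all projects
        "standards": home / ".agent-os" / "standards",
        # Layer 2: product-specific context inside the repository
        "product": project_root / ".agent-os" / "product",
        # Layer 3: one dated subdirectory per feature spec
        "specs": project_root / ".agent-os" / "specs",
    }
```

A spec directory for a given feature would then be ``layer_paths(root, home)["specs"] / spec_dir_name(date.today(), "feature name")``.
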
**Traditional Agent OS Approach:**

AI coding assistants (like Cursor, Claude Code) directly read these markdown files using tools like ``codebase_search``, ``read_file``, and ``grep`` to understand your development standards and execute workflows like:

- ``plan-product`` - Analyze product and create roadmap
- ``create-spec`` - Generate feature specifications
- ``execute-tasks`` - Implement features following specs

**Learn More**: https://buildermethods.com/agent-os

Our Evolution: From Builder Methods to MCP/RAG
----------------------------------------------

Phase 1: Builder Methods Agent OS (Markdown Foundation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We started with `Agent OS <https://buildermethods.com/agent-os>`_ as created by Brian Casel, implementing the traditional approach:

**What We Adopted:**

- ✅ Three-layer context architecture (Standards, Product, Specs)
- ✅ Markdown-based documentation system
- ✅ Spec-driven development methodology
- ✅ Command-based workflows (``plan-product``, ``create-spec``, ``execute-tasks``)

**How It Worked:**

AI coding assistants directly read markdown files:

.. code-block:: text

   User: "What are our git safety rules?"

   AI: Uses codebase_search(".agent-os/standards/")
       Reads entire git-safety-rules.md (2,500 lines)
       Extracts relevant sections manually

**This foundation was excellent**, providing structure and consistency. However, as our codebase and standards grew, we discovered scaling challenges.

Phase 2: HoneyHive LLM Workflow Engineering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We extended Agent OS with our own **LLM Workflow Engineering methodology** (documented in ``.agent-os/standards/ai-assistant/LLM-WORKFLOW-ENGINEERING-METHODOLOGY.md``):

**Our Innovations:**

🔧 **Command Language Interface**
   Binding commands like ``🛑 EXECUTE-NOW``, ``📊 QUANTIFY-RESULTS``, ``🎯 NEXT-MANDATORY`` that create non-negotiable obligations for AI execution.

🏗️ **Three-Tier Architecture**
   - **Tier 1: Side-Loaded (≤100 lines)**: Automatic injection for systematic execution
   - **Tier 2: Active Read (200-500 lines)**: On-demand comprehensive context
   - **Tier 3: Output (Unlimited)**: Generated deliverables

🚨 **11 Automated Pre-Commit Hooks**
   Quality gates enforcing: formatting, linting, tests, documentation compliance, no-mock policy, etc.

📋 **Phase Gating with Evidence Requirements**
   Each workflow phase requires quantified evidence before progression (e.g., "test file created", "coverage ≥90%").

🎯 **Quality Targets**
   100% test pass rate + 90%+ coverage + 10.0/10 Pylint + 0 MyPy errors (non-negotiable).

**Example Workflow (V3 Test Generation):**

.. code-block:: markdown

   # Phase 1: Analysis
   🛑 EXECUTE-NOW: grep -n "^def\|^class" target_file.py
   📊 COUNT-AND-DOCUMENT: Functions and classes with signatures
   🎯 NEXT-MANDATORY: phases/2/dependency-analysis.md

   # Evidence Required:
   - Function count:
   - Class count:
   - Complexity assessment:

**Results:**

- ✅ 22% → 80%+ success rate (3.6x improvement)
- ✅ Systematic quality enforcement via automation
- ✅ Evidence-based validation preventing vague claims

**But New Challenges Emerged:**

❌ **Context Waste**
   AI reads 50KB files when only 5KB is needed for the current task.

❌ **No Programmatic Enforcement**
   Phase gating relies on AI compliance and can be skipped.

❌ **Zero Observability**
   No way to trace which standards AI consulted or how decisions were made.

❌ **Manual Discovery**
   AI must search for relevant standards each time.

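The Phase 1 analysis step in the example workflow above can also be run from Python instead of ``grep``. The sketch below counts top-level ``def`` and ``class`` statements and reports which evidence fields are still missing. The function names and the evidence keys (``function_count``, ``class_count``, ``complexity_assessment``) are assumptions made for illustration, not the workflow's actual schema.

```python
import re

# Same intent as: grep -n "^def\|^class" target_file.py
# Only top-level definitions match; indented methods are skipped.
_TOP_LEVEL = re.compile(r"^(def|class)\s+\w+", re.MULTILINE)

# Hypothetical evidence keys mirroring the "Evidence Required" list.
REQUIRED_EVIDENCE = ("function_count", "class_count", "complexity_assessment")


def collect_counts(source: str) -> dict[str, int]:
    """Count top-level functions and classes in Python source text."""
    kinds = [match.group(1) for match in _TOP_LEVEL.finditer(source)]
    return {
        "function_count": kinds.count("def"),
        "class_count": kinds.count("class"),
    }


def missing_evidence(evidence: dict) -> list[str]:
    """Return the required evidence keys that are absent or empty."""
    return [key for key in REQUIRED_EVIDENCE if evidence.get(key) in (None, "")]
```

A check like ``missing_evidence`` only verifies that evidence was supplied; whether the values are truthful still depends on how they were collected.
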
Phase 3: MCP/RAG Innovation (This Implementation)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We evolved our LLM Workflow Engineering approach by building an **MCP server with RAG**, transforming standards access from file-based to API-based:

**Builder Methods Foundation + Our Innovations + MCP/RAG = Complete Solution**

✅ **90% Context Reduction via RAG**
   Semantic search returns only relevant chunks (5KB vs 50KB), preserving Builder Methods' three-layer structure.

   .. code-block:: text

      User: "What are our git safety rules?"

      AI: Uses mcp_agent-os-rag_search_standards(
              query="git safety rules forbidden operations",
              n_results=5
          )
          Returns: 3 relevant chunks (840 tokens)
                   instead of entire file (12,000 tokens)

✅ **Architectural Phase Gating**
   Workflow engine **programmatically enforces** our phase-gating methodology, making it impossible to skip steps.

   .. code-block:: python

      # Cannot advance to Phase 2 without Phase 1 evidence
      result = workflow_engine.complete_phase(
          session_id="abc-123",
          phase=1,
          evidence={
              "test_file_created": True,
              "framework_decision": "pytest"
          }
      )
      # Returns Phase 2 requirements ONLY if evidence validates

✅ **Full Observability (Dogfooding HoneyHive)**
   Every RAG query and workflow operation is traced, demonstrating our own product in action.

✅ **Intelligent Filtering**
   Search by phase number, tags, or semantic meaning from Builder Methods' structured markdown.

✅ **Hot Reload**
   File watcher automatically rebuilds the index when standards change.

**The Complete Evolution:**

.. list-table::
   :header-rows: 1
   :widths: 20 25 25 30

   * - Aspect
     - Builder Methods Agent OS
     - + LLM Workflow Engineering
     - + MCP/RAG Server
   * - **Foundation**
     - 3-layer context (Standards/Product/Specs)
     - Command language + Phase gating
     - Programmatic API access
   * - **Standards Access**
     - Direct file reading
     - Same (file-based)
     - Semantic search (90% reduction)
   * - **Workflow Enforcement**
     - Manual AI compliance
     - Evidence-based validation
     - Architectural phase gating
   * - **Context Efficiency**
     - Read entire files
     - Tier-based sizing
     - RAG chunk retrieval
   * - **Observability**
     - None
     - Manual tracking
     - Full HoneyHive tracing
   * - **Quality Gates**
     - None
     - 11 pre-commit hooks
     - Same (inherited)
   * - **AI Interface**
     - Tool calls (search, read)
     - Command language
     - MCP tools (5 tools)

**Credit Where Due:**

- **Builder Methods (Brian Casel)**: Three-layer architecture, spec-driven methodology, markdown standards
- **HoneyHive Engineering**: LLM Workflow Engineering, command language, phase gating, quality automation
- **This Implementation**: MCP/RAG server combining both approaches with programmatic enforcement and observability

Architecture
------------

The MCP server consists of four core components:

RAG Engine (``rag_engine.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose**: Semantic search over Agent OS standards with metadata filtering.

**Technology**:

- **LanceDB**: Vector database (migrated from ChromaDB for better filtering)
- **sentence-transformers**: Local embeddings (``all-MiniLM-L6-v2`` model)
- **Grep Fallback**: When vector search is unavailable, falls back to grep

**Key Features**:

- 90%+ retrieval accuracy on standard queries
- <100ms average latency
- Metadata filtering (phase, tags, file path)
- LRU cache with configurable TTL (5-minute default)
- Automatic index rebuilding

**Example Query**:

.. code-block:: python

   from mcp_servers.rag_engine import RAGEngine

   engine = RAGEngine(index_path, standards_path)

   # Search with semantic meaning
   result = engine.search(
       query="git safety rules forbidden operations",
       n_results=5,
       filters={"phase": 8}  # Only Phase 8 content
   )

Workflow Engine (``workflow_engine.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose**: Phase-gated workflow execution with checkpoint validation.

**Workflows Supported**:

- ``test_generation_v3``: 8-phase TDD test generation workflow
- ``production_code_v2``: Production code generation with quality gates

**Phase Gating**:

.. code-block:: text

   Phase 1 → Evidence → Phase 2 → Evidence → Phase 3 → ...

   Cannot advance to Phase N+1 without completing Phase N evidence requirements.

**Checkpoint Validation**:

Each phase defines required evidence (e.g., "test file must exist", "coverage must be 90%+"). The workflow engine validates evidence before allowing progression.

**Example**:

.. code-block:: python

   from mcp_servers.workflow_engine import WorkflowEngine

   engine = WorkflowEngine(state_manager, rag_engine)

   # Start workflow
   state = engine.start_workflow(
       workflow_type="test_generation_v3",
       target_file="tests/unit/test_new_feature.py"
   )

   # Complete phase with evidence
   result = engine.complete_phase(
       session_id=state.session_id,
       phase=1,
       evidence={
           "test_file_created": True,
           "framework_decision": "pytest with fixtures"
       }
   )

State Manager (``state_manager.py``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose**: Workflow state persistence and session lifecycle management.

**Features**:

- JSON-based state persistence in ``.agent-os/workflow_sessions/``
- Session expiration (30-day default)
- Automatic garbage collection of expired sessions
- State validation and integrity checking

Chunker (``chunker.py``)
~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose**: Markdown document chunking for RAG indexing.

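A minimal sketch of header-aware markdown chunking follows. It is not the code in ``chunker.py``: the ``Chunk`` dataclass, the function name, and the choice to split only on header lines are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Matches markdown headers such as "# Title" or "### Subsection".
_HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    section_title: str  # nearest markdown header above the text
    text: str           # body text of that section


def chunk_by_headers(markdown: str) -> list[Chunk]:
    """Split markdown into one chunk per header section."""
    chunks: list[Chunk] = []
    title, body = "", []
    for line in markdown.splitlines():
        match = _HEADER.match(line)
        if match:
            # A new header closes the previous section, if it had any text.
            if any(part.strip() for part in body):
                chunks.append(Chunk(title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    # Flush the final section.
    if any(part.strip() for part in body):
        chunks.append(Chunk(title, "\n".join(body).strip()))
    return chunks
```

A fuller implementation would also cap chunk size, extract phase numbers and tags, and skip ``#`` lines inside fenced code blocks, which this sketch treats as headers.
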
**Chunking Strategy**:

- **Size**: 100-500 tokens per chunk (optimal for semantic search)
- **Structure**: Respects markdown headers (keeps sections together)
- **Metadata**: Extracts phase numbers, tags, and section titles
- **Overlap**: Maintains context continuity between chunks

Getting Started
---------------

Prerequisites
~~~~~~~~~~~~~

1. **Cursor IDE** with MCP support
2. **Python 3.11+** with ``python-sdk`` virtual environment
3. **Agent OS standards** in ``.agent-os/standards/``

Building the RAG Index
~~~~~~~~~~~~~~~~~~~~~~

Before using the MCP server, build the vector index:

.. code-block:: bash

   cd /Users/josh/src/github.com/honeyhiveai/python-sdk

   # Activate project venv
   source python-sdk/bin/activate

   # Install MCP server dependencies
   pip install -r .agent-os/mcp_servers/requirements.txt

   # Build the index
   python .agent-os/scripts/build_rag_index.py

**Output**:

.. code-block:: text

   🏗️ Building RAG index from Agent OS standards...
   📁 Standards path: .agent-os/standards
   💾 Index path: .agent-os/rag_index
   📄 Processing 47 markdown files...
   ✅ Created 342 chunks
   🎯 90.2% retrieval accuracy on test queries
   ⚡ Average query time: 87ms
   ✅ Index built successfully!

Enabling in Cursor
~~~~~~~~~~~~~~~~~~

The MCP server is already configured in ``.cursor/mcp.json``:

.. code-block:: json

   {
     "mcpServers": {
       "agent-os-rag": {
         "command": "/Users/josh/src/github.com/honeyhiveai/python-sdk/python-sdk/bin/python",
         "args": [
           "/Users/josh/src/github.com/honeyhiveai/python-sdk/.agent-os/run_mcp_server.py"
         ],
         "env": {
           "HONEYHIVE_ENABLED": "true"
         },
         "autoApprove": [
           "search_standards",
           "get_current_phase",
           "get_workflow_state"
         ]
       }
     }
   }

**To Enable**:

1. Open Cursor Settings → MCP
2. Locate ``agent-os-rag`` server
3. Enable the server
4. Reload Cursor window

Using the MCP Tools
-------------------

The MCP server provides 5 tools for AI assistants:

1. search_standards
~~~~~~~~~~~~~~~~~~~

Semantic search over Agent OS standards with filtering.

**Example**:

.. code-block:: text

   User: "What are our git safety rules?"

   AI uses: mcp_agent-os-rag_search_standards(
       query="git safety rules forbidden operations",
       n_results=5
   )

   Returns: Relevant chunks from git-safety-rules.md

**Filters**:

- ``phase``: Filter by workflow phase number (1-8)
- ``tags``: Filter by metadata tags

2. start_workflow
~~~~~~~~~~~~~~~~~

Initialize a phase-gated workflow session.

**Example**:

.. code-block:: text

   User: "Generate tests for config/dsl/compiler.py"

   AI uses: mcp_agent-os-rag_start_workflow(
       workflow_type="test_generation_v3",
       target_file="tests/unit/config/dsl/test_compiler.py"
   )

   Returns: Phase 1 requirements and session ID

3. get_current_phase
~~~~~~~~~~~~~~~~~~~~

Retrieve current phase requirements and artifacts from previous phases.

4. complete_phase
~~~~~~~~~~~~~~~~~

Submit evidence and attempt to advance to the next phase.

**Example**:

.. code-block:: text

   AI uses: mcp_agent-os-rag_complete_phase(
       session_id="abc-123",
       phase=1,
       evidence={
           "test_file_created": True,
           "framework_decision": "pytest"
       }
   )

   Returns: Phase 2 requirements if evidence validates

5. get_workflow_state
~~~~~~~~~~~~~~~~~~~~~

Query complete workflow state for debugging/resume capability.

Development
-----------

Running MCP Server Tests
~~~~~~~~~~~~~~~~~~~~~~~~

MCP server tests have **separate dependencies** from the main SDK and are excluded from the main test suite:

.. code-block:: bash

   # Activate venv with MCP dependencies
   source python-sdk/bin/activate
   pip install -r .agent-os/mcp_servers/requirements.txt

   # Run MCP server tests only
   pytest tests/unit/mcp_servers/ -v

**Test Coverage**:

- 28 comprehensive unit tests
- 10.0/10 Pylint score
- Full type annotations (MyPy clean)
- Tests for all 4 core components

Why Separate Tests?
~~~~~~~~~~~~~~~~~~~

The MCP server is an **independent component** with its own dependency tree:

**MCP Dependencies** (not in main SDK):

- ``lancedb>=0.3.0`` - Vector database
- ``sentence-transformers>=2.0.0`` - Local embeddings
- ``watchdog>=3.0.0`` - File watching
- ``mcp>=1.0.0`` - Model Context Protocol

**Rationale**:

- ✅ **No dependency bloat** in main SDK
- ✅ **Faster main SDK tests** (no vector DB initialization)
- ✅ **Clear separation** between SDK and tooling
- ✅ **Independent versioning** for MCP components

Adding New Tools
~~~~~~~~~~~~~~~~

To add a new MCP tool:

1. **Define the tool function** in ``agent_os_rag.py``
2. **Add @trace decorator** for observability
3. **Register with MCP server** in ``create_server()``
4. **Add to autoApprove** in ``.cursor/mcp.json`` (if safe)
5. **Write tests** in ``tests/unit/mcp_servers/``

**Example**:

.. code-block:: python

   @tool_trace
   @server.call_tool()
   async def new_tool(query: str) -> Sequence[types.TextContent]:
       """New tool description."""
       # Enrich span with input
       enrich_span({"query": query})

       # Tool logic here
       result = do_something(query)

       # Enrich span with output
       enrich_span({"result": result})

       return [types.TextContent(type="text", text=result)]

Hot Reload
~~~~~~~~~~

The MCP server includes a file watcher that automatically rebuilds the RAG index when standards change:

.. code-block:: python

   from watchdog.observers import Observer
   from watchdog.events import FileSystemEventHandler

   class AgentOSFileWatcher(FileSystemEventHandler):
       def on_modified(self, event):
           if event.src_path.endswith('.md'):
               # Debounce and rebuild index
               self._schedule_rebuild()

**In Development**:

- Edit any ``.agent-os/standards/*.md`` file
- Index automatically rebuilds in background
- New content available in ~2-3 seconds

Observability (Dogfooding HoneyHive)
------------------------------------

Every MCP tool operation is traced with HoneyHive instrumentation, demonstrating dogfooding of our own product.

Instrumentation Pattern
~~~~~~~~~~~~~~~~~~~~~~~

All tools use the ``@trace`` decorator with span enrichment:

.. code-block:: python

   import os

   from honeyhive import HoneyHiveTracer, trace, enrich_span
   from honeyhive.models import EventType

   # Initialize tracer once
   tracer = HoneyHiveTracer.init(
       api_key=os.getenv("HH_API_KEY"),
       project="your-project-here",
       source="agent-os-mcp-server",
       verbose=True
   )

   # Wrap tool with tracing
   @trace(tracer=tracer, event_type=EventType.tool)
   async def search_standards(query: str, n_results: int, filters: dict | None = None):
       # Enrich span with inputs
       enrich_span({
           "query": query,
           "n_results": n_results,
           "filters": filters
       })

       # Execute RAG search (rag_engine is the server's RAGEngine instance)
       result = rag_engine.search(query, n_results, filters)

       # Enrich span with outputs
       enrich_span({
           "chunks_returned": len(result.chunks),
           "retrieval_method": result.retrieval_method,
           "query_time_ms": result.query_time_ms
       })

       return result

Viewing Traces
~~~~~~~~~~~~~~

1. Navigate to HoneyHive dashboard
2. Select project: **your-project-here**
3. Filter by source: **agent-os-mcp-server**

**Trace Attributes**:

- ``query``: Semantic search query
- ``n_results``: Number of chunks requested
- ``filters``: Metadata filters applied
- ``chunks_returned``: Actual chunks returned
- ``retrieval_method``: "vector" or "grep_fallback"
- ``query_time_ms``: RAG query latency
- ``session_id``: Workflow session ID (for workflow tools)
- ``phase``: Current phase number

Span Enrichment Examples
~~~~~~~~~~~~~~~~~~~~~~~~

**Search Tool**:

.. code-block:: json

   {
     "query": "git safety rules forbidden operations",
     "n_results": 5,
     "filters": null,
     "chunks_returned": 3,
     "retrieval_method": "vector",
     "query_time_ms": 87,
     "total_tokens": 840
   }

**Workflow Tool**:

.. code-block:: json

   {
     "session_id": "abc-123-def-456",
     "workflow_type": "test_generation_v3",
     "target_file": "tests/unit/test_feature.py",
     "current_phase": 2,
     "phase_content_tokens": 1200
   }

Troubleshooting
---------------

Import Errors
~~~~~~~~~~~~~

**Problem**: ``ModuleNotFoundError: No module named 'lancedb'``

**Solution**: Install MCP server dependencies:

.. code-block:: bash

   pip install -r .agent-os/mcp_servers/requirements.txt

**Why**: The MCP server has separate dependencies from the main SDK.

Index Rebuild Issues
~~~~~~~~~~~~~~~~~~~~

**Problem**: RAG index not updating after standards changes.

**Solutions**:

1. **Manual Rebuild**:

   .. code-block:: bash

      python .agent-os/scripts/build_rag_index.py

2. **Check File Watcher**: Look for errors in MCP server logs (Cursor DevTools).

3. **Clear Index**:

   .. code-block:: bash

      rm -rf .agent-os/rag_index
      python .agent-os/scripts/build_rag_index.py

Credential Loading
~~~~~~~~~~~~~~~~~~

**Problem**: HoneyHive traces not appearing in dashboard.

**Cause**: MCP server not loading credentials from ``.env``.

**Solution**: Verify that ``.env`` has the correct format:

.. code-block:: bash

   export HH_API_KEY="your-key-here"
   export HH_PROJECT="your-project-here"

**How Credentials Load**:

1. ``.cursor/mcp.json`` → Launches ``run_mcp_server.py``
2. ``run_mcp_server.py`` → Parses ``.env`` and loads into ``os.environ``
3. ``agent_os_rag.py`` → Reads from ``os.getenv()``

**Debug**: Check MCP server logs in Cursor DevTools for:

.. code-block:: text

   DEBUG: HH_API_KEY=SET
   DEBUG: HONEYHIVE_PROJECT=your-project-here
   🍯 HoneyHive tracing enabled for dogfooding

No Traces Appearing
~~~~~~~~~~~~~~~~~~~

**Problem**: MCP server running but no traces in HoneyHive.

**Checklist**:

1. ✅ ``HONEYHIVE_ENABLED="true"`` in ``.cursor/mcp.json`` env
2. ✅ Valid ``HH_API_KEY`` and ``HH_PROJECT`` in ``.env``
3. ✅ Tracer initialized successfully (check logs)
4. ✅ Using correct project in HoneyHive dashboard

**Debugging**:

Enable verbose logging in ``agent_os_rag.py``:

.. code-block:: python

   tracer = HoneyHiveTracer.init(
       verbose=True  # Already enabled
   )

See Also
--------

**Agent OS Resources**:

- `Agent OS Documentation `_ - Official Agent OS guide by Builder Methods
- `Builder Methods YouTube `_ - AI-assisted development tutorials

**Related SDK Documentation**:

- :doc:`/development/testing/setup-and-commands` - Test infrastructure overview
- :doc:`/development/workflow-optimization` - AI-assisted development workflows
- :doc:`/how-to/advanced-tracing/custom-spans` - HoneyHive instrumentation patterns

**Internal References**:

- ``.agent-os/specs/2025-10-03-agent-os-mcp-rag-evolution/`` - Complete specification
- ``.agent-os/standards/ai-assistant/import-verification-rules.md`` - Import verification standard
- ``.cursorrules`` - AI assistant compliance rules