What is Cortex Memory?
Cortex Memory is a complete, production-ready framework for giving your AI applications long-term memory. It moves beyond simple chat history, providing an intelligent memory system with a hierarchical three-tier architecture (L0 Abstract → L1 Overview → L2 Detail) that automatically extracts, organizes, and optimizes information to make your AI agents smarter and more personalized.
Under the hood, memories flow through a processing pipeline built on a hybrid storage architecture: a virtual filesystem for durability combined with vector-based semantic search.
- Blazing Fast Layered Context Loading
- Context Organization as Virtual Files
- Precision Memory Retrieval
Cortex Memory organizes data using a virtual filesystem approach with the cortex:// URI scheme:
```text
# Basic Structure
cortex://{dimension}/{path}

# Dimensions
session/   - Session memories (conversation history, timeline)
user/      - User memories (preferences, entities, events)
agent/     - Agent memories (cases, skills)
resources/ - Knowledge base resources

# Examples
cortex://session/{session_id}/timeline/{date}/{time}.md
cortex://user/preferences/{name}.md
cortex://agent/cases/{case_id}.md
cortex://resources/{resource_name}/
```
<hr />
Why Use Cortex Memory?
<p align="center"> <strong>Transform your stateless AI into an intelligent, context-aware partner.</strong> </p> <div style="text-align: center; margin: 30px 0;"> <table style="width: 100%; border-collapse: collapse; margin: 0 auto;"> <tr> <th style="width: 50%; padding: 15px; background-color: #f8f9fa; border: 1px solid #e9ecef; text-align: center; font-weight: bold; color: #495057;">Before Cortex Memory</th> <th style="width: 50%; padding: 15px; background-color: #f8f9fa; border: 1px solid #e9ecef; text-align: center; font-weight: bold; color: #495057;">After Cortex Memory</th> </tr> <tr> <td style="padding: 15px; border: 1px solid #e9ecef; vertical-align: top;"> <p style="font-size: 14px; color: #6c757d; margin-bottom: 10px;"><strong>Stateless AI</strong></p> <ul style="font-size: 13px; color: #6c757d; line-height: 1.6;"> <li>Forgets user details after every session</li> <li>Lacks personalization and context</li> <li>Repeats questions and suggestions</li> <li>Limited to short-term conversation history</li> <li>Feels robotic and impersonal</li> </ul> </td> <td style="padding: 15px; border: 1px solid #e9ecef; vertical-align: top;"> <p style="font-size: 14px; color: #6c757d; margin-bottom: 10px;"><strong>Intelligent AI with Cortex Memory</strong></p> <ul style="font-size: 13px; color: #6c757d; line-height: 1.6;"> <li>Remembers user preferences and history</li> <li>Provides deeply personalized interactions</li> <li>Learns and adapts over time</li> <li>Maintains context across multiple conversations</li> <li>Builds rapport and feels like a true assistant</li> </ul> </td> </tr> </table> </div>

<strong>For:</strong>
- Developers building LLM-powered chatbots and agents.
- Teams creating personalized AI assistants.
- Open source projects that need a memory backbone.
- Anyone who wants to build truly intelligent AI applications!
Like <strong>Cortex Memory</strong>? Star it or sponsor the project!
Features & Capabilities
- <strong>File-System Based Storage:</strong> Memory content is stored as markdown files using the `cortex://` virtual URI scheme, enabling version control compatibility and portability.
- <strong>Intelligent Memory Extraction:</strong> Automatically extracts structured memories (facts, decisions, entities) from conversations using LLM-powered analysis with confidence scoring.
- <strong>Vector-Based Semantic Search:</strong> High-performance similarity search via Qdrant with metadata filtering across dimensions (user/agent/session), using weighted scoring.
- <strong>Multi-Modal Access:</strong> Interact through the REST API, CLI, MCP protocol, or direct Rust library integration.
- <strong>Three-Tier Memory Hierarchy:</strong> Progressive disclosure system (L0 Abstract → L1 Overview → L2 Detail) optimizes LLM context window usage with lazy generation.
- <strong>Session Management:</strong> Track conversation timelines, participants, and message history with automatic indexing and event-driven processing.
- <strong>Multi-Tenancy Support:</strong> Isolated memory spaces for different users and agents within a single deployment via tenant-aware collection naming.
- <strong>Event-Driven Automation:</strong> File watchers and auto-indexers for background processing, synchronization, and profile enrichment.
- <strong>LLM Result Caching:</strong> Intelligent caching with LRU eviction and TTL expiration reduces redundant LLM API calls by 50-75%, with cascade layer debouncing for a 70-90% reduction in layer updates.
- <strong>Incremental Memory Updates:</strong> An event-driven incremental update system (`MemoryEventCoordinator`, `CascadeLayerUpdater`) keeps L0/L1 layers in sync automatically as memories change.
- <strong>Memory Forgetting Mechanism:</strong> `MemoryCleanupService`, based on the Ebbinghaus forgetting curve, automatically archives or deletes low-strength memories to control storage growth in long-running agents.
- <strong>Agent Framework Integration:</strong> Built-in support for the Rig framework and the Model Context Protocol (MCP).
- <strong>Web Dashboard:</strong> Svelte 5 SPA (Insights) for monitoring, tenant management, and semantic search visualization.
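The multi-tenancy bullet above hinges on deriving a per-tenant collection name so each tenant's vectors stay isolated. A minimal sketch of the idea (illustrative only; the actual naming convention in cortex-mem-core may differ):

```rust
/// Illustrative only: derive a Qdrant collection name from a base name
/// and a tenant ID, so each tenant's vectors live in a separate collection.
fn tenant_collection_name(base: &str, tenant: &str) -> String {
    // Normalize the tenant ID so it is safe to embed in a collection name.
    let safe: String = tenant
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '-' })
        .collect();
    format!("{base}-{safe}")
}

fn main() {
    assert_eq!(tenant_collection_name("cortex-memory", "acme"), "cortex-memory-acme");
    assert_eq!(tenant_collection_name("cortex-memory", "Acme Corp"), "cortex-memory-acme-corp");
    println!("ok");
}
```

The key design point is that isolation is enforced at the storage layer, not just by request-time filtering: a query for one tenant never touches another tenant's collection.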
How It Works
Cortex Memory uses a sophisticated pipeline to process and manage memories, centered around a hybrid storage architecture combining virtual-filesystem durability with vector-based semantic search.
```mermaid
flowchart TB
    subgraph Input["Input Layer"]
        User[User Message]
        Agent[Agent Message]
        CLI[CLI Commands]
        API[REST API]
        MCP[MCP Protocol]
    end
    subgraph Core["Core Engine (cortex-mem-core)"]
        Session[Session Manager]
        Extractor[Memory Extractor]
        Indexer[Auto Indexer]
        Search[Vector Search Engine]
    end
    subgraph Storage["Storage Layer"]
        FS[(Filesystem<br/>cortex:// URI)]
        Qdrant[(Qdrant<br/>Vector Index)]
    end
    subgraph External["External Services"]
        LLM[LLM Provider<br/>Extraction & Analysis]
        Embed[Embedding API<br/>Vector Generation]
    end

    User --> Session
    Agent --> Session
    CLI --> Core
    API --> Core
    MCP --> Core
    Session -->|Store Messages| FS
    Session -->|Trigger Extraction| Extractor
    Extractor -->|Analyze Content| LLM
    Extractor -->|Store Memories| FS
    Indexer -->|Watch Changes| FS
    Indexer -->|Generate Embeddings| Embed
    Indexer -->|Index Vectors| Qdrant
    Search -->|Query Embedding| Embed
    Search -->|Vector Search| Qdrant
    Search -->|Retrieve Content| FS
```
Memory Architecture
Cortex Memory organizes data using a virtual filesystem approach with the cortex:// URI scheme:
```text
cortex://{dimension}/{scope}/{category}/{id}
```
- Dimension: `user`, `agent`, `session`, or `resources`
- Scope: Tenant or identifier
- Category: `memories`, `profiles`, `entities`, `sessions`, etc.
- ID: Unique memory identifier
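Splitting a URI of this shape into its four components is straightforward. A minimal sketch (not the actual cortex-mem-core parser; function name and return shape are illustrative):

```rust
/// Illustrative only: split a cortex:// URI into (dimension, scope, category, id).
/// Returns None if the URI does not match the expected four-part shape.
fn parse_cortex_uri(uri: &str) -> Option<(String, String, String, String)> {
    let rest = uri.strip_prefix("cortex://")?;
    // Split into at most four segments; the last segment may contain '/'.
    let mut parts = rest.splitn(4, '/');
    let dimension = parts.next()?.to_string();
    let scope = parts.next()?.to_string();
    let category = parts.next()?.to_string();
    let id = parts.next()?.to_string();
    Some((dimension, scope, category, id))
}

fn main() {
    let (dim, scope, cat, id) =
        parse_cortex_uri("cortex://user/acme/memories/mem-001.md").unwrap();
    assert_eq!(dim, "user");
    assert_eq!(scope, "acme");
    assert_eq!(cat, "memories");
    assert_eq!(id, "mem-001.md");
    assert_eq!(parse_cortex_uri("not-a-uri"), None);
    println!("ok");
}
```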
Three-Tier Memory Hierarchy
Cortex Memory implements a progressive disclosure system with three abstraction layers:
| Layer | Purpose | Token Usage | Use Case |
|---|---|---|---|
| L0 (Abstract) | Fast positioning, coarse-grained candidate selection | ~100 tokens | Initial screening (20% weight) |
| L1 (Overview) | Structured summary with key points and entities | ~500-2000 tokens | Context refinement (30% weight) |
| L2 (Detail) | Full conversation content | Variable | Precise matching (50% weight) |
This tiered approach optimizes LLM context window usage by loading only the necessary level of detail. The search engine combines weighted scores from all three layers (L0/L1/L2).
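The 20/30/50 weights from the table combine into a single relevance score per candidate. A minimal sketch of that combination (illustrative only; the real engine may normalize or filter differently):

```rust
/// Illustrative only: combine per-layer similarity scores using the
/// 20% (L0) / 30% (L1) / 50% (L2) weights described above.
fn weighted_score(l0: f64, l1: f64, l2: f64) -> f64 {
    0.2 * l0 + 0.3 * l1 + 0.5 * l2
}

fn main() {
    // A candidate that matches strongly at the detail layer (L2) dominates,
    // since L2 carries half the total weight.
    let s = weighted_score(0.5, 0.6, 0.9);
    assert!((s - 0.73).abs() < 1e-9);
    println!("ok");
}
```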
The Cortex Memory Ecosystem
Cortex Memory is a modular system composed of several crates, each with a specific purpose. This design provides flexibility and separation of concerns.
```mermaid
graph TD
    subgraph "User Interfaces"
        CLI["cortex-mem-cli<br/>Terminal Interface"]
        Insights["cortex-mem-insights<br/>Web Dashboard"]
    end
    subgraph "APIs & Integrations"
        Service["cortex-mem-service<br/>REST API Server"]
        MCP["cortex-mem-mcp<br/>MCP Server"]
        Rig["cortex-mem-rig<br/>Rig Framework"]
    end
    subgraph "Core Engine"
        Core["cortex-mem-core<br/>Business Logic"]
        Tools["cortex-mem-tools<br/>Agent Tools"]
    end
    subgraph "External Services"
        VectorDB[("Qdrant<br/>Vector Database")]
        LLM[("LLM Provider<br/>OpenAI/Azure/Local")]
    end

    %% Define Dependencies
    Insights -->|REST API| Service
    CLI --> Core
    Service --> Core
    MCP --> Tools
    Rig --> Tools
    Tools --> Core
    Core --> VectorDB
    Core --> LLM
```
- <strong>cortex-mem-core</strong>: The heart of the system. Contains the business logic for filesystem abstraction (`cortex://` URI), LLM client wrappers, embedding generation, Qdrant integration, session management, layer generation (L0/L1/L2), the extraction engine, the search engine, the automation orchestrator, the incremental update system (`MemoryEventCoordinator`, `CascadeLayerUpdater`, `LlmResultCache`, `IncrementalMemoryUpdater`), and the forgetting mechanism (`MemoryCleanupService`).
- <strong>cortex-mem-service</strong>: High-performance REST API server (Axum-based) exposing all memory operations via `/api/v2/*` endpoints. Runs on port 8085 by default.
- <strong>cortex-mem-cli</strong>: Command-line tool (`cortex-mem` binary) for developers and administrators to interact with the memory store directly.
- <strong>cortex-mem-insights</strong>: Pure frontend Svelte 5 SPA for monitoring, analytics, and memory management through a web interface.
- <strong>cortex-mem-mcp</strong>: Model Context Protocol server for integration with AI assistants (Claude Desktop, Cursor, etc.).
- <strong>cortex-mem-rig</strong>: Integration layer with the rig-core agent framework for tool registration.
- <strong>cortex-mem-tools</strong>: MCP tool schemas and operation wrappers for agent integration.
- <strong>cortex-mem-config</strong>: Configuration management module handling TOML loading, environment variable resolution, and tenant-specific overrides.
Observability Dashboard
Cortex Memory includes a powerful web-based dashboard (cortex-mem-insights) that provides real-time monitoring, analytics and management capabilities. The dashboard is a pure frontend Svelte 5 SPA that connects to the cortex-mem-service REST API.
Key Features
- Tenant Management: View and switch between multiple tenants with isolated memory spaces
- Memory Browser: Navigate the `cortex://` filesystem to view and manage memory files
- Semantic Search: Perform natural language queries across the memory store
- Health Monitoring: Real-time service status and LLM availability checks
Running the Dashboard
```bash
# Start the backend service first
cortex-mem-service --data-dir ./cortex-data --port 8085

# In another terminal, start the insights dashboard
cd cortex-mem-insights
bun install
bun run dev
```
The dashboard will be available at http://localhost:5173 and will proxy API requests to the backend service.
Community Showcase: MemClaw
MemClaw is a deeply customized memory enhancement plugin for the OpenClaw ecosystem, powered by the locally-running Cortex Memory engine. It delivers superior memory capabilities compared to OpenClaw's built-in memory system, achieving over 80% token savings while maintaining exceptional memory accuracy, security, and performance.
Why MemClaw?
| OpenClaw Native Memory | MemClaw |
|---|---|
| Basic memory storage | Three-tier L0/L1/L2 architecture for intelligent retrieval |
| Higher token consumption | 80%+ token savings with layered context loading |
| Limited search precision | Vector search + Agentic VFS exploration for complex scenarios |
Key Features
- Low Token & Hardware Resource Usage: Rust-powered high-performance memory components with progressive retrieval for optimal context loading
- Complete Data Privacy: All memories stored locally with zero cloud dependency
- One-Click Migration: Seamlessly migrate from OpenClaw native memory to MemClaw
- Easy Configuration: Zero runtime dependencies, one-line installation, minimal config to get started
Available Tools
| Tool | Purpose |
|---|---|
| `cortex_search` | Semantic search across all memories with tiered retrieval |
| `cortex_recall` | Recall memories with extended context (snippet + full content) |
| `cortex_add_memory` | Store messages for future retrieval |
| `cortex_close_session` | Close a session and trigger the memory extraction pipeline |
| `cortex_migrate` | One-click migration from OpenClaw native memory |
| `cortex_maintenance` | Periodic maintenance (prune, reindex, layer generation) |
Quick Start
```bash
# Install via OpenClaw
openclaw plugins install @memclaw/memclaw
```
Note: Set `memorySearch.enabled: false` to disable OpenClaw's built-in memory and use MemClaw instead.
Documentation
For detailed configuration, troubleshooting, and best practices, see the MemClaw README.
Community Showcase: Cortex TARS
Meet Cortex TARS, a production-ready AI-native TUI (Terminal User Interface) application that demonstrates the power of Cortex Memory. Built as a "second brain" companion, Cortex TARS brings auditory presence to your AI experience: it can hear and remember your voice in the real world, and it showcases how persistent memory transforms AI interactions from fleeting chats into lasting, intelligent partnerships.
What Makes Cortex TARS Special?
Cortex TARS is more than just a chatbot: it's a comprehensive AI assistant platform that leverages Cortex Memory's advanced capabilities.
Multi-Agent Management
Create and manage multiple AI personas, each with distinct personalities, system prompts, and specialized knowledge areas. Whether you need a coding assistant, a creative writing partner, or a productivity coach, Cortex TARS lets you run them all simultaneously with complete separation.
Persistent Role Memory
Every agent maintains its own long-term memory, learning from interactions over time. Your coding assistant remembers your coding style and preferences; your writing coach adapts to your voice and goals. No more repeating yourself: each agent grows smarter with every conversation.
Memory Isolation
Advanced memory architecture ensures complete isolation between agents and users. Each agent's knowledge base is separate, preventing cross-contamination while enabling personalized experiences across different contexts and use cases.
Real-Time Audio-to-Memory (The Game Changer)
This is where Cortex TARS truly shines. With real-time device audio capture, Cortex TARS can listen to your conversations, meetings, or lectures and automatically convert them into structured, searchable memories. Imagine attending a meeting while Cortex TARS silently captures key insights, decisions, and action items, all stored and ready for instant retrieval later. No more frantic note-taking or forgotten details.
Why Cortex TARS Matters
Cortex TARS isn't just an example: it's a fully functional application that demonstrates:
- Real-world production readiness: Built with Rust, it's fast, reliable, and memory-safe
- Seamless Cortex Memory integration: Shows best practices for leveraging the memory framework
- Practical AI workflows: From multi-agent conversations to audio capture and memory extraction
- User-centric design: Beautiful TUI interface with intuitive controls and rich features
Explore Cortex TARS
Ready to see Cortex Memory in action? Dive into the Cortex TARS project:
```bash
cd examples/cortex-mem-tars
cargo build --release
cargo run --release
```
Check out the Cortex TARS README for detailed setup instructions, configuration guides, and usage examples.
Cortex TARS proves that Cortex Memory isn't just a framework: it's a foundation for building intelligent, memory-aware applications that truly understand and remember.
Benchmark
Cortex Memory has been evaluated on the LoCoMo10 dataset (conv-26, 152 questions, 19 conversation sessions spanning May-October 2023) using LLM-as-a-Judge, the same methodology used by the OpenViking official evaluation. The results show Cortex Memory outperforming every compared configuration.
Performance Comparison
<p align="center"> <img src="./assets/benchmark/cortex_mem_vs_openclaw_3.png" alt="Cortex Memory vs OpenViking/OpenClaw's Built-in Memory Benchmark" width="800"> </p> <p align="center"> <em><strong>Overall Score:</strong> Cortex Memory v5 achieves <strong>68.42%</strong>, outperforming all OpenViking and OpenClaw configurations</em> </p>

Overall Scores
| System | Score | Questions |
|---|---|---|
| Cortex Memory v5 (Intent ON) | 68.42% | 152 |
| OpenViking + OpenClaw (−memory-core) | 52.08% | 1,540 |
| OpenViking + OpenClaw (+memory-core) | 51.23% | 1,540 |
| OpenClaw + LanceDB (−memory-core) | 44.55% | 1,540 |
| OpenClaw (built-in memory) | 35.65% | 1,540 |
Category Breakdown (v5)
| Category | Description | Score |
|---|---|---|
| Cat 1 | Factual Recall | 37.50% (12/32) |
| Cat 2 | Temporal Reasoning | 62.16% (23/37) |
| Cat 3 | Commonsense Inference | 76.92% (10/13) |
| Cat 4 | Multi-hop Reasoning | 84.29% (59/70) |
| Total | | 68.42% (104/152) |
Token Efficiency
| System | Avg Tokens / Question | Score | Score per 1K Tokens |
|---|---|---|---|
| Cortex Memory v5 | ~2,900 | 68.42% | 23.6 |
| OpenViking + OpenClaw (−memory-core) | ~2,769 | 52.08% | 18.8 |
| OpenViking + OpenClaw (+memory-core) | ~1,363 | 51.23% | 37.6 |
| OpenClaw (built-in memory) | ~15,982 | 35.65% | 2.2 |
| OpenClaw + LanceDB (−memory-core) | ~33,490 | 44.55% | 1.3 |
Compared to OpenClaw + LanceDB, Cortex Memory uses roughly 11× fewer tokens and achieves an 18× better score-per-token ratio.
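The score-per-1K-tokens column follows directly from the two preceding columns; a quick check of the arithmetic (function name is ours, not from the benchmark code):

```rust
/// Reproduce the "Score per 1K Tokens" column: score (%) divided by
/// average tokens per question, scaled to 1,000 tokens.
fn score_per_1k(score_pct: f64, avg_tokens: f64) -> f64 {
    score_pct / avg_tokens * 1000.0
}

fn main() {
    // Cortex Memory v5: 68.42% at ~2,900 tokens/question -> ~23.6
    assert!((score_per_1k(68.42, 2900.0) - 23.6).abs() < 0.1);
    // OpenClaw + LanceDB: 44.55% at ~33,490 tokens/question -> ~1.3
    assert!((score_per_1k(44.55, 33490.0) - 1.3).abs() < 0.1);
    println!("ok");
}
```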
Key Technical Advantages
- Intent-Driven Retrieval: Routing multi-hop queries to entity and relational memory scopes improves Cat 4 accuracy by +18.75pp
- Hierarchical L0/L1/L2 Architecture: Precision retrieval starting from ~100-token abstracts; you only pay for context you actually need
- Rust-based Implementation: High-performance, memory-safe core backed by Qdrant vector database
Evaluation Framework
The benchmark script is located in `examples/locomo-evaluation`, implementing a three-phase pipeline:
- Ingest: conversation sessions are ingested into a per-sample Cortex Memory tenant
- QA: 152 questions are answered via semantic retrieval + LLM generation
- Judge: LLM-as-a-Judge scores each answer as CORRECT / WRONG (binary, identical to the OpenViking methodology)
For more details on running the evaluation, see the locomo-evaluation README and the full results in examples/locomo-evaluation/BENCHMARK.md.
Getting Started
Prerequisites
- Rust (version 1.86 or later)
- Qdrant vector database (version 1.7+)
- An OpenAI-compatible LLM API endpoint (for memory extraction and analysis)
- An OpenAI-compatible Embedding API endpoint (for vector search)
Installation
The simplest way to get started is to use the CLI and Service binaries, which can be installed via cargo.
```bash
# Install the CLI for command-line management
cargo install --path cortex-mem-cli

# Install the REST API Service for application integration
cargo install --path cortex-mem-service

# Install the MCP server for AI assistant integrations
cargo install --path cortex-mem-mcp
```
Configuration
Cortex Memory applications (cortex-mem-cli, cortex-mem-service, cortex-mem-mcp) are configured via a config.toml file. The CLI will look for this file in the current directory by default, or you can pass a path using the -c or --config flag.
Here is a sample config.toml with explanations:
```toml
# -----------------------------------------------------------------------------
# Qdrant Vector Database Configuration
# -----------------------------------------------------------------------------
[qdrant]
url = "http://localhost:6334"          # URL of your Qdrant instance (gRPC port)
http_url = "http://localhost:6333"     # HTTP URL for REST API
collection_name = "cortex-memory"      # Base name for collections (tenant suffix added)
timeout_secs = 5                       # Timeout for Qdrant operations
embedding_dim = 1536                   # Embedding dimension (e.g., 1536 for text-embedding-3-small)

# -----------------------------------------------------------------------------
# LLM (Large Language Model) Configuration (for reasoning, extraction)
# -----------------------------------------------------------------------------
[llm]
api_base_url = "https://api.openai.com/v1"  # Base URL of your LLM provider
api_key = "${OPENAI_API_KEY}"               # API key (supports env variable)
model_efficient = "gpt-5-mini"              # Model for extraction and classification
model_reasoning = "o1-preview"              # Model for complex reasoning (optional)
temperature = 0.7                           # Sampling temperature for LLM responses
max_tokens = 8192                           # Max tokens for LLM generation
timeout_secs = 60                           # Timeout for LLM requests

# -----------------------------------------------------------------------------
# Embedding Service Configuration
# -----------------------------------------------------------------------------
[embedding]
api_base_url = "https://api.openai.com/v1"  # Base URL of your embedding provider
api_key = "${OPENAI_API_KEY}"               # API key (supports env variable)
model_name = "text-embedding-3-small"       # Name of the embedding model to use
batch_size = 32                             # Number of texts to embed in a single batch
timeout_secs = 30                           # Timeout for embedding requests

# -----------------------------------------------------------------------------
# Cortex Data Directory Configuration
# -----------------------------------------------------------------------------
[cortex]
data_dir = "./cortex-data"  # Directory for storing memory files and sessions
```
Usage
CLI (cortex-mem-cli)
The CLI provides a powerful interface for direct interaction with the memory system. All commands require a config.toml file, which can be specified with --config <path>. The --tenant flag allows multi-tenant isolation.
Add a Memory
Adds a new message to a session thread, automatically storing it in the memory system.
```bash
cortex-mem --config config.toml --tenant acme add --thread thread-123 --role user "The user is interested in Rust programming."
```
- `--thread <id>`: (Required) The thread/session ID.
- `--role <role>`: Message role (user/assistant/system). Default: `user`
- `content`: The text content of the message (positional argument).
Search for Memories
Performs a semantic vector search across the memory store with weighted L0/L1/L2 scoring.
```bash
cortex-mem --config config.toml --tenant acme search "what are the user's hobbies?" --thread thread-123 --limit 10
```
- `query`: The natural language query for the search.
- `--thread <id>`: Filter memories by thread ID.
- `--limit <n>` / `-n`: Maximum number of results. Default: 10
- `--min-score <score>` / `-s`: Minimum relevance score (0.0-1.0). Default: 0.4
- `--scope <scope>`: Search scope: `session`, `user`, or `agent`. Default: `session`
List Memories
Retrieves a list of memories from a specific URI path.
```bash
cortex-mem --config config.toml --tenant acme list --uri "cortex://session" --include-abstracts
```
- `--uri <path>` / `-u`: URI path to list (e.g., `cortex://session` or `cortex://user/preferences`). Default: `cortex://session`
- `--include-abstracts`: Include L0 abstracts in results.
Get a Specific Memory
Retrieves a specific memory by its URI.
```bash
cortex-mem --config config.toml --tenant acme get "cortex://session/thread-123/memory-456.md"
```
- `uri`: The memory URI.
- `--abstract-only` / `-a`: Show the L0 abstract instead of full content.
- `--overview` / `-o`: Show the L1 overview instead of full content.
Delete a Memory
Removes a memory from the store by its URI.
```bash
cortex-mem --config config.toml --tenant acme delete "cortex://session/thread-123/memory-456.md"
```
Session Management
Manage conversation sessions.
```bash
# List all sessions
cortex-mem --config config.toml --tenant acme session list

# Create a new session
cortex-mem --config config.toml --tenant acme session create thread-456 --title "My Session"

# Close a session (triggers extraction, layer generation, and vector indexing)
cortex-mem --config config.toml --tenant acme session close thread-456
```
Layers and Stats
Manage layer files and display system statistics.
```bash
# Display system statistics
cortex-mem --config config.toml --tenant acme stats

# List available tenants
cortex-mem --config config.toml tenant list

# Show L0/L1 layer file coverage status
cortex-mem --config config.toml --tenant acme layers status

# Generate missing L0/L1 layer files
cortex-mem --config config.toml --tenant acme layers ensure-all

# Regenerate oversized L0 abstract files (> 2K characters)
cortex-mem --config config.toml --tenant acme layers regenerate-oversized
```
REST API (cortex-mem-service)
The REST API allows you to integrate Cortex Memory into any application, regardless of the programming language. The service runs on port 8085 by default.
Starting the Service
```bash
# Start the API server with default settings (port 8085)
cortex-mem-service --config config.toml --host 127.0.0.1 --port 8085

# Enable verbose logging
cortex-mem-service --config config.toml -h 127.0.0.1 -p 8085 --verbose
```
API Endpoints
Health Check
- `GET /health`: Service liveness check
- `GET /health/ready`: Readiness check (Qdrant, LLM connectivity)
Filesystem Operations
- `GET /api/v2/filesystem/list?uri=<path>`: List directory contents.
- `GET /api/v2/filesystem/read/<path>`: Read file content.
- `POST /api/v2/filesystem/write`: Write content to a file.
- `GET /api/v2/filesystem/stats?uri=<path>`: Get directory statistics.
Session Management
- `GET /api/v2/sessions`: List all sessions.
- `POST /api/v2/sessions`: Create a new session.
- `POST /api/v2/sessions/:thread_id/messages`: Add a message to a session.
- `POST /api/v2/sessions/:thread_id/close`: Close a session and trigger memory extraction.
Semantic Search
- `POST /api/v2/search`: Perform semantic search across memories with weighted L0/L1/L2 scoring.
Automation
- `POST /api/v2/automation/extract/:thread_id`: Trigger memory extraction for a thread.
- `POST /api/v2/automation/index/:thread_id`: Trigger vector indexing for a thread.
- `POST /api/v2/automation/index-all`: Index all threads.
- `POST /api/v2/automation/sync`: Manually trigger synchronization between the filesystem and the vector store.
Tenant Management
- `GET /api/v2/tenants/tenants`: List all available tenants.
- `POST /api/v2/tenants/tenants/switch`: Switch the active tenant context.
- `GET /api/v2/tenants/{id}/stats`: Get per-tenant storage metrics.
Example: Create a Session and Add Message
```bash
# Create a new session
curl -X POST http://localhost:8085/api/v2/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "thread_id": "thread-123",
    "title": "Support Conversation"
  }'

# Add a message to the session
curl -X POST http://localhost:8085/api/v2/sessions/thread-123/messages \
  -H "Content-Type: application/json" \
  -d '{
    "role": "user",
    "content": "I just upgraded to the premium plan."
  }'
```
Example: Semantic Search
```bash
# Note the '\'' escape: it lets the apostrophe in "user's" survive
# inside the single-quoted shell string.
curl -X POST http://localhost:8085/api/v2/search \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: acme" \
  -d '{
    "query": "What is the user'\''s current subscription?",
    "thread": "thread-123",
    "scope": "session",
    "limit": 5,
    "min_score": 0.5
  }'
```
Example: Trigger Memory Extraction
```bash
# Extract memories from a session (typically called when the session is closed)
curl -X POST http://localhost:8085/api/v2/automation/extract/thread-123 \
  -H "Content-Type: application/json" \
  -d '{ "auto_save": true }'
```
Model Context Protocol (MCP) Server (cortex-mem-mcp)
Cortex Memory provides an MCP server for integration with AI assistants like Claude Desktop, Cursor, or GitHub Copilot. The MCP server exposes memory tools through the stdio transport.
```bash
# Run the MCP server with configuration
cortex-mem-mcp --config config.toml --tenant acme
```
The MCP server exposes the following tools:
- `store_memory`: Store new facts or conversation summaries
- `query_memory`: Search memory with natural language
- `list_memories`: Enumerate available memories by URI prefix
- `get_memory`: Retrieve a specific memory by URI
- `delete_memory`: Remove a memory by URI
Configure your AI assistant to use the MCP server by adding it to your assistant's configuration:
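For example, a Claude Desktop-style entry might look like the following (the server name and config path are illustrative; check your assistant's documentation for the exact config file location and schema):

```json
{
  "mcpServers": {
    "cortex-memory": {
      "command": "cortex-mem-mcp",
      "args": ["--config", "/path/to/config.toml", "--tenant", "acme"]
    }
  }
}
```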
Contribute
We welcome all forms of contributions! Report bugs or submit feature requests through GitHub Issues.
Development Process
- Fork this project
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Create a Pull Request
License
This project is licensed under the MIT License. See the LICENSE file for details.


