Abstract
Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to a persistent cognitive operating system that thinks with you, not just for you. Unlike traditional chatbots that forget context between sessions, Murmer maintains a living knowledge graph of your thoughts, decisions, projects, and insights—enabling AI that grows smarter about you over time.
This whitepaper presents the technical architecture, design philosophy, and implementation details of Murmer's core systems: the Context Engine for intelligent prompt construction, the Knowledge Graph for persistent memory, the Change Request system for trustworthy AI autonomy, and the Stream Event architecture for unified activity narrative.
1. Executive Summary
1.1 The Problem
Today's AI assistants suffer from a fundamental limitation: cognitive amnesia. Every conversation starts from zero. Users repeatedly re-explain context, re-establish preferences, and re-share relevant history. This creates friction that prevents AI from becoming a true cognitive partner.
The consequences are significant:
- Context collapse: Important decisions and reasoning get lost between sessions
- Repeated work: Users waste time providing the same background repeatedly
- Shallow assistance: AI cannot build on prior conversations to offer deeper insights
- Trust erosion: Without memory, AI cannot demonstrate learning or growth
1.2 Our Solution
Murmer Cognitive OS introduces a persistent cognitive layer that transforms AI interaction from stateless Q&A into continuous cognitive partnership. The system:
- Remembers everything through a structured knowledge graph of entities, relationships, and semantic memory
- Reasons with full context by intelligently selecting relevant information within token budget constraints
- Maintains trust through a Change Request system that gives users control over AI-driven state mutations
- Provides transparency via unified activity streams that show what the AI is doing and why
1.3 Key Innovations
- Context Engine: A sophisticated prompt construction system that ranks, selects, and assembles contextual information within strict token budgets while maintaining relevance and coherence.
- Knowledge Graph: A multi-layered memory architecture supporting Spaces (containers), Threads (conversations), Artifacts (structured outputs), Memory Nodes (semantic memory), Tasks, Notes, and Files.
- Change Request System: A human-in-the-loop approval workflow that enables AI autonomy while preserving user control through explicit approval, rejection, or modification of proposed changes.
- Agent Profiles & Modes: Behavioral templates that shape AI reasoning depth, tone, memory weighting, and tool usage—switchable in real-time based on user intent.
- Stream Event Architecture: A unified event model that captures all system activity as a coherent narrative, enabling replay, audit, and cross-entity activity awareness.
1.4 Results
- 75% reduction in context re-establishment time across sessions
- Sustained coherence across conversations spanning weeks and months
- User trust scores 46% higher than traditional chatbots, driven by CR transparency
- Mobile-first design enabling voice-first thought capture with sub-second latency
2. Motivation & Problem Definition
2.1 The Stateless AI Crisis
Modern AI assistants operate in a fundamentally broken paradigm: every interaction is isolated. Consider the typical user experience:
Day 1: "I'm planning a startup in the climate tech space. Here's my background, my goals, my constraints..."
Day 2: "Remember that startup idea? Oh wait, you don't. Let me re-explain everything..."
Day 30: "We discussed this three times already. Why don't you remember?"
This isn't a bug—it's the dominant architecture. ChatGPT, Claude, and other assistants maintain conversation history within a session but lose everything between sessions.
2.2 Why Memory Matters
Human cognition is fundamentally cumulative. We build mental models over time, refine them through experience, and apply accumulated wisdom to new situations. An AI that cannot do the same is permanently limited to shallow, context-poor responses.
Consider what true memory enables:
- Pattern recognition: "You tend to overthink technical decisions and under-invest in user research. This feels like the same pattern."
- Preference learning: "Based on your feedback on three prior documents, you prefer concise bullet points over narrative paragraphs."
- Contextual anchoring: "This connects to your insight from March about infrastructure costs. Want me to pull that in?"
- Relationship mapping: "Sarah mentioned this concern in your meeting notes last week. There may be organizational resistance here."
2.3 The Context Window Trap
Even with perfect memory, LLMs face a fundamental constraint: finite context windows. A 128K token window sounds large, but fills quickly:
- System prompt: 2-4K tokens
- Conversation history: 10-50K tokens
- Retrieved documents: 20-80K tokens
- User query: 500-2K tokens
Naive approaches that dump everything into context fail at scale. What's needed is intelligent selection—surfacing the right information at the right time within hard token constraints.
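A quick back-of-the-envelope check makes the point concrete. The sketch below is illustrative only: the component costs are the upper ends of the ranges listed above, and `remaining_budget` is a hypothetical helper, not a Murmer API.

```python
def remaining_budget(window: int, costs: dict[str, int]) -> int:
    """Tokens left for the response after fixed context costs."""
    return window - sum(costs.values())

# Upper end of each range quoted above: the window is already overdrawn
# before the model writes a single token of its reply.
worst_case = {
    "system_prompt": 4_000,
    "history": 50_000,
    "retrieved_docs": 80_000,
    "user_query": 2_000,
}
print(remaining_budget(128_000, worst_case))  # prints -8000
```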
2.4 The Trust Problem
As AI systems become more capable of taking action, trust becomes paramount. Users need:
- Transparency: What is the AI doing and why?
- Control: Can I approve, reject, or modify AI decisions?
- Auditability: What happened and who authorized it?
- Reversibility: Can I undo AI-driven changes?
2.5 Design Requirements
| Requirement | Description |
|---|---|
| Persistent Memory | Knowledge survives sessions indefinitely |
| Intelligent Context | Right information surfaced within token budgets |
| User Control | Humans approve meaningful AI actions |
| Transparency | All AI reasoning and actions visible |
| Multi-Modal Input | Voice, text, and file capture supported |
| Mobile-First | Works seamlessly on mobile devices |
| Offline-Capable | Core functionality without connectivity |
3. System Overview
3.1 High-Level Architecture
Murmer Cognitive OS is structured as a layered system with clear separation of concerns:
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Mobile │ │ Web │ │ Desktop │ │
│ │ (Expo RN) │ │ (React) │ │ (Electron) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ COGNITIVE LAYER │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Context │ │ Change │ │ Agent │ │
│ │ Engine │ │ Requests │ │ Profiles │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Knowledge │ │ Stream │ │ Tool │ │
│ │ Graph │ │ Events │ │ Registry │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ PostgreSQL │ │ Vector │ │ Blob │ │
│ │ (Entities) │ │ Store │ │ Storage │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
3.2 Core Entities
Murmer's knowledge model centers on seven entity types (Spaces, Threads, Artifacts, Tasks, Notes, Files, and Memory Nodes), organized hierarchically; Messages are the individual turns stored within a Thread:
Space (Container)
├── Thread (Conversation)
│ ├── Message (User/AI turn)
│ └── Artifact (Structured output)
├── Task (Actionable item)
├── Note (Freeform capture)
├── File (Attachment)
└── Memory Node (Semantic memory)
Spaces serve as top-level containers—similar to projects or workspaces—that group related Threads, Tasks, and Artifacts.
Threads are the primary interaction surface: ongoing conversations between user and AI, potentially spanning days or months.
Artifacts are structured outputs: documents, code snippets, diagrams, plans—anything the AI produces that has lasting value beyond the conversation.
Memory Nodes are the semantic layer: extracted insights, decisions, preferences, and facts that transcend any single Thread.
3.3 Design Principles
Seven core principles guide all architectural decisions:
- Trust Through Transparency: Every AI action is visible and explainable
- Control Through CRs: Users approve meaningful state changes
- Context as First-Class: Intelligent context selection is central, not peripheral
- Memory as Graph: Relationships between entities are as important as entities themselves
- Event-Driven Narrative: All activity forms a coherent, replayable story
- Mobile-First: Core workflows optimized for mobile voice capture
- Graceful Degradation: System functions offline with eventual sync
Figure: System Architecture Overview. High-level architecture showing the relationship between mobile client, backend services, and data stores.
Figure: Knowledge Graph Schema. Entity types and relationships in the personal knowledge graph.
4. Context Engine
The Context Engine is Murmer's cognitive core—responsible for constructing optimal prompts within token constraints. It answers the question: "What information does the AI need to give the best response right now?"
4.1 Budget Allocation
Context is allocated across five pools with configurable budgets:
| Pool | Default Budget | Description |
|---|---|---|
| System | 2,000 tokens | System prompt, profile, instructions |
| STM | 8,000 tokens | Recent conversation (Short-Term Memory) |
| LTM | 6,000 tokens | Memory nodes (Long-Term Memory) |
| Artifacts | 8,000 tokens | Referenced artifacts and documents |
| Reserve | 8,000 tokens | Response generation buffer |
Total default context: 32,000 tokens
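The pool table above can be expressed as a simple configuration. Pool names and defaults come directly from the table; the `total_budget` helper itself is illustrative, not a Murmer API.

```python
# Default per-pool token budgets, as in the table above.
DEFAULT_BUDGETS = {
    "system": 2_000,     # system prompt, profile, instructions
    "stm": 8_000,        # recent conversation (Short-Term Memory)
    "ltm": 6_000,        # memory nodes (Long-Term Memory)
    "artifacts": 8_000,  # referenced artifacts and documents
    "reserve": 8_000,    # response generation buffer
}

def total_budget(budgets: dict[str, int]) -> int:
    """Sum the per-pool budgets into a total context size."""
    return sum(budgets.values())

print(total_budget(DEFAULT_BUDGETS))  # prints 32000
```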
4.2 Relevance Scoring
Each candidate item receives a composite score:
relevance_score = (
    α × semantic_similarity +
    β × recency_weight +
    γ × explicit_reference_boost +
    δ × relationship_proximity +
    ε × usage_frequency
)
Where:
- Semantic similarity: Embedding cosine distance to current query
- Recency weight: Exponential decay based on age (half-life configurable)
- Explicit reference: Boost for items explicitly mentioned in conversation
- Relationship proximity: Graph distance from current Thread/Space
- Usage frequency: How often item has been referenced historically
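A minimal sketch of the composite score, assuming illustrative weight values and signal transforms; the real α through ε are profile-dependent, and the exact transforms are not specified in this document.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    similarity: float            # cosine similarity to the query, in [0, 1]
    age_days: float
    explicitly_referenced: bool  # mentioned by name in the conversation
    graph_distance: int          # hops from the current Thread/Space
    reference_count: int         # historical usage

# Illustrative weights standing in for alpha..epsilon.
W = dict(alpha=0.4, beta=0.2, gamma=0.2, delta=0.1, epsilon=0.1)

def relevance_score(c: Candidate, half_life_days: float = 90.0) -> float:
    recency = 0.5 ** (c.age_days / half_life_days)   # exponential decay
    reference = 1.0 if c.explicitly_referenced else 0.0
    proximity = 1.0 / (1 + c.graph_distance)         # closer in graph = higher
    frequency = math.log1p(c.reference_count) / 5.0  # damped usage signal
    return (W["alpha"] * c.similarity + W["beta"] * recency
            + W["gamma"] * reference + W["delta"] * proximity
            + W["epsilon"] * frequency)
```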
4.3 Selection Algorithm
def select_context(candidates, budget, profile):
    # apply_profile_weights, score, and estimate_tokens are engine
    # internals, elided here for brevity.

    # Apply profile-specific weight modifiers
    weights = apply_profile_weights(profile)

    # Score all candidates
    scored = [(c, score(c, weights)) for c in candidates]

    # Sort by score descending
    scored.sort(key=lambda pair: pair[1], reverse=True)

    # Greedy selection within budget
    selected = []
    used_tokens = 0
    for candidate, candidate_score in scored:
        tokens = estimate_tokens(candidate)
        if used_tokens + tokens <= budget:
            selected.append(candidate)
            used_tokens += tokens
    return selected
Figure: Context Engine Selection Flow. How the Context Engine selects relevant context for each interaction.
5. Knowledge Graph
The Knowledge Graph provides Murmer's persistent memory layer—structured storage of entities and their relationships.
5.1 Memory Node Structure
Memory Nodes are the semantic layer—extracted insights that transcend individual conversations:
interface MemoryNode {
  id: string;
  spaceId: string;
  content: string;              // The memory content
  summary: string;              // One-line summary
  category: MemoryCategory;     // decision | insight | preference | fact | rule
  importance: number;           // 0.0 - 1.0
  confidence: number;           // 0.0 - 1.0
  sourceThreadIds: string[];    // Threads that contributed
  embedding: number[];          // Vector for semantic search
  linkedEntities: EntityLink[];
  expiresAt?: DateTime;         // Optional TTL
}
5.2 Relationship Model
Entities connect through typed relationships:
type RelationType =
  | 'references'    // Explicit reference
  | 'derived_from'  // Created from source
  | 'contradicts'   // Conflicting information
  | 'supersedes'    // Newer version
  | 'related_to'    // Semantic relationship
  | 'depends_on'    // Dependency
  | 'child_of';     // Hierarchical
5.3 Semantic Search
Memory retrieval uses hybrid search combining:
- Vector similarity: Embedding-based semantic search
- Full-text search: Keyword matching with ranking
- Graph traversal: Following relationship edges
- Temporal filtering: Recency-based relevance
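A toy blend of two of these signals is sketched below. The weights, the saturating keyword transform, and the function names are illustrative, and graph traversal is omitted for brevity; the 90-day half-life matches the decay used in Section 8.2.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(vec_sim: float, keyword_hits: int, age_days: float,
                 w_vec: float = 0.6, w_kw: float = 0.25,
                 w_rec: float = 0.15) -> float:
    """Blend vector similarity, keyword overlap, and a recency prior."""
    kw = 1 - math.exp(-keyword_hits)  # saturating keyword signal
    rec = 0.5 ** (age_days / 90)      # 90-day half-life
    return w_vec * vec_sim + w_kw * kw + w_rec * rec
```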
6. Change Request System
The Change Request (CR) system enables AI autonomy while preserving user control. Every meaningful state mutation proposed by the AI flows through CR approval.
6.1 CR Lifecycle
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ PENDING │─────▶│APPROVED │─────▶│ APPLIED │─────▶│COMPLETED│
└─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │
│ ┌─────────┐ │
└──────────▶│REJECTED │ │
│ └─────────┘ │
│ │
│ ┌─────────┐ │
└──────────▶│MODIFIED │─────────────┘
└─────────┘
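The lifecycle above is a small state machine. The transition table below encodes exactly the edges in the diagram, with a guard for illegal moves; the helper itself is a sketch, not Murmer's implementation.

```python
# Legal transitions, mirroring the lifecycle diagram above.
TRANSITIONS = {
    "PENDING":   {"APPROVED", "REJECTED", "MODIFIED"},
    "APPROVED":  {"APPLIED"},
    "MODIFIED":  {"APPLIED"},
    "APPLIED":   {"COMPLETED"},
    "REJECTED":  set(),   # terminal
    "COMPLETED": set(),   # terminal
}

def transition(state: str, target: str) -> str:
    """Move a CR to `target`, rejecting edges the diagram does not allow."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```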
6.2 CR Policies
Agent Profiles can configure CR behavior:
| Policy | Description |
|---|---|
| `require_approval` | All CRs require explicit user approval |
| `auto_approve` | Low-risk CRs auto-approve after delay |
| `disabled` | CR system bypassed (dangerous) |
Auto-approve conditions (when enabled):
- Entity type is low-risk (notes, minor artifact updates)
- AI confidence > 0.9
- No conflicting recent user edits
- Within rate limits
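The gate can be sketched as a conjunction of the four conditions above; field and argument names here are hypothetical, and the finer-grained "minor artifact updates" check is elided.

```python
# Entity types treated as low-risk (the text also allows minor artifact
# updates; that finer-grained check is elided here).
LOW_RISK_TYPES = {"note"}

def can_auto_approve(cr: dict, recent_user_edit: bool,
                     within_rate_limit: bool) -> bool:
    """All four auto-approve conditions from the list above must hold."""
    return (cr["entity_type"] in LOW_RISK_TYPES
            and cr["confidence"] > 0.9
            and not recent_user_edit
            and within_rate_limit)
```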
6.3 Risk Stratification
function calculateRiskScore(cr: ChangeRequest): number {
  let score = 0;

  // Entity type risk
  const entityRisk: Record<string, number> = {
    'note': 0.1,
    'task': 0.2,
    'artifact': 0.4,
    'memory_node': 0.5,
    'thread': 0.6,
    'space': 0.8,
  };
  score += entityRisk[cr.entityType] ?? 0.5;

  // Operation risk
  if (cr.operation === 'delete') score += 0.3;
  if (cr.operation === 'update') score += 0.1;

  // Inverse confidence
  score += (1 - cr.confidence) * 0.3;

  return Math.min(score, 1.0);
}
Figure: Change Request Approval Workflow. Human-in-the-loop approval system for knowledge graph modifications.
7. Agent Profiles & Modes
Agent Profiles shape AI behavior—defining reasoning style, tone, and context strategy.
7.1 Default Profiles
Murmer ships with seven pre-configured profiles:
| Profile | Tone | Reasoning | Primary Use |
|---|---|---|---|
| Balanced Generalist | Neutral | Balanced | Default, general use |
| Build Strategist | Direct | Shallow | Implementation planning |
| Research Analyst | Formal | Deep | Research, analysis |
| Creative Explorer | Imaginative | Exploratory | Brainstorming, ideation |
| Critical Reviewer | Skeptical | Critical | Design review, validation |
| Socratic Facilitator | Questioning | Guided | Coaching, clarification |
| Agent Mode | Technical | Deterministic | Autonomous task execution |
7.2 Mode Application
Modes (applied profiles) can be set at:
- Thread level: Specific to a conversation
- Space level: Default for all Threads in Space
Inheritance: Thread Mode > Space Mode > System Default
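The inheritance chain reduces to a first-non-empty lookup; argument names and the fallback default ("Balanced Generalist", per the profile table in 7.1) are illustrative.

```python
from typing import Optional

def resolve_mode(thread_mode: Optional[str],
                 space_mode: Optional[str],
                 system_default: str = "Balanced Generalist") -> str:
    """Thread Mode wins over Space Mode, which wins over the system default."""
    return thread_mode or space_mode or system_default
```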
7.3 Intent Detection
Murmer monitors user behavior to suggest mode switches:
Signals analyzed:
- Message structure and keywords
- Artifact editing patterns
- Task creation frequency
- Question vs. statement ratio
Example suggestion:
"Your last 5 messages indicate you're shifting into planning. Switch to Build Strategist Mode?"
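A toy version of this signal analysis, assuming an invented keyword list and thresholds; real intent detection would combine all four signals listed above, not just two.

```python
from typing import Optional

# Invented planning vocabulary for illustration.
PLANNING_TERMS = {"plan", "milestone", "roadmap", "sprint", "deadline"}

def suggest_mode(messages: list[str]) -> Optional[str]:
    """Suggest Build Strategist when planning keywords dominate questions."""
    text = " ".join(messages).lower()
    hits = sum(term in text for term in PLANNING_TERMS)
    questions = sum(m.strip().endswith("?") for m in messages)
    if hits >= 2 and questions / max(len(messages), 1) < 0.5:
        return "Build Strategist"
    return None
```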
8. Key Technical Challenges & Solutions
8.1 Token Budget Optimization
Problem: With 32K-128K token context windows, how do we select the most relevant information without exceeding limits or missing critical context?
Solution: Multi-signal relevance scoring with dynamic budget allocation.
Metrics:
- 98% of queries fit within budget on first pass
- Average context utilization: 78% of available tokens
- Relevance score correlation with user satisfaction: 0.82
8.2 Memory Staleness
Problem: How do we prevent outdated memories from polluting context or contradicting current understanding?
Solution: Multi-factor decay with explicit contradiction detection.
function calculateMemoryRelevance(memory, currentTime) {
  const age = daysBetween(memory.createdAt, currentTime);

  // Base decay: half-life of 90 days
  const decayFactor = Math.pow(0.5, age / 90);

  // Boost for recent access
  const accessBoost = memory.lastAccessedAt
    ? Math.pow(0.5, daysBetween(memory.lastAccessedAt, currentTime) / 30)
    : 0;

  // Confidence-weighted importance
  const baseScore = memory.importance * memory.confidence;

  return baseScore * decayFactor * (1 + accessBoost * 0.3);
}
8.3 CR Approval Friction
Problem: Requiring approval for every change creates friction. But auto-approving everything eliminates user control.
Solution: Risk-stratified approval with intelligent batching. Low-risk, high-confidence changes auto-approve with a 5-second cancellation window. Medium-risk changes batch together. High-risk changes require explicit approval.
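The three-way split can be routed on the risk score from Section 6.3. The 5-second cancellation window comes from the text; the 0.3 and 0.6 thresholds are illustrative.

```python
def route_change(risk: float) -> str:
    """Map a risk score in [0, 1] to an approval path (thresholds invented)."""
    if risk < 0.3:
        return "auto_approve_with_5s_cancel"
    if risk < 0.6:
        return "batch_for_review"
    return "require_explicit_approval"
```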
8.4 Mobile Performance
Problem: Mobile devices have limited CPU, memory, and battery.
Solution: Aggressive caching, background processing, and optimistic updates. Messages appear instantly while syncing in background.
8.5 Offline Resilience
Problem: Users expect to capture thoughts even without connectivity.
Solution: Local-first architecture with eventual sync. Offline queue persists messages and syncs when connectivity returns.
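A minimal sketch of the offline queue, assuming an in-memory deque where a real client would use durable on-device storage; `send` stands for any transport callable that raises `ConnectionError` while offline.

```python
from collections import deque

class OfflineQueue:
    """Capture locally first; drain to the server when connectivity returns."""

    def __init__(self, send):
        self._send = send        # transport; raises ConnectionError offline
        self._pending = deque()

    def capture(self, message: str) -> None:
        self._pending.append(message)  # local append always succeeds
        self.flush()

    def flush(self) -> None:
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return                 # stay queued; retry on next flush
            self._pending.popleft()    # remove only after a successful send
```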
9. Evaluation & Results
Context Quality Metrics
We evaluated the Context Engine against baselines using human relevance judgments on 500 sample queries.
| Method | Precision@10 | Recall@10 | User Satisfaction |
|---|---|---|---|
| Naive (recent only) | 0.42 | 0.31 | 3.2/5 |
| Vector search only | 0.61 | 0.58 | 3.8/5 |
| Murmer Context Engine | 0.78 | 0.71 | 4.4/5 |
Memory Effectiveness
- Memory utilization rate: 67% of retrieved memories referenced in responses
- False positive rate: 8% of memories retrieved but irrelevant
- User-confirmed accuracy: 91% of surfaced memories were factually correct
CR System Metrics
| Metric | Value |
|---|---|
| CRs created per conversation | 2.3 avg |
| Approval rate | 89% |
| Time to approval | 4.2 seconds median |
| Rollback rate | 2.1% |
| Auto-approved (low-risk) | 34% |
User Study Results
We conducted a 4-week study with 50 participants comparing Murmer against traditional chatbots:
| Metric | Traditional | Murmer | Improvement |
|---|---|---|---|
| Context re-establishment time | 45s avg | 11s avg | 75% reduction |
| Cross-session coherence rating | 2.1/5 | 4.3/5 | 105% improvement |
| Trust in AI decisions | 2.8/5 | 4.1/5 | 46% improvement |
| Overall satisfaction | 3.4/5 | 4.4/5 | 29% improvement |
Qualitative feedback highlighted:
- "It actually remembers what we discussed last week"
- "The approval system makes me trust it more"
- "I can finally think out loud without re-explaining everything"
10. Future Work
Multi-Agent Collaboration
Current work extends Murmer to support multiple specialized agents within a single workspace. Each agent maintains its own profile while sharing the knowledge graph, enabling sophisticated multi-perspective analysis.
Proactive Memory
Moving beyond reactive retrieval to proactive memory surfacing:
- Scheduled synthesis: Daily/weekly reviews that identify patterns across conversations
- Contradiction alerts: Notify users when new information conflicts with stored memory
- Insight generation: Autonomous identification of non-obvious connections
Collaborative Workspaces
Extending the single-user model to team scenarios:
- Shared Spaces: Multiple users contributing to the same knowledge graph
- Permission models: Fine-grained control over who can read/write entities
- Merge conflict resolution: Handling simultaneous edits to shared artifacts
Advanced Reasoning
Exploring enhanced reasoning capabilities:
- Multi-step planning: Explicit plan generation with checkpoints
- Reflection loops: AI self-evaluation and correction
- External tool integration: Web search, code execution, API calls
Enterprise Features
Production-ready enterprise capabilities:
- SSO integration (SAML/OIDC)
- Audit compliance for regulatory requirements
- Regional data storage options
- Centralized profile and policy management
11. Conclusion
Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to persistent cognitive partnership. By combining intelligent context selection, structured memory, and human-in-the-loop approval workflows, it creates AI that truly grows with its users.
The Context Engine ensures relevant information surfaces within token constraints. The Knowledge Graph provides semantic memory that transcends individual conversations. The Change Request system maintains trust through transparency and control. And the mobile-first architecture enables thought capture anywhere.
The evaluation results validate this approach: 75% reduction in context re-establishment, sustained coherence across weeks, and significantly higher user trust scores. Users report that Murmer feels less like a tool and more like a cognitive partner that remembers, learns, and adapts.
As AI capabilities continue to advance, the systems that bridge the gap between stateless assistance and true cognitive partnership will define the next generation of human-AI collaboration. Murmer demonstrates that this future is achievable today.