
Murmer Cognitive OS: A Personal AI Operating System with Persistent Memory and Contextual Reasoning

Technical architecture of a cognitive OS with persistent memory, intelligent context selection, and human-in-the-loop approval workflows for trustworthy AI autonomy.

Jordan Allen
December 2024
Version 1.0
Persistent Memory · Knowledge Graph · Context Engine · Change Requests · Mobile-First AI

Abstract

Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to a persistent cognitive operating system that thinks with you, not just for you. Unlike traditional chatbots that forget context between sessions, Murmer maintains a living knowledge graph of your thoughts, decisions, projects, and insights—enabling AI that grows smarter about you over time.

This whitepaper presents the technical architecture, design philosophy, and implementation details of Murmer's core systems: the Context Engine for intelligent prompt construction, the Knowledge Graph for persistent memory, the Change Request system for trustworthy AI autonomy, and the Stream Event architecture for unified activity narrative.

1. Executive Summary

1.1 The Problem

Today's AI assistants suffer from a fundamental limitation: cognitive amnesia. Every conversation starts from zero. Users repeatedly re-explain context, re-establish preferences, and re-share relevant history. This creates friction that prevents AI from becoming a true cognitive partner.

The consequences are significant:

  • Context collapse: Important decisions and reasoning get lost between sessions
  • Repeated work: Users waste time providing the same background repeatedly
  • Shallow assistance: AI cannot build on prior conversations to offer deeper insights
  • Trust erosion: Without memory, AI cannot demonstrate learning or growth

1.2 Our Solution

Murmer Cognitive OS introduces a persistent cognitive layer that transforms AI interaction from stateless Q&A into continuous cognitive partnership. The system:

  • Remembers everything through a structured knowledge graph of entities, relationships, and semantic memory
  • Reasons with full context by intelligently selecting relevant information within token budget constraints
  • Maintains trust through a Change Request system that gives users control over AI-driven state mutations
  • Provides transparency via unified activity streams that show what the AI is doing and why

1.3 Key Innovations

  1. Context Engine: A sophisticated prompt construction system that ranks, selects, and assembles contextual information within strict token budgets while maintaining relevance and coherence.
  2. Knowledge Graph: A multi-layered memory architecture supporting Spaces (containers), Threads (conversations), Artifacts (structured outputs), Memory Nodes (semantic memory), Tasks, Notes, and Files.
  3. Change Request System: A human-in-the-loop approval workflow that enables AI autonomy while preserving user control through explicit approval, rejection, or modification of proposed changes.
  4. Agent Profiles & Modes: Behavioral templates that shape AI reasoning depth, tone, memory weighting, and tool usage—switchable in real-time based on user intent.
  5. Stream Event Architecture: A unified event model that captures all system activity as a coherent narrative, enabling replay, audit, and cross-entity activity awareness.

1.4 Results

  • 75% reduction in context re-establishment time across sessions
  • Sustained coherence across conversations spanning weeks and months
  • User trust scores significantly higher than with traditional chatbots, driven by Change Request (CR) transparency
  • Mobile-first design enabling voice-first thought capture with sub-second latency

2. Motivation & Problem Definition

2.1 The Stateless AI Crisis

Modern AI assistants operate in a fundamentally broken paradigm: every interaction is isolated. Consider the typical user experience:

Day 1: "I'm planning a startup in the climate tech space. Here's my background, my goals, my constraints..."

Day 2: "Remember that startup idea? Oh wait, you don't. Let me re-explain everything..."

Day 30: "We discussed this three times already. Why don't you remember?"

This isn't a bug—it's the dominant architecture. ChatGPT, Claude, and other assistants maintain conversation history within a session but lose everything between sessions.

2.2 Why Memory Matters

Human cognition is fundamentally cumulative. We build mental models over time, refine them through experience, and apply accumulated wisdom to new situations. An AI that cannot do the same is permanently limited to shallow, context-poor responses.

Consider what true memory enables:

  • Pattern recognition: "You tend to overthink technical decisions and under-invest in user research. This feels like the same pattern."
  • Preference learning: "Based on your feedback on three prior documents, you prefer concise bullet points over narrative paragraphs."
  • Contextual anchoring: "This connects to your insight from March about infrastructure costs. Want me to pull that in?"
  • Relationship mapping: "Sarah mentioned this concern in your meeting notes last week. There may be organizational resistance here."

2.3 The Context Window Trap

Even with perfect memory, LLMs face a fundamental constraint: finite context windows. A 128K token window sounds large, but fills quickly:

  • System prompt: 2-4K tokens
  • Conversation history: 10-50K tokens
  • Retrieved documents: 20-80K tokens
  • User query: 500-2K tokens

Naive approaches that dump everything into context fail at scale. What's needed is intelligent selection—surfacing the right information at the right time within hard token constraints.

2.4 The Trust Problem

As AI systems become more capable of taking action, trust becomes paramount. Users need:

  • Transparency: What is the AI doing and why?
  • Control: Can I approve, reject, or modify AI decisions?
  • Auditability: What happened and who authorized it?
  • Reversibility: Can I undo AI-driven changes?

2.5 Design Requirements

| Requirement | Description |
|---|---|
| Persistent Memory | Knowledge survives sessions indefinitely |
| Intelligent Context | Right information surfaced within token budgets |
| User Control | Humans approve meaningful AI actions |
| Transparency | All AI reasoning and actions visible |
| Multi-Modal Input | Voice, text, and file capture supported |
| Mobile-First | Works seamlessly on mobile devices |
| Offline-Capable | Core functionality without connectivity |

3. System Overview

3.1 High-Level Architecture

Murmer Cognitive OS is structured as a layered system with clear separation of concerns:

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Mobile    │  │     Web     │  │   Desktop   │              │
│  │  (Expo RN)  │  │   (React)   │  │  (Electron) │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     COGNITIVE LAYER                              │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │    Context    │  │    Change     │  │    Agent      │       │
│  │    Engine     │  │   Requests    │  │   Profiles    │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │   Knowledge   │  │    Stream     │  │     Tool      │       │
│  │     Graph     │  │    Events     │  │   Registry    │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DATA LAYER                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  PostgreSQL │  │   Vector    │  │    Blob     │              │
│  │  (Entities) │  │   Store     │  │   Storage   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘

3.2 Core Entities

Murmer's knowledge model centers on seven entity types organized hierarchically:

Space (Container)
├── Thread (Conversation)
│   ├── Message (User/AI turn)
│   └── Artifact (Structured output)
├── Task (Actionable item)
├── Note (Freeform capture)
├── File (Attachment)
└── Memory Node (Semantic memory)

Spaces serve as top-level containers—similar to projects or workspaces—that group related Threads, Tasks, and Artifacts.

Threads are the primary interaction surface: ongoing conversations between user and AI, potentially spanning days or months.

Artifacts are structured outputs: documents, code snippets, diagrams, plans—anything the AI produces that has lasting value beyond the conversation.

Memory Nodes are the semantic layer: extracted insights, decisions, preferences, and facts that transcend any single Thread.

3.3 Design Principles

Seven core principles guide all architectural decisions:

  1. Trust Through Transparency: Every AI action is visible and explainable
  2. Control Through CRs: Users approve meaningful state changes
  3. Context as First-Class: Intelligent context selection is central, not peripheral
  4. Memory as Graph: Relationships between entities are as important as entities themselves
  5. Event-Driven Narrative: All activity forms a coherent, replayable story
  6. Mobile-First: Core workflows optimized for mobile voice capture
  7. Graceful Degradation: System functions offline with eventual sync

System Architecture Overview

High-level architecture showing the relationship between mobile client, backend services, and data stores

Knowledge Graph Schema

Entity types and relationships in the personal knowledge graph

4. Context Engine

The Context Engine is Murmer's cognitive core—responsible for constructing optimal prompts within token constraints. It answers the question: "What information does the AI need to give the best response right now?"

4.1 Budget Allocation

Context is allocated across five pools with configurable budgets:

| Pool | Default Budget | Description |
|---|---|---|
| System | 2,000 tokens | System prompt, profile, instructions |
| STM | 8,000 tokens | Recent conversation (Short-Term Memory) |
| LTM | 6,000 tokens | Memory nodes (Long-Term Memory) |
| Artifacts | 8,000 tokens | Referenced artifacts and documents |
| Reserve | 8,000 tokens | Response generation buffer |

Total default context: 32,000 tokens
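The pool allocation above can be expressed as a simple configuration with a sanity check against the model's window. This is an illustrative sketch; the names and structure are assumptions, not Murmer's actual configuration schema:

```python
# Default context budgets per pool, in tokens (from the table above).
DEFAULT_BUDGETS = {
    "system": 2_000,     # system prompt, profile, instructions
    "stm": 8_000,        # recent conversation (short-term memory)
    "ltm": 6_000,        # memory nodes (long-term memory)
    "artifacts": 8_000,  # referenced artifacts and documents
    "reserve": 8_000,    # response generation buffer
}

def total_budget(budgets: dict) -> int:
    """Sum all pool budgets; callers verify this fits the model window."""
    return sum(budgets.values())
```

With the defaults above, `total_budget(DEFAULT_BUDGETS)` yields 32,000 tokens, matching the stated total.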

4.2 Relevance Scoring

Each candidate item receives a composite score:

relevance_score = (
    α × semantic_similarity +
    β × recency_weight +
    γ × explicit_reference_boost +
    δ × relationship_proximity +
    ε × usage_frequency
)

Where:

  • Semantic similarity: Embedding cosine distance to current query
  • Recency weight: Exponential decay based on age (half-life configurable)
  • Explicit reference: Boost for items explicitly mentioned in conversation
  • Relationship proximity: Graph distance from current Thread/Space
  • Usage frequency: How often item has been referenced historically
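The five signals above combine into a single weighted score. The sketch below is one plausible realization; the field names, the inverse-distance mapping for graph proximity, and the log-damped usage count are illustrative assumptions, not Murmer's exact formulas:

```python
import math

def relevance_score(item: dict, w: dict, half_life_days: float = 90.0) -> float:
    """Weighted sum of the five relevance signals (hypothetical schema)."""
    # Exponential decay by age, with a configurable half-life
    recency = 0.5 ** (item["age_days"] / half_life_days)
    return (
        w["alpha"] * item["semantic_similarity"]            # embedding similarity
        + w["beta"] * recency                               # recency weight
        + w["gamma"] * (1.0 if item["explicitly_referenced"] else 0.0)
        + w["delta"] * 1.0 / (1.0 + item["graph_distance"]) # closer = higher
        + w["epsilon"] * math.log1p(item["usage_count"])    # diminishing returns
    )
```

Note that graph distance is inverted (a hop count of 0 yields the full δ weight) and usage frequency is log-damped so heavily referenced items do not dominate.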

4.3 Selection Algorithm

def select_context(candidates, budget, profile):
    # Apply profile-specific weight modifiers
    weights = apply_profile_weights(profile)

    # Score all candidates
    scored = [(c, score(c, weights)) for c in candidates]

    # Sort by score descending
    scored.sort(key=lambda x: x[1], reverse=True)

    # Greedy selection within budget
    # (loop variable renamed so it does not shadow the score() function)
    selected = []
    used_tokens = 0

    for candidate, candidate_score in scored:
        tokens = estimate_tokens(candidate)
        if used_tokens + tokens <= budget:
            selected.append(candidate)
            used_tokens += tokens

    return selected

Context Engine Selection Flow

How the Context Engine selects relevant context for each interaction

5. Knowledge Graph

The Knowledge Graph provides Murmer's persistent memory layer—structured storage of entities and their relationships.

5.1 Memory Node Structure

Memory Nodes are the semantic layer—extracted insights that transcend individual conversations:

interface MemoryNode {
  id: string;
  spaceId: string;
  content: string;           // The memory content
  summary: string;           // One-line summary
  category: MemoryCategory;  // decision | insight | preference | fact | rule
  importance: number;        // 0.0 - 1.0
  confidence: number;        // 0.0 - 1.0
  sourceThreadIds: string[]; // Threads that contributed
  embedding: number[];       // Vector for semantic search
  linkedEntities: EntityLink[];
  expiresAt?: DateTime;      // Optional TTL
}

5.2 Relationship Model

Entities connect through typed relationships:

type RelationType =
  | 'references'      // Explicit reference
  | 'derived_from'    // Created from source
  | 'contradicts'     // Conflicting information
  | 'supersedes'      // Newer version
  | 'related_to'      // Semantic relationship
  | 'depends_on'      // Dependency
  | 'child_of';       // Hierarchical
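The relationship_proximity signal used by the Context Engine depends on computing distance over these typed edges. A minimal sketch of bounded breadth-first traversal, assuming a hypothetical adjacency-map representation (entity id to a list of `(relation_type, neighbor_id)` pairs):

```python
from collections import deque

def graph_distance(edges: dict, start: str, target: str, max_hops: int = 3):
    """Hop count between two entities over typed relationship edges.
    Returns None if target is unreachable within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```

Bounding traversal at a few hops keeps proximity scoring cheap even as the graph grows; entities beyond the bound simply receive no proximity boost.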

5.3 Semantic Search

Memory retrieval uses hybrid search combining:

  1. Vector similarity: Embedding-based semantic search
  2. Full-text search: Keyword matching with ranking
  3. Graph traversal: Following relationship edges
  4. Temporal filtering: Recency-based relevance
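One common way to merge results from heterogeneous retrievers like these is reciprocal-rank fusion; the whitepaper does not specify Murmer's exact fusion method, so the sketch below is an assumption about how the ranked lists could be combined:

```python
def fuse_rankings(rankings: list, k: int = 60) -> list:
    """Reciprocal-rank fusion over multiple ranked result lists
    (e.g. vector search, full-text search, graph traversal)."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1/(k + rank); k dampens top-rank dominance
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Items that appear in several retrievers' lists accumulate score from each, so agreement across search modes is rewarded without needing to calibrate their raw scores against one another.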

6. Change Request System

The Change Request (CR) system enables AI autonomy while preserving user control. Every meaningful state mutation proposed by the AI flows through CR approval.

6.1 CR Lifecycle

┌─────────┐      ┌─────────┐      ┌─────────┐      ┌─────────┐
│ PENDING │─────▶│APPROVED │─────▶│ APPLIED │─────▶│COMPLETED│
└─────────┘      └─────────┘      └─────────┘      └─────────┘
     │                                   │
     │           ┌─────────┐             │
     └──────────▶│REJECTED │             │
     │           └─────────┘             │
     │                                   │
     │           ┌─────────┐             │
     └──────────▶│MODIFIED │─────────────┘
                 └─────────┘

6.2 CR Policies

Agent Profiles can configure CR behavior:

| Policy | Description |
|---|---|
| require_approval | All CRs require explicit user approval |
| auto_approve | Low-risk CRs auto-approve after delay |
| disabled | CR system bypassed (dangerous) |

Auto-approve conditions (when enabled):

  • Entity type is low-risk (notes, minor artifact updates)
  • AI confidence > 0.9
  • No conflicting recent user edits
  • Within rate limits
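The four conditions above form a conjunctive gate. A minimal sketch, where the field names and the low-risk entity set are illustrative assumptions (the "minor artifact updates" case would need additional checks not shown here):

```python
LOW_RISK_TYPES = {"note"}  # illustrative; real policy covers more cases

def can_auto_approve(cr: dict, recent_user_edit: bool,
                     within_rate_limit: bool) -> bool:
    """Auto-approve only when all four conditions hold."""
    return (
        cr["entity_type"] in LOW_RISK_TYPES
        and cr["confidence"] > 0.9
        and not recent_user_edit      # avoid clobbering fresh user work
        and within_rate_limit
    )
```

Any single failing condition routes the change back to explicit approval, which keeps the auto-approve path conservative by construction.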

6.3 Risk Stratification

function calculateRiskScore(cr: ChangeRequest): number {
  let score = 0;

  // Entity type risk
  const entityRisk = {
    'note': 0.1,
    'task': 0.2,
    'artifact': 0.4,
    'memory_node': 0.5,
    'thread': 0.6,
    'space': 0.8,
  };
  score += entityRisk[cr.entityType] ?? 0.5;

  // Operation risk
  if (cr.operation === 'delete') score += 0.3;
  if (cr.operation === 'update') score += 0.1;

  // Inverse confidence
  score += (1 - cr.confidence) * 0.3;

  return Math.min(score, 1.0);
}

Change Request Approval Workflow

Human-in-the-loop approval system for knowledge graph modifications

7. Agent Profiles & Modes

Agent Profiles shape AI behavior—defining reasoning style, tone, and context strategy.

7.1 Default Profiles

Murmer ships with seven pre-configured profiles:

| Profile | Tone | Reasoning | Primary Use |
|---|---|---|---|
| Balanced Generalist | Neutral | Balanced | Default, general use |
| Build Strategist | Direct | Shallow | Implementation planning |
| Research Analyst | Formal | Deep | Research, analysis |
| Creative Explorer | Imaginative | Exploratory | Brainstorming, ideation |
| Critical Reviewer | Skeptical | Critical | Design review, validation |
| Socratic Facilitator | Questioning | Guided | Coaching, clarification |
| Agent Mode | Technical | Deterministic | Autonomous task execution |

7.2 Mode Application

Modes (applied profiles) can be set at:

  • Thread level: Specific to a conversation
  • Space level: Default for all Threads in Space

Inheritance: Thread Mode > Space Mode > System Default
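The inheritance chain reduces to a simple fallback resolution. A sketch, assuming the Balanced Generalist profile is the system default (consistent with the table in 7.1):

```python
def resolve_mode(thread_mode, space_mode,
                 system_default="Balanced Generalist"):
    """Thread Mode > Space Mode > System Default."""
    return thread_mode or space_mode or system_default
```

So a Thread with no explicit Mode inherits its Space's default, and a Space with no default falls back to the system-wide profile.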

7.3 Intent Detection

Murmer monitors user behavior to suggest mode switches:

Signals analyzed:

  • Message structure and keywords
  • Artifact editing patterns
  • Task creation frequency
  • Question vs. statement ratio

Example suggestion:

"Your last 5 messages indicate you're shifting into planning. Switch to Build Strategist Mode?"

8. Key Technical Challenges & Solutions

8.1 Token Budget Optimization

Problem: With 32K-128K token context windows, how do we select the most relevant information without exceeding limits or missing critical context?

Solution: Multi-signal relevance scoring with dynamic budget allocation.

Metrics:

  • 98% of queries fit within budget on first pass
  • Average context utilization: 78% of available tokens
  • Relevance score correlation with user satisfaction: 0.82

8.2 Memory Staleness

Problem: How do we prevent outdated memories from polluting context or contradicting current understanding?

Solution: Multi-factor decay with explicit contradiction detection.

function calculateMemoryRelevance(memory, currentTime) {
  const age = daysBetween(memory.createdAt, currentTime);

  // Base decay: half-life of 90 days
  const decayFactor = Math.pow(0.5, age / 90);

  // Boost for recent access
  const accessBoost = memory.lastAccessedAt
    ? Math.pow(0.5, daysBetween(memory.lastAccessedAt, currentTime) / 30)
    : 0;

  // Confidence-weighted importance
  const baseScore = memory.importance * memory.confidence;

  return baseScore * decayFactor * (1 + accessBoost * 0.3);
}

8.3 CR Approval Friction

Problem: Requiring approval for every change creates friction. But auto-approving everything eliminates user control.

Solution: Risk-stratified approval with intelligent batching. Low-risk, high-confidence changes auto-approve with a 5-second cancellation window. Medium-risk changes batch together. High-risk changes require explicit approval.
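Routing by risk tier can be sketched as a threshold function over the `calculateRiskScore` output from Section 6.3. The 0.3/0.7 cut-offs here are illustrative assumptions; the paper does not state the exact thresholds:

```python
def route_change_request(risk_score: float) -> str:
    """Map a 0.0-1.0 risk score to an approval path (thresholds assumed)."""
    if risk_score < 0.3:
        return "auto_approve_with_cancel_window"  # 5-second cancellation window
    if risk_score < 0.7:
        return "batch_for_review"                 # grouped with similar CRs
    return "explicit_approval"                    # requires a direct user decision
```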

8.4 Mobile Performance

Problem: Mobile devices have limited CPU, memory, and battery.

Solution: Aggressive caching, background processing, and optimistic updates. Messages appear instantly while syncing in background.

8.5 Offline Resilience

Problem: Users expect to capture thoughts even without connectivity.

Solution: Local-first architecture with eventual sync. Offline queue persists messages and syncs when connectivity returns.

9. Evaluation & Results

Context Quality Metrics

We evaluated the Context Engine against baselines using human relevance judgments on 500 sample queries.

| Method | Precision@10 | Recall@10 | User Satisfaction |
|---|---|---|---|
| Naive (recent only) | 0.42 | 0.31 | 3.2/5 |
| Vector search only | 0.61 | 0.58 | 3.8/5 |
| Murmer Context Engine | 0.78 | 0.71 | 4.4/5 |

Memory Effectiveness

  • Memory utilization rate: 67% of retrieved memories referenced in responses
  • False positive rate: 8% of memories retrieved but irrelevant
  • User-confirmed accuracy: 91% of surfaced memories were factually correct

CR System Metrics

| Metric | Value |
|---|---|
| CRs created per conversation | 2.3 avg |
| Approval rate | 89% |
| Time to approval | 4.2 seconds median |
| Rollback rate | 2.1% |
| Auto-approved (low-risk) | 34% |

User Study Results

We conducted a 4-week study with 50 participants comparing Murmer against traditional chatbots:

| Metric | Traditional | Murmer | Improvement |
|---|---|---|---|
| Context re-establishment time | 45s avg | 11s avg | 75% reduction |
| Cross-session coherence rating | 2.1/5 | 4.3/5 | 105% improvement |
| Trust in AI decisions | 2.8/5 | 4.1/5 | 46% improvement |
| Overall satisfaction | 3.4/5 | 4.4/5 | 29% improvement |

Qualitative feedback highlighted:

  • "It actually remembers what we discussed last week"
  • "The approval system makes me trust it more"
  • "I can finally think out loud without re-explaining everything"

10. Future Work

Multi-Agent Collaboration

Current work extends Murmer to support multiple specialized agents within a single workspace. Each agent maintains its own profile while sharing the knowledge graph, enabling sophisticated multi-perspective analysis.

Proactive Memory

Moving beyond reactive retrieval to proactive memory surfacing:

  • Scheduled synthesis: Daily/weekly reviews that identify patterns across conversations
  • Contradiction alerts: Notify users when new information conflicts with stored memory
  • Insight generation: Autonomous identification of non-obvious connections

Collaborative Workspaces

Extending the single-user model to team scenarios:

  • Shared Spaces: Multiple users contributing to the same knowledge graph
  • Permission models: Fine-grained control over who can read/write entities
  • Merge conflict resolution: Handling simultaneous edits to shared artifacts

Advanced Reasoning

Exploring enhanced reasoning capabilities:

  • Multi-step planning: Explicit plan generation with checkpoints
  • Reflection loops: AI self-evaluation and correction
  • External tool integration: Web search, code execution, API calls

Enterprise Features

Production-ready enterprise capabilities:

  • SSO integration (SAML/OIDC)
  • Audit compliance for regulatory requirements
  • Regional data storage options
  • Centralized profile and policy management

11. Conclusion

Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to persistent cognitive partnership. By combining intelligent context selection, structured memory, and human-in-the-loop approval workflows, it creates AI that truly grows with its users.

The Context Engine ensures relevant information surfaces within token constraints. The Knowledge Graph provides semantic memory that transcends individual conversations. The Change Request system maintains trust through transparency and control. And the mobile-first architecture enables thought capture anywhere.

The evaluation results validate this approach: 75% reduction in context re-establishment, sustained coherence across weeks, and significantly higher user trust scores. Users report that Murmer feels less like a tool and more like a cognitive partner that remembers, learns, and adapts.

As AI capabilities continue to advance, the systems that bridge the gap between stateless assistance and true cognitive partnership will define the next generation of human-AI collaboration. Murmer demonstrates that this future is achievable today.

This document was authored by Jordan Allen. It represents original technical work on the Murmer system.