
Murmer Cognitive OS: A Personal AI Operating System with Persistent Memory and Contextual Reasoning

Technical architecture of a cognitive OS with persistent memory, intelligent context selection, and human-in-the-loop approval workflows for trustworthy AI autonomy.

Jordan Allen
December 2024
Version 1.0
Persistent Memory · Knowledge Graph · Context Engine · Change Requests · Mobile-First AI

Abstract

Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to a persistent cognitive operating system that thinks with you, not just for you. Unlike traditional chatbots that forget context between sessions, Murmer maintains a living knowledge graph of your thoughts, decisions, projects, and insights—enabling AI that grows smarter about you over time.

This whitepaper presents the technical architecture, design philosophy, and implementation details of Murmer's core systems: the Context Engine for intelligent prompt construction, the Knowledge Graph for persistent memory, the Change Request system for trustworthy AI autonomy, and the Stream Event architecture for unified activity narrative.

1. Executive Summary

1.1 The Problem

Today's AI assistants suffer from a fundamental limitation: cognitive amnesia. Every conversation starts from zero. Users repeatedly re-explain context, re-establish preferences, and re-share relevant history. This creates friction that prevents AI from becoming a true cognitive partner.

The consequences are significant:

  • Context collapse: Important decisions and reasoning get lost between sessions
  • Repeated work: Users waste time providing the same background repeatedly
  • Shallow assistance: AI cannot build on prior conversations to offer deeper insights
  • Trust erosion: Without memory, AI cannot demonstrate learning or growth

1.2 Our Solution

Murmer Cognitive OS introduces a persistent cognitive layer that transforms AI interaction from stateless Q&A into continuous cognitive partnership. The system:

  • Remembers everything through a structured knowledge graph of entities, relationships, and semantic memory
  • Reasons with full context by intelligently selecting relevant information within token budget constraints
  • Maintains trust through a Change Request system that gives users control over AI-driven state mutations
  • Provides transparency via unified activity streams that show what the AI is doing and why

1.3 Key Innovations

  1. Context Engine: A sophisticated prompt construction system that ranks, selects, and assembles contextual information within strict token budgets while maintaining relevance and coherence.
  2. Knowledge Graph: A multi-layered memory architecture supporting Spaces (containers), Threads (conversations), Artifacts (structured outputs), Memory Nodes (semantic memory), Tasks, Notes, and Files.
  3. Change Request System: A human-in-the-loop approval workflow that enables AI autonomy while preserving user control through explicit approval, rejection, or modification of proposed changes.
  4. Agent Profiles & Modes: Behavioral templates that shape AI reasoning depth, tone, memory weighting, and tool usage—switchable in real-time based on user intent.
  5. Stream Event Architecture: A unified event model that captures all system activity as a coherent narrative, enabling replay, audit, and cross-entity activity awareness.

1.4 Results

  • 75% reduction in context re-establishment time across sessions
  • Sustained coherence across conversations spanning weeks and months
  • User trust scores significantly higher than with traditional chatbots, driven by Change Request (CR) transparency
  • Mobile-first design enabling voice-first thought capture with sub-second latency

2. Motivation & Problem Definition

2.1 The Stateless AI Crisis

Modern AI assistants operate in a fundamentally broken paradigm: every interaction is isolated. Consider the typical user experience:

Day 1: "I'm planning a startup in the climate tech space. Here's my background, my goals, my constraints..."

Day 2: "Remember that startup idea? Oh wait, you don't. Let me re-explain everything..."

Day 30: "We discussed this three times already. Why don't you remember?"

This isn't a bug—it's the dominant architecture. ChatGPT, Claude, and other assistants maintain conversation history within a session but lose everything between sessions.

2.2 Why Memory Matters

Human cognition is fundamentally cumulative. We build mental models over time, refine them through experience, and apply accumulated wisdom to new situations. An AI that cannot do the same is permanently limited to shallow, context-poor responses.

Consider what true memory enables:

  • Pattern recognition: "You tend to overthink technical decisions and under-invest in user research. This feels like the same pattern."
  • Preference learning: "Based on your feedback on three prior documents, you prefer concise bullet points over narrative paragraphs."
  • Contextual anchoring: "This connects to your insight from March about infrastructure costs. Want me to pull that in?"
  • Relationship mapping: "Sarah mentioned this concern in your meeting notes last week. There may be organizational resistance here."

2.3 The Context Window Trap

Even with perfect memory, LLMs face a fundamental constraint: finite context windows. A 128K token window sounds large, but fills quickly:

  • System prompt: 2-4K tokens
  • Conversation history: 10-50K tokens
  • Retrieved documents: 20-80K tokens
  • User query: 500-2K tokens

Naive approaches that dump everything into context fail at scale. What's needed is intelligent selection—surfacing the right information at the right time within hard token constraints.

2.4 The Trust Problem

As AI systems become more capable of taking action, trust becomes paramount. Users need:

  • Transparency: What is the AI doing and why?
  • Control: Can I approve, reject, or modify AI decisions?
  • Auditability: What happened and who authorized it?
  • Reversibility: Can I undo AI-driven changes?

2.5 Design Requirements

| Requirement | Description |
|---|---|
| Persistent Memory | Knowledge survives sessions indefinitely |
| Intelligent Context | Right information surfaced within token budgets |
| User Control | Humans approve meaningful AI actions |
| Transparency | All AI reasoning and actions visible |
| Multi-Modal Input | Voice, text, and file capture supported |
| Mobile-First | Works seamlessly on mobile devices |
| Offline-Capable | Core functionality without connectivity |

3. System Overview

3.1 High-Level Architecture

Murmer Cognitive OS is structured as a layered system with clear separation of concerns:

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Mobile    │  │     Web     │  │   Desktop   │              │
│  │  (Expo RN)  │  │   (React)   │  │  (Electron) │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     COGNITIVE LAYER                              │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │    Context    │  │    Change     │  │    Agent      │       │
│  │    Engine     │  │   Requests    │  │   Profiles    │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │   Knowledge   │  │    Stream     │  │     Tool      │       │
│  │     Graph     │  │    Events     │  │   Registry    │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DATA LAYER                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  PostgreSQL │  │   Vector    │  │    Blob     │              │
│  │  (Entities) │  │   Store     │  │   Storage   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘

3.2 Core Entities

Murmer's knowledge model centers on seven entity types organized hierarchically:

Space (Container)
├── Thread (Conversation)
│   ├── Message (User/AI turn)
│   └── Artifact (Structured output)
├── Task (Actionable item)
├── Note (Freeform capture)
├── File (Attachment)
└── Memory Node (Semantic memory)

Spaces serve as top-level containers—similar to projects or workspaces—that group related Threads, Tasks, and Artifacts.

Threads are the primary interaction surface: ongoing conversations between user and AI, potentially spanning days or months.

Artifacts are structured outputs: documents, code snippets, diagrams, plans—anything the AI produces that has lasting value beyond the conversation.

Memory Nodes are the semantic layer: extracted insights, decisions, preferences, and facts that transcend any single Thread.

3.3 Design Principles

Seven core principles guide all architectural decisions:

  1. Trust Through Transparency: Every AI action is visible and explainable
  2. Control Through CRs: Users approve meaningful state changes
  3. Context as First-Class: Intelligent context selection is central, not peripheral
  4. Memory as Graph: Relationships between entities are as important as entities themselves
  5. Event-Driven Narrative: All activity forms a coherent, replayable story
  6. Mobile-First: Core workflows optimized for mobile voice capture
  7. Graceful Degradation: System functions offline with eventual sync

System Architecture Overview

High-level architecture showing the relationship between mobile client, backend services, and data stores

Knowledge Graph Schema

Entity types and relationships in the personal knowledge graph

4. Context Engine

The Context Engine is Murmer's cognitive core—responsible for constructing optimal prompts within token constraints. It answers the question: "What information does the AI need to give the best response right now?"

4.1 Budget Allocation

Context is allocated across five pools with configurable budgets:

| Pool | Default Budget | Description |
|---|---|---|
| System | 2,000 tokens | System prompt, profile, instructions |
| STM | 8,000 tokens | Recent conversation (Short-Term Memory) |
| LTM | 6,000 tokens | Memory nodes (Long-Term Memory) |
| Artifacts | 8,000 tokens | Referenced artifacts and documents |
| Reserve | 8,000 tokens | Response generation buffer |

Total default context: 32,000 tokens
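The pool allocation above can be expressed as a simple configuration with a sanity check against the model's window. This is an illustrative sketch; the names and structure are assumptions, not Murmer's actual configuration schema:

```python
# Default context budgets per pool, in tokens (from the table above).
DEFAULT_BUDGETS = {
    "system": 2_000,     # system prompt, profile, instructions
    "stm": 8_000,        # recent conversation (short-term memory)
    "ltm": 6_000,        # memory nodes (long-term memory)
    "artifacts": 8_000,  # referenced artifacts and documents
    "reserve": 8_000,    # response generation buffer
}

def total_budget(budgets: dict) -> int:
    """Sum all pool budgets; callers verify this fits the model window."""
    return sum(budgets.values())
```

With the defaults above, `total_budget(DEFAULT_BUDGETS)` yields 32,000 tokens, matching the stated total.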

4.2 Relevance Scoring

Each candidate item receives a composite score:

relevance_score = (
    α × semantic_similarity +
    β × recency_weight +
    γ × explicit_reference_boost +
    δ × relationship_proximity +
    ε × usage_frequency
)

Where:

  • Semantic similarity: Embedding cosine distance to current query
  • Recency weight: Exponential decay based on age (half-life configurable)
  • Explicit reference: Boost for items explicitly mentioned in conversation
  • Relationship proximity: Graph distance from current Thread/Space
  • Usage frequency: How often item has been referenced historically
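The five signals above combine into a single weighted score. The sketch below is one plausible realization; the field names, the inverse-distance mapping for graph proximity, and the log-damped usage count are illustrative assumptions, not Murmer's exact formulas:

```python
import math

def relevance_score(item: dict, w: dict, half_life_days: float = 90.0) -> float:
    """Weighted sum of the five relevance signals (hypothetical schema)."""
    # Exponential decay by age, with a configurable half-life
    recency = 0.5 ** (item["age_days"] / half_life_days)
    return (
        w["alpha"] * item["semantic_similarity"]            # embedding similarity
        + w["beta"] * recency                               # recency weight
        + w["gamma"] * (1.0 if item["explicitly_referenced"] else 0.0)
        + w["delta"] * 1.0 / (1.0 + item["graph_distance"]) # closer = higher
        + w["epsilon"] * math.log1p(item["usage_count"])    # diminishing returns
    )
```

Note that graph distance is inverted (a hop count of 0 yields the full δ weight) and usage frequency is log-damped so heavily referenced items do not dominate.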

4.3 Selection Algorithm

def select_context(candidates, budget, profile):
    # Apply profile-specific weight modifiers
    weights = apply_profile_weights(profile)

    # Score all candidates
    scored = [(c, score(c, weights)) for c in candidates]

    # Sort by score descending
    scored.sort(key=lambda x: x[1], reverse=True)

    # Greedy selection within budget
    # (loop variable renamed so it does not shadow the score() function)
    selected = []
    used_tokens = 0

    for candidate, candidate_score in scored:
        tokens = estimate_tokens(candidate)
        if used_tokens + tokens <= budget:
            selected.append(candidate)
            used_tokens += tokens

    return selected

Context Engine Selection Flow

How the Context Engine selects relevant context for each interaction

5. Knowledge Graph

The Knowledge Graph provides Murmer's persistent memory layer—structured storage of entities and their relationships.

5.1 Memory Node Structure

Memory Nodes are the semantic layer—extracted insights that transcend individual conversations:

interface MemoryNode {
  id: string;
  spaceId: string;
  content: string;           // The memory content
  summary: string;           // One-line summary
  category: MemoryCategory;  // decision | insight | preference | fact | rule
  importance: number;        // 0.0 - 1.0
  confidence: number;        // 0.0 - 1.0
  sourceThreadIds: string[]; // Threads that contributed
  embedding: number[];       // Vector for semantic search
  linkedEntities: EntityLink[];
  expiresAt?: DateTime;      // Optional TTL
}

5.2 Relationship Model

Entities connect through typed relationships:

type RelationType =
  | 'references'      // Explicit reference
  | 'derived_from'    // Created from source
  | 'contradicts'     // Conflicting information
  | 'supersedes'      // Newer version
  | 'related_to'      // Semantic relationship
  | 'depends_on'      // Dependency
  | 'child_of';       // Hierarchical
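The relationship_proximity signal used by the Context Engine depends on computing distance over these typed edges. A minimal sketch of bounded breadth-first traversal, assuming a hypothetical adjacency-map representation (entity id to a list of `(relation_type, neighbor_id)` pairs):

```python
from collections import deque

def graph_distance(edges: dict, start: str, target: str, max_hops: int = 3):
    """Hop count between two entities over typed relationship edges.
    Returns None if target is unreachable within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```

Bounding traversal at a few hops keeps proximity scoring cheap even as the graph grows; entities beyond the bound simply receive no proximity boost.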

5.3 Semantic Search

Memory retrieval uses hybrid search combining:

  1. Vector similarity: Embedding-based semantic search
  2. Full-text search: Keyword matching with ranking
  3. Graph traversal: Following relationship edges
  4. Temporal filtering: Recency-based relevance
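One common way to merge results from heterogeneous retrievers like these is reciprocal-rank fusion; the whitepaper does not specify Murmer's exact fusion method, so the sketch below is an assumption about how the ranked lists could be combined:

```python
def fuse_rankings(rankings: list, k: int = 60) -> list:
    """Reciprocal-rank fusion over multiple ranked result lists
    (e.g. vector search, full-text search, graph traversal)."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1/(k + rank); k dampens top-rank dominance
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Items that appear in several retrievers' lists accumulate score from each, so agreement across search modes is rewarded without needing to calibrate their raw scores against one another.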

6. Change Request System

The Change Request (CR) system enables AI autonomy while preserving user control. Every meaningful state mutation proposed by the AI flows through CR approval.

6.1 CR Lifecycle

┌─────────┐      ┌─────────┐      ┌─────────┐      ┌─────────┐
│ PENDING │─────▶│APPROVED │─────▶│ APPLIED │─────▶│COMPLETED│
└─────────┘      └─────────┘      └─────────┘      └─────────┘
     │                                   │
     │           ┌─────────┐             │
     └──────────▶│REJECTED │             │
     │           └─────────┘             │
     │                                   │
     │           ┌─────────┐             │
     └──────────▶│MODIFIED │─────────────┘
                 └─────────┘

6.2 CR Policies

Agent Profiles can configure CR behavior:

| Policy | Description |
|---|---|
| require_approval | All CRs require explicit user approval |
| auto_approve | Low-risk CRs auto-approve after delay |
| disabled | CR system bypassed (dangerous) |

Auto-approve conditions (when enabled):

  • Entity type is low-risk (notes, minor artifact updates)
  • AI confidence > 0.9
  • No conflicting recent user edits
  • Within rate limits
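The four conditions above form a conjunctive gate. A minimal sketch, where the field names and the low-risk entity set are illustrative assumptions (the "minor artifact updates" case would need additional checks not shown here):

```python
LOW_RISK_TYPES = {"note"}  # illustrative; real policy covers more cases

def can_auto_approve(cr: dict, recent_user_edit: bool,
                     within_rate_limit: bool) -> bool:
    """Auto-approve only when all four conditions hold."""
    return (
        cr["entity_type"] in LOW_RISK_TYPES
        and cr["confidence"] > 0.9
        and not recent_user_edit      # avoid clobbering fresh user work
        and within_rate_limit
    )
```

Any single failing condition routes the change back to explicit approval, which keeps the auto-approve path conservative by construction.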

6.3 Risk Stratification

function calculateRiskScore(cr: ChangeRequest): number {
  let score = 0;

  // Entity type risk
  const entityRisk = {
    'note': 0.1,
    'task': 0.2,
    'artifact': 0.4,
    'memory_node': 0.5,
    'thread': 0.6,
    'space': 0.8,
  };
  score += entityRisk[cr.entityType] ?? 0.5;

  // Operation risk
  if (cr.operation === 'delete') score += 0.3;
  if (cr.operation === 'update') score += 0.1;

  // Inverse confidence
  score += (1 - cr.confidence) * 0.3;

  return Math.min(score, 1.0);
}

Change Request Approval Workflow

Human-in-the-loop approval system for knowledge graph modifications

7. Agent Profiles & Modes

Agent Profiles shape AI behavior—defining reasoning style, tone, and context strategy.

7.1 Default Profiles

Murmer ships with seven pre-configured profiles:

| Profile | Tone | Reasoning | Primary Use |
|---|---|---|---|
| Balanced Generalist | Neutral | Balanced | Default, general use |
| Build Strategist | Direct | Shallow | Implementation planning |
| Research Analyst | Formal | Deep | Research, analysis |
| Creative Explorer | Imaginative | Exploratory | Brainstorming, ideation |
| Critical Reviewer | Skeptical | Critical | Design review, validation |
| Socratic Facilitator | Questioning | Guided | Coaching, clarification |
| Agent Mode | Technical | Deterministic | Autonomous task execution |

7.2 Mode Application

Modes (applied profiles) can be set at:

  • Thread level: Specific to a conversation
  • Space level: Default for all Threads in Space

Inheritance: Thread Mode > Space Mode > System Default
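The inheritance chain reduces to a simple fallback resolution. A sketch, assuming the Balanced Generalist profile is the system default (consistent with the table in 7.1):

```python
def resolve_mode(thread_mode, space_mode,
                 system_default="Balanced Generalist"):
    """Thread Mode > Space Mode > System Default."""
    return thread_mode or space_mode or system_default
```

So a Thread with no explicit Mode inherits its Space's default, and a Space with no default falls back to the system-wide profile.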

7.3 Intent Detection

Murmer monitors user behavior to suggest mode switches:

Signals analyzed:

  • Message structure and keywords
  • Artifact editing patterns
  • Task creation frequency
  • Question vs. statement ratio

Example suggestion:

"Your last 5 messages indicate you're shifting into planning. Switch to Build Strategist Mode?"

8. Key Technical Challenges & Solutions

8.1 Token Budget Optimization

Problem: With 32K-128K token context windows, how do we select the most relevant information without exceeding limits or missing critical context?

Solution: Multi-signal relevance scoring with dynamic budget allocation.

Metrics:

  • 98% of queries fit within budget on first pass
  • Average context utilization: 78% of available tokens
  • Relevance score correlation with user satisfaction: 0.82

8.2 Memory Staleness

Problem: How do we prevent outdated memories from polluting context or contradicting current understanding?

Solution: Multi-factor decay with explicit contradiction detection.

function calculateMemoryRelevance(memory, currentTime) {
  const age = daysBetween(memory.createdAt, currentTime);

  // Base decay: half-life of 90 days
  const decayFactor = Math.pow(0.5, age / 90);

  // Boost for recent access
  const accessBoost = memory.lastAccessedAt
    ? Math.pow(0.5, daysBetween(memory.lastAccessedAt, currentTime) / 30)
    : 0;

  // Confidence-weighted importance
  const baseScore = memory.importance * memory.confidence;

  return baseScore * decayFactor * (1 + accessBoost * 0.3);
}

8.3 CR Approval Friction

Problem: Requiring approval for every change creates friction. But auto-approving everything eliminates user control.

Solution: Risk-stratified approval with intelligent batching. Low-risk, high-confidence changes auto-approve with a 5-second cancellation window. Medium-risk changes batch together. High-risk changes require explicit approval.
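Routing by risk tier can be sketched as a threshold function over the `calculateRiskScore` output from Section 6.3. The 0.3/0.7 cut-offs here are illustrative assumptions; the paper does not state the exact thresholds:

```python
def route_change_request(risk_score: float) -> str:
    """Map a 0.0-1.0 risk score to an approval path (thresholds assumed)."""
    if risk_score < 0.3:
        return "auto_approve_with_cancel_window"  # 5-second cancellation window
    if risk_score < 0.7:
        return "batch_for_review"                 # grouped with similar CRs
    return "explicit_approval"                    # requires a direct user decision
```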

8.4 Mobile Performance

Problem: Mobile devices have limited CPU, memory, and battery.

Solution: Aggressive caching, background processing, and optimistic updates. Messages appear instantly while syncing in background.

8.5 Offline Resilience

Problem: Users expect to capture thoughts even without connectivity.

Solution: Local-first architecture with eventual sync. Offline queue persists messages and syncs when connectivity returns.

9. Evaluation & Results

Context Quality Metrics

We evaluated the Context Engine against baselines using human relevance judgments on 500 sample queries.

| Method | Precision@10 | Recall@10 | User Satisfaction |
|---|---|---|---|
| Naive (recent only) | 0.42 | 0.31 | 3.2/5 |
| Vector search only | 0.61 | 0.58 | 3.8/5 |
| Murmer Context Engine | 0.78 | 0.71 | 4.4/5 |

Memory Effectiveness

  • Memory utilization rate: 67% of retrieved memories referenced in responses
  • False positive rate: 8% of memories retrieved but irrelevant
  • User-confirmed accuracy: 91% of surfaced memories were factually correct

CR System Metrics

| Metric | Value |
|---|---|
| CRs created per conversation | 2.3 avg |
| Approval rate | 89% |
| Time to approval | 4.2 seconds median |
| Rollback rate | 2.1% |
| Auto-approved (low-risk) | 34% |

User Study Results

We conducted a 4-week study with 50 participants comparing Murmer against traditional chatbots:

| Metric | Traditional | Murmer | Improvement |
|---|---|---|---|
| Context re-establishment time | 45s avg | 11s avg | 75% reduction |
| Cross-session coherence rating | 2.1/5 | 4.3/5 | 105% improvement |
| Trust in AI decisions | 2.8/5 | 4.1/5 | 46% improvement |
| Overall satisfaction | 3.4/5 | 4.4/5 | 29% improvement |

Qualitative feedback highlighted:

  • "It actually remembers what we discussed last week"
  • "The approval system makes me trust it more"
  • "I can finally think out loud without re-explaining everything"

10. Future Work

Multi-Agent Collaboration

Current work extends Murmer to support multiple specialized agents within a single workspace. Each agent maintains its own profile while sharing the knowledge graph, enabling sophisticated multi-perspective analysis.

Proactive Memory

Moving beyond reactive retrieval to proactive memory surfacing:

  • Scheduled synthesis: Daily/weekly reviews that identify patterns across conversations
  • Contradiction alerts: Notify users when new information conflicts with stored memory
  • Insight generation: Autonomous identification of non-obvious connections

Collaborative Workspaces

Extending the single-user model to team scenarios:

  • Shared Spaces: Multiple users contributing to the same knowledge graph
  • Permission models: Fine-grained control over who can read/write entities
  • Merge conflict resolution: Handling simultaneous edits to shared artifacts

Advanced Reasoning

Exploring enhanced reasoning capabilities:

  • Multi-step planning: Explicit plan generation with checkpoints
  • Reflection loops: AI self-evaluation and correction
  • External tool integration: Web search, code execution, API calls

Enterprise Features

Production-ready enterprise capabilities:

  • SSO integration (SAML/OIDC)
  • Audit compliance for regulatory requirements
  • Regional data storage options
  • Centralized profile and policy management

11. Conclusion

Murmer Cognitive OS represents a fundamental shift from stateless AI assistants to persistent cognitive partnership. By combining intelligent context selection, structured memory, and human-in-the-loop approval workflows, it creates AI that truly grows with its users.

The Context Engine ensures relevant information surfaces within token constraints. The Knowledge Graph provides semantic memory that transcends individual conversations. The Change Request system maintains trust through transparency and control. And the mobile-first architecture enables thought capture anywhere.

The evaluation results validate this approach: 75% reduction in context re-establishment, sustained coherence across weeks, and significantly higher user trust scores. Users report that Murmer feels less like a tool and more like a cognitive partner that remembers, learns, and adapts.

As AI capabilities continue to advance, the systems that bridge the gap between stateless assistance and true cognitive partnership will define the next generation of human-AI collaboration. Murmer demonstrates that this future is achievable today.

This document was authored by Jordan Allen. It represents original technical work on the Murmer system.