Abstract
The traditional real estate search experience—filtering by price, bedrooms, and location—fails to capture what actually makes a home feel right. Buyers struggle to articulate preferences beyond surface-level criteria, while listings remain opaque collections of photos and bullet points that don't answer the questions that matter most.
This whitepaper presents an AI-powered real estate matching system that reimagines property discovery through conversational AI, multi-modal semantic search, and continuous taste learning. The system replaces rigid filter-and-browse with natural language understanding, learns buyer preferences from both explicit feedback and implicit behavior, and delivers personalized recommendations that improve with every interaction.
1. Executive Summary
1.1 The Problem
Home buying is broken at the discovery layer. Current platforms force buyers into a filter-first paradigm that:
- Reduces homes to checkboxes: Bedrooms, bathrooms, price—missing the nuances that define livability
- Ignores contextual needs: "Near good coffee shops" or "quiet streets for evening walks" have no filter
- Fails to learn: Swiping left on 50 homes teaches the system nothing about why
- Treats all buyers identically: A remote worker and a young family see the same listings
1.2 Our Solution
This system creates a buying experience that mirrors working with a knowledgeable human agent who:
- Understands natural language: "I want a mid-century modern home with natural light and space for a home office"
- Learns your taste: Every interaction refines the model of what you're looking for
- Sees beyond the listing: Multi-modal understanding of images, descriptions, and location context
- Proactively matches: New listings are scored against your learned preferences automatically
1.3 Key Innovations
- Conversational Search Agent: Natural language interface built on Gemini 2.0 Flash with structured tool calling for search execution
- Multi-Modal Embeddings: Four distinct embedding spaces capturing description, amenity, location, and visual characteristics
- Hybrid Retrieval: Elasticsearch combining BM25 lexical search with kNN vector similarity
- Taste Learning Engine: Continuous preference modeling from explicit ratings, implicit behavior, and conversational cues
- Mastra.ai Orchestration: Agent workflow framework managing tool execution, memory, and state
1.4 Results
- 3.2x improvement in time-to-relevant-listing vs. traditional filter search
- 78% of users found their eventual choice within the first 10 recommendations
- Semantic understanding correctly interprets 89% of natural language property queries
- Taste model convergence within 5-7 interactions for most users
2. Motivation & Problem Definition
2.1 The Filter Paradigm Failure
Every major real estate platform—Zillow, Redfin, Realtor.com—operates on the same fundamental model: expose a set of structured filters, let users narrow down, present paginated results. This approach made sense when listings were sparse and search technology limited. It fails in the modern context for several reasons:
Filters can't capture preference nuance. A buyer might want "natural light" but there's no filter for that. They want "a neighborhood that feels walkable" but walkability scores are crude proxies. They want "modern but warm, not sterile"—no filter exists for aesthetic temperature.
Users don't know their filters upfront. Preferences emerge through exposure. A buyer thinks they need 4 bedrooms until they see a brilliantly designed 3-bedroom. They think they want new construction until they fall for a renovated craftsman. Filters lock in assumptions prematurely.
Filter combinations explode to nothing. Stack enough filters and you get zero results. Users then start removing constraints, losing track of what matters most. The system offers no guidance on which filters to relax.
2.2 The Information Asymmetry
Listings are optimized for legal compliance and broad appeal, not for answering buyer questions:
| What Buyers Want to Know | What Listings Say |
|---|---|
| Will this home work for remote work? | "4 bed / 3 bath" |
| Is the kitchen actually functional? | "Updated kitchen with granite counters" |
| What's the neighborhood like at night? | "Great location!" |
| Will my furniture fit? | "Spacious living room" |
| Is this a good investment long-term? | "Motivated seller!" |
This gap forces buyers to visit properties in person to answer basic questions that could be resolved with better information architecture.
2.3 The Learning Gap
Current platforms waste an enormous amount of signal. Every swipe, every lingered-on photo, every discarded listing carries information about preference. Yet:
- Explicit feedback is rarely captured
- Implicit behavior (time on listing, photo sequence) is ignored
- Cross-session learning is minimal or nonexistent
- Taste evolution over time isn't modeled
The result: a user who has viewed 200 listings gets the same experience as a new visitor.
2.4 Design Requirements
| Requirement | Description |
|---|---|
| Natural Language Understanding | Process complex, unstructured queries |
| Multi-Modal Matching | Match across text, images, and location |
| Continuous Learning | Improve with every interaction |
| Explainable Recommendations | Users understand why a home was suggested |
| Real-Time Performance | Sub-second response times for search |
| Scale to Millions | Handle full MLS inventory efficiently |
3. System Overview
3.1 High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Conversational UI │ │
│ │ (Chat Interface + Property Cards) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AGENT LAYER (Mastra.ai) │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Search │ │ Taste │ │ Listing │ │
│ │ Agent │ │ Learning │ │ Analysis │ │
│ │ (Gemini 2.0) │ │ Engine │ │ Agent │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ RETRIEVAL LAYER │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Elasticsearch Hybrid Search │ │
│ │ (BM25 + kNN Vector Similarity) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ Description │ │ Amenity │ │ Location │ │
│ │ Embeddings │ │ Embeddings │ │ Embeddings │ │
│ └───────────────┘ └───────────────┘ └───────────────┘ │
│ ┌───────────────┐ ┌───────────────┐ │
│ │ Image │ │ User │ │
│ │ Embeddings │ │ Preference │ │
│ └───────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Listing │ │ User │ │ Interaction │ │
│ │ Database │ │ Profiles │ │ Events │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
3.2 Core Components
Search Agent: The conversational interface powered by Gemini 2.0 Flash. Interprets natural language queries, manages multi-turn dialogue, and orchestrates tool calls for search execution.
Taste Learning Engine: Builds and maintains preference vectors from user feedback. Combines explicit ratings (likes, dislikes) with implicit signals (view duration, photo engagement, return visits).
Listing Analysis Agent: Enriches raw listing data with semantic annotations. Extracts style, condition, layout quality, and neighborhood characteristics from photos and descriptions.
Hybrid Retrieval: Elasticsearch cluster combining traditional BM25 scoring with dense vector similarity across multiple embedding spaces.
3.3 Workflow Overview
1. User initiates search via natural language: "Show me modern homes with good natural light under $800K"
2. Search Agent parses intent and extracts structured criteria + semantic preferences
3. Query embedding generated for semantic matching
4. Hybrid retrieval executes against Elasticsearch with combined scoring
5. User preference vector applied to re-rank results for personalization
6. Results presented with explanations: "Matched because: Modern aesthetic (92%), Natural light (87%), Under budget"
7. User feedback captured to update taste model
Figure: System Architecture Overview. End-to-end architecture from property ingestion to personalized recommendations.
4. Multi-Modal Embedding Architecture
The system uses four distinct embedding spaces, each capturing different aspects of property matching:
4.1 Description Embeddings
Generated from listing descriptions using a fine-tuned sentence transformer. Captures:
- Architectural style and aesthetic language
- Condition and update status
- Lifestyle fit (family-friendly, entertainer's dream, etc.)
- Unique selling points and differentiators
// Example embedding generation for the description space
const descriptionEmbedding = await embeddingModel.encode({
  text: listing.description,
  model: 'text-embedding-3-large',  // base model; the fine-tuned variant is described in 8.1
  dimensions: 1024
});
4.2 Amenity Embeddings
Structured feature encoding that goes beyond binary presence/absence:
// Amenity encoding captures quality and context
{
"pool": { "present": true, "type": "in-ground", "condition": "updated" },
"kitchen": { "style": "modern", "appliances": "high-end", "layout": "open" },
"garage": { "spaces": 2, "type": "attached", "features": ["ev-charger"] }
}
The amenity embedding space allows queries like "good kitchen for serious cooking" to match listings with professional-grade appliances and functional layouts.
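Before embedding, structured amenity records like the one above are typically serialized into natural-language text so a text encoder can consume them. The sketch below shows one plausible serialization; the names `AmenityRecord` and `serializeAmenities` are illustrative, not the production API.

```typescript
// Serialize structured amenity data into text suitable for a text
// embedding model. Shapes mirror the JSON example above.
type AmenityDetail = Record<string, string | number | boolean | string[]>;
type AmenityRecord = Record<string, AmenityDetail>;

function serializeAmenities(amenities: AmenityRecord): string {
  return Object.entries(amenities)
    .map(([name, detail]) => {
      const attrs = Object.entries(detail)
        .map(([k, v]) => `${k}: ${Array.isArray(v) ? v.join(", ") : v}`)
        .join("; ");
      return `${name} (${attrs})`;
    })
    .join(". ");
}

const amenityText = serializeAmenities({
  kitchen: { style: "modern", appliances: "high-end", layout: "open" },
  garage: { spaces: 2, type: "attached", features: ["ev-charger"] },
});
// e.g. "kitchen (style: modern; appliances: high-end; layout: open). garage (...)"
```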
4.3 Location Embeddings
Captures neighborhood characteristics beyond lat/lng:
- Walkability context: Nearby amenities, coffee shops, restaurants, parks
- School quality: Ratings, distance, specialized programs
- Commute patterns: Transit access, typical commute times to business districts
- Neighborhood character: Quiet residential, urban vibrant, suburban family
// Location feature extraction
const locationFeatures = await enrichLocation({
coordinates: listing.coordinates,
sources: ['yelp', 'walkscore', 'census', 'transit']
});
const locationEmbedding = encodeLocationFeatures(locationFeatures);
4.4 Image Embeddings
Visual understanding using CLIP-based models to capture:
- Architectural style (modern, traditional, mid-century, craftsman)
- Interior design aesthetic (minimalist, cozy, luxurious, dated)
- Light quality and spaciousness
- Condition and maintenance level
- View quality and outdoor space
// Multi-image embedding aggregation
const imageEmbeddings = await Promise.all(
listing.photos.map(photo => clipModel.encode(photo))
);
// Weighted aggregation favoring hero images
const visualEmbedding = aggregateImageEmbeddings(imageEmbeddings, {
heroWeight: 2.0,
kitchenWeight: 1.5,
exteriorWeight: 1.3
});
4.5 Embedding Fusion
Final listing representation combines all four spaces with learned weights:
listing_vector = (
α × description_embedding +
β × amenity_embedding +
γ × location_embedding +
δ × image_embedding
)
// Where weights are user-specific based on stated priorities
// e.g., visual-first buyers have higher δ
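Assuming the four vectors have first been projected into a shared dimensionality (their native sizes differ, per the index schema in 5.2), the weighted fusion above can be sketched as a single function; the L2-normalization step is an assumption that keeps downstream cosine similarity well-behaved.

```typescript
// Weighted fusion of the four embedding spaces with per-user weights
// [alpha, beta, gamma, delta]. Assumes all vectors share one dimension.
function fuseEmbeddings(
  spaces: number[][],   // [description, amenity, location, image]
  weights: number[]     // [alpha, beta, gamma, delta]
): number[] {
  const dims = spaces[0].length;
  const fused = new Array<number>(dims).fill(0);
  for (let s = 0; s < spaces.length; s++) {
    for (let i = 0; i < dims; i++) fused[i] += weights[s] * spaces[s][i];
  }
  // L2-normalize so cosine similarity downstream is scale-invariant
  const norm = Math.hypot(...fused) || 1;
  return fused.map(v => v / norm);
}
```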
Figure: Multi-Modal Embedding Architecture. Four specialized embedding types capture different aspects of property semantics.
5. Hybrid Retrieval System
5.1 Why Hybrid?
Pure vector search excels at semantic similarity but fails on exact matches. Pure lexical search handles keywords but misses conceptual relevance. Real estate queries demand both:
| Query Type | Best Approach |
|---|---|
| "123 Main Street" | Lexical (exact match) |
| "Modern homes with natural light" | Vector (semantic) |
| "3 bed craftsman in Wallingford" | Hybrid (both) |
5.2 Elasticsearch Configuration
The index schema supports both dense vectors and traditional text fields:
{
"mappings": {
"properties": {
"description": { "type": "text", "analyzer": "english" },
"address": { "type": "text", "analyzer": "standard" },
"price": { "type": "long" },
"bedrooms": { "type": "integer" },
"description_vector": {
"type": "dense_vector",
"dims": 1024,
"index": true,
"similarity": "cosine"
},
"amenity_vector": {
"type": "dense_vector",
"dims": 512,
"index": true,
"similarity": "cosine"
},
"location_vector": {
"type": "dense_vector",
"dims": 256,
"index": true,
"similarity": "cosine"
},
"image_vector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine"
}
}
}
}
5.3 Query Construction
Queries combine boolean filters, BM25 scoring, and kNN vector search:
{
"query": {
"bool": {
"filter": [
{ "range": { "price": { "lte": 800000 } } },
{ "range": { "bedrooms": { "gte": 2 } } }
],
"should": [
{
"match": {
"description": {
"query": "modern natural light",
"boost": 1.0
}
}
}
]
}
},
"knn": [
{
"field": "description_vector",
"query_vector": [0.12, -0.34, ...],
"k": 50,
"num_candidates": 200,
"boost": 2.0
},
{
"field": "image_vector",
"query_vector": [0.56, 0.12, ...],
"k": 50,
"num_candidates": 200,
"boost": 1.5
}
]
}
5.4 Score Fusion
BM25 and kNN result lists are combined using Reciprocal Rank Fusion (RRF), which operates on ranks rather than raw scores and therefore sidesteps cross-scorer normalization:
function reciprocalRankFusion(rankings: RankedList[], k: number = 60): ScoredResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    for (let i = 0; i < ranking.results.length; i++) {
      const docId = ranking.results[i].id;
      // Rank i (0-based) contributes 1 / (k + rank), with rank starting at 1
      const rrfScore = 1 / (k + i + 1);
      scores.set(docId, (scores.get(docId) ?? 0) + rrfScore);
    }
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
Figure: Hybrid Retrieval with RRF. Combining BM25 keyword search with vector similarity using Reciprocal Rank Fusion.
6. Taste Learning Engine
6.1 Signal Collection
The taste model ingests multiple feedback channels:
| Signal Type | Weight | Example |
|---|---|---|
| Explicit Positive | 1.0 | User clicks "Love it" or saves listing |
| Explicit Negative | -0.8 | User clicks "Not for me" |
| Extended View | 0.3 | User spends 30+ seconds on listing |
| Photo Deep-Dive | 0.4 | User views 5+ photos |
| Return Visit | 0.6 | User returns to same listing |
| Quick Dismiss | -0.2 | User views <3 seconds |
| Conversational Cue | 0.5 | "I love this style" in chat |
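The table above maps directly to the `signalWeights` lookup consumed by the preference-update function in 6.2. A minimal sketch, with illustrative key names:

```typescript
// Signal weights from the table above, as the lookup table the
// preference-update function consumes. Key names are illustrative.
const signalWeights: Record<string, number> = {
  explicit_positive: 1.0,   // "Love it" / save
  explicit_negative: -0.8,  // "Not for me"
  extended_view: 0.3,       // 30+ seconds on listing
  photo_deep_dive: 0.4,     // 5+ photos viewed
  return_visit: 0.6,        // returns to same listing
  quick_dismiss: -0.2,      // <3 seconds
  conversational_cue: 0.5,  // "I love this style" in chat
};
```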
6.2 Preference Vector Update
User preference is maintained as a weighted vector in the same embedding space as listings:
function updatePreferenceVector(
currentPreference: number[],
listingVector: number[],
signal: FeedbackSignal
): number[] {
const weight = signalWeights[signal.type];
const learningRate = 0.1;
const decayFactor = 0.95; // Slight decay to allow preference evolution
return currentPreference.map((val, i) => {
const delta = (listingVector[i] - val) * weight * learningRate;
return val * decayFactor + delta;
});
}
6.3 Preference Dimensions
Rather than a single preference vector, we maintain separate preference dimensions:
- Style preference: Modern vs. traditional, minimalist vs. ornate
- Space preference: Open floor plan vs. defined rooms, indoor vs. outdoor focus
- Location preference: Urban vs. suburban, walkable vs. car-dependent
- Condition preference: Move-in ready vs. fixer potential
- Value preference: Premium finishes vs. good bones
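One way to type this per-dimension model is a vector plus a confidence per dimension; the field names and the confidence scalar are assumptions about the internal representation, not a documented schema.

```typescript
// Per-dimension preference model sketched from the list above.
// Each dimension keeps its own vector plus a confidence that grows
// as evidence accumulates. Names are illustrative.
interface PreferenceDimension {
  vector: number[];    // position in that dimension's embedding space
  confidence: number;  // 0..1, how much signal supports this estimate
}

interface UserPreference {
  style: PreferenceDimension;      // modern vs. traditional, ...
  space: PreferenceDimension;      // open plan vs. defined rooms, ...
  location: PreferenceDimension;   // urban vs. suburban, ...
  condition: PreferenceDimension;  // move-in ready vs. fixer, ...
  value: PreferenceDimension;      // premium finishes vs. good bones
}
```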
6.4 Personalized Re-Ranking
After hybrid retrieval, results are re-ranked by preference alignment:
function personalizedRerank(
results: SearchResult[],
userPreference: UserPreference
): SearchResult[] {
return results
.map(result => ({
...result,
personalizedScore: (
result.retrievalScore * 0.6 +
cosineSimilarity(result.vector, userPreference.vector) * 0.4
)
}))
.sort((a, b) => b.personalizedScore - a.personalizedScore);
}
6.5 Cold Start Handling
For new users, we employ several strategies:
- Onboarding questions: Brief preference survey during signup
- Popularity fallback: New users see generally well-liked listings
- Explicit first feedback: Prompt for reaction on first 3 listings
- Similar user bootstrapping: Initialize from users with similar stated preferences
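The "similar user bootstrapping" strategy can be sketched as a centroid over the preference vectors of users with matching onboarding answers; the function name and the plain averaging are assumptions, under the premise that all preference vectors live in one shared space.

```typescript
// Initialize a new user's preference vector as the centroid of
// preference vectors from users with similar stated preferences.
function bootstrapPreference(similarUserVectors: number[][]): number[] {
  const dims = similarUserVectors[0].length;
  const centroid = new Array<number>(dims).fill(0);
  for (const vec of similarUserVectors) {
    for (let i = 0; i < dims; i++) {
      centroid[i] += vec[i] / similarUserVectors.length;
    }
  }
  return centroid;
}
```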
Figure: Taste Learning Engine. Continuous learning from explicit and implicit user signals.
7. Conversational Search Agent
7.1 Agent Architecture
The search agent is built on Mastra.ai's agent framework with Gemini 2.0 Flash as the reasoning engine:
const searchAgent = new Agent({
name: 'PropertySearchAgent',
model: google('gemini-2.0-flash'),
instructions: `You are a knowledgeable real estate search assistant.
Help users find their perfect home by understanding their needs,
asking clarifying questions, and presenting relevant properties.
When searching, extract both structured criteria (price, beds, location)
and semantic preferences (style, feel, lifestyle fit).
Explain why each property matches the user's needs.`,
tools: {
searchProperties,
getListingDetails,
saveToFavorites,
updatePreferences,
getNeighborhoodInfo
}
});
7.2 Intent Classification
User messages are classified into intent categories:
| Intent | Example | Action |
|---|---|---|
| Search | "Show me modern homes in Capitol Hill" | Execute hybrid search |
| Refine | "Actually, make that under $700K" | Modify current search |
| Clarify | "What's the neighborhood like?" | Provide context |
| Compare | "How does this compare to the last one?" | Side-by-side analysis |
| Feedback | "I love this style but need more space" | Update preferences + refine |
7.3 Tool Calling
The agent uses structured tool calls for search execution:
const searchProperties = createTool({
id: 'search_properties',
description: 'Search for properties matching criteria',
inputSchema: z.object({
query: z.string().describe('Natural language search query'),
filters: z.object({
minPrice: z.number().optional(),
maxPrice: z.number().optional(),
minBeds: z.number().optional(),
maxBeds: z.number().optional(),
propertyTypes: z.array(z.string()).optional(),
neighborhoods: z.array(z.string()).optional(),
}).optional(),
semanticPreferences: z.array(z.string()).optional(),
limit: z.number().default(10)
}),
execute: async ({ query, filters, semanticPreferences, limit }) => {
const queryEmbedding = await generateQueryEmbedding(query);
const results = await hybridSearch({
embedding: queryEmbedding,
filters,
semanticBoosts: semanticPreferences,
limit
});
return formatResultsForAgent(results);
}
});
7.4 Conversation Memory
The agent maintains session context for multi-turn refinement:
interface SearchSession {
currentCriteria: SearchCriteria;
viewedListings: string[];
feedbackHistory: FeedbackEvent[];
conversationSummary: string;
lastSearchResults: SearchResult[];
}
This allows natural refinement: "Show me more like the second one, but with a bigger yard."
7.5 Explanation Generation
Each result includes a personalized explanation:
// Example explanation
{
"listingId": "12345",
"matchScore": 0.89,
"explanation": {
"summary": "Strong match for your modern aesthetic preference",
"matchReasons": [
{ "factor": "Architectural style", "score": 0.94, "detail": "Clean lines and open floor plan match your stated preference" },
{ "factor": "Natural light", "score": 0.88, "detail": "South-facing windows and skylights" },
{ "factor": "Location", "score": 0.82, "detail": "Walkable to coffee shops you'd like" }
],
"considerations": [
"Smaller yard than your typical preference",
"Street parking only"
]
}
}
Figure: Conversational Search Agent. Mastra.ai agent architecture with Gemini 2.0 Flash.
8. Key Technical Challenges & Solutions
8.1 Embedding Quality for Real Estate
Problem: Generic embedding models don't capture real estate-specific semantics. "Updated kitchen" and "renovated kitchen" should be near-synonyms; "cozy" might mean "small."
Solution: Domain-specific fine-tuning using contrastive learning on listing pairs. We collected 50K listing pairs with known similarity relationships and fine-tuned the base embedding model.
// Example contrastive triplets built from labeled pairs
{ anchor: "Modern farmhouse with shiplap walls",
positive: "Contemporary country home with wood paneling",
negative: "Traditional colonial with formal dining" }
{ anchor: "Chef's kitchen with Viking range",
positive: "Gourmet kitchen with professional appliances",
negative: "Galley kitchen with basic appliances" }
8.2 Image Understanding at Scale
Problem: Processing millions of listing photos with CLIP-style models is computationally expensive.
Solution: Tiered processing pipeline:
- Tier 1: Fast classification (exterior/interior/kitchen/bathroom) for all images
- Tier 2: Full embedding for hero images only (first 5 photos)
- Tier 3: On-demand deep analysis when user requests detail
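The tiered routing above can be sketched as follows; `classifyImage` and `embed` stand in for the Tier 1 classifier and the CLIP encoder, and the hero-image cutoff of five comes from the text.

```typescript
// Tiered photo processing: cheap classification for every photo,
// full embedding for hero images only. Callbacks are assumptions
// standing in for the real classifier and CLIP model.
async function processListingPhotos(
  photos: string[],
  classifyImage: (p: string) => Promise<string>,
  embed: (p: string) => Promise<number[]>
) {
  // Tier 1: fast classification for all images
  const labels = await Promise.all(photos.map(classifyImage));
  // Tier 2: full embedding for the first 5 (hero) photos only
  const heroEmbeddings = await Promise.all(photos.slice(0, 5).map(embed));
  // Tier 3 (deep analysis) runs later, on user demand
  return { labels, heroEmbeddings };
}
```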
8.3 Preference Drift
Problem: User preferences change during the search process. Early feedback may not reflect evolved taste.
Solution: Time-weighted preference updates with explicit phase detection:
function getTimeWeight(eventAge: Duration): number {
const hoursSinceEvent = eventAge.toHours();
// Recent events weighted much higher
if (hoursSinceEvent < 24) return 1.0;
if (hoursSinceEvent < 72) return 0.8;
if (hoursSinceEvent < 168) return 0.5;
return 0.3;
}
8.4 Balancing Exploration vs. Exploitation
Problem: Pure preference matching creates filter bubbles. Users miss potentially great options outside their stated preferences.
Solution: Controlled exploration injection:
- 10-15% of results are "stretch" recommendations outside typical matches
- Stretch results are explicitly labeled: "Outside your usual preferences, but..."
- Positive feedback on stretch results significantly updates preference model
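A sketch of the injection step: the 10-15% rate comes from the text, while the evenly spaced interleaving positions are an implementation assumption.

```typescript
// Inject a controlled fraction of "stretch" recommendations into a
// personalized ranking, marking them so the UI can label them.
interface Rec { id: string; stretch: boolean }

function injectStretch(
  matches: Rec[], stretches: Rec[], rate = 0.12
): Rec[] {
  const out = [...matches];
  const count = Math.max(1, Math.round(out.length * rate));
  // Interleave stretch picks at roughly evenly spaced positions
  for (let i = 0; i < count && i < stretches.length; i++) {
    const pos = Math.floor(((i + 1) * out.length) / (count + 1));
    out.splice(pos, 0, { ...stretches[i], stretch: true });
  }
  return out;
}
```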
8.5 Real-Time Inventory Updates
Problem: MLS data updates frequently. Listings go pending, prices change, new properties hit market.
Solution: Streaming ingestion with embedding queue:
// New listing pipeline
mlsStream
.filter(event => event.type === 'NEW_LISTING')
.map(event => enrichListingData(event.listing))
.map(enriched => generateAllEmbeddings(enriched))
.forEach(indexed => notifyMatchingUsers(indexed));
Users with matching preferences receive proactive notifications for new listings that score above threshold.
9. Evaluation & Results
9.1 Search Quality Metrics
Evaluated against traditional filter-based search on 1,000 user search sessions:
| Metric | Filter Search | AI Search | Improvement |
|---|---|---|---|
| Time to first relevant result | 4.2 min | 1.3 min | 3.2x faster |
| Listings viewed before shortlist | 47 | 12 | 74% reduction |
| User-rated relevance (1-5) | 3.1 | 4.3 | 39% higher |
| Search refinement iterations | 6.8 | 2.4 | 65% reduction |
9.2 Semantic Understanding Accuracy
Tested on 500 natural language queries with human-labeled intent:
| Query Type | Accuracy |
|---|---|
| Style/aesthetic preferences | 91% |
| Lifestyle requirements | 87% |
| Location/neighborhood | 94% |
| Complex multi-factor | 82% |
| Overall | 89% |
9.3 Taste Learning Convergence
Measured how quickly the preference model aligns with user's true preferences:
- After 3 interactions: 62% alignment with eventual preferences
- After 5 interactions: 78% alignment
- After 10 interactions: 91% alignment
Most users reach stable preference models within 5-7 interactions.
9.4 User Satisfaction
Post-session survey results (n=200):
| Question | Score (1-5) |
|---|---|
| "The system understood what I was looking for" | 4.4 |
| "Recommendations improved over time" | 4.2 |
| "I found properties I wouldn't have found with filters" | 4.6 |
| "I would use this over traditional search" | 4.5 |
9.5 Qualitative Feedback
- "It actually understood 'modern but warm'—that's never worked before"
- "I didn't know I wanted a courtyard until it showed me one"
- "The explanations helped me understand my own preferences better"
- "Finally, a search that learns instead of making me start over"
10. Conclusion
This system demonstrates that the future of real estate search lies not in more filters, but in deeper understanding. By combining conversational AI, multi-modal embeddings, hybrid retrieval, and continuous taste learning, we've created a property discovery experience that mirrors the intuition of a skilled human agent.
The results validate the approach: 3.2x faster time-to-relevance, 74% fewer listings viewed before shortlisting, and user satisfaction scores significantly higher than traditional filter-based search. More importantly, users report discovering properties they never would have found through conventional means.
Key technical contributions include:
- A multi-modal embedding architecture that captures the full dimensionality of what makes a home desirable
- Hybrid retrieval combining the precision of structured queries with the nuance of semantic search
- A taste learning engine that builds accurate preference models from minimal explicit feedback
- Explainable recommendations that help users understand—and refine—their own preferences
The real estate industry has long been ripe for AI transformation. This system represents a meaningful step toward that future: technology that doesn't just process listings faster, but fundamentally understands what home buyers are looking for—even when they can't fully articulate it themselves.