How Ormah Works
Search and Ranking
Content verified · 2026-04-07
Ormah search is a hybrid pipeline that combines:
- FTS5 keyword retrieval
- vector similarity retrieval
- Reciprocal Rank Fusion
- post-retrieval score shaping
- optional graph-based spreading activation
Main Search Path
Code: src/ormah/embeddings/hybrid_search.py, src/ormah/engine/memory_engine.py
flowchart TB
QUERY[query] --> FTS[FTS5 retrieval]
QUERY --> VEC[vector retrieval]
FTS --> RRF[weighted RRF]
VEC --> RRF
RRF --> BLEND[blend RRF with raw similarity]
BLEND --> TITLE[title match boost]
TITLE --> CONF[confidence factor]
CONF --> BOOSTS[tier factor + recency + access boosts]
BOOSTS --> SPACE[space scoring]
SPACE --> ACT[optional spreading activation]
ACT --> FORMAT[formatted or structured results]
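The fusion step in the pipeline above can be sketched as weighted Reciprocal Rank Fusion. This is an illustrative sketch, not the Ormah implementation: the function name and the k constant (60 is the conventional RRF default) are assumptions.

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked candidate lists with weighted Reciprocal Rank Fusion.

    ranked_lists: lists of node ids, best first (e.g. FTS hits, vector hits).
    weights: one weight per list (Ormah scales these in question mode).
    """
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, node_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every hit.
            scores[node_id] = scores.get(node_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

fts_hits = ["a", "b", "c"]
vec_hits = ["b", "d", "a"]
fused = weighted_rrf([fts_hits, vec_hits], weights=[1.0, 1.0])
```

Note how "b" wins the fusion despite never ranking first in either list: RRF rewards agreement across retrievers rather than any single retriever's score magnitude.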
Candidate Pool Size
Ormah does not retrieve only the final limit immediately. It first gathers a larger pool of possible matches, then filters, reranks, and trims that pool down to the final result set.
By default, search retrieves up to 3 x limit candidates from each retrieval path.
When temporal filters like created_after or created_before are present, search widens that pool to 10 x limit. This gives the post-filter enough recent candidates to work with after older matches are removed.
Example: if limit=10, a normal query considers up to 30 initial matches from FTS and vector search, while a temporal query considers up to 100.
This widening is tied to temporal filtering, not to question detection.
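The pool-sizing rule above is simple enough to state as a sketch; the function is illustrative, but the 3x and 10x multipliers match the behavior described here.

```python
def candidate_pool_size(limit, has_temporal_filter):
    """Per-path retrieval pool: 3x limit normally, 10x with temporal filters.

    Illustrative helper; Ormah widens the pool only for temporal filters
    (created_after / created_before), never for question detection.
    """
    multiplier = 10 if has_temporal_filter else 3
    return limit * multiplier
```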
FTS + Vector Retrieval
FTS
FTS search uses sanitized token queries and can inject about_self when identity-style tokens are present in the query.
Vector
Vector search encodes the query, retrieves nearest neighbors, and drops candidates below similarity_threshold.
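The threshold filter on vector candidates can be sketched as below. The default threshold value is a placeholder; Ormah reads similarity_threshold from its configuration.

```python
def filter_by_similarity(candidates, similarity_threshold=0.3):
    """Drop vector candidates whose similarity falls below the threshold.

    candidates: list of (node_id, similarity) pairs from nearest-neighbor
    retrieval. The 0.3 default is illustrative, not Ormah's actual config.
    """
    return [(nid, sim) for nid, sim in candidates if sim >= similarity_threshold]
```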
Question Queries
Question-like queries still get special weighting:
- FTS weight scaled by question_fts_weight_scale
- vector weight scaled by question_vector_weight_scale
- similarity blend weight increased via question_similarity_blend_weight
- title match boost disabled for question queries
But the candidate-pool multiplier stays tied to temporal filters, not to question mode.
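The question-mode adjustments can be sketched as a weight-rewriting step. The scale values used as defaults here are placeholders: Ormah takes them from question_fts_weight_scale, question_vector_weight_scale, and question_similarity_blend_weight in its configuration.

```python
def question_weights(fts_weight, vector_weight, blend_weight, is_question,
                     fts_scale=0.5, vector_scale=1.5, question_blend=0.6):
    """Rescale retrieval weights for question-like queries.

    Non-question queries pass through unchanged. The scale defaults are
    illustrative placeholders, not Ormah's actual config values.
    """
    if not is_question:
        return fts_weight, vector_weight, blend_weight
    # Question mode: lean harder on vector retrieval and raw similarity.
    return fts_weight * fts_scale, vector_weight * vector_scale, question_blend
```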
Blend and Score Shaping
1. RRF + raw similarity blend
Ormah first normalizes the fused RRF score, then blends it with raw vector similarity. This matters because RRF preserves ranking agreement between retrievers, but discards score magnitude.
For nodes that have vector similarity, the score becomes:
final_score = (1 - similarity_blend_weight) * normalized_rrf + similarity_blend_weight * raw_sim
Before raw vector similarity is blended back in, Ormah applies a long-document penalty:
- it looks up length(content) for candidate nodes
- if content_len > length_penalty_threshold, it scales raw similarity by:
penalty = max(0.1, length_penalty_threshold / content_len)
raw_sim *= penalty
Current default:
length_penalty_threshold = 300
Why this exists:
- long documents often get middling similarity to many different queries because their embeddings average over multiple topics
- without this penalty, broad architecture docs can outrank short, specific memories too easily
This penalty affects the raw vector similarity contribution, not BM25 / FTS ranking directly.
If a result is FTS-only and has no vector similarity, Ormah does not blend. It dampens the RRF score instead.
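Putting the blend, the length penalty, and the FTS-only path together, a minimal sketch looks like this. The blend-weight default and the 0.8 dampening factor for FTS-only results are assumptions for illustration; the formulas themselves follow the ones above.

```python
def blend_score(normalized_rrf, raw_sim, content_len,
                similarity_blend_weight=0.5, length_penalty_threshold=300):
    """Blend normalized RRF with raw similarity, penalizing long documents.

    raw_sim is None for FTS-only hits, which are dampened rather than
    blended. Blend weight and dampening factor are illustrative defaults.
    """
    if raw_sim is None:
        # FTS-only result: no similarity signal, so dampen the fused score.
        return normalized_rrf * 0.8
    if content_len > length_penalty_threshold:
        # Long-document penalty: scale similarity down, floored at 0.1.
        raw_sim *= max(0.1, length_penalty_threshold / content_len)
    return (1 - similarity_blend_weight) * normalized_rrf \
        + similarity_blend_weight * raw_sim
```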
2. Title boost
Title overlap can increase score for non-question queries.
3. Confidence factor
The current confidence adjustment is multiplicative:
confidence_factor = 0.4 + 0.6 * confidence
adjusted_score = base_score * confidence_factor
There is no separate multiplicative importance_factor in the current hybrid search implementation.
4. Tier, recency, and access
Current behavior:
- tier boost is implemented as a multiplicative factor on the adjusted score
- recency is an additive proportional bonus
- access is an additive proportional bonus using
log1p(count) / log1p(20)
That means older docs describing tier as purely additive and access normalized by 50 are stale.
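The shaping steps in sections 3 and 4 can be combined into one sketch. The confidence factor and the log1p(count) / log1p(20) access normalization come from the text above; the tier factor, recency bonus, and the 0.1 access scale are hypothetical inputs for illustration.

```python
import math

def shape_score(base_score, confidence, tier_factor, recency_bonus, access_count):
    """Apply confidence, tier, recency, and access adjustments to a score.

    Confidence and tier are multiplicative; recency and access are additive
    proportional bonuses, matching the behavior described above. tier_factor,
    recency_bonus, and the 0.1 access scale are illustrative placeholders.
    """
    adjusted = base_score * (0.4 + 0.6 * confidence)  # confidence factor
    adjusted *= tier_factor                           # multiplicative tier boost
    adjusted += adjusted * recency_bonus              # additive recency bonus
    access_norm = math.log1p(access_count) / math.log1p(20)
    adjusted += adjusted * 0.1 * access_norm          # additive access bonus
    return adjusted
```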
Space Scoring
After hybrid search, MemoryEngine._apply_space_scores() rescales results:
- same project space: full score
- global (space is None): space_boost_global
- other space: space_boost_other
Current defaults:
space_boost_global = 1.0
space_boost_other = 0.6
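The space rescaling rule is small enough to sketch directly; the function name is illustrative (the real logic lives in MemoryEngine._apply_space_scores()), but the defaults match the ones listed above.

```python
def apply_space_score(score, result_space, current_space,
                      space_boost_global=1.0, space_boost_other=0.6):
    """Rescale a result's score based on its memory space.

    Illustrative sketch of MemoryEngine._apply_space_scores() behavior;
    defaults match the documented space_boost_global / space_boost_other.
    """
    if result_space == current_space:
        return score                        # same project space: full score
    if result_space is None:
        return score * space_boost_global   # global memory
    return score * space_boost_other        # some other project's space
```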
Spreading Activation
Code: src/ormah/engine/memory_engine.py:_spread_activation()
Search results can be enriched by traversing graph edges outward from the top seed hits.
Important implementation details:
- top activation_seed_count hits are used as seeds
- up to activation_max_per_seed neighbors are added per seed
- base activation uses seed_score * edge_weight * edge_type_factor * activation_decay
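The base-activation formula can be sketched as follows. The edge-type factors match the ones documented in this page's Edge-Type Factors section; the activation_decay default is a placeholder, since Ormah reads it from configuration.

```python
# Factors as documented for search-time activation (not stored edge weights).
EDGE_TYPE_FACTORS = {"supports": 1.0, "related_to": 0.7, "contradicts": 0.4}

def activation_score(seed_score, edge_weight, edge_type, activation_decay=0.5):
    """Base activation for a neighbor reached by traversing one edge from a seed.

    Illustrative sketch of _spread_activation()'s scoring; the decay default
    is an assumption, not Ormah's actual config value.
    """
    return seed_score * edge_weight * EDGE_TYPE_FACTORS[edge_type] * activation_decay
```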
Result labels
Activated results are labeled:
- source="activated" for normal edge traversal
- source="conflict" when reached via a contradicts edge
Older docs that describe these as source="graph" are inaccurate.
Edge-Type Factors
Current edge-type factors include:
supports = 1.0
related_to = 0.7
contradicts = 0.4
Note that the 0.4 here is an activation factor used during search enrichment. It is not the stored edge weight written by the conflict detector.
Search Example
Prompt: what database does ormah use?
- question mode is detected
- FTS finds nodes mentioning database choices
- vector search finds semantically similar architecture notes
- RRF merges the ranked lists
- raw similarity is blended back in
- title match boost is disabled because this is a question
- confidence, tier, recency, and access adjust scores
- current project and global memories are favored over other spaces
- spreading activation may add directly connected supporting nodes
Mental model: search is not "FTS then vector then graph". It is a fused ranking pipeline with graph enrichment added on top.
Code Anchors
src/ormah/embeddings/hybrid_search.py
src/ormah/index/graph.py
src/ormah/engine/memory_engine.py