Dimension collapse without semantic loss. Cross-lingual re-ranking. Active rejection of garbage. The structural flaw in vector search, exploited.
from arbiter_engine import rank

r = rank(
    "selective COX-2 inhibition without GI toxicity",
    ["sulfonamide at C-5", "methyl sulfone", "carboxylic acid"],
)

print(r.top.text)   # sulfonamide at C-5
print(r.top.score)  # 0.659

# Iterate all results
for text, score in r:
    print(f"{score:.3f}  {text}")
Vector search has a structural flaw. High dimensions cost too much. Low dimensions lose semantics.
ARBITER exploits the gap. 72 dimensions. Cross-lingual intact. Polysemy resolved.
This is the product.
Standard embeddings: 768-1536 dimensions, 400MB-2GB models, cloud-only. ARBITER: 72 dimensions, 26MB learned weights (3.8GB full deploy), runs anywhere. The impossible part? Cross-lingual and polysemy disambiguation survive the collapse.
Retrieval gets 1000 candidates fast. Re-ranking picks the best 10. The problem: re-ranking is expensive. The solution: 72-dim vectors that preserve semantic quality.
The 97% Precision@3 means enterprises don't choose between speed (re-rank 100) and quality (re-rank 1000). They get both.
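The retrieve-then-rerank shape described above can be sketched in a few lines. Everything here is a stand-in for illustration: `coarse_retrieve` and the token-overlap scorer are toy components, not ARBITER's semantic model.

```python
# Two-stage pipeline sketch: cheap retrieval over the full corpus, then
# re-ranking of the survivors. The scorer is a trivial token-overlap
# stand-in for illustration, NOT ARBITER's 72-dim semantic scoring.

def coarse_retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Stage 1: keep the k candidates sharing the most tokens with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rerank(query: str, candidates: list[str], top: int) -> list[tuple[float, str]]:
    """Stage 2: score each survivor more carefully and keep the best `top`."""
    q = set(query.lower().split())
    scored = [(len(q & set(c.lower().split())) / max(len(q), 1), c) for c in candidates]
    return sorted(scored, reverse=True)[:top]

corpus = [
    "selective COX-2 inhibition mechanisms",
    "garbage collection in the JVM",
    "COX-2 inhibition without GI toxicity",
]
hits = coarse_retrieve("COX-2 inhibition without GI toxicity", corpus, k=2)
best = rerank("COX-2 inhibition without GI toxicity", hits, top=1)
print(best[0][1])  # COX-2 inhibition without GI toxicity
```

The economics of the two stages are the point: stage 1 can be crude because stage 2 fixes the ordering, and stage 2 can be cheap when the vectors are small.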
Same word appears everywhere. Only one answer is right. ARBITER knows. Negative scores mean active rejection—not just ranked low, but pushed out.
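The rejection semantics can be made concrete with a small sketch. The scores below are toy values, not ARBITER output; only the convention is from the text: below zero means excluded, not merely ranked last.

```python
# Sketch of "active rejection": candidates scoring below zero are not
# merely sorted to the bottom, they are dropped from the result set.
# Scores are illustrative toy values, not ARBITER output.

def rank_with_rejection(scored: list[tuple[str, float]]) -> tuple[list, list]:
    """Split candidates into accepted (score >= 0, best first) and rejected."""
    accepted = sorted([c for c in scored if c[1] >= 0], key=lambda c: -c[1])
    rejected = [c for c in scored if c[1] < 0]
    return accepted, rejected

scored = [
    ("sulfonamide at C-5", 0.659),
    ("methyl sulfone", 0.412),
    ("carboxylic acid", -0.183),  # pushed out, not just ranked low
]
accepted, rejected = rank_with_rejection(scored)
print([t for t, _ in accepted])  # ['sulfonamide at C-5', 'methyl sulfone']
print([t for t, _ in rejected])  # ['carboxylic acid']
```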
You have a thought. You translate it to keywords. You get results. You translate back to meaning. You refine. Translate again. ARBITER removes the translation step.
The doctor's query gets the doctor's answer.
Same candidates. Same model. Same cost.
Re-ranking is disambiguation.
Disambiguation works everywhere.
From the battlefield to the boardroom.
The same primitive that fixes your vector search also does this—
Zero pharmaceutical training. Sub-second response. The modification that became a three-billion-dollar drug.
OMIN system. Multi-layer air defense. Sub-second allocation recommendations across interceptor types and threat priorities.
Same word. Completely different threat level. Full context in the query. No security-specific training. 20/20 validated.
Single CJK character. No context. No translation API. Semantic geometry doesn't care what script you write in.
Same candidates. Different context. Complete ranking reversal.
No signup. No API key. Just paste and run.
11 query categories. 110 candidates. The one failure: "Java island garbage collection" scored 0.800—a maliciously constructed keyword trap. The 97% is the headline.
Hard Negative = lowest irrelevant score. Negative scores = active rejection. All tests reproducible via public API.
Your vector database costs scale with dimensions. ARBITER cuts 286GB to 26.8GB while maintaining 97% precision. Same queries. Better results.
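The storage claim is simple arithmetic: index size scales linearly with dimension. The sketch below assumes float32 vectors (4 bytes per dimension) and back-solves a hypothetical corpus size from the quoted 286 GB figure; all sizes are illustrative.

```python
# Storage scales linearly with dimension. Assuming float32 (4 bytes/dim)
# and a corpus size back-solved from the quoted 286 GB at 768 dims,
# collapsing to 72 dims yields the ~26.8 GB footprint.

BYTES_PER_FLOAT = 4
n_vectors = int(286e9 / (768 * BYTES_PER_FLOAT))  # ~93 million vectors

def index_size_gb(n: int, dims: int) -> float:
    """Raw vector storage in GB for n vectors of the given dimension."""
    return n * dims * BYTES_PER_FLOAT / 1e9

before = index_size_gb(n_vectors, 768)
after = index_size_gb(n_vectors, 72)
print(f"{before:.0f} GB -> {after:.1f} GB ({before / after:.1f}x smaller)")
# 286 GB -> 26.8 GB (10.7x smaller)
```

The 10.7x factor is just 768/72; any dimension cut passes straight through to storage, memory bandwidth, and distance-computation cost.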
Slots between retrieval and generation. Three lines of code. No infrastructure changes.
For funded startups. Enterprise pricing after $1M ARR.
Apply for Startup License
Starting at $500,000 annually. Self-hosted. Air-gapped. 3.8GB footprint runs on your infrastructure—data center, edge, or secure facility.
Vector DB customers: 90% storage reduction, guaranteed by 72-dim fidelity.
Search engines: Billions in compute savings via cheaper, faster re-ranking.
Defense/Edge: Multi-sensor fusion. Entity disambiguation under uncertainty. C2 decision support. Air-gapped deployment.
Currently under evaluation for defense and life sciences applications.
Contact for Strategic Briefing