Tiered Inference Pipeline

Moderation that is fast, multilingual, and explains itself.

95% of traffic resolved in under 10ms by a cheap classifier. Ambiguous cases get escalated to a local LLM for nuanced judgment. Scores, reasoning, and model version on every call.

System Operational

HushSafe Console

REQS/S

847

P95

8ms

BLOCKED

2.4%

hate

0.12

harassment

0.78

sexual

0.03

violence

0.08

threat

0.44

Decision: BLOCKtier_reached: 2

<10ms

P95 Latency

Tier 1

Four Tiers. One API Call.

95% of traffic never touches the expensive model. Only genuinely ambiguous cases get escalated — giving you Hive-level accuracy at 1/10th the compute cost.

Tier 0

Deterministic Shortcut

<1ms · ~60% of traffic

Unicode normalization, leet-speak expansion, emoji mapping, hard blocklist, and response caching. Bypasses GPU entirely.

Tier 1

Fast Classifier

5–15ms · ~35% of traffic

multilingual-toxic-xlm-roberta, INT8 quantized on ONNX Runtime. 7 sigmoid heads — hate, harassment, sexual, self-harm, violence, spam, threat. 8 languages native.

Tier 2

Main Model

15–30ms · ~4% of traffic

Multilingual E5-small + multi-head classifier. Context-aware — handles "kill this level" vs "kill you".

Tier 3

LLM Judge

100–250ms · ~1% of traffic

Llama 3.1 8B INT4 via vLLM. Structured JSON output with grammar-constrained decoding. Handles conversation context for grooming detection.

Technical Differentiators

Built Different. Priced Fair.

Six capabilities that separate HushSafe from every incumbent — with transparent pricing on each.

Premium

Conversation Context

Send last N messages for grooming escalation and evolving harassment detection. Hive can't do this.

Growth+

Custom Vocabulary

Upload platform-specific phrases — gaming terms, clinical language — as embeddings injected into inference. 30-minute turnaround.

Scale

Per-Customer Fine-Tuning

Your feedback trains a LoRA adapter (~4MB) hot-swapped at inference. Your content, your calibration, same API.

All Plans

Explanation-Grade Output

Every decision includes machine-readable reason + plain-English explanation. Critical for support, appeals, and compliance.

Self-serve

Data-Residency Toggle

Single API flag: "region": "eu" | "us" | "in". Routes to that region's model. Data never leaves that geography.

Transparency

Open Eval Harness

Published accuracy benchmarks on public leaderboard. Monthly re-runs on golden set + HateCheck. Hive never publishes numbers.

POST /v1/moderate — Response

{
  "request_id": "req_01HX...",
  "decision": "BLOCK",
  "scores": {
    "hate": 0.12,
    "harassment": 0.78,
    "sexual": 0.03,
    "self_harm": 0.00,
    "violence": 0.08,
    "spam": 0.00,
    "threat": 0.44
  },
  "primary_category": "harassment",
  "confidence": 0.78,
  "tier_reached": 2,
  "model_version": "tier2-v4.2.1",
  "explanation": "High confidence for harassment category. Pattern consistent with targeted personal attack.",
  "latency_ms": 18,
  "billable": true
}

API Response

Every decision explains itself.

Explanation Field

Plain-English reasoning for every decision. Template-based at Tier 2, LLM-generated at Tier 3. Worth 2x the price for any team answering "why was my message blocked?" tickets.

Full Transparency

Scores for all 7 categories, the exact model version, tier reached, and latency — all in one response. Hive returns only numbers. You return numbers + reasoning.

View full API documentation

Pricing

Pay for what you use. Nothing more.

60% of traffic never touches GPU. That cost structure is your savings.

Starter

Free1K requests/day

Try the playground. No card required.

1,000 free requests per day
All 7 category scores
Explanation on every call
Community support
p95 < 50ms latency

Growth

$0.000425per request

Hive-level accuracy at ~1/10th the cost.

Unlimited requests
Conversation context mode
Custom vocabulary injection
Webhook integrations
Priority support
Data-residency toggle
99.9% uptime SLA

Scale

$299per month

Per-customer fine-tuning. Dedicated infra.

Everything in Growth
Per-customer LoRA fine-tuning
Dedicated model adapter
Multi-region routing
Open eval harness access
Dedicated account manager
Custom SLA

Design Partners

Be the first to moderate smarter.

Join 47 teams on the design partner waitlist. Early access, co-shaped pricing.

Gaming platform

12M messages/day, currently on Hive

Need: Cost reduction + subtle toxicity catch rate

Mental health app

Needs self-harm detection in 6 languages

Need: Multilingual + nuanced context understanding

B2B SaaS

EU data residency + compliance audit trail

Need: GDPR-compliant, per-request reasoning output

AccuracyCostSpeedCoverage

Why HushSafe

The architecture is the advantage.

10× cheaper than Hive

60% of traffic bypasses GPU entirely. 35% hits the cheapest model. Your cost floor is $0.00004 average per request.

Multilingual by default

Multilingual E5-small handles 8+ languages natively. No per-language model required — one pipeline for global content.

Explains every decision

Scores, reasoning, model version, and latency on every call. Your support team and compliance auditors will thank you.

Improves with your feedback

Customer feedback loop auto-trains LoRA adapters weekly. Your data, your calibration — deployed via canary rollout.

Get Started

Start moderating in 5 minutes.

One API endpoint. Seven category scores. Plain-English explanations. Free to start.

No credit card required · 1K free requests/day