Hatch Scoring

How the six signals work, how they're weighted, and why preliminary scores get re-weighted.

The six signals

Signal	Weight	Source	Status
Meme	25%	Anthropic Claude — tool-use on pitch + ticker	✅ live
Creator	20%	Bitquery — wallet age, tx count, rug history	⛔ stub (key pending)
Image	15%	Anthropic Claude Vision — 5-band rubric	✅ live
Name	10%	Deterministic — memorability, phonetics	✅ live
Social	15%	X handle lookup + heuristics	✅ live (lightweight)
Risk	15%	GoPlus — honeypot, tax, blacklist, owner rights	⛔ stub (key pending)

Weights sum to 100%. Changing them bumps the prompt version and breaks comparability with historical scores.

The aggregate

When all six are live

aggregate = Σ (signal_score × weight)

When one or more are stubbed (preliminary)

Re-weighted over live signals only:

live_weight_total = Σ weight for signals where stub = false
aggregate = Σ (live_signal_score × weight / live_weight_total)

This is the honest-aggregate rule. A 50%-real number shared at 100% confidence is worse than no number.
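The rule can be sketched in TypeScript as follows. This is a minimal illustration, not the service code: the `computeAggregate` name and the `SignalResult` shape are assumptions, and only the weights come from the table above.

```typescript
type SignalName = "meme" | "creator" | "image" | "name" | "social" | "risk";

interface SignalResult {
  score: number; // 0-100
  stub: boolean; // true when the signal is a placeholder
}

// Weights from the signal table; they sum to 1.
const WEIGHTS: Record<SignalName, number> = {
  meme: 0.25,
  creator: 0.2,
  image: 0.15,
  name: 0.1,
  social: 0.15,
  risk: 0.15,
};

// Hypothetical helper: weighted mean over live (non-stub) signals only.
function computeAggregate(signals: Record<SignalName, SignalResult>): number {
  const names = Object.keys(WEIGHTS) as SignalName[];
  const live = names.filter((n) => !signals[n].stub);
  const liveWeightTotal = live.reduce((sum, n) => sum + WEIGHTS[n], 0);
  if (liveWeightTotal === 0) {
    throw new Error("no live signals to aggregate");
  }
  return live.reduce(
    (sum, n) => sum + (signals[n].score * WEIGHTS[n]) / liveWeightTotal,
    0,
  );
}

// Illustrative input: creator and risk are stubbed, so 65% of the weight is live.
const example: Record<SignalName, SignalResult> = {
  meme: { score: 80, stub: false },
  creator: { score: 50, stub: true },
  image: { score: 60, stub: false },
  name: { score: 70, stub: false },
  social: { score: 40, stub: false },
  risk: { score: 50, stub: true },
};
const aggregate = computeAggregate(example); // (20 + 9 + 7 + 6) / 0.65 ≈ 64.6
```

When every signal is live, `liveWeightTotal` is 1 and the function reduces to the plain weighted sum, so one code path covers both cases.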

Bands

Band	Aggregate	Meaning
🟢 Green	≥ 70	Strong. Multiple signals green, no red flags.
🟡 Amber	45–69	Mixed. Iterate on the weakest signal.
🔴 Red	< 45	Weak. Likely to stall or rug without intervention.

Bands map directly to seed-LP tier decisions (Sprint E.1) — green tokens get the most seed, red get the least (or none).
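A minimal sketch of the band lookup, using the thresholds from the table above. The `bandFor` name is hypothetical, and whether the thresholds apply before or after rounding the aggregate is an assumption here (this version takes the value as given).

```typescript
type Band = "green" | "amber" | "red";

// Thresholds from the bands table: >= 70 green, 45 up to 70 amber, below 45 red.
function bandFor(aggregate: number): Band {
  if (aggregate >= 70) return "green";
  if (aggregate >= 45) return "amber";
  return "red";
}
```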

Preliminary flag

A result carries confidence: 'preliminary' whenever any signal has stub: true.

What a preliminary flag blocks

On-chain attestation (publisher refuses regardless of env).
Leaderboard inclusion (/leaderboards/today).
Percentile denominator (preliminary rows don't count in the cohort).
Public creator feed top stats.

What it does NOT block

Sharing the score URL — the "Preliminary" badge travels with the OG image.
Enrollment + scheduling (creator can still sign a commitment).
Re-scoring once keys land.
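The two lists above amount to a small gate. The sketch below is illustrative only: the action names and the `isAllowed` helper are hypothetical and not taken from the codebase.

```typescript
interface SignalResult {
  score: number;
  stub: boolean;
}

// Names of the stubbed signals; a non-empty list means the score is preliminary.
function stubbedSignals(signals: Record<string, SignalResult>): string[] {
  return Object.entries(signals)
    .filter(([, result]) => result.stub)
    .map(([name]) => name);
}

// Hypothetical action names covering the blocked and not-blocked lists.
type Action =
  | "attest"
  | "leaderboard"
  | "percentile"
  | "creatorFeedStats"
  | "share"
  | "enroll"
  | "rescore";

const BLOCKED_WHEN_PRELIMINARY: ReadonlySet<Action> = new Set<Action>([
  "attest",
  "leaderboard",
  "percentile",
  "creatorFeedStats",
]);

// Sharing, enrollment, and re-scoring are never blocked by the flag.
function isAllowed(action: Action, signals: Record<string, SignalResult>): boolean {
  const preliminary = stubbedSignals(signals).length > 0;
  return !(preliminary && BLOCKED_WHEN_PRELIMINARY.has(action));
}
```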

Why re-score is a new UUID

The re-score button replays the stored submission and returns a new UUID. This is deliberate:

Historical share URLs keep pointing at the original result.
Old OG images stay cacheable.
The re-scored row enters the percentile denominator on its own merits.
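A sketch of the replay-into-a-new-row behaviour, assuming an in-memory map in place of the real store. `saveScore`, `rescore`, and the row shape are hypothetical names used for illustration.

```typescript
import { randomUUID } from "node:crypto";

interface Submission {
  name: string;
  symbol: string;
}

interface ScoreRow {
  id: string;
  submission: Submission;
  aggregate: number;
}

// In-memory stand-in for the results table.
const rows = new Map<string, ScoreRow>();

function saveScore(submission: Submission, aggregate: number): ScoreRow {
  const row: ScoreRow = { id: randomUUID(), submission, aggregate };
  rows.set(row.id, row);
  return row;
}

// Replays the stored submission through scoreFn and stores the result under a
// new id. The original row is never mutated, so old share URLs stay valid.
function rescore(originalId: string, scoreFn: (s: Submission) => number): ScoreRow {
  const original = rows.get(originalId);
  if (!original) throw new Error(`unknown score id: ${originalId}`);
  return saveScore(original.submission, scoreFn(original.submission));
}
```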

The scoring request

POST /v1/score
Content-Type: application/json

{
  "name": "Yolk",
  "symbol": "YOLK",
  "description": "Breakfast token. Unserious about price, serious about eggs.",
  "imageUrl": "https://cdn.fourmeme.com/yolk.png",
  "xHandle": "@yolktoken",
  "creatorAddress": "0x1234..."
}
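A client-side sketch of the same request in TypeScript. The field list mirrors the example body; which fields are optional is not stated here, so all are typed as required. `buildScoreRequest` and the host in the usage comment are hypothetical.

```typescript
interface ScoreRequest {
  name: string;
  symbol: string;
  description: string;
  imageUrl: string;
  xHandle: string;
  creatorAddress: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the fetch arguments without sending anything; baseUrl is caller-supplied.
function buildScoreRequest(baseUrl: string, body: ScoreRequest): BuiltRequest {
  return {
    url: new URL("/v1/score", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage (placeholder host):
// const { url, init } = buildScoreRequest("https://api.example.com", payload);
// const response = await fetch(url, init);
```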

Response shape

{
  "id": "a1b2c3d4-...",
  "aggregate": 73,
  "band": "green",
  "hasStubs": true,
  "confidence": "preliminary",
  "stubbedSignals": ["creator", "risk"],
  "signals": {
    "meme": { "score": 78, "reason": "Original food-meme hook.", "stub": false },
    "creator": { "score": 50, "reason": "Stub — awaiting Bitquery key.", "stub": true },
    "image": { "score": 82, "reason": "Bright yolk on clean background.", "stub": false },
    "name": { "score": 75, "reason": "Short, phonetic, easy to type.", "stub": false },
    "social": { "score": 55, "reason": "New handle, low follower count.", "stub": false },
    "risk": { "score": 60, "reason": "Stub — awaiting GoPlus key.", "stub": true }
  },
  "explanation": {
    "summary": "Strong meme + image carry the aggregate; social is the weakest live signal.",
    "contributions": [
      { "signal": "meme", "weight": 0.25, "score": 78, "contribution": 19.5 },
      ...
    ]
  },
  "promptVersion": "meme@1.0.0",
  "createdAt": "2026-04-18T12:34:56Z"
}

See SDK types for the exhaustive schema.

How each signal works

Meme (Claude tool-use)

Prompt at apps/api/src/modules/scoring/prompts/meme-v1.0.0.ts.

Asks Claude to call the emit_meme_score(score, reason, confidence) tool, given the pitch and ticker. Rubric: 90 for genuinely fresh memes; 50 for copycats; <30 for pure bot-bait.
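For illustration, a tool definition in the shape the Anthropic Messages API accepts (`name`, `description`, `input_schema`), plus a validator for the returned tool input. The description text, the 0-100 and 0-1 ranges, and `parseMemeScore` are assumptions; the real prompt file may differ.

```typescript
// Tool definition; the field ranges are assumed, not copied from the prompt file.
const emitMemeScoreTool = {
  name: "emit_meme_score",
  description: "Report the meme-quality score for a token pitch and ticker.",
  input_schema: {
    type: "object",
    properties: {
      score: { type: "integer", minimum: 0, maximum: 100 },
      reason: { type: "string" },
      confidence: { type: "number", minimum: 0, maximum: 1 },
    },
    required: ["score", "reason", "confidence"],
  },
} as const;

// Forces the model to answer through the tool instead of free text.
const toolChoice = { type: "tool", name: emitMemeScoreTool.name } as const;

interface MemeScore {
  score: number;
  reason: string;
  confidence: number;
}

// Validates the `input` of a tool_use content block before trusting it.
function parseMemeScore(input: unknown): MemeScore {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool input must be an object");
  }
  const { score, reason, confidence } = input as Record<string, unknown>;
  if (typeof score !== "number" || !Number.isInteger(score) || score < 0 || score > 100) {
    throw new Error("score must be an integer in 0-100");
  }
  if (typeof reason !== "string" || reason.length === 0) {
    throw new Error("reason must be a non-empty string");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number in 0-1");
  }
  return { score, reason, confidence };
}
```

Validating the tool input separately keeps a malformed model response from reaching the aggregate.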

Creator (Bitquery, pending)

Will query the wallet's BNB Chain history: age of first transaction, total transaction count, tokens launched, and rug incidents. Until the Bitquery key lands, the signal returns a stub score of 50 with a clear reason string.

Image (Claude Vision)

An SSRF-hardened fetcher pulls the image: HTTPS only, DNS private-IP blocklist, 5 MiB cap, 10 s timeout, and redirect: error. See ADR 0005.

The prompt asks Claude Vision to score visibility, readability, and brand coherence on a 5-band rubric that runs from "unreadable" (20) to "exceptional" (90+).
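The fetcher constraints in this section can be sketched with Node's built-in `fetch` and `dns` modules. This is a simplified illustration, not the code described in ADR 0005: `fetchImage` and `isBlockedIPv4` are hypothetical names, the IPv4 blocklist is not exhaustive, IPv6 targets are rejected outright, and a production fetcher would also pin the resolved address to guard against DNS rebinding.

```typescript
import { lookup } from "node:dns/promises";

const MAX_BYTES = 5 * 1024 * 1024; // 5 MiB cap
const TIMEOUT_MS = 10_000; // 10 s timeout

// True for unspecified, loopback, private, and link-local IPv4 ranges.
function isBlockedIPv4(ip: string): boolean {
  const [a = -1, b = -1] = ip.split(".").map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

async function fetchImage(rawUrl: string): Promise<Uint8Array> {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:") throw new Error("https only");

  // Resolve the host first and refuse any non-public target.
  const addresses = await lookup(url.hostname, { all: true });
  for (const { address, family } of addresses) {
    if (family !== 4 || isBlockedIPv4(address)) throw new Error("blocked address");
  }

  const res = await fetch(url, {
    redirect: "error", // a redirect could point at an internal host
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });
  if (!res.ok || !res.body) throw new Error(`fetch failed: ${res.status}`);

  // Stream the body so the cap holds even if Content-Length is missing or wrong.
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    total += result.value.byteLength;
    if (total > MAX_BYTES) {
      await reader.cancel();
      throw new Error("image exceeds 5 MiB");
    }
    chunks.push(result.value);
  }

  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```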

Name (deterministic)

Rules:

Length penalty above 12 chars.
Alphabetic ratio, hyphen/underscore friendliness.
Consonant cluster penalty (anti-tonguetwister).
Capitalization consistency.

Pure functions in apps/api/src/modules/scoring/signals/name.ts.
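To show the shape of such a pure function, here is a sketch that applies the four rules above. Every point value and threshold except the 12-character limit is invented for illustration, as is the `scoreName` name, and "y" is treated as a vowel; the real scoring in name.ts will produce different numbers.

```typescript
// Hypothetical scorer: starts at 100 and subtracts penalties per rule.
function scoreName(name: string): number {
  let score = 100;

  // Length penalty above 12 characters (5 points per extra character, assumed).
  if (name.length > 12) score -= (name.length - 12) * 5;

  // Alphabetic ratio; hyphens and underscores count as friendly separators.
  const letters = (name.match(/[A-Za-z]/g) ?? []).length;
  const separators = (name.match(/[-_]/g) ?? []).length;
  const ratio = name.length === 0 ? 0 : (letters + separators) / name.length;
  score -= Math.round((1 - ratio) * 40);

  // Consonant cluster penalty: any run of four or more consonants.
  if (/[bcdfghjklmnpqrstvwxz]{4,}/i.test(name)) score -= 15;

  // Capitalization consistency: all lower, all upper, or Capitalized.
  const lettersOnly = name.replace(/[^A-Za-z]/g, "");
  if (lettersOnly.length > 0 && !/^([a-z]+|[A-Z]+|[A-Z][a-z]+)$/.test(lettersOnly)) {
    score -= 10;
  }

  // Clamp to the 0-100 range used by every signal.
  return Math.max(0, Math.min(100, score));
}
```

Because the function has no I/O or randomness, the same name always yields the same score, which is what makes the signal deterministic.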

Social (deterministic + X lookup)

Scores handle shape plus a count of recent mentions. The signal is lightweight and does not call the X API. It will be promoted to a full enrichment signal in a future sprint.

Risk (GoPlus, pending)

Will query the contract ABI and GoPlus's honeypot/tax/blacklist endpoint. A red-band risk score (< 45) will block attestation regardless of the aggregate.
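The planned gate, sketched under the assumption that it sits next to the preliminary check. `canAttest` and its input shape are hypothetical; the only rules encoded are the ones stated in this document.

```typescript
interface AttestationInput {
  aggregate: number;
  riskScore: number;
  preliminary: boolean;
}

// The aggregate is deliberately never consulted: a red-band risk score blocks
// attestation even when the overall score is green.
function canAttest(input: AttestationInput): boolean {
  if (input.preliminary) return false; // preliminary results are never attested
  if (input.riskScore < 45) return false; // red-band risk
  return true;
}
```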

Prompt versioning

Prompts live at apps/api/src/modules/scoring/prompts/<name>-<semver>.ts and are registered in prompts/registry.ts. Each row records the promptVersion used so we can replay, A/B, and maintain comparability.

Bumping a prompt version (e.g., meme@1.1.0) is a judgment call:

Patch (1.0.1) — wording tweaks that don't change the rubric.
Minor (1.1.0) — added nuance, broadly compatible.
Major (2.0.0) — rubric changed, scores not comparable with 1.x.

The active prompt is pinned in registry.LATEST; old versions stay in the registry for replay.
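A sketch of how a registry with a pinned latest version might look. The entry shape, the version numbers, and the helper names are hypothetical; only the name@semver identifier format and the major-version comparability rule come from this section.

```typescript
interface PromptEntry {
  name: string;
  version: string;
  template: string;
}

// Old versions stay registered so stored rows can be replayed.
const REGISTRY: PromptEntry[] = [
  { name: "meme", version: "1.0.0", template: "(prompt text)" },
  { name: "meme", version: "1.0.1", template: "(prompt text, reworded)" },
];

// Pinned active version per prompt name (illustrative value).
const LATEST: Record<string, string> = { meme: "1.0.1" };

// Looks up an entry by its "name@version" identifier.
function getPrompt(id: string): PromptEntry {
  const [name, version] = id.split("@");
  const entry = REGISTRY.find((p) => p.name === name && p.version === version);
  if (!entry) throw new Error(`unknown prompt: ${id}`);
  return entry;
}

function latestPrompt(name: string): PromptEntry {
  const version = LATEST[name];
  if (version === undefined) throw new Error(`no pinned version for ${name}`);
  return getPrompt(`${name}@${version}`);
}

// Scores are comparable when the prompt name and major version match.
function comparable(a: string, b: string): boolean {
  const [nameA, versionA = ""] = a.split("@");
  const [nameB, versionB = ""] = b.split("@");
  return nameA === nameB && versionA.split(".")[0] === versionB.split(".")[0];
}
```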

Cost envelope

Meme signal — ~$0.002 per call (Sonnet 4.5 tool-use).
Image signal — ~$0.005 per call (Sonnet 4.5 vision).
Name / social / creator-stub / risk-stub — $0 (deterministic / stub).

Full budget: ~$0.007 per submission. Tracked in cost-tracker.ts per prompt version so we can tell when a model swap hurts the budget.
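The arithmetic, plus a sketch of per-version tracking. The per-call figures are the approximate ones listed above; `recordCall`, `averageUsd`, and the map layout are assumptions and do not describe cost-tracker.ts itself.

```typescript
// Approximate per-call costs in USD, from the list above.
const COST_PER_CALL_USD: Record<string, number> = {
  meme: 0.002,
  image: 0.005,
  name: 0,
  social: 0,
  creator: 0,
  risk: 0,
};

// 0.002 + 0.005 ≈ 0.007 USD per submission.
const perSubmissionUsd = Object.values(COST_PER_CALL_USD).reduce((a, b) => a + b, 0);

interface CostStats {
  calls: number;
  totalUsd: number;
}

// Running totals keyed by prompt version, e.g. "meme@1.0.0".
const byPromptVersion = new Map<string, CostStats>();

function recordCall(promptVersion: string, usd: number): void {
  const stats = byPromptVersion.get(promptVersion) ?? { calls: 0, totalUsd: 0 };
  stats.calls += 1;
  stats.totalUsd += usd;
  byPromptVersion.set(promptVersion, stats);
}

// Average cost per call; a jump after a model swap shows up here.
function averageUsd(promptVersion: string): number {
  const stats = byPromptVersion.get(promptVersion);
  return stats && stats.calls > 0 ? stats.totalUsd / stats.calls : 0;
}
```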

Tests

Signal-level: signals/signals.test.ts — 17 cases
Service-level: service.test.ts — 12 cases
SSRF-safe fetcher: image-fetcher.test.ts — 13 cases
Anthropic client (retries, timeouts, breaker): 6 cases
Explain determinism: 4 cases

Run: pnpm --filter @hatch/api test.