Hatch Scoring
How the six signals work, how they're weighted, and why preliminary scores get re-weighted.
The six signals
| Signal | Weight | Source | Status |
|---|---|---|---|
| Meme | 25% | Anthropic Claude — tool-use on pitch + ticker | ✅ live |
| Creator | 20% | Bitquery — wallet age, tx count, rug history | ⛔ stub (key pending) |
| Image | 15% | Anthropic Claude Vision — 5-band rubric | ✅ live |
| Name | 10% | Deterministic — memorability, phonetics | ✅ live |
| Social | 15% | X handle lookup + heuristics | ✅ live (lightweight) |
| Risk | 15% | GoPlus — honeypot, tax, blacklist, owner rights | ⛔ stub (key pending) |
Weights sum to 100%. Changing them bumps the prompt version and breaks comparability with historical scores.
The aggregate
When all six are live
aggregate = Σ (signal_score × weight)
When one or more are stubbed (preliminary)
Re-weighted over live signals only:
live_weight_total = Σ weight for signals where stub = false
aggregate = Σ (live_signal_score × weight / live_weight_total)
This is the honest-aggregate rule. A 50%-real number shared at 100% confidence is worse than no number.
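The two formulas collapse into one function: skip stubbed signals and divide by the weight that remains. This is a minimal sketch. It assumes each signal result carries a 0–100 score and a stub flag, as in the response shape on this page. WEIGHTS and aggregate() are illustrative names, not the API's code.

```typescript
type SignalName = "meme" | "creator" | "image" | "name" | "social" | "risk";

interface SignalResult {
  score: number; // 0–100
  stub: boolean;
}

// Weights from the signal table (illustrative constant, not the API's source).
const WEIGHTS: Record<SignalName, number> = {
  meme: 0.25,
  creator: 0.2,
  image: 0.15,
  name: 0.1,
  social: 0.15,
  risk: 0.15,
};

// Honest aggregate: stubbed signals contribute neither score nor weight.
function aggregate(signals: Record<SignalName, SignalResult>): number {
  let weightedSum = 0;
  let liveWeightTotal = 0;
  for (const name of Object.keys(WEIGHTS) as SignalName[]) {
    const s = signals[name];
    if (s.stub) continue;
    weightedSum += s.score * WEIGHTS[name];
    liveWeightTotal += WEIGHTS[name];
  }
  if (liveWeightTotal === 0) throw new Error("no live signals to aggregate");
  // With no stubs, liveWeightTotal is 1 (up to float rounding), so this is Σ score × weight.
  return weightedSum / liveWeightTotal;
}
```

Suppose creator and risk are stubbed. The live weight total is then 0.25 + 0.15 + 0.10 + 0.15 = 0.65, so every live weight is scaled by 1/0.65. Meme, for example, counts for about 38% instead of 25%.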
Bands
| Band | Aggregate | Meaning |
|---|---|---|
| 🟢 Green | ≥ 70 | Strong. Multiple signals green, no red flags. |
| 🟡 Amber | 45–69 | Mixed. Iterate on the weakest signal. |
| 🔴 Red | < 45 | Weak. Likely to stall or rug without intervention. |
Bands map directly to seed-LP tier decisions (Sprint E.1) — green tokens get the most seed, red get the least (or none).
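The band table reduces to two threshold checks. Here is a sketch; the Band type and the band() helper are illustrative names.

```typescript
type Band = "green" | "amber" | "red";

// Cut-offs from the band table: ≥ 70 green, 45–69 amber, < 45 red.
function band(aggregate: number): Band {
  if (aggregate >= 70) return "green";
  if (aggregate >= 45) return "amber";
  return "red";
}
```

The checks run from the top down, so each boundary value (70 and 45) lands in the higher band, matching the ≥ in the table. The sketch bands the aggregate as-is, so a non-integer such as 69.5 falls in amber.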
Preliminary flag
A score carries confidence: 'preliminary' when any signal has stub: true.
What a preliminary flag blocks
- On-chain attestation (publisher refuses regardless of env).
- Leaderboard inclusion (/leaderboards/today).
- Percentile denominator (preliminary rows don't count in the cohort).
- Public creator feed top stats.
What it does NOT block
- Sharing the score URL — the "Preliminary" badge travels with the OG image.
- Enrollment + scheduling (creator can still sign a commitment).
- Re-scoring once keys land.
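Both the flag and the cohort rule can be derived from the per-signal stub flags. This sketch makes three assumptions. First, a stored row exposes its aggregate and per-signal stub flags; ScoreRow, confidenceOf(), percentile() and leaderboard() are hypothetical names. Second, 'full' is a placeholder for the non-preliminary value, since only 'preliminary' is named above. Third, the percentile is taken as the share of eligible rows scoring strictly below the target.

```typescript
interface ScoreRow {
  id: string;
  aggregate: number;
  signals: Record<string, { stub: boolean }>;
}

// "full" is a placeholder name for the non-preliminary confidence value.
type Confidence = "preliminary" | "full";

function confidenceOf(row: ScoreRow): Confidence {
  return Object.values(row.signals).some((s) => s.stub) ? "preliminary" : "full";
}

// Preliminary rows are dropped before counting, so they never enter the denominator.
function percentile(target: ScoreRow, cohort: ScoreRow[]): number | null {
  const eligible = cohort.filter((r) => confidenceOf(r) !== "preliminary");
  if (eligible.length === 0) return null;
  const below = eligible.filter((r) => r.aggregate < target.aggregate).length;
  return (100 * below) / eligible.length;
}

// Leaderboard inclusion uses the same filter, ranked by aggregate (descending).
function leaderboard(rows: ScoreRow[]): ScoreRow[] {
  return rows
    .filter((r) => confidenceOf(r) !== "preliminary")
    .sort((a, b) => b.aggregate - a.aggregate);
}
```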
Why re-score is a new UUID
The re-score button replays the stored submission and returns a new UUID. This is deliberate:
- Historical share URLs keep pointing at the original result.
- Old OG images stay cacheable.
- The re-scored row enters the percentile denominator on its own merits.
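Here is a sketch of the replay. An in-memory Map stands in for the scores table, and the scorer is assumed to be a plain function. ScoreRecord, rescore() and the rescoredFrom back-link are hypothetical. The original row is read but never written.

```typescript
import { randomUUID } from "node:crypto";

interface Submission {
  name: string;
  symbol: string;
  description: string;
  imageUrl: string;
  xHandle: string;
  creatorAddress: string;
}

interface ScoreRecord {
  id: string;
  submission: Submission;
  aggregate: number;
  createdAt: string;
  rescoredFrom?: string; // hypothetical link back to the original row
}

// Replays the stored submission and inserts the result under a fresh UUID.
function rescore(
  original: ScoreRecord,
  score: (s: Submission) => number,
  store: Map<string, ScoreRecord>,
): ScoreRecord {
  const fresh: ScoreRecord = {
    id: randomUUID(),
    submission: original.submission,
    aggregate: score(original.submission),
    createdAt: new Date().toISOString(),
    rescoredFrom: original.id,
  };
  store.set(fresh.id, fresh); // the original entry is left untouched
  return fresh;
}
```

The old id still resolves to the old row, so a shared URL and its cached OG image stay valid after the re-score.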
The scoring request
POST /v1/score
Content-Type: application/json
{
"name": "Yolk",
"symbol": "YOLK",
"description": "Breakfast token. Unserious about price, serious about eggs.",
"imageUrl": "https://cdn.fourmeme.com/yolk.png",
"xHandle": "@yolktoken",
"creatorAddress": "0x1234..."
}
Response shape
{
"id": "a1b2c3d4-...",
"aggregate": 67,
"band": "amber",
"hasStubs": true,
"confidence": "preliminary",
"stubbedSignals": ["creator", "risk"],
"signals": {
"meme": { "score": 78, "reason": "Original food-meme hook.", "stub": false },
"creator": { "score": 50, "reason": "Stub — awaiting Bitquery key.", "stub": true },
"image": { "score": 82, "reason": "Bright yolk on clean background.", "stub": false },
"name": { "score": 75, "reason": "Short, phonetic, easy to type.", "stub": false },
"social": { "score": 55, "reason": "New handle, low follower count.", "stub": false },
"risk": { "score": 60, "reason": "Stub — awaiting GoPlus key.", "stub": true }
},
"explanation": {
"summary": "Strong meme + image carry the aggregate; social is the weakest live signal.",
"contributions": [
{ "signal": "meme", "weight": 0.25, "score": 78, "contribution": 19.5 },
...
]
},
"promptVersion": "meme@1.0.0",
"createdAt": "2026-04-18T12:34:56Z"
}
See SDK types for the exhaustive schema.
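For orientation, here is a partial TypeScript view of the response above. It is inferred only from the example and is not the SDK's type, which remains the reference. The helper (a hypothetical name) checks that hasStubs, confidence and stubbedSignals agree with the per-signal stub flags.

```typescript
type SignalName = "meme" | "creator" | "image" | "name" | "social" | "risk";

interface SignalScore {
  score: number;
  reason: string;
  stub: boolean;
}

interface Contribution {
  signal: SignalName;
  weight: number;
  score: number;
  contribution: number;
}

interface ScoreResponse {
  id: string;
  aggregate: number;
  band: "green" | "amber" | "red";
  hasStubs: boolean;
  confidence: string; // only "preliminary" appears in the example
  stubbedSignals: SignalName[];
  signals: Record<SignalName, SignalScore>;
  explanation: { summary: string; contributions: Contribution[] };
  promptVersion: string;
  createdAt: string;
}

// The stub-related top-level fields are redundant with signals[*].stub; check they agree.
function stubFieldsConsistent(r: ScoreResponse): boolean {
  const stubbed = (Object.keys(r.signals) as SignalName[]).filter((k) => r.signals[k].stub);
  const listed = new Set(r.stubbedSignals);
  return (
    stubbed.length === listed.size &&
    stubbed.every((k) => listed.has(k)) &&
    r.hasStubs === stubbed.length > 0 &&
    (r.confidence === "preliminary") === r.hasStubs
  );
}
```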
How each signal works
Meme (Claude tool-use)
Prompt at apps/api/src/modules/scoring/prompts/meme-v1.0.0.ts.
Asks Claude to call the emit_meme_score tool with (score, reason, confidence)
given the pitch + ticker. Rubric: 90 for genuinely fresh memes; 50 for copycats;
<30 for pure bot-bait.
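With tool use, the model's answer arrives as a structured tool_use content block rather than free text. This sketch shows the tool definition in the Anthropic Messages API tools format, plus a parser for the returned block. The JSON Schema field types (an integer score and a 0–1 confidence) and the helper names are assumptions, not the contents of meme-v1.0.0.ts.

```typescript
// Tool definition in the Messages API "tools" shape; field types are assumptions.
const emitMemeScoreTool = {
  name: "emit_meme_score",
  description: "Report a meme-originality score for a token pitch and ticker.",
  input_schema: {
    type: "object",
    properties: {
      score: { type: "integer", minimum: 0, maximum: 100 },
      reason: { type: "string" },
      confidence: { type: "number", minimum: 0, maximum: 1 },
    },
    required: ["score", "reason", "confidence"],
  },
} as const;

interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: unknown;
}
type ContentBlock = ToolUseBlock | { type: "text"; text: string };

interface MemeScore {
  score: number;
  reason: string;
  confidence: number;
}

// Find the emit_meme_score call among the response's content blocks and validate its input.
function parseMemeScore(content: ContentBlock[]): MemeScore {
  const call = content.find(
    (b): b is ToolUseBlock => b.type === "tool_use" && b.name === emitMemeScoreTool.name,
  );
  if (!call) throw new Error("model did not call emit_meme_score");
  if (typeof call.input !== "object" || call.input === null) throw new Error("missing tool input");
  const input = call.input as Partial<MemeScore>;
  if (typeof input.score !== "number" || input.score < 0 || input.score > 100) throw new Error("bad score");
  if (typeof input.reason !== "string") throw new Error("bad reason");
  if (typeof input.confidence !== "number" || input.confidence < 0 || input.confidence > 1) {
    throw new Error("bad confidence");
  }
  return { score: input.score, reason: input.reason, confidence: input.confidence };
}
```

One way to guarantee the block is present is to set tool_choice to { type: "tool", name: "emit_meme_score" } in the request, which forces the model to call that tool.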
Creator (Bitquery, pending)
Will query the wallet's BNB history: first tx age, total tx count, tokens launched, rug incidents. Until the key lands, it returns a stub score of 50 with a clear reason string.
Image (Claude Vision)
SSRF-hardened fetcher pulls the image — https-only, DNS private-IP
blocklist, 5 MiB cap, 10s timeout, redirect: error. See
ADR 0005.
Prompt asks Claude Vision to score on visibility, readability, brand coherence. 5-band rubric from "unreadable" (20) to "exceptional" (90+).
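The private-IP blocklist is the subtle part. It has to be applied to every address the hostname resolves to, and IPv6 spellings (including IPv4-mapped ones) must be normalized first. This sketch covers only the address and scheme checks and is not the ADR's implementation. The ranges listed are common ones, not an exhaustive list. DNS resolution, the size cap, the timeout and redirect handling are left out.

```typescript
import { isIP } from "node:net";

// IPv4 ranges refused by this sketch (common cases, not exhaustive).
function isBlockedIPv4(ip: string): boolean {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 0 || // "this" network
    a === 10 || // RFC 1918 private
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT (RFC 6598)
    (a === 169 && b === 254) || // link-local, incl. cloud metadata at 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 private
    (a === 192 && b === 168) || // RFC 1918 private
    a >= 224 // multicast and reserved
  );
}

// IPv6: unspecified, loopback, unique-local, link-local, multicast, and IPv4-mapped.
function isBlockedIPv6(ip: string): boolean {
  let canon: string;
  try {
    // WHATWG URL parsing canonicalizes IPv6 (lowercase, zero-compressed, mapped IPv4 as hex).
    canon = new URL(`https://[${ip}]/`).hostname.slice(1, -1);
  } catch {
    return true; // e.g. zone IDs like fe80::1%eth0 fail to parse: refuse
  }
  if (canon === "::" || canon === "::1") return true;
  const mapped = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(canon);
  if (mapped) {
    const hi = parseInt(mapped[1], 16);
    const lo = parseInt(mapped[2], 16);
    return isBlockedIPv4(`${hi >> 8}.${hi & 0xff}.${lo >> 8}.${lo & 0xff}`);
  }
  const first = parseInt(canon.split(":")[0] || "0", 16); // first 16-bit group
  return (
    (first & 0xfe00) === 0xfc00 || // fc00::/7 unique local
    (first & 0xffc0) === 0xfe80 || // fe80::/10 link-local
    (first & 0xff00) === 0xff00 // ff00::/8 multicast
  );
}

// Anything that is not an IP literal is refused outright.
function isBlockedAddress(ip: string): boolean {
  switch (isIP(ip)) {
    case 4:
      return isBlockedIPv4(ip);
    case 6:
      return isBlockedIPv6(ip);
    default:
      return true;
  }
}

function assertHttpsUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "https:") throw new Error(`refusing ${url.protocol} URL`);
  return url;
}
```

Checking the resolved addresses and then letting the HTTP client resolve the name again leaves a DNS-rebinding window. Connecting to the already-checked address closes it.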
Name (deterministic)
Rules:
- Length penalty above 12 chars.
- Alphabetic ratio, hyphen/underscore friendliness.
- Consonant cluster penalty (anti-tongue-twister).
- Capitalization consistency.
Pure functions in apps/api/src/modules/scoring/signals/name.ts.
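Here is an illustrative version of those four rules as a single pure function. The penalty sizes, the 4-consonant cluster threshold, the treatment of y as a vowel, and the name scoreName() are all invented for this sketch. The actual rules are the pure functions in name.ts.

```typescript
// Illustrative only: penalty sizes and thresholds are invented for this sketch.
function scoreName(name: string): number {
  const n = name.trim();
  let score = 100;

  // Length penalty above 12 characters.
  if (n.length > 12) score -= 5 * (n.length - 12);

  // Alphabetic ratio: digits and symbols make a name harder to say and type.
  const letters = n.replace(/[^a-z]/gi, "");
  const ratio = n.length === 0 ? 0 : letters.length / n.length;
  score -= Math.round(30 * (1 - ratio));

  // Hyphen/underscore friendliness: one separator is fine, each extra one costs 10.
  const separators = (n.match(/[-_]/g) ?? []).length;
  if (separators > 1) score -= 10 * (separators - 1);

  // Consonant clusters: each run of 4+ consonants (y treated as a vowel) costs 10.
  const clusters = n.match(/[bcdfghjklmnpqrstvwxz]{4,}/gi) ?? [];
  score -= 10 * clusters.length;

  // Capitalization consistency: all lower, ALL UPPER or Title case pass; mixed costs 10.
  const consistent =
    letters.length === 0 ||
    letters === letters.toLowerCase() ||
    letters === letters.toUpperCase() ||
    letters === letters[0].toUpperCase() + letters.slice(1).toLowerCase();
  if (!consistent) score -= 10;

  return Math.max(0, Math.min(100, score));
}
```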
Social (deterministic + X lookup)
Handle shape + count of recent mentions. Lightweight: doesn't call the X API. Slated to become a full enrichment signal in a future sprint.
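This sketch covers only the handle-shape half. It assumes X's username rule (at most 15 letters, digits or underscores) and accepts an optional leading @, as in the request example. The scoring adjustments are invented. The mention count that the signal combines with shape is not modeled.

```typescript
// X usernames: at most 15 characters of letters, digits and underscores.
const HANDLE_RE = /^@?([A-Za-z0-9_]{1,15})$/;

function normalizeHandle(raw: string): string | null {
  const m = HANDLE_RE.exec(raw.trim());
  return m ? m[1].toLowerCase() : null;
}

// Hypothetical shape score; the adjustments below are illustrative.
function handleShapeScore(raw: string): number {
  const h = normalizeHandle(raw);
  if (h === null) return 0; // not a valid handle at all
  let score = 70;
  if (h.length <= 10) score += 10; // short handles are easier to share
  if (/\d{3,}/.test(h)) score -= 20; // long digit runs look auto-generated
  if ((h.match(/_/g) ?? []).length > 1) score -= 10; // underscore-heavy handles are hard to type
  return Math.max(0, Math.min(100, score));
}
```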
Risk (GoPlus, pending)
Will query the contract ABI + GoPlus' honeypot/tax/blacklist endpoint. Red-band risk (<45) will gate attestation regardless of aggregate.
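Two attestation rules appear on this page: no stubbed signals, and a risk score outside the red band regardless of the aggregate. Together they give a small gate. This is a sketch; attestationDecision() and its reason strings are hypothetical.

```typescript
interface SignalScore {
  score: number;
  stub: boolean;
}

type Decision = { attest: true } | { attest: false; reason: string };

// Attestation needs a non-preliminary score and non-red risk, whatever the aggregate is.
function attestationDecision(signals: Record<string, SignalScore>): Decision {
  const stubbed = Object.entries(signals)
    .filter(([, s]) => s.stub)
    .map(([name]) => name);
  if (stubbed.length > 0) {
    return { attest: false, reason: `preliminary: stubbed ${stubbed.join(", ")}` };
  }
  const risk = signals["risk"];
  if (!risk) return { attest: false, reason: "missing risk signal" };
  if (risk.score < 45) return { attest: false, reason: `red-band risk (${risk.score})` };
  return { attest: true };
}
```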
Prompt versioning
Prompts live at apps/api/src/modules/scoring/prompts/<name>-<semver>.ts
and are registered in prompts/registry.ts. Each row records the
promptVersion used so we can replay, A/B, and maintain comparability.
Bumping a prompt version (e.g., meme@1.1.0) is a judgment call:
- Patch (1.0.1): wording tweaks that don't change the rubric.
- Minor (1.1.0): added nuance, broadly compatible.
- Major (2.0.0): rubric changed, scores not comparable with 1.x.
Active prompt is pinned in registry.LATEST — old versions stay in the
registry for replay.
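Here is a minimal sketch of a registry keyed by the same name@semver string stored in promptVersion. It has a LATEST pin and a comparability check that treats scores as comparable only within one major version, since patch and minor bumps are compatible per the list above. The names and structure are assumptions, not registry.ts.

```typescript
interface PromptDef {
  name: string; // e.g. "meme"
  version: string; // semver, e.g. "1.0.0"
  template: string; // hypothetical prompt body
}

const registry = new Map<string, PromptDef>(); // keyed by "name@version"
const LATEST: Record<string, string> = {}; // prompt name -> pinned version

function register(def: PromptDef, pinAsLatest = false): void {
  registry.set(`${def.name}@${def.version}`, def);
  if (pinAsLatest) LATEST[def.name] = def.version;
}

// Replays look up the exact version recorded on the row.
function resolve(promptVersion: string): PromptDef {
  const def = registry.get(promptVersion);
  if (!def) throw new Error(`unknown prompt version: ${promptVersion}`);
  return def;
}

// Same prompt name and same major version means scores are comparable.
function comparable(a: string, b: string): boolean {
  const parse = (v: string) => {
    const [name, semver = ""] = v.split("@");
    return { name, major: semver.split(".")[0] };
  };
  const pa = parse(a);
  const pb = parse(b);
  return pa.name === pb.name && pa.major !== "" && pa.major === pb.major;
}
```

Keeping old versions registered is what makes replay possible: resolve() needs only the promptVersion string stored on the row.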
Cost envelope
- Meme signal — ~$0.002 per call (Sonnet 4.5 tool-use).
- Image signal — ~$0.005 per call (Sonnet 4.5 vision).
- Name / social / creator-stub / risk-stub — $0 (deterministic / stub).
Full budget: ~$0.007 per submission. Tracked in cost-tracker.ts per prompt version so we can tell when a model swap hurts the budget.
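Here is a sketch of per-prompt-version cost tracking. It assumes each model call reports its dollar cost, for example computed by the caller from token usage. CostTracker and its methods are hypothetical, not cost-tracker.ts. Comparing averages across versions is what surfaces a model swap that raises cost.

```typescript
// Hypothetical per-version cost accumulator.
class CostTracker {
  private totals = new Map<string, { calls: number; usd: number }>();

  record(promptVersion: string, usd: number): void {
    const t = this.totals.get(promptVersion) ?? { calls: 0, usd: 0 };
    t.calls += 1;
    t.usd += usd;
    this.totals.set(promptVersion, t);
  }

  averagePerCall(promptVersion: string): number | null {
    const t = this.totals.get(promptVersion);
    return t ? t.usd / t.calls : null;
  }

  // True when a version's average cost per call exceeds the given per-call budget.
  overBudget(promptVersion: string, budgetUsd: number): boolean {
    const avg = this.averagePerCall(promptVersion);
    return avg !== null && avg > budgetUsd;
  }
}
```

Under the envelope above, the per-submission figure is 0.002 + 0.005 = $0.007, because the other four signals cost nothing.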
Tests
- Signal-level: signals/signals.test.ts (17 cases)
- Service-level: service.test.ts (12 cases)
- SSRF-safe fetcher: image-fetcher.test.ts (13 cases)
- Anthropic client (retries, timeouts, breaker): 6 cases
- Explain determinism: 4 cases
Run: pnpm --filter @hatch/api test.