# SelfCheckGPT
If a model knows something, its answers should be consistent.
The method: sample N responses (typically 20) for the same prompt, then score each sentence of the main answer against those samples using Natural Language Inference, averaging the contradiction probability across samples. Sentences scoring above 0.35 get flagged as potential hallucinations.
This is a zero-resource, black-box approach. You only need access to the model's outputs, not its internals. Works with API-based models where you can't inspect weights or activations.
The logic is simple. When a model has real knowledge, repeated sampling converges on the same facts. When it's confabulating, responses drift apart. Inconsistency is the signal.
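The scoring loop can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `contradiction_prob` is a placeholder for a real NLI classifier (the paper's NLI variant uses a DeBERTa model fine-tuned on MNLI), sentence splitting is assumed done elsewhere, and the 0.35 cutoff follows the threshold above.

```python
from statistics import mean

def selfcheck_nli(sentences, samples, contradiction_prob, threshold=0.35):
    """Flag inconsistent sentences in a main answer.

    sentences: the sentences of the answer being checked.
    samples: N independently sampled answers to the same prompt.
    contradiction_prob(premise, hypothesis): placeholder for an NLI
    model's P(contradiction), e.g. an MNLI-style classifier.
    """
    results = []
    for sent in sentences:
        # A sentence the model "knows" should be supported by most
        # samples, keeping its average contradiction probability low.
        score = mean(contradiction_prob(sample, sent) for sample in samples)
        results.append((sent, score, score > threshold))
    return results
```

A sentence whose average contradiction score stays below the threshold passes; one the samples fail to support gets flagged.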
Tradeoff: computational overhead. You're running 20x the inference to verify one output. Worth it for high-stakes applications. Overkill for casual use.
Published by Manakul et al. (EMNLP 2023).
---
Links:
- [[AI Verification]]
- [[Semantic Entropy]]
- [[LLM-as-Judge]]
- [[Hallucination Detection]]
---
#deeptech #kp