# SelfCheckGPT
If a model knows something, its answers should be consistent.
The method: sample N responses (typically 20) for the same prompt, then score each sentence of the main answer against those samples using Natural Language Inference, averaging the contradiction probability across samples. Sentences scoring above 0.35 get flagged as potential hallucinations.
This is a zero-resource, black-box approach. You only need access to the model's outputs, not its internals. Works with API-based models where you can't inspect weights or activations.
The logic is simple. When a model has real knowledge, repeated sampling converges on the same facts. When it's confabulating, responses drift apart. Inconsistency is the signal.
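The scoring loop can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `contradiction_prob` is a placeholder for a real NLI classifier (the paper's NLI variant uses a DeBERTa model fine-tuned on MNLI), sentence splitting is assumed done elsewhere, and the 0.35 cutoff follows the threshold above.

```python
from statistics import mean

def selfcheck_nli(sentences, samples, contradiction_prob, threshold=0.35):
    """Flag inconsistent sentences in a main answer.

    sentences: the sentences of the answer being checked.
    samples: N independently sampled answers to the same prompt.
    contradiction_prob(premise, hypothesis): placeholder for an NLI
    model's P(contradiction), e.g. an MNLI-style classifier.
    """
    results = []
    for sent in sentences:
        # A sentence the model "knows" should be supported by most
        # samples, keeping its average contradiction probability low.
        score = mean(contradiction_prob(sample, sent) for sample in samples)
        results.append((sent, score, score > threshold))
    return results
```

A sentence whose average contradiction score stays below the threshold passes; one the samples fail to support gets flagged.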
Tradeoff: computational overhead. You're running 20x the inference to verify one output. Worth it for high-stakes applications. Overkill for casual use.
Published by Manakul et al. (EMNLP 2023).
---
Links:
- [[AI Verification]]
- [[Semantic Entropy]]
- [[LLM-as-Judge]]
- [[Hallucination Detection]]
---
#deeptech #kp