Ask a model to pick a random number between 1 and 6. Nearly every model says 4, every time.
A benchmark for surfacing opinion tendencies and outliers across LLMs.