Methodology

About

As more people turn to LLMs for advice, it seems useful to understand the inherent preferences these models have. Even with open-weight models, the training data is not open, so without evaluation we can't know whether specific influences in that data shape a model's answers. This is an exploration of model preferences when asked for advice.

There are many different aspects of AI alignment. Advice is only a small portion, but it is an interesting place to start.

How It Works

This is not a “benchmark” like other AI benchmarks: these questions don’t have a “right” answer. Which answer you prefer depends on your experiences and beliefs, so we don’t score results as right or wrong. But we do note outliers.

All inference requests are served through OpenRouter, which routes to a number of providers internally.
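As a rough sketch, a single inference request through OpenRouter's OpenAI-compatible chat completions endpoint might be assembled like this. The model slug, system instruction, and question below are illustrative placeholders, not the exact ones used in this project.

```python
import json

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, system: str, question: str) -> dict:
    """Assemble the JSON payload for one inference request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    }

# Hypothetical example values, for illustration only.
payload = build_request(
    "openai/gpt-4o",
    "Answer directly and concisely. Do not ask follow-up questions.",
    "Cats or dogs? Answer with one word.",
)
body = json.dumps(payload)  # this body would be POSTed with an API key header
```

In practice the request also carries an `Authorization: Bearer <key>` header; it is omitted here since this sketch never sends the request.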

For each question, we run similar prompts through a model multiple times and record every answer. Each question has multiple variants with semantically similar phrasing to reduce phrasing bias. Questions are designed to force a single answer and to avoid introducing bias, except where the bias is the interesting part of the question. For example, “Should I quit my job?” is not an interesting question because it requires more information.
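The variant-sampling step above could be sketched as follows. The variants, run count, and cycling strategy here are assumptions for illustration, not the project's actual configuration.

```python
import random

# Hypothetical variants of one question; each asks the same thing with
# different phrasing to reduce phrasing bias.
VARIANTS = [
    "Cats or dogs? Answer with one word.",
    "Which do you prefer, dogs or cats? One word only.",
    "Pick one: cat or dog.",
]

def sample_prompts(n_runs: int, seed: int = 0) -> list[str]:
    """Draw n_runs prompts, cycling through the variants so each
    phrasing is represented roughly equally, then shuffle the order."""
    prompts = [VARIANTS[i % len(VARIANTS)] for i in range(n_runs)]
    random.Random(seed).shuffle(prompts)
    return prompts
```

Cycling (rather than sampling at random) guarantees an even split across variants, so no single phrasing dominates the tally.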

We then normalize each response to a canonical short value (e.g. “yes”, “no”, “cat”, “dog”) using a separate AI call, and tally the distribution. If a model answers consistently, we exit early to save money. If the answers vary, we run more requests until we find some consensus within the same model.
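The normalize-and-tally loop with early exit might look something like this. The keyword-matching `normalize` is a stand-in for the separate AI call, and the run limits and consensus threshold are hypothetical values.

```python
from collections import Counter

def normalize(raw: str) -> str:
    """Placeholder for the separate AI normalization call:
    map a free-form response to a canonical short value."""
    text = raw.lower()
    for canon in ("yes", "no", "cat", "dog"):
        if canon in text:
            return canon
    return "other"

def tally_until_consensus(answers, min_runs=5, max_runs=20, threshold=0.9):
    """Record normalized answers, stopping early once one canonical
    value dominates (which saves inference cost on consistent models)."""
    counts = Counter()
    for i, raw in enumerate(answers, start=1):
        counts[normalize(raw)] += 1
        if i >= min_runs:
            top_share = counts.most_common(1)[0][1] / i
            if top_share >= threshold or i >= max_runs:
                break
    return counts
```

With these assumed parameters, a perfectly consistent model stops after five runs, while a model with varied answers keeps running up to the cap.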

This gives a rough picture of how the model responds to specific questions.

Refusal and Hedging

Almost all questions see some refusal and hedging, where the models don’t give a specific answer.

The system instructions and questions are designed to reduce refusal and hedging: we ask for direct, concise answers without follow-up questions. But some models will still refuse. And for some questions, refusal is probably the “correct” answer.
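One way to handle this is to treat refusals as their own bucket when tallying answers. A minimal sketch, assuming a hypothetical list of refusal markers (the real pipeline could just as well detect refusals during the AI normalization step):

```python
# Hypothetical substrings that commonly signal a refusal or heavy hedge.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def is_refusal(answer: str) -> bool:
    """Return True if the answer looks like a refusal rather than a pick."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(answers: list[str]) -> float:
    """Fraction of answers that refused to commit to a specific value."""
    if not answers:
        return 0.0
    return sum(is_refusal(a) for a in answers) / len(answers)
```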

Limitations & Caveats

This is not comprehensive, and it does not mimic the real-world experience of using chat applications built on these models. Take everything here as a rough data point, not a definitive measure. Your results will likely differ in your environment, with different context and tools available.