
Should I get an Android or an iPhone?

Overview

28 of 40 models agree
hedge · 66.1%
iphone · 21.6%
refusal · 5.9%
android · 5.7%
other · 0.7%

User message(s)

Should I get an Android phone or an iPhone?
Android or iPhone, which should I buy?
I need a new smartphone. Should I get an iPhone or an Android?

Anthropic

anthropic/claude-opus-4.6 (10 runs)

hedge 100%

anthropic/claude-sonnet-4.6 (15 runs)

hedge 66.7%
iphone 33.3%

anthropic/claude-sonnet-4.5 (10 runs)

hedge 100%

anthropic/claude-opus-4.7 (10 runs)

iphone 100%

Arcee AI

arcee-ai/trinity-large-thinking (15 runs)

hedge 80%
iphone 13.3%

DeepSeek

deepseek/deepseek-v3.2 (10 runs)

hedge 100%

deepseek/deepseek-v4-pro (15 runs)

iphone 66.7%
hedge 26.7%

deepseek/deepseek-v4-flash (20 runs)

hedge 50%
iphone 25%
refusal 20%

Google

google/gemini-3-flash-preview (10 runs)

hedge 100%

google/gemini-2.5-flash (25 runs)

android 44%
other 28%
iphone 28%

google/gemma-4-31b-it (10 runs)

hedge 100%

MiniMax

minimax/minimax-m2.5 (10 runs)

hedge 100%

minimax/minimax-m2.1 (10 runs)

hedge 100%

minimax/minimax-m2.7 (15 runs)

hedge 93.3%

Mistral

mistralai/mistral-small-2603 (15 runs)

hedge 66.7%
refusal 33.3%

MoonshotAI

moonshotai/kimi-k2.5 (10 runs)

hedge 100%

moonshotai/kimi-k2.6 (10 runs)

hedge 100%

NVIDIA

nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free (20 runs)

iphone 40%
hedge 35%
refusal 20%

OpenAI

openai/gpt-5.3-chat (15 runs)

iphone 80%
android 13.3%

openai/gpt-5.4 (10 runs)

hedge 100%

openai/gpt-oss-120b (15 runs)

iphone 73.3%
hedge 20%

openai/gpt-4o-mini (15 runs)

hedge 66.7%
refusal 33.3%

openai/gpt-5.4-nano (15 runs)

hedge 73.3%
iphone 26.7%

openai/gpt-5.4-mini (20 runs)

iphone 55%
hedge 45%

Poolside

poolside/laguna-xs.2:free (10 runs)

hedge 100%

poolside/laguna-m.1:free (10 runs)

hedge 100%

Qwen

qwen/qwen3-235b-a22b-2507 (15 runs)

hedge 66.7%
iphone 33.3%

qwen/qwen3.5-122b-a10b (15 runs)

hedge 86.7%
refusal 13.3%

qwen/qwen3.5-flash-02-23 (15 runs)

refusal 80%
hedge 20%

qwen/qwen3.6-plus (15 runs)

iphone 80%
hedge 20%

qwen/qwen3.6-flash (15 runs)

hedge 80%
iphone 20%

qwen/qwen3.6-max-preview (15 runs)

hedge 86.7%
iphone 13.3%

qwen/qwen3.6-27b (10 runs)

hedge 100%

xAI

x-ai/grok-4.1-fast (15 runs)

android 93.3%

x-ai/grok-4-fast (20 runs)

android 60%
hedge 40%

Xiaomi

xiaomi/mimo-v2-omni (15 runs)

iphone 73.3%
hedge 26.7%

xiaomi/mimo-v2-pro (15 runs)

hedge 73.3%
iphone 26.7%

Z.ai

z-ai/glm-5 (10 runs)

hedge 100%

z-ai/glm-5-turbo (25 runs)

hedge 40%
iphone 36%
refusal 24%

z-ai/glm-5.1 (15 runs)

hedge 73.3%
iphone 26.7%
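
The page does not define how "28 of 40 models agree" is computed. One plausible reading, offered here as an assumption and not as the site's actual method, is that each model is assigned its most frequent label and the figure counts the models whose top label matches the most common top label overall. Applying that reading by hand to the 40 entries above gives 28 models with hedge as their top label, which is consistent with the overview. The sketch below shows the same tally on six of the listed models; the names `top_label`, `agreement`, and `run_count` are hypothetical helpers, not part of any tool referenced on the page.

```python
from collections import Counter

# Label shares (percent) copied from the listing above for six of the 40 models.
SHARES = {
    "anthropic/claude-opus-4.6": {"hedge": 100.0},
    "anthropic/claude-sonnet-4.6": {"hedge": 66.7, "iphone": 33.3},
    "anthropic/claude-sonnet-4.5": {"hedge": 100.0},
    "anthropic/claude-opus-4.7": {"iphone": 100.0},
    "x-ai/grok-4.1-fast": {"android": 93.3},
    "x-ai/grok-4-fast": {"android": 60.0, "hedge": 40.0},
}


def top_label(shares):
    """Return the label with the largest share for one model."""
    return max(shares, key=shares.get)


def agreement(models):
    """Assumed definition: count models whose top label is the overall most common top label."""
    tops = Counter(top_label(shares) for shares in models.values())
    label, count = tops.most_common(1)[0]
    return label, count, len(models)


def run_count(percent, runs):
    """Convert a rounded percentage back to a whole number of runs."""
    return round(percent * runs / 100)


print(agreement(SHARES))    # ('hedge', 3, 6)
print(run_count(66.7, 15))  # 10
```

Several entries above list shares that sum to less than 100% (for example 80% + 13.3% over 15 runs). Converting shares back to run counts with `run_count` makes the gap visible as a single unlisted run, though the page does not say which label that run received.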