Temperature Lab
Master the creativity dial — understand how temperature shapes AI output, with real API examples
What Is Temperature?
Temperature controls the randomness of an AI's output. It is a number between 0 and 1 that you can pass with every API call (Claude defaults to 1.0 if you omit it). At temperature 0, the model almost always picks the most probable next token — making output highly consistent and precise, though not guaranteed to be perfectly identical across runs. At temperature 1, it samples from a wider distribution of possibilities — making it creative and unpredictable.
Technically, temperature scales the logits (raw probability scores) before the model picks the next token. Lower temperature sharpens the probability distribution — the most likely token becomes overwhelmingly dominant. Higher temperature flattens it — less likely tokens get a real chance of being selected. The result: lower temperature produces consistent, predictable text; higher temperature produces varied, surprising text.
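To make the logit scaling concrete, here is a small self-contained sketch of temperature-scaled softmax. The logit values are made up for illustration (real models score tens of thousands of candidate tokens), and true temperature 0 skips sampling entirely and takes the argmax, so the "cold" case below uses a small positive value instead:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by temperature first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens
logits = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # sharp: top token dominates
warm = softmax_with_temperature(logits, 1.0)  # flatter: others get a real chance

print(cold[0])  # top token probability is close to 1.0
print(warm[0])  # top token is still the favorite, but far from certain
```

Lowering the temperature divides every logit by a small number before the exponential, which exaggerates gaps between scores; raising it shrinks those gaps, flattening the distribution.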
Temperature in the API
Temperature is a single parameter in the API call. Here is how to use it:
import anthropic

client = anthropic.Anthropic()

def generate(prompt: str, temperature: float = 1.0) -> str:
    """Generate a response at a specific temperature."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        temperature=temperature,  # 0.0 to 1.0
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
# Same prompt, different temperatures
prompt = "Name a product that helps people sleep better."

# Temperature 0 — nearly always the same answer
print(generate(prompt, temperature=0.0))
# → e.g. "SleepWell" (highly consistent run to run)
print(generate(prompt, temperature=0.0))
# → e.g. "SleepWell" (temperature 0 is close to deterministic, though not guaranteed)

# Temperature 0.8 — different each time
print(generate(prompt, temperature=0.8))
# → e.g. "DreamDrift" (creative, varied)
print(generate(prompt, temperature=0.8))
# → e.g. "NightHaven" (answers vary between runs)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 300,
    "temperature": 0.0,
    "messages": [
      {"role": "user", "content": "Classify this email as spam or not: You won a free iPhone!"}
    ]
  }'
Temperature in Practice — Same Prompt, Different Results
To understand temperature intuitively, consider the prompt "Write a product name for a sleep aid." Here is what you might get at different temperature settings:
At temperature 0:
Output: "SleepWell"
Run it again: "SleepWell"
Run it again: "SleepWell"
Nearly always the same — the most probable answer dominates.

At temperature 0.8:
Output: "DreamDrift"
Run it again: "NightHaven"
Run it again: "LunaRest"
Different each time — less probable tokens get a real chance.
Try this yourself: run the same prompt at temperature 0 five times (near-identical results), then at temperature 0.8 five times (five different creative answers). The difference is immediately obvious.
When to Use Each Temperature
Temperature 0.0 — Code generation, data extraction, math, factual Q&A, classification, JSON output. When you need consistency and reproducibility. If you run the same prompt twice and want the same answer, use 0.

Temperature 0.3–0.6 — General conversation, summarization, editing, analysis. Good balance of reliability and naturalness. Most chatbots use this range.

Temperature 0.7–0.9 — Creative writing, brainstorming, marketing copy, poetry. When you want variety and surprise. Run the same prompt 5 times and you'll get 5 different creative angles.

Temperature 1.0 — Experimental writing, wild brainstorming, artistic exploration. Output may become incoherent — less likely tokens get selected, producing unexpected word choices. Use sparingly.
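The guidance above can be condensed into a small lookup helper. The task names and exact values here are illustrative starting points, not an official mapping — tune them for your own workloads:

```python
# Illustrative defaults only; adjust after testing on your own tasks.
TEMPERATURE_GUIDE = {
    "code": 0.0,            # consistency and reproducibility
    "extraction": 0.0,
    "classification": 0.0,
    "chat": 0.4,            # balance of reliability and naturalness
    "summarization": 0.4,
    "creative": 0.8,        # variety and surprise
    "brainstorming": 0.8,
}

def suggested_temperature(task: str) -> float:
    """Return a starting temperature for a task type (unknown tasks get 0.3)."""
    return TEMPERATURE_GUIDE.get(task, 0.3)

print(suggested_temperature("code"))      # 0.0
print(suggested_temperature("creative"))  # 0.8
```

A table like this keeps temperature choices in one auditable place instead of scattered across call sites.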
Temperature + Top-P: The Full Picture
Temperature is not the only sampling parameter. Claude also supports top_p (nucleus sampling), which limits the token pool to the smallest set whose cumulative probability exceeds a threshold.
Temperature: Scales all probabilities. Lower = sharper distribution, higher = flatter. Affects how the model weights its options.

Top-p: Cuts off low-probability tokens entirely. top_p=0.9 means "only consider tokens in the top 90% of probability mass." Prevents rare, incoherent tokens from being selected.
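A minimal sketch of what nucleus (top-p) truncation does to a toy distribution. This illustrates the idea only, not Claude's internal implementation, and the probabilities are invented for the example:

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    # Sort token indices by probability, highest first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize the survivors so they sum to 1 before sampling
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.04, 0.01]  # hypothetical token probabilities
print(top_p_filter(probs, p=0.9))     # the two rarest tokens are dropped
```

Note how the tail tokens are removed outright rather than merely down-weighted — that is the key difference from temperature.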
# For creative tasks: high temperature + moderate top_p
# This gives variety while preventing total nonsense
# (Note: Anthropic's docs recommend adjusting temperature or top_p,
#  not both; combining them here is for illustration)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,
    temperature=0.8,  # creative sampling
    top_p=0.9,        # but cut off the truly wild tokens
    messages=[{
        "role": "user",
        "content": "Write 5 taglines for a coffee shop called 'The Grind'."
    }]
)
# For deterministic tasks: temperature 0 (top_p doesn't matter)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    temperature=0.0,  # near-deterministic — same output almost every time
    messages=[{
        "role": "user",
        "content": "Classify: 'Your package has been delayed.' → [urgent/normal/spam]"
    }]
)
# → "normal" (nearly every time)
Common Temperature Mistakes
These are the mistakes that trip up even experienced developers:
Mistake 1: Using high temperature for code generation. Temperature 0.7+ introduces random token choices into syntax, variable names, and logic. The result: code that looks creative but has subtle bugs. Always use temperature 0 for code.

Mistake 2: Using temperature 0 for creative work. Temperature 0 always picks the most probable token. For creative tasks, this produces generic, predictable text. "The sun set over the horizon" instead of something surprising. Bump to 0.7-0.8.

Mistake 3: Assuming temperature changes what the model knows. It does not. Temperature only changes the selection strategy. A temperature 0 response uses the same model knowledge as temperature 1 — just with different randomness in word choice.
Mistake 4: Relying on the default. Claude's default temperature is 1.0, which is more random than most use cases need. For production applications, always set temperature explicitly. A common starting point is 0.3 for general tasks.
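One lightweight way to enforce explicit temperatures in application code is a validation helper that fails fast instead of silently inheriting the 1.0 default. The helper below is a sketch, not part of the Anthropic SDK:

```python
def validate_temperature(temperature: float) -> float:
    """Reject out-of-range values instead of relying on the implicit default."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")
    return temperature

# In an API wrapper you would pass the validated value through, e.g.:
#   client.messages.create(..., temperature=validate_temperature(0.3), ...)
print(validate_temperature(0.3))  # 0.3
```

Requiring every call site to state its temperature makes the choice reviewable in code review, rather than an invisible default.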