You have 500 survey responses, 1,200 NPS comments, or a Typeform full of open-ended answers. Manual review takes days. Keyword search misses nuance. AI feedback analysis with Claude takes minutes and catches what search doesn't.
This guide walks through the prompts and patterns that actually work — from basic theme tagging to structured batch pipelines.
What AI Feedback Analysis Does Well
Claude handles feedback analysis better than keyword search or simple sentiment tools for three reasons:
- Context-aware categorization: "The onboarding took forever but once I figured it out, I loved it" contains both a complaint (onboarding) and satisfaction. A keyword search for "loved" would miss the complaint. Claude reads both.
- Theme discovery: You don't have to pre-define your categories. Claude can surface themes you didn't know to look for.
- Structured output: When the prompt spells out the JSON shape you want, Claude returns machine-readable tags you can drop into a spreadsheet or database without manual parsing.
Where it has limits: Claude processes text, not audio or images. For very large datasets (10K+ responses), you'll need a batch pipeline with rate limiting. And Claude's category labels are only as good as your prompt — garbage in, garbage out applies here too.
The Basic Categorization Prompt
Start with this pattern for any feedback categorization task:
You are a feedback analyst. Categorize the following customer feedback into themes.
Return a JSON object with:
- "themes": array of theme labels (max 3 per response)
- "sentiment": "positive", "negative", or "mixed"
- "priority_issue": the single most important thing the customer mentioned, or null
Only use themes from this list: [onboarding, pricing, performance, support, features, reliability, ux_design, other]
Feedback: "{feedback_text}"
The key elements: explicit output format, constrained theme list (prevents Claude from inventing 50 different synonyms for "slow"), and a priority issue field that forces ranking.
In Python:
import anthropic
import json

client = anthropic.Anthropic()

THEMES = ["onboarding", "pricing", "performance", "support", "features", "reliability", "ux_design", "other"]

def categorize_feedback(text: str) -> dict:
    prompt = f"""You are a feedback analyst. Categorize this customer feedback.
Return a JSON object with:
- "themes": array of theme labels (max 3, choose from: {", ".join(THEMES)})
- "sentiment": "positive", "negative", or "mixed"
- "priority_issue": the single most important thing mentioned, or null
Feedback: "{text}"
Return only the JSON object, no other text."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku for speed/cost on bulk tasks
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)

# Test it
result = categorize_feedback(
    "Setup took 3 hours and support never replied, but the reporting features are incredible."
)
print(result)
# Example output (exact wording will vary):
# {"themes": ["onboarding", "support", "features"], "sentiment": "mixed", "priority_issue": "support not responding"}
Use claude-haiku-4-5-20251001 for bulk categorization: it costs roughly a third of Sonnet's per-token price on Anthropic's published rates (check current pricing before you budget) and is more than accurate enough for straightforward tagging. Save Sonnet or Opus for edge cases or summary generation.
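One practical gap in the function above: json.loads assumes the reply is bare JSON, and nothing checks that the returned themes actually come from the constrained list. The sketch below shows one way to tolerate a reply wrapped in a Markdown code fence and to filter the result. The helper names parse_model_json and validate_analysis are illustrative, not part of the Anthropic SDK, and the fallback choices (an "other" theme, a None sentiment) are assumptions you may want to change.

```python
import json

# Assumed to mirror the THEMES list used in the categorization prompt.
ALLOWED_THEMES = {"onboarding", "pricing", "performance", "support",
                  "features", "reliability", "ux_design", "other"}
ALLOWED_SENTIMENTS = {"positive", "negative", "mixed"}

FENCE = "`" * 3  # Markdown code-fence marker


def parse_model_json(raw: str) -> dict:
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a "json" tag) and the closing fence.
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


def validate_analysis(analysis: dict) -> dict:
    """Keep only allowed themes (max 3); unknown sentiments become None."""
    themes = [t for t in (analysis.get("themes") or []) if t in ALLOWED_THEMES][:3]
    sentiment = analysis.get("sentiment")
    return {
        "themes": themes or ["other"],  # hypothetical fallback when nothing matches
        "sentiment": sentiment if sentiment in ALLOWED_SENTIMENTS else None,
        "priority_issue": analysis.get("priority_issue"),
    }


# Usage with a hand-written reply standing in for a real API response
reply = FENCE + 'json\n{"themes": ["support", "speed"], "sentiment": "mixed", "priority_issue": null}\n' + FENCE
print(validate_analysis(parse_model_json(reply)))
```

In categorize_feedback, the last line could then return validate_analysis(parse_model_json(response.content[0].text)) instead of calling json.loads directly.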
NPS Comment Analysis
NPS scores tell you who's happy. The comments tell you why. This prompt extracts structured data from NPS verbatims:
import anthropic
import json

client = anthropic.Anthropic()

def analyze_nps_comment(score: int, comment: str) -> dict:
    # Standard NPS bands: 9-10 promoter, 7-8 passive, 0-6 detractor
    score_label = "promoter" if score >= 9 else "passive" if score >= 7 else "detractor"
    prompt = f"""Analyze this NPS survey response from a {score_label} (score: {score}).
Return JSON with:
- "main_driver": what primarily drove their score (1 sentence)
- "themes": array of topics mentioned (max 3)
- "churn_risk": "high", "medium", or "low"
- "feature_request": specific feature mentioned, or null
- "quote": best verbatim quote to use in a report (under 20 words), or null
Comment: "{comment}"
Return only JSON."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)

result = analyze_nps_comment(
    score=4,
    comment="I wanted to like this but the mobile app crashes every time I try to export. Switched to a competitor last week."
)
print(result)
# Example output (exact wording will vary):
# {"main_driver": "Mobile app reliability issues caused churn", "themes": ["reliability", "mobile", "ux_design"],
#  "churn_risk": "high", "feature_request": null, "quote": "crashes every time I try to export"}
The churn_risk field from detractor comments becomes a prioritized action list for your customer success team.
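As a rough illustration of that last step, the sketch below turns analyzed NPS responses into a sorted action list. The record shape (customer_id, score, analysis) is an assumption made for this example, and build_action_list is a hypothetical helper, not something from the code above.

```python
# Lower rank sorts first; unknown or missing risk values go to the end.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def build_action_list(responses: list[dict]) -> list[dict]:
    """Return detractors (score 0-6) sorted by churn risk, then by lowest score."""
    detractors = [r for r in responses if r["score"] <= 6]
    return sorted(
        detractors,
        key=lambda r: (
            RISK_ORDER.get(r["analysis"].get("churn_risk"), len(RISK_ORDER)),
            r["score"],
        ),
    )


# Hand-written sample records standing in for analyze_nps_comment() output
sample = [
    {"customer_id": "c1", "score": 6, "analysis": {"churn_risk": "medium"}},
    {"customer_id": "c2", "score": 9, "analysis": {"churn_risk": "low"}},
    {"customer_id": "c3", "score": 2, "analysis": {"churn_risk": "high"}},
    {"customer_id": "c4", "score": 4, "analysis": {"churn_risk": "high"}},
]
for row in build_action_list(sample):
    print(row["customer_id"], row["score"], row["analysis"]["churn_risk"])
```

Sorting by score within each risk level is a design choice; you might prefer account value or recency if you have that data.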
Theme Discovery: Finding Categories You Didn't Know to Look For
Constrained theme lists work well when you know your categories. For exploratory analysis — when you're seeing a new product's feedback for the first time — use an unconstrained prompt to discover themes:
def discover_themes(feedback_batch: list[str]) -> dict:
    """Find themes across a batch of feedback without predefined categories."""
    # Number each response and separate them with a divider line
    combined = "\n---\n".join(f"[{i+1}] {text}" for i, text in enumerate(feedback_batch))
    prompt = f"""Analyze these {len(feedback_batch)} customer feedback responses.
Identify the top themes that appear across multiple responses. For each theme:
- Give it a short label (2-4 words)
- Count how many responses mention it
- Provide one representative quote
Return JSON:
{{
"themes": [
{{"label": "...", "count": N, "example_quote": "..."}},
...
],
"summary": "2-3 sentence overview of the overall feedback"
}}
Feedback responses:
{combined}
Return only JSON."""
    response = client.messages.create(
        model="claude-sonnet-4-6",  # Use Sonnet for synthesis tasks
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)

# Run on first 20 responses to identify your theme taxonomy
# (feedback_list is your own list of response strings)
sample = feedback_list[:20]
taxonomy = discover_themes(sample)
# Then use discovered themes in your constrained categorization prompt
The pattern: use Sonnet on a sample to discover themes, then build a constrained prompt with those themes, then run Haiku on the full dataset at scale. This combines quality with cost efficiency.
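The middle step, turning discovered themes into a constrained list, could look something like the sketch below. It assumes the taxonomy dict has the shape requested in the discovery prompt (a "themes" array with "label" and "count"). The min_count threshold and both function names are hypothetical choices for illustration.

```python
import re


def taxonomy_to_themes(taxonomy: dict, min_count: int = 2) -> list[str]:
    """Convert discovered labels to snake_case identifiers, keeping recurring themes."""
    themes: list[str] = []
    for theme in taxonomy.get("themes", []):
        if theme.get("count", 0) < min_count:
            continue  # Skip themes mentioned by fewer than min_count responses
        slug = re.sub(r"[^a-z0-9]+", "_", theme["label"].lower()).strip("_")
        if slug and slug not in themes:
            themes.append(slug)
    if "other" not in themes:
        themes.append("other")  # Catch-all so every response can be tagged
    return themes


def build_constrained_prompt(themes: list[str], text: str) -> str:
    """Build a categorization prompt restricted to the given theme list."""
    return (
        "You are a feedback analyst. Categorize this customer feedback.\n"
        "Return a JSON object with:\n"
        f'- "themes": array of theme labels (max 3, choose from: {", ".join(themes)})\n'
        '- "sentiment": "positive", "negative", or "mixed"\n'
        '- "priority_issue": the single most important thing mentioned, or null\n'
        f'Feedback: "{text}"\n'
        "Return only the JSON object, no other text."
    )


# Hand-written taxonomy standing in for discover_themes() output
example_taxonomy = {
    "themes": [
        {"label": "Slow Onboarding", "count": 7, "example_quote": "..."},
        {"label": "Mobile app crashes", "count": 4, "example_quote": "..."},
        {"label": "Dark mode", "count": 1, "example_quote": "..."},
    ],
    "summary": "...",
}
themes = taxonomy_to_themes(example_taxonomy)
print(themes)
```

It is worth reviewing the generated list by hand before the full run, since near-duplicate labels from the discovery step would otherwise become separate categories.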
Batch Processing at Scale
For datasets over 100 responses, add rate limiting and error handling:
import anthropic
import csv
import json
import time

client = anthropic.Anthropic()

def categorize_batch(
    feedback_list: list[str],
    batch_size: int = 10,
    delay: float = 0.5
) -> list[dict]:
    """Process feedback with rate limiting and error recovery."""
    results = []
    for i in range(0, len(feedback_list), batch_size):
        batch = feedback_list[i:i + batch_size]
        for j, text in enumerate(batch):
            analysis = {}  # Stays empty if both attempts fail, so one bad row can't stop the run
            for attempt in range(2):  # First try plus one retry
                try:
                    analysis = categorize_feedback(text)  # from earlier example
                    break
                except json.JSONDecodeError:
                    pass  # Claude returned non-JSON; retry once
                except anthropic.RateLimitError:
                    time.sleep(60)  # Back off on rate limit, then retry
            results.append({"index": i + j, "text": text, "analysis": analysis})
            time.sleep(delay)  # Throttle between requests
        # Progress update every batch
        print(f"Processed {min(i + batch_size, len(feedback_list))}/{len(feedback_list)}")
    return results

# Run on full dataset
all_results = categorize_batch(feedback_list, batch_size=10, delay=0.3)

# Export to CSV
with open("feedback_analysis.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["index", "text", "themes", "sentiment", "priority_issue"])
    writer.writeheader()
    for r in all_results:
        writer.writerow({
            "index": r["index"],
            "text": r["text"],
            "themes": ", ".join(r["analysis"].get("themes") or []),
            "sentiment": r["analysis"].get("sentiment", ""),
            "priority_issue": r["analysis"].get("priority_issue", "")
        })
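The fixed 60-second sleep in categorize_batch is simple but blunt. One common alternative is exponential backoff with a little jitter, sketched below as a generic wrapper. call_with_backoff is a hypothetical helper written for this guide, not part of the Anthropic SDK, and the default attempt count and delays are arbitrary starting points.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay * 0.1))  # add up to 10% jitter
    raise AssertionError("unreachable")
```

Inside categorize_batch, the call might then become call_with_backoff(lambda: categorize_feedback(text), retry_on=(anthropic.RateLimitError,)), with the JSON retry handled separately.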
Aggregating Results into a Report
Once you have categorized data, use Claude to synthesize it into an executive summary:
from collections import Counter

def generate_report(results: list[dict]) -> str:
    # Aggregate theme counts
    all_themes = []
    sentiments = []
    for r in results:
        analysis = r["analysis"]
        all_themes.extend(analysis.get("themes", []))
        sentiments.append(analysis.get("sentiment", ""))
    theme_counts = Counter(all_themes)
    sentiment_counts = Counter(sentiments)
    summary_data = {
        "total_responses": len(results),
        "top_themes": theme_counts.most_common(5),
        "sentiment_breakdown": dict(sentiment_counts)
    }
    prompt = f"""Write a 3-paragraph executive summary of customer feedback analysis results.
Data:
- Total responses analyzed: {summary_data['total_responses']}
- Top themes: {summary_data['top_themes']}
- Sentiment: {summary_data['sentiment_breakdown']}
Include:
1. Overall sentiment and key finding
2. Top 2-3 themes and what they mean for the product
3. Recommended next actions (be specific)
Write in plain business English. No bullet points."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
Qualitative Coding: Research-Grade Analysis
For user research or academic-style analysis, Claude can apply formal coding frameworks:
CODEBOOK = """
Use the following coding scheme:
- USABILITY: comments about ease of use, learning curve, interface clarity
- RELIABILITY: bugs, crashes, unexpected behavior, data loss
- VALUE: pricing, ROI, comparison to alternatives, worth the cost
- SUPPORT: response time, helpfulness, documentation quality
- MISSING_FEATURE: explicit requests for features that don't exist
- PRAISE: general positive sentiment with no specific theme
"""

def apply_codebook(text: str) -> dict:
    prompt = f"""Apply the following qualitative codebook to this feedback response.
Codebook:
{CODEBOOK}
Return JSON:
{{
"primary_code": "...",
"secondary_code": "..." (or null),
"confidence": "high" | "medium" | "low",
"rationale": "1 sentence explaining primary code assignment"
}}
Feedback: "{text}"
Return only JSON."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)
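The confidence field is underused by most teams. Flag low-confidence codes for human review. Claude's self-reported confidence is not a calibrated probability, but when you ask for it explicitly it gives you a practical signal for which codes deserve a second look.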
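A minimal sketch of that review step, assuming each record pairs the original text with the dict returned by apply_codebook under a "coding" key. Both the record shape and the split_for_review name are illustrative.

```python
def split_for_review(
    coded: list[dict],
    auto_accept: tuple[str, ...] = ("high",),
) -> tuple[list[dict], list[dict]]:
    """Split coded responses into auto-accepted ones and a human review queue."""
    accepted, needs_review = [], []
    for item in coded:
        confidence = item["coding"].get("confidence")
        # Anything outside auto_accept, including a missing value, goes to a human
        if confidence in auto_accept:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Hand-written records standing in for apply_codebook() output
coded_sample = [
    {"text": "App crashed twice", "coding": {"primary_code": "RELIABILITY", "confidence": "high"}},
    {"text": "It's fine I guess", "coding": {"primary_code": "PRAISE", "confidence": "low"}},
    {"text": "Pricey but works", "coding": {"primary_code": "VALUE", "confidence": "medium"}},
]
accepted, needs_review = split_for_review(coded_sample)
print(len(accepted), "accepted;", len(needs_review), "for review")
```

Whether "medium" should be auto-accepted depends on how much reviewer time you have; passing auto_accept=("high", "medium") loosens the filter.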
Cost Estimation
At scale, the model choice matters:
- Haiku for categorization: ~500 responses per $0.10. 10,000 responses ≈ $2.00
- Sonnet for synthesis/summaries: Use sparingly — run once at the end, not per response
- Batch size matters: One API call per response is fine for small datasets. For 10K+, consider batching multiple responses in one call to reduce overhead
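Packing several responses into one call might look like the sketch below: number each response in the prompt, ask for a JSON array keyed by that number, and match the reply back to the inputs. The function names and the "id" field are assumptions made for this example. Matching by id instead of by position is a deliberate choice, so a skipped item shows up as a gap and the remaining results stay aligned.

```python
import json


def build_multi_prompt(chunk: list[str], themes: list[str]) -> str:
    """Build one prompt covering several numbered feedback responses."""
    numbered = "\n".join(f"[{n}] {text}" for n, text in enumerate(chunk, start=1))
    return (
        "You are a feedback analyst. Categorize each numbered feedback response.\n"
        f"Only use themes from this list: [{', '.join(themes)}]\n"
        'Return a JSON array with one object per response, each with "id" '
        '(the response number), "themes", "sentiment", and "priority_issue".\n\n'
        f"Feedback responses:\n{numbered}\n\n"
        "Return only the JSON array."
    )


def align_results(chunk: list[str], reply: str) -> list[dict]:
    """Match the model's array back to inputs by id; missing ids get analysis=None."""
    by_id = {
        item["id"]: item
        for item in json.loads(reply)
        if isinstance(item, dict) and "id" in item
    }
    return [
        {"text": text, "analysis": by_id.get(n)}
        for n, text in enumerate(chunk, start=1)
    ]


# Hand-written reply standing in for a real API response (item 2 is missing)
chunk = ["Too expensive", "Love the dashboards", "Support was slow"]
reply = (
    '[{"id": 1, "themes": ["pricing"], "sentiment": "negative", "priority_issue": "price"},'
    ' {"id": 3, "themes": ["support"], "sentiment": "negative", "priority_issue": "slow support"}]'
)
aligned = align_results(chunk, reply)
print([row["analysis"] is not None for row in aligned])
```

Rows that come back with analysis set to None can be re-run one at a time with the single-response function.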
For most teams analyzing customer feedback, the total API cost is $1–5 per analysis run. That's not a line item worth optimizing until you're running it multiple times per day.
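To sanity-check a budget for your own data, the arithmetic is just tokens multiplied by price. In the sketch below every number is a placeholder assumption: the per-response token counts depend on your prompt and feedback length, and the per-million-token prices should be replaced with the current published rates for the model you use.

```python
def estimate_cost(
    num_responses: int,
    input_tokens_per_response: int,
    output_tokens_per_response: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate API cost in dollars for one call per response."""
    total_input = num_responses * input_tokens_per_response
    total_output = num_responses * output_tokens_per_response
    return (
        (total_input / 1_000_000) * input_price_per_mtok
        + (total_output / 1_000_000) * output_price_per_mtok
    )


# Placeholder inputs: 10,000 responses, ~150 prompt tokens and ~50 reply tokens each,
# with illustrative prices per million tokens (substitute real values).
estimate = estimate_cost(
    num_responses=10_000,
    input_tokens_per_response=150,
    output_tokens_per_response=50,
    input_price_per_mtok=1.00,
    output_price_per_mtok=5.00,
)
print(f"Estimated cost: ${estimate:.2f}")
```

Output tokens usually carry the higher per-token price, so trimming the requested JSON fields tends to save more than shortening the prompt.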
Next Steps
The Like One Academy course on Building AI Products covers prompt engineering for structured output in depth, including how to handle edge cases when Claude returns malformed JSON and how to validate output schemas automatically.
Start with the basic categorization prompt on a sample of 20-30 responses before building the full pipeline. The best prompt for your use case will depend on your specific domain and how your customers write — iterate on a small sample first.