What Claude model should I use for feedback analysis?

Use claude-haiku-4-5-20251001 for bulk categorization tasks — it's accurate enough for tagging and 10x cheaper than Sonnet. Use claude-sonnet-4-6 for synthesis tasks like generating executive summaries or discovering themes from a sample. This combination gives you cost efficiency at scale with quality where it matters.

How many feedback responses can Claude analyze at once?

For individual categorization (one response per API call), Claude has no practical limit. For batch analysis where you include multiple responses in one prompt, keep batches under 20-30 responses to maintain accuracy. For datasets over 1,000 responses, build a pipeline with rate limiting — the Anthropic API allows burst requests but throttles sustained high volume.

How do I get Claude to return consistent JSON output?

Specify the exact JSON structure in your prompt with field names and types. End the prompt with 'Return only JSON, no other text.' For critical production use, wrap the API call in a try/except for json.JSONDecodeError and retry once on failure. Claude returns valid JSON reliably when the structure is clearly specified, but the retry catches rare edge cases.

Can Claude discover new themes I didn't define in advance?

Yes. Use an unconstrained discovery prompt on a sample of 20-30 responses to let Claude surface themes organically. Once you have the theme taxonomy, switch to a constrained prompt with those themes for consistent bulk categorization. This two-phase approach combines discovery quality with consistency at scale.

Is Claude's feedback analysis accurate enough for business decisions?

For bulk categorization and theme identification, Claude reaches 85-95% accuracy on well-defined categories. For edge cases or nuanced coding, confidence drops. The confidence field in your prompts helps: flag low-confidence results for human review. For research-grade or high-stakes analysis, treat Claude as a first-pass tool and have humans validate a sample.

Claude Feedback Analysis: Survey Data Guide

Categorize customer feedback, themes, and sentiment with Claude. Step-by-step Python prompts for NPS analysis, survey coding, and user research at scale.

You have 500 survey responses, 1,200 NPS comments, or a Typeform full of open-ended answers. Manual review takes days. Keyword search misses nuance. AI feedback analysis with Claude takes minutes and catches what search doesn't.

This guide walks through the prompts and patterns that actually work — from basic theme tagging to structured batch pipelines.

What AI Feedback Analysis Does Well

Claude handles feedback analysis better than keyword search or simple sentiment tools for three reasons:

Context-aware categorization: "The onboarding took forever but once I figured it out, I loved it" contains both a complaint (onboarding) and satisfaction. A keyword search for "loved" would miss the complaint. Claude reads both.
Theme discovery: You don't have to pre-define your categories. Claude can surface themes you didn't know to look for.
Structured output: With JSON mode, Claude returns machine-readable tags you can drop into a spreadsheet or database without manual parsing.

Where it has limits: Claude processes text, not audio or images. For very large datasets (10K+ responses), you'll need a batch pipeline with rate limiting. And Claude's category labels are only as good as your prompt — garbage in, garbage out applies here too.

The Basic Categorization Prompt

Start with this pattern for any feedback categorization task:

You are a feedback analyst. Categorize the following customer feedback into themes.

Return a JSON object with:
- "themes": array of theme labels (max 3 per response)
- "sentiment": "positive", "negative", or "mixed"
- "priority_issue": the single most important thing the customer mentioned, or null

Only use themes from this list: [onboarding, pricing, performance, support, features, reliability, ux_design, other]

Feedback: "{feedback_text}"

The key elements: explicit output format, constrained theme list (prevents Claude from inventing 50 different synonyms for "slow"), and a priority issue field that forces ranking.

In Python:

import anthropic
import json

client = anthropic.Anthropic()

THEMES = ["onboarding", "pricing", "performance", "support", "features", "reliability", "ux_design", "other"]

def categorize_feedback(text: str) -> dict:
    prompt = f"""You are a feedback analyst. Categorize this customer feedback.

Return a JSON object with:
- "themes": array of theme labels (max 3, choose from: {", ".join(THEMES)})
- "sentiment": "positive", "negative", or "mixed"  
- "priority_issue": the single most important thing mentioned, or null

Feedback: "{text}"

Return only the JSON object, no other text."""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku for speed/cost on bulk tasks
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return json.loads(response.content[0].text)

# Test it
result = categorize_feedback(
    "Setup took 3 hours and support never replied, but the reporting features are incredible."
)
print(result)
# {"themes": ["onboarding", "support", "features"], "sentiment": "mixed", "priority_issue": "support not responding"}

Use claude-haiku-4-5-20251001 for bulk categorization — it's 10x cheaper than Sonnet and more than accurate enough for straightforward tagging. Save Sonnet or Opus for edge cases or summary generation.

NPS Comment Analysis

NPS scores tell you who's happy. The comments tell you why. This prompt extracts structured data from NPS verbatims:

import anthropic
import json

client = anthropic.Anthropic()

def analyze_nps_comment(score: int, comment: str) -> dict:
    score_label = "promoter" if score >= 9 else "passive" if score >= 7 else "detractor"
    
    prompt = f"""Analyze this NPS survey response from a {score_label} (score: {score}).

Return JSON with:
- "main_driver": what primarily drove their score (1 sentence)
- "themes": array of topics mentioned (max 3)  
- "churn_risk": "high", "medium", or "low"
- "feature_request": specific feature mentioned, or null
- "quote": best verbatim quote to use in a report (under 20 words), or null

Comment: "{comment}"

Return only JSON."""

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return json.loads(response.content[0].text)

result = analyze_nps_comment(
    score=4,
    comment="I wanted to like this but the mobile app crashes every time I try to export. Switched to a competitor last week."
)
print(result)
# {"main_driver": "Mobile app reliability issues caused churn", "themes": ["reliability", "mobile", "ux_design"], 
#  "churn_risk": "high", "feature_request": null, "quote": "crashes every time I try to export"}

The churn_risk field from detractor comments becomes a prioritized action list for your customer success team.

Theme Discovery: Finding Categories You Didn't Know to Look For

Constrained theme lists work well when you know your categories. For exploratory analysis — when you're seeing a new product's feedback for the first time — use an unconstrained prompt to discover themes:

def discover_themes(feedback_batch: list[str]) -> dict:
    """Find themes across a batch of feedback without predefined categories."""
    combined = "
---
".join(f"[{i+1}] {text}" for i, text in enumerate(feedback_batch))
    
    prompt = f"""Analyze these {len(feedback_batch)} customer feedback responses.

Identify the top themes that appear across multiple responses. For each theme:
- Give it a short label (2-4 words)
- Count how many responses mention it
- Provide one representative quote

Return JSON:
{{
  "themes": [
    {{"label": "...", "count": N, "example_quote": "..."}},
    ...
  ],
  "summary": "2-3 sentence overview of the overall feedback"
}}

Feedback responses:
{combined}

Return only JSON."""

    response = client.messages.create(
        model="claude-sonnet-4-6",  # Use Sonnet for synthesis tasks
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return json.loads(response.content[0].text)

# Run on first 20 responses to identify your theme taxonomy
sample = feedback_list[:20]
taxonomy = discover_themes(sample)
# Then use discovered themes in your constrained categorization prompt

The pattern: use Sonnet on a sample to discover themes, then build a constrained prompt with those themes, then run Haiku on the full dataset at scale. This combines quality with cost efficiency.

Batch Processing at Scale

For datasets over 100 responses, add rate limiting and error handling:

import anthropic
import json
import time
from typing import Optional

client = anthropic.Anthropic()

def categorize_batch(
    feedback_list: list[str],
    batch_size: int = 10,
    delay: float = 0.5
) -> list[dict]:
    """Process feedback with rate limiting and error recovery."""
    results = []
    
    for i in range(0, len(feedback_list), batch_size):
        batch = feedback_list[i:i + batch_size]
        
        for j, text in enumerate(batch):
            try:
                result = categorize_feedback(text)  # from earlier example
                results.append({"index": i + j, "text": text, "analysis": result})
            except json.JSONDecodeError:
                # Claude returned non-JSON — retry once
                result = categorize_feedback(text)
                results.append({"index": i + j, "text": text, "analysis": result})
            except anthropic.RateLimitError:
                time.sleep(60)  # Back off on rate limit
                result = categorize_feedback(text)
                results.append({"index": i + j, "text": text, "analysis": result})
            
            time.sleep(delay)  # Throttle between requests
        
        # Progress update every batch
        print(f"Processed {min(i + batch_size, len(feedback_list))}/{len(feedback_list)}")
    
    return results

# Run on full dataset
all_results = categorize_batch(feedback_list, batch_size=10, delay=0.3)

# Export to CSV
import csv
with open("feedback_analysis.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["index", "text", "themes", "sentiment", "priority_issue"])
    writer.writeheader()
    for r in all_results:
        writer.writerow({
            "index": r["index"],
            "text": r["text"],
            "themes": ", ".join(r["analysis"].get("themes", [])),
            "sentiment": r["analysis"].get("sentiment", ""),
            "priority_issue": r["analysis"].get("priority_issue", "")
        })

Aggregating Results into a Report

Once you have categorized data, use Claude to synthesize it into an executive summary:

def generate_report(results: list[dict]) -> str:
    # Aggregate theme counts
    from collections import Counter
    all_themes = []
    sentiments = []
    
    for r in results:
        analysis = r["analysis"]
        all_themes.extend(analysis.get("themes", []))
        sentiments.append(analysis.get("sentiment", ""))
    
    theme_counts = Counter(all_themes)
    sentiment_counts = Counter(sentiments)
    
    summary_data = {
        "total_responses": len(results),
        "top_themes": theme_counts.most_common(5),
        "sentiment_breakdown": dict(sentiment_counts)
    }
    
    prompt = f"""Write a 3-paragraph executive summary of customer feedback analysis results.

Data:
- Total responses analyzed: {summary_data['total_responses']}
- Top themes: {summary_data['top_themes']}
- Sentiment: {summary_data['sentiment_breakdown']}

Include:
1. Overall sentiment and key finding
2. Top 2-3 themes and what they mean for the product
3. Recommended next actions (be specific)

Write in plain business English. No bullet points."""

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

Qualitative Coding: Research-Grade Analysis

For user research or academic-style analysis, Claude can apply formal coding frameworks:

CODEBOOK = """
Use the following coding scheme:
- USABILITY: comments about ease of use, learning curve, interface clarity
- RELIABILITY: bugs, crashes, unexpected behavior, data loss
- VALUE: pricing, ROI, comparison to alternatives, worth the cost
- SUPPORT: response time, helpfulness, documentation quality
- MISSING_FEATURE: explicit requests for features that don't exist
- PRAISE: general positive sentiment with no specific theme
"""

def apply_codebook(text: str) -> dict:
    prompt = f"""Apply the following qualitative codebook to this feedback response.

Codebook:
{CODEBOOK}

Return JSON:
{{
  "primary_code": "...",
  "secondary_code": "..." (or null),
  "confidence": "high" | "medium" | "low",
  "rationale": "1 sentence explaining primary code assignment"
}}

Feedback: "{text}"
Return only JSON."""
    
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.content[0].text)

The confidence field is underused by most teams. Flag low-confidence codes for human review. Claude is honest about uncertainty when you ask for it explicitly.

Cost Estimation

At scale, the model choice matters:

Haiku for categorization: ~500 responses per $0.10. 10,000 responses ≈ $2.00
Sonnet for synthesis/summaries: Use sparingly — run once at the end, not per response
Batch size matters: One API call per response is fine for small datasets. For 10K+, consider batching multiple responses in one call to reduce overhead

For most teams analyzing customer feedback, the total API cost is $1–5 per analysis run. That's not a line item worth optimizing until you're running it multiple times per day.

Next Steps

The Like One Academy course on Building AI Products covers prompt engineering for structured output in depth, including how to handle edge cases when Claude returns malformed JSON and how to validate output schemas automatically.

Start with the basic categorization prompt on a sample of 20-30 responses before building the full pipeline. The best prompt for your use case will depend on your specific domain and how your customers write — iterate on a small sample first.