Boosting Claude Code performance with prompt learning

Simple, Effective, Data-Driven Prompt Optimization

Arize Builders Meetup NYC - 2026-03-12

A quick shout-out

What are we doing today?

  • Prompt Learning on Claude Code
  • SWE-Bench Lite: up to 11% improvement
  • No fine-tuning. No new tools. No architecture changes.

A tweet

The Memento problem

System prompts are bigger than you think

CLAUDE.md is yours

Most people's CLAUDE.md

is terrible

A benchmark to measure against

SWE-Bench Lite

  • 300 real GitHub issues
  • Popular open-source Python repositories
  • Ground-truth patches + test suites

Why SWE-Bench?

Why SWE-Bench is hard

Our starting point

What is Prompt Learning?

Reinforcement Learning

Standard RL: effective but expensive

  • Sample inefficient
  • Slow and expensive
  • Opaque: what do weight changes mean?
  • Overkill when LLMs are already great

Prompt Learning: the same loop, different algorithm

Why English feedback beats a score

What the meta-prompt does

We did this with Cline first

Why GPT-4.1 for Cline?

The Cline optimization loop

Cline results

The GPT-4.1 story

Now: Claude Code

Part 1: Rollouts

Part 2: Generate English feedback

Evals make all the difference

Part 3: Meta-prompting
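
A minimal sketch of how the three parts could fit together in one pass of the loop. Every name here (`Rollout`, `prompt_learning_step`, and the `fake_*` stand-ins) is hypothetical and chosen for illustration; this is not the code from the Arize repository, and the stand-ins replace the real agent, evaluator, and LLM so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    task_id: str
    patch: str
    passed: bool  # did the repo's test suite pass with this patch?


def prompt_learning_step(
    rules: str,
    tasks: List[str],
    run_agent: Callable[[str, str], Rollout],
    explain: Callable[[Rollout], str],
    meta_prompt: Callable[[str, List[str]], str],
) -> str:
    """One pass of the loop: rollouts, English feedback, rewritten rules."""
    # Part 1: run the agent on each task with the current rules file.
    rollouts = [run_agent(task, rules) for task in tasks]
    # Part 2: turn each rollout into an English explanation, not just a score.
    feedback = [explain(r) for r in rollouts]
    # Part 3: ask a meta-prompt to rewrite the rules in light of the feedback.
    return meta_prompt(rules, feedback)


# Stand-ins so the sketch runs without an agent or an LLM (all hypothetical).
def fake_agent(task: str, rules: str) -> Rollout:
    return Rollout(task_id=task, patch="", passed="backward compat" in rules)


def fake_explain(r: Rollout) -> str:
    verdict = "passed" if r.passed else "failed: broke backward compatibility"
    return f"{r.task_id}: {verdict}"


def fake_meta_prompt(rules: str, feedback: List[str]) -> str:
    needs_rule = any("backward compatibility" in f for f in feedback)
    if needs_rule and "backward compat" not in rules:
        return rules + "- Maintain backward compat with test expectations.\n"
    return rules


new_rules = prompt_learning_step(
    "", ["issue-1", "issue-2"], fake_agent, fake_explain, fake_meta_prompt
)
```

In a real run, `run_agent` would invoke the coding agent on a benchmark task, `explain` would be an LLM eval that writes the English feedback, and `meta_prompt` would be an LLM call that returns the revised rules file.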

The before: an empty file

The after: twenty rules

Rule one

Fix code at the correct hierarchy level
so all code paths benefit, not just downstream consumers.

Rule two

Maintain backward compatibility and consistency
with test expectations in error/warning behavior.

Rule three

Warn before raising errors
when deprecating usage
to allow user code transitions.
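
As an illustration of what rule three asks for, here is a small sketch using Python's standard `warnings` module. The `resize` function and its `dims`/`size` parameters are invented for this example and do not come from any of the benchmark repositories.

```python
import warnings


def resize(image, size=None, dims=None):
    """Hypothetical API in which `dims` is being renamed to `size`."""
    if dims is not None:
        # Warn first: existing callers keep working during the transition.
        warnings.warn(
            "'dims' is deprecated; use 'size' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if size is None:
            size = dims
    if size is None:
        # Only truly invalid calls raise.
        raise TypeError("resize() requires 'size'")
    return (image, size)


# Old-style call still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = resize("img", dims=(2, 3))
```

The old keyword keeps working and only emits a warning, so user code and existing tests have time to move to the new name before a later release turns the warning into an error.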

Rule four

Ensure correct dependency and execution order
in combined or chained operations.

The pattern in these rules

Cross-repo results

The Django result

20% better!

The honest framing

What this is really doing

But that's overfit!

Your git history as a training set

Sample efficiency

What you can do right now

The manual version

Even without closed issues

What makes a good rule

Applies to all coding agents

  • Cursor: .cursorrules
  • Cline: .clinerules
  • Windsurf: .windsurfrules
  • Claude Code: CLAUDE.md
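
Because each agent reads its rules from a plain file in the repo, a learned rule can be written to whichever file applies. The sketch below assumes rules are stored one per line as `- ` bullets, which is an illustrative choice rather than a format any of these tools requires; `append_rule` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# File names as listed on the slide.
RULES_FILES = {
    "Cursor": ".cursorrules",
    "Cline": ".clinerules",
    "Windsurf": ".windsurfrules",
    "Claude Code": "CLAUDE.md",
}


def append_rule(repo_root: str, agent: str, rule: str) -> Path:
    """Append a rule to the agent's rules file, skipping exact duplicates."""
    path = Path(repo_root) / RULES_FILES[agent]
    existing = path.read_text() if path.exists() else ""
    if rule not in existing:
        with path.open("a") as f:
            f.write(f"- {rule}\n")
    return path


# Demo in a throwaway directory standing in for a repo checkout.
rule = "Warn before raising errors when deprecating usage."
with tempfile.TemporaryDirectory() as repo:
    for agent in RULES_FILES:
        append_rule(repo, agent, rule)
        append_rule(repo, agent, rule)  # second call is a no-op
    written = {
        agent: (Path(repo) / name).read_text()
        for agent, name in RULES_FILES.items()
    }
```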

Claude Code already does this for itself

The open source

github.com/Arize-ai/prompt-learning

Six rules to take away

  • 1. Your CLAUDE.md is underutilized
  • 2. Let your failures tell you what to write
  • 3. Repo-specific beats generic
  • 4. Your git history is a training set
  • 5. The automation is optional
  • 6. This works on any coding agent

A bigger picture:

Self-improving software

Thank you

These slides: slides.com/seldo/prompt-learning

Follow me on Bluesky 🦋 @seldo.com

By Laurie Voss
