I ran a small experiment on one repository, one task, and three AI agents.
Repository: https://github.com/contributte/apitte-skeleton
Task: improve README for newcomers (what it is, how to start, how to use it)
Quick comparison
Agent | Commits | Diff size | Character
--- | --- | --- | ---
Copilot | 3 | +387 / -200 | Broad rewrite, very comprehensive
Claude | 1 | +212 / -215 | Structured and onboarding-focused
Codex | 1 | +54 / -124 | Minimal and easy to review
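Diff sizes like "+54 / -124" can be read straight off `git diff --numstat`. A minimal sketch of how to total them up (the sample numstat text below is illustrative, not the real diff of any of these PRs):

```python
def diff_size(numstat: str) -> tuple[int, int]:
    """Return (added, removed) line totals from `git diff --numstat` output."""
    added = removed = 0
    for line in numstat.strip().splitlines():
        a, r, _path = line.split("\t")
        # Binary files show "-" in numstat; skip them.
        if a != "-":
            added += int(a)
            removed += int(r)
    return added, removed

# Hypothetical two-file README diff:
sample = "40\t110\tREADME.md\n14\t14\tdocs/usage.md"
print(diff_size(sample))  # (54, 124)
```

Running this against a real branch would look like `git diff --numstat main...pr-branch` piped into the function.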
Copilot output
Copilot produced the largest rewrite with many added sections, examples, and navigation improvements.
Strong when you want a full docs refresh in one pass, weaker when you want a tight diff.

Claude output
Claude delivered a balanced rewrite: clear intro, quick-start flow, endpoint overview, and practical commands.
This felt closest to a “ready for team review” docs draft.

Codex output
Codex made the most conservative pass: short, focused changes with little noise.
Great for maintainability and faster review cycles, but less ambitious in scope.

Takeaway
For README generation, model choice matters less than prompt constraints.
If the prompt is open-ended, the diff explodes.
If the prompt is strict, the output becomes mergeable.
My default constraints now:
- keep the diff small
- preserve section order unless asked
- prefer clarity over volume
- stay markdown-lint friendly
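The "keep the diff small" constraint can even be enforced mechanically before a human looks at the PR. A toy sketch of a review-budget gate; the 250-line budget is a made-up threshold, not something any of the agents used:

```python
def within_review_budget(added: int, removed: int, budget: int = 250) -> bool:
    """True if total churn (added + removed lines) fits the reviewer's budget."""
    return added + removed <= budget

print(within_review_budget(54, 124))   # True  (a Codex-sized diff)
print(within_review_budget(387, 200))  # False (a Copilot-sized diff)
```

A check like this could run in CI and ask the agent to split or trim the PR instead of handing a reviewer a 500-line docs rewrite.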
All three PRs were useful drafts. The winner is not “the smartest model”, but the one that best fits your review budget and documentation goals.
