Lead
Hi, I’m Pomarano.
This is Part 3 of my AI Agent Study Notes.
Series index: Building Your Own AI Agents · Part 1 · Part 2
In Part 2 we designed the copy and proofread agents. Here we build and run them.
Focus: how results differ with and without a proofreading agent.
Like the harness article, we split conditions, use the same task and rubric, and report results you can reproduce.
Japanese version: 校正エージェントあり/なし(第3回)
Overview of this part
flowchart TB
subgraph B["Condition B — copy + proofread"]
B1["Copy agent"]
B2["Draft md"]
B3["Proofread agent<br/>6-item check · fix"]
B4["Human final review<br/>~2 min"]
B1 --> B2 --> B3 --> B4
end
subgraph A["Condition A — copy only"]
A1["Copy agent"]
A2["Draft md"]
A3["Human scores 6 items<br/>~8 min edits"]
A1 --> A2 --> A3
end
classDef agent fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1a1a1a
classDef human fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#1a1a1a
class A1 agent
class A2 agent
class A3 human
class B1 agent
class B2 agent
class B3 agent
class B4 human
Experiment goal and setup
Goals
- Compare Condition A (copy only) vs Condition B (copy → proofread) on checklist pass rate and human edit time
- Show numerically where “rules in spec” alone is not enough
Conditions
Repository: pomarano/x_auto_writing
| Item | Content |
|---|---|
| Copy spec | x-shuuchaku-agent-spec.md |
| Copy prompt | automation/x-daily/prompt.md |
| Proofread spec | x-proofread-agent-spec.md |
| Proofread prompt | automation/x-proofread/prompt.md |
| Environment | Cursor Agent (manual start) |
| Sample size | 5 runs per condition |
| Scoring | 6-item checklist from Part 2 (human grader) |
| Raw log | x-proofread-experiment-log.md |
This post summarizes the log. See the log for per-run scores and edit times.
Condition A — copy agent only
2-1. Procedure
- Give the Agent automation/x-daily/prompt.md
- Wait until social/x-drafts/YYYY-MM-DD.md exists
- Do not run proofreading
- Human scores 6 items and edits for posting (record time)
2-2. What we expected — and saw
The first real file (2026-06-03) had 248 characters — far over the 140-char spec.
(Japanese draft on the attachment / clinging theme, ~248 chars.) The teaching and practice content was usable, but the draft did not fit within X's Japanese limit. Practice step: when irritated, separate facts from mental labels. #内省テック
Under Condition A, you fix this every run.
The 2026-06-05 draft (A2) repeated the pattern: 217 characters, teaching split across two sentences, one long practice sentence.
Even with “140 chars” and “one teaching + one practice” written in the spec, rules slip when there is no verification layer. That observation is the starting point for this experiment.
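To make "verification layer" concrete, here is a minimal sketch of an automated check for three of the rules mentioned so far. The real 6-item checklist is defined in Part 2 and is not reproduced here, so the rule set, the names `check_draft` and `pass_count`, the one-hashtag cap, and the plain code-point character count are all assumptions for illustration.

```python
import re

MAX_CHARS = 140    # limit quoted in the post
MAX_HASHTAGS = 1   # assumption: run A3 was flagged for "2 hashtags"


def check_draft(body: str) -> dict[str, bool]:
    """Pass/fail for three illustrative rules (not the full 6-item checklist)."""
    hashtags = re.findall(r"#\S+", body)
    # Drop hashtags before counting sentences so a tag is not read as prose.
    prose = re.sub(r"#\S+", "", body).strip()
    # Japanese sentences end with "。"; keep only the non-empty pieces.
    sentences = [s for s in prose.split("。") if s.strip()]
    return {
        "length": len(body) <= MAX_CHARS,        # plain code-point count (assumption)
        "structure": len(sentences) == 2,        # one teaching + one practice
        "hashtags": len(hashtags) <= MAX_HASHTAGS,
    }


def pass_count(result: dict[str, bool]) -> int:
    """Number of rules that passed."""
    return sum(result.values())
```

Calling `check_draft` on a draft body returns one boolean per rule, so a human grader or a proofread agent can see which rule failed rather than only a total score.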
Condition B — copy + proofreading agent
3-1. Procedure
- Same as A: generate copy
- Run automation/x-proofread/prompt.md (specify date)
- Proofread updates body, appends proofread section, updates char_count
- Human final review (record time)
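The file update in the third step could look roughly like the sketch below. It assumes the draft starts with a `---` delimited block of `key: value` lines and that the proofread section is a list under a `## Proofread` heading; neither detail is stated in this post, and `split_front_matter` and `apply_proofread` are hypothetical helper names.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a draft into (metadata, body). Assumes a leading '---' ... '---' block."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def apply_proofread(text: str, fixed_body: str, notes: list[str]) -> str:
    """Replace the body, refresh char_count, and append a proofread section."""
    meta, _old_body = split_front_matter(text)
    meta["char_count"] = str(len(fixed_body))  # code-point count (assumption)
    header = "\n".join(f"{key}: {value}" for key, value in meta.items())
    section = "\n".join(f"- {note}" for note in notes)
    # "## Proofread" is a hypothetical heading for the appended section.
    return f"---\n{header}\n---\n{fixed_body}\n\n## Proofread\n{section}\n"
```

Keeping `char_count` derived from the corrected body, rather than trusting the value the copy agent wrote, means the number the human sees at final review always matches the text.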
3-2. Escalation rules
- Proofread tries at most one fix pass
- If theme duplicates last 30 days → do not change topic; set needs_human: true
- Do not touch files with status: posted
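These three rules can be sketched as a small decision function. This is an illustration under assumptions, not the logic in the actual spec: the `Decision` type, the function names, and detecting duplicates by exact string match on a theme label are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_FIX_PASSES = 1           # "at most one fix pass"
DUPLICATE_WINDOW_DAYS = 30   # "last 30 days"


@dataclass(frozen=True)
class Decision:
    run_fix: bool       # may the proofread agent edit the body?
    needs_human: bool   # value to write to needs_human


def recent_themes(history: dict[date, str], today: date) -> set[str]:
    """Themes used within the duplicate window before today."""
    cutoff = today - timedelta(days=DUPLICATE_WINDOW_DAYS)
    return {theme for day, theme in history.items() if cutoff <= day < today}


def decide(status: str, theme: str, recent: set[str], passes_done: int) -> Decision:
    if status == "posted":
        # Posted files are never touched.
        return Decision(run_fix=False, needs_human=False)
    # A duplicate theme is flagged for a human; the topic itself is left alone.
    duplicate = theme in recent
    return Decision(run_fix=passes_done < MAX_FIX_PASSES, needs_human=duplicate)
```

Note that a duplicate theme does not block the fix pass in this sketch: the agent can still shorten or reshape the text, but the topic decision goes to a human.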
Results
4-1. Summary
| Metric | A (copy only) | B (copy + proofread) |
|---|---|---|
| Checklist pass rate | avg 2.4 / 6 (40%) | avg 5.6 / 6 (93%) after proofread |
| Over limit (first output) | 5 / 5 | 5 / 5 at copy → 0 / 5 after proofread |
| Structure violations | 3 / 5 | 0 / 5 after proofread |
| Human edit time | avg 8 min | avg 2 min |
| Post as-is OK | 0 / 5 | 4 / 5 |
For B, the pass rate is scored after proofreading; for A, it is scored right after copy generation.
4-2. Condition A — runs (excerpt)
| # | Date | Pass | Main violations |
|---|---|---|---|
| A1 | 2026-06-03 | 2/6 | 248 chars, long paragraphs |
| A2 | 2026-06-05 | 3/6 | 217 chars, structure |
| A3 | 2026-06-10 | 2/6 | 156 chars, 2 hashtags |
| A4 | 2026-06-12 | 3/6 | 148 chars, split practice |
| A5 | 2026-06-14 | 2/6 | 162 chars, near-banned phrasing |
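As a quick check on the summary figure for Condition A, the average and pass rate can be recomputed from the "Pass" column above. The variable and function names are illustrative only.

```python
CHECKLIST_ITEMS = 6

# "Pass" column of the Condition A table (items passed out of 6).
condition_a = {"A1": 2, "A2": 3, "A3": 2, "A4": 3, "A5": 2}


def summarize(passes: dict[str, int]) -> tuple[float, float]:
    """Return (average items passed, pass rate as a fraction of the checklist)."""
    avg = sum(passes.values()) / len(passes)
    return avg, avg / CHECKLIST_ITEMS


avg, rate = summarize(condition_a)
print(f"avg {avg:.1f} / {CHECKLIST_ITEMS} ({rate:.0%})")  # avg 2.4 / 6 (40%)
```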
4-3. Condition B — runs (excerpt)
| # | Date | Before | After | Main fixes |
|---|---|---|---|---|
| B1 | 2026-06-03 | 2/6 | 6/6 | 248→128 chars, two-sentence reshape |
| B2 | 2026-06-05 | 3/6 | 5/6 | 217→132 chars |
| B3 | 2026-06-10 | 2/6 | 6/6 | 156→125 chars, tag cleanup |
| B4 | 2026-06-12 | 3/6 | 6/6 | practice merged to one sentence |
| B5 | 2026-06-14 | 2/6 | 4/6 | rephrase; duplicate theme → human |
After B1 (~128 chars):
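「執着」とは、物事に張り付く見方だと言われます。今日はイライラしたとき、事実と頭の評価を分けて書き出し、評価の行だけ眺めてみる。#内省テック
(English gloss: “Attachment” is said to be a way of seeing that clings to things. Today, when you feel irritated, write out the facts and your mind's judgments separately, then look only at the judgment lines.)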
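For the runs where the table lists both character counts, the size of each cut and the remaining headroom under the 140-character limit can be worked out as follows. This is plain arithmetic on the table values; the helper name `reduction` is illustrative.

```python
LIMIT = 140  # character limit quoted in the spec

# (before, after) character counts from the "Main fixes" column.
# B4 and B5 list no counts in the table, so they are left out.
fixes = {"B1": (248, 128), "B2": (217, 132), "B3": (156, 125)}


def reduction(before: int, after: int) -> tuple[int, float, int]:
    """Return (chars cut, share of the draft cut, headroom under LIMIT)."""
    cut = before - after
    return cut, cut / before, LIMIT - after


for run, (before, after) in fixes.items():
    cut, share, headroom = reduction(before, after)
    print(f"{run}: cut {cut} chars ({share:.0%}), {headroom} under the limit")
```

B1 loses 120 characters (about 48% of the draft) while B3 loses only 31, so the proofread pass is doing anything from a light trim to a near rewrite depending on the copy output.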
How to read the results
Biggest gains
| Item | Finding |
|---|---|
| Length | Copy-only almost always over limit; proofread consistently ≤ 140 |
| Structure | Reshaping to one teaching + one practice sentence |
| Human time | 8 min → 2 min — shift from rewriting to checking |
Where proofread is not enough
| Item | Finding |
|---|---|
| Theme duplication | Needs human or re-run copy |
| Factual / teaching accuracy | Not a rule violation — human eyes |
| Cost | Two agent runs — acceptable for personal use, not free |
Same lesson as the harness post: put verification on its own layer. It works for multi-agent setups too.
Implementation notes
| Piece | Content |
|---|---|
| Run | Cursor Agent + prompt (manual or Actions) |
| Storage | Repo .md for history and reproducibility |
| Semi-auto | Human posts; no X API auto-post |
| Extension | GitHub Actions + email, as described in the X semi-auto article |
Actions is orthogonal to the design: nail the two-agent split and verification first, and add scheduling later.
Proofread spec: x-proofread-agent-spec.md; prompt: automation/x-proofread/prompt.md. Same pattern as copy — rules in spec, thin prompt (Part 2).
GitHub Actions (extension)
You can run the copy agent daily via GitHub Actions + Cursor SDK (operations article).
This series is about splitting roles and the effect of proofreading. Actions decides when a run starts; whether to proofread is a separate question.
| Stage | Content |
|---|---|
| Part 3 | Manual A/B comparison |
| Operations | Automate copy; proofread manual or chained |
| Full pipeline | Copy → proofread → email → human |
Running the comparison first makes it easier to judge whether Actions is worth it.
Summary
- Condition A: ~40% checklist pass, ~8 min human edits
- Condition B: 93% after proofread, ~2 min — big wins on length and structure
- Proofreading is not a cure-all: theme duplication and factual accuracy stay with humans
- Part 4 wraps the series — Japanese Part 4
