Lead
Hi, I’m Pomarano.
In an earlier post, I compared JSON formatting with and without a harness. That harness validated output after generation.
grill-me, which I found recently, is an Agent Skill by Matt Pocock that works differently: it asks questions before implementation until you and the agent share the same understanding.
I read Ryo Nakae’s Zenn article, thought “can three lines really change this much?”, and tried it in Cursor.
This post covers the skill itself, how it relates to my earlier harness posts, what I verified, and when it helps (or doesn’t).
- Japanese version: たった3行の Agent Skill「grill-me」を試した
What is grill-me?
grill-me tells the agent to interview you relentlessly about a plan or design until you reach shared understanding.
In Claude Code you invoke /grill-me; in Cursor you load it as a Skill.
Key behaviors:
- One question at a time
- Walk each branch of the design decision tree
- Include a recommended answer with each question
- Explore the codebase instead of asking you, when the answer is in the repo
The skill body is effectively three lines, excluding frontmatter (the YAML metadata block at the top of a Markdown file). It is not a long prompt template — just a short instruction on how to behave.
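One way to picture the key behaviors above is a depth-first walk over a tree of design decisions, surfacing one question (with a recommended answer) at a time. The sketch below is only a mental model, not how the agent or the skill is implemented; the `Decision` class, the `walk` function, and the example questions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Decision:
    """One node in a hypothetical design decision tree."""
    question: str
    recommended: str
    # Follow-up decisions that only make sense once this one is settled.
    children: list["Decision"] = field(default_factory=list)


def walk(node: Decision) -> Iterator[tuple[str, str]]:
    """Yield (question, recommended answer) pairs depth-first, one at a time."""
    yield node.question, node.recommended
    for child in node.children:
        yield from walk(child)


# Illustrative tree only; a real session builds its questions dynamically.
tree = Decision(
    "How often should a blog URL appear?",
    "Once a week",
    children=[
        Decision("Fixed weekday or rotation?", "Fixed weekday"),
        Decision("Auto-insert or suggest only?", "Suggest only"),
    ],
)

questions = list(walk(tree))
for question, recommended in questions:
    print(f"Q: {question} (recommended: {recommended})")
```

The parent question is yielded before its children, which mirrors the idea of resolving a decision before the ones that depend on it.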
Two kinds of harness
In my JSON harness post, schema validation and retries fixed output shape.
In semi-automated X drafts, character limits and duplicate checks constrained quality after generation.
grill-me works upstream of both.
| Kind | When | Examples |
|---|---|---|
| Output harness | After generation | JSON schema, character count, retries |
| Alignment harness (grill-me) | Before implementation | Requirements, boundaries, priorities |
If you only say “please build this,” agents tend to rush out a plan or code. grill-me is a short leash that stops and asks first.
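For contrast, the "output harness" row can be sketched as a validate-and-retry loop that runs after generation. This is a minimal illustration, not the code from the earlier post: `REQUIRED_KEYS`, `validate`, and `generate_with_harness` are hypothetical names, and the generator is a stand-in for a model call.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"title", "body"}  # hypothetical schema, for illustration only


def validate(raw: str) -> Optional[dict]:
    """Return the parsed object if it is a JSON object with the required keys, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def generate_with_harness(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Call the generator until its output validates or the attempts run out."""
    for _ in range(max_attempts):
        result = validate(generate())
        if result is not None:
            return result
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Stand-in for a model call: fails once, then returns valid JSON.
outputs = iter(["not json", '{"title": "Hello", "body": "World"}'])
draft = generate_with_harness(lambda: next(outputs))
print(draft["title"])
```

A loop like this can only check the shape of what was produced. It has no way to notice that the wrong thing was requested, which is the gap an alignment step before implementation is meant to cover.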
Full skill text (original)
Source: mattpocock/skills — grill-me by Matt Pocock
---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.
The three instruction lines (after frontmatter) are the entire behavioral spec.
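To see how little text is involved, the file can be split into its frontmatter fields and its instruction lines. The sketch below is a hand-rolled split for illustration, not a real YAML parser and not how Cursor or Claude Code load skills; `split_skill` is a hypothetical helper, and the line breaks inside `SKILL_MD` are an assumption about how the three instructions are laid out.

```python
SKILL_MD = """\
---
name: grill-me
description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me".
---

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.

Ask the questions one at a time.

If a question can be answered by exploring the codebase, explore the codebase instead.
"""


def split_skill(text: str) -> tuple[dict[str, str], list[str]]:
    """Split a SKILL.md string into frontmatter fields and non-empty body lines."""
    # The file starts with '---', so the first piece of the split is empty.
    _, frontmatter, body = text.split("---", 2)
    fields: dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    instructions = [line for line in body.splitlines() if line.strip()]
    return fields, instructions


fields, instructions = split_skill(SKILL_MD)
print(fields["name"], len(instructions))
```

The frontmatter describes when the skill should be used, and everything the agent is told about how to behave sits in the few lines after it.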
Setup and how I invoked it
Environment
- Cursor (Agent mode)
- Compared against: normal chat and “dump a plan immediately” behavior (similar to Plan mode)
Install (Cursor)
Follow skills.sh:
npx skills add https://github.com/mattpocock/skills --skill grill-me
Or place grill-me/SKILL.md under ~/.cursor/skills/ (personal) or .cursor/skills/ (project).
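If you place the file by hand, a quick check that it landed in one of the two locations can save a confusing session. This is a small sketch under the directory layout described above; `find_skill` is a hypothetical helper and makes no claim about which location Cursor prefers when both exist.

```python
from pathlib import Path


def find_skill(name: str, project_root: Path, home: Path) -> list[Path]:
    """Return every SKILL.md for `name` found in the project or personal skills directory."""
    candidates = [
        project_root / ".cursor" / "skills" / name / "SKILL.md",  # project
        home / ".cursor" / "skills" / name / "SKILL.md",          # personal
    ]
    return [path for path in candidates if path.is_file()]


found = find_skill("grill-me", Path.cwd(), Path.home())
print(found or "grill-me not found in either location")
```

Run it from the project root so that `Path.cwd()` points at the directory containing `.cursor/`.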
How to call it
Describe your goal in chat and say you want grill-me. Example:
Use grill-me for a design review. I want to add "one blog URL per week" to my X draft agent. Align on requirements before implementation.
What I verified
I ran two passes:
- Meta verification — use grill-me-style questioning to decide this article’s structure (before writing)
- Pre-implementation verification — stress-test a real feature idea on an existing project
1. Meta verification (article outline)
Before drafting, I answered grill-me-style questions one by one. The session ran about 12 questions and settled on:
| # | Example question | Decision |
|---|---|---|
| 1 | Who is the reader? | People who read my harness and X automation posts |
| 2 | Goal of the post? | My verification log, not evangelism |
| 3 | Relation to prior posts? | Contrast output harness vs alignment harness |
| 4 | Verification topic? | Real feature idea (weekly blog URL on X drafts) |
| 5 | Tools? | Cursor — not Claude Code only |
| 6 | Compare to Plan mode? | Re-test points from the Zenn article in my own words |
| 7 | Downsides? | Be honest about fatigue and time cost |
| 8 | Skill text? | Include original with attribution |
| 9 | Diagrams? | A two-type harness table is enough |
| 10 | Internal links? | ai/272, ai/318 (and EN equivalents) |
| 11 | English version? | Separate draft after JP publish |
| 12 | Reproducibility? | Install steps and sample prompt |
Takeaway: After 12 questions, the outline and “what counts as verification” were mostly settled. Compared with Plan mode, which dumps a full document at once, answering one question at a time kept the discussion from drifting.
2. Pre-implementation verification (X draft agent extension)
Topic: Add a rule to the semi-automated X draft agent to include one blog article URL per week in a post.
| Item | Result (sample — replace with live log if you re-run) |
|---|---|
| Question count | (e.g. 18) |
| Codebase exploration | (e.g. auto-read x-shuuchaku-agent-spec.md) |
| Decided spec | (e.g. Sundays only / short URL within 140 chars / human must approve) |
| Still open | (e.g. English post URLs) |
| Session time | (e.g. ~25 min) |
Sample Q&A (replace with your actual session):
- Q: Fixed weekday or rotation for “once a week”? → Sunday (fits weekly reflection)
- Q: Conflicts with spec “no URLs”? → Needs an explicit exception in the spec file
- Q: Auto-insert or suggest only? → Suggest in draft; human posts (keeps semi-auto policy)
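If the sample decisions above held (Sundays only, within 140 characters, suggest rather than auto-post), the rule could be sketched roughly as follows. Everything here is an assumption built on those placeholder values: `suggest_blog_url` and `MAX_CHARS` are hypothetical names, and a plain `len()` check does not reflect how X actually counts characters.

```python
from datetime import date
from typing import Optional

MAX_CHARS = 140  # sample limit from the table; real X counting rules differ


def suggest_blog_url(draft: str, url: str, today: date) -> Optional[str]:
    """Return a suggested draft with the URL appended, or None if the rule does not apply."""
    if today.weekday() != 6:  # Monday is 0, Sunday is 6
        return None
    candidate = f"{draft} {url}"
    if len(candidate) > MAX_CHARS:
        return None
    # A suggestion only: a human still approves and posts it.
    return candidate


suggestion = suggest_blog_url("New post is up.", "https://example.com/post", date(2024, 1, 7))
print(suggestion)
```

The function only proposes text and never posts, which keeps the semi-automated policy from the Q&A intact.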
Comparison with Plan mode and JSON harness
| Approach | Strengths | Weaknesses |
|---|---|---|
| Plan mode | Full picture at once | Small edits often rewrite the whole plan |
| JSON harness | Pass/fail on output is mechanical | Does not fix wrong requirements |
| grill-me | Reduces pre-implementation mismatch | Costs time and focus; tiring |
As Ryo Nakae’s Zenn post notes, after grill-me the conversation log becomes the implementation plan — easy to hand off with “OK, build this.” I saw the same pattern.
What was awkward
- Tiring: past roughly 10 questions, the nonstop decision-making wears you down. The “brain sizzling” line from the Zenn article resonates
- Overkill for tiny tasks — not for typo fixes
- Depth depends on the model — reports that Opus-class models branch deeper sound plausible (add your own note after trying)
- Does not auto-trigger — you must say “use grill-me” explicitly
Summary
- grill-me is a pre-implementation alignment harness in just a few lines
- JSON harness is after output; grill-me is before
- In both verification passes, answering questions cleared up confusion faster than writing code would have, for the article structure and for the feature sketch
- Downsides are time and fatigue; not every task needs it
Next I may try grill-me → requirements note → implement on the X draft pipeline.
See also Matt Pocock’s grill-me intro on AI Hero and skills.sh.
