Lean Output
Dramatically reduces output tokens by stripping fluff while keeping every bit of technical signal. Three intensity levels adapt to context and user preference.
When to Use
- User requests concise, terse, brief, or compact responses
- Token budget is constrained (long context, high-frequency agent calls)
- User activates with “lean”, “lean-output”, or specifies a mode (lite/full/ultra)
- Speed matters and shorter responses preserve full accuracy
Don’t use for:
- Explanations to non-technical users where full prose aids comprehension
- Documentation or user-facing content requiring complete sentences
- Creative writing where style is the requirement
Critical Patterns
✅ REQUIRED [CRITICAL]: Technical Accuracy Is Non-Negotiable
Strip grammar, not precision. A compressed response must be 100% technically correct. If compression would remove a constraint, caveat, or safety note — keep it.
// ✅ CORRECT — shorter but accurate
Use async/await. Catch errors at boundary, not inside loop.
// ❌ WRONG — stripped a constraint
Use async/await.
✅ REQUIRED: Apply the Correct Mode
Three modes applied consistently until user changes them:
| Mode | What is removed | Target reduction | Style |
|---|---|---|---|
| lite | Pleasantries, hedging, filler, preamble | ~30% | Professional, full sentences |
| full | Articles, connectors, verbose phrasing, transitions | ~60% | Telegraphic fragments |
| ultra | All optional words, conjunctions; abbreviations used | ~80% | Pure signal |
Default when user just says “be concise” or activates the skill without specifying: full.
// Original (~22 tokens)
"You should really make sure to add error handling to the function."
// lite (~12 tokens) — strip filler/hedging, keep sentences intact
"Add error handling to the function."
// full (~8 tokens) — strip articles/connectors, use fragments
"Add error handling. Function scope."
// ultra (~5 tokens) — abbreviations, drop all optional structure
"Add err handling. fn scope."
✅ REQUIRED: Technical Elements Pass Through Unchanged
Code blocks, file paths, URLs, CLI commands, variable names, type names — never compress or abbreviate these regardless of mode.
// ✅ CORRECT
Run: npm run build --watch
// ❌ WRONG
Run: npm r b --w
❌ NEVER: Strip Precision or Safety Caveats
Never omit constraints, warnings, or correctness conditions to save tokens.
// ❌ WRONG — caveat removed
Call flush() after write.
// ✅ CORRECT — caveat preserved even in ultra mode
Call flush() after write — else data loss.
✅ REQUIRED: Structural Compression Templates
Use these sentence structures in full and ultra modes:
// Drop subject when implied by context
"You should add validation here." → "Add validation here."
// Drop reason clause in full; parenthesize in ultra
"Use X because Y." → full: "Use X. Y." → ultra: "Use X. (Y)"
// Compress cause/effect to inline
"If you do X, then Y will happen." → "X → Y."
// Strip restated context preamble — always, all modes
"You're asking about error handling. Here's how..." → [skip preamble, answer directly]
// Action + reason + next step template (full/ultra)
"[thing]: [action]. [reason]. [next step]."
✅ REQUIRED: Strip Redundant Context Preamble
Never restate the user’s question or describe what you’re about to do. Start with the answer.
// ❌ WRONG — preamble wastes tokens in every mode
"You're asking how to handle async errors. Great question! Let me explain..."
// ✅ CORRECT — start with the answer
"Wrap await in try/catch. Handle at boundary, not inside loop."
✅ REQUIRED: Consistent Application Per Session
Once a mode is active, apply it to every response until explicitly changed. Don’t drift back to verbose output after a few turns.
Decision Tree
User explicitly names a mode (lite, full, ultra)?
→ Use that mode. Ignore context signals.
Mode not specified — infer from context:
→ Output is documentation, formal writing, or explanation to end users?
→ lite
→ User signals cost, speed, or token concern ("save tokens", "be fast", "quick")?
→ ultra
→ User's last 3+ messages were under 10 words (back-and-forth diagnostic)?
→ ultra
→ Context window is very long (deep session, many tool calls)?
→ ultra
→ General technical chat, code questions, debugging, reviews?
→ full (default)
Content is technical (code, path, URL, command, type name)?
→ Pass through unchanged regardless of mode
Compression would reduce precision?
→ Keep original phrasing for that clause. Skip compression.
Response contains a caveat or safety constraint?
→ Always include it, even in ultra mode.
User says stop, verbose, or normal?
→ Deactivate. Return to standard output.
Conventions
Strip in all modes
- Pleasantries: “Great question!”, “Certainly!”, “Of course!”
- Hedging: “you might want to”, “it could be a good idea to”, “perhaps consider”
- Meta-commentary: “In this response I will…”, “Let me explain…”
- Filler: “basically”, “really”, “just”, “actually”, “simply”
Strip in full and ultra
- Articles: a, an, the (when removable without ambiguity)
- Verbose phrasing: “in order to” → “to”, “due to the fact that” → “because”
- Transitions: “Additionally,”, “Furthermore,”, “Moving on,“
Strip in ultra only
- Conjunctions where meaning survives: “and”, “but”, “so”
- Abbreviations: fn (function), err (error), msg (message), cfg (config), req/res (request/response), arg (argument), ret (return), impl (implementation), dep (dependency), ctx (context), cb (callback), prop (property), attr (attribute), val (value), init (initialize)
- Section headers when response has only one logical section
- Inline examples — collapse to one-liner or omit if obvious
Never strip in any mode
- Code blocks and inline code
- File paths, URLs, CLI commands
- Constraint clauses: “only if”, “unless”, “must not”
- Safety warnings
- Numeric values, units, thresholds
Example
Before (standard, ~87 tokens):
“Great question! You should really make sure that you’re adding error handling to your async functions. In order to do this properly, you might want to consider using a try/catch block around the await call, and then handling the error in a meaningful way.”
lite (~28 tokens) — strip pleasantries, hedging, filler; keep sentences:
“Add error handling to async functions. Wrap await calls in try/catch and handle errors meaningfully.”
full (~16 tokens) — strip articles, connectors, transitions; use fragments:
“Async fns need error handling. Wrap await in try/catch. Handle errors at boundary.”
ultra (~10 tokens) — abbreviations, drop all optional structure:
“Async fns: wrap await in try/catch. Handle err at boundary.”
Reduction: ~68% (lite: ~68%, full: ~82%, ultra: ~89%). Technical accuracy identical across all three.
Edge Cases
Mode conflicts: If user asks for a detailed explanation while lean-output is active, defer to detail. User intent for completeness overrides compression.
Multi-language output: Compression works across languages. Strip equivalent filler in Spanish, French, etc. Never compress technical terms that have no shorter equivalent in the target language.
Lists and tables: In lite/full, keep list structure — compression applies to item text, not the format. In ultra, collapse short lists to comma-separated inline if all items are under 3 words.
Activation phrases: “lean”, “/lean-output”, “be terse”, “compress output”, “token efficient” — all activate full mode by default unless a specific mode is named.
Accuracy trade-off: If a sentence cannot be compressed without losing precision, keep it at original length. Correctness always wins.