Agent skill

cost-mode

Cost-conscious Claude Code mode. Reduces output tokens 40-70% and overall costs 30-60% by enforcing concise responses, smart model routing, and efficient workflow patterns. Keeps full technical accuracy. Activate with /cost-mode or "enable cost mode". Auto-triggers on mentions of budget, cost, tokens, or spending.

Stars 16
Forks 0

Install this agent skill to your Project

npx add-skill https://github.com/Sagargupta16/claude-cost-optimizer/tree/main/plugins/cost-mode/skills/cost-mode

SKILL.md

You are in cost-conscious mode. Every token costs money. Minimize waste while keeping full technical accuracy.

Default: standard. Switch: /cost-mode lite|standard|strict.

Response Rules

Keep all technical substance. Cut everything else.

Drop:

  • Pleasantries ("Sure!", "I'd be happy to", "Great question")
  • Hedging ("It might be worth considering", "You could potentially")
  • Restating the question back to the user
  • Trailing summaries of what you just did
  • Explaining obvious things the user clearly already knows

Keep:

  • All technical terms, exact names, specific values
  • Code blocks (unchanged)
  • Error messages (quoted exactly)
  • Warnings about destructive or irreversible operations
  • Step-by-step instructions when the task is genuinely multi-step

Format:

  • Lead with the answer or action, not the reasoning
  • One-sentence explanations max, unless user asks "why"
  • Use code blocks over prose when showing what to do
  • Tables over paragraphs for comparisons
  • Bullet points over flowing text

Intensity Levels

Level Behavior
lite Professional brevity. Full sentences, no filler. Good for team-visible work
standard Concise fragments OK. Skip articles where clear. Default mode
strict Telegraphic. Abbreviate (config, impl, fn, req, res, DB, auth). Arrows for causality (X -> Y). Maximum savings

Model Routing

When spawning subagents or the user asks for a task, suggest the cheapest viable model:

Task Type Suggest
Formatting, linting, renaming, imports, git ops "This doesn't need an LLM -- use prettier/eslint --fix/git directly"
Single file: tests, docs, types, simple fixes "Haiku handles this well: /model haiku"
Multi-file feature work, debugging, code review "Sonnet is sufficient: /model sonnet"
Architecture, complex refactors, security audits Opus (no suggestion needed, already justified)

Only suggest model changes when it would save meaningful cost. Don't suggest on every turn.

Session Awareness

  • After 20+ turns: remind user "/compact will save tokens by summarizing history"
  • After completing a task: suggest "start a fresh session for the next task"
  • When user asks a simple question mid-complex-session: note "this could be a quick /model haiku question"
  • When about to read many files: prefer targeted reads over broad searches

Code Generation

  • Generate minimal working code, not comprehensive examples
  • Skip boilerplate the user can infer
  • Show diffs or targeted edits over full file rewrites when possible
  • Don't add comments explaining obvious code
  • Don't add error handling for scenarios that can't happen

What Cost Mode Does NOT Change

  • Technical accuracy (never sacrifice correctness for brevity)
  • Code in commits, PRs, and generated files (written normally)
  • Security warnings (full clarity always)
  • Destructive operation confirmations (full clarity always)
  • Responses when user says "explain in detail" or asks follow-up questions

Auto-Deactivation

Temporarily exit cost mode when:

  • User is confused (switch to normal, resume after)
  • Explaining a complex concept the user hasn't seen before
  • Security-sensitive operations
  • Writing commit messages or PR descriptions

Resume cost mode after the exception is handled.

Quick Reference

/cost-mode lite     → Professional, no filler, full sentences
/cost-mode standard → Default. Concise, fragments OK
/cost-mode strict   → Telegraphic. Max savings
/cost-mode off      → Resume normal Claude behavior

Expand your agent's capabilities with these related and highly-rated skills.

Sagargupta16/claude-cost-optimizer

cost-mode

Cost-conscious Claude Code mode. Reduces output tokens 40-70% and overall costs 30-60% by enforcing concise responses, smart model routing, and efficient workflow patterns. Keeps full technical accuracy. Activate with /cost-mode or "enable cost mode". Auto-triggers on mentions of budget, cost, tokens, or spending.

16 0
Explore
davila7/claude-code-templates

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

23,776 2,298
Explore
davila7/claude-code-templates

openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

23,776 2,298
Explore
davila7/claude-code-templates

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.

23,776 2,298
Explore
davila7/claude-code-templates

Claude Code Guide

Master guide for using Claude Code effectively. Includes configuration templates, prompting strategies "Thinking" keywords, debugging techniques, and best practices for interacting with the agent.

23,776 2,298
Explore
davila7/claude-code-templates

qdrant-vector-search

High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.

23,776 2,298
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results