Analyze captured HTTP traffic, design CLI architecture, and implement the Python CLI package. Covers Phase 2 of the pipeline: parse raw-traffic.json, identify protocol type, map endpoints, design Click command groups, implement with parallel subagents. TRIGGER when: "analyze traffic", "design CLI", "implement CLI", "build CLI from network traffic", "generate API wrapper", "reverse engineer web API", "start Phase 2", raw-traffic.json exists and capture is complete, or after the capture skill finishes. DO NOT trigger for: traffic recording (use capture), test writing (use testing), or quality checks (use standards).

CLI-Anything-Web Methodology (Phase 2)

Analyze captured traffic, design the CLI command structure, and implement the complete Python CLI package. This skill owns the core transformation from raw HTTP traffic to a production-ready CLI.

Prerequisites (Hard Gate)

Do NOT start unless:

- raw-traffic.json exists (with WRITE operations, or read-only GET-only traffic)
- Auth state was captured during Phase 1 (if the site requires auth)

If raw-traffic.json is missing or has no WRITE operations, invoke the capture skill first.

Exception for read-only sites: If the site is genuinely read-only (search engine, dashboard, analytics viewer with no create/update/delete), the trace may contain only GET requests. In this case, note "read-only site — no write operations" in <APP>.md and proceed. The generated CLI will have read-only commands (list, get, search) but no create/update/delete commands. This is valid.

No-auth sites: If the target site requires no authentication (public API, no login needed), the "Auth state captured" prerequisite does not apply. Note "no-auth site" in <APP>.md and proceed.

Step A: Analyze (API Discovery)

Goal: Map raw traffic to a structured API model.

Process:

Read traffic-analysis.json first (if it exists alongside raw-traffic.json). This file is auto-generated by parse-trace.py or mitmproxy-capture.py → analyze-traffic.py and contains pre-detected protocol type, auth pattern, endpoint grouping, GraphQL operations, batchexecute RPC IDs, and suggested CLI commands. Use it as a starting point — verify its findings and fill in anything marked "unknown" by reading raw-traffic.json manually.

Enhanced analysis (v1.3.0, when captured via mitmproxy-capture.py):
- request_sequence: Timeline-ordered requests with auth flow detection (login → token → API calls)
- session_lifecycle: Cookie inventory, auth cookie identification, session pattern (cookie_auth/token_refresh/no_session)
- endpoint_sizes: Response body size classification per endpoint (small/medium/large) and total data transferred

These fields are only present when mitmproxy-capture.py was used. If missing (has_timestamps: false), rely on manual analysis.
If traffic-analysis.json doesn't exist, run the analyzer:
```bash
python ${CLAUDE_PLUGIN_ROOT}/scripts/analyze-traffic.py \
  <app>/traffic-capture/raw-traffic.json --summary
```
Parse raw-traffic.json (for details the analyzer couldn't extract)
Group requests by base path (e.g., /api/v1/boards/, /api/v1/items/)
For each endpoint group, identify:
- HTTP method (GET/POST/PUT/DELETE/PATCH)
- URL pattern (extract path parameters like :id)
- Query parameters and their types
- Request body schema (JSON fields, types, required/optional)
- Response body schema
- Authentication method (Bearer token, cookie, API key)
- Rate limiting signals (429 responses, retry-after headers)

Identify RPC protocol type -- classify the API transport:

| Protocol | Detection Signal | Client Pattern |
| --- | --- | --- |
| REST | Resource URLs (`/api/v1/boards/:id`), standard HTTP methods | `client.py` with method-per-endpoint |
| GraphQL | Single `/graphql` endpoint, `query`/`mutation` in body | `client.py` with query templates |
| gRPC-Web | `application/grpc-web` content type, binary payloads | Proto-based client |
| Google batchexecute | `batchexecute` in URL, `f.req=` body, `)]}'\n` prefix | `rpc/` subpackage (see `references/google-batchexecute.md`) |
| Custom RPC | Single endpoint, method name in body, proprietary encoding | Custom codec module |
| Public REST API | Documented `/api/` endpoints, OpenAPI spec, JSON responses | Standard `client.py` with httpx |
| Plain HTML (no framework) | No SPA root, no framework globals, data in `<table>`/`<div>` | `client.py` with httpx + BeautifulSoup4 |

This determines the client architecture in Step B: REST and similar plain-HTTP targets use a simple client.py, while RPC-style protocols such as batchexecute need a dedicated rpc/ subpackage with encoder, decoder, and types modules.
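The detection signals in the table can be sketched as a small classifier. This is a minimal illustration rather than the analyzer's actual logic: the function name `classify_protocol`, its parameters, and the returned labels are assumptions made for this example, and real traffic will need more signals than these.

```python
import json
from urllib.parse import urlsplit


def _has_graphql_query(body):
    """True when the body is a JSON object carrying a string 'query' field."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    return isinstance(payload, dict) and isinstance(payload.get("query"), str)


def classify_protocol(url, content_type="", body=""):
    """Return a coarse protocol label for one captured request (hypothetical helper)."""
    path = urlsplit(url).path.lower()
    # Google batchexecute: 'batchexecute' in the URL or an f.req= form body.
    if "batchexecute" in path or body.startswith("f.req="):
        return "batchexecute"
    # gRPC-Web: identified by its content type.
    if "application/grpc-web" in content_type.lower():
        return "grpc-web"
    # GraphQL: single /graphql endpoint or a JSON body with a query string.
    if path.rstrip("/").endswith("/graphql") or _has_graphql_query(body):
        return "graphql"
    # REST: resource-style URLs under /api/.
    if "/api/" in path:
        return "rest"
    return "unknown"
```

Anything that comes back as "unknown" still needs a manual look at raw-traffic.json, for example to tell a custom RPC endpoint apart from plain HTML.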

Detect data model:
- Entity types (boards, items, users, projects...)
- Relationships (board has many items, item belongs to board)
- ID formats (UUID, numeric, slug)
Detect auth pattern:
- Cookie-based sessions
- Bearer/JWT tokens
- OAuth refresh flow
- API key headers
- Browser-delegated auth: tokens embedded in page JavaScript (e.g., WIZ_global_data), not in HTTP headers. Requires CDP for initial cookies, HTTP for token extraction. See references/auth-strategies.md "Browser-Delegated Auth" section.
- No auth / public access: fully public API, no login required. CLI may optionally support API key auth for write operations (e.g., dev.to).
Write <APP>.md -- software-specific SOP document

Output: <APP>.md with API map, data model, auth scheme.

References: traffic-patterns.md, google-batchexecute.md, ssr-patterns.md
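As a rough illustration of the grouping step in Step A, the sketch below collapses ID-like path segments into `:id` and groups requests by base path. It assumes each captured entry is a dict with `method` and `url` keys; the real raw-traffic.json schema may differ, and the helper names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Segments that look like numeric IDs or UUIDs are treated as path parameters.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def normalize_path(path):
    """Replace ID-like segments with ':id', e.g. /boards/42 -> /boards/:id."""
    return "/".join(":id" if _ID_SEGMENT.match(seg) else seg for seg in path.split("/"))


def base_path(pattern):
    """Heuristic group key: everything before the first ':id' segment."""
    return pattern.split("/:id", 1)[0].rstrip("/") or "/"


def group_endpoints(entries):
    """Map base path -> sorted list of (method, url_pattern) pairs."""
    groups = defaultdict(set)
    for entry in entries:
        pattern = normalize_path(urlsplit(entry["url"]).path)
        groups[base_path(pattern)].add((entry["method"].upper(), pattern))
    return {base: sorted(pairs) for base, pairs in groups.items()}
```

Slug-style IDs are not detected by this heuristic, so groups built this way should still be checked against the entity types noted in <APP>.md.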

Step B: Implement (Code Generation)

Study Existing CLIs First (Critical for Accuracy)

Before implementing, read an existing CLI that uses the same protocol as your target. These are battle-tested implementations that solved the same problems you'll face.

| Protocol | Reference CLI | Key files to read |
| --- | --- | --- |
| Google batchexecute | `notebooklm/agent-harness/cli_web/notebooklm/` | `core/rpc/encoder.py`, `core/rpc/decoder.py`, `core/client.py`, `core/auth.py` |
| GraphQL + WAF | `booking/agent-harness/cli_web/booking/` | `core/client.py` (curl_cffi + GraphQL), `core/auth.py` (WAF tokens) |
| HTML scraping | `futbin/agent-harness/cli_web/futbin/` | `core/client.py` (httpx + BS4), `commands/players.py` |
| HTML + Cloudflare | `producthunt/agent-harness/cli_web/producthunt/` | `core/client.py` (curl_cffi impersonate) |
| REST API | `unsplash/agent-harness/cli_web/unsplash/` | `core/client.py`, `commands/photos.py` |
| Simple HTML | `gh-trending/agent-harness/cli_web/gh_trending/` | Minimal structure example |

How to use reference CLIs:

1. Read the reference CLI's core/client.py: understand the request/response pattern
2. Read core/auth.py: copy the login_browser() pattern exactly for Google apps
3. Read core/rpc/ (for batchexecute): understand encoder/decoder, DO NOT reinvent
4. Read commands/: see how Click commands are structured, how --json works
5. Read utils/helpers.py: see handle_errors(), _resolve_cli(), repl patterns

For batchexecute apps specifically, the notebooklm CLI is your bible:

- Copy the encoder/decoder architecture (don't reinvent the batchexecute wire format)
- Copy the auth token extraction pattern (CSRF, session ID, build label)
- Copy the cookie domain priority logic (critical for Israeli/international users)
- Adapt the RPC method IDs and param structures to your target app

The agent implementing the CLI MUST read these files before writing code. Use the Agent tool to dispatch a research agent that reads the reference implementation while you design the command structure.

Design Before You Code

Before writing any code, note the command structure in <APP>.md (10 minutes max):

1. Map each API endpoint group to a Click command group:
   - /api/v1/boards/* → boards command group
   - /api/v1/items/* → items command group
2. Map CRUD operations to subcommands (GET list → list, GET single → get, POST → create, PUT/PATCH → update, DELETE → delete)
3. Note auth design: auth login, auth status, auth refresh; credentials at ~/.config/cli-web-<app>/auth.json
4. Note REPL design: bare command enters REPL, branded banner via repl_skin.py
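The endpoint-to-command mapping above can be sketched as a small planning helper. It assumes endpoints are already normalized to `(method, pattern)` pairs with `:id` marking path parameters; the function names are hypothetical and only illustrate the mapping rule.

```python
# Non-GET methods map directly to a subcommand name.
_WRITE_SUBCOMMANDS = {"POST": "create", "PUT": "update", "PATCH": "update", "DELETE": "delete"}


def subcommand_for(method, pattern):
    """GET on a collection -> list, GET on a single resource -> get, writes per the table."""
    method = method.upper()
    if method == "GET":
        return "get" if pattern.rstrip("/").endswith("/:id") else "list"
    return _WRITE_SUBCOMMANDS.get(method)


def command_group_for(pattern):
    """Use the last non-parameter path segment as the Click group name."""
    segments = [seg for seg in pattern.split("/") if seg and not seg.startswith(":")]
    return segments[-1] if segments else None


def plan_commands(endpoints):
    """Return {group: sorted subcommands} for a list of (method, pattern) pairs."""
    plan = {}
    for method, pattern in endpoints:
        group, sub = command_group_for(pattern), subcommand_for(method, pattern)
        if group and sub:
            plan.setdefault(group, set()).add(sub)
    return {group: sorted(subs) for group, subs in plan.items()}
```

The resulting dictionary is only a starting point for the command structure recorded in <APP>.md; nested resources and non-CRUD actions still need manual naming.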

Goal: Generate the complete Python CLI package.

Package Structure

See HARNESS.md "Generated CLI Structure" for the complete package template. Key points: cli_web/ namespace (NO __init__.py), <app>/ sub-package (HAS __init__.py), core/, commands/, utils/, tests/ directories.

Step B.0: Scaffold Core Modules

Run the scaffold generator script to create all boilerplate files:

```bash
python ${CLAUDE_PLUGIN_ROOT}/scripts/scaffold-cli.py <app>/agent-harness \
  --app-name <app> \
  --protocol <rest|graphql|html-scraping|batchexecute> \
  --http-client <httpx|curl_cffi> \
  --auth-type <none|cookie|api-key|google-sso> \
  --resources <comma-separated-resources> \
  [--has-polling] [--has-context] [--has-partial-ids]
```

This generates exceptions.py, client.py skeleton, helpers.py, config.py, output.py, the CLI entry point with REPL, setup.py, conftest.py, repl_skin.py, and (for batchexecute) the rpc/ subpackage.

Fallback: If the script is unavailable, read ${CLAUDE_PLUGIN_ROOT}/skills/boilerplate/SKILL.md and follow its instructions to scaffold manually.

After scaffolding, review the generated files and customize client.py with actual endpoint methods from <APP>.md.

Implementation Rules

exceptions.py -- implement first. Required types: AppError (base), AuthError(recoverable), RateLimitError(retry_after), NetworkError, ServerError(status_code), NotFoundError. See references/exception-hierarchy-example.py for the complete template.
client.py -- HTTP client with exception mapping and auth retry:
- HTTP library choice:
  - httpx (default) — for most sites (REST, GraphQL, batchexecute)
  - curl_cffi — for Cloudflare-protected sites. Uses Chrome TLS fingerprint impersonation to bypass bot detection without cookies or auth:
    ```python
    from curl_cffi import requests as curl_requests
    resp = curl_requests.get(url, impersonate="chrome")
    ```
    Use curl_cffi when Phase 1 detects Cloudflare (cf-ray header, challenge page). Add curl_cffi, beautifulsoup4 to setup.py instead of httpx.
- Centralized auth header/cookie injection
- Automatic JSON parsing with response body verification
- Status code → exception mapping: 401/403→AuthError, 404→NotFoundError, 429→RateLimitError, 5xx→ServerError
- Auth retry: On AuthError(recoverable=True), refresh tokens and retry once
- Exponential backoff for rate limits (see references/polling-backoff-example.py)
- For apps with 3+ resource types: split into namespaced sub-clients (client.notebooks.list(), client.sources.add())
- See references/client-architecture-example.py for the full pattern
auth.py -- handles token storage, refresh, expiry. Implementation depends on auth type:

For no-auth sites: DO NOT create auth.py, session.py, or auth command groups. These files are dead code for public APIs and confuse users. The CLI should have NO auth-related files or commands. The only exception is if the site has optional auth (e.g., API key for write operations) — in that case, implement a minimal auth module.

For browser-delegated auth (Google, Microsoft, etc.): Full playwright-cli login flow with cookie domain priority for international users.

See references/auth-strategies.md for all patterns (browser login, cookie priority, API key, env var, context commands). Store cookies at ~/.config/cli-web-<app>/auth.json with chmod 600.
Anti-bot resilient client construction (when detected in Phase 2):
- Extract session tokens via CDP first (cookies), then HTTP GET + HTML parsing (CSRF, session IDs)
- Never hardcode build labels (bl), session IDs (f.sid), or CSRF tokens -- extract dynamically at runtime
- Replicate same-origin headers captured during Phase 1 traffic (e.g., x-same-domain: 1 for Google apps)
- Implement auto-retry on 401/403: re-fetch homepage -> re-extract tokens -> retry once
- See references/google-batchexecute.md for the complete Google pattern
RPC codec subpackage (for non-REST protocols like batchexecute): When the API uses a non-REST protocol, add core/rpc/ with:
- types.py -- method ID enum, URL constants
- encoder.py -- request encoding (protocol-specific format)
- decoder.py -- response decoding (strip prefix, parse chunks, extract results)
The client.py still exists but delegates encoding/decoding to rpc/.
Progress feedback -- Use rich>=13.0 spinners for operations >2s (suppress in --json mode). See references/rich-output-example.py.
JSON error output -- --json mode errors are JSON too, not plain text. Standard codes: AUTH_EXPIRED, RATE_LIMITED, NOT_FOUND, SERVER_ERROR, NETWORK_ERROR. Implement via utils/output.py json_error().
All commands use handle_errors(json_mode) context manager — centralizes error handling, exit codes (1=user, 2=system, 130=interrupt), and JSON errors. See references/helpers-module-example.py.
Generation commands support --wait, --retry N, --output path — for agent-scriptable end-to-end workflows. See references/polling-backoff-example.py.
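The error-handling rules above (exception types, status code mapping, JSON errors, and exit codes) fit together roughly as follows. This is a sketch under stated assumptions, not the contents of the reference files: the JSON error shape, the `UNKNOWN_ERROR` fallback code, the `exception_for_status` and `exit_code_for` helpers, and the choice of which exceptions count as user errors (exit code 1) are all illustrative.

```python
import json
import sys
from contextlib import contextmanager


class AppError(Exception):
    code = "UNKNOWN_ERROR"  # hypothetical fallback, not one of the standard codes


class AuthError(AppError):
    code = "AUTH_EXPIRED"

    def __init__(self, message, recoverable=True):
        super().__init__(message)
        self.recoverable = recoverable


class RateLimitError(AppError):
    code = "RATE_LIMITED"

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class NetworkError(AppError):
    code = "NETWORK_ERROR"


class ServerError(AppError):
    code = "SERVER_ERROR"

    def __init__(self, message, status_code=500):
        super().__init__(message)
        self.status_code = status_code


class NotFoundError(AppError):
    code = "NOT_FOUND"


def exception_for_status(status_code, retry_after=None):
    """Map an HTTP status to an exception instance, or None when no error applies."""
    if status_code in (401, 403):
        return AuthError(f"HTTP {status_code}: authentication failed")
    if status_code == 404:
        return NotFoundError("HTTP 404: resource not found")
    if status_code == 429:
        return RateLimitError("HTTP 429: rate limited", retry_after=retry_after)
    if 500 <= status_code <= 599:
        return ServerError(f"HTTP {status_code}: server error", status_code=status_code)
    return None


def json_error(code, message):
    """Serialize an error for --json mode (field names are an assumption)."""
    return json.dumps({"error": True, "code": code, "message": message})


def exit_code_for(exc):
    """Assumed split: auth and not-found are user errors (1), the rest system errors (2)."""
    return 1 if isinstance(exc, (AuthError, NotFoundError)) else 2


@contextmanager
def handle_errors(json_mode=False):
    """Wrap a command body: print the error in the right format, then exit."""
    try:
        yield
    except KeyboardInterrupt:
        sys.exit(130)
    except AppError as exc:
        if json_mode:
            print(json_error(exc.code, str(exc)))
        else:
            print(f"Error: {exc}", file=sys.stderr)
        sys.exit(exit_code_for(exc))
```

A client would raise the result of `exception_for_status(resp.status_code)` when it is not None, and each command would run its body inside `with handle_errors(json_mode):`.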

Windows UTF-8 fix — Add at the top of <app>_cli.py before any imports that print:

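```python
import sys
# Switch stdout to UTF-8 when the console uses another encoding (common on Windows).
if sys.stdout.encoding and sys.stdout.encoding.lower() not in ("utf-8", "utf8"):
    try:
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
    except AttributeError:
        pass  # stdout was replaced by an object without reconfigure()
```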

HTML table parsers MUST extract ALL visible columns — not just name/price, because missing fields in --json output make the CLI useless for filtering and analysis. If the site shows version, club, nation, stats, skills, weak foot — parse all of them. Empty fields in --json output = incomplete parser.
Entry point: cli-web-<app> via setup.py console_scripts
Namespace: cli_web.*
Copy repl_skin.py from plugin for consistent REPL experience
utils/helpers.py -- shared CLI helpers (generate for every CLI):
- resolve_partial_id(partial, items) — prefix-match UUIDs for get/rename/delete
- handle_errors(json_mode) — context manager replacing try/except in all commands
- require_notebook(notebook_arg) — gets notebook ID from arg or persistent context
- sanitize_filename(name) — safe filenames from artifact titles
- poll_until_complete(check_fn) — exponential backoff polling
- get_context_value(key) / set_context_value(key, value) -- persistent context.json

See references/helpers-module-example.py for the complete module.

Not all helpers apply to every CLI. Include only what the CLI uses: handle_errors and print_json are always needed. resolve_partial_id only for UUID-based apps. require_notebook/context helpers only for apps with persistent context. poll_until_complete only for generation/async operations.
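Two of the helpers listed above can be sketched as follows. The signatures beyond `resolve_partial_id(partial, items)` and `poll_until_complete(check_fn)` are assumptions: items are taken to be dicts with an `id` key, `check_fn` is taken to return None while the operation is still pending, and the delay parameters are illustrative defaults.

```python
import time


def resolve_partial_id(partial, items):
    """Return the single full ID that starts with `partial`, or raise LookupError."""
    matches = [item["id"] for item in items if item["id"].startswith(partial)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"No item matches ID prefix {partial!r}")
    raise LookupError(f"ID prefix {partial!r} is ambiguous ({len(matches)} matches)")


def poll_until_complete(check_fn, initial_delay=1.0, max_delay=30.0,
                        factor=2.0, timeout=300.0, sleep=time.sleep):
    """Call check_fn until it returns a non-None result, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0  # sum of requested sleeps; an approximation of elapsed time
    while True:
        result = check_fn()
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError(f"Operation did not complete within {timeout} seconds")
        sleep(delay)
        waited += delay
        delay = min(delay * factor, max_delay)
```

Passing `sleep` as a parameter keeps the polling loop testable without real waiting; production code would leave the default in place.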

REPL Implementation Rules (Critical)

These four bugs appear in almost every generated REPL. Get them right the first time:

1. Use shlex.split(), never line.split()

```python
# ✓ Correct: handles quoted args, e.g. players search "messi" -> ['players', 'search', 'messi']
import shlex
args = shlex.split(line)

# ✗ Wrong: produces ['players', 'search', '"messi"'], so the quotes become part of the value
args = line.split()
```

2. Never pass **ctx.params to cli.main() in REPL dispatch

```python
# ✓ Correct: preserve the --json flag by prepending it to args
repl_args = (["--json"] + args) if ctx.obj.get("json") else args
cli.main(args=repl_args, standalone_mode=False)

# ✗ Wrong: ctx.params = {"json_mode": False} gets passed to Context.__init__()
# which doesn't accept it → TypeError: Context.__init__() got an unexpected
# keyword argument 'json_mode'
cli.main(args=args, standalone_mode=False, **ctx.params)
```

3. Keep _print_repl_help() in sync with the actual command surface

The _print_repl_help() function in <app>_cli.py is the user's first discovery surface — it's what they see when they type help in the REPL. It must mirror the real commands, including all key options. A REPL that shows outdated or incomplete help is confusing and makes the CLI feel broken.

```python
# ✓ Correct: help lists the actual options users can pass
def _print_repl_help():
    _skin.info("Available commands:")
    print("  players list [OPTIONS]")
    print("    --position <GK|ST|CM|...>    Filter by position")
    print("    --rating-min N --rating-max N  Rating range")
    print("    --cheapest                   Sort cheapest first")

# ✗ Wrong: stale help doesn't mention new --position, --rating-min, etc.
def _print_repl_help():
    print("  players list [--min-price N]   List players with filters")
```

Rule: every time you add options to a command, update _print_repl_help() in the same commit.

4. Use @click.argument for positional REPL params, not @click.option("--x", required=True)

REPL commands show players search <query> in help. If query is a --query option, users typing players search messi get "Error: Missing option '--query'". Use positional arguments for natural command-line style:

```python
# ✓ Correct: users type players search messi, or players get 21610
@players.command()
@click.argument("query")
def search(query): ...

@players.command()
@click.argument("player_id", type=int)
def get(player_id): ...

# ✗ Wrong: users get an error unless they type players search --query messi
@players.command()
@click.option("--query", required=True)
def search(query): ...
```

Rule of thumb: if a command takes a single required value that would be a positional arg in a shell command (git checkout main, grep pattern), use @click.argument. Use @click.option only for optional or named parameters (--rating-min, --platform).

Parallel Implementation (dispatch independent modules as subagents)

When the CLI has 3+ command groups (e.g., notebooks, sources, chat, artifacts), dispatch parallel subagents -- one per command module. Each agent gets:

- The <APP>.md API spec for its resource
- The client.py and auth.py interfaces it depends on
- Clear scope: "Implement commands/notebooks.py with list, get, create, delete"

Parallelization opportunities:

| Independent from each other | Dispatch in parallel |
| --- | --- |
| `commands/notebooks.py`, `commands/sources.py`, `commands/chat.py` | Yes -- each command file only depends on `client.py` |
| `rpc/encoder.py` and `rpc/decoder.py` | Yes -- encoder doesn't depend on decoder |
| `auth.py` and `models.py` | Yes -- no shared logic |
| `client.py` and `commands/*` | No -- commands depend on client |
| `<app>_cli.py` (entry point) | Last -- imports all commands, write after they're done |

Implementation order (with maximum parallelism):

Phase A (sequential): Write core foundation
  exceptions.py → client.py → auth.py (if needed) → models.py

Phase B (parallel): Dispatch ALL independent work simultaneously
  ┌─ Agent 1: commands/notebooks.py
  ├─ Agent 2: commands/sources.py
  ├─ Agent 3: commands/chat.py
  ├─ Agent 4: commands/artifacts.py
  ├─ Agent 5: rpc/encoder.py + rpc/decoder.py (if non-REST)
  └─ Agent 6 (background): test_core.py (unit tests for core modules)
  All run concurrently — each only depends on Phase A modules

Phase C (sequential): Wire everything together
  utils/helpers.py → <app>_cli.py → __main__.py → setup.py → copy repl_skin.py

Key parallelism rules:

- Dispatch independent command modules as parallel subagents (one per commands/*.py file)
- Start unit test writing as a background agent during command implementation
- Entry point (<app>_cli.py, setup.py) must come last (depends on all commands)
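One way to reason about which modules can be dispatched together is to treat the plan as a dependency graph and read off the batches. The sketch below uses the standard-library `graphlib` module (Python 3.9+) on a simplified, hypothetical dependency map; it is an illustration of the ordering rule, not part of the plugin.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: module -> modules it depends on.
DEPENDENCIES = {
    "exceptions.py": set(),
    "client.py": {"exceptions.py"},
    "commands/notebooks.py": {"client.py"},
    "commands/sources.py": {"client.py"},
    "commands/chat.py": {"client.py"},
    "<app>_cli.py": {"commands/notebooks.py", "commands/sources.py", "commands/chat.py"},
}


def parallel_batches(dependencies):
    """Return a list of batches; modules within one batch have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything that can start right now
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished before asking again
    return batches
```

With this map, the command modules land in a single batch after client.py, and the entry point comes last, which matches the Phase A / Phase B / Phase C order above.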

Mandatory Smoke Check (Before Testing Phase)

Before invoking testing, install (pip install -e .) and verify:

- cli-web-<app> --help loads
- cli-web-<app> auth status --json shows valid (if auth-required)
- cli-web-<app> <resource> list --json returns real data
- One WRITE command works (if applicable)

Red flags — fix before testing:

- wrb.fr, af.httprm in output → decoder broken
- [] or null where data expected → wrong params or client-side operation
- Wrong field values (e.g., "3" instead of prompt text) → parser index mismatch
- Null write response → may be client-side, see references/google-batchexecute.md "Client-Side Operations"
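The first two red flags are mechanical enough to check automatically. The following sketch scans the text printed by a `--json` command; the function name `find_red_flags` and the message wording are hypothetical, and wrong field values still need a human comparison against the site.

```python
import json

# Raw batchexecute markers that should never reach the user-facing output.
RED_FLAG_MARKERS = ("wrb.fr", "af.httprm")


def find_red_flags(raw_output):
    """Return a list of human-readable problems found in a command's --json output."""
    flags = []
    for marker in RED_FLAG_MARKERS:
        if marker in raw_output:
            flags.append(f"protocol marker {marker!r} in output: decoder likely broken")
    try:
        data = json.loads(raw_output)
    except ValueError:
        flags.append("output is not valid JSON")
        return flags
    if data is None or data == []:
        flags.append("empty result: check params or whether the operation is client-side")
    return flags
```

An empty list from `find_red_flags` does not prove the output is correct; it only rules out the failure modes that can be spotted without knowing the expected data.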

Update phase state:

```bash
python ${CLAUDE_PLUGIN_ROOT}/scripts/phase-state.py complete <app> \
  --phase methodology --output <app>/agent-harness/
```

Next Step

When implementation is complete and the smoke check passes, invoke the testing skill to plan and write tests.

Do NOT skip testing -- every CLI must have comprehensive tests before publishing.

Companion Skills

| Skill | When it activates |
| --- | --- |
| `capture` | Phase 1 -- traffic recording (prerequisite for this skill) |
| `testing` | Phase 3 -- test writing, documentation |
| `standards` | Phase 4 -- publish, verify, smoke test |

Integration

| Relationship | Skill |
| --- | --- |
| Preceded by | `capture` (Phase 1) |
| Followed by | `testing` (Phase 3) |
| References | `traffic-patterns.md`, `auth-strategies.md`, `google-batchexecute.md`, `ssr-patterns.md`, `exception-hierarchy-example.py`, `client-architecture-example.py`, `polling-backoff-example.py`, `rich-output-example.py` |

Reference Files

- references/traffic-patterns.md -- Common API patterns (REST, GraphQL, RPC)
- references/auth-strategies.md -- Auth implementation strategies
- references/google-batchexecute.md -- Google batchexecute RPC protocol spec
- references/ssr-patterns.md -- SSR framework patterns and data extraction strategies
- references/exception-hierarchy-example.py -- Complete exception hierarchy with HTTP status mapping
- references/client-architecture-example.py -- Namespaced sub-client pattern with auth retry
- references/polling-backoff-example.py -- Exponential backoff polling and rate-limit retry
- references/rich-output-example.py -- Rich progress bars, JSON error responses, table formatting

Maintainer

ItamarZand88 Core maintainer

Source details

Full Name: ItamarZand88/CLI-Anything-WEB
Branch: main
Path in repo: cli-anything-web-plugin/skills/methodology
License: MIT License
Topics: claude-code cli python playwright claude-code-plugin reverse-engineering web-scraping agent-native
