Agent skill

tokf-filter

This skill should be used when the user asks to "create a filter", "write a tokf filter", "add a filter for <tool>", "how do I filter output", or needs guidance on tokf filter step types, templates, pipes, or placement conventions.

Stars 147
Forks 14

Install this agent skill to your Project

npx add-skill https://github.com/mpecan/tokf/tree/main/crates/tokf-cli/skills/tokf-filter

SKILL.md

tokf Filter Authoring

You are an expert at writing tokf filter files. tokf is a config-driven CLI that compresses command output before it reaches an LLM context. Filters are TOML files that define how to process a command's output.

When the user asks you to create or modify a filter, follow this guide exactly. Produce valid, idiomatic TOML that matches the schema described below.


Section 1 — What a Filter File Is

A filter file is a TOML file that describes:

  • Which command(s) it applies to (command)
  • How to transform the raw output (steps, applied in a fixed order)
  • What to emit on success vs. failure

Filters live in three places, searched in priority order:

  1. .tokf/filters/ — project-local (repo-level overrides)
  2. ~/.config/tokf/filters/ — user-level overrides
  3. Built-in library (embedded in the tokf binary)

First match wins. Use tokf which "cargo test" to see which filter would activate for a given command.


Section 2 — Processing Order

Steps execute in this fixed order — do not rearrange them:

  1. match_output — whole-output substring checks; if matched, short-circuits the entire pipeline and emits immediately
  2. [[replace]] — per-line regex transforms applied to every line, in array order
  3. strip_ansi / trim_lines — per-line cleanup (ANSI stripping, whitespace trimming)
  4. skip / keep — line-level filtering (drop or retain lines by regex)
  5. dedup / dedup_window — collapse duplicate consecutive lines
  6. lua_script — Luau escape hatch; runs after dedup, before JSON/section/parse
  7. [json] — JSON extraction via JSONPath; when configured, replaces section/parse/chunk
  8. [[section]] OR [parse] — structured extraction (these are mutually exclusive; section is a state machine, parse is a declarative grouper). Skipped when [json] is configured.
  9. [[chunk]] — block-based structured extraction with per-block aggregation, grouping, and tree output (runs on raw output, alongside sections). Skipped when [json] is configured.
  10. Exit-code branch[on_success] or [on_failure] depending on exit code
  11. [fallback] — if neither on_success nor on_failure produced output
  12. strip_empty_lines / collapse_empty_lines — post-processing cleanup on the final output

Within [on_success] and [on_failure], fields are processed as:

  • head / tail → trim lines
  • skip / extract → further filter
  • aggregate → reduce collected sections
  • output → final template render

Section 3 — Top-Level Fields Reference

Field Type Default Description
command string or array of strings required Command pattern(s) to match. Supports * wildcard.
run string (same as command) Override the actual command executed. Use {args} to forward arguments.
match_output array of tables [] Whole-output checks. Short-circuit on first match.
[[replace]] array of tables [] Per-line regex replacements, in order.
skip array of strings (regex) [] Drop lines matching any regex.
keep array of strings (regex) [] Retain only lines matching any regex. (Inverse of skip.)
dedup bool false Collapse consecutive identical lines.
dedup_window integer 0 (off) Dedup within a sliding window of N lines.
strip_ansi bool false Strip ANSI escape sequences before skip/keep.
trim_lines bool false Trim leading/trailing whitespace from each line.
lua_script table (absent) Luau escape hatch.
[json] table (absent) JSON extraction via JSONPath. When configured, replaces [[section]]/[parse]/[[chunk]].
[[section]] array of tables [] State-machine section collectors.
[[chunk]] array of tables [] Block-based structured extraction with per-block aggregation and grouping.
[parse] table (absent) Declarative structured parser (branch + group).
[on_success] table (absent) Output branch for exit code 0.
[on_failure] table (absent) Output branch for non-zero exit.
[output] table (absent) Top-level output template (used by [parse]).
[fallback] table (absent) Fallback when no branch matched.
strip_empty_lines bool false Remove all blank lines from the final output.
collapse_empty_lines bool false Collapse consecutive blank lines into one.
show_history_hint bool false Append a hint line after filtered output pointing to the full output in history.
[[variant]] array of tables [] Context-aware delegation to specialized child filters.

Section 4 — Step Types

4.1 match_output — Whole-Output Short-Circuit

Check the entire raw output for a substring. If matched, emit a fixed string and stop — no further processing.

toml
match_output = [
  { contains = "Everything up-to-date", output = "ok (up-to-date)" },
  { contains = "rejected", output = "✗ push rejected (try pulling first)" },
]
  • contains: literal substring to search for (case-sensitive)
  • output: string to emit if matched
  • {line_containing} template variable: the first line that contains the substring
toml
match_output = [
  { contains = "error", output = "Error on: {line_containing}" },
]

When to use: for well-known one-liner outcomes that make the rest of filtering irrelevant (e.g., "already up to date", "nothing to push", "authentication failed").


4.2 [[replace]] — Per-Line Regex Transforms

Applied to every line, in array order, before skip/keep. Use to reformat noisy lines.

toml
[[replace]]
pattern = '^(\S+)\s+\S+\s+(\S+)\s+(\S+)'
output = "{1}: {2} → {3}"

[[replace]]
pattern = '^\s+Compiling (\S+) v(\S+)'
output = "compiling {1}@{2}"
  • pattern: Rust regex (RE2 syntax, no lookaheads)
  • output: template with {1}, {2}, … for capture groups; {0} is the full match
  • If the pattern doesn't match a line, that line passes through unchanged
  • Invalid patterns are silently skipped at runtime

When to use: when a line contains useful information but in a verbose format — reformat it rather than dropping it.


4.3 skip / keep — Line Filtering

skip drops lines matching any regex. keep retains only lines matching any regex. They compose:

toml
skip = [
  "^\\s*Compiling ",
  "^\\s*Downloading ",
  "^\\s*$",
]

keep = ["^error", "^warning"]
  • Both are arrays of regex strings
  • Applied after [[replace]]
  • skip is checked first, then keep
  • A line must pass both: not skipped, and (if keep is non-empty) matching keep

When to use: skip for removing known noise patterns; keep for allow-listing (e.g., keep only lines that start with error or warning).

Also available inside [on_success] and [on_failure] for branch-level filtering.


4.4 dedup / dedup_window — Deduplication

toml
dedup = true           # collapse consecutive identical lines
dedup_window = 10      # dedup within a 10-line sliding window
  • dedup = true: removes consecutive duplicate lines (like uniq)
  • dedup_window = N: deduplicates within a sliding window of N lines (catches near-consecutive repeats)
  • They are independent; you can use both

When to use: for commands that emit repetitive progress lines (e.g., npm install printing the same package multiple times, spinner frames, repeated warnings).


4.5 lua_script — Luau Escape Hatch

For logic that pure TOML cannot express: numeric math, multi-line lookahead, conditional branching.

toml
[lua_script]
lang = "luau"
source = '''
if exit_code == 0 then
    return "passed"
else
    local msg = output:match("Error: (.+)") or "unknown error"
    return "FAILED: " .. msg
end
'''

Or load the script from an external file:

toml
[lua_script]
lang = "luau"
file = "scripts/my-filter.luau"

The file path resolves relative to the current working directory. Exactly one of source or file must be set.

Globals available:

  • output (string): the full output after skip/keep/dedup
  • exit_code (integer): the command's exit code
  • args (table of strings): the arguments passed to the command

Return semantics:

  • Return a string → replaces output, skips remaining TOML pipeline
  • Return nil → fall through to [[section]] / [parse] / [on_success] / [on_failure]

Sandbox: io, os, and package are blocked. No filesystem or network access. Standard math/string/table libraries are available.

When to use: only when no TOML step can express the logic. Most filters do not need this. Consider it after exhausting match_output, skip/keep, [[replace]], [[section]], and [parse].


4.6 [json] — JSON Extraction via JSONPath

For commands that produce JSON output (e.g. kubectl get pods -o json, gh api, docker inspect). Extracts values using JSONPath (RFC 9535) queries and produces template variables and structured collections.

toml
[json]

[[json.extract]]
path = "$.items[*]"
as = "pods"

[[json.extract.fields]]
field = "metadata.name"
as = "name"

[[json.extract.fields]]
field = "status.phase"
as = "phase"

[on_success]
output = "Pods ({pods_count}):\n{pods | each: \"  {name}: {phase}\" | join: \"\\n\"}"

[[json.extract]] fields:

Field Type Required Description
path string yes JSONPath expression (RFC 9535), e.g. "$.items[*]", "$.version"
as string yes Variable name to bind the result to
fields array of tables no Sub-field extraction for each matched object

[[json.extract.fields]] fields:

Field Type Required Description
field string yes Dot-separated path within each object (e.g. "metadata.name", "containers.0.name"). Not JSONPath — uses simple dot-notation. Supports numeric array indices.
as string yes Variable name for the extracted value

Result mapping:

  • Single scalarvars["as_name"] = string_value (no count, no chunk)
  • ArrayChunkData::Flat collection + {as_name_count} variable
  • Objects without fields → top-level scalars auto-flattened into chunk items
  • Objects with fields → named fields extracted per item

Pipeline behavior: when [json] is configured, [[section]], [parse], and [[chunk]] are skipped. JSON replaces line-based structural processing. Extracted vars and chunks flow into [on_success]/[on_failure] template rendering.

Error handling: invalid JSON input → extraction skipped, pipeline falls back to raw output (templates are not rendered). Invalid JSONPath → rule silently skipped, other rules still run. Empty array with fields → emits {as_name_count} = "0".

When to use: when the command produces structured JSON output and you need to extract specific fields. Prefer this over [parse] + skip/keep for JSON-native commands.


4.7 [[section]] — State-Machine Section Collector

The most powerful step. Defines a state machine that collects lines into named variables as it scans top-to-bottom.

toml
[[section]]
name = "failures"
enter = "^failures:$"      # regex: start collecting when this matches
exit = "^failures:$"       # regex: stop collecting when this matches (after start)
split_on = "^\\s*$"        # regex: split collected lines into blocks on blank lines
collect_as = "failure_blocks"

[[section]]
name = "summary"
match = "^test result:"    # regex: collect only lines matching this (no enter/exit)
collect_as = "summary_lines"

Fields:

Field Required Description
name yes Identifier for this section (used in error messages)
enter no Regex to start collecting (state transitions to "inside")
exit no Regex to stop collecting (state transitions to "outside")
match no Collect any line matching this regex, without enter/exit state
split_on no Split collected lines into blocks when this regex matches
collect_as yes Variable name to bind the result to

Accessing collected variables in templates:

Expression Type Description
{name} string Full collected text joined with newlines
{name.lines} collection Individual lines as a list
{name.blocks} collection Blocks split by split_on
{name.count} integer Number of blocks (or lines if no split_on)

When to use: when the output has distinct sections with clear start/end markers — test failure blocks, error sections, file change groups.


4.8 [[chunk]] — Block-Based Structured Extraction

Chunks split raw output into repeating structural blocks (e.g., per-crate test suites in a Cargo workspace), extract structured data per-block, and produce named collections for template rendering. Like sections, chunks operate on the raw (unfiltered) command output — skip/keep patterns do not affect chunk processing.

toml
[[chunk]]
split_on = "^\\s*Running "       # regex marking the start of each chunk
include_split_line = true         # include the splitting line in the chunk (default: true)
collect_as = "suites_detail"      # name for the structured collection
group_by = "crate_name"           # merge chunks sharing this field value
children_as = "children"          # preserve original items as nested collection

[chunk.extract]
pattern = 'unittests.+deps/([\w_-]+)-'  # extract a field from the header line
as = "crate_name"
carry_forward = true              # inherit value from previous chunk when pattern doesn't match

[[chunk.body_extract]]
pattern = 'Running\s+(.+?)\s+\('
as = "suite_name"

[[chunk.aggregate]]
pattern = '(\d+) passed'          # aggregates run within each chunk's lines
sum = "passed"

[[chunk.aggregate]]
pattern = '^test result:'
count_as = "suite_count"

Fields:

Field Type Required Description
split_on string (regex) yes Regex marking the start of each chunk
include_split_line bool no Whether the splitting line is part of the chunk (default: true)
collect_as string yes Name for the resulting structured collection
extract table no Extract a named field from the header line (pattern + as)
body_extract array of tables no Extract fields from body lines (pattern + as, first match wins)
aggregate array of tables no Per-chunk aggregation rules (pattern + sum/count_as)
group_by string no Merge chunks sharing the same field value, summing numeric fields
children_as string no When set with group_by, preserve original items as a nested collection

carry_forward (on extract or body_extract): when a chunk's pattern doesn't match, inherit the value from the most recent chunk that did. Useful when boundary markers (like Running unittests) identify a group, and subsequent chunks should inherit that identity.

Structured collections in templates: each item has named fields accessible in each pipes:

toml
[on_success]
output = """\
{suites_detail | each: "  {crate_name}: {passed} passed ({suite_count} suites)" | join: "\\n"}"""

Tree output with children_as: groups preserve their child items for nested template rendering:

toml
[on_success]
output = """\
{suites_detail | each: "  {crate_name}: {passed} passed\\n{children | each: \"    {suite_name}: {passed} passed\" | join: \"\\n\"}" | join: "\\n"}"""

When to use: when output contains repeating structural blocks with per-block data you want to aggregate and display. Common for workspace build tools (Cargo, Gradle, Nx) where output is organized by sub-project.


4.9 [parse] — Declarative Structured Parser

Alternative to [[section]] for commands with table-like output. Declaratively extracts a header field and groups remaining lines.

toml
[parse]
branch = { line = 1, pattern = '## (\S+?)(?:\.\.\.(\S+))?(?:\s+\[(.+)\])?$', output = "{1}" }

[parse.group]
key = { pattern = '^(.{2}) ', output = "{1}" }
labels = { "M " = "modified", "??" = "untracked", "D " = "deleted" }

[output]
format = """
{branch}{tracking_info}
{group_counts}"""
group_counts_format = "  {label}: {count}"
empty = "clean — nothing to commit"

[parse] fields:

Field Description
branch Extract a single value from a specific line (line, pattern, output)
[parse.group] Group remaining lines by a key pattern

[parse.group] fields:

Field Description
key { pattern, output } — extract the grouping key from each line
labels Map from raw key string to human-readable label

[output] fields (used with [parse]):

Field Description
format Template string for the overall output
group_counts_format Template for each group entry: {label}, {count}
empty String to emit when no lines were grouped

When to use: for commands like git status, docker ps, kubectl get — table-formatted output where you want to extract a header and count/group rows.


4.10 [on_success] / [on_failure] — Exit Code Branches

These branches run after all top-level steps. They have their own sub-fields:

toml
[on_success]
output = "ok ✓ {2}"          # template; collected variables are available
head = 20                     # keep first N lines
tail = 10                     # keep last N lines
skip = ["^\\s*$"]            # additional line filtering
extract = { pattern = '(\S+)\s*->\s*(\S+)', output = "ok ✓ {2}" }

# Singular form (one rule):
aggregate = { from = "summary_lines", pattern = 'ok\. (\d+) passed', sum = "passed", count_as = "suites" }

# Plural form (multiple rules):
# [[on_success.aggregates]]
# from = "summary_lines"
# pattern = 'ok\. (\d+) passed'
# sum = "passed"
# count_as = "suites"
#
# [[on_success.aggregates]]
# from = "summary_lines"
# pattern = '(\d+) failed'
# sum = "failed"

[on_failure]
tail = 10
output = "FAILED: {summary_lines | join: \"\\n\"}"

Branch sub-fields:

Field Description
output Template string for the output. Has access to all [[section]] and [[chunk]] variables. {output} = the filtered output text.
head Keep first N lines of filtered output
tail Keep last N lines of filtered output
skip Array of regexes to filter output lines within this branch
extract { pattern, output } — find first match, render template with capture groups
aggregate Reduce collected section lines into numeric summaries (singular form)
aggregates Array of aggregate rules (plural form — use [[on_success.aggregates]])

aggregate / aggregates fields:

Field Description
from Variable name (a collect_as result from [[section]])
pattern Regex with one capture group to extract a number
sum Variable name to bind the sum to
count_as Variable name to bind the count (number of lines matched) to

Both singular aggregate and plural aggregates can be used together — they are merged at runtime.

When to use: Always. Every filter should have at least one of [on_success] or [on_failure]. Use [on_success] to produce a clean summary. Use [on_failure] to show enough context to diagnose the issue.


4.11 [fallback] — Last Resort

Emits output when neither [on_success] nor [on_failure] produced anything.

toml
[fallback]
tail = 5

When to use: as a safety net when you have complex branching logic. Ensures tokf never silently swallows output.


4.12 [[variant]] — Context-Aware Filter Delegation

Some commands are wrappers around different underlying tools (e.g. npm test may run Jest, Vitest, or Mocha). A parent filter can declare [[variant]] entries that delegate to specialized child filters based on project context.

toml
command = ["npm test", "pnpm test", "yarn test"]

strip_ansi = true
skip = ["^> ", "^\\s*npm (warn|notice|WARN|verbose|info|timing|error|ERR)"]

[on_success]
output = "{output}"

[on_failure]
tail = 20

[[variant]]
name = "vitest"
detect.files = ["vitest.config.ts", "vitest.config.js", "vitest.config.mts"]
filter = "npm/test-vitest"

[[variant]]
name = "jest"
detect.files = ["jest.config.js", "jest.config.ts", "jest.config.json"]
filter = "npm/test-jest"

Fields:

Field Type Required Description
name string yes Human-readable identifier for this variant
detect.files array of strings no File paths to check in CWD (pre-execution detection)
detect.output_pattern string (regex) no Regex to match against command output (post-execution fallback)
filter string yes Filter to delegate to (relative path without .toml, e.g. "npm/test-vitest")

Two-phase detection:

  1. File detection (before execution) — checks if any listed config files exist in the current directory. First match wins.
  2. Output pattern (after execution) — regex-matches the command output. Used as a fallback when no file was detected.

At least one of detect.files or detect.output_pattern must be set.

Behavior:

  • When a variant matches, the child filter replaces the parent entirely — no field inheritance or merging
  • When no variant matches, the parent filter's own fields (skip, on_success, etc.) apply as the fallback
  • The filter field references another filter by its discovery name (e.g. "npm/test-vitest" maps to filters/npm/test-vitest.toml)

TOML ordering: [[variant]] entries must appear after all top-level fields (skip, [on_success], etc.) because TOML array-of-tables sections capture subsequent keys.

When to use: when a single command pattern maps to different underlying tools that produce fundamentally different output formats. Create a parent filter with a generic fallback, then create specialized child filters for each tool.


Section 5 — Template Pipes

Output templates support pipe chains: {var | pipe | pipe: "arg"}.

Pipe Input → Output Description
lines Str → Collection Split string on newlines into a list
join: "sep" Collection → Str Join list items with separator string
each: "tmpl" Collection → Collection Map each item through a sub-template; {value} = item, {index} = 1-based index. For structured collections (from chunks), all named fields are also available (e.g. {crate_name}, {passed}).
keep: "re" Collection → Collection Retain items matching the regex
where: "re" Collection → Collection Alias for keep:
truncate: N Str → Str Truncate to N characters, appending

Examples:

Filter a multi-line output variable to only error lines:

toml
[on_failure]
output = "{output | lines | keep: \"^error\" | join: \"\\n\"}"

For each collected block, show only > (pointer) and E (assertion) lines:

toml
[on_failure]
output = "{failure_blocks | each: \"{value | lines | keep: \\\"^[>E] \\\"}\" | join: \"\\n\"}"

Truncate long lines and number them:

toml
[on_failure]
output = "{summary_lines | each: \"{index}. {value | truncate: 120}\" | join: \"\\n\"}"

Section 6 — Naming & Placement Conventions

File naming:

  • filters/<tool>/<subcommand>.toml for two-word commands: filters/git/push.toml for git push
  • filters/<tool>.toml for single-word commands: filters/pytest.toml for pytest
  • For wildcards: filters/npm/run.toml with command = "npm run *" in the TOML
  • Lowercase filenames only, no spaces

Placement:

Location Purpose
.tokf/filters/ Project-local override (committed to the repo)
~/.config/tokf/filters/ User-level override (your personal filters)
filters/ in the tokf source repo Built-in library (requires a tokf release)

When creating a filter for a user's project, default to .tokf/filters/ unless they specify otherwise.

Command field:

  • Exact match: command = "git push" matches git push and git push origin main
  • Wildcard: command = "npm run *" matches npm run dev, npm run build, etc.
  • Array: command = ["cargo test", "cargo t"] matches either form

Section 7 — Workflow for Creating a New Filter

Follow these steps when asked to create a filter:

Step 1: Understand the command's output

Ask the user to provide (or capture) example output from the command. If they don't have it, generate a plausible example based on the tool's known output format. Look for:

  • What's signal (errors, results, summaries)
  • What's noise (progress bars, compilation lines, download progress, blank lines)
  • What patterns mark sections (e.g., "failures:", "test result:")

Step 2: Choose the right complexity level

Level When to use Steps to use
Level 1 (simple) Command produces one-liner outcomes match_output, skip, extract
Level 2 (structured) Table-like output needing grouping [parse] + [output]
Level 2J (JSON) Command produces JSON output [json] + on_success/on_failure templates
Level 3 (stateful) Multi-section output with nested structure [[section]] + aggregate + pipes
Level 4 (chunked) Repeating blocks with per-block aggregation (workspaces) [[chunk]] + [[section]] + aggregates + tree templates

Start at the lowest level that handles the use case. Don't reach for [[section]] when skip + extract suffices.

Step 3: Draft the filter

  1. Set command to match the command pattern
  2. Add match_output for well-known short-circuit cases (empty output, auth failure, "already done")
  3. Add skip to drop noise lines (progress, compile output, blank lines)
  4. Add [[replace]] to reformat noisy-but-useful lines
  5. Add [[section]] or [parse] if you need structured extraction
  6. Write [on_success] with the desired output format
  7. Write [on_failure] with enough context to diagnose (tail = 20 is a safe default)
  8. Add [fallback] with tail = 5 as a safety net for complex filters

Step 4: Write test cases and verify

Create a <stem>_test/ directory adjacent to the filter TOML and add at least one test case per meaningful outcome (success, failure, edge cases):

filters/mytool/
  mysubcmd.toml       ← filter config
  mysubcmd_test/      ← test suite
    success.toml
    failure.toml

Each test case is a TOML file:

toml
name = "success shows one clean line"
fixture = "tests/fixtures/mytool_success.txt"  # path relative to this file, then CWD
exit_code = 0

[[expect]]
equals = "ok ✓"

[[expect]]
not_contains = "noise"

Or with an inline fixture (no file needed):

toml
name = "known error message"
inline = "Error: connection refused\n"
exit_code = 1

[[expect]]
contains = "connection refused"

Run the suite:

sh
tokf verify mytool/mysubcmd   # run one suite
tokf verify                   # run all suites

For quick one-off testing without creating test files:

sh
tokf apply filters/mytool/mysubcmd.toml tests/fixtures/mytool_output.txt --exit-code 0

Step 5: Place and name the file correctly

  • Two-word command: .tokf/filters/mytool/mysubcmd.toml
  • Single-word command: .tokf/filters/mytool.toml
  • Wildcard command: .tokf/filters/mytool/run.toml with command = "mytool run *"

Section 8 — Three Annotated Examples

Example 1: git push (Level 1 — match_output + extract)

Goal: 15 lines of push noise → "ok ✓ main" (or failure message).

toml
# filters/git/push.toml — Level 1
# Raw output: 15 lines of object counting, compression, "remote:" lines
# Filtered (success): "ok ✓ main"
# Filtered (up-to-date): "ok (up-to-date)"
# Filtered (rejected): "✗ push rejected (try pulling first)"

command = "git push"

# Check full output for well-known outcomes before any processing
match_output = [
  { contains = "Everything up-to-date", output = "ok (up-to-date)" },
  { contains = "rejected", output = "✗ push rejected (try pulling first)" },
]

[on_success]
# Drop all the noise lines
skip = [
  "^Enumerating objects:",
  "^Counting objects:",
  "^Delta compression",
  "^Compressing objects:",
  "^Writing objects:",
  "^Total \\d+",
  "^remote:",
  "^To ",
]
# Extract the branch name from the ref update line: "abc1234..def5678  main -> main"
extract = { pattern = '(\S+)\s*->\s*(\S+)', output = "ok ✓ {2}" }

[on_failure]
tail = 10

Key decisions:

  • match_output handles the two most common "instant" outcomes
  • extract captures the branch name from the ref update line
  • tail = 10 on failure gives enough context without overwhelming

Example 2: git status (Level 2 — parse + group)

Goal: 30+ lines of verbose status → branch name + grouped file counts.

toml
# filters/git/status.toml — Level 2
# Raw output: 30+ lines with hints, file paths, status codes
# Filtered: "main [ahead 2]\n  modified: 3\n  untracked: 2"

command = "git status"

# Override: use porcelain format for reliable machine parsing
run = "git status --porcelain -b"

match_output = [
  { contains = "not a git repository", output = "Not a git repository" },
]

[parse]
# First line: "## main...origin/main [ahead 2]"
# Extract: branch name, upstream, ahead/behind info
branch = { line = 1, pattern = '## (\S+?)(?:\.\.\.(\S+))?(?:\s+\[(.+)\])?$', output = "{1}" }

[parse.group]
# Group remaining lines by their two-character status code
key = { pattern = '^(.{2}) ', output = "{1}" }
labels = {
  "M " = "modified",
  " M" = "modified (unstaged)",
  "MM" = "modified (staged+unstaged)",
  "A " = "added",
  "??" = "untracked",
  "D " = "deleted",
  " D" = "deleted (unstaged)",
  "R " = "renamed",
  "UU" = "conflict",
  "AM" = "added+modified"
}

[output]
format = """
{branch}{tracking_info}
{group_counts}"""
group_counts_format = "  {label}: {count}"
empty = "clean — nothing to commit"

Key decisions:

  • run overrides to porcelain format — machine-readable is easier to parse
  • [parse] extracts the branch header line declaratively
  • [parse.group] groups by status code without needing [[section]]
  • [output] uses built-in {group_counts} variable populated by the parser

Example 3: cargo test (Level 4 — section + chunk + aggregates + tree)

Goal: 200+ lines with compile noise, per-test "ok" lines, failure blocks → per-crate tree summary on pass, structured failure report on fail.

toml
# filters/cargo/test.toml — Level 4
# Raw output: 200+ lines
# Filtered (pass): "✓ cargo test: 1279 passed, 0 failed, 119 ignored (42 suites)"
#                   with per-crate tree breakdown showing individual test suites
# Filtered (fail): failure details + summary

command = "cargo test"
strip_ansi = true

# Drop all the noise
skip = [
  "^\\s*Compiling ",
  "^\\s*Downloading ",
  "^\\s*Downloaded ",
  "^\\s*Finished ",
  "^\\s*Locking ",
  "^running \\d+ tests?$",
  "^test .+ \\.\\.\\. ok$",   # individual passing tests
  "^\\s*$",
  "^\\s*Doc-tests ",
]

# State machine: collect the "failures:" section into blocks split by blank lines
[[section]]
name = "failures"
enter = "^failures:$"
exit = "^failures:$"
split_on = "^\\s*$"
collect_as = "failure_blocks"

# Collect "test result: ok/FAILED" summary lines (one per test suite)
[[section]]
name = "summary"
match = "^test result:"
collect_as = "summary_lines"

# Chunk processing: per-crate breakdown from "Running" headers.
# "unittests" lines define crate boundaries; integration test suites
# inherit the crate name via carry_forward.
[[chunk]]
split_on = "^\\s*Running "
include_split_line = true
collect_as = "suites_detail"
group_by = "crate_name"
children_as = "children"

[chunk.extract]
pattern = 'unittests.+deps/([\w_-]+)-'
as = "crate_name"
carry_forward = true

[[chunk.body_extract]]
pattern = 'Running\s+(.+?)\s+\('
as = "suite_name"

[[chunk.aggregate]]
pattern = '(\d+) passed'
sum = "passed"

[[chunk.aggregate]]
pattern = '(\d+) failed'
sum = "failed"

[[chunk.aggregate]]
pattern = '(\d+) ignored'
sum = "ignored"

[[chunk.aggregate]]
pattern = '^test result:'
count_as = "suite_count"

# Success: aggregate summaries + per-crate tree breakdown
[on_success]
output = "✓ cargo test: {passed} passed, {failed} failed, {ignored} ignored ({suites} suites)\n{suites_detail | each: \"  {crate_name}: {passed} passed ({suite_count} suites)\\n{children | each: \\\"    {suite_name}: {passed} passed\\\" | join: \\\"\\\\n\\\"}\" | join: \"\\n\"}"

[[on_success.aggregates]]
from = "summary_lines"
pattern = 'ok\. (\d+) passed'
sum = "passed"
count_as = "suites"

[[on_success.aggregates]]
from = "summary_lines"
pattern = '(\d+) failed'
sum = "failed"

[[on_success.aggregates]]
from = "summary_lines"
pattern = '(\d+) ignored'
sum = "ignored"

# Failure: show failure details + summary
[on_failure]
output = "✗ cargo test: {passed} passed, {failed} failed ({suites} suites)\n\nFAILURES ({failure_blocks.count}):\n{failure_blocks | each: \"\\n── {index}. ──\\n{value}\" | join: \"\\n\"}\n\n{summary_lines | join: \"\\n\"}"

[[on_failure.aggregates]]
from = "summary_lines"
pattern = '(\d+) passed'
sum = "passed"
count_as = "suites"

[[on_failure.aggregates]]
from = "summary_lines"
pattern = '(\d+) failed'
sum = "failed"

[fallback]
tail = 5

Key decisions:

  • skip removes all per-test "ok" lines — only failures and summaries remain
  • [[section]] collectors handle failure blocks and summary lines
  • [[chunk]] splits on Running headers, extracts crate names from unittests lines
  • carry_forward = true makes integration test suites inherit the crate name from the preceding unit test suite
  • children_as = "children" preserves per-suite detail within each crate group
  • [[on_success.aggregates]] (plural) sums passed/failed/ignored across all suite summary lines
  • Nested each pipes produce tree output: crate → suites
  • [fallback] catches edge cases (compile errors with no test output)

Section 9 — Writing Test Cases

Every filter in the standard library has a <stem>_test/ directory with declarative test cases. When writing or modifying a filter, write test cases alongside it.

Test case format

toml
name = "success output is a single clean line"   # required, human-readable
fixture = "tests/fixtures/cargo_build_success.txt"  # path to raw output file
# inline = "some raw output\nline two"            # alternative: inline fixture
exit_code = 0                                    # optional, default 0
args = []                                        # optional, forwarded to filter

[[expect]]
equals = "ok ✓"          # exact match

[[expect]]
not_contains = "Compiling"  # noise must be gone

Assertion types

Field Description
equals Output exactly equals this string
contains Output contains this substring
not_contains Output does not contain this substring
starts_with Output starts with this string
ends_with Output ends with this string
line_count Output has exactly N non-empty lines
matches Output matches this regex
not_matches Output does not match this regex

Every [[expect]] entry checks one assertion. A test case with multiple [[expect]] entries must pass all of them. A test case with no [[expect]] entries is an error.

What to test

For every filter, write at least:

  • Success case: the happy path produces the expected one-liner or summary
  • Failure case: a failing exit code produces enough context to diagnose
  • Edge cases: cover each match_output branch (e.g., "up-to-date", "rejected")

Directory convention

filters/
  git/
    push.toml          ← filter config
    push_test/         ← test suite (identified by _test suffix)
      success.toml
      up_to_date.toml
      rejected.toml
      failure.toml

The _test suffix makes suite directories immediately identifiable in file listings and distinguishes them from filter category directories.


Section 10 — Common Mistakes to Avoid

  1. Don't use keep when skip is enough. keep is an allow-list — it drops everything that doesn't match. Use it only when you want to radically filter to a specific type of line.

  2. Escape backslashes in TOML strings. In regular strings, \\d means literal \d in the regex. In TOML raw strings ('...'), backslashes are literal. Use raw strings for complex patterns.

  3. match_output is a short-circuit. If it matches, nothing else runs. Don't put it at the end expecting it to be a fallback — it runs first.

  4. [[section]], [parse], and [json] are mutually exclusive in practice. [json] replaces [[section]]/[parse]/[[chunk]] — when [json] is configured, those line-based steps are skipped. Use [json] for JSON output, [[section]]/[parse]/[[chunk]] for line-based output.

  5. {output} in branch templates is the filtered output text (after skip/keep/replace/dedup), not the raw command output.

  6. Pipe chains need careful quoting. When nesting templates inside each:, escape inner quotes: {each: "{value | lines | keep: \\\"^error\\\"}"}.

  7. Don't skip the [fallback]. Complex filters with [[section]] can produce empty output if sections don't match. Always add [fallback] tail = 5 as a safety net.

  8. Test with realistic fixture data. A filter that works on a trimmed example may miss edge cases. Use real command output saved to a .txt fixture file.


Section 11 — Generic Commands (When a Filter Isn't Needed)

Before writing a full filter, consider whether a generic command would be sufficient. tokf provides three built-in subcommands that work on any command without a TOML filter:

Command Use case Example
tokf err <cmd> Extract errors/warnings tokf err mix compile
tokf test <cmd> Extract test failures tokf test ctest --output-on-failure
tokf summary <cmd> Heuristic summary tokf summary terraform plan

When to use generic commands vs. writing a filter

  • Use generic commands when the tool's output follows common patterns (error lines, test results, repetitive logs) and you don't need precise control over the output format.
  • Write a filter when you need structured extraction, specific sections, custom templates, or the generic output isn't useful enough.

Routing generic commands through rewrites

To make generic commands trigger automatically via the hook, add rewrite rules to .tokf/rewrites.toml:

toml
# Build tools without dedicated filters
[[rewrite]]
match = "^mix compile"
replace = "tokf err {0}"

# Test runners without dedicated filters
[[rewrite]]
match = "^mix test"
replace = "tokf test {0}"

# Long output without dedicated filters
[[rewrite]]
match = "^terraform plan"
replace = "tokf summary {0}"

Important: Only add rewrite rules for commands that don't already have a filter. Check with tokf which "<command>" first. Commands with dedicated filters produce better output through tokf run.

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results