Run Benchmark Tool

Purpose

Compare Gemini File API vs Inline document approaches for performance.

What It Tests

File API: Upload documents once, reuse cached URIs for multiple queries
Inline: Send raw document bytes with each request

The benchmark shuffles document order each round to prevent Gemini's native caching from affecting results.

Running the Benchmark

Basic Usage

bash

export GEMINI_API_KEY="your-key"
make build
./bin/benchmark -docs test_loan_files/loan_file_1_LN-2024-001847

With Options

bash

./bin/benchmark \
  -docs /path/to/documents \
  -rounds 20 \
  -max-docs 10 \
  -json

CLI Flags

-docs string      Directory containing documents (required)
-rounds int       Number of test rounds per method (default 10)
-max-docs int     Maximum income documents to use (default 6)
-json             Output results as JSON
-income           Only use income documents (default true)

Using Makefile

bash

make benchmark DOCS=test_loan_files/loan_file_1_LN-2024-001847 ROUNDS=10

Understanding Results

Output Structure

PHASE 1: INLINE DOCUMENTS
  Round 1: [shuffled order] -> time, tokens
  ...

PHASE 2: FILE API
  Upload: X seconds (one-time)
  Round 1: [shuffled order] -> time, tokens
  ...

FINAL COMPARISON
  - Total time comparison
  - Average per-query time
  - Token usage
  - Winner & speedup factor
  - Break-even analysis

Key Metrics

Metric	Meaning
Upload time	One-time cost for File API
Total time	Sum of all operations
Avg per round	Mean time per query
Min/Max round	Query time variance
Speedup	How much faster winner is
Break-even	Queries needed for File API to win

Interpreting Results

File API wins when:

Many queries against same documents
Break-even point is low (< 10 queries)
Per-query savings compound

Inline wins when:

Few queries (< break-even)
Different documents each time
Simplicity preferred

Example Output

TIMING COMPARISON
┌─────────────────┬──────────────────┬──────────────────┐
│ Metric          │ File API         │ Inline Docs      │
├─────────────────┼──────────────────┼──────────────────┤
│ Upload (1x)     │           1.976s │              N/A │
│ Total time      │        1m13.733s │        1m14.454s │
│ Avg per round   │           7.176s │           7.445s │
└─────────────────┴──────────────────┴──────────────────┘

BREAK-EVEN ANALYSIS
   Upload overhead:      1.976s
   Savings per query:    270ms
   Break-even at:        7.3 queries

Why Shuffled Order?

Documents are shuffled each round because:

Gemini may cache based on content/order
Shuffling ensures each query is "fresh"
Gives accurate per-query timing
More realistic for production workloads

Test Questions

The benchmark uses varied income-related questions:

Annual/monthly income extraction
Employer information
YTD income calculation
Deductions and withholdings
Income source classification
Tax year coverage
And more...

Recommendations

Scenario	Recommendation
Underwriter iterating on loan	File API
One-off document analysis	Inline
Batch processing same docs	File API
Real-time different docs	Inline

Related Files

cmd/benchmark/main.go - Benchmark implementation
internal/gemini/client.go - Both API approaches
internal/gemini/cache.go - File caching logic

Search AI Tools

run-benchmark

Install this agent skill to your Project

SKILL.md

Run Benchmark Tool

Purpose

What It Tests

Running the Benchmark

Basic Usage

With Options

CLI Flags

Using Makefile

Understanding Results

Output Structure

Key Metrics

Interpreting Results

Example Output

Why Shuffled Order?

Test Questions

Recommendations

Related Files