Agent skill

working-with-reference-tables

Work with Reference Tables (static CSV lookup data) using OPAL to enrich datasets with descriptive information. Use when you need to map IDs to human-readable names, add static metadata from CSV uploads, or perform lookups without temporal considerations. Covers both explicit and implicit lookup patterns, column name matching, and when to choose Reference Tables vs Resources vs Correlation Tags.

Stars 1
Forks 1

Install this agent skill to your Project

npx add-skill https://github.com/rustomax/observe-community-mcp/tree/main/skills/working-with-reference-tables

SKILL.md

Working with Reference Tables

Work with Reference Tables (static lookup data) using OPAL to enrich datasets with descriptive information. Reference Tables store static mappings (max 10MB CSV) that don't track changes over time, providing an alternative to Resources when no temporal aspect is needed.

Use when you need to:

  • Map IDs to human-readable names (product IDs → product names, error codes → descriptions)
  • Enrich logs/spans with static metadata
  • Add dimension data from CSV uploads
  • Lookup values without temporal considerations

Covers reference table fundamentals, both explicit and implicit lookup patterns, and when to choose Reference Tables vs Resources vs Correlation Tags.

Key Concepts

What Are Reference Tables?

Reference Tables are static lookup datasets created from CSV uploads:

  • No timestamps - Static data
  • No change tracking - Subsequent uploads of the same table overwrite previous state
  • Max 10MB CSV file size
  • Uploaded manually via Observe UI or API
  • Primary key column for joining
  • Value columns with descriptive data

Example: Product reference table

app_product_id,app_product_name
"OLJCESPC7Z","National Park Foundation Explorascope"
"L9ECAV7KIM","Lens Cleaning Kit"
"6E92ZMYYFZ","Solar Filter"

When to Use Reference Tables

  • Data is static (doesn't change over time, or if it does change, the state tracking is not required)
  • Need simple ID-to-name mappings
  • No temporal aspect required
  • Dataset size under 10MB

Lookup Patterns

Reference Tables support two lookup approaches: explicit join (recommended) and implicit join (requires column name matching).

Pattern A: Explicit Lookup (Recommended)

Most flexible - specify join condition directly without column name matching:

opal
# Join Product Logs with Product reference table using alias
lookup @product.app_product_id=product_id, pid:product_id, product_name:@product.app_product_name
| statsby count(), group_by(product_name)
| topk 10, max(product_name)

How it works:

  1. Use alias for reference table: @product
  2. Specify join condition: @product.app_product_id=product_id
  3. Select columns to retrieve: pname:@product.app_product_name
  4. No need to match column names!

MCP Parameters:

json
{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product": "42782294"}
}

Key advantages:

  • Column names don't need to match
  • Clear and explicit join condition
  • Full control over retrieved columns
  • No extra make_col needed

Pattern B: Implicit Lookup (Column Name Matching)

Requires exact column name matching - simpler syntax but less flexible:

opal
make_col app_product_id:product_id
| lookup @product_ref
| make_col pid:product_id, pname:app_product_name
| limit 10

How it works:

  1. Reference table has primary key app_product_id
  2. Source dataset creates matching column: make_col app_product_id:product_id
  3. lookup automatically joins on matching column name
  4. All reference table columns added to result

MCP Parameters:

json
{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product_ref": "42782294"}
}

Why implicit requires matching:

❌ WRONG - Mismatched names fail:

opal
lookup @product_ref  # Source has 'product_id', reference has 'app_product_id'

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

✅ CORRECT - Create matching column first:

opal
make_col app_product_id:product_id
| lookup @product_ref

When to use implicit:

  • Simple joins where column names already match
  • When you want automatic inclusion of all reference columns
  • Legacy queries or established patterns

When to use explicit (Pattern A):

  • Column names don't match (most common!)
  • Want control over which columns to retrieve
  • Clearer, more maintainable queries

Pattern C: Using on() with Column Bindings

Full control over join conditions and column selection:

opal
# Join with explicit on() syntax
lookup on(product_id=@product.app_product_id), product_name:@product.app_product_name
| statsby count(), group_by(product_name)

Use case: Complex join conditions or when you need precise control over the join and column bindings

Lookup Behavior

  • Join type: Left outer join (keeps all rows)
  • No match: Reference table columns are NULL
  • Multiple matches: Returns all matching rows (Cartesian product)
  • Performance: Fast for small reference tables (<10MB)

Common Patterns

Pattern: Enrich with Descriptive Names (Explicit Join)

opal
# Using explicit lookup - no column name matching needed!
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname, service:container
| limit 20

Use case: Add human-readable product names to logs

Result: Logs with "National Park Foundation Explorascope" instead of "OLJCESPC7Z"

Alternative (implicit join):

opal
make_col app_product_id:product_id
| lookup @product_ref
| filter not is_null(app_product_name)
| make_col pid:product_id, name:app_product_name, service:container
| limit 20

Pattern: Aggregate with Reference Data (Explicit Join)

opal
# Using explicit lookup - cleaner and more maintainable
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| statsby log_count:count(), group_by(pname)
| sort desc(log_count)

Use case: Count events by descriptive name

Result:

pname,log_count
"National Park Foundation Explorascope",864

Alternative (implicit join):

opal
make_col app_product_id:product_id
| lookup @product_ref
| statsby log_count:count(), group_by(app_product_name)
| sort desc(log_count)

Pattern: Browse Reference Table Contents

opal
make_col id:app_product_id, name:app_product_name
| limit 50

Dataset: Query reference table directly (use reference table as primary_dataset_id)

Use case: See available lookup values

Result: Complete list of products in reference table

Pattern: Handle Missing Lookups (Explicit Join)

opal
# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| make_col pid:product_id,
          name:if(is_null(pname), "Unknown Product", pname)
| limit 10

Use case: Provide default value when reference lookup fails

Behavior: Shows "Unknown Product" when pname is NULL

Pattern: Filter to Matched Rows Only (Explicit Join)

opal
# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname
| limit 20

Use case: Exclude rows without reference table matches

Behavior: Only returns rows with successful lookups

Troubleshooting

Issue: "Missing columns from source dataset"

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

Cause: Source dataset doesn't have column matching reference table's primary key

Solution: Create matching column with make_col:

opal
make_col app_product_id:product_id
| lookup @product_ref

Key insight: Column names must match exactly (case-sensitive!)

Issue: All reference columns are NULL

Cause: No matches found (lookup is left outer join)

Diagnosis: Check if join values actually exist in reference table:

opal
filter app_product_id = "OLJCESPC7Z"
| limit 1

(Query reference table directly to verify value exists)

Solutions:

  1. Verify product_id values in source match app_product_id in reference
  2. Check for typos or case sensitivity issues
  3. Ensure reference table uploaded correctly

Issue: "Implicit lookup does not support additional arguments"

Error: "implicit lookup does not support additional arguments like explicit join predicates or column bindings"

Cause: Mixing implicit and explicit syntax:

opal
lookup @product_ref on product_id = @product_ref.app_product_id  ❌

Solution Option 1: Use explicit lookup (recommended):

opal
lookup @product.app_product_id=product_id, pname:@product.app_product_name  ✅

Solution Option 2: Use implicit lookup with column name matching:

opal
make_col app_product_id:product_id
| lookup @product_ref  ✅

Key Takeaways

  1. Reference Tables are for static CSV data (no timestamps, no change tracking)
  2. Three lookup patterns available:
    • Explicit (recommended): lookup @product.app_product_id=product_id, pname:@product.app_product_name
    • Implicit: Requires exact column name matching via make_col
    • on() syntax: Full control with lookup on(condition), bindings
  3. Explicit lookup advantages: No column name matching needed, clearer syntax, full control over retrieved columns
  4. Left outer join behavior - keeps all rows, NULL when no match
  5. Use for static enrichment - alternative to Resources when no temporal aspect needed
  6. 10MB size limit per reference table
  7. Fast and efficient for small lookup datasets
  8. Column matching is case-sensitive when using implicit lookup
  9. Use is_null() to check for failed lookups - provides default values or filters unmatched rows
  10. Query reference tables directly to browse available lookup values

When to Choose Reference Tables vs Resources

Scenario Use Reference Tables Use Resources
Static data that doesn't change
Data changes over time and you need state history
Simple ID-to-name mappings
Need temporal joins with Events/Intervals
CSV upload (max 10MB)
Track mutable state evolution
Fast lookups without timestamps

References

  • Reference Tables created via CSV upload in Observe UI
  • Use lookup verb with @ alias for joining
  • Explicit lookup: @alias.ref_column=source_column, result:@alias.value_column
  • Implicit lookup: Requires matching column names (automatic join)
  • on() syntax: lookup on(condition), column_bindings
  • Maximum 10MB CSV size per table
  • Interface type typically shows as "unknown" in discovery

Expand your agent's capabilities with these related and highly-rated skills.

rustomax/observe-community-mcp

working-with-resources

Work with Resource datasets (mutable state tracking) using OPAL temporal joins. Use when you need to enrich Events/Intervals with contextual state information, track resource state changes over time, or navigate between datasets using temporal relationships. Covers temporal join mechanics (lookup, join, follow), automatic field matching, and when to use Resources vs Reference Tables.

1 1
Explore
rustomax/observe-community-mcp

analyzing-text-patterns

Extract and analyze recurring patterns from log messages, span names, and event names using punctuation-based template discovery. Use when you need to understand log diversity, identify common message structures, detect unusual formats, or prepare for log parser development. Works by removing variable content and preserving structural markers.

1 1
Explore
rustomax/observe-community-mcp

time-series-analysis

Analyze event datasets (logs) and intervals over time using OPAL timechart. Use when you need to visualize trends, track metrics over time, or create time-series charts. Covers timechart for temporal binning, bin duration options (1h, 5m, 1d), options(bins:N) for controlling bin count, and understanding temporal output columns (_c_valid_from, _c_valid_to, _c_bucket). Returns multiple rows per group for time-series visualization. For single summaries, see aggregating-event-datasets skill.

1 1
Explore
rustomax/observe-community-mcp

detecting-anomalies

Detect anomalies in metrics and time-series data using OPAL statistical methods. Use when you need to identify unusual patterns, spikes, drops, or outliers in observability data. Covers statistical outlier detection (Z-score, IQR), threshold-based alerts, rate-of-change detection with window functions, and moving average baselines. Choose pattern based on data distribution and anomaly type.

1 1
Explore
rustomax/observe-community-mcp

aggregating-gauge-metrics

Aggregate pre-computed metrics (gauge, counter, delta types) using OPAL. Use when analyzing request counts, error rates, resource utilization, or any numeric metrics over time. Covers align + m() + aggregate pattern, summary vs time-series output, and common aggregation functions. For percentile metrics (tdigest), see analyzing-tdigest-metrics skill.

1 1
Explore
rustomax/observe-community-mcp

aggregating-event-datasets

Aggregate and summarize event datasets (logs) using OPAL statsby. Use when you need to count, sum, or calculate statistics across log events. Covers make_col for derived columns, statsby for aggregation, group_by for grouping, aggregation functions (count, sum, avg, percentile), and topk for top N results. Returns single summary row per group across entire time range. For time-series trends, see time-series-analysis skill.

1 1
Explore

Didn't find tool you were looking for?

Be as detailed as possible for better results