Agent skill

article-exporter

Export any web article to a local Obsidian-ready Markdown directory. Fetches page content via actionbook CLI, downloads images locally, rewrites image references to relative paths, and optionally translates the article using AI. Produces a self-contained folder with README.md, images/, and an index.md navigation file.

View SKILL.md on GitHub Repository

Stars 1,464

Forks 102

Install this agent skill to your Project

npx add-skill https://github.com/actionbook/actionbook/tree/main/playground/article-exporter

SKILL.md

Article Exporter - Export Articles to Obsidian

Version: 0.5.0 | Last Updated: 2026-03-13

You are an expert at web content archiving and Obsidian workflow automation.

Lessons from Failed Exports

These rules were extracted from real export failures. Each one prevents a specific class of error:

Twitter/X needs AI reformatting — fetch returns flat text because Twitter uses custom UI without semantic HTML. The AI reformatting step reconstructs headings, lists, and code blocks. See references/twitter-handling.md.
Ask for output path first — users have different vault locations. Assuming a default creates files in the wrong place and wastes time moving them.
Check actionbook version >= 0.9.1 — the --wait-hint parameter was added in 0.9.1. Without it, dynamic content (SPAs, lazy-loaded pages) returns empty or partial results.
Wait after navigation — use --wait-hint heavy for Twitter, Medium, and other dynamic sites. Without it, the page hasn't finished rendering when content is extracted.
Rate limit batch exports — 3-5s delay between requests prevents being flagged as a bot (ToS compliance).

Quick Reference

Task	Command	Success Criteria
Check deps	`actionbook --version`	Shows version >= 0.9.1
Fetch article	`actionbook browser fetch <url> --wait-hint heavy`	Returns plain text (AI reformats to Markdown in Step 1b)
Translate	AI session directly	README_CN.md created
Open in Obsidian	`obsidian-cli open "path/index.md"`	File opens in Obsidian

Complete Export Workflow

Goal: Export web article to Obsidian directory with images and optional translation

Success criteria:

Article directory created with README.md
All images downloaded to images/
index.md navigation file created
Optional: README_CN.md translation
Opened in Obsidian (if obsidian-cli available)

Step 1: Fetch Article Content

Execution: Direct (Bash)

bash

# Fetch article as readability text (with log cleaning)
actionbook browser fetch "$URL" --wait-hint heavy 2>/dev/null | \
  sed '/^[[:space:]]*$/d;/^\x1b\[/d;/^INFO/d' > /tmp/article_raw.txt

Success criteria:

/tmp/article_raw.txt exists and size > 0 bytes
Content contains the article's main text

The fetch command returns readability-extracted plain text (not Markdown). AI reformatting in Step 1b is always needed to produce proper Markdown.

Rules:

Use --wait-hint heavy for Twitter, Medium, dynamic content
Use --wait-hint light for static blogs
2>/dev/null suppresses stderr logs
sed removes ANSI codes, INFO lines, empty lines

Twitter/X Special Handling

Twitter uses non-semantic HTML, so fetch output loses all structure (headings become flat text, code blocks disappear). If the URL contains x.com or twitter.com, pay extra attention to structure reconstruction in Step 1b. See references/twitter-handling.md.

Step 1b: AI Reformat to Markdown

Execution: Direct (AI session)

Read /tmp/article_raw.txt and convert the plain text into well-structured Markdown. Save the result to /tmp/article.md.

Reformatting rules:

Reconstruct headings (#, ##, ###) from the text structure
Preserve original image URLs as ![alt](url) references
Format code blocks, lists, tables, and blockquotes
Keep the original article title as the first # H1 heading

Success criteria:

/tmp/article.md exists and starts with # <Title>
Image URLs are preserved as Markdown image syntax

Step 2: Extract Metadata

Execution: Direct (Bash)

bash

# Extract title (first H1 heading from AI-reformatted markdown)
TITLE=$(grep -m 1 "^# " /tmp/article.md | sed 's/^# //')

# Extract image URLs (filter out data: URLs)
IMAGE_URLS=$(grep -o '!\[[^]]*\]([^)]*)' /tmp/article.md | \
    sed -E 's/!\[[^]]*\]\(([^)]*)\)/\1/' | \
    grep -v '^data:')

Success criteria:

$TITLE is non-empty
$IMAGE_URLS count matches expected (use wc -l)

Step 3: Ask Output Directory

Execution: [human] Human checkpoint: Confirm output location before creating files

Ask user: "Where should I save the exported article?"

Suggested paths:

~/Work/Write/Articles (default)
~/Documents/Obsidian/Articles
~/Notes/Imported
(or custom path from $output_dir argument)

Success criteria: User confirms output directory

Artifacts: $OUTPUT_DIR variable set

Step 4: Create Directory Structure

Execution: Direct (Bash)

bash

# Use argument if provided, otherwise use confirmed path
OUTPUT_DIR="${output_dir:-$USER_CONFIRMED_PATH}"

# Sanitize title for directory name
SAFE_TITLE=$(echo "$TITLE" | sed 's/[/:*?"<>|]//g' | cut -c1-100 | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')

# Create output directory
ARTICLE_DIR="$OUTPUT_DIR/$SAFE_TITLE"
mkdir -p "$ARTICLE_DIR/images"

Success criteria:

Directory $ARTICLE_DIR exists
Subdirectory images/ exists
Directory is writable

Rules:

Remove special characters: / : * ? " < > |
Limit title length to 100 characters
Trim leading/trailing whitespace

Step 5: Download Images (Parallel if possible)

Execution: Direct (Bash)

bash

counter=1
for url in $IMAGE_URLS; do
    ext=$(echo "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' || echo ".jpg")
    curl -L -s "$url" -o "$ARTICLE_DIR/images/image_${counter}${ext}"

    # Check file size (detect 0-byte failures)
    if [ ! -s "$ARTICLE_DIR/images/image_${counter}${ext}" ]; then
        # Try alternative format (Twitter)
        curl -L -s "${url}?format=jpg&name=orig" -o "$ARTICLE_DIR/images/image_${counter}.jpg"
    fi

    counter=$((counter + 1))
done

Success criteria:

All image files exist and size > 0 bytes
File count matches $IMAGE_URLS count

Rules:

Use curl -L to follow redirects
Check file size after download
Try alternative formats for Twitter images

Step 6: Update Image References

Execution: Direct (Bash)

bash

# Replace remote URLs with local paths
counter=1
for url in $IMAGE_URLS; do
    ext=$(echo "$url" | grep -oE '\.(jpg|jpeg|png|gif|webp|svg)' || echo ".jpg")
    sed -i.bak "s|$url|./images/image_${counter}${ext}|g" /tmp/article.md
    counter=$((counter + 1))
done

# Save updated markdown
cp /tmp/article.md "$ARTICLE_DIR/README.md"
rm /tmp/article.md.bak

Success criteria:

README.md contains ./images/image_N.* references
No remote URLs remain in image links

Step 7: AI Translation (Optional)

Execution: Direct (AI session)

Human checkpoint: Ask user: "Do you want to translate the article? (y/n)"

If yes:

Read $ARTICLE_DIR/README.md
Translate using AI capabilities (no external API)
Write to $ARTICLE_DIR/README_CN.md (or other language code)

Translation Prompt Template:

Translate the following Markdown article to [LANGUAGE] while preserving:
- All Markdown formatting (headings, lists, code blocks, tables)
- Image references exactly as-is: ![alt](./images/image_N.*)
- Links and URLs unchanged
- Code blocks and technical terms in original language

Only output the translated Markdown content.

---
[Paste README.md content]

Success criteria: Translation file exists and size ≈ original ± 20%

Supported languages: en, zh, es, fr, de, ja, ko

Step 8: Create Navigation Index

Execution: Direct (Bash)

bash

# Auto-detect source from URL
case "$URL" in
    *x.com*|*twitter.com*) SOURCE="X" ;;
    *medium.com*) SOURCE="Medium" ;;
    *dev.to*) SOURCE="Dev.to" ;;
    *openai.com*) SOURCE="OpenAI Blog" ;;
    *substack.com*) SOURCE="Substack" ;;
    *github.com*) SOURCE="GitHub" ;;
    *) SOURCE=$(echo "$URL" | sed 's|https\?://||' | cut -d/ -f1) ;;
esac

# Create index.md
cat > "$ARTICLE_DIR/index.md" <<EOF
# $TITLE

> **Export Date**: $(date +%Y-%m-%d)
> **Original URL**: $URL
> **Source**: $SOURCE

## 📚 Language Versions

- 🇬🇧 **English**: [[README]]
- 🇨🇳 **Chinese**: [[README_CN]]  <!-- if translated -->

## 📊 Metadata

| Property | Value |
|----------|-------|
| **Source** | $SOURCE |
| **Images** | $(ls images/ 2>/dev/null | wc -l) images |
| **Export Tool** | actionbook CLI |
| **Export Date** | $(date +%Y-%m-%d) |

---

**Exported using**: actionbook browser automation + AI assistant
EOF

Success criteria: index.md exists with metadata table

Step 9: Open in Obsidian

Execution: Direct (Bash)

bash

if command -v obsidian-cli &> /dev/null; then
    VAULT_ROOT="$OUTPUT_DIR"
    REL_PATH=$(echo "$ARTICLE_DIR" | sed "s|$VAULT_ROOT/||")
    obsidian-cli open "$REL_PATH/index.md"
    echo "✓ Opened in Obsidian: $REL_PATH/index.md"
else
    # Fallback: Open in file manager
    case "$(uname)" in
        Darwin)  open "$ARTICLE_DIR" ;;
        Linux)   xdg-open "$ARTICLE_DIR" ;;
        CYGWIN*|MINGW*|MSYS*) start "$ARTICLE_DIR" ;;
    esac
    echo "⚠️  Install obsidian-cli for automatic opening: npm install -g obsidian-cli"
fi

Success criteria:

File opens in Obsidian OR directory opens in file manager
User sees success message

Step 10: Report Success

Execution: Direct (Output)

bash

echo ""
echo "════════════════════════════════════════════"
echo "✓ Article exported successfully!"
echo ""
echo "📁 Location: $ARTICLE_DIR"
echo "📄 Files:"
echo "     - README.md (original)"
[ -f "$ARTICLE_DIR/README_CN.md" ] && echo "     - README_CN.md (translation)"
echo "     - index.md (navigation)"
echo "🖼️  Images: $(ls images/ 2>/dev/null | wc -l) files"
echo "════════════════════════════════════════════"

Common Issues

Issue	Cause	Solution
"actionbook: command not found"	CLI not installed	`npm install -g @actionbookdev/cli@latest`
"unknown flag: --wait-hint"	Version < 0.9.1	Upgrade: `npm install -g @actionbookdev/cli@latest`
Twitter format broken	`fetch` loses structure	Use AI reformatting (see references/twitter-handling.md)
Images 0 bytes	URL expired	Try `?format=jpg&name=orig`
obsidian-cli not found	Not installed	`npm install -g obsidian-cli`
Batch export blocked	Too fast, flagged as bot	Add 3-5s `sleep` between requests

Detailed troubleshooting: See ./references/troubleshooting.md

Edge Cases Handled

Long titles → Auto-truncate to 100 chars
Special characters → Sanitized (/ : * ? " < > | removed)
No images → Steps 5-6 skip gracefully
0-byte images → Auto-retry with alternative formats
Data URLs → Filtered out in Step 2

When Using This Skill

Check dependencies first — actionbook --version >= 0.9.1
Test with one article — Verify before batch processing
Twitter/X requires special handling — See references/twitter-handling.md
Respect ToS — Personal use only, rate limit batch exports

References (Progressive Disclosure)

For detailed documentation, see:

./references/twitter-handling.md — Twitter/X special handling (AI reformatting)
./references/batch-export.md — Batch export with rate limiting
./references/troubleshooting.md — Detailed troubleshooting guide
./references/obsidian-setup.md — obsidian-cli setup and configuration
./references/supported-websites.md — Complete website compatibility list

Last Updated: 2026-03-13 | Version: 0.5.0

Maintainer

actionbook Core maintainer

Source details

Full Name: actionbook/actionbook
Branch: main
Path in repo: playground/article-exporter
License: Apache License 2.0
Topics: ai-agents skills rust browser-automation

Featured Tools

Join Our Newsletter

Automates browser interactions for form filling and web page interaction. Used by the request-website command to submit website indexing requests.

1,464 102

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Article Exporter - Export Articles to Obsidian

Lessons from Failed Exports

Quick Reference

Complete Export Workflow

Step 1: Fetch Article Content

Step 1b: AI Reformat to Markdown

Step 2: Extract Metadata

Step 3: Ask Output Directory

Step 4: Create Directory Structure

Step 5: Download Images (Parallel if possible)

Step 6: Update Image References

Step 7: AI Translation (Optional)

Step 8: Create Navigation Index

Step 9: Open in Obsidian

Step 10: Report Success

Common Issues

Edge Cases Handled

When Using This Skill

References (Progressive Disclosure)

Recommended Agent Skills

actionbook

deep-research

actionbook-web-test

arxiv-viewer

actionbook-scraper

agent-browser