web-article-extractor

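Invoke when using Chrome DevTools MCP to extract a web page's main text, save it as Markdown, download an article's images, or analyze page structure. Suited to article pages on blogs, news sites, WeChat Official Accounts, and similar; use it when the user asks to "extract the article", "grab the page's main text", "save as markdown", or "save it together with the images".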

Install this agent skill into your project:

npx add-skill https://github.com/dongbeixiaohuo/writing-agent/tree/main/.claude/skills/公众号文章获取

SKILL.md

Web Article Extractor

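The goal is simple: first get clean main text, then decide, based on what the user needs, whether to output plain text, structured JSON, or Markdown plus images.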

Prerequisites

Make sure the chrome-devtools MCP server is configured:

```bash
claude mcp add chrome-devtools npx -y chrome-devtools-mcp@latest -- \
  --disable-blink-features=AutomationControlled \
  --disable-web-security \
  --disable-features=IsolateOrigins,site-per-process
```

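Core principles

Default to Readability.js; don't start by hand-writing large sets of selectors.
When the user wants the result "saved as Markdown", don't just return the body text as a string; go straight to the Markdown export pipeline.
For pages with security restrictions, such as WeChat Official Accounts or Zhihu, read the platform-specific notes only when needed; don't pile platform details into the main workflow.
When extracting from several links in a batch, process them serially to avoid triggering anti-bot risk controls.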
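The serial rule can be sketched as a plain loop that awaits each page before starting the next. This is a minimal sketch under assumptions, not code from this skill's scripts: `extractOne` is a hypothetical per-URL extractor you supply, and the 2-second pause between pages is an assumed value.

```javascript
// Minimal sketch: extract a batch of URLs strictly one at a time.
// `extractOne` is a hypothetical per-URL extractor; `pauseMs` is an assumed delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function extractSerially(urls, extractOne, pauseMs = 2000) {
  const results = [];
  for (const [i, url] of urls.entries()) {
    try {
      // Await this page before touching the next one, so requests never overlap.
      results.push({ url, ok: true, data: await extractOne(url) });
    } catch (err) {
      // Record the failure and continue with the remaining URLs.
      results.push({ url, ok: false, error: String(err) });
    }
    // Pause between pages (not after the last one).
    if (i < urls.length - 1) await sleep(pauseMs);
  }
  return results;
}
```

A single failed page does not abort the batch; entries with `ok: false` can be retried afterwards.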

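Choosing an extraction path

1. Main-text extraction from ordinary web pages

Default route:

Script: scripts/readability_extractor.js
Read only when you need to fine-tune parameters:
- references/readability-guide.md
- references/config-options.md

Use cases:

Extracting the main text of blogs, news articles, and columns
Returning structured information such as title, author, body text, images, and reading time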
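A sketch of turning Readability's output into such a structured record. It assumes you already have the object returned by Readability's `parse()`, which exposes fields such as `title`, `byline`, `content` and `textContent`. The helper names, the word-count rule (one word per Han character plus one per Latin word) and the reading speeds are illustrative assumptions, not necessarily what `readability_extractor.js` does.

```javascript
// Sketch: map a Readability parse() result to a structured article record.
// The counting rule and reading speeds are assumptions, not part of Readability.
const HAN_CHAR = /\p{Script=Han}/gu;                      // each Chinese character = one word
const LATIN_WORD = /[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g; // e.g. "it's", "state-of-the-art"

function countWords(text) {
  const han = (text.match(HAN_CHAR) || []).length;
  const latin = (text.match(LATIN_WORD) || []).length;
  return { han, latin, total: han + latin };
}

function toArticleRecord(parsed, url, { hanPerMinute = 400, wordsPerMinute = 200 } = {}) {
  const { han, latin, total } = countWords(parsed.textContent || "");
  const minutes = han / hanPerMinute + latin / wordsPerMinute;
  return {
    title: parsed.title || "",
    author: parsed.byline || "",   // Readability calls the author line "byline"
    content: parsed.content || "", // cleaned article HTML
    url,
    wordCount: total,
    readingMinutes: total === 0 ? 0 : Math.max(1, Math.ceil(minutes)),
  };
}
```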

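2. Export to Markdown, keeping images where possible

Go straight to:

Conversion script: scripts/markdown_converter.js
Image download script: scripts/save_with_images.js
Usage notes:
- references/markdown_usage.md

Use cases:

The user explicitly says "save as markdown"
Images need to be downloaded locally, with the image paths in the Markdown rewritten to point at them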
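The path-rewriting half of this pipeline can be sketched as two small string functions. This is an illustration, not the interface of `markdown_converter.js` or `save_with_images.js`: `collectImageUrls` and `rewriteImagePaths` are hypothetical names, the regex only handles the common `![alt](url "title")` form, and the `urlToLocal` map is assumed to come from whatever step downloaded the images.

```javascript
// Sketch: find remote image URLs in Markdown and point them at local copies.
// Handles ![alt](url) and ![alt](url "title"); URLs containing ")" are not supported.
const IMAGE_REF = /!\[([^\]]*)\]\(\s*(\S+?)(\s+"[^"]*")?\s*\)/g;

function collectImageUrls(markdown) {
  const urls = [];
  for (const m of markdown.matchAll(IMAGE_REF)) {
    // Keep only absolute http(s) URLs, without duplicates, in document order.
    if (/^https?:\/\//i.test(m[2]) && !urls.includes(m[2])) urls.push(m[2]);
  }
  return urls;
}

function rewriteImagePaths(markdown, urlToLocal) {
  return markdown.replace(IMAGE_REF, (whole, alt, src, title = "") => {
    const local = urlToLocal.get(src);
    // Leave references with no downloaded copy untouched.
    return local ? `![${alt}](${local}${title})` : whole;
  });
}
```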

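3. When Readability is unreliable, fall back to lightweight extraction or hand-written selectors

Fallback route:

Lightweight extraction script: scripts/extract_article.js
Selector references:
- references/selector_patterns.md
- references/platform-specific.md

Use cases:

The page's DOM is unusual and Readability fails to extract the content accurately
Custom selectors need to be added based on platform-specific features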
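A selector fallback chain can be sketched as "try candidates in order and keep the first one with enough text". Only `#js_content` comes from this skill (the WeChat article body); the other selectors, the 200-character threshold and the `getText` callback are hypothetical placeholders, to be replaced with patterns from `references/selector_patterns.md`.

```javascript
// Sketch: try candidate selectors in order; keep the first with enough text.
// Apart from #js_content, the selectors and threshold are illustrative assumptions.
const CANDIDATE_SELECTORS = ["#js_content", "article", "main", ".post-content", ".entry-content"];

function pickContentSelector(getText, candidates = CANDIDATE_SELECTORS, minChars = 200) {
  for (const selector of candidates) {
    const text = (getText(selector) || "").trim();
    // Short matches are usually navigation or teaser blocks, not the article body.
    if (text.length >= minChars) return { selector, text };
  }
  return null; // nothing matched: inspect the DOM and add a platform-specific selector
}
```

Inside the page (for example in a script evaluated through the MCP server), `getText` could be `(s) => document.querySelector(s)?.innerText`.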

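Standard workflow

1. Open the page and wait for the main content to stabilize

Navigate to the target URL
Wait for the page to finish loading
If the main content node is slow to appear, wait an extra 2-3 seconds before extracting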
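The waiting rule can be sketched as a polling loop. `hasBody` is a hypothetical check (for instance "does the body node contain text yet?"), and the timeout, poll interval and 2.5-second settle delay are assumed values chosen to sit inside the 2-3 second guideline.

```javascript
// Sketch: poll until the article body is present; if it was slow to appear,
// wait a few extra seconds before extracting. All timings are assumptions.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForBody(hasBody, { timeoutMs = 15000, intervalMs = 500, settleMs = 2500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let polls = 0;
  while (Date.now() < deadline) {
    if (await hasBody()) {
      // Only pay the extra wait when the body did not show up on the first check.
      if (polls > 0) await delay(settleMs);
      return true;
    }
    polls += 1;
    await delay(intervalMs);
  }
  return false; // never appeared: retry, switch strategy, or report the failure
}
```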

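2. Decide on the output format first

The user only wants to "read the content":
- Extract the main text and return a summary or structured content directly
The user wants to "save as Markdown":
- Go straight to markdown_converter.js + save_with_images.js
The user wants to "analyze the page structure":
- Read references/selector_patterns.md first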
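Step 2 amounts to a small lookup from the user's intent to the scripts and references to use. The intent labels `view`, `markdown` and `analyze` and the `chooseRoute` helper are hypothetical; the file paths are the ones this skill lists, and using `readability_extractor.js` for the "read the content" case assumes the default route from path 1.

```javascript
// Sketch: map a (hypothetical) intent label to the scripts/references named in step 2.
const ROUTES = {
  view: { scripts: ["scripts/readability_extractor.js"], read: [] },
  markdown: { scripts: ["scripts/markdown_converter.js", "scripts/save_with_images.js"], read: [] },
  analyze: { scripts: [], read: ["references/selector_patterns.md"] },
};

function chooseRoute(intent) {
  // Guard against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(ROUTES, intent)) {
    throw new Error(`unknown intent: ${intent}`);
  }
  return ROUTES[intent];
}
```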

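3. Special handling for WeChat Official Accounts

Enable the special handling only when the target page really is an Official Account link:

Read references/platform-specific.md first
Emulate the WeChat User-Agent if necessary
Add extra wait logic if necessary, to make sure #js_content has actually finished loading

Don't apply the Official Account logic to every web page by default.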
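The "only for real Official Account links" gate can be sketched as a hostname check. This assumes Official Account articles are served from `mp.weixin.qq.com`; `isWeChatArticle`, `extractionOptions` and the option names are hypothetical.

```javascript
// Sketch: switch on WeChat handling only for Official Account article URLs.
// Assumes those articles live on mp.weixin.qq.com; option names are hypothetical.
function isWeChatArticle(rawUrl) {
  try {
    return new URL(rawUrl).hostname === "mp.weixin.qq.com";
  } catch {
    return false; // not a valid absolute URL
  }
}

function extractionOptions(rawUrl) {
  const wechat = isWeChatArticle(rawUrl);
  return {
    platform: wechat ? "wechat" : "generic",
    // Only WeChat pages wait specifically for the #js_content body node.
    waitForSelector: wechat ? "#js_content" : null,
  };
}
```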

Output requirements

Plain extraction result

Must include at least:

title
author
content
url
wordCount
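A quick completeness check for this record might look like the sketch below. The type rules are assumptions: the four text fields only need to be strings (an empty `author` is allowed, since not every page has a byline), and `wordCount` must be a non-negative integer.

```javascript
// Sketch: list which of the five required fields are missing or have the wrong type.
// Type rules are assumptions: strings for text fields, a non-negative integer for wordCount.
const REQUIRED_FIELDS = ["title", "author", "content", "url", "wordCount"];

function missingFields(record) {
  const r = record ?? {};
  return REQUIRED_FIELDS.filter((field) =>
    field === "wordCount"
      ? !(Number.isInteger(r[field]) && r[field] >= 0)
      : typeof r[field] !== "string"
  );
}
```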

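Markdown output

Must include at least:

The main text as a .md file
An image directory or local image files
Image paths in the Markdown rewritten to the local copies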
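One way to lay this output out on disk is sketched below: a `.md` file next to an image directory, plus a map from each remote image URL to a path relative to the `.md` file, which is what the rewritten Markdown links need. The `<slug>.md` / `<slug>_images/` layout, the zero-padded file names and the `.jpg` fallback for URLs without an extension are assumptions, not conventions taken from `save_with_images.js`.

```javascript
// Sketch: plan output paths for the .md file and its images.
// Layout, file naming and the ".jpg" fallback extension are assumptions.
const path = require("path");

function planOutput(slug, imageUrls, outDir = ".") {
  const imageDirName = `${slug}_images`;
  const urlToLocal = new Map();
  imageUrls.forEach((url, i) => {
    // Take the extension from the URL path (query string ignored), else fall back to .jpg.
    const ext = path.posix.extname(new URL(url).pathname).toLowerCase() || ".jpg";
    // Paths are relative to the .md file so the Markdown stays portable.
    urlToLocal.set(url, `${imageDirName}/${String(i + 1).padStart(3, "0")}${ext}`);
  });
  return {
    markdownPath: path.join(outDir, `${slug}.md`),
    imageDir: path.join(outDir, imageDirName),
    urlToLocal,
  };
}
```

Pass absolute, de-duplicated URLs: `new URL` throws on relative ones, and a repeated URL would simply be reassigned to its last index.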

Reference index

How Readability works, and its limitations:
- references/readability-guide.md
Tunable Readability parameters:
- references/config-options.md
Platform-specific notes:
- references/platform-specific.md
Markdown saving pipeline:
- references/markdown_usage.md
Common selector patterns:
- references/selector_patterns.md
Practical usage examples:
- references/usage_examples.md
Extraction success rate and troubleshooting:
- references/best-practices.md

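Script inventory

scripts/readability_extractor.js
- Main extraction script; the default entry point
scripts/readability_loader.js
- Loads the extraction logic at runtime
scripts/extract_article.js
- Lightweight fallback extractor
scripts/markdown_converter.js
- Converts content into a Markdown data structure
scripts/save_with_images.js
- Downloads images and saves them to disk
scripts/Readability.js
- The Readability runtime library; don't change how it is called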

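FAQ

Content that requires login:
- Use a browser instance that is already logged in; don't assume the page can be read anonymously
An Official Account page says "Please open in WeChat":
- Read references/platform-specific.md first; don't hack the global workflow to work around it
Extraction fails or the content is incomplete:
- Read references/best-practices.md first, then decide whether to switch to extract_article.js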
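Once the troubleshooting notes suggest switching, the hand-off itself can be sketched as below. `primary` and `fallback` are hypothetical async wrappers around the Readability path and `extract_article.js`, and the 200-character completeness threshold is an assumed heuristic; neither reflects the bundled scripts' real interfaces.

```javascript
// Sketch: keep the primary (Readability) result unless it looks incomplete,
// otherwise hand the URL to the lightweight extractor.
// `primary`, `fallback` and the 200-character threshold are hypothetical.
async function extractWithFallback(url, primary, fallback, minChars = 200) {
  const looksComplete = (r) =>
    typeof r?.textContent === "string" && r.textContent.trim().length >= minChars;

  let first = null;
  try {
    first = await primary(url);
  } catch {
    // A thrown error is treated the same as an empty result.
  }
  if (looksComplete(first)) return { source: "primary", result: first };

  return { source: "fallback", result: await fallback(url) };
}
```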

Maintainer

dongbeixiaohuo Core maintainer

Source details

Full Name: dongbeixiaohuo/writing-agent
Branch: main
Path in repo: .claude/skills/公众号文章获取
License: MIT License
Topics: claude-code deepseek ai-writing content-generation style-modeling writing-agent writing-assistant writing-workflow
