Agent skill
audit
Audit paper data quality in the knowledge base. Checks for missing fields, filename issues, DOI duplicates, title mismatches, and more. Supports LLM-based deep diagnosis for title mismatches and automated repair. Use when the user wants to check data quality, find problems, or fix metadata issues.
Install this agent skill to your Project
npx add-skill https://github.com/ZimoLiao/scholaraio/tree/main/.claude/skills/audit
SKILL.md
论文审计
检查已入库论文的数据质量。分阶段:规则化检查(自动)+ LLM 深度诊断(对可疑项)+ 自动修复。
阶段一:规则化检查
scholaraio audit [--severity error|warning|info]
问题按严重程度分类:
- 错误:缺少标题、缺少 MD 文件、JSON 解析失败、DOI 重复
- 警告:缺少 DOI/摘要/年份/作者/期刊、MD 过短、标题不一致、文件名年份不匹配
- 提示:文件名不符合规范格式
阶段二:LLM 深度诊断(title_mismatch 专项)
对每篇 title_mismatch 论文,用 Read 工具读取 meta.json 和 paper.md(前 80 行),判断:
- MD 正文的实际主题/标题是否与 JSON 元数据一致
- 无害(MinerU H1 识别问题)vs 真正的内容错配
阶段三:修复
对确认的错配,使用 repair 命令:
# 先 dry-run 预览
scholaraio repair "<paper-id>" --title "正确标题" [--author "一作"] [--year YYYY] [--doi "10.xxx/..."] --dry-run
# 确认后执行
scholaraio repair "<paper-id>" --title "正确标题" [--author "一作"] [--year YYYY] [--doi "10.xxx/..."] [--no-api]
# 修复后重建索引
scholaraio pipeline reindex
检查规则
| 规则 | 级别 | 说明 |
|---|---|---|
missing_title |
error | 缺少标题 |
missing_md |
error | JSON 无对应 MD 文件 |
duplicate_doi |
error | DOI 重复 |
missing_doi |
warning | 缺少 DOI |
missing_abstract |
warning | 缺少摘要 |
title_mismatch |
warning | JSON 标题与 MD H1 不一致 |
nonstandard_filename |
info | 文件名不符合规范格式 |
示例
用户说:"帮我检查一下论文库有没有问题" → 执行阶段一规则化检查
用户说:"深度检查" → 执行阶段一 + 阶段二(LLM 逐篇诊断 title_mismatch)
用户说:"修复那些错配的论文" → 执行阶段三
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
lammps
Use when working on classical materials simulations with LAMMPS, especially interatomic-potential selection, shock or deformation setups, thermodynamic runs, and structure analysis for solids or nanomaterials.
citations
View top-cited papers ranking and refetch citation counts from APIs. Use when the user asks about highly cited papers, citation rankings, or wants to update citation data.
paper-writing
Assist with writing sections of a research paper (Introduction, Related Work, Method, Results, Discussion, Conclusion). Leverages workspace papers for citations and evidence. Use when the user wants help drafting or revising specific paper sections.
arxiv
Use when the user wants to browse arXiv preprints, search arXiv directly, fetch a PDF by arXiv ID or URL, or send a preprint straight into the ingest pipeline.
bioinformatics
Use when working on bioinformatics toolchains such as alignment, variant calling, phylogenetics, or protein-structure analysis, especially when the agent must route across BLAST, minimap2, samtools, bcftools, MAFFT, IQ-TREE, or ESMFold.
review-response
Draft point-by-point responses to peer review comments. Locates supporting evidence from workspace papers and the original manuscript. Use when the user receives reviewer feedback and needs to write a rebuttal or revision response letter.
Didn't find tool you were looking for?