Agent skill

glmocr

Extract text from images using the GLM-OCR API. Supports images and PDFs with high-accuracy OCR, table recognition, formula extraction, and handwriting recognition. Use this skill whenever the user wants to extract text from images, perform OCR on pictures, scan documents, convert images to text, or process any image files to get their textual content.

Stars 304

Forks 22

Install this agent skill to your Project

npx add-skill https://github.com/zai-org/GLM-skills/tree/main/skills/glmocr

Metadata

Additional technical details for this skill

openclaw: { "emoji": "\ud83d\udcc4", "homepage": "https://github.com/zai-org/GLM-OCR/tree/main/skills/glmocr", "requires": { "env": [ "ZHIPU_API_KEY", "GLM_OCR_TIMEOUT" ], "bins": [ "python" ] }, "primaryEnv": "ZHIPU_API_KEY" }

SKILL.md

GLM-OCR Text Extraction Skill

Extract text from images and PDFs using the GLM-OCR layout parsing API.

When to Use

Extract text from images (PNG, JPG) and PDFs
Convert screenshots to text
Process scanned documents
OCR photos containing text (including handwritten text)
Recognize tables and formulas in documents
User mentions "OCR", "文字识别" (text recognition), or "文档解析" (document parsing)

Key Features

Table recognition: Detects and converts tables to Markdown format
Formula extraction: Outputs formulas in LaTeX format
Handwriting support: Strong recognition for handwritten text
Local file & URL: Supports both local files and remote URLs

Resource Links

Resource	Link
Get API Key	https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
GitHub	https://github.com/zai-org/GLM-OCR

Prerequisites

ZHIPU_API_KEY configured (see Setup below)

Security Notes

No runtime package installation is performed by the scripts.
OCR requests use the fixed official GLM endpoint and do not accept custom API URLs.
Only ZHIPU_API_KEY (and optional timeout) is read from environment variables.
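
As a rough illustration of the last point, the sketch below reads only the two variables named in the metadata, `ZHIPU_API_KEY` and `GLM_OCR_TIMEOUT`. It is not the skill's actual code: the `OcrSettings` type, the `load_settings` helper, the 60-second default, and the assumption that the timeout is expressed in seconds are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed default and unit (seconds); the skill does not document either.
DEFAULT_TIMEOUT_SECONDS = 60.0


@dataclass(frozen=True)
class OcrSettings:
    api_key: str
    timeout: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> OcrSettings:
    """Read ZHIPU_API_KEY (required) and GLM_OCR_TIMEOUT (optional)."""
    env = os.environ if env is None else env
    api_key = env.get("ZHIPU_API_KEY", "").strip()
    if not api_key:
        # Mirrors the documented error text, without the setup URL.
        raise RuntimeError("ZHIPU_API_KEY not configured")
    raw_timeout = env.get("GLM_OCR_TIMEOUT", "").strip()
    timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    return OcrSettings(api_key=api_key, timeout=timeout)
```

Passing a mapping instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.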

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use GLM-OCR API - Execute the script python scripts/glm_ocr_cli.py
NEVER parse documents directly - Do NOT try to extract text yourself
NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt text extraction any other way

Setup

Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

Configure:

bash

python scripts/config_setup.py setup --api-key YOUR_KEY

How to Use

Extract from URL

bash

python scripts/glm_ocr_cli.py --file-url "URL provided by user"

Extract from Local File

bash

python scripts/glm_ocr_cli.py --file /path/to/image.jpg

Save result to file (recommended)

bash

python scripts/glm_ocr_cli.py --file-url "URL" --output result.json
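
When the CLI is driven from another Python program, the three invocations above could be wrapped roughly as follows. This is a sketch under stated assumptions: `build_ocr_command` and `run_ocr` are hypothetical helpers, the URL check is a simple `http://`/`https://` prefix test, and `run_ocr` assumes the script prints its JSON result to stdout when `--output` is not given, which the source does not state explicitly.

```python
import json
import subprocess
import sys
from typing import List, Optional


def build_ocr_command(source: str,
                      output: Optional[str] = None,
                      pretty: bool = False,
                      script: str = "scripts/glm_ocr_cli.py") -> List[str]:
    """Build the argument list for the documented CLI flags."""
    # Remote inputs go to --file-url, everything else to --file.
    is_url = source.startswith(("http://", "https://"))
    cmd = [sys.executable, script, "--file-url" if is_url else "--file", source]
    if output:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd


def run_ocr(source: str) -> dict:
    """Run the CLI and parse stdout as JSON (assumed output channel)."""
    completed = subprocess.run(build_ocr_command(source),
                               capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Building the command as a list rather than a shell string avoids quoting problems with paths or URLs that contain spaces or special characters.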

CLI Reference

python {baseDir}/scripts/glm_ocr_cli.py (--file-url URL | --file PATH) [--output FILE] [--pretty]

Parameter	Required	Description
`--file-url`	One of	URL to image/PDF
`--file`	One of	Local file path to image/PDF
`--output`, `-o`	No	Save result JSON to file
`--pretty`	No	Pretty-print JSON output
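
For readers who want to see how an interface with this shape can be declared, here is a minimal `argparse` sketch. It only mirrors the synopsis and table above; it is not the source of `glm_ocr_cli.py`, and `make_parser` is a hypothetical name.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Declare the flags from the CLI reference table."""
    parser = argparse.ArgumentParser(prog="glm_ocr_cli.py")
    # Exactly one of --file-url / --file must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-url", help="URL to image/PDF")
    source.add_argument("--file", help="Local file path to image/PDF")
    parser.add_argument("--output", "-o", help="Save result JSON to file")
    parser.add_argument("--pretty", action="store_true",
                        help="Pretty-print JSON output")
    return parser
```

With this declaration, passing both `--file` and `--file-url`, or neither, makes `argparse` exit with a usage error.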

Response Format

json

{
  "ok": true,
  "text": "# Extracted text in Markdown...",
  "layout_details": [[...]],
  "result": { "raw_api_response": "..." },
  "error": null,
  "source": "/path/to/file.jpg",
  "source_type": "file"
}

Key fields:

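ok: whether extraction succeeded
text: extracted text in Markdown (use this for display)
layout_details: layout analysis details
result: raw API response
error: error details on failure (null on success)
source, source_type: the input that was processed and its kind ("file" in the example above)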
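
A consumer of this format might check `ok` before touching `text`, along the lines of the sketch below. `OcrError`, `extract_text`, and `load_result` are hypothetical names, and stringifying `error` is an assumption; its exact shape is specified in references/output_schema.md.

```python
import json
from typing import Any, Dict


class OcrError(RuntimeError):
    """Raised when the result reports ok == false."""


def extract_text(result: Dict[str, Any]) -> str:
    """Return the Markdown text, or raise with the reported error."""
    if not result.get("ok"):
        # The shape of `error` is not shown in the source; str() is a guess.
        raise OcrError(str(result.get("error") or "unknown error"))
    return result.get("text") or ""


def load_result(path: str) -> Dict[str, Any]:
    """Load a result file previously written with --output."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `extract_text(load_result("result.json"))` would return the Markdown text for a successful run saved with `--output result.json`.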

Error Handling

API key not configured:

Error: ZHIPU_API_KEY not configured. Get your API key at: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys

→ Show exact error to user, guide them to configure

Authentication failed (401/403): API key invalid/expired → reconfigure

Rate limit (429): Quota exhausted → inform user to wait

File not found: Local file missing → check path
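
The cases above can be summarized as a small lookup, sketched here. `guidance_for` is a hypothetical helper; the source does not say how the HTTP status or a missing file is surfaced in the `error` field, so both are passed in as explicit arguments rather than parsed out of the response.

```python
from typing import Optional


def guidance_for(status_code: Optional[int] = None,
                 message: str = "",
                 missing_file: bool = False) -> str:
    """Map a failure to the guidance listed in this section."""
    if "ZHIPU_API_KEY not configured" in message:
        return "Show the exact error and guide the user through setup."
    if status_code in (401, 403):
        return "API key invalid or expired: reconfigure it."
    if status_code == 429:
        return "Quota exhausted: ask the user to wait."
    if missing_file:
        return "Local file missing: check the path."
    # Anything else: per the mandatory restrictions, show the error and stop.
    return "Display the error message and stop."
```

The fall-through branch deliberately offers no alternative extraction path, in line with the mandatory restrictions.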

Reference

references/output_schema.md — detailed output format specification

Maintainer

zai-org Core maintainer

Source details

Full Name: zai-org/GLM-skills
Branch: main
Path in repo: skills/glmocr
License: Apache License 2.0
Topics: skills glm ocr multimodal vision

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

zai-org/GLM-skills

glmocr-handwriting

Official skill for recognizing handwritten text from images using ZhiPu GLM-OCR API. Supports various handwriting styles, languages, and mixed handwritten/printed content. Use this skill when the user wants to read handwritten notes, convert handwriting to text, or OCR handwritten documents.

zai-org/GLM-skills

glmv-prd-to-app

Build a complete, production-ready full-stack web application from PRD documents, prototype images, and resource files. Handles the entire pipeline: system design, database schema, seed data, backend API, frontend UI, visual verification against prototypes, and deployment script generation. Use this skill whenever the user:
- Provides a PRD (product requirement document) and wants a working app built
- Says things like "根据PRD开发" ("develop from the PRD"), "build from PRD", "implement this product", "把需求文档做成应用" ("turn the requirements document into an app"), "develop this app from requirements"
- Has prototype images + requirements and wants full-stack implementation
- Wants to turn product specifications into a running web application
- Mentions building an app from wireframes/mockups combined with a requirements doc
Trigger this skill even if the user just says "帮我开发" ("help me develop this") or "build this" with PRD materials present in the working directory.

zai-org/GLM-skills

glmocr-table

Official skill for recognizing and extracting tables from images and PDFs into Markdown format using ZhiPu GLM-OCR API. Supports complex tables, merged cells, and multi-page documents. Use this skill when the user wants to extract tables, recognize spreadsheets, or convert table images to editable format.

zai-org/GLM-skills

glmv-doc-based-writing

Write textual content based on the given document(s) and requirements, using the ZhiPu GLM-V multimodal model. Reads and comprehends one or more documents (PDF/DOCX) and writes content in Markdown format according to the specified requirements. Use when the user wants to draft a paper, article, essay, report, review, post, brief, proposal, plan, etc.

zai-org/GLM-skills

glmocr-formula

Official skill for recognizing and extracting mathematical formulas from images and PDFs into LaTeX format using ZhiPu GLM-OCR API. Supports complex equations, inline formulas, and formula blocks. Use this skill when the user wants to extract formulas, convert formula images to LaTeX, or OCR mathematical expressions.

zai-org/GLM-skills

glmv-caption

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, and base64 (images only).
