Agent skill
multimodal-analysis
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/multimodal-analysis
SKILL.md
---
name: multimodal-medical-imaging
description: Analyzes medical images (X-ray, MRI, CT) using multimodal LLMs to identify anomalies and generate reports.
license: MIT
metadata:
  author: AI Group
  version: "1.0.0"
compatibility:
  - system: Python 3.10+
allowed-tools:
  - run_shell_command
  - read_file
keywords:
  - multimodal-analysis
  - automation
  - biomedical
measurable_outcome: execute task with >95% success rate.
---
Multimodal Medical Imaging Analysis
The Multimodal Medical Imaging Analysis Skill uses vision-language models (VLMs) such as Gemini 1.5 Pro and GPT-4o to interpret medical images alongside clinical text.
When to Use This Skill
- When you need a preliminary screening of medical images.
- When correlating visual findings with textual clinical notes.
- To generate structured reports (DICOM-SR-like) from raw images.
Core Capabilities
- Anomaly Detection: Identify potential pathologies in X-rays, CT scans, and MRI scans.
- Report Generation: Draft radiology reports in standard formats.
- VQA (Visual Question Answering): Answer specific questions about an image (e.g., "Is there a fracture in the left femur?").
Workflow
- Input: Provide an image file path (JPG, PNG) and a specific clinical question or "generate report" instruction.
- Analyze: The agent sends the image and prompt to the VLM.
- Output: Returns a JSON object with findings, confidence scores, and reasoning.
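The three workflow steps can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `build_request` and `parse_result`, the `Finding` dataclass, the payload layout, and the JSON keys (`findings`, `label`, `confidence`, `reasoning`) are all assumptions chosen to match the description above. The call to the VLM itself is omitted because it depends on the provider.

```python
import base64
import json
import mimetypes
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    label: str
    confidence: float  # model-reported score in [0, 1], not a calibrated probability
    reasoning: str


def build_request(image_path: str, prompt: str) -> dict:
    """Input step: package image and prompt into a provider-neutral payload (hypothetical shape)."""
    path = Path(image_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime not in ("image/jpeg", "image/png"):
        raise ValueError(f"unsupported image type: {path.suffix}")
    # Base64-encode the raw bytes so the image can travel inside a JSON body.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"prompt": prompt, "image": {"mime_type": mime, "data": encoded}}


def parse_result(raw: str) -> list[Finding]:
    """Output step: validate the model's JSON reply and convert it to Finding objects."""
    payload = json.loads(raw)
    findings = []
    for item in payload["findings"]:
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        findings.append(Finding(item["label"], confidence, item["reasoning"]))
    return findings
```

Validating the reply before using it is a deliberate choice here: a VLM can return malformed JSON or out-of-range scores, and rejecting those early keeps bad values out of any downstream report.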
Example Usage
User: "Analyze this chest X-ray for pneumonia."
Agent Action:
```bash
python3 Skills/Clinical/Medical_Imaging/Multimodal_Analysis/multimodal_agent.py \
  --image "/path/to/cxr.jpg" \
  --prompt "Check for signs of pneumonia and consolidation."
```
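The source of `multimodal_agent.py` is not shown here, so the following is only a sketch of how a script might accept the two flags used in the command above. The `--image` and `--prompt` flag names come from the example; the function name `parse_args`, the help strings, and the choice to make both flags required are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown in the example command (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Send a medical image and a prompt to a vision-language model."
    )
    parser.add_argument("--image", required=True, help="Path to a JPG or PNG image")
    parser.add_argument(
        "--prompt",
        required=True,
        help="Clinical question or a 'generate report' instruction",
    )
    # argv=None makes argparse read sys.argv; passing a list makes the function testable.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"image={args.image!r} prompt={args.prompt!r}")
```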