Agent skill

multimodal-analysis

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/multimodal-analysis

SKILL.md

---
name: multimodal-medical-imaging
description: Analyzes medical images (X-ray, MRI, CT) using multimodal LLMs to identify anomalies and generate reports.
license: MIT
metadata:
  author: AI Group
  version: "1.0.0"
compatibility:
  - system: Python 3.10+
allowed-tools:
  - run_shell_command
  - read_file
keywords:
  - multimodal-analysis
  - automation
  - biomedical
measurable_outcome: execute task with >95% success rate.
---

Multimodal Medical Imaging Analysis

The Multimodal Medical Imaging Analysis Skill uses Vision-Language Models (VLMs) such as Gemini 1.5 Pro and GPT-4o to interpret medical images alongside clinical text.

When to Use This Skill

  • When you need a preliminary screening of medical images.
  • When you need to correlate visual findings with textual clinical notes.
  • When you need to generate structured (DICOM-SR-like) reports from raw images.

Core Capabilities

  1. Anomaly Detection: Identify potential pathologies in X-ray, CT, and MRI images.
  2. Report Generation: Draft radiology reports in standard formats.
  3. VQA (Visual Question Answering): Answer specific questions about an image (e.g., "Is there a fracture in the left femur?").

Workflow

  1. Input: Provide an image file path (JPG, PNG) and a specific clinical question or "generate report" instruction.
  2. Analyze: The agent sends the image and prompt to the VLM.
  3. Output: The agent returns a JSON object with findings, confidence scores, and reasoning.
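
The three steps above can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: `build_request`, `parse_result`, the payload keys `prompt` and `image_base64`, and the reply keys `findings`, `finding`, `confidence`, and `reasoning` are all assumed names chosen to mirror the workflow description. The call to the VLM itself is left out because the provider API is not specified here.

```python
import base64
import json
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}  # image formats named in step 1


def build_request(image_path: str, prompt: str) -> dict:
    """Steps 1-2: read the image and package it with the prompt.

    The payload shape is hypothetical; a real VLM client defines its own.
    """
    path = Path(image_path)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported image type: {path.suffix}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"prompt": prompt, "image_base64": encoded}


def parse_result(raw: str) -> dict:
    """Step 3: validate the model's JSON reply.

    The required keys are assumptions, not a documented schema.
    """
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    missing = {"findings", "reasoning"} - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for item in result["findings"]:
        score = float(item["confidence"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence out of range: {score}")
    return result
```

Validating the reply before using it matters here because a VLM may return malformed or incomplete JSON; rejecting such replies early keeps an unverified finding from reaching a report.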

Example Usage

User: "Analyze this chest X-ray for pneumonia."

Agent Action:

```bash
python3 Skills/Clinical/Medical_Imaging/Multimodal_Analysis/multimodal_agent.py \
    --image "/path/to/cxr.jpg" \
    --prompt "Check for signs of pneumonia and consolidation."
```
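
The source of `multimodal_agent.py` is not shown here, so the following is only a sketch of how a script with this command line could parse its two flags. The `--image` and `--prompt` flag names come from the command above; the `parse_args` helper, the help strings, and the choice to make both flags required are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two flags used in the example command (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Send a medical image and a prompt to a VLM."
    )
    parser.add_argument("--image", required=True, type=Path,
                        help="Path to a JPG or PNG image")
    parser.add_argument("--prompt", required=True,
                        help="Clinical question or a 'generate report' instruction")
    return parser.parse_args(argv)


# Parse the same arguments as the shell example, passed as an explicit list.
args = parse_args([
    "--image", "/path/to/cxr.jpg",
    "--prompt", "Check for signs of pneumonia and consolidation.",
])
print(args.image.name, "|", args.prompt)
```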
