---
name: multimodal-medical-imaging
description: Analyzes medical images (X-ray, MRI, CT) using multimodal LLMs to identify anomalies and generate reports.
license: MIT
metadata:
  author: AI Group
  version: "1.0.0"
compatibility:
  system: Python 3.10+
allowed-tools:
  - run_shell_command
  - read_file
keywords:
  - multimodal-analysis
  - automation
  - biomedical
measurable_outcome: execute task with >95% success rate.
---

Multimodal Medical Imaging Analysis

The Multimodal Medical Imaging Analysis Skill uses Vision-Language Models (VLMs) such as Gemini 1.5 Pro and GPT-4o to interpret medical images alongside clinical text.

When to Use This Skill

When you need a preliminary screening of medical images.
When correlating visual findings with textual clinical notes.
When generating structured reports (DICOM-SR-like) from raw images.

Core Capabilities

Anomaly Detection: Identify potential pathologies in X-rays, CTs, etc.
Report Generation: Draft radiology reports in standard formats.
VQA (Visual Question Answering): Answer specific questions about an image (e.g., "Is there a fracture in the left femur?").

Workflow

Input: Provide the path to an image file (JPG or PNG) and either a specific clinical question or a "generate report" instruction.
Analyze: The agent sends the image and the prompt to the VLM.
Output: The agent returns a JSON object containing findings, confidence scores, and reasoning.
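
The Output step can be made concrete with a small validation sketch. The skill does not document its JSON schema, so the key names used here (`findings`, `description`, `confidence`, `reasoning`) and the 0 to 1 confidence range are assumptions for illustration, not the agent's actual format.

```python
import json
from dataclasses import dataclass


@dataclass
class Finding:
    description: str   # hypothetical key: what was observed
    confidence: float  # hypothetical key: assumed to lie in [0, 1]
    reasoning: str     # hypothetical key: why the model flagged it


def parse_findings(raw: str) -> list[Finding]:
    """Parse a JSON reply of the assumed shape and reject bad scores."""
    payload = json.loads(raw)
    findings = []
    for item in payload["findings"]:
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        findings.append(
            Finding(item["description"], confidence, item["reasoning"])
        )
    return findings


# Placeholder reply written by hand for this sketch; not real model output.
sample = json.dumps({
    "findings": [
        {
            "description": "example finding",
            "confidence": 0.82,
            "reasoning": "example reasoning",
        }
    ]
})

for finding in parse_findings(sample):
    print(f"{finding.description}: {finding.confidence:.2f}")
```

Validating the reply before use means a malformed or out-of-range score fails loudly instead of flowing into a report.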

Example Usage

User: "Analyze this chest X-ray for pneumonia."

Agent Action:

```bash
python3 Skills/Clinical/Medical_Imaging/Multimodal_Analysis/multimodal_agent.py \
    --image "/path/to/cxr.jpg" \
    --prompt "Check for signs of pneumonia and consolidation."
```
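
The source of `multimodal_agent.py` is not shown here, so the following is only a sketch of how a script with the same two flags might prepare its input. The `--image` and `--prompt` flags come from the command above; `build_request`, `parse_args`, and the request dictionary layout are hypothetical, and the call to the VLM itself is left out because it depends on the provider.

```python
import argparse
import base64
import mimetypes
from pathlib import Path

# The workflow lists JPG and PNG as the accepted inputs.
SUPPORTED_TYPES = ("image/jpeg", "image/png")


def build_request(image_path: str, prompt: str) -> dict:
    """Bundle the prompt and the base64-encoded image (hypothetical layout)."""
    path = Path(image_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported image type: {path.suffix}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"prompt": prompt, "image": {"mime_type": mime, "data": encoded}}


def parse_args(argv=None) -> argparse.Namespace:
    """Accept the same two flags as the example command."""
    parser = argparse.ArgumentParser(description="Multimodal imaging sketch")
    parser.add_argument("--image", required=True, help="path to a JPG or PNG")
    parser.add_argument("--prompt", required=True, help="clinical question")
    return parser.parse_args(argv)


def main(argv=None) -> dict:
    args = parse_args(argv)
    # A real agent would send this request to its VLM provider here.
    return build_request(args.image, args.prompt)
```

Checking the file type before encoding keeps unsupported formats from reaching the model at all.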
