Agent skill

unsloth-vision

Stars 163

Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-vision

SKILL.md

Overview

Unsloth-vision provides optimized support for fine-tuning multimodal models like Llama 3.2 Vision and Qwen2.5 VL. It allows granular control over which layers (vision, language, or both) are updated and includes specialized data collators to handle image padding.

When to Use

When adapting models for specialized visual tasks like medical imaging or OCR-to-code.
When fine-tuning Llama 3.2 Vision models on consumer hardware.
When needing to train specifically on assistant responses in a multimodal context.

Decision Tree

Do you need to update visual feature extraction?
- Yes: Set finetune_vision_layers = True in get_peft_model.
Are your images varying in size?
- Yes: Standardize to 300-1000px and use UnslothVisionDataCollator.
Should the model ignore system/user prompts during loss calculation?
- Yes: Use train_on_responses_only = True in the collator.

Workflows

Vision Model Setup: Load models via FastVisionModel.from_pretrained and enable PEFT targeting all-linear modules.
Multimodal Dataset Preparation: Format data as 'user'/'assistant' conversations with {'type': 'image'} content and standardized dimensions (300-1000px).
Training for Response Accuracy: Initialize UnslothVisionDataCollator with train_on_responses_only = True and specified chat template headers.

Non-Obvious Insights

Unsloth allows selective fine-tuning of just the vision layers, just the language layers, or specific components like attention/MLP layers.
The UnslothVisionDataCollator automatically masks out padding vision tokens, which is essential for stabilizing loss during training.
Standardizing image resolution to the 300-1000px range is the optimal balance between detail preservation and VRAM efficiency.

Evidence

"To finetune vision models, we now allow you to select which parts of the mode to finetune... You can select to only finetune the vision layers, or the language layers..." Source
"It is best to ensure your dataset has images of all the same size/dimensions. Use dimensions of 300-1000px..." Source

Scripts

scripts/unsloth-vision_tool.py: Loading and configuring FastVisionModel.
scripts/unsloth-vision_tool.js: Dataset formatter for multimodal conversations.

Dependencies

unsloth
pillow (for image processing)
torch

References

references/README.md

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/unsloth-vision
License: MIT License

Featured Tools

Join Our Newsletter

Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.

Recommended Agent Skills

Expand your agent's capabilities with these related and highly-rated skills.

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-spec

Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.

163 31

Explore

majiayu000/claude-skill-registry

agent-ops-state

Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.

163 31

Explore

Didn't find tool you were looking for?

Search AI Tools

Install this agent skill to your Project

SKILL.md

Overview

When to Use

Decision Tree

Workflows

Non-Obvious Insights

Evidence

Scripts

Dependencies

References

Recommended Agent Skills

agent-ops-spec

agent-ops-state

agent-ops-spec

agent-ops-testing

agent-ops-testing

agent-ops-state