Agent skill

unsloth-qlora-cuba6112-skillfactory

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/devops/unsloth-qlora-cuba6112-skillfactory

SKILL.md

Overview

Unsloth-qlora enables the fine-tuning of large-scale models (up to 70B parameters) on consumer-grade hardware. It utilizes dynamic 4-bit quantization which selectively preserves critical weights to maintain higher accuracy than standard quantization methods.

When to Use

  • When training on limited VRAM hardware (e.g., 24GB or 48GB cards).
  • When seeking to match full fine-tuning performance while using 4-bit precision.
  • When accuracy loss from standard BitsAndBytes quantization is unacceptable.

Decision Tree

  1. Do you need maximum VRAM savings?
    • Yes: Set load_in_4bit = True.
  2. Is accuracy the priority over VRAM?
    • Yes: Use LoRA (16-bit) if VRAM permits; otherwise use unsloth-bnb-4bit models.
  3. Are you training on all layers?
    • Yes: Target q, k, v, o, gate, up, down modules for optimal performance.

Workflows

  1. Setting Up QLoRA: Load models with the -unsloth-bnb-4bit suffix and initialize with load_in_4bit = True.
  2. Optimizing Batch Size: Use low per_device_train_batch_size (e.g., 2) with high gradient_accumulation_steps (e.g., 8) to maintain stability on low VRAM.
  3. Verifying Weight Updates: Compare pre and post-training tensors using MD5 hashes or absolute differences instead of standard np.allclose().

Non-Obvious Insights

  • Unsloth's Dynamic 4-bit quantization recovers approximately 70% of accuracy lost during standard quantization by preserving critical parameters.
  • Masking out input tokens and training specifically on assistant completions can boost QLoRA accuracy by roughly 1%.
  • To match full fine-tuning (FFT) performance, LoRA must be applied to all major linear layers (q, k, v, o, gate, up, down).

Evidence

  • "Unsloth dynamic 4-bit quants... consume slightly more VRAM than standard BitsAndBytes 4-bit models but offer significantly higher accuracy." Source
  • "QLoRA allows a 70B parameter model to fit in less than 48GB of VRAM." Source

Scripts

  • scripts/unsloth-qlora_tool.py: Script for 4-bit model loading and linear layer targeting.
  • scripts/unsloth-qlora_tool.js: Node.js helper for batch size calculation.

Dependencies

  • unsloth
  • bitsandbytes
  • accelerate

References

Didn't find tool you were looking for?

Be as detailed as possible for better results