Agent skill
unsloth-long-context
Install this agent skill to your Project
npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-long-context
SKILL.md
Overview
Unsloth enables training on extreme context lengths (up to 89K+ on a single 80GB GPU) by utilizing manually derived Triton kernels for RoPE and attention. It optimizes memory usage by a further 30% compared to Flash Attention 2, allowing for 4x longer context windows.
When to Use
- When training on long documents, codebases, or books.
- When building models that require large retrieval windows or multi-document reasoning.
- When standard Flash Attention 2 results in OOM errors on long sequences.
Decision Tree
- Is context > 32K?
- Yes: Set
use_gradient_checkpointing = 'unsloth'(mandatory for stability).
- Yes: Set
- Are you seeing quality degradation on long context?
- Yes: Ensure your dataset includes samples with long-range dependencies and adjust RoPE base frequency.
- Using A100/H100 80GB?
- Yes: You can push context lengths toward 89K tiers.
Workflows
Setting Up Extreme Context Training
- Load model with high
max_seq_length(e.g., 65536+). - Ensure
use_gradient_checkpointing='unsloth'is passed toget_peft_model. - Use high-VRAM GPUs (A100/H100 80GB) to enable the highest context tiers.
RoPE Scaling Configuration
- Set
max_seq_lengthinfrom_pretrained; Unsloth automatically adjusts the base frequency internally. - Include samples with long dependencies in the dataset to prevent performance degradation.
- Increase batch size or accumulation to ensure sufficient tokens per step for stable long-range learning.
Non-Obvious Insights
- Unsloth's performance in long context comes from custom Triton kernels that handle RoPE scaling more efficiently than standard libraries, allowing for 13x longer context than the HF+FA2 combination.
- The 'unsloth' gradient checkpointing mode is not optional for long contexts; it is mandatory for sequences exceeding 32K to prevent activation memory from crashing the system.
- Flex Attention is an experimental feature in Unsloth that allows training massive models (like 120B) on reduced VRAM by optimizing the attention patterns specifically for memory efficiency.
Evidence
- "Unsloth supports 89K context for Meta's Llama 3.3 (70B) on a 80GB GPU - 13x longer than HF+FA2." Source
- "We cut memory usage by a further 30% and now support 4x longer context windows!" Source
Scripts
scripts/unsloth-long-context_tool.py: Script to initialize models with specific RoPE and context length settings.scripts/unsloth-long-context_tool.js: Utility to calculate token counts for long documents.
Dependencies
- unsloth
- triton
- torch
References
- [[references/README.md]]
Recommended Agent Skills
Expand your agent's capabilities with these related and highly-rated skills.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
agent-ops-spec
Manage specification documents in .agent/specs/. Use when user provides requirements, acceptance criteria, or feature descriptions that need to be tracked and validated against implementation.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-testing
Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.
agent-ops-state
Maintain .agent state files. Use at session start, after meaningful steps, and before concluding: read/update constitution/memory/focus/issues/baseline consistently.
Didn't find tool you were looking for?