unsloth-long-context

Install this agent skill in your project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/unsloth-long-context

SKILL.md

Overview

Unsloth enables training at extreme context lengths (about 89K tokens for Llama 3.3 70B on a single 80GB GPU). It gets there by combining manually derived Triton kernels, such as its RoPE kernel, with its own 'unsloth' gradient checkpointing mode. Unsloth reports that this cuts memory usage by a further 30% and supports context windows roughly 4x longer than Hugging Face + Flash Attention 2.

When to Use

When training on long documents, codebases, or books.
When building models that require large retrieval windows or multi-document reasoning.
When a standard Hugging Face + Flash Attention 2 setup runs out of memory (OOM) on long sequences.

Decision Tree

Is context > 32K?
- Yes: Set use_gradient_checkpointing = 'unsloth' (required in practice to keep activation memory within VRAM).
Are you seeing quality degradation on long context?
- Yes: Ensure your dataset includes samples with long-range dependencies, and check the RoPE scaling applied for your target length.
Using A100/H100 80GB?
- Yes: You can push context lengths toward the ~89K tokens reported for Llama 3.3 70B; smaller models fit longer contexts.
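The decision tree above can be sketched as a small Python function. This is a hypothetical helper, not part of Unsloth. `plan_long_context`, `LongContextPlan` and the 32,768-token reading of "32K" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumption: "32K" in the decision tree is read as 32,768 tokens.
LONG_CONTEXT_THRESHOLD = 32_768


@dataclass
class LongContextPlan:
    # True means ordinary gradient checkpointing; "unsloth" selects Unsloth's mode.
    use_gradient_checkpointing: Union[bool, str] = True
    notes: List[str] = field(default_factory=list)


def plan_long_context(context_len: int, gpu_vram_gb: int,
                      quality_degrading: bool) -> LongContextPlan:
    """Walk the three questions of the decision tree (hypothetical helper)."""
    plan = LongContextPlan()
    # Question 1: is the context longer than 32K tokens?
    if context_len > LONG_CONTEXT_THRESHOLD:
        plan.use_gradient_checkpointing = "unsloth"
        plan.notes.append("pass use_gradient_checkpointing='unsloth' to get_peft_model")
    # Question 2: is long-context quality degrading?
    if quality_degrading:
        plan.notes.append("add long-range-dependency samples; check RoPE scaling")
    # Question 3: is an 80GB card (A100/H100) available?
    if gpu_vram_gb >= 80:
        plan.notes.append("80GB card: ~89K tokens reported for Llama 3.3 70B")
    return plan
```

For example, `plan_long_context(65536, 80, False)` selects `"unsloth"` checkpointing and returns two notes.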

Workflows

Setting Up Extreme Context Training

Load the model with a high max_seq_length (e.g., 65536 or more).
Ensure use_gradient_checkpointing='unsloth' is passed to get_peft_model.
Use high-VRAM GPUs (A100/H100 80GB) to reach the longest context lengths.
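A minimal sketch of this workflow follows. It assumes the `FastLanguageModel.from_pretrained` / `FastLanguageModel.get_peft_model` API used in Unsloth's published notebooks.

- The model id, LoRA rank and target modules are illustrative assumptions.
- `long_context_kwargs` and `load_long_context_model` are hypothetical helper names.

Building the arguments separately lets you inspect the configuration without a GPU. The `unsloth` import only runs when the model is actually loaded.

```python
from typing import Any, Dict, Tuple


def long_context_kwargs(max_seq_length: int = 65536) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build keyword arguments for from_pretrained and get_peft_model (hypothetical helper)."""
    load_kwargs = {
        # Assumed placeholder checkpoint; substitute the model you actually train.
        "model_name": "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
        "max_seq_length": max_seq_length,
        "dtype": None,         # let Unsloth choose bf16/fp16 for the GPU
        "load_in_4bit": True,  # 4-bit QLoRA keeps large weights within VRAM
    }
    peft_kwargs = {
        "r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "bias": "none",
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                           "gate_proj", "up_proj", "down_proj"],
        # The setting this workflow hinges on for long sequences.
        "use_gradient_checkpointing": "unsloth",
    }
    return load_kwargs, peft_kwargs


def load_long_context_model(max_seq_length: int = 65536):
    # Imported lazily so the kwargs above can be built and checked without a GPU.
    from unsloth import FastLanguageModel

    load_kwargs, peft_kwargs = long_context_kwargs(max_seq_length)
    model, tokenizer = FastLanguageModel.from_pretrained(**load_kwargs)
    model = FastLanguageModel.get_peft_model(model, **peft_kwargs)
    return model, tokenizer
```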

RoPE Scaling Configuration

Set max_seq_length in from_pretrained. When it exceeds the model's native context window, Unsloth applies RoPE scaling internally.
Include samples with long dependencies in the dataset to prevent performance degradation.
Increase the batch size or gradient accumulation steps so each optimizer step sees enough tokens for stable long-range learning.
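Unsloth handles the scaling itself, but the arithmetic behind these steps is easy to check by hand. The sketch makes two assumptions:

- RoPE scaling is linear, with factor = target length / native length, and the native length comes from the model config's `max_position_embeddings`. Unsloth's internal choice may differ per model.
- Sequences are full and unpadded when counting tokens per step.

`linear_rope_factor` and `tokens_per_step` are hypothetical helpers.

```python
def linear_rope_factor(target_len: int, native_len: int) -> float:
    """Linear RoPE scaling factor that stretches native_len positions to target_len."""
    if native_len <= 0:
        raise ValueError("native_len must be positive")
    # No scaling is needed when the target already fits in the native window.
    return max(1.0, target_len / native_len)


def tokens_per_step(seq_len: int, per_device_batch: int, grad_accum: int,
                    num_devices: int = 1) -> int:
    """Upper bound on tokens per optimizer step, assuming full unpadded sequences."""
    return seq_len * per_device_batch * grad_accum * num_devices
```

Example: a model with an 8,192-token native window (such as Llama 3 8B) trained at 65,536 tokens needs a factor of 8.0. One 65,536-token sequence with 4 accumulation steps gives 262,144 tokens per optimizer step. Models with a 128K native window, such as Llama 3.3 70B, need no scaling at 89K.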

Non-Obvious Insights

Unsloth's long-context advantage comes from custom Triton kernels (including one that computes RoPE more efficiently than standard libraries) combined with its gradient checkpointing mode. For Llama 3.3 70B on an 80GB GPU, Unsloth reports 13x longer context than the HF+FA2 combination.
The 'unsloth' gradient checkpointing mode is effectively mandatory for sequences longer than 32K tokens. Without it, activation memory exhausts GPU memory and training fails with OOM errors.
Flex Attention is an experimental Unsloth feature that lets very large models (such as 120B-parameter models) train with less VRAM by using memory-efficient attention patterns.
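A back-of-envelope estimate shows why activation memory dominates at these lengths. With gradient checkpointing, roughly one hidden-state tensor of shape (batch, seq_len, hidden_size) is kept per layer. The hypothetical helper below multiplies that out. It ignores everything recomputed inside a layer during the backward pass, so it is a lower bound.

```python
def checkpointed_activation_bytes(num_layers: int, seq_len: int, hidden_size: int,
                                  bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """Rough size of the per-layer hidden states saved by gradient checkpointing.

    Counts one (batch, seq_len, hidden_size) tensor per layer; bytes_per_elem=2
    corresponds to bf16/fp16. Recomputed intra-layer activations are not included.
    """
    return num_layers * batch_size * seq_len * hidden_size * bytes_per_elem
```

Plug in Llama 3.3 70B's shape (80 layers, hidden size 8192) at an illustrative 89,000 tokens in bf16. The call `checkpointed_activation_bytes(80, 89_000, 8192)` gives about 116.7 GB for these checkpoints alone, more than the entire 80GB card. That fits Unsloth's description of its checkpointing mode as offloading activations to system RAM rather than keeping them on the GPU.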

Evidence

"Unsloth supports 89K context for Meta's Llama 3.3 (70B) on a 80GB GPU - 13x longer than HF+FA2." Source
"We cut memory usage by a further 30% and now support 4x longer context windows!" Source

Scripts

scripts/unsloth-long-context_tool.py: Script to initialize models with specific RoPE and context length settings.
scripts/unsloth-long-context_tool.js: Utility to calculate token counts for long documents.
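The JavaScript utility's source is not shown here. As a rough Python counterpart (a hypothetical helper, not the repository's script), the function below counts tokens with any tokenizer callable. It also reports how many `max_seq_length` windows a document would fill.

```python
import math
from typing import Callable, List, Tuple


def token_report(text: str, tokenize: Callable[[str], List[int]],
                 max_seq_length: int) -> Tuple[int, int]:
    """Return (token count, number of max_seq_length windows needed to cover the text)."""
    if max_seq_length <= 0:
        raise ValueError("max_seq_length must be positive")
    n_tokens = len(tokenize(text))
    # Ceiling division: a partial final window still needs its own sequence.
    n_windows = math.ceil(n_tokens / max_seq_length)
    return n_tokens, n_windows
```

With a Hugging Face tokenizer, pass `lambda s: tokenizer(s, add_special_tokens=False)["input_ids"]` as `tokenize`.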

Dependencies

unsloth
triton
torch

References

[[references/README.md]]

Maintainer

majiayu000 Core maintainer

Source details

Full Name: majiayu000/claude-skill-registry
Branch: main
Path in repo: skills/data/unsloth-long-context
License: MIT License
