Petals
Run large language models at home, BitTorrent‑style.

What is Petals?

Petals is a system for running large language models (LLMs) collaboratively. It lets users run demanding models such as Llama 3.1 (up to 405B parameters), Mixtral (8x22B), Falcon (40B+), and BLOOM (176B) without high-end enterprise hardware. The system is distributed and peer-to-peer, similar to BitTorrent: each user loads a segment of the chosen model onto their own machine (a consumer-grade GPU or Google Colab is enough) and connects to a network in which other participants host the remaining segments.
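To make the BitTorrent-style idea concrete, here is a toy sketch of how a request might be routed through peers that each host one segment of a model. It is not Petals' actual routing algorithm: the `Peer` class, the `plan_route` function, the peer names, and the block count of 80 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """A hypothetical participant hosting blocks [start, end) of the model."""
    name: str
    start: int  # index of the first hosted block (inclusive)
    end: int    # index one past the last hosted block (exclusive)


def plan_route(peers, num_blocks):
    """Greedily pick a chain of peers whose block ranges cover 0..num_blocks.

    Returns a list of (peer_name, first_block, last_block_exclusive) hops,
    or raises ValueError if some block is hosted by nobody.
    """
    route = []
    position = 0  # next block that still needs to be executed
    while position < num_blocks:
        # Peers that can continue the computation from the current block.
        candidates = [p for p in peers if p.start <= position < p.end]
        if not candidates:
            raise ValueError(f"no peer hosts block {position}")
        # Prefer the peer that carries us furthest, to reduce network hops.
        best = max(candidates, key=lambda p: p.end)
        stop = min(best.end, num_blocks)
        route.append((best.name, position, stop))
        position = stop
    return route


# Illustrative swarm: overlapping segments of an 80-block model (assumed size).
peers = [
    Peer("alice", 0, 30),
    Peer("bob", 20, 60),
    Peer("carol", 55, 80),
    Peer("dave", 0, 10),
]
route = plan_route(peers, num_blocks=80)
for name, first, last in route:
    print(f"{name}: blocks {first}..{last - 1}")
```

The sketch shows why overlapping segments help: if one peer leaves, another peer covering the same blocks can take over, and the request fails only when some block has no host at all.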

This distributed structure makes inference fast enough for interactive applications such as chatbots, reaching up to 6 tokens per second for Llama 2 (70B). Beyond standard inference, Petals is more flexible than typical LLM APIs: it supports various fine-tuning methods and custom sampling techniques, and it lets users execute specific computational paths through the model or inspect its hidden states. Because Petals integrates with PyTorch and 🤗 Transformers, it combines API-like convenience with deep access to and control over the model.
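As a rough illustration of the PyTorch and 🤗 Transformers integration, the sketch below follows the pattern of Petals' published quick-start. It assumes the `petals` and `transformers` packages are installed, that the chosen model is currently served by a reachable swarm, and that you have access to its weights. The `generate_text` wrapper is a hypothetical helper, and class names or call details may differ between Petals versions.

```python
def generate_text(prompt, model_name, max_new_tokens=5):
    """Generate a continuation of `prompt` with a model served by a Petals swarm.

    Hypothetical convenience wrapper; requires network access when called.
    """
    # Imported lazily so this file can be loaded without Petals installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Connects to the swarm; the transformer blocks run on remote peers.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    # Tokenize locally, generate through the swarm, then decode locally.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

A call such as `generate_text("A cat sat", "meta-llama/Meta-Llama-3.1-405B-Instruct")` would return the prompt followed by a few generated tokens, provided that model is being served. The model name here is only an example and is an assumption about what the swarm hosts.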

Features

  • Distributed LLM Execution: Runs large models across a network of user devices.
  • Support for Major LLMs: Compatible with Llama 3.1, Mixtral, Falcon, BLOOM, and others.
  • Consumer Hardware Compatibility: Operates on consumer-grade GPUs or Google Colab.
  • Interactive Inference Speed: Delivers speeds suitable for chatbots and interactive apps (e.g., up to 6 tokens/sec for Llama 2 70B).
  • Advanced Model Control: Allows fine-tuning, custom sampling, custom execution paths, and access to hidden states.
  • PyTorch & Transformers Integration: Works with PyTorch and 🤗 Transformers, so models can be used from standard ML code.

Use Cases

  • Running large-scale language models on standard hardware.
  • Developing and testing interactive AI applications and chatbots.
  • Fine-tuning large language models for specific tasks.
  • Conducting AI research requiring deep access to model internals.
  • Collaboratively hosting and utilizing powerful AI models.
  • Experimenting with custom inference and sampling techniques.
