Featherless: Instant, Unlimited Hosting for Any Llama Model on Hugging Face

Featherless
Paid
From $10

Home: https://recursal.ai

  • #hugging-face
  • #LLM hosting
  • #Serverless
  • #Model Deployment
  • #Open Source
  • #Language Models

What is Featherless?

Featherless is a serverless AI hosting service that simplifies deploying models from Hugging Face. Subscribers get access to an expanding library of Hugging Face models, currently focused on models based on LLaMA-3 and QWEN-2.

The platform dynamically swaps out models, enabling rapid reconfiguration of infrastructure according to user workload. This allows efficient autoscaling and supports a large number of models, all available for inference in milliseconds.

Features

  • Instant Hosting: Deploy any Llama model from HuggingFace instantly.
  • Unlimited Tokens: No cap on model usage for as long as the subscription remains active.
  • Dynamic Model Swapping: Rapidly reconfigure infrastructure according to user workload.
  • FP8 Quantization: Maintains output quality while significantly improving inference speeds.
  • Privacy-Focused: No logging of chats, prompts, or completions.
  • Large Model Support: Offers support for large language models, including 70B+ parameter models.

Use Cases

  • Running various language models for experimentation.
  • Deploying language models for application development.
  • Accessing a large catalog of pre-trained models.
  • Testing and comparing different language models.
  • Integrating AI models into applications without managing servers.

FAQs

  • What does it cost?
    We offer two pricing plans at $10 and $25 a month. If the concurrency limits are too restrictive for genuine personal use, please reach out to us via our Discord.
  • Which model architectures are supported?
    At present, we support models based on LLaMA-3 and QWEN-2. Note that QWEN-2 models are only supported up to a context length of 16,000. We plan to add more architectures to our supported list soon.
  • How do I get new models added?
    Ping us on our Discord. We continuously onboard new models as they become available on Hugging Face. As we grow, we aim to automate this process to encompass all publicly available Hugging Face models with compatible architectures.
  • Are you running quantized models?
    Yes, we use FP8 quantization. After consulting with the community, we've found that this approach maintains output quality while significantly improving inference speeds.
  • Do you have a referral program?
    Yes! Refer a friend, and when they subscribe and add your email, you both get $10 off your next monthly bill. The discount stacks, so referring 12 friends covers a full year of our basic plan.
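
The referral arithmetic in the last answer can be checked with a short sketch. It assumes the credits stack linearly at $10 per referral and that the "basic plan" is the $10 plan. The function name and the idea of applying credits to the $25 plan are illustrative assumptions, not stated terms of the program.

```python
# Figures taken from the FAQ above.
BASIC_PLAN_USD = 10       # monthly price of the lower-priced plan
REFERRAL_CREDIT_USD = 10  # discount earned per successful referral


def months_covered(referrals: int, plan_price_usd: int = BASIC_PLAN_USD) -> int:
    """Whole months of a plan paid for by stacked referral credits.

    Hypothetical helper: assumes every credit stacks and carries over
    from one monthly bill to the next.
    """
    if referrals < 0:
        raise ValueError("referrals must be non-negative")
    return (referrals * REFERRAL_CREDIT_USD) // plan_price_usd


print(months_covered(12))      # 12 referrals against the $10 plan
print(months_covered(12, 25))  # same credits against the $25 plan (assumed allowed)
```

Under these assumptions, 12 referrals yield $120 in credit, which matches the "full year" figure for the $10 plan.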


© 2025 EliteAi.tools. All Rights Reserved.