LLM Development Tools

BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other

PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free

Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From 25$

LM Studio is a desktop application that allows users to run Large Language Models (LLMs) locally and offline, supporting various architectures including Llama, Mistral, Phi, Gemma, DeepSeek, and Qwen 2.5.
- Free
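
LM Studio can also expose loaded models through a local OpenAI-compatible server, so applications can call them with the standard OpenAI client. A minimal sketch, assuming the server is running on its default address and a model is already loaded (the model name below is a placeholder):

```python
# Querying a model served by LM Studio's local OpenAI-compatible server.
# Assumes the local server is enabled on its default address
# (http://localhost:1234/v1) and that a model is loaded in LM Studio.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # placeholder: use whichever model you loaded
    messages=[{"role": "user", "content": "Summarize what LM Studio does."}],
)
print(response.choices[0].message.content)
```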

LMQL is a programming language designed for large language models, offering robust and modular prompting with types, templates, and constraints.
- Free

Flowise is an open-source low-code platform that enables developers to build customized LLM orchestration flows and AI agents through a drag-and-drop interface.
- Freemium
- From 35$

GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Large Language Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.
- Free

Inductor enables developers to rapidly prototype, evaluate, and improve LLM applications, ensuring high-quality app delivery.
- Freemium

LM Studio is a user-friendly desktop application that allows users to run various large language models (LLMs) locally and offline, including Llama 2, Phi-3, Falcon, Mistral, StarCoder, and Gemma models from Hugging Face.
- Free

PromptMage is a Python framework that streamlines the development of complex, multi-step applications powered by Large Language Models (LLMs), offering version control, testing capabilities, and automated API generation.
- Other

ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From 49$

Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based

Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.
- Freemium
- From 180$

Rig is a Rust-based framework for building modular and scalable LLM applications. It offers a unified LLM interface, Rust-powered performance, and advanced AI workflow abstractions.
- Free

Weights & Biases (W&B) Weave is a comprehensive framework designed for tracking, experimenting with, evaluating, deploying, and enhancing LLM-based applications.
- Other
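
A minimal sketch of how an application function might be traced with Weave, assuming the `weave` Python package and a configured W&B account; the project name is a placeholder:

```python
# Tracing a function with W&B Weave: calls to a decorated function are
# logged with their inputs, outputs, and latency under the named project.
import weave

weave.init("my-llm-app")  # placeholder project name

@weave.op()  # calls to this function are recorded as traces in Weave
def summarize(text: str) -> str:
    # In a real app this would call an LLM; kept local for illustration.
    return text[:100]

summarize("Weave records the inputs and outputs of this call.")
```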

Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From 59$
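
A minimal sketch of tracing an LLM call with Langfuse's Python SDK, assuming the Langfuse API keys are set in the environment; note that in older (v2) versions of the SDK the decorator is imported from `langfuse.decorators` rather than the top-level package:

```python
# Recording a function call as a Langfuse trace via the @observe decorator.
# Assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are configured.
from langfuse import observe

@observe()  # this call appears as a trace in the Langfuse UI
def answer(question: str) -> str:
    # Placeholder for an actual LLM call; nested functions decorated with
    # @observe() would show up as child spans of this trace.
    return f"Stub answer to: {question}"

answer("How do I debug a slow prompt?")
```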

Aider is a command-line tool that enables pair programming with LLMs to edit code in your local git repository. It supports various LLMs and offers top-tier performance on software engineering benchmarks.
- Free

GeneratorLLMs is a tool that creates standardized `llms.txt` files by extracting core website content. This improves how Large Language Models (LLMs) understand websites and enhances AI visibility.
- Free

LM-Kit provides .NET developers with tools for AI agent customization, creation, and orchestration. It enables the integration of multimodal generative AI systems into C# and VB.NET applications.
- Paid
- From 1000$

Langtail is a comprehensive testing platform that enables teams to test and debug LLM-powered applications with a spreadsheet-like interface, offering security features and integration with major LLM providers.
- Freemium
- From 99$

Agenta is an LLM engineering platform offering tools for prompt engineering, versioning, evaluation, and observability in a single, collaborative environment.
- Freemium
- From 49$

promptfoo is an open-source LLM testing tool designed to help developers secure and evaluate their language model applications, offering features like vulnerability scanning and continuous monitoring.
- Freemium

A token counting tool that helps users manage token limits across multiple language models, including GPT-4, Claude 3, and Llama 3, with client-side processing that keeps text on the user's device.
- Free
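
As a generic illustration of the underlying idea (not this tool's implementation), token counts for OpenAI models can be computed entirely client-side with the `tiktoken` library:

```python
# Counting tokens locally with tiktoken so prompt text never leaves the
# machine. Claude and Llama use different tokenizers, so these counts
# apply only to OpenAI models.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

print(count_tokens("How many tokens does this prompt use?"))
```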

Keywords AI is a comprehensive developer platform for LLM applications, offering monitoring, debugging, and deployment tools. It serves as a Datadog-like solution specifically designed for LLM applications.
- Freemium
- From 7$

LiteLLM provides a simplified and standardized way to interact with over 100 large language models (LLMs) using a consistent OpenAI-compatible input/output format.
- Free
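
A minimal sketch of LiteLLM's unified interface, assuming the relevant provider API keys (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) are set in the environment; the same call shape works across providers, selected by the model string:

```python
# LiteLLM routes the same OpenAI-style request to different providers
# based on the model identifier, and returns OpenAI-shaped responses.
from litellm import completion

messages = [{"role": "user", "content": "Explain what LiteLLM does in one sentence."}]

openai_resp = completion(model="gpt-4o-mini", messages=messages)
claude_resp = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

# Both responses follow the OpenAI response format regardless of provider.
print(openai_resp.choices[0].message.content)
print(claude_resp.choices[0].message.content)
```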

Missing Studio is an open-source AI platform designed for developers to build and deploy generative AI applications. It offers tools for managing LLMs, optimizing performance, and ensuring reliability.
- Free

Phoenix accelerates AI development with powerful insights, allowing seamless evaluation, experimentation, and optimization of AI applications in real time.
- Freemium
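
Assuming this entry refers to Arize Phoenix (the open-source `arize-phoenix` package), a minimal sketch of launching the local Phoenix UI for inspecting traces:

```python
# Starting a local Phoenix instance. Traces sent from an instrumented LLM
# application then appear in the UI for evaluation and debugging.
import phoenix as px

session = px.launch_app()  # starts the local Phoenix server and UI
print(session.url)         # open this URL in a browser to explore traces
```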