LLM Evaluation Platforms - AI Tools

  • PromptsLabs
    PromptsLabs A Library of Prompts for Testing LLMs

    PromptsLabs is a community-driven platform providing copy-paste prompts for testing the performance of new LLMs. Explore and contribute to a growing collection of prompts (a minimal test harness along these lines is sketched after this list).

    • Free
  • Laminar
    Laminar The AI engineering platform for LLM products

    Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration (the tracing sketch after this list shows the general pattern).

    • Freemium
    • From $25
  • Autoblocks
    Autoblocks Improve your LLM Product Accuracy with Expert-Driven Testing & Evaluation

    Autoblocks is a collaborative testing and evaluation platform for LLM-based products that automatically improves through user and expert feedback, offering comprehensive tools for monitoring, debugging, and quality assurance.

    • Freemium
    • From $1,750
  • Agenta
    Agenta End-to-End LLM Engineering Platform

    Agenta is an LLM engineering platform offering tools for prompt engineering, versioning, evaluation, and observability in a single, collaborative environment.

    • Freemium
    • From $49
  • LangWatch
    LangWatch Monitor, Evaluate & Optimize your LLM performance with 1-click

    LangWatch helps AI teams ship faster by building quality assurance into every step of development. It provides tools to measure, optimize, and collaborate on LLM performance.

    • Paid
    • From $59
  • Humanloop
    Humanloop The LLM evals platform for enterprises to ship and scale AI with confidence

    Humanloop is an enterprise-grade platform that provides tools for LLM evaluation, prompt management, and AI observability, enabling teams to develop, evaluate, and deploy trustworthy AI applications.

    • Freemium
  • Libretto
    Libretto LLM Monitoring, Testing, and Optimization

    Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.

    • Freemium
    • From $180
  • GPT-LLM Playground
    GPT-LLM Playground Your Comprehensive Testing Environment for Large Language Models

    GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Large Language Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.

    • Free
  • Langfuse
    Langfuse Open Source LLM Engineering Platform

    Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications (see the scoring sketch after this list).

    • Freemium
    • From $59
  • EleutherAI
    EleutherAI Empowering Open-Source Artificial Intelligence Research

    EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.

    • Free
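
The sketches below illustrate, in plain Python, the patterns these tools implement. All names and APIs in them are illustrative stand-ins, not any vendor's actual SDK.

First, the PromptsLabs-style workflow: run a small library of copy-paste test prompts against a model and check each reply against an expected answer. A minimal harness, assuming only a callable that maps a prompt string to a reply string:

    # Hypothetical harness: run a library of test prompts against a model
    # and check each response for an expected substring. call_model is a
    # stand-in for whatever client you actually use.
    from typing import Callable

    TEST_PROMPTS = [
        {"prompt": "What is 17 * 24? Answer with the number only.", "expect": "408"},
        {"prompt": "Spell 'strawberry' backwards.", "expect": "yrrebwarts"},
    ]

    def run_suite(call_model: Callable[[str], str]) -> None:
        passed = 0
        for case in TEST_PROMPTS:
            reply = call_model(case["prompt"])
            ok = case["expect"].lower() in reply.lower()
            passed += ok
            print(f"{'PASS' if ok else 'FAIL'}: {case['prompt'][:40]!r}")
        print(f"{passed}/{len(TEST_PROMPTS)} prompts passed")

    if __name__ == "__main__":
        # Echo stub so the script runs without any API key.
        run_suite(lambda p: "408" if "17" in p else "yrrebwarts")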
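Next, the "minimal code integration" tracing advertised by platforms such as Laminar and Langfuse typically takes the form of a decorator that records a function's inputs, outputs, and latency as a span. A minimal sketch, with an in-memory log standing in for the platform backend:

    # Hypothetical tracing decorator; real SDKs ship decorators like this,
    # but the names here are illustrative, not a vendor's actual API.
    import functools
    import time

    TRACE_LOG = []  # a real SDK would ship these spans to a backend

    def observe(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE_LOG.append({
                "name": fn.__name__,
                "input": {"args": args, "kwargs": kwargs},
                "output": result,
                "latency_s": round(time.perf_counter() - start, 4),
            })
            return result
        return wrapper

    @observe
    def generate(prompt: str) -> str:
        return f"echo: {prompt}"  # stand-in for a model call

    generate("Summarize this ticket.")
    print(TRACE_LOG)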
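Finally, the "evaluate and label" features these platforms offer boil down to attaching named scores (an LLM-as-judge rating, a user thumbs-up) to traces and aggregating them per prompt version. A toy sketch of that bookkeeping, with all identifiers invented for illustration:

    # Hypothetical score bookkeeping: each trace gets numeric scores that
    # can be aggregated per prompt version to compare iterations.
    from collections import defaultdict
    from statistics import mean

    scores = defaultdict(list)  # prompt_version -> list of (name, value)

    def record_score(prompt_version: str, name: str, value: float) -> None:
        scores[prompt_version].append((name, value))

    record_score("summarize-v1", "relevance", 0.6)
    record_score("summarize-v2", "relevance", 0.9)
    record_score("summarize-v2", "relevance", 0.8)

    for version, vals in scores.items():
        avg = mean(v for _, v in vals)
        print(f"{version}: avg relevance {avg:.2f} over {len(vals)} traces")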