Top LLM evaluation AI tools

  • Kili Technology
    Build Better Data, Now.

    Kili Technology is a data-centric AI platform providing tools and services for creating high-quality training datasets and evaluating LLMs. It streamlines the data labeling process and offers expert-led project management for machine learning projects.

    • Freemium
  • EleutherAI
    Empowering Open-Source Artificial Intelligence Research

    EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.

    • Free
  • Flow AI
    The data engine for AI agent testing

    Flow AI accelerates AI agent development by providing continuously evolving, validated test data grounded in real-world information and refined by domain experts.

    • Contact for Pricing
  • TheFastest.ai
    Reliable performance measurements for popular LLM models.

    TheFastest.ai provides reliable, daily-updated performance benchmarks for popular Large Language Models (LLMs), measuring Time To First Token (TTFT) and Tokens Per Second (TPS) across different regions and prompt types (a rough measurement sketch follows this list).

    • Free
  • Radicalbit
    Your ready-to-use MLOps platform for Machine Learning, Computer Vision, and LLMs.

    Radicalbit is an MLOps and AI Observability platform that accelerates deployment, serving, observability, and explainability of AI models. It offers real-time data exploration, outlier and drift detection, and model monitoring.

    • Contact for Pricing
  • Braintrust
    The end-to-end platform for building world-class AI apps.

    Braintrust provides an end-to-end platform for developing, evaluating, and monitoring Large Language Model (LLM) applications. It helps teams build robust AI products through iterative workflows and real-time analysis.

    • Freemium
    • From $249
  • Evidently AI
    Collaborative AI observability platform for evaluating, testing, and monitoring AI-powered products

    Evidently AI is a comprehensive AI observability platform that helps teams evaluate, test, and monitor LLM and ML models in production, offering data drift detection, quality assessment, and performance monitoring capabilities.

    • Freemium
    • From $50
  • Conviction
    The Platform to Evaluate & Test LLMs

    Conviction is an AI platform designed for evaluating, testing, and monitoring Large Language Models (LLMs) to help developers build reliable AI applications faster. It focuses on detecting hallucinations, optimizing prompts, and ensuring security.

    • Freemium
    • From $249
  • EvalsOne
    Evaluate LLMs & RAG Pipelines Quickly

    EvalsOne is a platform for rapidly evaluating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines using various metrics.

    • Freemium
    • From $19
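
For readers curious how the two metrics TheFastest.ai reports are typically computed, here is a minimal timing sketch. It assumes a hypothetical stream_tokens(prompt) generator that yields response tokens from a streaming LLM API; it illustrates the general approach only and is not TheFastest.ai's own benchmarking code.

```python
import time

def measure_ttft_and_tps(stream_tokens, prompt):
    """Rough single-request TTFT/TPS measurement for a streaming LLM call.

    `stream_tokens` is a hypothetical callable (not a real library API):
    given a prompt, it yields response tokens one at a time, e.g. a thin
    wrapper around whichever provider's streaming endpoint you use.
    """
    start = time.perf_counter()
    first_token_time = None
    token_count = 0

    for _ in stream_tokens(prompt):
        if first_token_time is None:
            first_token_time = time.perf_counter()  # moment the first token arrives
        token_count += 1

    end = time.perf_counter()
    if first_token_time is None:
        raise RuntimeError("the stream produced no tokens")

    ttft = first_token_time - start          # Time To First Token, in seconds
    streaming_time = end - first_token_time  # time spent emitting the remaining tokens
    tps = token_count / streaming_time if streaming_time > 0 else float("nan")
    return ttft, tps
```

A benchmark such as TheFastest.ai repeats this kind of measurement daily across regions and prompt types and reports aggregates, so single-request numbers from a sketch like this will naturally differ from its published figures.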