Loop v2026.01.29 - Beta Release

This beta introduces datasets and evaluation workflows, enabling you to organize LLM spans, run evaluations with custom evaluators, compare alternative responses with remix, and track results over time.

Features & Enhancements

  • Datasets — Create and manage datasets of LLM spans with full CRUD operations, version history, and multi-row selection. Use datasets to organize spans for evaluation and comparison workflows.
  • Dataset Remix — Generate alternative LLM responses for dataset spans using different models or providers. Compare outputs side by side in an inline, expandable comparison view and track results in a leaderboard.
  • Evaluators & Evaluations — Define custom evaluators with prompt templates, including built-in defaults. Run evaluations against individual spans or entire datasets with live streaming progress updates, and stop or restart evaluation runs at any time.
  • Manual Scoring — Manually score dataset spans with custom score titles for human-in-the-loop evaluation workflows.
  • Evaluation Results — View evaluation results with delta comparisons from previous runs, variance statistics per evaluator, and visual stat bars for quick insights.
  • Improved Onboarding — Redesigned welcome page with interactive demo project containing pre-seeded data and automatic navigator expansion for first-time users.

Fixes & Improvements

  • Column Persistence — Fixed column order and visibility not persisting correctly in spans table.
  • Timestamp Handling — Improved nanosecond timestamp handling across the codebase to prevent precision issues.
  • Cost Tracking — Fixed floating-point precision artifacts in cost calculations and improved cost chart accuracy.
  • UI Improvements — Various fixes for table styling, tooltip behavior, context menu focus, and layout stability.
  • Model Prices — Updated model pricing and context window data.
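The nanosecond timestamp fix above reflects a general pitfall worth knowing: current Unix time in nanoseconds (~1.7 × 10^18) exceeds the 53-bit range a 64-bit float can represent exactly, so any round-trip through a float can silently shift a span's timestamp. A minimal illustration (the specific timestamp value is arbitrary, not from Loop's code):

```python
# A plausible current Unix timestamp in nanoseconds.
ns = 1_769_700_000_123_456_789

# float64 has a 53-bit mantissa, so integers above 2**53 (~9.0e15)
# are no longer exactly representable.
assert ns > 2**53

# Round-tripping through float rounds to the nearest representable
# value, losing the low-order nanoseconds.
assert float(ns) != ns

# Keeping timestamps as integers end-to-end avoids the precision loss.
```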
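The cost-calculation fix guards against a related issue: binary floats cannot represent most decimal prices exactly, so naively summing per-token costs produces artifacts like `0.30000000000000004`. A quick sketch of the pattern (illustrative only, not Loop's actual implementation):

```python
from decimal import Decimal

# Summing decimal prices as binary floats accumulates rounding error.
float_total = sum([0.1, 0.1, 0.1])
assert float_total != 0.3  # 0.30000000000000004

# Summing with Decimal (or integer micro-dollars) stays exact.
decimal_total = sum(Decimal("0.1") for _ in range(3))
assert decimal_total == Decimal("0.3")
```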
