This beta introduces datasets and evaluation workflows, enabling you to organize LLM spans, run evaluations with custom evaluators, compare alternative responses with Remix, and track results over time.
Features & Enhancements
- Datasets — Create and manage datasets of LLM spans with full CRUD operations, version history, and multi-row selection. Use datasets to organize spans for evaluation and comparison workflows.
- Dataset Remix — Generate alternative LLM responses for dataset spans using different models or providers. Compare outputs side by side in an inline, expandable comparison view and track results in a leaderboard.
- Evaluators & Evaluations — Define custom evaluators with prompt templates, including built-in defaults. Run evaluations against individual spans or entire datasets with live streaming progress updates, and stop or restart runs as needed (see the sketch after this list).
- Manual Scoring — Manually score dataset spans with custom score titles for human-in-the-loop evaluation workflows.
- Evaluation Results — View evaluation results with delta comparisons from previous runs, variance statistics per evaluator, and visual stat bars for quick insights.
- Improved Onboarding — Redesigned welcome page with interactive demo project containing pre-seeded data and automatic navigator expansion for first-time users.
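For readers wiring up evaluators, the following sketch shows the general shape of a prompt-template evaluator run over dataset spans. All names here (`Evaluator`, `runEvaluation`, the `relevance` template, the judge model id) are illustrative assumptions for this sketch, not the product's actual API.

```typescript
// Hypothetical sketch -- names and signatures are illustrative, not the actual API.

// An evaluator pairs a prompt template with a judge model.
interface Evaluator {
  name: string;
  // {{input}} and {{output}} are filled in from each span under evaluation.
  promptTemplate: string;
  model: string;
}

// A built-in default might look like this relevance check.
const relevance: Evaluator = {
  name: "relevance",
  promptTemplate:
    "Given the user input:\n{{input}}\n\n" +
    "Rate how relevant this response is from 0 to 1, answering with only the number:\n{{output}}",
  model: "gpt-4o-mini", // assumed judge model id
};

// Run the evaluator over every span, emitting one score per span as results stream in.
async function runEvaluation(
  evaluator: Evaluator,
  judge: (model: string, prompt: string) => Promise<string>,
  spans: Array<{ input: string; output: string }>,
): Promise<number[]> {
  const scores: number[] = [];
  for (const span of spans) {
    const prompt = evaluator.promptTemplate
      .replace("{{input}}", span.input)
      .replace("{{output}}", span.output);
    const raw = await judge(evaluator.model, prompt);
    const parsed = Number.parseFloat(raw);
    scores.push(Number.isNaN(parsed) ? 0 : parsed);
    // A real run would stream this progress update to the UI.
    console.log(`${evaluator.name}: span ${scores.length}/${spans.length} scored`);
  }
  return scores;
}
```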
Fixes & Improvements
- Column Persistence — Fixed column order and visibility not persisting correctly in spans table.
- Timestamp Handling — Improved nanosecond timestamp handling across the codebase to prevent precision loss (see the sketch after this list).
- Cost Tracking — Fixed floating-point precision artifacts in cost calculations and improved cost chart accuracy.
- UI Improvements — Various fixes for table styling, tooltip behavior, context menu focus, and layout stability.
- Model Prices — Updated model pricing and context window data.
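As background on the two precision fixes above: JavaScript numbers are IEEE-754 doubles with 53 bits of integer precision, so nanosecond Unix timestamps (around 1.7 × 10^18) cannot be stored exactly in a plain number, and repeatedly summing tiny per-token dollar amounts accumulates rounding artifacts. The sketch below demonstrates both hazards and a common mitigation; the rates and values are made up, and this is not the project's actual code.

```typescript
// Illustrative sketch of the two precision hazards; not the project's actual code.

// 1. Nanosecond timestamps exceed Number's 53-bit integer precision.
const ns = 1715000000123456789n;             // keep as BigInt end to end
console.log(BigInt(Number(ns)) === ns);      // false: the round trip loses digits
const ms = Number(ns / 1_000_000n);          // downscale first; milliseconds fit safely
console.log(ms);                             // 1715000000123

// 2. Per-token costs are tiny floats, so naive sums accumulate artifacts.
const perTokenCost = 0.15 / 1_000_000;       // e.g. $0.15 per 1M tokens (assumed rate)
let naive = 0;
for (let i = 0; i < 3; i++) naive += 1234 * perTokenCost;
console.log(naive);                          // likely 0.0005552999... rather than 0.0005553

// Mitigation: accumulate in an integer unit (here nano-dollars), divide once at the end.
const nanoDollarsPerToken = Math.round(perTokenCost * 1e9); // 150
const exact = (3 * 1234 * nanoDollarsPerToken) / 1e9;
console.log(exact.toFixed(7));               // "0.0005553"
```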