UI: Run evals on a dataset

Run evals on a dataset on Athina Platform

This tutorial shows how to create a dataset, add rows, configure and run a suite of evals on Athina in 3 mins. (without writing any code)

Concepts

Datasets contain rows of data that you want to evaluate.

You can add rows to a dataset by uploading a JSONL file, sampling from production logs, or logging rows using our API or SDK.

Each row may contain any or all of the following fields:

  • query: string - this contains the user's input query
  • context: string[] – this contains string chunks from your retrieval
  • response: string - this contains the LLM generated response
  • expected_response: string - this contains a ground truth response to compare against (not required for most evals)

Different evals use different combinations of the above fields to evaluate the quality of your model.

For example, the ResponseFaithfulness eval requires response and context fields, while AnswerCompleteness eval requires query and response.

Steps

1. Create a new dataset

2. Add rows to your dataset

You may add rows to your dataset by:

3. Run evals on your dataset

  • Click the Evaluate button (located on the top right).
  • A sidebar will open, where you can configure a suite of evals to run on your dataset.

4. View eval results

Evaluation metrics will automatically show up as new columns in your dataset.