Running Evals
There are a few ways to run evals using Athina:
- Run evals using the Python SDK
- Run evals on a dataset using the Athina Platform
- Compare two datasets side by side with evaluation metrics
- Run evals as real-time guardrails using athina.guard()
- Configure evals to run continuously on Production Traces
Running evals programmatically using the Python SDK
Here's a 2-minute video tutorial showcasing how you can quickly run pre-built evals and view the results on the dashboard.
The easiest way to get started is to use one of our Example Notebooks as a starting point.
For more detailed guides, follow the links below to get started running evals using Athina; a short code sketch follows the list.
- Quick Start Guide
- Run an eval
- Run an eval suite
- Customize an eval
- View Results on Athina Dashboard
- Loading Data for Evals
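For instance, here's a minimal sketch of running a single pre-built eval with the SDK, following the pattern in Athina's example notebooks. The evaluator name (DoesResponseAnswerQuery) and the key-setup helpers are taken from those notebooks; confirm the exact API surface against the Quick Start Guide.

```python
# A minimal sketch of running a single pre-built eval with the athina-evals SDK.
# Assumes the DoesResponseAnswerQuery evaluator and key-setup helpers shown in
# Athina's example notebooks; confirm names against the Quick Start Guide.
import os

from athina.evals import DoesResponseAnswerQuery
from athina.keys import AthinaApiKey, OpenAiApiKey

# LLM-graded evals call an LLM under the hood, so an OpenAI key is required.
# The Athina key is what makes results show up on your Athina dashboard.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])
AthinaApiKey.set_key(os.environ["ATHINA_API_KEY"])

# Run one eval against a single query/response pair.
result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)
```

The same evaluator classes can also be run over an entire dataset; see "Loading Data for Evals" above for preparing that data.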
Configure evals to run continuously on Production Traces
If you configure evaluations in the dashboard at https://app.athina.ai/evals/config, they will run automatically against all logged inferences that meet your filters.
Note: Logs may be sampled to ensure that evaluations run within your configured limits. You can adjust these limits on the Settings page.
Note: Continuous evaluation is only available for paid plans. Contact hello@athina.ai to upgrade your plan.
Running evals as guardrails around inference using athina.guard()
This is useful if you want to run evaluations at inference time to prevent bad user queries or bad responses.
Keep in mind that this adds latency to the inference call, so we recommend running only low-latency evaluations with athina.guard().
Follow this example notebook to get started.
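The general shape of a guard is to run a fast eval suite over the incoming query (or the outgoing response) and block the call if anything fails. Below is a hypothetical sketch of that pattern; the PromptInjection evaluator, the guard() keyword arguments, and the call_llm helper are illustrative assumptions, so treat the notebook above as the authoritative reference.

```python
# A hypothetical sketch of guarding inference with athina.guard().
# ASSUMPTIONS: the PromptInjection evaluator, the guard() keyword arguments,
# and call_llm() are illustrative placeholders; see the example notebook
# for the actual API.
import athina
from athina.evals import PromptInjection  # assumed low-latency evaluator


def call_llm(query: str) -> str:
    # Placeholder for your real model call.
    return "..."


def answer(user_query: str) -> str:
    # Run the guard suite on the incoming query before spending an LLM call.
    # athina.guard() is expected to raise when an eval in the suite fails.
    try:
        athina.guard(
            suite=[PromptInjection()],
            text=user_query,
        )
    except Exception:
        # Treat any guard failure as a blocked query.
        return "Sorry, I can't help with that request."

    # Guard passed; proceed with the normal inference call.
    return call_llm(user_query)
```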
Run a single eval manually from the Inference (Trace) page
- Open the inference you want to evaluate, and click the "Run Eval" button (located towards the top-right).
- Choose the evaluation you want to run (Note: function evals cannot be run from the inference page).
- Choose the LLM engine for your evaluation.
Eval results will appear shortly in the Evals tab on the right.