Running Evals
There are a few ways to run evals using Athina:
- Run evals using the Python SDK
- Run evals on a dataset using the Athina Platform
- Compare two datasets side by side with evaluation metrics
- Run evals as real-time guardrails using athina.guard()
- Configure evals to run continuously on Production Traces
Running evals programmatically using the Python SDK
Here's a 2-minute video tutorial showcasing how you can quickly run pre-built evals and view the results on the dashboard.
The easiest way to get started is to use one of our Example Notebooks as a starting point.
For more detailed guides, follow the links below to get started running evals using Athina; a short code sketch follows the list.
- Quick Start Guide
- Run an eval
- Run an eval suite
- Customize an eval
- View Results on Athina Dashboard
- Loading Data for Evals
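For instance, here's a minimal sketch of running a single pre-built eval with the SDK, following the pattern in Athina's example notebooks. The evaluator name (DoesResponseAnswerQuery) and the key-setup helpers are taken from those notebooks; confirm the exact API surface against the Quick Start Guide.

```python
# A minimal sketch of running a single pre-built eval with the athina-evals SDK.
# Assumes the DoesResponseAnswerQuery evaluator and key-setup helpers shown in
# Athina's example notebooks; confirm names against the Quick Start Guide.
import os

from athina.evals import DoesResponseAnswerQuery
from athina.keys import AthinaApiKey, OpenAiApiKey

# LLM-graded evals call an LLM under the hood, so an OpenAI key is required.
# The Athina key is what makes results show up on your Athina dashboard.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])
AthinaApiKey.set_key(os.environ["ATHINA_API_KEY"])

# Run one eval against a single query/response pair.
result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)
```

The same evaluator classes can also be run over an entire dataset; see "Loading Data for Evals" above for preparing that data.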
Configure evals to run continuously on Production Traces
If you configure evaluations in the dashboard at https://app.athina.ai/evals/config, they will run automatically against all logged inferences that meet your filters.
Note: Logs may be sampled to ensure that evaluations run within your configured limits. You can adjust these limits on the Settings page.
Note: Continuous evaluation is only available for paid plans. Contact hello@athina.ai to upgrade your plan.
Running evals as guardrails around inference using athina.guard()
This is useful if you want to run evaluations at inference time to prevent bad user queries or bad responses.
Keep in mind that this adds latency to the inference call, so we recommend running only low-latency evaluations with athina.guard().
Follow this example notebook to get started.
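The general shape of a guard is to run a fast eval suite over the incoming query (or the outgoing response) and block the call if anything fails. Below is a hypothetical sketch of that pattern; the PromptInjection evaluator, the guard() keyword arguments, and the call_llm helper are illustrative assumptions, so treat the notebook above as the authoritative reference.

```python
# A hypothetical sketch of guarding inference with athina.guard().
# ASSUMPTIONS: the PromptInjection evaluator, the guard() keyword arguments,
# and call_llm() are illustrative placeholders; see the example notebook
# for the actual API.
import athina
from athina.evals import PromptInjection  # assumed low-latency evaluator


def call_llm(query: str) -> str:
    # Placeholder for your real model call.
    return "..."


def answer(user_query: str) -> str:
    # Run the guard suite on the incoming query before spending an LLM call.
    # athina.guard() is expected to raise when an eval in the suite fails.
    try:
        athina.guard(
            suite=[PromptInjection()],
            text=user_query,
        )
    except Exception:
        # Treat any guard failure as a blocked query.
        return "Sorry, I can't help with that request."

    # Guard passed; proceed with the normal inference call.
    return call_llm(user_query)
```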
Run a single eval manually from the Inference (Trace) page
- Open the inference you want to evaluate, and click the "Run Eval" button (located towards the top-right).
- Choose the evaluation you want to run (Note: function evals cannot be run from the inference page).
- Choose the LLM engine for your evaluation.
Eval results will appear shortly in the Evals tab on the right.