Loading data for Evals

For this example, we will consider the RagLoader. Other loaders also work similarly.

You can load data for evals from a CSV, JSON, Python Dictionary, or directly from your logged inferences on Athina.

Loading from a JSON file

from athina.loaders import RagLoader
 
dataset = RagLoader().load_json(json_filename)

That's all you need to do to load your data!

To view the imported dataset as a pandas DataFrame:

pd.DataFrame(dataset)

Loading from a CSV file

from athina.loaders import RagLoader
 
dataset = RagLoader().load_csv(csv_filename)

Loading from a Python Dictionary

from athina.loaders import RagLoader
 
# Create batch dataset from list of dict objects
raw_data = [
    {
        "query": "What is the capital of Greece?",
        "context": "Greece is often called the cradle of Western civilization.",
        "response": "Athens",
    },
    {
        "query": "What is the price of a Tesla Model 3?",
        "context": "Tesla Model 3 is a fully electric car.",
        "response": "I cannot answer this question as prices vary from country to country.",
    },
    {
        "query": "What is a shooting star?",
        "context": "Black holes are stars that have collapsed under their own gravity. They are so dense that nothing can escape their gravitational pull, not even light.",
        "response": "A shooting star is a meteor that burns up in the atmosphere.",
    }
]
 
dataset = RagLoader().load_dict(raw_data)

Loading logged inferences from Athina

Instead of generating inferences, you can just load inferences that you have already logged to Athina.

# Load last 50 logged inferences
dataset = RagLoader().load_athina_inferences()

You can optionally apply filters to load a different subset of inferences from Athina

# Load 100 logged inferences matching these filters
filters = AthinaFilters(
    environment="production",
    prompt_slug="refund_prompt_5.4",
    language_model_id="gpt-4",
    customer_id="nike-usa",
    topic="refunds",
)
dataset = RagLoader().load_athina_inferences(filters=filters, limit=100)

Output Format

The output format will be different for different Loaders.

The RagLoader will return a List[RagDataPoint] type after you call the load function of choice.

class RagDataPoint(TypedDict):
    query: str
    context: str
    response: str

Bring your own eval Running Evals in CI/CD