Groundedness

Github (opens in a new tab) | Example Notebook (opens in a new tab)

❊ Info

💡

Note: this evaluator is very similar to Faithfulness but it returns a metric between 0 and 1.

This evaluator checks if the LLM-generated response is grounded in the provided context.

For many RAG apps, you want to constrain the response to the context you are providing it (since you know it to be true). But sometimes, the LLM might use its pretrained knowledge to generate an answer. This is often the cause of "Hallucinations".

How does it work?

For every sentence in the response, an LLM looks for evidence of that sentence in the context.
If it finds evidence, it gives that sentence a score of 1. If it doesn't, it gives it a score of 0.
The final score is the average of all the sentence scores.

View the source code on Github → (opens in a new tab)

Default Engine: gpt-3.5-turbo

Required Args

context: The context that your response should be grounded to
response: The LLM generated response

Metric:

Groundedness: Number of sentences in the response that are grounded in the context divided by the total number of sentences in the response.
- 0: None of the sentences in the response are grounded in the context
- 1: All of the sentences in the response are grounded in the context

Example

Context: Y Combinator was founded in March 2005 by Paul Graham and Jessica Livingston as a way to fund startups in batches. YC invests $500,000 in 200 startups twice a year.
Response: YC was founded by Paul Graham and Jessica Livingston. They invests $500k in 200 startups twice a year. In exchange, they take 7% equity.

🚫

Eval Result

Result: Fail
Score: 0.67
Explanation: There is no evidence of the following sentence in the context:
- "In exchange, they take 7% equity"

On Athina platform, in the Develop view, sentences that are not grounded in the context are highlighted in red.

▷ Run the eval on a dataset

Load your data with the RagLoader

from athina.loaders import RagLoader
 
# Load the data from CSV, JSON, Athina or Dictionary
dataset = RagLoader().load_json(json_file)

Run the evaluator on your dataset

from athina.evals import Groundedness
 
Groundedness().run_batch(data=dataset)

▷ Run the eval on a single datapoint

Groundedness().run(
    context=context,
    response=response
)

Conversation Evals Custom Evals