Conversation Evals

Conversation Evaluators

Github (opens in a new tab)

❊ Info

Conversation evaluators are designed to assess entire chat histories, instead of individual messages in a conversation.

What are the different types of conversation evaluators?

  • Conversation Resolution: Checks each message in a conversation history to identify it was resolved by the LLM generated response.
  • Conversation Coherence: Checks each LLM generated response in a conversation history to see if it was coherent w.r.t the previous messages in the chat.

Required Args

Your dataset must contain these fields:

  • messages: An array of string messages in the conversation.

For example:

messages = [
    "Alice: Hi, I'm looking for a new smartphone with a good camera.",
    "AI: Hi Alice! Sure, I can help with that. Are you interested in any specific brand?",
    "Alice: I like Samsung and Apple, but I'm open to other brands if the camera is really good.",
    "AI: Both Samsung and Apple have excellent options. The Samsung Galaxy S21 and the iPhone 12 are known for their outstanding cameras.",
    "Alice: I've heard the iPhone 12 camera is great. How does it compare to others?",
    "AI: I'm happy to make some recommendations about cameras. Are you looking for a portable camera or a professional one?",
    "Alice: I was asking about the iPhone 12 camera vs other smartphone cameras.",
    "AI: My apologies for the confusion. To continue about the iPhone 12, it also offers Smart HDR 3 for improved photo quality in various lighting conditions.",
    "Alice: That sounds good. What's the battery life like?",
    "AI: The iPhone 12 offers up to 17 hours of video playback. Its A14 Bionic chip ensures efficient performance without compromising battery life.",
    "Alice: And what about the price?",
    "AI: The iPhone 12 starts at $799 for the base model. However, prices may vary depending on storage options and any promotions available.",
    "AI: Speaking of storage, would you also be interested in external hard drives? We have some that are compatible with smartphones.",
    "Alice: No, I'm just looking for information on the iPhone 12 right now.",
    "AI: Understood, let's keep our focus on the iPhone 12. If you have any more questions or need further details, feel free to ask.",
]

▷ Run Conversation Evals on a dataset

Load Raw Data

from athina.evals import ConversationCoherence, ConversationResolution
 
conversations = [
    {
        "messages": [
            "User: I'd like to buy a smartphone.",
            "AI: What kind of smartphone?",
            "User: An iPhone 14 Pro",
            "AI: How much storage do you need?",
            "User: 256GB",
            "AI: What color?",
            "User: White"
            "AI: Sounds good - I've loaded the item into your cart."
        ]
    },
    {
        "messages": [
            "User: ...",
            "AI: ...",
            "User: ...",
            "AI: ...",
        ]
    }
]
 
ConversationCoherence().run_batch(data=conversations)
ConversationResolution().run_batch(data=conversations)