
Leveraging RAGAS to Evaluate a Robust Medical ChatBot with Retrieval-Augmented Generation (RAG)

Abhik Seal
11 min read · Sep 10, 2024

In today’s fast-paced healthcare environment, using artificial intelligence (AI) to assist medical decision-making has become increasingly important. This blog post explores a Medical ChatBot designed to evaluate and visualize various performance metrics of medical AI models, providing insight into model accuracy, relevance, and overall effectiveness. We’ll take a deep dive into how the ChatBot achieves these goals through the RAGAS evaluation tools, which have been instrumental in providing a comprehensive, multi-faceted view of our chatbot’s performance. They highlighted our results’ strengths in faithfulness and context precision, while also pointing out areas for improvement, particularly in answer relevancy and context recall.
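To make the four metrics named above more concrete, here is a minimal sketch of the kind of input such an evaluation consumes. The column names (`question`, `answer`, `contexts`, `ground_truth`) follow the convention used by RAGAS 0.1-style releases and are an assumption here, since newer releases rename them. The helper `build_eval_records` and the sample row are hypothetical placeholders, not data or code from the chatbot described in this post.

```python
from typing import Dict, List


def build_eval_records(samples: List[dict]) -> Dict[str, list]:
    """Turn per-question samples into a column-oriented dict.

    Assumed column names (RAGAS 0.1-style convention, may differ by version):
    question, answer, contexts, ground_truth.
    """
    columns: Dict[str, list] = {
        "question": [],
        "answer": [],
        "contexts": [],
        "ground_truth": [],
    }
    for sample in samples:
        columns["question"].append(sample["question"])
        columns["answer"].append(sample["answer"])
        # Each question carries a list of retrieved passages.
        columns["contexts"].append(list(sample["contexts"]))
        columns["ground_truth"].append(sample["ground_truth"])
    return columns


# Hypothetical placeholder row, for illustration only.
samples = [
    {
        "question": "Is drug X associated with outcome Y?",
        "answer": "yes",
        "contexts": ["Placeholder abstract text retrieved for this question."],
        "ground_truth": "yes",
    }
]

records = build_eval_records(samples)
print(sorted(records.keys()))

# With ragas installed and an LLM backend configured, these columns would
# typically be wrapped with datasets.Dataset.from_dict(records) and passed
# to ragas.evaluate together with the chosen metrics. That step is left
# out so the sketch runs offline.
```

Roughly speaking, faithfulness compares the answer against the retrieved contexts, while context recall compares the contexts against the ground truth, which is why each record needs all four fields.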

For this post, we evaluated a medical question-answering (QA) model using the PubMed QA dataset. PubMed QA is composed of medical questions and corresponding answers, usually extracted from biomedical literature. This type of dataset is commonly used to train and evaluate models designed to answer medical or biomedical questions. We leverage Pinecone, a vector database service that helps manage and query large sets of vector embeddings efficiently. In order to use it one needs to…
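As a rough illustration of what querying a set of vector embeddings involves, the sketch below ranks a few toy vectors by cosine similarity to a query vector. It is a brute-force, in-memory stand-in and not the Pinecone client: a hosted vector database exposes this through its own API and index structures. The names `cosine_similarity`, `top_k`, and `toy_index`, the three-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by hypothetical document ids.
toy_index = {
    "abstract-1": [1.0, 0.0, 0.0],
    "abstract-2": [0.0, 1.0, 0.0],
    "abstract-3": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in results])  # ['abstract-1', 'abstract-3']
```

In a RAG pipeline, the passages behind the top-ranked ids are what get handed to the language model as context.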

Written by Abhik Seal

Data Science / Cheminformatician, x-AbbVie. I try to make complicated things look easier and understandable. www.linkedin.com/in/abseal/
