Observing LLMs in Production to Automatically Catch Issues

Data Science

In this episode of SuperDataScience, hosted by our Chief Data Scientist, Jon Krohn, Amber Roberts and Xander Song provide a technical deep dive into the major challenges (such as drift) that A.I. systems, particularly LLMs, face in production. They also detail solutions, such as open-source ML Observability tools.

Both Amber and Xander work at Arize AI, an ML observability platform that has raised over $60m in venture capital.

Amber:
• Serves as an ML Growth Lead at Arize, where she has also been an ML engineer.
• Prior to Arize, worked as an AI/ML product manager at Splunk and as the head of A.I. at Insight Data Science.
• Holds a Master's in Astrophysics from the Universidad de Chile in South America.

Xander:
• Serves as a developer advocate at Arize, specializing in their open-source projects.
• Prior to Arize, he spent three years as an ML engineer.
• Holds a Bachelor's in Mathematics from UC Santa Barbara as well as a BA in Philosophy from the University of California, Berkeley.

This episode will appeal primarily to technical folks like data scientists and ML engineers, but Amber and Xander made an effort to break down technical concepts so that the episode is accessible to anyone who'd like to understand the major issues A.I. systems can develop once they're in production, as well as how to overcome them.

In the episode, Amber and Xander detail:
• The kinds of drift that can adversely impact a production A.I. system, with a particular focus on the issues that can affect Large Language Models (LLMs); a minimal drift-detection sketch follows this list.
• What ML Observability is and how it builds upon ML Monitoring to automate the discovery and resolution of production A.I. issues.
• Open-source ML Observability options.
• How frequently production models should be retrained.
• How ML Observability relates to discovering model biases against particular demographic groups.
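
To ground the drift discussion, here's a minimal sketch of one widely used drift metric, the Population Stability Index (PSI), implemented in plain NumPy. The bin count and the 0.25 alert threshold are common illustrative conventions, not the defaults of Arize or any other particular observability tool:

```python
# Minimal sketch of drift detection with the Population Stability Index (PSI),
# one common metric behind ML monitoring and observability tools.
# The bin count and alert threshold here are illustrative conventions only.
import numpy as np

def psi(reference: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Score how far a production feature distribution has drifted from a reference."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    ref_counts, _ = np.histogram(reference, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)
    eps = 1e-6  # avoid division by zero and log(0) in empty bins
    ref_pct = np.clip(ref_counts / ref_counts.sum(), eps, None)
    prod_pct = np.clip(prod_counts / prod_counts.sum(), eps, None)
    return float(np.sum((prod_pct - ref_pct) * np.log(prod_pct / ref_pct)))

rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, 10_000)   # feature distribution at training time
production = rng.normal(0.5, 1.2, 10_000)  # same feature, shifted in production
print(f"PSI = {psi(reference, production):.3f}")  # > 0.25 is commonly read as major drift
```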

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.


Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.


The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
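
As a back-of-the-envelope illustration: the Chinchilla paper (Hoffmann et al., 2022) found that compute-optimal training uses roughly 20 tokens per model parameter. The sketch below applies that approximate ratio; the 20:1 constant is a rounded summary of the paper's result, not an exact law:

```python
# Minimal sketch of the Chinchilla compute-optimal ratio:
# roughly 20 training tokens per model parameter (a rounded summary
# of the Hoffmann et al. 2022 result, not an exact law).

TOKENS_PER_PARAMETER = 20  # approximate Chinchilla-optimal ratio

def chinchilla_optimal_tokens(n_parameters: float) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return TOKENS_PER_PARAMETER * n_parameters

for params in (70e9, 7e9, 1e9):  # 70B, 7B, and 1B parameter models
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:>4.0f}B params -> ~{tokens / 1e12:.2f}T tokens")
# Chinchilla itself sits at this optimum: 70B parameters, ~1.4T training tokens.
```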


StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.
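
For readers who want to try one of these models, here's a minimal sketch of loading and sampling from a StableLM checkpoint with the Hugging Face transformers library. The model ID shown is the 3B base alpha checkpoint as published on the Hugging Face Hub at release (an assumption worth verifying), and half precision is used so the model fits on a single GPU:

```python
# Minimal sketch: load a StableLM base model with Hugging Face transformers.
# Assumes the stabilityai/stablelm-base-alpha-3b checkpoint on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/stablelm-base-alpha-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision so the model fits on a single GPU
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("The Chinchilla scaling laws suggest", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```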
