Large Language Model Leaderboards and Benchmarks


Llamas, Alpacas, Koalas, Falcons… there is a veritable zoo of LLMs out there! In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, Caterina Constantinescu breaks down the LLM Leaderboards and evaluation benchmarks to help you pick the right LLM for your use case.

Caterina:
• Is a Principal Data Consultant at GlobalLogic, a full-lifecycle software development services provider with over 25,000 employees worldwide.
• Previously, she worked as a data scientist for financial services and marketing firms.
• Is a key player in data science conferences and Meetups in Scotland.
• Holds a PhD from The University of Edinburgh.

In this episode, Caterina details:
• The best leaderboards (e.g., HELM, Chatbot Arena and the Hugging Face Open LLM Leaderboard) for comparing the quality of both open-source and proprietary Large Language Models (LLMs).
• The advantages and drawbacks of LLM evaluation benchmarks (e.g., evaluation-dataset contamination is a big issue because the top-performing LLMs are often trained on all the publicly available data they can find… including benchmark-evaluation datasets).

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.


The Chinchilla Scaling Laws

The Chinchilla Scaling Laws describe how much training data is needed to train a Large Language Model (LLM) of a given size compute-optimally. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.

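To make the Chinchilla ratio concrete, here is a minimal back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of roughly 20 training tokens per model parameter; the exact ratio depends on the fitted scaling law, so treat the constant `TOKENS_PER_PARAM` and the helper `chinchilla_optimal_tokens` as illustrative, not as an official implementation.

```python
# Rule of thumb (an approximation, not an exact law): compute-optimal
# training uses roughly 20 tokens per model parameter.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: float,
                              tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate training-token budget for a model with n_params parameters."""
    return n_params * tokens_per_param


if __name__ == "__main__":
    # Illustrative model sizes: 1B, 7B and 70B parameters.
    for n in (1e9, 7e9, 70e9):
        tokens = chinchilla_optimal_tokens(n)
        print(f"{n / 1e9:>4.0f}B params -> ~{tokens / 1e12:.2f}T training tokens")
```

For a 70-billion-parameter model, the rule of thumb gives about 1.4 trillion tokens, which matches the size and training budget of the Chinchilla model itself. Models trained on far fewer tokens than this ratio suggests are often described as "undertrained".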

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.

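Whether an LLM "fits on one GPU" mostly comes down to parameter count times bytes per parameter. The sketch below is a rough estimate of weight memory only. The `overhead` factor of 1.2 (for activations, KV cache and framework buffers) is an assumption, as are the 7-billion-parameter model size and the 24 GB GPU in the example; none of these figures come from the StableLM announcement.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Memory (in GB, 1 GB = 1e9 bytes) needed just to hold the model weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(n_params: float, gpu_memory_gb: float,
                dtype: str = "fp16", overhead: float = 1.2) -> bool:
    """Crude check: weights plus an assumed overhead factor must fit in GPU memory."""
    return weight_memory_gb(n_params, dtype) * overhead <= gpu_memory_gb


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7B-parameter model
    for dtype in ("fp32", "fp16", "int8"):
        gb = weight_memory_gb(n_params, dtype)
        print(f"{dtype}: {gb:.1f} GB of weights, fits on a 24 GB GPU: "
              f"{fits_on_gpu(n_params, 24, dtype)}")
```

Halving the precision halves the weight memory, which is why half-precision or quantized weights are usually what make single-GPU inference practical for models of this size.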