LLaMA 2 — It’s Time to Upgrade your Open-Source LLM

If you’ve been using fine-tuned open-source LLMs (e.g. for generative A.I. functionality or natural-language conversations with your users), it’s very likely time you switch your starting model over to Llama 2.

Here’s why:

• It’s open-source and, unlike the original LLaMA, can be used commercially.

• Like the Alpaca and Vicuna models that used the original LLaMA as their pre-trained starting point, the “Llama 2-chat” variants are fine-tuned for chat applications, using a data set of over one million human annotations (a minimal usage sketch follows this list).

• For both the pre-trained and the chat-fine-tuned variants, the Llama 2 model family comes in four sizes: 7 billion, 13 billion (small enough to fit on a single GPU), 34 billion (not released publicly), and 70 billion parameters (the best performance on natural-language-generation benchmark tasks).

• The 70B chat-fine-tuned variant offers ChatGPT-level performance across a broad range of natural-language benchmarks, making it the first open-source model to do so convincingly, and it is now generally regarded as the leading open-source LLM. You can experience this yourself via the free Hugging Face chat interface, where Llama-2-70B-chat has become the default model.

• See the Llama 2 page for a table of results across 11 external benchmarks, which (according to Meta themselves, so perhaps take it with a grain of salt) shows that the 13B Llama 2 is comparable to the 40B Falcon, the previous top-ranked open-source LLM, across a range of benchmarks. The 70B Llama 2 sets a new state of the art, on some benchmarks by a considerable margin. Note, however, that on tasks involving code or math, Llama 2 is not necessarily the best open-source option out there.

• Time awareness: the chat models can condition their answers on a stated point in time, so a question like “Is the Earth flat or round?” is answered differently in a 2023 context than in an 800 CE one.

• Doubles the context window of the original LLaMA to 4k tokens, a big jump from roughly eight pages of context to sixteen.

• Uses a two-stage RLHF (reinforcement learning from human feedback) approach, rejection-sampling fine-tuning followed by PPO, that is key to its outstanding generative capabilities.

• A new method called “Ghost Attention” (GAtt) allows it to perform especially well in “multi-turn” (ongoing back-and-forth) conversation, in part by keeping the initial system instructions in effect across turns.

• Extensive safety and alignment testing, probably more extensive than for any other open-source LLM: charts in the Llama 2 technical paper (again, self-reported by Meta) show A.I. safety-violation percentages far below those of any other open-source LLM, and even better than ChatGPT’s. The exception is the 34B Llama 2 model, which perhaps explains why it is the only Llama 2 size that Meta didn’t release publicly.

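If you’d like to try one of the chat-fine-tuned variants yourself, here’s a minimal sketch of one way to do so with the Hugging Face transformers library. To be clear, this is an illustrative sketch rather than Meta’s official example: the generation settings and prompt strings are my own assumptions, and you’ll need to accept Meta’s license terms on the Hugging Face Hub before the weights will download.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # the 7B chat variant; small enough for a single GPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision, so the 7B model fits in roughly 14 GB of GPU memory
    device_map="auto",          # requires the accelerate package
)

# Llama 2's chat models were fine-tuned with this [INST] / <<SYS>> prompt template.
# The tokenizer adds the <s> beginning-of-sequence token on its own.
system_prompt = "You are a helpful, concise assistant."
user_message = "In two sentences, how does Llama 2 differ from the original LLaMA?"
prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Swap in the 13B or 70B chat model ID if you have the hardware for it; the prompt format stays the same.
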
Like Hugging Face, at my company Nebula.io we’ve switched to Llama 2 as the starting point for our task-specific fine-tuning and have been blown away.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.

read full post

The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
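
As a back-of-the-envelope illustration of what those laws imply, the often-quoted Chinchilla rule of thumb is roughly 20 training tokens per model parameter. The sketch below uses that approximate constant (an assumption for illustration, not an exact figure from the paper) together with the Llama 2 model sizes discussed above:

# Rough Chinchilla rule of thumb: a compute-optimal LLM should be trained on
# roughly 20 tokens per model parameter (an approximation, not an exact law).
TOKENS_PER_PARAMETER = 20

for params_billions in (7, 13, 70):  # the publicly released Llama 2 sizes
    tokens_trillions = params_billions * TOKENS_PER_PARAMETER / 1000
    print(f"{params_billions}B parameters -> ~{tokens_trillions:.2f}T compute-optimal training tokens")

By this heuristic, a 70B-parameter model calls for on the order of 1.4 trillion training tokens; Llama 2 was in fact trained on roughly 2 trillion tokens, deliberately past the compute-optimal point, which buys extra quality at a fixed model size (and fixed inference cost).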

read full post

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.

read full post