Code Llama

Data Science

Meta’s Llama 2 offered state-of-the-art performance for an “open-source”* LLM… except on tasks involving code. Now Code Llama is here and it magnificently fills that gap by outperforming all other open-source LLMs on coding benchmarks.

Key Takeaways

Code Llama can revolutionize how we code, making workflows faster and more efficient for data scientists and other folks who write software. And, like Llama 2, Code Llama is free for both research and commercial use.


Code Llama is trained on 500 billion tokens of code and code-related data. Thanks to its fill-in-the-middle ("infilling") training, it excels at filling gaps in existing code given the context both before and after the gap, in addition to ordinary left-to-right generation. For instance, if you provide a prompt like "generate a neural network in PyTorch for object detection," Code Llama will produce the necessary code.
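As a rough illustration of how infilling works in practice, a sketch of assembling a fill-in-the-middle prompt is below. The sentinel spellings (`<PRE>`, `<SUF>`, `<MID>`) follow the format published for Code Llama's base models, but they are special tokens, so verify them against the tokenizer you actually load rather than treating this as authoritative:

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt for Code Llama.

    The model is asked to generate the code that belongs *between*
    `prefix` and `suffix`. The sentinel token spellings below match the
    published Code Llama infilling format, but confirm them against
    your tokenizer's special tokens before relying on them.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


# Ask the model to fill in the body of a function whose signature
# and return statement we already have:
prompt = build_infill_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result",
)
```

The model would then generate the missing middle (e.g. `result = a + b`) and stop at its end-of-infill token.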


Code Llama supports a wide range of programming languages, including Python, C++, Java, JavaScript, and Bash.

Model Sizes

Code Llama comes in three sizes: 7B, 13B, and 34B parameters. The largest model (34B) offers the most robust coding support, whereas the smaller models are suitable for real-time applications and can run on a single GPU.

Specialized Versions

At each of the three model sizes, Code Llama also comes in specialized versions:
• “Code Llama — Python” is trained on an extra 100 billion Python tokens.
• “Code Llama — Instruct” is fine-tuned on natural-language instructions.

Which Model is For You?

The model size should be chosen based on your specific needs, balancing performance, cost, and inference time. Specialized versions should be chosen based on your use case, e.g.:
• “Code Llama — Python” is ideal for Python projects like most data science.
• “Code Llama — Instruct” is recommended for natural-language applications.
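To make the size-and-variant choice concrete, here is a small hypothetical helper that maps a choice to a Hugging Face Hub model id. The `codellama/CodeLlama-<size>[-Python|-Instruct]-hf` naming pattern matches the ids used on the Hub at release, but treat it as an assumption and check the Hub before depending on it:

```python
from typing import Optional


def codellama_model_id(size: str = "7b", variant: Optional[str] = None) -> str:
    """Build a Hugging Face Hub id for a Code Llama checkpoint.

    `size` is one of the three released sizes; `variant` is None for the
    base model, or "Python" / "Instruct" for the specialized versions.
    The id pattern is the one used on the Hub at release -- verify it
    still matches before loading.
    """
    assert size in {"7b", "13b", "34b"}, "Code Llama ships in 7B, 13B, 34B"
    assert variant in {None, "Python", "Instruct"}
    middle = f"-{variant}" if variant else ""
    return f"codellama/CodeLlama-{size}{middle}-hf"


# e.g. a data science project that wants instruction following on one GPU:
model_id = codellama_model_id("7b", "Instruct")
```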


On benchmarks Meta tested themselves (grain of salt!):
• Even the smallest (7B) Code Llama model outperforms the 70B Llama 2 on most coding tasks.
• The 34B model outperforms all other open-source LLMs and is comparable to GPT-3.5, but falls short of GPT-4.

Why Use It?

• Code Llama can be used to build tools similar to GitHub Copilot without sharing proprietary code with third parties.
• Fine-tuning Code Llama for specific use cases is also possible, and it can be quite cheap thanks to parameter-efficient fine-tuning (PEFT) techniques such as LoRA.
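A back-of-the-envelope sketch of why LoRA makes fine-tuning cheap: instead of updating a full weight matrix, LoRA freezes it and trains two small low-rank factors. The 4096 dimension below is purely illustrative and not tied to any particular Code Llama configuration:

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when a frozen (d_out x d_in) weight matrix
    is adapted with two low-rank factors, A (d_out x rank) and
    B (rank x d_in). Only A and B are trained; the base weights stay
    frozen, so the trainable count is rank * (d_out + d_in)."""
    return rank * (d_out + d_in)


full = 4096 * 4096                               # one full projection matrix
lora = lora_trainable_params(4096, 4096, rank=8)  # its LoRA adapter
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For this illustrative matrix, the rank-8 adapter trains 65,536 parameters versus 16,777,216 in the full matrix, i.e. under 0.4% of the weights, which is what keeps fine-tuning affordable.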

*Llama 2 (and, by extension, Code Llama) isn’t fully “open-source” because Meta released neither the data nor the code used to train the model. They did, however, make the model weights publicly available (to any organization with fewer than 700 million monthly active users), so the term “open-source” here mainly distinguishes these models from fully proprietary ones.

