StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU


In the ever-evolving world of artificial intelligence, Stability AI has once again made headlines with its latest innovation in language models. Best known for its widely popular text-to-image generator Stable Diffusion, the company has now released the first models in its open-source StableLM suite of language models, a significant advancement in the AI domain. Let’s delve into what makes these models groundbreaking and how they are set to revolutionize the field.

The Concept Behind StableLM

StableLM, short for ‘Stable Language Model’, is an extension of Stability AI’s vision to create efficient, scalable, and accessible AI models. In line with other recent single-GPU large language models, StableLM models are designed to be compact enough to be trained on a single large GPU. Remarkably, they can even be quantized for inference on a CPU, highlighting their efficiency.
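To make the single-GPU and quantized-inference claims concrete, here is a minimal sketch of loading a StableLM checkpoint for generation. It assumes the Hugging Face transformers, torch, and accelerate packages and the stabilityai/stablelm-tuned-alpha-7b checkpoint; the prompt and generation settings are illustrative, not an official recipe from Stability AI.

```python
# Minimal sketch: run a StableLM checkpoint on a single GPU (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-tuned-alpha-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Half precision keeps the 7B model within the memory of one large GPU;
# device_map="auto" (via the accelerate package) places the weights on it.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# The tuned checkpoints use a chat-style prompt with <|USER|> / <|ASSISTANT|> markers.
prompt = "<|USER|>What is StableLM?<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For CPU-only inference, the same weights can first be quantized (for example to 8-bit or 4-bit formats) so they fit in ordinary RAM, at some cost in output quality.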

Parallels with Other Models

StableLM naturally draws comparisons with other models in the AI landscape, such as GPT4All-J and Dolly 2.0. All three model families are suitable for commercial use, which opens up a realm of possibilities: businesses and individual developers can fine-tune these models for specific use cases, proprietary data, or unique customer needs.

Pre-training and Fine-Tuning

StableLM models undergo a two-step training process: pre-training and fine-tuning. Pre-training exposes the model to a broad range of natural language, allowing it to grasp the complexity and variety of human communication. Fine-tuning, in contrast, is more focused and specialized: the models are refined with techniques such as supervised instruction tuning and reinforcement learning from human feedback, tailoring their outputs to specific scenarios and enhancing their capabilities, especially in conversational AI. The “StableLM-Alpha” models, available with 3 billion and 7 billion parameters, are fine-tuned using Stanford’s Alpaca procedure on five expansive datasets for conversational agents:
• Stanford’s Alpaca itself
• Nomic-AI’s gpt4all
• RyokoAI’s ShareGPT52K dataset
• Databricks Labs’ Dolly
• Anthropic’s HH
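To give a rough feel for the Alpaca-style supervised fine-tuning step on datasets like these, the sketch below formats a single instruction–response pair with the Alpaca prompt template and computes a standard causal language modeling loss on it. The base model choice, template wording, and single-example loop are illustrative assumptions, not Stability AI’s actual training code.

```python
# Illustrative sketch of Alpaca-style supervised fine-tuning on one example.
# The base model, template, and training details are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-3b"  # hypothetical choice of base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Alpaca-style template pairing an instruction with its target response.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

example = {
    "instruction": "Summarize what StableLM is in one sentence.",
    "response": "StableLM is an open-source family of language models from Stability AI.",
}
text = ALPACA_TEMPLATE.format(**example) + tokenizer.eos_token

# Standard causal-LM objective: the labels are the input tokens themselves
# (the model shifts them internally), so the loss rewards reproducing the response.
batch = tokenizer(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()  # in real fine-tuning, an optimizer step over many batches follows
```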

The Edge of StableLM: Unprecedented Training Data

What sets StableLM apart is the sheer volume of training data used. For instance, the 3-billion-parameter model in the StableLM family was trained on 800 billion tokens drawn from a massive 1.5-trillion-token dataset. This is significantly larger than the datasets used for most single-GPU LLMs and far exceeds the guideline from the Chinchilla Scaling Laws, which suggest a training dataset of roughly 20 tokens per model parameter.
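As a back-of-the-envelope check on those figures, the ~20-tokens-per-parameter Chinchilla heuristic would call for roughly 60 billion training tokens for a 3-billion-parameter model, whereas StableLM’s 800 billion tokens work out to roughly 267 tokens per parameter:

```python
# Back-of-the-envelope comparison of StableLM-Alpha 3B's training budget
# with the ~20 tokens-per-parameter Chinchilla heuristic.
params = 3e9            # 3 billion parameters
tokens_trained = 800e9  # 800 billion training tokens (per the announcement)

chinchilla_tokens = 20 * params             # ~6.0e10 tokens suggested by the heuristic
tokens_per_param = tokens_trained / params  # ~267 tokens per parameter

print(f"Chinchilla-suggested tokens: {chinchilla_tokens:.1e}")
print(f"Actual tokens per parameter: {tokens_per_param:.0f} (vs. ~20)")
```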

The Result: Impressive Model Performance

The result of such extensive training is a model family that not only matches the performance of GPT-3 but also approaches the capabilities of GPT-4 in certain benchmarks. The use of a large, diverse dataset in both pre-training and fine-tuning stages ensures that the StableLM models are robust, versatile, and highly intuitive.

Accessibility and Future Developments

In line with Stability AI’s commitment to openness, the StableLM models are open source and readily available for experimentation. This accessibility is a game-changer, allowing developers and businesses to interact with and modify these models to suit their specific needs. The company also plans to expand the StableLM family, with models ranging from 15 billion to 175 billion parameters in development.

All in all, the new StableLM family could be the perfect starting point for developing your own proprietary LLM, tailored to your own data and needs.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
