GPT-4 Has Arrived


Nebula’s Chief Data Scientist, Jon Krohn, talks about OpenAI’s release of GPT-4 in SuperDataScience episode #666, an appropriate number for an algorithm that has folks (quixotically) signing a letter to pause all A.I. development. This is the first episode in Jon’s GPT-4 trilogy; in ten minutes, he introduces GPT-4’s staggering capabilities.

A Leap in AI Safety and Accuracy

GPT-4 marks a significant advance over its predecessor, GPT-3.5, in both safety and factual accuracy. It is reportedly 82% less likely to respond to requests for disallowed content and 40% more likely to produce factually correct responses. Despite these improvements, challenges like sociodemographic biases and hallucinations persist, although they are considerably reduced.

Academic and Professional Exam Performance

The prowess of GPT-4 becomes evident when revisiting queries initially tested on GPT-3.5. Its ability to summarize complex academic content accurately and its human-like response quality are striking. In one test, GPT-4’s output was mistaken for human writing by GPTZero, an AI detection tool, underscoring its sophistication. On the Uniform Bar Exam, GPT-4 scored in the 90th percentile, a massive leap from GPT-3.5’s 10th percentile.


GPT-4 introduces multimodality, handling both language and visual inputs. This capability allows for innovative interactions, like recipe suggestions based on fridge contents or transforming drawings into functional websites. This visual aptitude notably boosted its performance in exams like the Biology Olympiad, where GPT-4 scored in the 99th percentile.

The model also demonstrates proficiency in numerous languages, including low-resource ones, outperforming other major models in most languages tested. This linguistic versatility extends to translating between these languages.

The Secret Behind GPT-4’s Success

While OpenAI has not disclosed the number of model parameters in GPT-4, it is speculated to significantly exceed GPT-3’s 175 billion. This increase, together with more and better-curated training data and the ability to handle vastly more context (up to 32,000 tokens), likely contributes to GPT-4’s enhanced performance.

Reinforcement Learning from Human Feedback (RLHF)

GPT-4 incorporates RLHF, a method that refines the model’s output based on feedback from human raters, allowing it to align more closely with desired responses. This approach had already proven effective in earlier models like InstructGPT.
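To make the idea concrete, here is a minimal sketch of the pairwise preference loss commonly used to train the reward model in RLHF, as described for InstructGPT. OpenAI has not published GPT-4’s training code, so this is an illustration rather than their implementation; the function name `preference_loss` and the example scores are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the response a human
    preferred above the response the human rejected, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward-model scores for two responses to the same prompt.
agrees_with_rater = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_rater = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_rater < disagrees_with_rater)
```

In the full pipeline, a reward model trained with a loss of this kind then scores the language model’s outputs, and a reinforcement learning step nudges the language model toward higher-scoring responses.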

GPT-4 represents a monumental step in AI development, balancing unprecedented capabilities with improved safety measures. Its impact is far-reaching, offering new possibilities in various fields and highlighting the importance of responsible AI development and use. As we continue to explore its potential, the conversation around AI safety and ethics becomes increasingly vital.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at

