Generative Deep Learning, with David Foster

Data Science

In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, he is joined by bestselling author David Foster who provides a fascinating technical introduction to cutting-edge Generative A.I. concepts including variational autoencoders, diffusion models, contrastive learning, GANs and “world models”.

• Wrote the O’Reilly book “Generative Deep Learning”; the first edition from 2019 was a bestseller while the second edition was released just last week.
• Is a Founding Partner of Applied Data Science Partners, a London-based consultancy specialized in end-to-end data science solutions.
• Holds a Master’s in Mathematics from the University of Cambridge and a Master’s in Management Science and Operational Research from the University of Warwick.

This episode is deep in the weeds on generative deep learning pretty much from beginning to end and so will appeal most to technical practitioners like data scientists and ML engineers.

In the episode, David details:
• How generative modeling is different from the discriminatory modeling that dominated machine learning until just the past few months.
• The range of application areas of generative A.I.
• How autoencoders work and why variational autoencoders are particularly effective for generating content.
• What diffusion models are and how latent diffusion in particular results in photorealistic images and video.
• What contrastive learning is.
• Why “world models” might be the most transformative concept in A.I. today.
• What transformers are, how variants of them power different classes of generative models such as BERT architectures and GPT architectures, and how blending generative adversarial networks with transformers supercharges multi-modal models.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.

read full post

The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.

read full post

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first Language Models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these are small but mighty.

read full post