INSIGHTS

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on "Getting Value from A.I." to open the second day of Hg Capital's "Digital Forum" in London.

The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio of training data to model size and the LLMs that have arisen from it.
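To make the ratio concrete, here is a back-of-envelope sketch. It assumes two widely cited rules of thumb associated with the Chinchilla work: a compute-optimal model sees roughly 20 training tokens per parameter, and training compute is approximately C ≈ 6·N·D FLOPs for N parameters and D tokens. Both constants are heuristics rather than exact fitted values, and the function names are hypothetical.

```python
# Back-of-envelope Chinchilla arithmetic under two assumed rules of thumb:
#   compute-optimal training tokens D ≈ 20 * N (N = parameter count)
#   training compute C ≈ 6 * N * D FLOPs
TOKENS_PER_PARAM = 20  # approximate compute-optimal tokens per parameter


def optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a model size."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute using the common C ≈ 6·N·D estimate."""
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    for n in (7e9, 70e9):
        d = optimal_tokens(n)
        print(f"{n / 1e9:.0f}B params -> ~{d / 1e12:.2f}T tokens, "
              f"~{training_flops(n, d):.2e} training FLOPs")
```

For a 70-billion-parameter model, this heuristic suggests roughly 1.4 trillion training tokens, the scale at which DeepMind's 70B-parameter Chinchilla model was itself trained.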

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released "StableLM", their first language models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these models are small but mighty.

The A.I. and Machine Learning Landscape, with Investor George Mathew

Open-source “ChatGPT”: Alpaca, Vicuña, GPT4All-J, and Dolly 2.0

Want a GPT-4-style model on your own hardware and fine-tuned to your proprietary language-generation tasks?

Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)

Large Language Models (LLMs) are capable of extraordinary NLP feats, but they are so large that fully fine-tuning them is too expensive for most organizations. The solution is Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA).
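The core LoRA idea fits in a few lines. Assuming the standard formulation, a pretrained weight matrix W (d × k) is frozen and only a low-rank update B·A is trained, where B is d × r, A is r × k and the rank r is much smaller than d and k; the adapted layer computes W·x + (α/r)·B·A·x. In the pure-Python sketch below, `LoRALinear`, `full_params` and `lora_params` are hypothetical names for illustration, and B starts at zero so the adapted layer initially behaves exactly like the pretrained one. In practice you would reach for a library such as Hugging Face's PEFT rather than hand-rolled code.

```python
import random


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (illustrative only)."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.W = W                       # frozen pretrained weights, d x k
        d, k = len(W), len(W[0])
        self.r, self.alpha = r, alpha
        # A: r x k with small random values; B: d x r of zeros, so B @ A starts at 0
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]

    def forward(self, x):
        base = matvec(self.W, x)                    # W x (frozen path)
        update = matvec(self.B, matvec(self.A, x))  # B (A x) (trainable path)
        scale = self.alpha / self.r
        return [b + scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A (r x k) and B (d x r) would be updated during fine-tuning
        return self.r * (len(self.W) + len(self.W[0]))


def full_params(d, k):
    """Trainable parameters when fine-tuning the whole d x k matrix."""
    return d * k


def lora_params(d, k, r):
    """Trainable parameters for a rank-r LoRA update of a d x k matrix."""
    return r * (d + k)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
    print(layer.forward([1.0, 1.0]))  # same as W @ x at initialization
    print(full_params(4096, 4096), lora_params(4096, 4096, 8))
```

With d = k = 4096 and r = 8, the low-rank update has 65,536 trainable parameters versus 16,777,216 for the full matrix, roughly 0.4% as many, which is where LoRA's savings come from.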

Llama: GPT-3 Performance, 10X Smaller

By training (relatively) small LLMs for (much) longer, Meta AI's LLaMA architectures achieve GPT-3-like outputs at as little as a thirteenth of GPT-3's size. This means lower costs and much faster execution.

Harnessing GPT-4 for Your Commercial Advantage

How you can leverage GPT-4 to your commercial benefit.

How to Be Both Socially Impactful and Financially Successful in Your Data Career

This episode will appeal most to technical listeners who are keen to be outstanding data scientists or software engineers, particularly through engineering scalable ML products.

GPT-4 Has Arrived

In this first episode of Jon's GPT-4 trilogy, he introduces GPT-4's staggering capabilities in ten minutes.

Astonishing Cicero Negotiates and Builds Trust With Humans Using Natural Language

Meta AI's CICERO algorithm, which negotiates and builds trust with humans to perform in the top decile at the game of Diplomacy, is (in our view) the most astounding A.I. feat yet.

Designing Machine Learning Systems with Chip Huyen

Mega-bestselling author of the "Designing ML Systems" book, Chip Huyen, joined our Chief Data Scientist, Jon Krohn, to cover her top tips on designing ML systems!

Digital Analytics with Avinash Kaushik

In this interview, Avinash Kaushik masterfully describes how A.I. is transforming analytics and how you can capitalize to deliver joy to your customers.

Automating Industrial Machines with Data Science and the Internet of Things (IoT)

Eliminating Hiring Bias From The Recruitment Process Through Machine Learning

With the ever-accelerating deluge of data — the amount of it stored on our planet doubles every 18 months — the capacity for machine learning to facilitate social benefits, such as increased job satisfaction, more extensive leisure time, and medical breakthroughs, accelerates as well.

XGBoost: The Ultimate Classifier, with Matt Harrison

XGBoost is typically the most powerful ML option whenever you're working with structured data. In this SuperDataScience episode, our Chief Data Scientist, Jon Krohn, talks with world-leading XGBoost expert Matt Harrison about how it works and how to make the most of it.

Business Intelligence Tools, with Mico Yuk

In this episode of SuperDataScience, join our Chief Data Scientist, Jon Krohn, alongside Mico Yuk, who pulls absolutely no punches in her assessment of, well, anything! ...but particularly of vendors in the business intelligence and data analytics space.

Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller

In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, hosts the wildly intelligent Dr. Matar Haller, who introduces Contextual A.I. (which considers adjacent, often multimodal, information when making inferences) as well as how to use ML to build a moat around your company.

Get More Language Context out of your LLM

The "context window" limits the number of tokens (words or pieces of words) that can be input to (or output by) a given Large Language Model. Today's episode introduces FlashAttention, a trick that allows for much larger context windows.
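FlashAttention computes exact attention, but it processes keys and values in blocks so that the full attention matrix never has to be stored in GPU memory, which is what makes longer contexts affordable. A key ingredient is the "online softmax": a running maximum and running normalizer let each block be folded into the output incrementally. The pure-Python sketch below shows only that ingredient, for a single query vector, and checks it against ordinary attention. The real method is a fused GPU kernel that also tiles queries and manages on-chip memory; the function names here are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q·k / sqrt(d))-weighted sum of values for one query."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, streaming over key/value blocks with an online softmax."""
    d = len(q)
    dim_v = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim_v  # unnormalized weighted sum of values seen so far
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier contributions so they are relative to the new maximum
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
    values = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [2.0, 2.0], [-1.0, 0.0]]
    print(naive_attention(q, keys, values))
    print(blockwise_attention(q, keys, values, block_size=2))
```

Only one block of scores exists at a time, so memory for the scores grows with the block size rather than with the full sequence length.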

Open-Source “Responsible A.I.” Tools, with Ruth Yakubu

Ruth Yakubu details what Responsible A.I. is and open-source options for ensuring we deploy A.I. models — particularly the Generative variety that are rapidly transforming industries — responsibly.

Generative Deep Learning, with David Foster

Bestselling author David Foster provides a fascinating technical introduction to cutting-edge Generative A.I. concepts including variational autoencoders, diffusion models, contrastive learning, GANs and "world models".

Six Reasons Why Building LLM Products Is Tricky

Six big challenges when bringing LLMs to your users.

Observing LLMs in Production to Automatically Catch Issues

In this SuperDataScience episode, our Chief Data Scientist, Jon Krohn, is joined by Amber Roberts and Xander Song, who provide a technical deep dive into the major challenges (such as drift) that A.I. systems (particularly LLMs) face in production.

Accelerators: Hardware Specialized for Deep Learning

This SuperDataScience episode, hosted by our Chief Data Scientist, Jon Krohn, with guest Ron Diamant, is dedicated to the hardware, such as GPUs, TPUs and AWS's Trainium and Inferentia chips, that we use to train and run A.I. models (particularly LLMs).

Lossless LLM Weight Compression: Run Huge Models on a Single GPU

To approach the capabilities of the top commercial LLMs like OpenAI’s GPT-4 and Anthropic’s Claude on a broad range of tasks, you may need a much larger open-source LLM. So, wouldn’t it be nice if you could compress these larger LLMs enough to fit them on a single consumer GPU? Such compression would enable you to:
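Whether a model fits on one GPU is largely arithmetic: the memory for the weights is roughly the parameter count times the bits stored per parameter, divided by 8. The sketch below assumes only that back-of-envelope formula. It ignores activations, the KV cache and any per-group quantization metadata, and it does not reflect the specific compression method discussed in the post; `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 70e9  # e.g., a 70-billion-parameter open-source LLM
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(n, bits):.0f} GB")
```

At 16 bits, a 70B-parameter model's weights alone take about 140 GB; at 4 bits, about 35 GB. Each halving of bits per parameter halves the footprint, which is why compression is the lever for squeezing larger models onto smaller hardware.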

CatBoost: Powerful, efficient ML for large tabular datasets

CatBoost is making waves in open-source ML as it's often the top approach for tasks as diverse as classification, regression, ranking, and recommendation. This is especially so when working with tabular data that include categorical variables.

NLP with Transformers, feat. Hugging Face’s Lewis Tunstall

Lewis Tunstall — brilliant author of the bestseller "NLP with Transformers" and an ML Engineer at Hugging Face — details how to train and deploy your own LLMs, the race for an open-source ChatGPT, and why RLHF leads to better models.

The (Short) Path to Artificial General Intelligence, with Dr. Ben Goertzel

The luminary Dr. Ben Goertzel details how we could realize Artificial General Intelligence (AGI) in 3-7 years, why he's optimistic about the Artificial Super Intelligence (ASI) this would trigger, and what post-Singularity society could be like.

How Firms Can Actually Adopt A.I., with Rehgan Avon

Rehgan Avon's DataConnect conference is this week and is getting rave reviews. In this SuperDataScience episode, hosted by our Chief Data Scientist, Jon Krohn, the silver-tongued entrepreneur details how organizations can successfully adopt A.I.

Jon Krohn on Last Week in AI Episode #138

The only podcast our Chief Data Scientist, Jon Krohn, listens to, "Last Week in A.I.", had him on as co-host of episode #138. They had a lot of laughs recapping the big A.I. news, including the release of DALL-E 3 and reports that Google's forthcoming Gemini model crushes GPT-4.

Jon Krohn on Last Week in AI Episode #130

The only podcast our Chief Data Scientist, Jon Krohn, listens to is "Last Week in A.I.". Tune in to hear him co-host episode #130. Every episode surveys the previous week's biggest A.I. news; the biggest story in this one was (of course) LLaMA 2.

Generative A.I. without the Privacy Risks (with Prof. Raluca Ada Popa)

Consumers and enterprises dread that Generative A.I. tools like ChatGPT breach privacy by using convos as training data, storing PII and potentially surfacing confidential data as responses. Prof. Raluca Ada Popa has all the solutions.

LLaMA 2 — It’s Time to Upgrade your Open-Source LLM

If you've been using fine-tuned open-source LLMs (e.g. for generative A.I. functionality or natural-language conversations with your users), it's very likely time you switch your starting model over to Llama 2. Here's why:

How Data Happened: A History, with Columbia Prof. Chris Wiggins

Chris Wiggins — Chief Data Scientist at The New York Times and faculty at Columbia University — talks to our Chief Data Scientist, Jon Krohn, and provides an enthralling, witty and rich History of Data Science. 

Jon’s “Generative A.I. with LLMs” Hands-on Training

Jon Krohn introduces his two-hour "Generative A.I. with LLMs" training, which is packed with hands-on Python demos in Colab notebooks. It details open-source LLM options (Hugging Face; PyTorch Lightning) and commercial ones (OpenAI API).

Large Language Model Leaderboards and Benchmarks

Llamas, Alpacas, Koalas, Falcons... there is a veritable zoo of LLMs out there! In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, Caterina Constantinescu breaks down the LLM Leaderboards and evaluation benchmarks to help you pick the right LLM for your use case.

Vicuña, Gorilla, Chatbot Arena and Socially Beneficial LLMs, with Prof. Joey Gonzalez

Vicuna, Gorilla and the Chatbot Arena are all critical elements of the new open-source LLM ecosystem — the extremely knowledgeable and innovative Prof. Joseph Gonzalez is behind all of them. Get the details in this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn.

ChatGPT Code Interpreter: 5 Hacks for Data Scientists

The ChatGPT Code Interpreter is surreal: It creates and executes Python code for whatever task you describe, debugs its own runtime errors, displays charts, does file uploads/downloads, and suggests sensible next steps all along the way.

Big A.I. R&D Risks Reap Big Societal Rewards, with Meta’s Dr. Laurens van der Maaten

By making big research bets, the prolific Meta Senior Research Director Dr. Laurens van der Maaten has devised or supported countless world-changing machine-learning innovations across healthcare, climate change, privacy and more.

Code Llama

Meta's Llama 2 offered state-of-the-art performance for an "open-source"* LLM... except on tasks involving code. Now Code Llama is here and it magnificently fills that gap by outperforming all other open-source LLMs on coding benchmarks.

LangChain: Create LLM Applications Easily in Python

Kris Ograbek talks us through how to use LangChain to chat with previous episodes of SuperDataScience! 

Jon Krohn on Resilient Recruiter Podcast #198

Our Chief Data Scientist, Jon Krohn, appears as a guest on The Resilient Recruiter Podcast, an excellent program that's in the top 1.5% of podcasts globally. In the episode, Jon discusses how A.I. will transform all work over the coming years, with a particular focus on Talent Acquisition.

Jon Krohn on Recruiting Future Podcast #547

Our Chief Data Scientist, Jon Krohn, was a guest on The Recruiting Future Podcast. In it, he introduces the key Generative A.I.-related terms (e.g., Deep Learning, Transformers) and uses examples from Nebula to illustrate how Gen A.I. is overhauling industries.

Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta’s Dr. Thomas Scialom

Thomas Scialom, PhD, is behind many of the most popular Generative A.I. projects, including Llama 2, the world's top open-source LLM. In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the Meta A.I. researcher reveals the stories behind Llama 2 and what's in the works for Llama 3.

Make Better Decisions with Data, with Dr. Allen Downey

Many-time bestselling author Allen Downey talks about making better decisions with data, including how to prepare for Black Swan events and how your core beliefs will shift over your life.

Jon Krohn on Ken’s Nearest Neighbors Podcast #168

Our Chief Data Scientist, Jon Krohn, joins Ken Jee in this episode of the "Ken's Nearest Neighbors" podcast — a popular weekly show that features data professionals digging into their backgrounds, motivations and philosophies.

Computational Mathematics and Fluid Dynamics, with Prof. Margot Gerritsen

In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the extremely intelligent and super delightful Prof. Margot Gerritsen returns to the show to introduce what Computational Mathematics is, detail countless real-world applications of it, and relate it to the field of data science.

Quantum Machine Learning, with Dr. Amira Abbas

Brilliant, eloquent Dr. Amira Abbas introduces us to Quantum Machine Learning in this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn. She details the key concepts (like qubits), what's possible today (Quantum SVMs) and what the future holds (e.g., Quantum Neural Networks).

Seven Factors for Successful Data Leadership

In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, is joined by the jovial EIGHT-time book author Ben Jones, who covers the seven factors of successful data leadership, factors he has gleaned from administering his data literacy assessment to thousands of professionals.

Unmasking A.I. Injustice, with Dr. Joy Buolamwini

In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, is joined by the inimitable Dr. Joy Buolamwini, who reveals how she uncovered staggering racial and gender biases in widely used Amazon, Microsoft and IBM algorithms, the firms' varying (sometimes shocking) responses, and how to address these A.I. issues.

How GitHub Operationalizes AI for Teamwide Collaboration and Productivity, with GitHub COO Kyle Daigle

In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, the exceptionally passionate GitHub COO Kyle Daigle details how generative A.I. tools improve not only the way individuals work, but also dramatically transform the way people across entire firms collaborate.

OpenAssistant: The Open-Source ChatGPT Alternative, with Dr. Yannic Kilcher

In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, Yannic Kilcher (famed Machine Learning YouTuber and creator of OpenAssistant, the best-known open-source conversational A.I.) shares where the biggest A.I. opportunities are in the coming years.

A.I. Product Management, with Google DeepMind’s Head of Product, Mehdi Ghissassi

The elite team at Google DeepMind cranks out one world-changing A.I. innovation after another. In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, their affable Head of Product Mehdi Ghissassi shares his wisdom on how to design and release successful A.I. products.

Scikit-learn’s Past, Present and Future, with scikit-learn co-founder Dr. Gaël Varoquaux

In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, traveled to Paris to interview Dr. Gaël Varoquaux, co-founder of scikit-learn, the standard library for machine learning worldwide (downloaded over 1.4 million times PER DAY).

Q*: OpenAI’s Rumored AGI Breakthrough

This SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, is all about a rumored new model out of OpenAI called Q* (pronounced “Q star”) that has been causing quite a stir, both for its purported role in Altmangate and its implications for Artificial General Intelligence (AGI).

How to Integrate Generative A.I. Into Your Business, with Piotr Grudzień

Want to integrate Conversational A.I. ("chatbots") into your business and ensure it's a (profitable!) success?

2024 Data Science Trend Predictions

What are the big A.I. trends going to be in 2024?

Jon Krohn at AI4Talent

On January 24th, our Chief Data Scientist, Jon Krohn, will be on a panel at the "A.I. & The Future of Work" virtual conference discussing the ethical issues around the "double-edged sword" of A.I. in employment.

Technical Intro to Transformers and LLMs, with Kirill Eremenko

For this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the indefatigable SuperDataScience Founder Kirill Eremenko gives a detailed technical intro to Transformers and how they're scaled up to allow Large Language Models like GPT-4, Llama 2 and Gemini to have their mind-blowing abilities.
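At the heart of decoder-only Transformers such as the GPT series is scaled dot-product attention, softmax(QKᵀ/√d_k)·V, with a causal mask so that each token attends only to itself and earlier tokens. The minimal single-head, pure-Python sketch below assumes Q, K and V have already been produced by learned projections (omitted here) and leaves out multi-head splitting, positional information and the feed-forward layers; the function names are illustrative rather than any library's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K and V are lists of vectors, one per token position.
    Token i attends only to positions 0..i.
    """
    d_k = len(K[0])
    dim_v = len(V[0])
    outputs = []
    for i, q in enumerate(Q):
        # Causal mask: keys after position i are simply never scored
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        outputs.append([sum(w * V[j][c] for j, w in enumerate(weights))
                        for c in range(dim_v)])
    return outputs


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for pos, out in enumerate(causal_self_attention(Q, K, V)):
        print(pos, out)
```

The first token can only see itself, so its output is exactly its own value vector; later tokens blend the values of everything up to and including their own position.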

Jon Krohn on Tech Recruiter Podcast S03-Ep19

Data Science for Clean Energy, with Emily Pastewka

How can data science and machine learning power the transition toward a sustainable global economy? The ML leader (and exceptional communicator of technical concepts!) Emily Pastewka joins our Chief Data Scientist, Jon Krohn, on SuperDataScience to fill us in on Green Data Science.

A Simulation of Ed Donner: Fine-tuning an LLM on 240k Text Messages

One of our Nebula co-founders, Ed Donner, has carried out a fascinating project: he fine-tuned an open-source LLM on his text-message history, simulating, eerily effectively, not only himself but anyone who's messaged him more than 1,000 times.

How A.I. is Transforming Science

A.I. is not just a tool, but a driving force in reshaping the landscape of science.

How to Speak so You Blow Listeners’ Minds, with Cole Nussbaumer Knaflic

Cole Nussbaumer Knaflic's book, "storytelling with data", has sold over 500k copies... wild!

Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko

In February 2024, Kirill Eremenko was on the SuperDataScience Podcast hosted by our Chief Data Scientist, Jon Krohn, to detail Decoder-Only Transformers (like the GPT series). It was Jon's most popular episode ever, so Kirill came right back to detail an even more sophisticated architecture: Encoder-Decoder Transformers.

The Best A.I. Startup Opportunities, with venture capitalist Rudina Seseri

How should an A.I. startup find product-market fit? How do some A.I. startups become spectacularly successful?