INSIGHTS
Getting Value From A.I.
In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on "Getting Value from A.I." to open the second day of Hg Capital's "Digital Forum" in London.
Read full post
The Chinchilla Scaling Laws
The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.
Read full post
StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU
The folks who open-sourced Stable Diffusion have now released "StableLM", their first Language Models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these are small but mighty.
Read full post
The A.I. and Machine Learning Landscape, with Investor George Mathew
The A.I. and Machine Learning Landscape, with Investor George Mathew.
Read full post
Open-source “ChatGPT”: Alpaca, Vicuña, GPT4All-J, and Dolly 2.0
Want a GPT-4-style model on your own hardware and fine-tuned to your proprietary language-generation tasks?
Read full post
Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
Large Language Models (LLMs) are capable of extraordinary NLP feats, but are so large that they're too expensive for most organizations to train. The solution is Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA).
Read full post
Llama: GPT-3 Performance, 10X Smaller
By training (relatively) small LLMs for (much) longer, Meta AI's LLaMA architectures achieve GPT-3-like outputs at as little as a thirteenth of GPT-3's size. This means cost savings and much faster execution time.
Read full post
Harnessing GPT-4 for Your Commercial Advantage
How you can leverage GPT-4 to your commercial benefit.
Read full post
How to Be Both Socially Impactful and Financially Successful in Your Data Career
This episode will appeal most to technical listeners who are keen to be outstanding data scientists or software engineers, particularly through engineering scalable ML products.
Read full post
GPT-4 Has Arrived
This is the first episode in Jon's GPT-4 trilogy; in ten minutes, he introduces GPT-4's staggering capabilities.
Read full post
Astonishing Cicero Negotiates and Builds Trust With Humans Using Natural Language
Meta AI's CICERO algorithm, which negotiates and builds trust with humans to perform in the top decile at the game of Diplomacy, is (in our view) the most astounding A.I. feat yet.
Read full post
Designing Machine Learning Systems with Chip Huyen
Mega-bestselling author of the "Designing ML Systems" book, Chip Huyen, joined our Chief Data Scientist, Jon Krohn, to cover her top tips on designing ML systems!
Read full post
Digital Analytics with Avinash Kaushik
In this interview, Avinash Kaushik masterfully describes how A.I. is transforming analytics and how you can capitalize to deliver joy to your customers.
Read full post
Automating Industrial Machines with Data Science and the Internet of Things (IoT)
Read full post
Eliminating Hiring Bias From The Recruitment Process Through Machine Learning
With the ever-accelerating deluge of data — the amount of it stored on our planet doubles every 18 months — the capacity for machine learning to facilitate social benefits, such as increased job satisfaction, more extensive leisure time, and medical breakthroughs, accelerates as well.
Read full post
XGBoost: The Ultimate Classifier, with Matt Harrison
XGBoost is typically the most powerful ML option whenever you're working with structured data. In this SuperDataScience episode, our Chief Data Scientist, Jon Krohn, talks to world-leading XGBoost expert Matt Harrison about how it works and how to make the most of it.
Read full post
Business Intelligence Tools, with Mico Yuk
In this episode of SuperDataScience, join our Chief Data Scientist Jon alongside Mico Yuk, who pulls absolutely no punches in her assessment of, well, anything! ...but particularly about vendors in the business intelligence and data analytics space.
Read full post
Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, hosts the wildly intelligent Dr. Matar Haller, who introduces Contextual A.I. (which considers adjacent, often multimodal information when making inferences) as well as how to use ML to build a moat around your company.
Read full post
Get More Language Context out of your LLM
The "context window" limits the number of words that can be input to (or output by) a given Large Language Model. Today's episode introduces FlashAttention, a trick that allows for much larger context windows.
Read full post
Open-Source “Responsible A.I.” Tools, with Ruth Yakubu
Ruth Yakubu details what Responsible A.I. is and open-source options for ensuring we deploy A.I. models — particularly the Generative variety that are rapidly transforming industries — responsibly.
Read full post
Generative Deep Learning, with David Foster
Bestselling author David Foster provides a fascinating technical introduction to cutting-edge Generative A.I. concepts including variational autoencoders, diffusion models, contrastive learning, GANs and "world models".
Read full post
Six Reasons Why Building LLM Products Is Tricky
Six big challenges when bringing LLMs to your users.
Read full post
Observing LLMs in Production to Automatically Catch Issues
In this SuperDataScience episode, our Chief Data Scientist, Jon Krohn, is joined by Amber Roberts and Xander Song, who provide a technical deep dive into the major challenges (such as drift) that A.I. systems (particularly LLMs) face in production.
Read full post
Accelerators: Hardware Specialized for Deep Learning
This SuperDataScience episode, hosted by our Chief Data Scientist, Jon Krohn, with guest Ron Diamant, is dedicated to the hardware we use to train and run A.I. models (particularly LLMs), such as GPUs, TPUs and AWS's Trainium and Inferentia chips.
Read full post
Lossless LLM Weight Compression: Run Huge Models on a Single GPU
To approach the capabilities of the top commercial LLMs like OpenAI’s GPT-4 and Anthropic’s Claude on a broad range of tasks, you may however need to use a much larger open-source LLM. So, wouldn’t it be nice if you could compress the size of these larger LLMs to be able to fit them on a single consumer GPU? Such compression would enable you to:
Read full post
CatBoost: Powerful, efficient ML for large tabular datasets
CatBoost is making waves in open-source ML as it's often the top approach for tasks as diverse as classification, regression, ranking, and recommendation. This is especially so if working with tabular data that include categorical variables.
Read full post
NLP with Transformers, feat. Hugging Face’s Lewis Tunstall
Lewis Tunstall — brilliant author of the bestseller "NLP with Transformers" and an ML Engineer at Hugging Face — details how to train and deploy your own LLMs, the race for an open-source ChatGPT, and why RLHF leads to better models.
Read full post
The (Short) Path to Artificial General Intelligence, with Dr. Ben Goertzel
The luminary Dr. Ben Goertzel details how we could realize Artificial General Intelligence (AGI) in 3-7 years, why he's optimistic about the Artificial Super Intelligence (ASI) this would trigger, and what post-Singularity society could be like.
Read full post
How Firms Can Actually Adopt A.I., with Rehgan Avon
Rehgan Avon's DataConnect conference is this week and is getting rave reviews. In this SuperDataScience episode, hosted by our Chief Data Scientist, Jon Krohn, the silver-tongued entrepreneur details how organizations can successfully adopt A.I.
Read full post
Jon Krohn on Last Week in AI Episode #138
The only podcast our Chief Data Scientist, Jon Krohn, listens to, "Last Week in A.I.", had him on as co-host of episode #138. They had a lot of laughs recapping the big A.I. news, including the release of DALL-E 3 and reports that Google's forthcoming Gemini model crushes GPT-4.
Read full post
Jon Krohn on Last Week in AI Episode #130
The only podcast our Chief Data Scientist, Jon Krohn, listens to is "Last Week in A.I.". Tune in to hear him co-host episode #130. Every episode surveys the previous week's biggest A.I. news; the biggest story in this episode is (of course) LLaMA 2.
Read full post
Generative A.I. without the Privacy Risks (with Prof. Raluca Ada Popa)
Consumers and enterprises dread that Generative A.I. tools like ChatGPT breach privacy by using convos as training data, storing PII and potentially surfacing confidential data as responses. Prof. Raluca Ada Popa has all the solutions.
Read full post
LLaMA 2 — It’s Time to Upgrade your Open-Source LLM
If you've been using fine-tuned open-source LLMs (e.g. for generative A.I. functionality or natural-language conversations with your users), it's very likely time you switch your starting model over to Llama 2. Here's why:
Read full post
How Data Happened: A History, with Columbia Prof. Chris Wiggins
Chris Wiggins — Chief Data Scientist at The New York Times and faculty at Columbia University — talks to our Chief Data Scientist, Jon Krohn, and provides an enthralling, witty and rich History of Data Science.
Read full post
Jon’s “Generative A.I. with LLMs” Hands-on Training
Jon Krohn introduces his two-hour "Generative A.I. with LLMs" training, which is packed with hands-on Python demos in Colab notebooks. It details open-source (Hugging Face; PyTorch Lightning) and commercial (OpenAI API) LLM options.
Read full post
Large Language Model Leaderboards and Benchmarks
Llamas, Alpacas, Koalas, Falcons... there is a veritable zoo of LLMs out there! In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, Caterina Constantinescu breaks down the LLM Leaderboards and evaluation benchmarks to help you pick the right LLM for your use case.
Read full post
Vicuña, Gorilla, Chatbot Arena and Socially Beneficial LLMs, with Prof. Joey Gonzalez
Vicuña, Gorilla and the Chatbot Arena are all critical elements of the new open-source LLM ecosystem, and the extremely knowledgeable and innovative Prof. Joseph Gonzalez is behind all of them. Get the details in this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn.
Read full post
ChatGPT Code Interpreter: 5 Hacks for Data Scientists
The ChatGPT Code Interpreter is surreal: It creates and executes Python code for whatever task you describe, debugs its own runtime errors, displays charts, does file uploads/downloads, and suggests sensible next steps all along the way.
Read full post
Big A.I. R&D Risks Reap Big Societal Rewards, with Meta’s Dr. Laurens van der Maaten
By making big research bets, the prolific Meta Senior Research Director Dr. Laurens van der Maaten has devised or supported countless world-changing machine-learning innovations across healthcare, climate change, privacy and more.
Read full post
Code Llama
Meta's Llama 2 offered state-of-the-art performance for an "open-source" LLM... except on tasks involving code. Now Code Llama is here and it magnificently fills that gap by outperforming all other open-source LLMs on coding benchmarks.
Read full post
LangChain: Create LLM Applications Easily in Python
Kris Ograbek talks us through how to use LangChain to chat with previous episodes of SuperDataScience!
Read full post
Jon Krohn on Resilient Recruiter Podcast #198
Our Chief Data Scientist, Jon Krohn, appears as a guest on The Resilient Recruiter Podcast, an excellent program that's in the top 1.5% of podcasts globally. In the episode, Jon discusses how A.I. will transform all work over the coming years, with a particular focus on Talent Acquisition.
Read full post
Jon Krohn on Recruiting Future Podcast #547
Our Chief Data Scientist, Jon Krohn, was a guest on The Recruiting Future Podcast. In it, he introduces the key Generative A.I.-related terms (e.g., Deep Learning, Transformers) and uses examples from Nebula to illustrate how Gen A.I. is overhauling industries.
Read full post
Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta’s Dr. Thomas Scialom
Thomas Scialom, PhD is behind many of the most popular Generative A.I. projects including Llama 2, the world's top open-source LLM. In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the Meta A.I. researcher reveals the stories behind Llama 2 and what's in the works for Llama 3.
Read full post
Make Better Decisions with Data, with Dr. Allen Downey
Many-time bestselling author Allen Downey talks about making better decisions with data, including how to prepare for Black Swan events and how your core beliefs will shift over your life.
Read full post
Jon Krohn on Ken’s Nearest Neighbors Podcast #168
Our Chief Data Scientist, Jon Krohn, joins Ken Jee in this episode of the "Ken's Nearest Neighbors" podcast — a popular weekly show that features data professionals digging into their backgrounds, motivations and philosophies.
Read full post
Computational Mathematics and Fluid Dynamics, with Prof. Margot Gerritsen
In this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the extremely intelligent and super delightful Prof. Margot Gerritsen returns to the show to introduce what Computational Mathematics is, detail countless real-world applications of it, and relate it to the field of data science.
Read full post
Quantum Machine Learning, with Dr. Amira Abbas
Brilliant, eloquent Dr. Amira Abbas introduces us to Quantum Machine Learning in this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn. She details the key concepts (like qubits), what's possible today (Quantum SVMs) and what the future holds (e.g., Quantum Neural Networks).
Read full post
Seven Factors for Successful Data Leadership
In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, is joined by the jovial EIGHT-time book author, Ben Jones. Ben covers the seven factors of successful data leadership, which he's gleaned from administering his data literacy assessment to thousands of professionals.
Read full post
Unmasking A.I. Injustice, with Dr. Joy Buolamwini
In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, is joined by the inimitable Dr. Joy Buolamwini, who reveals how she uncovered staggering racial and gender biases in widely used Amazon, Microsoft and IBM algorithms, the firms' varying (sometimes shocking) responses, and how to address these A.I. issues.
Read full post
How GitHub Operationalizes AI for Teamwide Collaboration and Productivity, with GitHub COO Kyle Daigle
In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, the exceptionally passionate GitHub COO Kyle Daigle details how generative A.I. tools improve not only the way individuals work, but also dramatically transform the way people across entire firms collaborate.
Read full post
OpenAssistant: The Open-Source ChatGPT Alternative, with Dr. Yannic Kilcher
In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, Yannic Kilcher, famed Machine Learning YouTuber and creator of OpenAssistant (the best-known open-source conversational A.I.), shares where the biggest A.I. opportunities are in the coming years.
Read full post
A.I. Product Management, with Google DeepMind’s Head of Product, Mehdi Ghissassi
The elite team at Google DeepMind cranks out one world-changing A.I. innovation after another. In this episode of SuperDataScience hosted by our Chief Data Scientist, Jon Krohn, their affable Head of Product Mehdi Ghissassi shares his wisdom on how to design and release successful A.I. products.
Read full post
Scikit-learn’s Past, Present and Future, with scikit-learn co-founder Dr. Gaël Varoquaux
In this episode of SuperDataScience, our Chief Data Scientist, Jon Krohn, travels to Paris to interview Dr. Gaël Varoquaux, co-founder of scikit-learn, the standard library for machine learning worldwide (downloaded over 1.4 million times PER DAY).
Read full post
Q*: OpenAI’s Rumored AGI Breakthrough
This SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, is all about a rumored new model out of OpenAI called Q* (pronounced “Q star”) that has been causing quite a stir, both for its purported role in Altmangate and its implications for Artificial General Intelligence (AGI).
Read full post
How to Integrate Generative A.I. Into Your Business, with Piotr Grudzień
Want to integrate Conversational A.I. ("chatbots") into your business and ensure it's a (profitable!) success?
Read full post
2024 Data Science Trend Predictions
What are the big A.I. trends going to be in 2024?
Read full post
Jon Krohn at AI4Talent
On January 24th, our Chief Data Scientist, Jon Krohn, will be on a panel at the "A.I. & The Future of Work" virtual conference discussing the ethical issues around the "double-edged sword" of A.I. in employment.
Read full post
Technical Intro to Transformers and LLMs, with Kirill Eremenko
For this SuperDataScience episode hosted by our Chief Data Scientist, Jon Krohn, the indefatigable SuperDataScience Founder Kirill Eremenko gives a detailed technical intro to Transformers and how they're scaled up to give Large Language Models like GPT-4, Llama 2 and Gemini their mind-blowing abilities.
Read full post
Jon Krohn on Tech Recruiter Podcast S03-Ep19
Read full post
Data Science for Clean Energy, with Emily Pastewka
How can data science and machine learning power the transition toward a sustainable global economy? The ML leader (and exceptional communicator of technical concepts!) Emily Pastewka joins our Chief Data Scientist, Jon Krohn, on SuperDataScience to fill us in on Green Data Science.
Read full post
A Simulation of Ed Donner: Fine-tuning an LLM on 240k Text Messages
One of our Nebula co-founders, Ed Donner, has carried out a fascinating project: He fine-tuned an open-source LLM on his text-message history, eerily effectively simulating not only himself but anyone who's messaged him >1000 times.
Read full post
How A.I. is Transforming Science
A.I. is not just a tool, but a driving force in reshaping the landscape of science.
Read full post
How to Speak so You Blow Listeners’ Minds, with Cole Nussbaumer Knaflic
Cole Nussbaumer Knaflic's book, "storytelling with data", has sold over 500k copies... wild!
Read full post
Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko
In February 2024, Kirill Eremenko was on the SuperDataScience Podcast hosted by our Chief Data Scientist, Jon Krohn, to detail Decoder-Only Transformers (like the GPT series). It was Jon's most popular episode ever, so Kirill came right back to detail an even more sophisticated architecture: Encoder-Decoder Transformers.
Read full post
The Best A.I. Startup Opportunities, with venture capitalist Rudina Seseri
How should an A.I. startup find product-market fit? How do some A.I. startups become spectacularly successful?
Read full post