Radek Osmulski

How did I get here?

An introductory chapter to a book on learning machine learning that I wrote.

Introduction to Proximal Policy Optimization (PPO)

The previous blog post looked at the Vanilla Policy Gradient (VPG) method. Trust Region Policy Optimization and Proximal Policy Optimization build on top of VPG and aim to address its shortcomings. The two main problems of training on-policy reinforcement learning algorithms that PPO addresses are: 1. Training data distribution shifts

Understanding Policy Gradient - a fundamental idea in RL

How do you begin to learn Reinforcement Learning? My preferred approach is to study code. Reading and analyzing code can help disambiguate many ideas and concepts in papers or blog posts that can be hard to understand otherwise. Crafting an optimal policy by learning value functions is a very straightforward

Diving into Diffusion Policy with LeRobot

In a recent blog post, we looked at the Action Chunking Transformer (ACT). At the heart of ACT lies an encoder-decoder transformer that, when passed in an image, the current state of the robot, and an optional style variable z, generates the next chunk_size number of actions. But even

Meta Learning: Addendum or a revised recipe for life

In 2021 I published Meta Learning: How To Learn Deep Learning And Thrive In The Digital World. The book is based on 8 years of my life where nearly every day I thought about how to learn machine learning and how to do machine learning efficiently and at a high

How to teach your computer to play video games

Teaching your computer to play video games has all the components of a sublime storyline: it is ingenious, beautiful in its simplicity, and utterly surprising. I will explain the main ideas of deep Q learning, using as few big and scary nouns as possible, as we teach our computer to

An Introduction to the Action Chunking Transformer

This is a gentle introduction to training two robotic arms using transformers. If you would rather jump straight into code, please find a batteries-included notebook here. Humans have figured out how to do a lot of neat and useful things. Wouldn't it be great if we could teach

How to train an Alpaca?

There used to be a time when fine-tuning LLMs on off-the-shelf hardware wasn't a thing. Then the Llama weights got leaked, Stanford Alpaca was released, and the rest is history. So how was Alpaca fine-tuned? And why might we care? On one hand, Alpaca is where the Cambrian

How to fine-tune a Transformer (pt. 2, LoRA)

In part 1 of this series, I fine-tuned a Transformer using techniques straight from Universal Language Model Fine-tuning for Text Classification published in 2018. But so much has happened in the last 5 years! My plan was to read a couple of papers next, but I stumbled across LoRA: Low-Rank

How to fine-tune a Transformer?

I have only just started learning about LLMs, and in this blog post I share how I would approach fine-tuning a Transformer today. Which of the techniques I learned years ago still work in the era of the Transformer? Also, toward the end of the blog post, I address the training

SLURM survival guide

This post is written from the perspective of someone learning to run SLURM jobs. There might be some inaccuracies but the idea is to get you up and running fast. Unfortunately, the only material on SLURM I have been able to find was written by MLOps folks for MLOps folks

How to evaluate an LLM on your data?

Being able to evaluate the outputs of an LLM on your test set is very valuable. Imagine this scenario. You work at ACME Inc and one morning you pour yourself a great cup of coffee, open Slack, and see this: Employee #342526, we need you

Use ChatGPT inside Jupyter Notebook

personal project

Bringing the new tool as close as possible to where people already do their work is key.

An IDE for the era of AI

So much code that I would have to write by hand automagically appears on my screen!

There is something weird about the current generation of AI — better pay attention

Hype aside, there is something very uncanny about the most recent generation of AI models.

How to reach the top of the imagenette leaderboard?

How to make your NNs more shift-invariant? What are some hyperparameter changes worth considering when training with a limited budget of epochs?

Going From Not Being Able to Code to Deep Learning Hero

A detailed plan for going from not being able to write code to being a deep learning expert. Advice based on personal experience.

How to build a Deep Learning system that will answer questions about the Harry Potter universe?

Riva is a set of APIs into a very complex, very well-staffed AI research organization.

20 Years of Tech Startup Experiences in One Hour by Jeremy Howard

There is no such thing as business... there is only such a thing as making things people want and selling them to them.

How to use the power of the community to learn faster

Community is the most powerful force behind online learning. It is the reason why MOOCs have a limited impact and tight-knit communities like fast.ai consistently produce unbelievable results.

How to train and validate on Imagenet

Training on Imagenet is something that is completely trivial after you do it once, but if you are just someone on the Internet without such prior experience, it is an insurmountable task. Up until a couple of days ago, I didn't even know how to get the data!

Machine Learning and Testing

The rewards of testing can be immense, but so can be the price that one would need to pay for testing poorly.

How to train your neural network

Evaluation of cosine annealing.

Why take the log of a continuous target variable?

In this article, we’ll look at a simple but useful concept that often gets overlooked.

How to do machine learning efficiently

The only way to maintain your sanity in the long run is to be paranoid in the short run.