Radek Osmulski

How did I get here?

An introductory chapter to a book on learning machine learning that I wrote.

How to Structure an ML Project in the Era of LLM Assistants

I joined the Yale/UNC-CH - Geophysical Waveform Inversion Kaggle competition to test-drive and improve my ML workflow. My primary objective was to integrate LLM coding tools at every stage of the workflow and to better understand what these tools are capable of. Below, I discuss:

How to Do ML on a Remote Machine with Ease

The article below consists of two parts. In the first part, I share my workflow for working on remote machines. I am using Lambda Labs, but the steps can be easily adapted to working on a remote machine of any kind. In the second part, I share more general thoughts

Introduction to Proximal Policy Optimization (PPO)

The previous blog post looked at the Vanilla Policy Gradient (VPG) method. Trust Region Policy Optimization and Proximal Policy Optimization build on top of VPG and aim to address its shortcomings. PPO addresses two main problems with training on-policy reinforcement learning algorithms. The first is training data distribution shifts. In on-policy methods,

Understanding Policy Gradient - a fundamental idea in RL

How do you begin to learn Reinforcement Learning? My preferred approach is to study code. Reading and analyzing code can disambiguate many ideas and concepts that are hard to grasp from papers or blog posts alone. Crafting an optimal policy by learning value functions is a very straightforward idea. A

Diving into Diffusion Policy with LeRobot

In a recent blog post, we looked at the Action Chunking Transformer (ACT). At the heart of ACT lies an encoder-decoder transformer that, when passed an image, the current state of the robot, and an optional style variable z, generates the next chunk_size number of actions. But even

How to teach your computer to play video games

Teaching your computer to play video games has all the makings of a great story: it is ingenious, beautiful in its simplicity, and utterly surprising. I'll explain the main ideas behind deep Q learning, using as few big and scary nouns as possible, as we teach our computer

An Introduction to the Action Chunking Transformer

This is a gentle introduction to training two robotic arms using transformers. If you would rather jump straight into code, please find a batteries-included notebook here. Humans have figured out how to do a lot of neat and useful things. Wouldn't it be great if we could teach

How to train an Alpaca?

There used to be a time when fine-tuning LLMs on off-the-shelf hardware wasn't a thing. Then the Llama weights got leaked, Stanford Alpaca was released, and the rest is history. So how was Alpaca fine-tuned? And why might we care? On one hand, Alpaca is where the Cambrian

How to fine-tune a Transformer (pt. 2, LoRA)

In part 1 of this series, I fine-tuned a Transformer using techniques straight from Universal Language Model Fine-tuning for Text Classification published in 2018. But so much has happened in the last 5 years! My plan was to read a couple of papers next, but I stumbled across LoRA: Low-Rank

How to fine-tune a Transformer?

I have only just started to learn about LLMs, and in this blog post I share how I would approach fine-tuning a Transformer today. Which of the techniques I learned years ago still work in the era of the Transformer? Also, toward the end of the blog post, I address the training

SLURM survival guide

This post is written from the perspective of someone learning to run SLURM jobs. There might be some inaccuracies, but the idea is to get you up and running fast. Unfortunately, the only material on SLURM I have been able to find was written by MLOps folks for MLOps folks

How to evaluate an LLM on your data?

Evaluating the outputs of an LLM on your test set is a very valuable problem to solve. Imagine this scenario. You work at ACME Inc., and one morning you pour yourself a great cup of coffee, open Slack, and see this: Employee #342526, we need you

Use ChatGPT inside Jupyter Notebook

personal project

Bringing the new tool as close as possible to where people already do their work is key.

An IDE for the era of AI

So much code that I would otherwise have to write by hand automagically appears on my screen!

There is something weird about the current generation of AI — better pay attention

Hype aside, there is something very uncanny about the most recent generation of AI models.

How to reach the top of the imagenette leaderboard?

How to make your NNs more shift-invariant? What are some hyperparameter changes worth considering when training with a limited budget of epochs?

Going From Not Being Able to Code to Deep Learning Hero

A detailed plan for going from not being able to write code to being a deep learning expert. Advice based on personal experience.

How to build a Deep Learning system that will answer questions about the Harry Potter universe?

Riva is a set of APIs into a very complex, very well-staffed AI research organization.

20 Years of Tech Startup Experiences in One Hour by Jeremy Howard

There is no such thing as business... there is only such a thing as making things people want and selling them to them.

How to use the power of the community to learn faster

Community is the most powerful force behind online learning. It is the reason why MOOCs have a limited impact and tight-knit communities like fast.ai consistently produce unbelievable results.

How to train and validate on Imagenet

Training on Imagenet is something that is completely trivial after you do it once, but if you are just someone on the Internet without such prior experience, it is an insurmountable task. Up until a couple of days ago, I didn't even know how to get the data!

Machine Learning and Testing

The rewards of testing can be immense, but so can the price one pays for testing poorly.

How to train your neural network

Evaluation of cosine annealing.

Why take the log of a continuous target variable?

In this article, we’ll look at a simple but useful concept that often gets overlooked.