I was kinda disappointed by the World Humanoid Robot Games.1 As fun as real-life rock-em-sock-em robots are, what people really care about is robots doing their chores. This is why robot laundry folding videos are so popular. Those laundry videos are super impressive: we didn’t know how to do that even a few years ago. And it is certainly something that people want! But as this article so nicely articulates, basic laundry folding is in a sweet spot given the techniques we have now. It might feel like if our AI techniques can fold laundry, they can do anything, but that isn’t true, and we’re going to have to invent new techniques to be truly general-purpose and useful.
The Challenge
With that in mind I am issuing a challenge to roboticists: here are my Humanoid Olympics. Each event will require us to push the state of the art and unlock new capabilities. I will update this post as folks achieve these milestones, and will mail actual real-life medals to the winners.
Brief intro: Current state of the art.
In order to talk about why each of these challenges pushes the state of the art, let's talk about what's working now. What I’m seeing working is learning-from-demonstration. Folks get some robots and some puppeteering interfaces (the standard seems to be two copies of the robot, where you grab and move one and the other matches, or an Oculus headset plus controllers or hand tracking) and record some 10-30 second activity over and over again (hundreds of times). We can then train a neural network to mimic those examples. This has unlocked tasks that have steps that are somewhat chaotic (like pulling a corner of a towel to see if it lays flat) or that have a high-dimensional state space (a wooden block can be on one of 6 sides, but a towel can be bunched up in myriad different ways). But thinking about it, it should be clear what some of the limitations are. Each of these has exceptions, but they form a general trend.
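The learning-from-demonstration recipe above is essentially behavior cloning: log (observation, action) pairs while the human puppeteers the robot, then fit a model to predict the operator's action from the observation. Here is a minimal toy sketch of that idea (my own illustration, not anyone's actual pipeline: real systems use camera images and large neural networks, whereas this uses a synthetic 8-D state and a linear least-squares fit as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for teleop demos: observations (robot + object state) paired
# with the operator's commanded actions. Real pipelines log camera images
# and end-effector commands at tens of Hz; here it's a toy 8-D state.
obs = rng.normal(size=(500, 8))
operator = rng.normal(size=(8, 2))        # the "human policy" we mimic
actions = obs @ operator + 0.01 * rng.normal(size=(500, 2))

# Behavior cloning is supervised learning from observation to action.
# A linear least-squares fit stands in for the neural network here.
W, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# The cloned policy reproduces the demonstrated behavior on new states.
new_obs = rng.normal(size=(100, 8))
error = np.abs(new_obs @ W - new_obs @ operator).max()
print(f"max action error vs. operator: {error:.4f}")
```

The key property (and limitation) this illustrates: the clone can only be as good as the demonstrations it mimics, which is exactly why the teleoperation bottlenecks below matter so much.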
No force feedback at the wrists.2 The robot can only ever perform as well as the human teleoperation and we don’t yet have good standard ways of getting force information to the human teleoperator.
Limited finger control.3 It's hard for the teleoperator (and AI foundation model) to see and control all the robot fingers with more finesse than just open/close.
No sense of touch.4 Human hands are packed absolutely full of sensors. Getting anywhere near that kind of sensing out of robot hands and usable by a human puppeteer is not currently possible.
Medium precision.5 Guessing based on videos I think we’ve got about 1-3 cm precision for tasks.
Folding towels and t-shirts doesn’t depend on high wrist forces. You can get away with just hand open/close by using pinch grasps to pull and lift and open hands to spread. You can see your grasp visually, so you don’t need finger sensing. And 1-3 cm precision is just fine.
So what comes next? On to the events!
Event 1: Full Body (aka doors)
Doors are tricky because of the asymmetric forces: you need to grasp and twist quite hard, but if you pull hard outside of the arc of the door you tend to slip your grasp. Also, they require whole body manipulation, which is more than I’ve seen from anyone yet.
Bronze Medal: Entering a round-knob push door.
I think this is very close to state of the art (or maybe has happened and I didn’t see it). I expect this one to be claimed by December.
Silver Medal: Entering a lever-handle self-closing push door.
Adding self-closing makes this significantly more challenging, though the lever handle is arguably easier (I just don’t see many round-knob self-closing doors).6
Gold Medal: Entering a lever-handle self-closing pull door.
The bossfight of doors.7 You need to either use a second limb to block the door from re-closing or go fast enough to use dynamics.
Event 2: Laundry
We’re just getting started on laundry.
Bronze Medal: Fold an inside-out T-shirt
This is probably doable using the techniques we have now, but is a longer horizon task and might require some tricky two handed actions to pull the shirt through to right-side-out.8
Silver Medal: Turn a sock inside-out
I think both the hand-insertion and the action of pinching the inside of the sock are interesting new challenges.
Gold Medal: Hang a men’s dress-shirt
The size medium shirt starts unbuttoned with one sleeve inside-out. Must end up on the hanger correctly with the sleeve fixed and at least one button buttoned. I think this one is 3-10 years out, both because buttons are really hard and because getting a strong, dextrous hand small enough to fit in a sleeve is going to be hard.
Event 3: Basic Tool Use
Humans are creatures of technology and, as useful as our hands are, we mostly use them to hold and manipulate tools. This challenge is about building the strength and dexterity to use tools.
Bronze Medal: Windex and paper towels
The Windex (ammonia-based window-cleaning fluid) bottle is super forgiving in terms of how you grasp it, but you do need to independently articulate a finger (and the finger has to be pretty strong to get fluid to spray out).9
Silver Medal: Peanut butter sandwiches
The challenge here is to pick up a knife and then adjust the grasp to be strong and stable enough to scoop and spread the peanut butter. Humans use a ‘strong tool grasp’ for all kinds of activities but it is very challenging for robot grippers.10
Gold Medal: Use a key
A keyring with at least 2 keys and a keychain is dropped into the robot’s waiting palm/gripper. Without putting the keys down, get the correct key aligned and inserted and turned in a lock. This requires very challenging in-hand manipulation, high precision and interesting forceful interaction.
Event 4: Finger tips
We humans do all kinds of in-hand manipulation, using the structure of our hands to manipulate things we are holding.
Bronze Medal: Roll matched socks.
Requires dexterity and some precision but not very much force.
Silver Medal: Use a dog poop bag
When I use a dog-bag I have to do a slide-between-the-fingertips action to separate the opening of the bag, which is a tricky forceful interaction as well as a motion that I’m not even sure most robot hands are capable of. Also tricky: tearing off a single bag rather than pulling a big long spool out of the holder, if you choose to use one.11
Gold Medal: Peel an orange
Done without external tools. This is super tricky: high force yet high precision fingertip actions.
Event 5: Slippery when wet
If you sit down and write out what you might want a robot to do for you, a lot of tasks end up being kind of wet. Robots usually don’t like being wet, but we’ll have to change that if we want to have them clean for us.
Bronze Medal: Wet a sponge at a sink and wipe a counter-top
Mildly damp, but with exciting risk of getting the whole hand in the water if you aren’t careful. Probably requires at least splash resistant hands (or a whole bunch of spares).
Silver Medal: Clean peanut butter off your manipulator
This one naturally follows after the sandwich one. Water everywhere. Seems like an important skill to have after a few hours collecting training data on the dog-poop task.
Gold Medal: Use a sponge to wash grease off a pan in a sink
Water, soap, grease, and an unpleasant task no one wants to do.
Discussion:
Complain, comment, and discuss on Hacker News.
And you should subscribe because I will post as folks achieve these challenges!
Terms and conditions:
To be eligible to win it must be a 1x speed video with no cuts, featuring a general purpose mobile(?) manipulator robot running autonomously. (Wheels and centaur robots: totally fair game; industrial automation orange peelers don’t count.)
You are allowed 10x the time I took (e.g. a 4 second task can take 40 seconds). I reserve the right to be arbitrary in deciding whether entries follow the spirit of the challenge. First robot wins the prize.
To claim your medallion email bmholson+olympics@gmail.com with an address for me to ship it to. If you give me a photo of your robot wearing a medal I will be tickled pink.
I will also accept future challengers that are at least 25% faster than the current winner. Good luck and may the odds be ever in your favor.
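The two timing rules above (10x the human time, and new challengers must be at least 25% faster than the reigning winner) can be written down as a couple of tiny helpers. This is just my own illustrative encoding of the rules; the function names are mine:

```python
def time_budget(human_seconds: float) -> float:
    """Entrants get 10x the demonstrated human time."""
    return 10.0 * human_seconds

def dethrones(current_winner_seconds: float, challenger_seconds: float) -> bool:
    """A new challenger must be at least 25% faster than the current winner."""
    return challenger_seconds <= 0.75 * current_winner_seconds

print(time_budget(4))      # a 4 second task allows 40 seconds
print(dethrones(40, 30))   # 30s is exactly 25% faster than 40s: True
print(dethrones(40, 31))   # 31s is not 25% faster: False
```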
Is “Mobile” Important?
I’m conflicted on whether I should allow “arms bolted to a table” entries or require that all entrants be mobile manipulators (obviously the door entrants have to be). I’ll let y’all decide: the challenge will be locked in based on this poll.
Thanks to Jeff Bingham for advice, fact checking and cool robot videos. Thanks to my patient wife for spending an hour filming me doing silly things in a silly costume.
As far as I can tell, kickboxing was just the Unitree mini-humanoid robot, and everyone had the same code running, so… I guess it won?
TRI has some pretty cool stuff with force control using a big training rig: https://medium.com/toyotaresearch/tris-robots-learn-new-skills-in-an-afternoon-here-s-how-2c30b1a8c573
Tesla’s Optimus has 22 degrees of freedom using cable drives (because you can’t fit those motors in a hand). In 2008 I worked on this robot, which also had 22 degrees of freedom, and controlling it was crazy hard (as was keeping all the cables correctly tensioned). The other hand was a big two-finger gripper which I ended up using for most teleop tasks.
Meta has been working with some in-finger vision systems which seem cool.
This is likely more a teleoperation precision limitation than a model limitation. Here is a video of Generalist Robotics doing sub-cm precision tasks:
(Love that hockey sticks have become the traditional ‘mess with a robot’ tool even for ridiculous things like this)
Yes, I did wear this at my workplace in order to get this video. You’re welcome.
I have programmed (not trained) a general purpose mobile manipulator to pass through a self-closing pull door, but it took over 4 minutes (disqualified for taking too long) and required a special doorstop. Also the video isn’t public (also disqualified). Also it's really tacky to put up a competition and award yourself gold before it even starts.
T-Shirt starts fully inside-out in a wad. Finishes tolerably folded, right-side out.
You must spray 3 good spritzes on the window, and wipe them up with paper towels so there are no ugly streaks. Paper towels start on the paper-towel roll, not pre-torn and pre-wadded.
Peanut butter jar starts and ends closed. Sandwich should be cut in half. (Triangle or rectangular cuts are both acceptable, though your three-year-old might disagree).
Mock Poo allowed. Bag starts on the roll but can be in a standard dog-bag holder, held by the robot.
Hi Benjie, I hope you've been great!
One approach to enabling robots is the video training approach for AI/neural-net systems. In the case of self-driving cars, this amounts to '1 million hours' of training video fed into the training supercluster, with the resulting parameters output to the inference engines.
If instead we're training a robot to do the laundry, I see the entire chore in its full activity:
- Roam the house looking for the kid's socks
- Gather the dirty clothes from baskets & hampers
- Sort the laundry into hot/warm/cold loads, per the desires of the owner
- Load the washer
- Add detergent (liquid or capsule)
- Set the cycle parameters, per make/model of washer and owner-desired settings
- Start the load
- Promote the load to the dryer when finished.
- Set the dryer cycle parameters, per make/model and owner-desired settings
- Remove clothes when dryer completes
- Empty lint screen
- Fold clothes and sort by owner
- Replace clothes, hang on hanger, place into drawer, based on each owner's desired placements
Then, the fun begins...
- Handle exceptions like, load out of balance, power outage interrupt, washer starts leaking, ...
My question (thanks for reading through all that!): do you think it would require more, less, or the same amount of training video as self-driving cars to enable this complete approach to 'doing the laundry', so that when a robot owner purchases the 'do the laundry' ability, the parameters can be downloaded to their robot and they never have to care about the laundry again?