If you follow robotics at all you may have noticed that there has been an explosion of humanoid robot startups. What's with that? Why humanoids? Why now?
There's a lot to unpack so this will be a two-part post. In this part we’ll talk about what we even mean by “humanoid”, and the advantages and challenges of mimicking various human body parts. In part two we’ll dive into the economics of humanoid deployments and how LLMs (Large Language Models, like ChatGPT) enter the picture. By the end we won’t have any magical answers, but hopefully you’ll leave with some frameworks for thinking about humanoid robots.
What is a humanoid?
The question of “why humanoids” gets confusing because “humanoid robots” make a bunch of different promises. We’ll talk about a few different axes.
Generally when people think of a humanoid robot they imagine a robot that…
Looks like a human,
Works in human environments,
Is multi-purpose,
Is able to be instructed and interacted with like a human.
Why look like a human?
When you ask folks why they are working on humanoids you usually get an answer about how the world is designed for humans so robots shaped like humans should succeed in our environment.
There is an implicit promise that, because it is human shaped, it ought to be able to do the things humans do now and, moreover, ought to be able to do them without requiring big changes to your facility.
There is also a promise that, because it is shaped like a human, it will be flexible enough to do many different tasks that humans are able to do.
Obviously just being human shaped isn’t enough: you also need the strength, dexterity, perception and intelligence to accomplish those tasks. Being human shaped is also not required. One could imagine a multi-purpose robot that works in unstructured human environments yet looks nothing like a human. And you certainly don’t need to be perfectly human-like to do useful work. So it's an interesting question: if you are trying to make a practical humanoid robot, which parts do you copy?
Legs
Legs let you be tall:
There are a bunch of tasks in human spaces that require you to manipulate things up high (get stuff on shelves) and down low (put things in cabinets). Human spaces also can get tight, so having a small footprint on the floor is useful. Those two things push you towards a tall, skinny robot, which means a tippy robot. In order to not fall over you can either keep a low center of gravity (think of a cherry-picker: it goes super high, but has a huge amount of mass at the bottom to stay stable) or you can balance dynamically. Legs are one solution to dynamic balance, but you can also have dynamic balance with wheels.
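To put a rough number on "tippy", here is a minimal sketch of the static version of the problem. It assumes a rigid robot standing still on flat ground, and the dimensions are hypothetical values picked only for illustration. The function `tip_angle_deg` is my own name for the angle the robot can lean before its center of mass passes over the edge of its base.

```python
import math


def tip_angle_deg(half_base_width_m: float, com_height_m: float) -> float:
    """Lean angle at which the center of mass passes over the edge of the base.

    Static, rigid-body approximation: no motion, no dynamic balancing.
    """
    return math.degrees(math.atan2(half_base_width_m, com_height_m))


# Hypothetical dimensions, for illustration only.
# Same small footprint, different center-of-mass heights.
tall_skinny = tip_angle_deg(half_base_width_m=0.25, com_height_m=1.0)
cherry_picker = tip_angle_deg(half_base_width_m=0.25, com_height_m=0.3)

print(f"high center of mass: tips past {tall_skinny:.1f} degrees")
print(f"low center of mass:  tips past {cherry_picker:.1f} degrees")
```

With the same footprint, lowering the center of mass buys a much larger lean angle before tipping, which is the cherry-picker approach. Dynamic balance is the other option: instead of widening that margin, the robot keeps moving its support under its center of mass.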
Legs let you handle stairs:
This is the biggest win for legs over wheeled solutions. There are wheeled and track-based things that can use stairs, but they tend to be as exciting[1] as legs.
Most of the current batch of humanoids have two legs, though Digit’s backward legs are decidedly non-human in their configuration.
I think it is interesting that 1X’s EVE robot has a mermaid-tail torso that ends in wheels, but their next-gen robot has two legs.
But why not a wheeled centaur?
Nearly every one of the humanoid companies claims it is targeting a commercial or industrial application as its first market. These spaces are ADA compliant, which means they are already accessible to wheelchairs and therefore to wheeled robots. I’m super confused by the lack of two-armed torsos on largish, heavy, wheeled bases. Legs are just really complicated and expensive. Plus they consume a bunch more energy than wheels and move a lot slower. And all of your manipulation either needs to make full-body plans that take balance into account, or your balance control and manipulation control will treat each other as disturbances and fight all the time. And while it's true that legs are omnidirectional (you can step to the side), you can make wheeled bases that are omnidirectional too, and still skip a bunch of the challenges that legged balancing gives you.
Two Arms
Many tasks just need two hands:
After working for 6 years with this robot:
I’ve come to appreciate how much only having one hand constrains you[2]. There are things you can do one handed, but many of them are much easier two handed[3], especially for not-that-dexterous robots with their clumsy grippers. Which brings us to…
Five Fingered Hands
Human hands are frikkin awesome. I could write an entire post about human hands and the ways that they outshine anything we can build for robots. We have 5 fingers and each finger has at least 3 degrees of freedom[4], and the thumb and pinkie have at least one more each (you can move the base of your pinkie and thumb forward), for a total of 17 independently controllable degrees of freedom. At my first job we had a 17 DOF (degree of freedom) robot hand and (like a human hand) the only way to package all that motion was to move the actuators (muscles) into the forearm and do everything through cables (tendons). It was a beast and meant we had a huge forearm. The cables were also a maintenance nightmare and were always loosening or breaking.
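The count of 17 is easy to reproduce, and writing it out makes it simple to see what a hand with fewer fingers would cost. This is just a sketch of the arithmetic above. The names `hand_dof`, `DOF_PER_FINGER` and `EXTRA_DOF` are made up for the example, and the three-fingered hand at the end is a hypothetical design, not any particular robot.

```python
# Each finger: wiggle side to side, bend at the knuckle, and curl.
DOF_PER_FINGER = 3

# The thumb and pinkie each get one more DOF at the base.
EXTRA_DOF = {"thumb": 1, "pinkie": 1}

HUMAN_FINGERS = ["thumb", "index", "middle", "ring", "pinkie"]


def hand_dof(fingers, per_finger=DOF_PER_FINGER, extra=EXTRA_DOF):
    """Total independently controllable degrees of freedom for a hand."""
    return sum(per_finger + extra.get(name, 0) for name in fingers)


human_total = hand_dof(HUMAN_FINGERS)

# Hypothetical three-fingered hand that keeps the thumb.
three_finger_total = hand_dof(["thumb", "index", "middle"])

print(f"five-fingered hand:  {human_total} DOF")
print(f"three-fingered hand: {three_finger_total} DOF")
```

If you assume one actuator and one tendon per degree of freedom, every finger you drop takes at least three cables out of the forearm.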
I don’t think you need all of that motion to make a useful hand. In fact, I’m surprised that so many humanoids have five fingers. People who have lost a finger tend to be just as capable as people with five fingers. I’d think that robots, like cartoon characters, would converge on three[5] to four[6] fingers[7].
Human hands are fast and strong and light. We have about 50 lbs of grip strength and can move fast enough that our eyes can’t track our finger motion. A motor and gearbox that powerful would tend to be big and heavy, which is not what you want at the end of your robot arm.
But the biggest way that robot hands differ from human hands is sensing. Our hands are incredibly sensitive. You can easily feel the height change of a single sheet of paper sitting on a table. When we want to fool a human’s eyes we use video at 30-60 frames per second, but when we want to fool human hands (for giving haptic feedback) we generally want to run at 1000 “frames” per second because the bandwidth of your hands is so much higher than your eyes. Think about how hard it is to do a delicate task with your whole hand numbed.
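To make the gap concrete, here is the same comparison expressed as the time budget per update. It is plain arithmetic on the rates quoted above, and `period_ms` is just a helper name for this example.

```python
def period_ms(rate_hz: float) -> float:
    """Time available for one update, in milliseconds."""
    return 1000.0 / rate_hz


# Rates quoted in the text above.
rates_hz = {
    "video, low end": 30,
    "video, high end": 60,
    "haptics": 1000,
}

for label, rate in rates_hz.items():
    print(f"{label}: {period_ms(rate):.1f} ms per update")
```

A haptic loop gets about one millisecond to sense, compute and respond, while a video frame has tens of milliseconds.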
My money is on a less human looking hand (with fewer and bigger fingers) with good touch sensing, less than human speed, but similar to human grasp strength.
Size
The last category of human-ness is human sized. The push and pull here is that many tasks involve reaching up high and carrying heavy things. However, being sufficiently safe is super hard (for any robot). We are much less forgiving of our machines for bumping into us than we are of our human co-workers, and robots tend to fall over on occasion. So your goal of making a tall and strong robot quickly gets into direct conflict with your goal of making a robot that won’t accidentally hurt someone. This is not easy because if your robot is strong enough to move heavy boxes that means it is strong enough to drop heavy boxes on someone, and usually means the robot itself is heavier than the heavy boxes.
You can make robots more safe by adding human-auditable and redundant safety systems in your software, but that is the opposite direction from the promise of generative AI being the engine of multipurpose robot control. You can have parallel systems (AI for control and conventional safety to verify the output) and that is probably what you have to do, but it's going to be really hard for the (dumb, simple) safety system to tell the difference between pushing hard on a box to put it in place and pushing hard on a box that causes a human to be crushed.
I don’t have good answers, just that it's really hard, and it seems like most robot companies are betting on being so useful and valuable that they are able to convince society (and therefore various safety bureaucracies) to accept a slightly higher chance of unpredictable injury. Higher than we’d tolerate from, say, a printer. It's not like humans are guaranteed not to bump into each other, we just usually don’t. It's kind of like the argument that self-driving cars don’t have to be perfect, they just have to be similar to human drivers.
You can also make robots more intrinsically safe by making them lighter, but making things light and strong at the same time is hard. And by hard, I mean expensive. And keeping your robot’s cost down is going to be very, very important to succeeding with a humanoid startup, as we will see in Part 2…
Thanks to Michael Quinlan and Rodney Brooks for reading an early draft of this post.
About the Author: Benjie has programmed robots at startups and Google X’s Everyday Robots and is currently Director of Robotics at Robust AI, where he hopes to ship boring looking non-humanoid robots to customers who pay money for them.
[1] In this context exciting means: “complicated, failure prone, expensive and dangerous”.
[2] I grew up working with my family’s puppetry business and we often wished we had three hands.
[3] For example, after some practice I am able to open a soda can one handed while holding it, but it is absolutely not the easiest way to do it.
[4] Wiggle side to side, bend at knuckle, and curl. Plus the metacarpal roll, but that's kinda only a half-DOF for your index/pinky.
[5] I just learned that our amphibious ancestors that crawled out of the ocean had eight-fingered flippers that are the origin of our hands. We didn’t get more fingers as we evolved to do more complicated manipulation: we lost fingers. That seems to imply that more than five is too many fingers, but doesn’t really tell us what the lower bound is.