How to Pick a Problem

So You Want To Do Robots: Part 7

Mar 11, 2023

About this series

I’ve been working on general purpose robots with Everyday Robots for 8 years, and was the engineering lead of the product/applications group until we were impacted[1] by the recent Alphabet layoffs. This series is an attempt to share almost a decade of lessons learned so you can get a head start making robots that live and work among us. Previous posts live here.

What to do first?

Alright, you’re going to make a general purpose robot. Eventually it’s going to do all kinds of things. But right now it doesn’t do anything. What do you try first? In this post we’ll talk about the various pitfalls to avoid when choosing your first tasks.

But first I want to show you the scariest Venn diagram in robotics:

Your most important job is to try and push those two circles together. You are trying to create quantitative value so you can escape Pilot Purgatory. If you are an engineer, you are going to have a tendency to laser focus on the “things robots do” circle. And you absolutely need to do work on that, but how much you need to expand it, and therefore how many years it will take, varies greatly depending on what task you are aiming for.

Our first task is going to be folding fitted sheets one handed. We hope to go to market in 2056.

Butlering and in-home elder-care are traps

Folks new to robotics tend to get really excited about in-home “robot butler” applications, but this is a trap.

“With only $5B in investment we can completely disrupt the in-home Butlering industry”

Not only is this hard because you have to create a new product area in a really price sensitive market,[2] but the tech is also stupid hard. People’s homes are all very different from each other, and they tend to have tight spaces which make navigation hard. And homes are full of unique objects.

Robot, dust my international folk mask collection, please.

Commercial spaces, on the other hand, are largely ADA compliant. And they tend to be filled with dozens of the same chairs/tables/desks, which makes specializing your ML models to the environment way easier.

The second trap is to pivot from butlering to elder care. This feels really good. You get to be doing robots[3] and also making the world a better place. More people get to age at home, which is great, right? The problem is homes are still too hard, plus the risk and cost of failure get way higher, which means you need a very, very reliable stack to be viable.

I’ve fallen and I can’t get up

It's not that the problems are completely insurmountable, but my prediction is that if you could solve all those technical problems in 10 years, you could have gone to market and made $100B doing drudgery in commercial spaces in five years.[4]

So what should you do? There are two sweet spots for projects for an early stage general-purpose robot company: quick prototyping (which should take you a few days) and real people’s problems (which take weeks to months). Both of these things are worthwhile.

The value of play

Quick prototyping is really rewarding because it's just plain fun to make robots do things. And it is often very possible to make something that works pretty well in a day or two, which feels really good. At Everyday Robots my team took one day a week as a free-form “hack day” to see new ways that we could push the robot stack to do cool things.

By lowering the stakes, you encourage people to try crazier things and follow their intuition even if they can’t articulate why something is exciting. Since no one knows exactly how we’re going to solve the dexterity problem, encouraging exploration is a good thing. The best way to overcome a naysayer’s “that's not possible yet” is to show it happening. Also, doing many things quickly helps you find the edges and rough spots of your APIs. We designed some of my favorite APIs and capabilities because of things we learned in our hack days.

The other thing “hack days” do is avoid a particular kind of tech debt. At one point we were working on a general-purpose API and had built what we thought was a general purpose “grasp” capability. But we were developing and testing it in just one application. Even though it worked really well there, each week during hack day we’d find ourselves wanting to grasp something else and realize that the “grasp” we had didn’t quite work right. We'd accidentally built that particular application all through our tech stack, including places that were supposed to be “general purpose.”[5] By paying more attention to hack days you can save months of polishing APIs that don’t need to exist.

The value of value

The other thing that is worth doing is solving real people’s problems with robots. After all, that is the point. Folks have written a ton about the importance of shipping an MVP to derisk your product so I’ll focus on something unique to robots: to solve real problems you also have to figure out how to make things reliable.[6] No one goes to a car wash that sometimes only washes half your car. Quick prototyping doesn’t tell you anything about the reliability of the things you could build, and it's easy to get a video of something that almost never works as long as you try enough times.[7] You do need to make something that works often enough that it's worth using.

This is going to be much more expensive than quick prototyping because it is a ton of work to make anything reliable, and also because as you get more reliable you need to be doing more and more of the task (and therefore need more and more robots) to even measure how well you’re doing. We had several robots wiping a total of more than 100 tables a day across 4 cafes and it still took a week or two to measure if a change made things better or worse.[8] That is why it's super important to take time to understand your customer’s problem: investing a ton of effort and struggle to slowly make the wrong product reliable is a major bummer.
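To get a feel for why measuring gets slower as reliability goes up, here is a rough sketch using the common normal-approximation sample-size formula for comparing two success rates. The success rates, significance level, and power are made-up assumptions for illustration, not measurements from the cafes described above, and `trials_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def trials_needed(p_old: float, p_new: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate trials per variant to tell success rate p_old from p_new.

    Normal-approximation formula for a two-sided comparison of two
    proportions. All inputs are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_avg = (p_old + p_new) / 2
    # Spread if both variants really shared the same (average) success rate.
    spread_null = math.sqrt(2 * p_avg * (1 - p_avg))
    # Spread if the two variants really differ as hypothesized.
    spread_alt = math.sqrt(p_old * (1 - p_old) + p_new * (1 - p_new))
    n = ((z_alpha * spread_null + z_power * spread_alt) / (p_new - p_old)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Two hypothetical changes that each halve the failure rate.
    for p_old, p_new in [(0.90, 0.95), (0.99, 0.995)]:
        n = trials_needed(p_old, p_new)
        print(f"{p_old:.1%} -> {p_new:.1%}: about {n} trials per variant")
```

Under these assumptions, confirming that you halved the failure rate takes roughly ten times as many trials when you start at 99% as when you start at 90%, so the same fleet needs far longer to tell whether a change helped.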

But reliability is the important tech risk here, because that's the part no one has been able to do yet.

Fake problems & lab benchmarks are traps

It's tempting to see that ‘high reliability’ is important and try to work on reliability with a toy problem or lab benchmark.[9] This might be a reasonable choice for research, but I think it's a bad idea for a startup. Toy problems and lab benchmarks are nice because you can control things. You can do them over and over (to measure how you’re doing) and you can probably put them near your desk (which is really convenient).

Yes, we need an industrial pizza oven placed between our desks. Can you handle the electrical and venting needs? Thanks!

In my experience the hard part of reliability is never what you expect it to be. For trash sorting we thought that the hard part would be “grasp success”, so we built a lab space to measure grasp reliability. Unfortunately, it turns out that the hard part about sorting trash is that people already mostly put trash in the right spots, so it becomes a needle-in-a-haystack problem of finding the trash that is mis-sorted. And if your grasp has a 1-in-10 chance of grabbing the wrong thing (like the cup next to the can you wanted) but things are 95% sorted in general you can actually still make things worse with 90% accuracy. That's something we only learned by running our robots in the wild.
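A back-of-the-envelope version of that base-rate effect, as a sketch only: it assumes the robot handles every item once and gets each item right with a fixed probability, which is far simpler than a real sorting pipeline. The function name and the 100-item bin are made up for illustration.

```python
def missorted_after_pass(n_items: int, sorted_fraction: float, accuracy: float) -> float:
    """Expected mis-sorted items after the robot handles every item once.

    Simplifying assumption: each item ends up in the right bin with
    probability `accuracy`, regardless of where it started.
    """
    right_before = n_items * sorted_fraction
    wrong_before = n_items - right_before
    still_wrong = wrong_before * (1 - accuracy)  # mis-sorted items the robot fails to fix
    newly_wrong = right_before * (1 - accuracy)  # correct items the robot moves to the wrong bin
    return still_wrong + newly_wrong


if __name__ == "__main__":
    n_items, sorted_fraction, accuracy = 100, 0.95, 0.90
    before = n_items * (1 - sorted_fraction)
    after = missorted_after_pass(n_items, sorted_fraction, accuracy)
    print(f"mis-sorted before: {before:.1f}, after: {after:.1f}")
```

In this toy model the bin goes from about 5 mis-sorted items to about 10, and the robot only helps once its accuracy is higher than the fraction of items people had already sorted correctly.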

The underlying problem is that it's shockingly hard to define good metrics. And if you have a fake problem, like delivering snacks or something, then you are doomed to never be able to tell if you are doing it well. What's important about delivering snacks? Time to deliver? Successfully making small talk without errors? Snacks delivered per hour? If you are working on a toy problem (one nobody actually cares about) then you can’t know if a metric is more important than another. On the other hand, if you have a real problem then you can talk to the people who care about the task and they will tell you what actually matters. And as a corollary I would not work on a problem where I didn’t have a good customer/partner who cared and understood the problem really well and could tell me what mattered. You want someone who is engaged, not supportive. When we worked with a supportive customer they told us everything was fine and that was completely unhelpful, whereas a good engaged customer/partner can tell you what you’re doing right and wrong.

Think about your autonomy curve

The other thing to think about when choosing a task is what your autonomy curve looks like. The poor souls working on self driving cars have an absolutely brutal autonomy curve that looks like this:

Self driving car autonomy curve

Self driving cars have almost no value at all until you get to 99.9% or more autonomy. If you have to be ready to jump in at any second it's not actually much better than just driving. Getting to 99.9% autonomy means you need 99.9% reliability on 99.9% of scenarios[10] and that's frikken[11] expensive. That's why autonomous cars have been such a slog: they have to solve everything before they go to market.

On the other hand, if you can find a place where three or four humans are doing a job and you can add a robot that does the most repetitive 30%-40% of that, then the humans can cover for the robot when it hits a weird edge case and you can still save them time. That value curve looks more like this:

Robot-on-a-team autonomy curve

Being on that value curve is fantastic because you get partial credit for partial work. If you are trying to replace all the people in the front-of-house at a restaurant there is a near-infinite long tail of things to do, and you’re back in the realm of AGI.[12]

Hey, Waiterbot, can you put this large ice cream cake in a deep freezer until after dinner?

But if your robots can deliver food to the table while working with the humans to do the rest, you probably have a real startup on your hands. In general, you automate tasks, not jobs, so the trick is to automate a task that saves a team time and money.
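A toy version of the two autonomy curves from this section, to make the contrast concrete. The 99.9% threshold comes from the text; the 35% task share, the linear shape, and both function names are assumptions chosen for illustration, and the team model ignores the time humans spend covering for the robot.

```python
def all_or_nothing_value(autonomy: float, threshold: float = 0.999) -> float:
    """Self-driving-style curve: no value until autonomy clears a very high bar."""
    return 1.0 if autonomy >= threshold else 0.0


def teammate_value(autonomy: float, task_share: float = 0.35) -> float:
    """Robot-on-a-team curve: humans cover failures, so value grows with autonomy.

    Returns the fraction of the team's work the robot takes over, assuming it
    attempts `task_share` of the work and succeeds `autonomy` of the time.
    """
    return task_share * autonomy


if __name__ == "__main__":
    for autonomy in (0.5, 0.8, 0.95, 0.999):
        print(f"autonomy {autonomy:.1%}: "
              f"all-or-nothing {all_or_nothing_value(autonomy):.2f}, "
              f"teammate {teammate_value(autonomy):.2f}")
```

Even at 80% autonomy the teammate curve is already returning value, while the all-or-nothing curve stays at zero until the very last row.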

TLDR: Make sure someone cares

When you do a quick prototype or hackathon you are the person that cares. You are having fun and you can evaluate how well things work relative to what you wanted to do. If you have a real problem you can tell if you are doing well because some customer is either thrilled or politely supportive. If you work on a toy problem for a long time you will stop caring about it because it will inevitably become a slog. And no one else will care about it either and it will suck. And if you solve a real problem you might have a business, but if you solve a toy problem all you have is a toy.

If you liked this, you might enjoy part 7: keeping the code clean.

Thanks to Sara Ahmadi for reading an early draft of this.

[1] 3-2-1 Impact!

[2] If you want a reality check on the home market, figure out what your wildest-dream cheapest BOM cost is (Bill Of Materials: the stuff you need to buy to build each robot) and then ask some family what they would spend for a robot that works. My wife says things like, “Ooh, if you could make a robot that can do dishes and laundry I bet people would totally pay a few hundred dollars for it.” And every robot I’ve ever worked on has been targeting a $5k sale price but actually cost more than $10k to build.

[3] And doing robots is hella sweet, obvi.

[4] And if you go for the 10 year plan, there is nothing to say that the folks doing the 5 year plan don’t also beat you to elder care with their new ginormous robot company, mature tech stack, mountains of ML data and army of engineers.

[5] This mistake is worse than it sounds because you lose in two ways: 1) You move slower because you think you are writing “generic capability” code not “throwaway application code” so you take more time to do a better job. 2) You do a worse job at the problem you are trying to solve because you ignore some assumptions you could be making because you are trying to make it more “general purpose”. Slow progress towards mediocre performance. Ugh!

[6] I mean, other products need to be reliable too, but it is much harder to make robot applications reliable than with other kinds of products.

[8] Unless it made it way worse. That’s easy to tell.

[9] By “toy problem” or “lab benchmark” I mean something that you pick for your robots to do that is similarish to someone’s real problem but simpler and more controlled than real reality. Rule of thumb: if you haven’t gone to a place and watched a human do a job for a few hours you are working on the ‘toy’ version of that problem.

[10] Math pedants in the audience might be thinking that each should be more than that because aren’t they multiplicative? Even more intense math pedants are thinking that it's probably a more complicated relationship than simple multiplication. But like, this is a blog with cartoons. Take your fancy math back to your brownbag ML reading groups or whatever.

[11] Please interpret the word frikken as “very” or “fucking” to your preference.

[12] Artificial General Intelligence (aka magic)
