RoboTurk

A Crowdsourcing Platform for Robotic Skill Learning through Imitation



Welcome to the RoboTurk Platform!

Imitation learning has enabled recent advances in learning robotic manipulation tasks, but progress has been limited by the scarcity of high-quality demonstrations to learn from.

RoboTurk is a system that helps solve this problem by enabling the rapid crowdsourcing of high-quality demonstrations. This allows the creation of large datasets for manipulation tasks, which we show improve the quality of imitation learning policies.

RoboTurk is a research project and dataset from the Stanford Vision Lab. The research paper and sample data are available publicly.

Use RoboTurk from anywhere. Even the top of a mountain.

RoboTurk in action.

System Overview

RoboTurk is designed to be modular and enables quick imitation-guided skill learning. All you need to do is:

  1. Create a task using your favorite physics simulator (see the sketch after this list)
  2. Deploy on the cloud to gather a large-scale dataset
  3. Use demonstration guided RL to learn a policy
  4. Deploy the policy on real robots
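
As a concrete (and heavily simplified) illustration of step 1, here is a minimal sketch of a MuJoCo-backed task behind a gym-style interface. The model file, body names, and reward threshold are hypothetical stand-ins rather than the actual RoboTurk task code, and the example assumes the official mujoco Python bindings.

import numpy as np
import mujoco  # official MuJoCo Python bindings

class PickingTask:
    """Toy gym-style wrapper around a MuJoCo model (illustrative only)."""

    def __init__(self, model_path="picking.xml"):  # hypothetical model file
        self.model = mujoco.MjModel.from_xml_path(model_path)
        self.data = mujoco.MjData(self.model)

    def reset(self):
        mujoco.mj_resetData(self.model, self.data)
        return self._observation()

    def step(self, action):
        self.data.ctrl[:] = action              # apply actuator commands
        mujoco.mj_step(self.model, self.data)
        reward = self._reward()
        done = reward >= 1.0                    # sparse success signal
        return self._observation(), reward, done

    def _observation(self):
        return np.concatenate([self.data.qpos, self.data.qvel])

    def _reward(self):
        # Success when the object sits inside its target bin; the body
        # names and threshold are placeholders, not the real task logic.
        obj = self.data.body("object").xpos
        target = self.data.body("bin").xpos
        return float(np.linalg.norm(obj - target) < 0.05)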

Behind the scenes, every time a user connects to the RoboTurk platform:

  1. A low-latency connection is made to the user
  2. The user controls the robot with their phone, with real-time video feedback
  3. Every demonstration is stored in the cloud (see the recording sketch after this list)
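
To make step 3 concrete, below is a hedged sketch of recording one teleoperated episode and persisting it for upload. The HDF5 layout, the teleop_stream iterable, and the env interface (matching the sketch above) are illustrative assumptions, not RoboTurk's actual storage schema.

import h5py
import numpy as np

def record_demonstration(env, teleop_stream, out_path="demo.hdf5"):
    """Roll out one user-controlled episode and save it to disk."""
    states, actions = [], []
    obs, done = env.reset(), False
    for action in teleop_stream:        # actions streamed from the phone
        states.append(obs)
        actions.append(action)
        obs, reward, done = env.step(action)
        if done:                        # stop once the task is solved
            break
    with h5py.File(out_path, "w") as f:
        f.create_dataset("states", data=np.array(states))
        f.create_dataset("actions", data=np.array(actions))
        f.attrs["success"] = bool(done)  # label for later filtering
    return out_path                      # this file is what gets uploaded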

Task Design for RoboTurk: Maximizing Flexibility

The "picking" task

The "assembly" task

RoboTurk in the real world

We designed and used two simulated environments made with MuJoCo: a "picking" task, where users are asked to sort objects on one side into corresponding bins on the other, and an "assembly" task, where users are asked to put a "nut" on the correct peg. However, RoboTurk is not constrained to a specific simulator or environment, or even to a simulator at all. All you need in order to gather data in the real world is a few robots.

The RoboTurk Dataset: By the Numbers

137.5 Hours of Demonstrations
4 Days To Collect
22 Hours of System Usage
1071 Successful Picking Demonstrations
1147 Successful Assembly Demonstrations
3224 Attempted Demonstrations

Using RoboTurk, we collected a dataset of demonstrations on the two tasks that we created. We used the same annotators over the 4-day span it took to gather this dataset, and the rate at which they produced successful demonstrations increased over time as they became more adept at using the system.

System Characterization

User Interface Comparison

We tested RoboTurk using 4 different user interfaces (UIs), which allow different types of movement.

Based on task completion times, the VR and phone interfaces perform comparably, while the phone is significantly more widely accessible.

Worldwide Operation

We tested RoboTurk by controlling robot simulations hosted on servers in China from our lab in California, a distance of over 5900 miles!

We found that it is possible to collect quality demonstrations using RoboTurk regardless of the distance between user and server.

Learning Results

The results we present use the full dataset of demonstrations gathered with RoboTurk. We learn policies using a combination of reinforcement learning and imitation learning guided by the collected demonstrations. Specifically, policies were trained with a distributed version of PPO (DPPO) from Surreal, where the initial state of each episode is sampled from along the trajectories in the dataset. This helps overcome the exploration problem for difficult tasks.
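
The mechanism that makes this work, resetting episodes to states drawn from along the demonstration trajectories so that exploration begins near useful task progress, can be sketched roughly as follows. The env.set_state hook is an assumption for illustration, and the PPO update itself is elided.

import random

def reset_from_demonstration(env, demos):
    """Reset env to a state sampled from along a recorded demonstration.

    demos is a list of trajectories, each a list of simulator states.
    """
    demo = random.choice(demos)        # pick a demonstration trajectory
    state = random.choice(demo)        # pick a point along it
    env.reset()
    env.set_state(state)               # assumed hook to load a sim state

# Inside the (D)PPO training loop, this replaces the plain env.reset():
#   reset_from_demonstration(env, demos)
#   ... collect rollout, compute advantages, apply the PPO update ...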

Qualitative Results

An example learned policy on a simplified version of the "picking" task that we call "can picking".

An example learned policy on a simplified version of the "assembly" task that we call "round assembly".

Quantitative Results

Number of Demonstrations

Task              |  0  |    1    |   10    |   100   |  1000
Bin Picking (Can) | 0±0 | 278±351 | 273±417 | 385±466 | 641±421
Assembly (Round)  | 0±0 | 381±467 | 663±435 | 575±470 | 775±388

This table shows that policies trained with increasing numbers of demonstrations perform better and more consistently.

It reports policy performance after 24 hours of training for the can picking task and 48 hours for the round assembly task. Note that the maximum possible return is 1000, which corresponds to instantaneous solving of the task. It is clear that using 1000 demonstrations for each task (nearly all of the RoboTurk dataset) performs best on both tasks.

Meet the Team

Ajay Mandlekar

Ph.D. Student, EE

amandlek[at]stanford[dot]edu

Yuke Zhu

Ph.D. Student, CS

yukez[at]stanford[dot]edu

Animesh Garg

Postdoctoral Researcher

garg[at]cs.stanford[dot]edu

Jonathan Booher

Undergraduate Student, CS

jaustinb[at]stanford[dot]edu


Max Spero

Master's Student, CS

maxspero[at]stanford[dot]edu

Albert Tung

Undergraduate Student, CS

atung3[at]stanford[dot]edu

Julian Gao

Master's Student, CS


John Emmons

Ph.D. Student, CS

jemmons[at]stanford[dot]edu


Anchit Gupta

Master's Student, CS

anchitg[at]stanford[dot]edu

Emre Orbay

Master's Student, CS

eorbay[at]stanford[dot]edu

Silvio Savarese

Associate Professor, CS

silvio[at]cs.stanford[dot]edu

Fei-Fei Li

Professor, CS

feifeili[at]cs.stanford[dot]edu

Frequently Asked Questions

What do users need in order to use RoboTurk?

All a user needs to do is:

  1. Open a web browser
  2. Get our app on their iPhone 6s or above
  3. Log onto RoboTurk and begin controlling a robot arm in real time with their phone

The video shows just how easy it is to interact with RoboTurk to generate a demonstration.

Where can I get the data?

The data is available here.

How do I cite the dataset or the work?

@inproceedings{mandlekar2018roboturk,
  title={RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation},
  author={Mandlekar, Ajay and Zhu, Yuke and Garg, Animesh and Booher, Jonathan and Spero, Max and Tung, Albert and Gao, Julian and Emmons, John and Gupta, Anchit and Orbay, Emre and Savarese, Silvio and Fei-Fei, Li},
  booktitle={Conference on Robot Learning},
  year={2018}
}