Nvidia research shows robots that train themselves through AI coding agents

Originally reported by The Decoder

"Robots learn tricky tasks with unprecedented speed and accuracy, but real-world challenges persist. Human involvement still required."

Nvidia researchers in Pittsburgh achieved a breakthrough in robot training. They used AI coding agents to teach robots dexterous grasping, a notoriously difficult task. The project, called ENPIRE, aims to reduce human involvement in the training process. By automating both the collection of training data and the tuning of algorithms, the researchers hope to accelerate the development of more sophisticated robots.

The ENPIRE project involves a fleet of eight dual-arm YAM robot stations, each equipped with its own hardware, computer, and coding agent. The agents work in two phases: the first phase requires some human feedback to set up the working environment, including safety boundaries and automated success checking. In this phase, the agent develops a reward function to evaluate its own performance, using example videos of successful and failed attempts. For instance, when learning to insert pins, the agent combined visual alignment, gripper height, and estimated force to determine success.
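To make the pin-insertion example concrete, here is a minimal sketch of a success check that combines the three signals mentioned above. The field names, units, and thresholds are hypothetical placeholders chosen for illustration; the article does not describe how the agent's actual reward function is written.

```python
from dataclasses import dataclass


@dataclass
class PinObservation:
    """Hypothetical per-attempt measurements; names and units are illustrative."""
    alignment_error_px: float  # visual offset between pin and hole, in pixels
    gripper_height_mm: float   # gripper height above the board, in millimetres
    estimated_force_n: float   # estimated insertion force, in newtons


def pin_inserted(
    obs: PinObservation,
    max_alignment_px: float = 5.0,   # placeholder threshold, not from the project
    max_height_mm: float = 12.0,     # placeholder threshold, not from the project
    min_force_n: float = 1.0,        # placeholder threshold, not from the project
    max_force_n: float = 15.0,       # placeholder threshold, not from the project
) -> bool:
    """Return True only if all three signals agree that the pin is seated."""
    aligned = obs.alignment_error_px <= max_alignment_px
    low_enough = obs.gripper_height_mm <= max_height_mm
    force_ok = min_force_n <= obs.estimated_force_n <= max_force_n
    return aligned and low_enough and force_ok


def reward(obs: PinObservation) -> float:
    """Sparse reward: 1.0 for a successful insertion, 0.0 otherwise."""
    return 1.0 if pin_inserted(obs) else 0.0
```

Requiring all three signals to agree is one plausible design choice: a pin that looks aligned in the camera image but sits too high, or was pushed with implausible force, would not count as a success.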

In the second phase, the agent works entirely on its own, reading research papers, forming hypotheses, and editing the training code directly. The agent selects the most effective method, such as behavior cloning or reinforcement learning, based on real-world success signals. This autonomous approach enables the agents to test different hypotheses simultaneously and share results through Git, a standard version control tool. Successful training recipes are adopted, while ineffective ones are discarded.
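The adopt-or-discard step can be pictured roughly as follows. This is a sketch under stated assumptions, not the project's code: the `Trial` record, the `select_recipe` function, and the minimum-attempts rule are all invented here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Hypothetical record of one training recipe evaluated on a real robot."""
    recipe: str     # e.g. "behavior_cloning" or "reinforcement_learning"
    successes: int  # number of successful real-world attempts
    attempts: int   # total real-world attempts

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


def select_recipe(baseline: Trial, candidates: list[Trial],
                  min_attempts: int = 10) -> Trial:
    """Keep the baseline unless a sufficiently tested candidate beats it.

    min_attempts is an assumed guard against adopting a recipe that
    merely got lucky on a handful of attempts.
    """
    best = baseline
    for trial in candidates:
        if trial.attempts < min_attempts:
            continue  # too little evidence; treat as not yet evaluated
        if trial.success_rate > best.success_rate:
            best = trial
    return best
```

In the setup the article describes, the adopted recipe would then be shared with the other agents through Git, and the discarded ones would not be carried forward.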

The results are impressive: the agents achieved up to 99 percent success on demanding tasks like the Push-T test, sorting pins into a box, and cutting a cable tie with a cutter. In some cases, the agents reached a 100 percent success rate faster than comparable human-in-the-loop methods. The researchers also found that scaling up the number of agents significantly reduced the time needed to reach full success. For example, going from one to eight agents cut the time to solve the Push-T test from about five hours to about two.
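The scaling figures can be put in perspective with a little arithmetic. The sketch below uses only the rounded numbers quoted above (about five hours with one agent, about two with eight), so the results are approximate; the function names are illustrative.

```python
def speedup(t_single_h: float, t_parallel_h: float) -> float:
    """How many times faster the multi-agent run finished."""
    return t_single_h / t_parallel_h


def parallel_efficiency(t_single_h: float, t_parallel_h: float,
                        n_agents: int) -> float:
    """Speedup divided by agent count; 1.0 would mean perfectly linear scaling."""
    return speedup(t_single_h, t_parallel_h) / n_agents


# Rounded figures from the article: ~5 hours with 1 agent, ~2 hours with 8.
push_t_speedup = speedup(5.0, 2.0)                    # 2.5
push_t_efficiency = parallel_efficiency(5.0, 2.0, 8)  # 0.3125
```

In other words, eight agents finished roughly 2.5 times faster than one, which is a clear gain but well short of an eightfold speedup.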

The researchers tested three current coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6. Codex performed best in most cases, but the results also highlighted the challenges of transferring skills from simulation to the real world. While all three agents solved the Push-T test in simulation, two out of three failed in the real environment. The researchers attribute this to unpredictable and variable conditions like robot dynamics, friction, and object movement.

The ENPIRE project has significant implications for the development of more advanced robots. By automating the training process, researchers can focus on higher-level tasks, such as designing more sophisticated robot architectures or developing new applications. The project also demonstrates the potential of AI coding agents to accelerate the development of robotics and other fields. As the researchers note, the real world is still far harder than simulation, but the ENPIRE project shows that with the right approach, robots can learn complex tasks with unprecedented speed and accuracy.

One of the key challenges in robotics is the transfer of skills from simulation to the real world. The ENPIRE project highlights the difficulties of this transfer, but also suggests a possible way forward. Because the agents train against real-world success signals rather than simulated ones alone, automating the training process may help narrow the gap between simulation and reality. The project also demonstrates the importance of feedback loops, in which the agent evaluates its own performance and adjusts its strategy accordingly.

The researchers propose two metrics to evaluate the efficiency of the ENPIRE project: Mean Robot Utilization (MRU) and Mean Token Utilization (MTU). MRU tracks how much of the research time the robot actually spends working, while MTU measures language model usage per minute. Together, these metrics provide a way to measure how efficiently the system uses its robots and its language models, and to identify areas for improvement.
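One plausible reading of these two metrics is sketched below. The exact formulas are an assumption: the article gives only the one-line descriptions above, so this treats MRU as the fraction of session time the robot is active and MTU as language model tokens consumed per minute of session time.

```python
def mean_robot_utilization(robot_active_s: float, session_s: float) -> float:
    """Assumed definition: fraction of the session the robot spent working."""
    if session_s <= 0:
        raise ValueError("session_s must be positive")
    return robot_active_s / session_s


def mean_token_utilization(total_tokens: int, session_s: float) -> float:
    """Assumed definition: language model tokens used per minute of session."""
    if session_s <= 0:
        raise ValueError("session_s must be positive")
    return total_tokens / (session_s / 60.0)


# Illustrative numbers only, not measurements from the project:
# a one-hour session with 30 minutes of robot activity and 120,000 tokens.
example_mru = mean_robot_utilization(1800.0, 3600.0)   # 0.5
example_mtu = mean_token_utilization(120_000, 3600.0)  # 2000.0 tokens per minute
```

Under this reading, a low MRU would indicate that the robot sits idle while the agent reads logs or writes code.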

The ENPIRE project is not without its limitations, however. The researchers note that the agents spend a significant amount of time reading logs, writing code, and debugging, which reduces the overall efficiency of the system. Additionally, the project requires significant computational resources, which can be a bottleneck for larger-scale deployments.

Despite these limitations, the ENPIRE project represents a significant breakthrough in robot training. By leveraging AI coding agents, researchers can automate the training process, reduce human involvement, and accelerate the development of more advanced robots. The project has far-reaching implications for fields like manufacturing, logistics, and healthcare, where robots are increasingly being used to perform complex tasks. As the researchers continue to refine the ENPIRE project, we can expect to see even more impressive results in the future.