
INTRODUCING THE FIRST AGI-CAPABLE MODEL
We are thrilled to unveil the first AGI-capable model—a breakthrough in machine intelligence that autonomously learns new skills safely, efficiently, and reliably. This milestone addresses the inherent limitations of current AI systems and establishes a scalable framework for achieving true generality and superintelligence.
The Rigorous Definition of AGI
The term AGI is often overused without precision.
To bring clarity, we define AGI as a system that fulfills three core criteria:
- Autonomous Skill Learning: The model must independently teach itself new skills in novel domains, without relying on pre-existing datasets or human intervention.
- Safe and Reliable Mastery: It must learn without unintended side effects or catastrophic failures. For example, a kitchen robot learning to cook must not cause a fire during training.
- Energy Efficiency: The total energy cost of learning must be comparable to, or less than, that of a human mastering the same skill.
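As a toy illustration of these three criteria, the sketch below encodes them as a checklist over a single skill-acquisition run. The SkillRun fields, the kWh units, and the tolerance factor are our illustrative assumptions, not part of the definition itself:

```python
from dataclasses import dataclass

@dataclass
class SkillRun:
    learned_autonomously: bool  # no pre-existing datasets or human intervention
    incidents: int              # unintended side effects during training
    energy_kwh: float           # total energy the model spent learning the skill
    human_energy_kwh: float     # energy a human needs to master the same skill

def meets_agi_criteria(run: SkillRun, tolerance: float = 1.0) -> bool:
    """True only if all three criteria hold for this skill acquisition."""
    autonomous = run.learned_autonomously
    safe = run.incidents == 0
    efficient = run.energy_kwh <= tolerance * run.human_energy_kwh
    return autonomous and safe and efficient

# a run that teaches itself safely at below-human energy cost passes
print(meets_agi_criteria(SkillRun(True, 0, 80.0, 100.0)))  # True
```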
This definition transcends technical milestones: it is foundational for human emancipation. Existing approaches often depend on massive datasets generated through human labor, perpetuating a dystopian reliance on "data farms." True AGI liberates humanity from such constraints, empowering societies rather than exploiting them.
In contrast, our architecture integrates growth, abstraction, and action, enabling scalable, reliable, and safe intelligence.

Towards Superintelligence
Superintelligence can be defined asymptotically: a system is superintelligent when human collaboration no longer enhances its performance. While domain-specific superintelligence (e.g., in chess) has been achieved, general-purpose superintelligence requires AGI systems capable of mastering all tasks autonomously and efficiently, outperforming any human-machine collaboration. Our roadmap to superintelligence is built on three steps:

- Universal Simulators: Models that generate abstract, multimodal representations of any environment or system.
- Universal Operators: Agents capable of planning, executing, and actively learning within both digital and physical domains.
- Scaling to Superintelligence: Optimizing for a Universal Objective while integrating diverse human preferences and needs.
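Returning to the asymptotic definition above, a rough formalization (our own notation, with perf an abstract scalar performance measure) reads:

```latex
% M is (asymptotically) superintelligent when, for every task t, pairing it
% with any human collaborator h cannot improve its performance:
\forall t\ \forall h:\quad \operatorname{perf}(M \oplus h,\, t) \;\le\; \operatorname{perf}(M,\, t)
```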
STEP 1: UNIVERSAL SIMULATORS
Current AI systems function as black boxes, mapping inputs to outputs without explicit abstractions or a coherent understanding of the underlying world. These systems conflate memorization and generalization, leading to:
- Inefficiency: Lacking structured abstractions, models depend on brute-force optimization, consuming vast computational resources while remaining vulnerable to local minima.
- Brittleness: The absence of interpretable representations makes them fragile and error-prone in novel scenarios.
Our Approach:
Universal Simulators are a paradigm shift, designed to create explicit, hierarchical abstractions that mirror the human neocortex.
01
MULTIMODAL AND EMBODIED
Simulators integrate data from diverse modalities—vision, language, audio, and physical sensors—producing unified world models that generalize across domains.
02
HIERARCHICAL ABSTRACTIONS
By compressing and structuring sensory data recursively, simulators build layered representations of reality, enabling high-level reasoning and prediction.
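A minimal sketch of this recursive compression idea, assuming a PyTorch-style stack of autoencoders. The layer sizes and the fused 1024-dimensional sensory input are illustrative stand-ins, not the actual architecture:

```python
import torch
import torch.nn as nn

class AbstractionLevel(nn.Module):
    """One level of the hierarchy: compress input, keep a decoder for training."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.decode = nn.Linear(out_dim, in_dim)  # reconstruction head

    def forward(self, x):
        z = self.encode(x)
        return z, self.decode(z)

# each level abstracts over the representation produced by the one below it
levels = [AbstractionLevel(1024, 256),
          AbstractionLevel(256, 64),
          AbstractionLevel(64, 16)]

x = torch.randn(8, 1024)  # stand-in for fused multimodal sensory features
for level in levels:
    z, recon = level(x)   # recon drives a reconstruction loss during training
    x = z                 # pass the compressed code up the hierarchy
```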
03
SCALABLE GROWTH
Unlike static systems, Universal Simulators grow dynamically through:
Lifelong Learning: Retaining and refining knowledge without catastrophic forgetting.
Gradual Expansion: Increasing parameter count, context length, and modality coverage as needed.
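One way to realize gradual expansion is function-aware widening in the style of Net2Net; the sketch below grows a single linear layer while reusing its trained weights. It is a simplification: exact function preservation would also require rescaling the layer that consumes this one's output.

```python
import torch
import torch.nn as nn

def widen_linear(layer: nn.Linear, new_out: int) -> nn.Linear:
    """Grow a layer's output width, reusing the trained weights."""
    old_out, in_dim = layer.out_features, layer.in_features
    wider = nn.Linear(in_dim, new_out)
    with torch.no_grad():
        wider.weight[:old_out] = layer.weight        # keep what was learned
        wider.bias[:old_out] = layer.bias
        idx = torch.randint(0, old_out, (new_out - old_out,))
        wider.weight[old_out:] = layer.weight[idx]   # seed new units from old ones
        wider.bias[old_out:] = layer.bias[idx]
    return wider

layer = nn.Linear(64, 16)
layer = widen_linear(layer, 32)  # expand capacity as new demands arrive
print(layer.weight.shape)        # torch.Size([32, 64])
```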
These capabilities make Universal Simulators the first true world models, capable of understanding, reasoning about, and predicting complex systems autonomously.
STEP 2: UNIVERSAL OPERATORS
While Universal Simulators provide the foundational understanding, AGI requires action and tool use to translate intelligence into real-world utility. Universal Operators extend simulators by enabling planning, execution, and continuous learning.
01
EFFICIENT PLANNING
Traditional AI models rely on inference-time scaling, which becomes exponentially expensive for complex tasks. Universal Operators instead leverage abstractions from simulators to plan efficiently at higher levels of granularity. For example, they plan goals and subgoals, refining details only when necessary—avoiding the intractability of low-level planning (e.g., planning a trip at the muscle-movement level).
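A toy sketch of planning at the level of goals and subgoals; the trip decomposition and helper names are hypothetical, chosen only to mirror the example above:

```python
def plan(goal, decompose, is_primitive):
    """Recursively refine an abstract goal into primitive actions."""
    if is_primitive(goal):
        return [goal]
    steps = []
    for subgoal in decompose(goal):
        steps.extend(plan(subgoal, decompose, is_primitive))
    return steps

# plan a trip at the level of goals, never at the muscle-movement level
decomposition = {
    "take_trip": ["book_flight", "pack_bags", "get_to_airport"],
    "get_to_airport": ["call_taxi", "ride_to_airport"],
}
actions = plan("take_trip",
               decompose=lambda g: decomposition[g],
               is_primitive=lambda g: g not in decomposition)
print(actions)  # ['book_flight', 'pack_bags', 'call_taxi', 'ride_to_airport']
```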
02
TOOL USE
Universal Operators seamlessly interact with tools—both digital APIs and physical robots—to achieve goals. Key features include:
Using Existing Tools: Operators integrate with APIs, robotic systems, and other services to act on user goals.
Creating New Tools: When existing tools are insufficient, operators autonomously design and build new ones, extending their capabilities.
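A minimal sketch of this pattern, assuming a registry of callable tools and a synthesize step that stands in for autonomous tool creation (all names here are hypothetical):

```python
# registry of existing tools; real entries would wrap APIs or robot commands
tools = {
    "search": lambda query: f"results for {query!r}",
}

def get_tool(name, synthesize):
    """Return an existing tool, or create, register, and return a new one."""
    if name not in tools:
        tools[name] = synthesize(name)  # e.g., generate and validate new code
    return tools[name]

summarize = get_tool("summarize",
                     synthesize=lambda name: lambda text: text[:80] + "...")
print(summarize("A long report that no existing tool could condense on its own."))
```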
03
ACTIVE LEARNING
Universal Operators combine planning, tool use, and lifelong learning to achieve active learning:
Given a high-level request, the operator devises self-directed experiments to fill knowledge gaps and acquire the necessary skills.
These experiments are conducted safely and efficiently, fulfilling the criteria for AGI.
For example, an operator tasked with discovering a new drug autonomously plans its experiments, uses robots to conduct them safely in the lab, and creates new knowledge by analyzing their results. This knowledge mining process automates the scientific method and acts as a scientific discovery engine.
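A toy sketch of this loop, with a stand-in model that tracks which properties of a drug candidate remain unknown; the property names and helpers are purely illustrative:

```python
class ToyKnowledge:
    """Stand-in for a learned model that knows which gaps remain."""
    def __init__(self):
        self.known = {}

    def biggest_gap(self, goal):
        gaps = [p for p in goal if p not in self.known]
        return gaps[0] if gaps else None

goal = ["binding_affinity", "toxicity", "solubility"]
model = ToyKnowledge()

while (gap := model.biggest_gap(goal)) is not None:
    experiment = f"assay_{gap}"   # self-directed experiment design
    result = f"measured_{gap}"    # stand-in for a safe, robot-run lab assay
    model.known[gap] = result     # mine new knowledge from the outcome

print(model.known)
```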
STEP 3: SCALING TO SUPERINTELLIGENCE
Integration with Civilization
Scaling AGI requires a robust backend and an intuitive front end, akin to the internet's evolution.
Alignment and Open-endedness
Achieving superintelligence requires aligning AGI systems with human values and goals. Our approach integrates a Universal Objective, Freedom, defined as the ideal state of infinite agency and possibility, with local preferences through an Alignment Economy, ensuring AGI serves both global ideals and individual needs.
This integration also creates the conditions for open-endedness: the ability to generate entirely new narratives without human constraint is crucial to achieving true creativity.
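As a toy illustration of combining a universal objective with local preferences, consider the additive scoring sketch below. The weighting scheme is our illustrative assumption, not a specification of the Alignment Economy:

```python
def aligned_utility(state, universal, preferences):
    """Score a state by the universal objective plus weighted local preferences."""
    local = sum(weight * pref(state) for weight, pref in preferences)
    return universal(state) + local

state = {"agency": 0.9, "privacy": 0.7}
score = aligned_utility(
    state,
    universal=lambda s: s["agency"],              # stand-in for "Freedom"
    preferences=[(0.5, lambda s: s["privacy"])],  # one weighted local preference
)
print(score)  # 1.25
```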

First Demonstration
While we recognize that these are early steps in a long journey, we are encouraged by initial demonstrations that test our framework's potential:

- Autonomous Robotics: A robot learning new skills entirely autonomously, adapting to a complex, real-world environment.
- Digital Mastery: Generating novel software and solutions from high-level user instructions, demonstrating creativity and precision.

In our Sokoban experiment, we aim to demonstrate that AI can solve problems significantly faster than humans. Just as it might take someone around ten years to become a grandmaster in chess, AI can reach a professional level in a fraction of that time by leveraging computational power and optimized algorithms.
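To make the setting concrete, here is a deliberately naive brute-force Sokoban solver over a tiny hand-written level. This is only an illustration of the underlying search problem; our models learn abstraction-guided strategies rather than exhaustive search:

```python
from collections import deque

LEVEL = ["#####",
         "#@$.#",
         "#####"]  # '#' wall, '@' player, '$' box, '.' goal

walls, goals, boxes, player = set(), set(), set(), None
for y, row in enumerate(LEVEL):
    for x, c in enumerate(row):
        if c == "#": walls.add((x, y))
        if c == ".": goals.add((x, y))
        if c == "$": boxes.add((x, y))
        if c == "@": player = (x, y)

def solve(player, boxes):
    """Breadth-first search over (player, boxes) states; returns a move string."""
    start = (player, frozenset(boxes))
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (pos, bxs), path = queue.popleft()
        if bxs == goals:
            return path
        for dx, dy, move in ((0, -1, "U"), (0, 1, "D"), (-1, 0, "L"), (1, 0, "R")):
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in walls:
                continue
            nb = set(bxs)
            if nxt in nb:  # walking into a box pushes it
                push = (nxt[0] + dx, nxt[1] + dy)
                if push in walls or push in nb:
                    continue
                nb.remove(nxt)
                nb.add(push)
            state = (nxt, frozenset(nb))
            if state not in seen:
                seen.add(state)
                queue.append((state, path + move))

print(solve(player, boxes))  # 'R': one push puts the box on the goal
```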



EFFICIENT PLANNING
For example, if detecting a smile, the model first examines the mouth, then shifts to the eyes to assess whether the expression is genuine or forced. When identifying physical traits like hair color or facial hair, it directs its attention efficiently, processing key details in a structured sequence. This approach enables faster and more accurate recognition, improving both speed and depth of understanding—starting from simple observations like whether someone is smiling to deeper insights into their emotional state.
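A toy sketch of this sequenced attention, where the region order is planned in advance and processing stops once the evidence is sufficient. The regions, features, and scoring rule are all illustrative:

```python
REGION_ORDER = ["mouth", "eyes"]  # mouth first for a smile, then the eyes

def classify_smile(extract, score):
    """Inspect regions in order; stop as soon as the verdict is confident."""
    evidence, verdict = {}, None
    for region in REGION_ORDER:
        evidence[region] = extract(region)
        verdict, confident = score(evidence)
        if confident:
            return verdict  # skipped the rest of the face
    return verdict

# toy stand-ins for a real feature extractor and classifier
features = {"mouth": "corners_up", "eyes": "crinkled"}
verdict = classify_smile(
    extract=lambda region: features[region],
    score=lambda e: ("genuine smile" if e.get("eyes") == "crinkled"
                     else "smile detected", "eyes" in e),
)
print(verdict)  # 'genuine smile'
```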

By teaching AI to focus on the right details, we're not just improving how it recognizes faces, but also building a foundation for something bigger. When we scale this up, AI could use what it learns to take real-world actions. Imagine a robot that can read emotions and respond appropriately, offering help when someone looks upset or engaging in a more natural conversation. This ability to observe, understand, and act makes AI more than a passive tool; it becomes something that can interact with the world in a meaningful way.

3D AGI Demonstration
From 2D puzzles and observations, we have trained our models to step into full 3D environments: they learn not just to look, but to plan their next move, navigating 3D spaces to find answers and predict outcomes reliably.

The models are trained to navigate and solve problems in small 2D and 3D environments, developing essential skills like memory, spatial reasoning, and decision-making. By learning to plan actions and adapt to changing situations, they build a foundation for more advanced thinking. The goal is to scale this approach into a full world model: one that can handle open-ended tasks and real-world complexity with the flexibility and problem-solving ability expected from general intelligence.
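A minimal sketch of the observe-remember-act loop these agents run, with a toy one-dimensional corridor standing in for the richer 2D and 3D environments. The environment, policy, and step budget are illustrative assumptions:

```python
class Corridor:
    """Toy environment: start at 0, reach the goal cell."""
    def __init__(self, goal=3):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 or +1
        self.pos += action
        return self.pos, self.pos == self.goal

def run_episode(env, policy, max_steps=100):
    obs, memory = env.reset(), []
    for _ in range(max_steps):
        memory.append(obs)        # spatial memory of what has been seen
        action = policy(obs, memory)
        obs, done = env.step(action)
        if done:
            return True
    return False

# toy policy: always move toward the goal; real agents learn this from experience
print(run_episode(Corridor(), policy=lambda obs, memory: 1))  # True
```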



