Key Technologies to AGI - World Model

Published on 01/30/2024

With the emergence of ChatGPT, the rapid development of artificial intelligence, and the strong entry of global tech giants into AIGC, human civilization is transitioning from the digital era to the era of computational intelligence.

From Alan Turing's early conceptualization of machine intelligence in 1950, to the "World Model" concept championed in 2022 by Turing Award winner and convolutional neural network pioneer Yann LeCun, to the 2023 release of I-JEPA, an image model built on that concept and intended to learn in a more human-like way, the field of artificial intelligence has continued to reach new milestones.

Since last year, artificial intelligence technology has been advancing at an astonishing rate, and World Model technology has gained significant attention. It is considered a crucial component for achieving "strong artificial intelligence": it aims to let machines use their internal knowledge models to infer and predict the dynamics of an environment even when external information about that environment is missing.

In the AI era, everyone needs some basic knowledge of artificial intelligence to understand and keep pace with the changes. This article offers an accessible explanation that bypasses most of the difficult foundational material in mathematics, Python programming, neural networks, and deep learning frameworks. Whether you are a technical or non-technical reader, it should help you understand the underlying logic and basic principles of the intelligent era.

Key Technology Leading to General Artificial Intelligence - World Model

1) What is World Model?

World Model is an approach in the field of artificial intelligence, commonly used in reinforcement learning. It builds an internal representation of the environment's dynamics, enabling AI to simulate and predict future states to assist in decision-making. World Model can be seen as the "mental model" of an AI system, reflecting the system's perception and expectations of itself and the external world.

At its core, World Model is a neural network module that maintains an updatable state and memorizes and models the environment. Given the current observation (an image, a state vector, etc.) and a candidate action, it predicts the next observation. The model is trained through self-supervision: after the action is actually taken, the observed outcome is compared with the prediction, and the difference serves as the training signal.
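
The training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any production world model is built: the environment `true_env_step`, the linear form of the model, and the parameter names `a` and `b` are all hypothetical and chosen only to keep the example small.

```python
import random

def true_env_step(state, action):
    """Hidden environment dynamics the model must discover (toy assumption)."""
    return 0.9 * state + 0.5 * action

def train_world_model(steps=2000, lr=0.05, seed=0):
    """Fit next_state = a*state + b*action from observed transitions only."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0  # learnable parameters of the toy world model
    state = 0.0
    for _ in range(steps):
        action = rng.uniform(-1.0, 1.0)        # act randomly in the environment
        predicted = a * state + b * action     # model's guess for the next observation
        actual = true_env_step(state, action)  # what really happened
        error = predicted - actual             # self-supervised signal: no labels needed
        # Gradient step on the squared prediction error 0.5 * error**2
        a -= lr * error * state
        b -= lr * error * action
        state = actual
    return a, b

a, b = train_world_model()
print(f"learned a={a:.3f}, b={b:.3f}")
```

After training, the learned parameters should come close to the hidden values 0.9 and 0.5, even though the model was never told them: the only feedback is the gap between what it predicted and what it then observed.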

The idea behind World Model dates back at least to the 1960s and has been applied in computer science, cognitive science, psychology, and other fields. As AI technology has advanced, World Model has evolved to handle more complex and diverse environments and tasks. For example, in reinforcement learning, World Model underpins model-based approaches that improve the learning efficiency and generalization ability of AI systems. In computer vision, related ideas appear in generative methods, such as those based on generative adversarial networks, that enhance AI systems' image generation and understanding capabilities.

World Model is one of the key technologies for achieving AGI, providing AI systems with the following capabilities:

Abstraction: World Model allows AI systems to extract high-level features and concepts from raw sensory data, facilitating abstract representations of the environment and easier handling of complex tasks.
Prediction: World Model enables AI systems to predict future states and rewards based on current states and actions, enhancing dynamic prediction capabilities for more effective planning and decision-making.
Simulation: World Model allows AI systems to actively simulate virtual scenarios and situations based on their internally constructed models, promoting autonomous exploration, learning, and creative problem-solving.
Interaction: World Model enables AI systems to model their own intentions and beliefs and to infer those of other intelligent agents from their behaviors and goals, fostering social interaction, collaboration, and understanding.
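
The prediction and simulation capabilities in the list above can be illustrated with a small planning sketch: the agent tries out action sequences inside its model before acting in the real world. Everything here is an assumption made for illustration, including the one-dimensional world, the `GOAL` position, the action set, and the function names.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical discrete action set
GOAL = 3              # hypothetical target position

def world_model(state, action):
    """Assumed internal model: predicts the next state and its reward."""
    next_state = state + action
    reward = -abs(GOAL - next_state)  # closer to the goal is better
    return next_state, reward

def imagine(state, plan):
    """Simulate an action sequence inside the model; return total predicted reward."""
    total = 0
    for action in plan:
        state, reward = world_model(state, action)
        total += reward
    return total

def plan_best(state, horizon=3):
    """Score every action sequence of length `horizon` and keep the best one."""
    return max(product(ACTIONS, repeat=horizon),
               key=lambda plan: imagine(state, plan))

best = plan_best(state=0)
print(best, imagine(0, best))  # prints (1, 1, 1) -3
```

Exhaustive search only works for tiny action sets and short horizons; it is used here because it makes the "simulate first, then decide" idea easy to see.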

2) Latest Theoretical and Applied Research on World Model

In 2022, Yann LeCun, Turing Award winner, convolutional neural network pioneer, and Chief AI Scientist at Meta, laid out his vision for more capable AI, first publicly in February and then in the position paper "A Path Towards Autonomous Machine Intelligence," posted that June. He proposed building AI around a "World Model": a machine that learns an internal model of how the world operates, enabling it to learn faster, plan for complex tasks, and adapt to unfamiliar situations.

LeCun believes that for AI to approach human-level intelligence, it needs to learn how the world works the way a baby does. As a concrete realization of the "World Model" idea, he proposed JEPA (Joint Embedding Predictive Architecture). JEPA uses encoders to extract abstract representations of the world state and, when stacked hierarchically, world model predictors at different levels that predict those states at different time scales.
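
In symbols, the core of a single JEPA level can be written roughly as follows. The notation is an assumption made for illustration: x is the observed input, y is the part to be predicted (for example, a future observation), z is an optional latent variable capturing what cannot be inferred from x, and D is a distance such as the squared Euclidean distance.

```latex
s_x = \mathrm{Enc}_x(x), \qquad
s_y = \mathrm{Enc}_y(y), \qquad
\hat{s}_y = \mathrm{Pred}(s_x, z), \qquad
E(x, y, z) = D\bigl(s_y, \hat{s}_y\bigr)
```

The key point is that the prediction error E is measured between abstract representations, not between raw inputs, so the model does not have to predict every pixel-level detail of y. Training lowers E on observed pairs while using additional measures to keep the encoders from collapsing to a constant output.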

Artificial intelligence originates from the simulation of human intelligence. Initially, babies have relatively minimal interaction with the external world, mainly learning the rules of the world through observation. As they grow, they gradually accumulate a vast amount of background knowledge about the world's structure. How can machines learn like humans and animals do?

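LeCun proposed an autonomous intelligence architecture consisting of six modules:

The perception module receives sensory signals and estimates the current state of the world
The cost module evaluates how undesirable a state is, combining a built-in intrinsic cost (basic drives and motivation) with a trainable critic
The configurator performs executive control, setting up the other modules for the task at hand
The world model module fills in information that perception does not provide and predicts plausible future states
The actor module proposes actions
The short-term memory module records current and predicted world states and their costs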

It is assumed that each module can easily compute some objective functions and their corresponding gradient estimates, and propagate gradient information to upstream modules. The result is a system design modeled closely on human cognition.
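
To make the division of labor concrete, here is a deliberately simplified sketch of how the six modules could be wired into one perception-action loop. It is not LeCun's implementation: the class `Agent`, its method names, and the one-dimensional world are hypothetical, the action-proposing module is called the actor, and the gradient propagation mentioned above is not modeled at all.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    goal: float = 0.0                           # set by the configurator
    memory: list = field(default_factory=list)  # short-term memory

    def configure(self, goal):
        """Configurator: set up the other modules for the current task."""
        self.goal = goal
        self.memory.clear()

    def perceive(self, observation):
        """Perception: estimate the current world state from raw input."""
        return float(observation)

    def predict(self, state, action):
        """World model: predict the state that would follow an action."""
        return state + action

    def cost(self, state):
        """Cost: how far a state is from what the agent wants."""
        return abs(self.goal - state)

    def act(self, state, actions=(-1.0, 0.0, 1.0)):
        """Actor: propose the action whose predicted outcome costs least."""
        return min(actions, key=lambda a: self.cost(self.predict(state, a)))

    def step(self, observation):
        state = self.perceive(observation)
        action = self.act(state)
        self.memory.append((state, action, self.cost(state)))  # record the step
        return action

agent = Agent()
agent.configure(goal=2.0)
position = 0.0
for _ in range(4):
    position += agent.step(position)  # toy environment: just apply the action
print(position, len(agent.memory))  # prints 2.0 4
```

The agent moves toward the goal in two steps and then holds still, because the world model predicts that any further movement would raise the cost.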

In June 2023, I-JEPA (Image Joint Embedding Predictive Architecture), an image model based on this theory, was released by Meta. Although its actual capabilities still differ from the original vision, it learns abstract representations in a way loosely analogous to the human brain, offering hope that large models can reduce their dependence on massive labeled datasets.

World Models and Large Language Models (LLM)

Large Language Models (LLMs) are deep learning models with billions of parameters or more, trained on massive text corpora, typically with self-supervised or semi-supervised objectives. Examples include GPT-3, PaLM, Galactica, and LLaMA.

In 2022, the release of ChatGPT set off a wave of enthusiasm for large language models. Although their capabilities have improved continuously in recent years, limitations persist: questionable reliability, poor timeliness, and a tendency to learn mainly superficial statistical patterns, for reasons such as limited data and labeling strategies. ChatGPT can understand and retain context within a certain window, but this memory is brief and limited, and it sometimes generates inaccurate or entirely fabricated information. Its knowledge is also bounded by a training cutoff (January 2022 for the version discussed here), so it is unaware of events and information that came afterwards.

World Model, by contrast, imitates the brain by modeling a large neural network system, and is intended to excel at constructive understanding and creativity.

WorldBrain, based on the World Model, is presented as a comprehensive and versatile decentralized artificial intelligence system initiated by the WorldBrains Foundation, which the project describes as an innovation project under OpenAI. Unlike ChatGPT, it adopts the "World Model" theory and uses Web3 to empower AI, combining artificial intelligence, neuroscience, and blockchain technology, with the stated aim of creating genuinely general artificial intelligence.

On this view, the intelligence ceiling of other animals is far below that of humans because their brains lack an equally powerful "learning network," that is, their neocortex is less developed. If artificial neural networks cannot model the world the way the brain does, then no deep learning network can achieve the goal of general artificial intelligence. Truly intelligent machines and general artificial intelligence will, like the brain's neocortex, use maps and reference frames to learn the world model.

Each part of the neocortex is thought to work on the same principle. From vision, touch, and language to advanced thinking, everything we consider intelligent is, on this account, fundamentally the same. WorldBrain is described as adopting the following methods and technologies in imitating the neocortex system:

Neural network system: WorldBrain can use neural network models to mimic the neurons and network structures of the neocortex system, which is responsible for advanced cognitive functions such as language processing, decision-making, and abstract reasoning. By constructing complex deep neural network models, WorldBrain aims to simulate information processing and transmission in the neocortex system and achieve similar advanced cognitive abilities.

Memory and learning mechanisms: The neocortex system plays a crucial role in learning and memory. WorldBrain can use similar mechanisms to imitate the learning and adaptive capabilities of the neocortex system. For example, using appropriate learning algorithms and models, WorldBrain can simulate long-term memory and associative learning in the neocortex system, achieving similar learning and reasoning functions.

Language processing and understanding: The neocortex system plays a critical role in processing and understanding language. WorldBrain can imitate the language processing capabilities of the neocortex system by using natural language processing techniques and language models. This includes simulation of semantic understanding, syntactic analysis, and language generation, enabling WorldBrain to engage in language communication and understanding similar to humans.

Abstract reasoning and decision-making: The neocortex system possesses advanced capabilities in abstract reasoning and decision-making. WorldBrain can imitate these functions of the neocortex system by constructing logical reasoning and decision models. This involves the application of technologies such as symbolic logic, rule engines, and decision trees to achieve similar abstract reasoning and decision-making capabilities.

By using maps and reference frames like those in the human brain as the foundation for learning the world model, WorldBrain is intended to gain more accurate spatial perception and environmental understanding. This would enable it to interact better with the physical world and achieve higher-level cognitive and intelligent functions.

World Model is a significant advancement in the field of artificial intelligence, taking AI to a more intelligent and autonomous level. Its proponents see it as a route to the ultimate goal of general artificial intelligence: predicting the future accurately and simulating the world precisely, so that the future becomes more predictable.

The future world will be more intelligent, faster, and more convenient. We must be aware of the challenges and opportunities brought about by artificial intelligence and prepare and plan for the future. That means maintaining a keen sense and a long-term perspective, paying close attention to important factors such as AI technology trends and policy developments, discovering more potential opportunities, and participating in the development of the artificial intelligence industry in a positive and open manner.