The Rabbit research team just accomplished a massive technical feat. If you’re unaware, Rabbit, a new AI startup, just released a keynote announcing its “AI Pocket Companion” – a device built on the company’s proprietary Large Action Model (LAM). The LAM is the breakthrough technology here.
OpenAI set the stage for what’s attainable by scaling large neural networks, and although its large language model (LLM) interface ChatGPT was itself revolutionary, there were still limits to its actionable capabilities: ChatGPT cannot directly interact with the outside world. Rabbit’s LAM can, and the way that it does has massive implications for AI transparency.
Rabbit’s LAM is a universal interface, meaning that ideally it can interact with any application on the internet. Imagine having a centralized application where you can do your banking, communicate with your friends, and watch your favorite shows. From a software engineering perspective, this is technically very difficult (and has never truly been done before).

The following explanation is a bit reductive, but the difficulty arises because your banking software, your messaging software, and your streaming software were all built by separate companies whose applications have different security protocols. Because of these security protocols, each company designs its user interface (UI) differently to funnel the user experience. Funneling the user experience not only ensures that a customer’s data is secure but also lets developers craft a desirable experience. Some applications have application programming interfaces (APIs) that allow developers to interact with them programmatically, while the only way to interact with other applications is through the UI. To create a universal interface, you need to programmatically interact with all applications, including the ones that don’t have built-in APIs. It’s this funneling and exclusion of APIs that makes creating a universal interface difficult, and Rabbit’s LAM all the more revolutionary.
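To make the API-versus-UI distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the classes, endpoints, and element names are illustrative, not any real product’s interface): an API-backed service exposes a single well-defined call, while a UI-only service forces an agent to chain low-level interactions the way a human would.

```python
# Hypothetical sketch: two ways a program can act on a service.
# BankAPI and StreamingUI are toy stand-ins, not real products.

class BankAPI:
    """A service with a programmatic API: one documented call does the job."""
    def __init__(self, balance):
        self._balance = balance

    def transfer(self, amount):
        # Stable, documented entry point with clear semantics.
        self._balance -= amount
        return {"status": "ok", "balance": self._balance}

class StreamingUI:
    """A UI-only service: an agent must drive it click by click."""
    def __init__(self):
        self.screen = "home"
        self.playing = None

    def click(self, element):
        # Generic UI action; the meaning depends entirely on screen state.
        if self.screen == "home" and element == "search":
            self.screen = "search"
        elif self.screen == "search" and element.startswith("result:"):
            self.playing = element.split(":", 1)[1]
            self.screen = "player"

# API path: a single well-defined call.
bank = BankAPI(balance=100)
receipt = bank.transfer(30)

# UI path: a sequence of low-level interactions the agent has to learn.
tv = StreamingUI()
tv.click("search")
tv.click("result:my_favorite_show")
```

A universal interface has to cover both paths, which is why driving the UI directly (the second path) is the harder but more general approach.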
According to Rabbit’s research team, their LAM is a combination of a neural language model, which can interpret and write text, and a neuro-symbolic model. For the neural language model, think ChatGPT – you can write language prompts and it will output sophisticated responses. The neuro-symbolic model is the technical breakthrough that I don’t yet have a deep understanding of. According to The Alan Turing Institute, “[Neuro-symbolic AI] can potentially provide a new wave of AI tools and systems that are both interpretable and elaboration tolerant and can integrate reasoning and learning in a very general way”. And according to Wikipedia, "[The arguments for neuro-symbolic AI] attempt to address the two kinds of thinking, as discussed in Daniel Kahneman’s book Thinking Fast and Slow. It describes cognition as encompassing two components: System 1 is fast, reflexive, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is used for pattern recognition. System 2 handles planning, deduction, and deliberative thinking.” The neural language model is seemingly a representation of System 1, while the neuro-symbolic model is a representation of System 2. Rabbit’s LAM is a commercial application combining an LLM with a neuro-symbolic model; the neuro-symbolic model is the part of the system that can supposedly interact with the outside world.
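The System 1 / System 2 split above can be sketched as a toy pipeline. This is purely an illustration of the architecture, not Rabbit’s actual design: the “neural” stage is mocked with keyword matching (a real system would use a language model), and the symbolic stage is an explicit rule table whose output is a human-readable plan.

```python
# Toy neuro-symbolic pipeline. The "neural" stage is mocked with a lookup;
# in a real system it would be a learned model. The symbolic stage is a
# set of explicit rules, so every plan it produces can be inspected.

def neural_intent(utterance):
    """System 1 stand-in: fuzzy language in, structured intent out."""
    text = utterance.lower()
    if "ride" in text or "uber" in text:
        return {"intent": "book_ride", "args": {"service": "uber"}}
    if "play" in text or "song" in text:
        return {"intent": "play_media", "args": {}}
    return {"intent": "unknown", "args": {}}

# System 2 stand-in: explicit, auditable rules mapping intents to steps.
SYMBOLIC_PLANS = {
    "book_ride": ["open_app", "set_destination", "confirm_pickup"],
    "play_media": ["open_app", "search", "press_play"],
}

def plan(utterance):
    intent = neural_intent(utterance)
    steps = SYMBOLIC_PLANS.get(intent["intent"], [])
    return intent, steps  # the steps double as an explanation of the action

intent, steps = plan("get me a ride home")
```

The point of the split is visible even in this toy: the fuzzy part (language understanding) is isolated from the part that acts, and the acting part is a list of explicit steps you can read.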
The neuro-symbolic model they created is an application of symbolic programming. Traditional programming languages have inherent restrictions on what can be represented computationally; symbolic languages attempt to bypass those restrictions. Stephen Wolfram and Wolfram Research have been working on symbolic programming since 1988 and have developed a language that can represent abstract mathematics computationally. The work being done at Wolfram Research is truly revolutionary but has largely gone without mainstream recognition because its applications lie in the realm of fundamental science.
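To give a flavor of what “symbolic” means here (a toy sketch in plain Python, not Wolfram Language): instead of eagerly reducing a computation to a number, a symbolic system keeps the expression itself as data – a tree that can be printed, inspected, and transformed before it is ever evaluated.

```python
# Minimal symbolic expression: the computation is stored as a tree,
# not immediately reduced to a value.

class Sym:
    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __repr__(self):
        if self.op == "var":
            return self.args[0]
        return f"({self.args[0]!r} {self.op} {self.args[1]!r})"

    def eval(self, env):
        """Only here does the expression become a concrete number."""
        if self.op == "var":
            return env[self.args[0]]
        a, b = (x.eval(env) for x in self.args)
        return a + b if self.op == "+" else a * b

x, y = Sym("var", "x"), Sym("var", "y")
expr = Sym("+", Sym("*", x, x), y)  # symbolically: x*x + y
```

Because `expr` is data rather than a finished result, a program can reason about it – simplify it, explain it, or evaluate it under different bindings – which is exactly the property that makes symbolic systems interpretable.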
Rabbit’s LAM is trained to interact with the UI itself, which theoretically means the LAM can interact with every application on the internet. In other words, if you ask the Rabbit R1 to call an Uber, it will interact with Uber’s website the same way that a human would, and because this interaction isn’t being driven by an opaque neural network, the action will be interpretable and traceable. The Rabbit research team said it best: “LAM-learned actions on applications should be highly regular, minimalistic (per Occam’s razor), stable, and explainable.” We still can’t understand how the Rabbit R1 understands our commands, but we can understand how a specific command led it to make an undesirable and potentially harmful decision. The Alan Turing Institute explains that neuro-symbolic AI is built on systems of rules, logic, and reasoning, and that this leads to behaviors that are more transparent and explainable. This is an important development for AI policy in that even the most potentially consequential consumer-facing applications aren’t “black boxes”.
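The traceability claim above can be illustrated with a small sketch (all names are hypothetical, and no real UI is driven here): when actions are explicit records rather than opaque network activations, the full sequence can be logged and audited after the fact.

```python
# Sketch of why rule-driven UI actions are traceable: each step the agent
# takes is an explicit, loggable record. Names are illustrative only.

def execute(plan):
    trace = []
    for step in plan:
        # A real agent would perform the click/type on the UI here;
        # this sketch only records what would have been done.
        trace.append({"action": step["action"], "target": step["target"]})
    return trace

ride_plan = [
    {"action": "open", "target": "uber.com"},
    {"action": "type", "target": "destination_field"},
    {"action": "click", "target": "confirm_button"},
]

trace = execute(ride_plan)
# The trace can be replayed or audited to see exactly what was done and why.
```

An auditor (or a regulator) reading such a trace can point to the exact step where an undesirable action occurred – which is precisely what a pure end-to-end neural policy cannot offer.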