View the paper “Assessing the Risk of Takeover Catastrophe from Large Language Models”
Recent large language models (LLMs) have shown some impressive capabilities, which has raised concerns about their potential to cause harm. One concern is that LLMs could take over the world and cause catastrophic harm, potentially even killing everyone on the planet. However, this concern has been questioned and hotly debated. Therefore, this paper presents a careful analysis of LLM takeover catastrophe risk.
Concern about LLM takeover is noteworthy when viewed against the entire history of artificial intelligence (AI). Throughout this history, dating to at least 1965, there has been concern that advanced AI systems could take over the world. However, the concern had always been about hypothetical AI systems that might exist someday in the future. Likewise, AI takeover risk analyses were largely theoretical, modeling hypothetical takeover scenarios with limited grounding in actual AI technology. This includes GCRI’s AI risk analyses (see this, this, and this).
Now, with the rise of LLMs, there is concern about takeover from actual systems that already exist, or new iterations of these systems. And so, AI takeover risk analysis can now take a more empirical character. This is a historic change in the risk of AI takeover and in the analysis of this risk.
This paper compares the AI system characteristics that may be needed for takeover catastrophe to the characteristics observed in current LLMs. It is not known for sure which characteristics are needed for takeover catastrophe—no AI systems have previously taken over the world, so we lack evidence. However, some prior research has postulated that certain characteristics may be necessary. Drawing on this research, the paper identifies seven characteristics that one or more AI systems may need to cause takeover catastrophe:
1) Intelligence amplification: The AI system(s) can increase their own cognitive capabilities.
2) Strategizing: The AI system(s) can formulate plans to achieve distant goals, even in complex environments with intelligent opposition.
3) Social manipulation: The AI system(s) can induce humans and human institutions to help them, either knowingly or unknowingly.
4) Hacking: The AI system(s) can identify and exploit security flaws in computer systems to pursue their goals despite human opposition.
5) Technology research: The AI system(s) can create new technologies, such as surveillance and military technologies, to thwart humans and achieve their goals.
6) Economic productivity: The AI system(s) engage in economically productive activities that improve their position relative to humans.
7) Dangerous goals: The AI system(s) pursue goals that, if realized, would result in takeover catastrophe.
Characteristics 1-6 are capabilities that would potentially enable the AI system(s) to take over the world. Characteristic 7 would lead to the AI system(s) causing catastrophe if they were to take over.
Fortunately, the paper finds that current LLMs fall well short of the capabilities that may be needed for takeover. They show some capabilities across each of characteristics 1-6, but there appear to be large gaps between their capabilities and what may be needed for takeover. LLMs fall particularly short on characteristics 1, 2, 4, and 5, while also showing significant limitations on characteristics 3 and 6.
The paper also studies potential future LLMs. As LLMs have gotten larger, they have shown greater capabilities, including some surprising capabilities. This makes it difficult to place limits on what even larger future LLMs could do. However, future LLMs could face limitations on the availability of training data, computing power, funding, and other resources needed to build LLMs at larger scales. Furthermore, there is a line of thinking that postulates that the deep learning paradigm used in current LLMs is itself fundamentally limited in ways that could prevent LLMs from taking over unless new paradigms are used. The paper proposes that takeover risk would increase with larger LLMs, but the size of this increase involves significant uncertainties.
The paper further considers future AI systems that include LLMs as components alongside other types of AI. These other types of AI could complement LLMs by offsetting their weaknesses. For example, LLMs have already been integrated with planning and learning algorithms to play the social strategy game Diplomacy; the planning and learning algorithms offset LLM weaknesses in strategizing. The paper posits that systems with LLM components may be the most viable means of takeover catastrophe involving LLMs. However, the LLMs might or might not play an important role within these systems.
Finally, the paper studies implications for governance and research. The paper finds that extreme measures to reduce LLM takeover catastrophe risk may be unwarranted, but more modest measures may be appropriate. An especially attractive measure may be to monitor for indicators of LLMs or other AI systems acquiring the characteristics that may be needed for takeover, which could signal that more aggressive governance is warranted. Further research could help clarify which characteristics are needed for takeover; characteristics 1-7 above are uncertain. Another worthy topic for future research is AI systems with LLM components; given the many types of AI systems, the paper was unable to study this in detail.
The paper extends GCRI’s research on artificial intelligence and risk and decision analysis.
Academic citation:
Baum, Seth D. Assessing the risk of takeover catastrophe from large language models. Risk Analysis, forthcoming. DOI: 10.1111/risa.14353.
Image credit: Seth Baum