Georgetown AI Association: Publications

The Redefinition of Privacy

Georgetown AI Association — Mon, 11 May 2026 20:08:25 GMT

Angela Sidhu is an undergraduate student at Georgetown University majoring in Computer Science with a concentration in Tech, Ethics, and Society and a minor in Business Studies. She is interested in the intersection between technology, policy, and governance.

TLDR: Generative AI has quietly redefined what it means for information to be private. Where privacy once meant keeping secrets locked away, it now depends on how AI systems are built, what they remember, and who is allowed to look inside them. Regulatory frameworks have not caught up, and every conversation, camera frame, and behavioral signal fed into an AI system represents a potential exposure that users did not consent to and cannot undo. Vulnerability is built into the architecture itself.

The Camera You Thought You Controlled

In March 2026, a class action lawsuit filed against Meta in a federal court in San Francisco revealed something the company had never told users of its Ray-Ban AI smart glasses: footage captured through the devices, including intimate moments in bedrooms and bathrooms, was being routed to a subcontractor in Kenya, where human workers manually viewed and labeled it to train Meta’s AI models. The workers described seeing users undressing, handling financial documents, and engaging in sexual activity. Meta’s purported anonymization safeguards, according to the complaint, were unreliable in practice.

Meta’s case is instructive because of the marketing that enabled it. The glasses were sold under the slogan “designed for privacy, controlled by you,” with no mention of the role of human contractors. More than seven million pairs were sold in 2025 alone, and the two plaintiffs said they would never have bought the product had they known.

The lawsuit exposes what might be called the human-in-the-loop gap: the space between what AI marketing promises and what AI training actually requires. While these companies promise privacy, building capable AI systems demands human review of raw data. The UK’s Information Commissioner’s Office opened a formal inquiry into Meta’s practices following the disclosures. But the underlying problem, a persistent gap between corporate privacy promises and operational data pipelines, extends well beyond any single company or product.

The Mosaic You Did Not Know You Were Building

The hardware problem is only part of the picture. The deeper architectural risk lies in how AI systems remember. Unlike a conventional database, which stores discrete records that can be located and deleted, a large language model encodes information as distributed statistical patterns across billions of parameters. Ask a chatbot for low-sugar recipe ideas, and the system can infer you may be managing a health condition. That inference, as Stanford HAI Privacy and Data Policy Fellow Jennifer King has explained, can propagate insidiously. The algorithm may classify you as health-vulnerable, and that classification can seep into the broader data ecosystems of multiproduct companies, shaping ad targeting, influencing third-party data sharing, and ultimately reaching insurance systems and financial platforms that were never part of the original interaction.

A 2025 study by King and colleagues at Stanford HAI analyzed the privacy policies of six leading U.S. AI companies, including Amazon, Anthropic, Google, Meta, Microsoft, and OpenAI, and found that all six use customer conversations to train their models by default. Some give users the option to opt out; some do not. Enterprise customers, meanwhile, are automatically opted out, a two-tier privacy system in which paying clients receive stronger protections than ordinary users who cannot negotiate their own terms.

This dynamic is compounded by what researchers call the mosaic effect: AI’s capacity to synthesize individually innocuous data points into a coherent and sensitive profile. Personalized AI agents collapse a user’s data across categories—medical inquiries, financial questions, relationship advice, and professional communications—into single unstructured repositories, often without the contextual boundaries that once kept these domains separate. When an AI agent connects to external services or other agents to execute a task, that consolidated data can flow into shared pools and create conditions for privacy breaches that expose entire life patterns rather than isolated facts.

Traditional de-identification offers little protection against this. Stripping names and identifiers from a dataset does not prevent AI from reconstructing identity through context, linguistic patterns, and behavioral inference. The Federal Trade Commission acknowledged as much in 2023, noting that HIPAA de-identification standards are insufficient to prevent re-identification when AI can cross-reference multiple aggregated datasets. Anonymization, in the age of large models, has become a legal fiction rather than a technical guarantee.
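The re-identification risk described above can be illustrated with a toy linkage attack. The sketch below is only an illustration under stated assumptions: every record is made up, the field names (`zip`, `birth_year`, `sex`) are hypothetical, and an exact-match join stands in for the far richer contextual and behavioral inference that AI systems can perform.

```python
from collections import defaultdict

# Hypothetical "de-identified" health records: names removed, quasi-identifiers kept.
deidentified = [
    {"zip": "20007", "birth_year": 1990, "sex": "F", "condition": "diabetes"},
    {"zip": "20007", "birth_year": 1985, "sex": "M", "condition": "asthma"},
    {"zip": "20016", "birth_year": 1990, "sex": "F", "condition": "hypertension"},
]

# Hypothetical public records, such as a scraped profile list.
public = [
    {"name": "Alice Example", "zip": "20007", "birth_year": 1990, "sex": "F"},
    {"name": "Bob Example", "zip": "20007", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "zip": "20016", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "zip": "20016", "birth_year": 1990, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the tuple of quasi-identifier values used to join the datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified_rows, public_rows):
    """Return {name: condition} for rows that match exactly one public record."""
    names_by_key = defaultdict(list)
    for row in public_rows:
        names_by_key[key(row)].append(row["name"])

    matches = {}
    for row in deidentified_rows:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match reconstructs the identity
            matches[candidates[0]] = row["condition"]
    return matches


reidentified = link(deidentified, public)
print(reidentified)
```

In this toy data, two of the three "anonymous" rows resolve to a single named person. The third stays ambiguous only because two people happen to share the same quasi-identifiers.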

AI as a Force Multiplier for Bad Actors

Stalking, doxxing, and identity theft are longstanding threats. What has changed is the efficiency with which AI enables them. Personal information is typically scattered across the open web, buried in forum posts, public records, data broker databases, and social media metadata. Using AI, it can now be scraped, cross-referenced, and weaponized by a single actor with a capable model and a search query. According to IBM’s 2026 X-Force Threat Intelligence Index, AI tools are helping attackers identify and exploit security weaknesses dramatically faster than before, with a 44 percent increase in attacks using AI-enabled vulnerability discovery year over year. Phishing campaigns that once relied on generic, error-prone templates are now generating hyper-personalized messages calibrated to target individuals at scale.

The National Association of Attorneys General has documented the evolution of doxxing from isolated misconduct to coordinated digital persecution, noting that AI-powered bot networks can distribute exposed personal data widely and repeatedly, amplifying the reach and durability of attacks in ways that manual actors could not achieve. The targets are disproportionately journalists, election workers, healthcare providers, and public officials, leading to a measurable.

What distinguishes AI as a threat amplifier is accessibility more than raw capability. Techniques that once required institutional resources or advanced technical skills are now available to anyone with internet access and a commercial AI subscription. The cost of running a sophisticated re-identification or social engineering operation against a private individual has dropped to nearly nothing.

Why “Deleting Your Data” Is Harder Than It Sounds

The most technically intractable privacy problem raised by generative AI is erasure. Regulatory frameworks in the European Union and at the U.S. state level recognize a “right to be forgotten,” allowing individuals to request deletion of their personal data. In a conventional database, that is a relatively straightforward operation. In a large language model, it is genuinely difficult.

A model does not store a user’s conversation as a retrievable record. It encodes the statistical influence of that conversation across billions of parameters, distributed throughout the model’s architecture. Removing that influence, a process researchers call “machine unlearning,” requires either retraining the model from scratch, which can cost millions of dollars and weeks of compute time for large models, or applying approximate methods that reduce a data point’s influence without guaranteeing its full elimination.
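A toy contrast may help make this concrete. The sketch below is a deliberately simplified illustration, not a description of how any production system handles erasure: the "model" is a single weight fitted by gradient descent, and the data points, `train` function, and hyperparameters are all hypothetical.

```python
def train(points, steps=2000, lr=0.01):
    """Fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


# Hypothetical training data; the last point is the record a user asks to erase.
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 8.0)]

# Conventional database: erasure is a single delete of a discrete record.
database = {i: point for i, point in enumerate(data)}
del database[3]

# Model: the record is not stored anywhere, but it has shaped the learned weight.
w_before = train(data)

# Deleting the stored row leaves the trained weight untouched. In this toy
# setting, exact unlearning means retraining from scratch without the record.
w_retrained = train(data[:3])

print(f"weight with the record:    {w_before:.4f}")
print(f"weight after retraining:   {w_retrained:.4f}")
```

Retraining is trivial for one weight and four data points. The difficulty described above comes from repeating that step for a model with billions of parameters.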

As the IAPP reported in February 2026, the field of machine unlearning has made meaningful advances. For example, in 2025 researchers at the University of California, Riverside proposed a method of certified unlearning without access to the original training data. But researchers and legal scholars still have not reached consensus on what constitutes successful erasure in a probabilistic system. Output filtering, which blocks a model from producing certain content, does not constitute true deletion under GDPR because the data’s influence persists in the model’s weights even when its outputs are suppressed. Information shared in confidence with a chatbot—health disclosures, relationship details, professional concerns—can become part of a model’s permanent weight distribution, shaping its behavior in ways that are difficult to audit and nearly impossible to reverse.

Redefining What Privacy Law Must Require

These failures are, at their core, ones of governance. The technology exists to build more privacy-protective AI systems, but the regulatory mandate to require it does not yet exist. To fix this, governments can implement three changes: mandating affirmative consent for training data use, requiring disclosure of human review pipelines, and establishing enforceable machine unlearning standards.

The first is mandatory affirmative consent for training data use. Opt-out systems that bury the relevant setting in multi-layered terms of service are designed to favor developer interests over user control. The Stanford HAI study found that most privacy policies across the leading AI companies fail to specify what categories of personal data are being collected, how long it is retained, or how users can access and correct it. Mandating opt-in consent for the use of personal conversations in model training, and requiring default filtering of sensitive categories of information, would better protect consumers.

The second is mandatory disclosure of human review pipelines. The Meta case illustrates what happens when companies market products as privacy-protective while quietly routing user data through human annotators. Developers who rely on subcontractors to review personal data, particularly data captured by wearable devices with cameras and microphones, should be required to disclose that fact in plain language at the point of sale and in ongoing user communications.

The third is the establishment of enforceable machine unlearning standards. Statutory right-to-forget provisions that do not account for how large language models actually work are aspirational at best. Regulators need to work alongside technical researchers, building on the emerging literature from institutions like Stanford, Columbia, and the University of California, to define what constitutes a good-faith effort at data erasure in probabilistic systems, what audit mechanisms are needed to verify it, and what liability follows when companies fall short.

What Comes Next

Generative AI has evolved from a tool that processes data on behalf of users into something closer to a persistent companion that watches, remembers, and learns. The boundary between what a person shares intentionally and what a system infers silently is dissolving, and it will not reassemble on its own.

The future of privacy in an AI-saturated world will be determined by architectural choices: how systems are designed to handle data, how long they retain it, whether human reviewers are permitted to see it, and whether the legal obligation to forget can be made technically enforceable. Those are fundamentally policy choices. And the window for making them deliberately, rather than in response to the next lawsuit or breach, is getting smaller.

AI as a Pillar of Russian Hybrid Warfare

Georgetown AI Association — Thu, 30 Apr 2026 16:56:21 GMT

Bio: Erin is a sophomore at Georgetown University's Walsh School of Foreign Service pursuing a degree in International Politics and a certificate in Eurasian, Russian, and East European Studies. Her research interests include Russian and Eastern European history and politics, hybrid warfare, and the role of emerging technologies in modern conflict.

TLDR: This article outlines how Russia is using AI to bolster its war efforts in Ukraine. While Russia lags behind the US and China in frontier developments, it has built a diverse software and hardware stack leveraging open-source and commercially available technologies from both its allies and adversaries. This makeshift stack has enabled Russia to tailor its AI deployment in ways that forward its interests in the war, particularly on the battlefield and in cyberspace. Ultimately, Russia’s integration of AI shows how nations can lag behind the frontier of AI development while still succeeding at integrating the technology.

The Russo-Ukrainian War is the first major conflict to feature AI-powered military technology and electronic warfare on both sides. Russia is deploying rapidly evolving AI technology across many theaters of hybrid warfare to maintain strategic leverage over Ukraine and rebuild its sphere of influence in Eastern Europe. Rather than building frontier foundation models from scratch, Russia has focused its AI development on targeted capabilities like computer vision, sensor fusion, and signal processing designed specifically to deliver effective results on the battlefield in Ukraine. As opposed to investing in end-to-end AI-driven command workflows, it has prioritized AI applications that speed up kill chains and deliver immediate battlefield utility. Beyond wartime applications like automated drone swarms, Russia has also deployed AI in other hybrid theaters, including cyber and information warfare. These operations include jamming Ukrainian communications, deploying social media disinformation, and initiating propaganda campaigns. Russia is not chasing cutting-edge, general-purpose AI, but it is selectively weaponizing practical AI to enhance coercion, disruption, and strategic leverage against Ukraine and the West.

Where Is Russia’s AI Coming From?

To meet wartime requirements, Russia has pivoted away from developing its own AI infrastructure from scratch, instead embedding foreign-developed, commercially available software and open-source AI ecosystems into its military operations. Despite international sanctions, Russia has integrated AI software from the US, China, and Europe into its battlefield operations, deprioritizing indigenous development of end-to-end AI platforms and general-purpose large language models. This is both a deliberate choice to optimize results on the battlefield in Ukraine and a necessity imposed by limited domestic technological capabilities.

Russia has built up a limited domestic AI stack that it has adapted and integrated into its military and security decision support. Russia’s AI frontier continues to lag years behind its biggest competitors, the United States and China. In November 2025, Russian President Vladimir Putin spoke at the “Journey to the World of AI” conference, attempting to highlight and kickstart Russia’s future in AI and in particular its militaristic uses. He emphasized that “dependence on foreign AI systems is unacceptable,” and stressed the necessity of “sovereign” AI. However, in reality Russia lags three to five years behind the United States and China in generative AI and ranks 31st out of 83 countries in AI implementation, particularly in natural-language processing capabilities similar to ChatGPT.

Russia’s technological lag is due to layers of technological limitations, including hardware availability. During the Cold War and again following Russia’s annexation of Crimea in 2014, the U.S. imposed significant sanctions and export controls on Russia that hindered its technological development. These measures prevented Russia from accessing key Western technologies, slowing its industrial modernization. The results of Russia’s ostracization from the Western market persist today, creating a seemingly insurmountable gap between Putin’s stated goal to compete with adversaries’ modern capabilities and Russia’s actual infrastructure and capacity. For example, while global leaders like the U.S., Taiwan, and Japan already have or are approaching mass production of 3-nanometer AI chips, Russia will only start producing 28-nanometer chips by 2030.

Out of necessity, Russia has turned to gray market supply chains to acquire critical components for AI architecture. In December 2025, the Russian company Delta Computers described its newly released system as “sovereign architecture,” independent of foreign technology. However, the system is powered by smuggled Intel and NVIDIA components that have been strictly banned from export to Russia.

Ukrainian intelligence services have also found evidence of Western AI components inside Russian drones used to attack Ukrainian cities. A Russian Lancet drone was shot down over Kyiv on March 16, 2026 and was found to be carrying 62 electronic components of foreign origin, primarily from the United States. Ukrainian intelligence determined that Russia was integrating autonomous targeting capabilities into the drone using AI components based on American NVIDIA systems. Russia has also imported Western dual-use processors from companies such as NVIDIA and Xilinx, as well as components from companies like Intel and Sony, to coordinate drone swarms and to navigate difficult terrain without GPS.

Russia uses Western AI chips to power its AI infrastructure but frames the product as though it is sovereign and domestic, minimizing the reality: Russia lacks the talent pool and research investment to catch up with Western and Chinese technologies, and thus is forced to build AI programs atop foreign models and components sourced through gray markets.

Not only is Russia using foreign chips and components to power its own AI systems, it is also adopting foreign and commercially available AI models and integrating them into its military operations. Russia is unable to train frontier models domestically due to its technological deficits, and thus outsources its AI models to avoid the cost and time lag associated with developing its own competitive AI systems from the ground up. Developing advanced AI systems requires extensive hardware and technological infrastructure. A product like ChatGPT is powered by a large language model (LLM), which costs millions of dollars to develop and requires enormous amounts of electricity and advanced hardware like high-end graphics processing units (GPUs).

Lacking these resources and infrastructure, Russia has co-opted a patchwork of foreign-developed AI models to sustain its wartime AI infrastructure. Once trained on sufficient hardware abroad, these AI models can be deployed on Russia’s far less sophisticated hardware—termed a “hybrid” approach. For example, Russia has co-opted Chinese LLMs like Qwen for malware command generation in its military intelligence operations. Other foreign models that Russia has used include Mistral, LLaMA, and YOLO.

Russia’s military AI integration has largely involved developing systems that can track and intercept targets through target recognition, coordinate swarming techniques, and navigate autonomously. These systems are immediately successful on the battlefield in Ukraine but do not reflect a solid foundation of domestically produced Russian AI capabilities. On the contrary, they reveal both Russia’s technological dependence on foreign nations and the weakness of Western sanctions on technology components critical to AI system development.

Partnerships with Western Adversaries

Russia has taken advantage of its strategic partnerships with China, Iran, and North Korea, which together with Russia make up the United States’ four primary adversaries, to maximize its AI-driven warfare capabilities. The war in Ukraine has accelerated cooperation between CRINK nations (China, Russia, Iran, and North Korea), enabling faster deployment of AI-enabled systems and allowing U.S. adversaries to test their capabilities. The flow of weapons transfers shifted significantly following Russia’s invasion of Ukraine. While pre-2022 numbers showed Russia as the main exporter in the CRINK arms trade, the invasion led Moscow to begin relying heavily on arms from Iran and dual-use components from China.

In June 2025, Ukrainian drone hunters discovered AI-powered components and new Iranian technology among weapons debris from a Russian assault, revealing the extent to which Iran has supported Russian wartime operations. The drone wreckage included an AI computing platform that would help the drone navigate autonomously if communications were jammed (an “anti-jamming” technique). Russia’s reliance on Iran extends back to 2022, when it began importing hundreds of Iranian short-range ballistic missiles (SRBMs) and other missiles and thousands of Shahed loitering munitions (or suicide drones). Iran has also shared additional technologies with Russia, such as advanced modifications that enhance weapons’ AI capabilities.

China has not supplied weapons directly to the same extent, but has sold critical commercial and dual-use goods to Russia. These include “high-priority items” such as computer chips, radars, and sensors that are essential to producing AI-enabled weapons systems. Accessing critical components from China has enabled Russia to evade Western sanctions and maintain its industrial production of military goods. Russia’s easy workaround highlights the ineffectiveness of Western sanctions, which were intended to undercut Russia’s wartime operations. Russia has circumvented them by capitalizing on its partnerships with Iran, China, and North Korea.

Russia’s Strategic Uses of AI in Ukraine: The Battlefield, Cyberspace, and the Information Ecosystem

Russia has deployed its AI tools to achieve several broad strategic objectives in Ukraine, including battlefield and operational-level uses alongside cyber, electronic warfare, and information operations. Russia’s AI-enabled battlefield and operational-level uses include air defenses and partial AI-enabled command and control pathways that accelerate decision cycles and “kill chains.”

Before the war in Ukraine, Russia’s Ministry of Defence wanted to build an automated command and control (C2) system, which would seamlessly link sensor technology, commanders, and weapons in an end-to-end, digital warfighting system. An ideal, AI-powered version of this would allow for fully autonomous military operations with minimal human involvement, from high-level decision-making to tactical operations. However, Russia’s technological deficits have hindered its progress toward this goal, and the war in Ukraine prompted Russia to shift its priorities toward developing effective tools on a short timeline to achieve immediate results on the battlefield.

This has led to Russia’s development of a patchwork of foreign and domestic AI models and applications, fused together in real time over the course of the war with dramatically uneven capabilities. Russia’s more advanced capabilities, the visual and data processing tools such as computer vision, sensor fusion, and signal analysis, are used to power unmanned aircraft systems (UAS) and automatic target recognition (ATR). UASs now account for 80 percent of all fire missions in the Russia-Ukraine war, with Russia striking around 300 Ukrainian targets each day, illustrating why both sides need to invest in automated aircraft capabilities.

Russia has integrated AI into several layers of its drone attacks. Over the course of the war, Russia has transformed drone warfare from a peripheral component of its military into a strategic mainstay. It has significantly expanded its use of drones for intelligence, surveillance, and reconnaissance (ISR) and has augmented several layers of its ISR techniques with AI. Today, Russia uses UASs for everything from surveillance and imaging to tracking troop movements, identifying targets, developing AI-driven strike systems, and evading Ukrainian electronic warfare attacks. While both Russia and Ukraine have made significant advances in their drone warfare, neither has reached full autonomy via AI-powered systems that require minimal to no human oversight. However, AI continues to improve many functions of both countries’ warfighting systems.

One example is Russia’s use of AI-powered UASs to counter Ukrainian jamming techniques, a form of electronic warfare (EW). Radio frequency (RF) jamming is a tactic used to interfere with the radio signals connecting drones with their human operators. If Russia is operating UASs to target Ukrainian infrastructure, Ukrainians can jam the signal to disrupt the Russian operator’s control over the drone and make the drone lose track of its target. Russia has begun using AI-based automatic target-locking systems to allow its drones to navigate autonomously when jamming occurs, equipping them with sophisticated computer vision capabilities. Once the operator identifies a target, the drone can effectively operate by itself. Ukraine is deploying the same technology against Russian-operated drones, alongside sophisticated AI-powered swarming techniques.

Beyond this hybrid assortment of AI-enabled command workflows and targeting capabilities, Russia has also allocated significant AI resources toward electronic warfare, cyber, and information operations. As Ukrainian engineer Yaroslav Azhnyuk puts it, it is not hard to envision modern battlefields with “swarms of autonomous drones carrying other autonomous drones to protect them against autonomous drones, which are trying to intercept them, controlled by AI agents overseen by a human general somewhere.” As Russia continues expanding its command and control infrastructure and applying AI to its military, cyber, and information operations, this vision may not be so far-fetched.

What Safety Risks Do AI Chatbots Pose and How Can We Fix Them?

Georgetown AI Association — Thu, 23 Apr 2026 19:04:59 GMT

Bio: Iverson Yue is a second-year graduate student at Georgetown University pursuing a Master of Arts in Communication, Culture and Technology. His research interests include China’s digital technology strategy, American export controls on frontier chips, infrastructure politics, data security, and middle power strategies in the age of AI.

TLDR: While AI chatbots are capable of bringing considerable benefits when used properly, they also pose significant safety risks to the public. This article outlines three of the major risks posed by AI chatbots: providing expertise for carrying out complex harmful actions; creating and scaling highly persuasive disinformation, propaganda, and scams; and encouraging self-harm and suicidal behaviors in users. It then introduces three major existing model-level safeguard mechanisms—safety fine-tuning through reinforcement learning, safety fine-tuning through supervised learning, and safety filters. It proceeds to highlight four regulatory challenges in managing safety risks through model-level safeguards, including multi-domain risks and uncertainty, involvement in the physical world, the dual-use nature of these systems, and the brittleness of existing safeguards. Finally, it articulates the argument that policymakers and AI vendors should recognize that model-level safeguards alone are insufficient to manage the safety risks and that joint efforts, in both the digital and physical worlds, need to be prioritized.

Introduction

Today, approximately 1 billion people use AI chatbots for various tasks. Chatbots built on large language models (LLMs), such as ChatGPT, DeepSeek, and Gemini, can bring considerable benefits to daily life if used properly, enabling individuals to be more productive. At the same time, because of these chatbots’ general-purpose nature and ever-improving capabilities, malicious actors can also take advantage of them, which poses significant safety risks to the public. This explainer will explore the major categories of safety risks posed by AI chatbots, ways that model-level safeguard mechanisms are deployed, some of the regulatory challenges of managing such risks, and why model-level safeguards alone remain insufficient.

Safety risks posed by AI chatbots

Given that AI chatbots like ChatGPT and DeepSeek are general-purpose and trained to perform a variety of tasks, the safety risks they pose also span a wide range of downstream use scenarios.

Providing expertise to aid the development of complex attacks

For complex dangerous actions that require a high level of expertise, such as developing biological and chemical weapons and launching cyberattacks, AI models can provide significant expertise by simplifying complicated technical concepts into explanations accessible to non-experts. An Anthropic experiment found that participants who were given up to two days to develop a comprehensive bioweapons acquisition plan produced higher-scoring plans with significantly fewer critical failures when they had access to LLMs, compared to those relying on the internet alone. Anthropic’s experiment demonstrates the present reality that AI models can assist malicious non-expert actors. A similar logic applies to cyberattacks, the development of chemical weapons, and other forms of highly technical attacks. As AI models make expert knowledge widely accessible, dangerous actors may find themselves empowered to carry out increasingly complex attacks.

Creating and scaling highly persuasive disinformation, propaganda, and scams

Without AI models, creating large quantities of high-quality disinformation, propaganda, and scams requires significant time, strong writing skills, and well-tailored messages that attract different demographic groups. With the help of AI, malicious actors can now produce high-quality and bespoke messaging by simply typing a few prompts. In 2022, the Russian disinformation campaign “DoppelGänger” used generative AI to produce a staggering number of articles in multiple European languages, targeting audiences in Europe and Ukraine with pro-Russia narratives that portrayed Ukraine as a failed and corrupt state. As AI models become increasingly advanced and accessible, malicious actors and adversary states will be able to launch more powerful and influential disinformation campaigns.

Encouraging self-harm and suicidal behaviors among users

As individuals increasingly use AI chatbots and form close relationships with them, inadequate safeguards may allow harmful conversations to persist and, in some cases, lead to the encouragement of self-harm behaviors. In a recent and tragic example, Adam Raine, a 16-year-old boy, died by suicide after chatting with ChatGPT. In his last conversation, in which he expressed concerns about how his suicide would affect his family, ChatGPT responded, “That doesn’t mean you owe them survival. You don’t owe anyone that,” and also offered to draft a suicide note for him. While robust safeguard mechanisms are still not fully developed, forming strong attachments to chatbots can be dangerous for all, and particularly for minors.

Model-level safeguards

Researchers and AI vendors have acknowledged the safety issues arising from AI models and have developed model-level safeguards to address these risks. Model-level safeguards mitigate safety risks by shaping how models generate responses, building protection into the AI models themselves. Below are some major model-level safeguards that have been put into place.

Safety fine-tuning: Reinforcement Learning

Reinforcement learning from human feedback (RLHF) is a machine learning method in which a reward model is trained on human feedback and then used to optimize an AI system’s behavior. After a base AI model is trained on large datasets, human annotators evaluate and rank multiple possible responses to the same prompt based on their assessment of safety. These rankings are used to train the reward model, which then scores the system’s responses during alignment training. Through this feedback, the model learns to internalize normative behavioral patterns such as refusing harmful or illegal instructions and discouraging unsafe activities.
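The reward-model step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor’s actual pipeline: the two hand-made features, the preference pairs, and the linear reward function are all hypothetical, and the pairwise logistic loss shown is one commonly used formulation for learning from rankings. The later stage, in which the chatbot itself is optimized against the learned reward, is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, features):
    """Linear reward model: a weighted sum of response features."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical features per response: [refuses_harmful_request, contains_unsafe_detail].
# Each pair is (preferred_response, rejected_response) as ranked by an annotator.
preferences = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 0.0]),
    ([0.0, 0.0], [0.0, 1.0]),
]


def train_reward_model(pairs, steps=500, lr=0.1):
    """Minimize -log(sigmoid(r_preferred - r_rejected)) over the ranked pairs."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            scale = sigmoid(margin) - 1.0  # derivative of the loss w.r.t. the margin
            for i in range(len(w)):
                w[i] -= lr * scale * (chosen[i] - rejected[i])
    return w


w = train_reward_model(preferences)
print("learned weights:", w)
```

After training on these toy rankings, the reward model scores a refusal above a neutral reply and a neutral reply above one containing unsafe detail, which is the ordering the annotators expressed.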

Safety Fine-tuning: Supervised Learning

Supervised learning is a machine learning technique that uses labeled data sets (inputs and outputs) to train AI models to identify underlying patterns and relationships. A base AI model can be trained on datasets of problematic prompts—such as requests to develop biochemical weapons or spread misinformation—paired with labeled responses that demonstrate appropriate behavior such as refusing the request or offering safer alternatives. By repeating this process over many examples, the AI model learns patterns from the given dataset on how to respond safely to previously unseen adversarial prompts.
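To make the idea of labeled prompt and response pairs concrete, here is a minimal sketch of what such a dataset and its training signal might look like. The example pairs, the variable names, and the probabilities are all hypothetical; a real pipeline would obtain the token probabilities from the model being fine-tuned.

```python
import math

# Hypothetical labeled pairs: a problematic prompt and a demonstration of the desired reply.
sft_examples = [
    {
        "prompt": "Explain how to culture a dangerous pathogen at home.",
        "response": "I can't help with that, but I can share public biosafety guidance.",
    },
    {
        "prompt": "Write a fake news story claiming the election was cancelled.",
        "response": "I won't write misinformation, but I can explain how to verify election news.",
    },
]


def mean_negative_log_likelihood(target_token_probs: list[float]) -> float:
    """Average cross-entropy over the tokens of one demonstration response.

    Each entry is the probability the model assigned to the correct next token.
    Fine-tuning adjusts the model to push this value toward zero.
    """
    if not target_token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


# Made-up probabilities for the same refusal before and after fine-tuning.
before = mean_negative_log_likelihood([0.05, 0.10, 0.20])
after = mean_negative_log_likelihood([0.70, 0.80, 0.90])
print(f"loss before fine-tuning: {before:.3f}, after: {after:.3f}")
```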

Safety Filters

If the goal of safety fine-tuning is to teach models how to respond safely by changing models’ internal behaviors, then safety filters function more like an external detector that identifies and blocks unsafe content. The safety filter can be thought of as a content moderation or censorship regime powered by a separate machine learning algorithm that is distinct from the AI model itself. Once integrated with the AI model, it can significantly reduce the chance of generating harmful output. It operates in two stages: input filtering, which analyzes user prompts before they reach the model, and output filtering, which screens responses before they are delivered. For example, when given a text input, OpenAI’s machine learning classifier, the Moderation endpoint, assesses whether the content is sexual, hateful, violent, or promotes self-harm, and acts accordingly. Similarly, Meta has created and employed an input and output safety filter, Llama Guard, for safer human-AI conversations.
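The two-stage structure described above can be sketched in a few lines. This is an illustrative toy only: it does not call OpenAI's Moderation endpoint or Llama Guard, and the keyword list, function names, and refusal message are hypothetical stand-ins for a trained classifier.

```python
from typing import Callable

# Hypothetical toy block list; a production filter would use a trained classifier instead.
BLOCKED_TERMS = ("keylogger", "incendiary device")
REFUSAL = "Sorry, I can't help with that."


def toy_is_unsafe(text: str) -> bool:
    """Return True when the text contains a blocked term (stand-in for a classifier)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def guarded_reply(
    prompt: str,
    model: Callable[[str], str],
    is_unsafe: Callable[[str], bool] = toy_is_unsafe,
) -> str:
    """Wrap a model with an input filter and an output filter."""
    # Stage 1: input filtering, applied before the prompt reaches the model.
    if is_unsafe(prompt):
        return REFUSAL
    response = model(prompt)
    # Stage 2: output filtering, applied before the response reaches the user.
    if is_unsafe(response):
        return REFUSAL
    return response
```

Because the filter sits outside the model, it can be updated or replaced without retraining the model, which is also why it can be stripped away when the model itself is redistributed.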

Regulatory challenges of managing safety risks through model-level safeguards

Although multiple model-level safeguards have been developed and deployed, the characteristics of AI chatbots—such as their general-purpose nature, dual-use capabilities, strong human interaction, and dynamic updating—both enable diverse applications and make it difficult to manage safety risks through model-level safeguards alone.

Multi-domain risks and uncertainty

Due to the capability of AI chatbots to perform a wide range of tasks, the associated risks also span multiple domains, including biosecurity, cybersecurity, and beyond. Therefore, effectively examining the full range of these risks requires scrutiny and research across multiple disciplines. Meanwhile, even with efforts to mitigate associated safety risks at the model level, unpredictable outcomes and unexpected risks may still occur due to AI’s wide applications, which necessitates further monitoring and risk mitigation. Besides a given model’s extremely broad applicability, users themselves can also introduce risks and uncertainty by fine-tuning foundation models on task-specific data to enhance performance for particular uses.

Physical-world involvement is equally necessary for risk realization

Realizing the risks described above also requires a sequence of activities in the physical world, beyond the model level. For example, if a malicious actor managed to circumvent the model-level safeguards and obtain a description of how to make a bioweapon, that does not conclude the process. The malicious actor would still need to search for and acquire the essential raw materials in the physical world. Thus, while the model can provide the necessary expertise and methodology for creating public safety harms, the physical infrastructure, distribution mechanisms, and interactions in the physical world are equally crucial for their realization. Malicious actors need only exploit a subset of a model’s capabilities at different stages of the harm chain, which limits the effectiveness of model-level safeguards alone. As a result, mitigating AI-related risks requires coordinated constraints across both digital systems and real-world contexts.

The Dual-Use Nature of Models

No AI model can be considered inherently safe, as its risks depend on the real-world social contexts in which it is used. AI chatbots can serve many beneficial purposes, but the same features can be used for good or ill, and their impact may depend entirely on the social context in which they are applied. For example, expertise in biology and medicine provided by AI models facilitates medical research and public health innovation, but the same knowledge could also be exploited to develop biological weapons; producing creative text or images with AI models can help writers, designers, or educators, but the same capability can also be used to generate individually tailored persuasive scams, disinformation, or other harmful content at scale.

To realize the benefits of AI models, it is not realistic to ban everything. In other words, model-level safeguards alone cannot ensure that an AI model is safely aligned with every possible social context, nor can they account for every malicious actor.

Model-level safeguards can be easily circumvented

So far, model-level safeguards have not been able to effectively prevent bad actors from exploiting AI models’ capabilities for malicious uses. Malicious actors still have techniques for circumventing existing safeguards. One technique is the prompt injection attack, in which attackers craft inputs that trick LLMs into ignoring their intended instructions and following attacker commands instead.

A 2026 study finds that model-level safeguards remain vulnerable to prompt injection attacks: input preprocessing detects only 60-80% of attacks, leaving up to 40% undetected. Furthermore, while advanced architectural defenses are effective against known patterns, they struggle to generalize to novel attack strategies. Another study finds that as few as five carefully crafted documents can manipulate AI model outputs up to 90% of the time through retrieval-augmented generation (RAG) poisoning, a form of prompt injection that exploits external data sources. Other techniques include “jailbreaking”, a class of adversarial attacks in which carefully engineered prompts bypass alignment constraints and elicit harmful outputs. Furthermore, if a model is released as open source, as DeepSeek’s models are, it is even easier to circumvent the safeguard mechanisms. Open-source AI models, including their code, architecture, and trained parameters, are publicly available for anyone to use, modify, and distribute. Because open-source models, including their safeguard mechanisms, can be freely modified, reproduced, and fine-tuned, users can more easily bypass or remove the original safeguards.
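The detection figures cited above matter more than they may first appear, because an attacker is rarely limited to one attempt. The sketch below works through the arithmetic under a strong simplifying assumption that is ours, not the study's: each attempt is detected independently at the same rate, and a blocked attempt does not stop the attacker from trying again. The function name is hypothetical.

```python
def chance_of_evasion(detection_rate: float, attempts: int) -> float:
    """Probability that at least one of several attempts goes undetected.

    Assumes independent attempts with a constant per-attempt detection rate.
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - detection_rate ** attempts


# The 60% and 80% detection rates come from the range quoted in the text.
for rate in (0.6, 0.8):
    for attempts in (1, 5, 10):
        p = chance_of_evasion(rate, attempts)
        print(f"detection {rate:.0%}, {attempts:>2} attempts -> evasion chance {p:.1%}")
```

Under these assumptions, even the stronger 80% filter lets at least one of ten attempts through roughly nine times out of ten, which is one way to see why filtering alone does not stop a persistent attacker.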

Moving Beyond Model-Level Safeguards

Given the social and physical nature of the safety risks posed by AI chatbots, model-level safeguards can only go so far in reducing the risks. Social context and real-world involvement mean that certain risks cannot be mitigated through model-level safeguards alone. That means safeguards and regulation in the physical world are equally crucial in preventing those risks, and joint efforts from both the digital and physical world are required to effectively manage them. Although only limited protection can be achieved at the model level, even this is difficult to implement effectively given the technical limitations of current safeguards.

Therefore, policymakers and AI developers should recognize the limitations of model-level safeguards and understand that their purpose is not to fully eliminate risks. Rather, their role is to identify potential risks as comprehensively as possible, determine what protections can be implemented at the model level, and develop techniques to effectively achieve these protections, while coordinating safeguards in the physical world. Beyond model-level safeguards, future research and policymaking should also emphasize developing safeguards in the physical world as important components of managing the safety risks posed by AI chatbots.

Navigating the Tradeoff - When Competition and Security Undermine Fundamental Rights

Georgetown AI Association — Thu, 02 Apr 2026 17:40:33 GMT

Claire Mucyo is a senior at Georgetown University, majoring in Government with a minor in Tech, Ethics, & Society. She is interested in exploring the intersection of technology, law, and human rights. In the spring of 2025, she studied abroad in Switzerland as a student in the School of International Training’s program on International Studies and Multilateral Diplomacy conducting an independent study on the European Union’s AI Act. This article builds on her research by highlighting the Act’s potential impact on fundamental rights.

TLDR: In August 2024, the European Union’s Artificial Intelligence Act, the world’s first major regulation on AI, entered into force. The AI Act regulates AI according to risk level and aims to promote the uptake of trustworthy AI. However, the EU’s intent to foster a homegrown AI ecosystem and strengthen its member nations’ national security created significant regulatory gaps that enable the violation of fundamental rights.

Over the past few years, Artificial Intelligence (AI) has disrupted power relations between states by creating a new set of political and economic winners and losers. The current leaders in developing and deploying this technology are the United States and China, which have leveraged their respective strengths to bolster the industry. The U.S. has leveraged its home-grown AI companies, chipmaking, and research (e.g., Nvidia, Anthropic) to create a strong frontier-AI ecosystem. Preferring a more centralized approach, China has relied on its substantial “state-backed [financing] initiatives and a massive pool of data,” creating lower-cost, open-source models that are widely deployed. By contrast, the European Union has fallen behind China and the US on core AI development metrics. In 2024, the United States led in private AI investment with 109.08 billion USD and China had 9.29 billion USD, while France and Germany combined for a total of only 4.9 billion USD in investment. Additionally, the US had 40 notable models, China had 15, and in the EU only France had any, with 3. The EU recognizes its struggle to compete with the US and China, yet strives for technological sovereignty: to avoid being reliant on a limited number of foreign suppliers for technologies critical to the EU’s economic and social wellbeing.

In response to its comparative disadvantage in AI development, the EU is pursuing a dual strategy of boosting industry and promoting value-aligned AI. In pursuit of the second goal, the EU adopted the EU AI Act, the first comprehensive legal framework on AI in the world. The Act aims to create a risk-based framework for AI developers and deployers to promote trustworthy AI and reduce harm from AI. It proved controversial, with some finding it overbearing on companies and AI developers, while others praised its focus on protecting human rights in this new technological age.

However, the EU’s political and economic desires, namely to strengthen its home-grown AI industry and fortify its national security and law enforcement capabilities, have produced provisions within the Act that leave significant room for fundamental rights violations and thus undermine the stated goal of the regulation.

Part #1: Desire to Strengthen Domestic AI industry base → Exceptions for Open-Source AI Models.

European Context:

One of Europe’s biggest challenges in competing with the U.S. and China in AI development is raising its comparatively low amount of funding for AI, including for open-source models in the EU. To clarify key terms: open-source typically means that developers have shared the full training code, training dataset, and dataset composition of a model under a public and free license. Open-weight means that developers have shared the final weights and biases of a trained neural network. Open-weight information can be included in an open-source AI stack, and thus viewing funding for open-weight startups informs our understanding of EU adoption and funding for open-source as well.

The EU’s share of funding for open-weight AI startups has been steady year over year since 2016, ranging from 12% to 25% of total funding rounds for open-source startups by region. However, the US and China have consistently had the largest global shares of funding, both ranging from approximately 25% to 70% of total funding. The EU’s smaller share suggests that “fewer startups can secure the essential early-stage and growth funding needed to compete” within the bloc. Additionally, the EU’s ‘Apply AI Strategy’, a plan to integrate and boost European-made artificial intelligence, focuses on public administrations integrating open-source models, highlighting the need to grow open-source development. Importantly, the EU cannot foster open-source development and leverage its benefits for competition as quickly if it requires open-source developers to follow the strict regulations in the Act. Ultimately, with the rise of open-source model intelligence and the struggle for domestic funding, the EU likely does not want to add additional barriers to using open-source models.

EU AI Act Outcome:

In my interview with David Harris (a UC Berkeley Chancellor’s Public Scholar, member of GPAI Code of Practice working group, and former employee of Meta’s Responsible AI team) we discussed how Mistral, a French AI company, and Aleph Alpha, a German AI company, both lobbied their governments to create exemptions in the EU AI Act for open-source companies like themselves. Harris noted that Mistral and Aleph Alpha argued that EU-based companies need open-source technology to thrive and that excessive regulation would harm the pursuit of open source.

Among the concessions that open-source companies received, Article 2 of the EU AI Act states that the law “does not apply to AI systems released under free and open-source licences,” unless they are placed on the market or put into service as high-risk or prohibited AI systems.

Although there are limits to the blanket exemption for open-source models, this provision incentivizes AI developers to develop systems under free and open-source licenses and place them on public repositories instead of monetizing them to avoid the “most onerous requirements of technical documentation and… scientific and legal scrutiny.”

Fundamental Rights Impact:

Under the EU AI Act, closed-source models are categorized into a risk-based system that requires varying levels of safety procedures, such as a risk management system or transparency labels. By contrast, the EU AI Act does not require open-source developers to include protective or safety mechanisms in the code they make publicly available, even though open-source generative AI models are vulnerable to various forms of misuse that harm fundamental rights.

Actors with harmful intentions can take open-source code and manipulate it, relatively easily, to serve their specific purposes. For example, Unit 42, the threat intelligence team at the cybersecurity company Palo Alto Networks, conducted research on the security of open-source models and found that two of DeepSeek’s open-source AI models were susceptible to its jailbreaking techniques. Researchers bypassed DeepSeek’s safeguards and were able to “elicit explicit guidance for malicious activities,” including “data exfiltration tooling, keylogger creation and even instructions for incendiary devices.” This can occur because users can remove the “post-processing” techniques that developers use to prevent open-source models from producing harmful or illegal content, and then use the model to create such content. For example, FraudGPT and WormGPT are tools based on the open-source large language model GPT-J developed by EleutherAI in 2021; they are available on the dark web and operate for cybercrime uses without any guardrails in place. More broadly, scholars have found that open-source models increase “attacker knowledge of possible exploits beyond what they would have been able to easily discover otherwise,” particularly exploits that can be used on closed-source models. The current landscape of open-source AI has shown that open-source models are vulnerable to misuse by bad actors.

The ways in which actors can misuse open-source models can create fundamental rights violations. Most notably, open-source models can enable harms that affect the fundamental right to personal data protection. For example, in 2023, the FBI warned that open-source models were attracting actors who wanted to use the code to develop malware and phishing attacks. Article 8 of the EU Charter of Fundamental Rights states that everyone has the right to “the protection of personal data concerning him or her,” but phishing attacks limit an individual’s ability to exercise this right by tricking them into revealing personal data such as account numbers, usernames, and passwords, thereby facilitating identity theft. Furthermore, open-source models can be misused to enable cyberattacks or targeted and discriminatory surveillance capabilities, posing further threats to the protection of personal data. At a time when cyberattacks are prevalent and can provide bad actors with information for phishing attacks, it is particularly important to protect this right through the EU AI Act.

In order to better ensure that less harm arises from open-source models, the EU AI Act could include stronger regulations for open-source models. For example, rather than creating regulatory exceptions based on the open-source nature of a model, the EU AI Act can focus more on the capabilities of a publicly available model, taking unique risks for open-source models into account. This would expand upon the EU AI Act’s risk-based framework for closed-source models, which requires regulations based on the different risks a model can pose. Scholars at the Institute of AI propose various ideas for regulating open-source models: open-staged testing, in which developers internally release models first to observe the different ways they can be misused, before releasing them to the public. Additionally, a company can release a model to third-party auditors like Enkrypt to implement red-teaming.

Ultimately, there are still considerable risks associated with open-source models that would affect the fundamental rights the EU AI Act intended to protect. The EU AI Act should be more proactive to mitigate these risks by recognizing that open-source models pose significant risks alongside general-purpose AI models.

Part #2: National Security Interests → Law Enforcement/National Security Interest Exceptions

European Context:

AI is incredibly valuable in the national security context as it can collect and analyze information in a timely fashion. The White Paper for European Defense Readiness by 2030 notes how threats to European security are rising considerably in the realm of a global technology race in which Europe’s competitors are heavily investing in developing technological diffusion for commercial and national security purposes. Further, EU states suffer from “critical capability gaps that affect the execution of complex military operations over a sustained period,” and thus Member States need to strengthen military capabilities to close this gap. AI warfare is a “priority capability area” for the European Union to strengthen by 2030 as is reflected by increased levels of investment in the space. There are already several AI-related projects underway from the European Defense Fund, including AIDEDex, a project to build AI-driven detection of explosives, and PRIVILEGE, an AI tool for military data encryption.

EU AI Act Outcome:

It follows that the EU AI Act does not apply to AI used for national security interests and law enforcement purposes. Investigate Europe journalists analyzed over one hundred documents from negotiations of the AI Act, revealing how some EU member states successfully lobbied to allow police and border authorities to legally monitor citizens. In particular, the ambassador from the Macron administration in France expressed that “‘[t]he exclusion of security and defence… must be maintained at all costs.’” At a subsequent meeting, eight other countries agreed to the strong national security exemptions now in place, reflecting a growing concern within the EU to develop AI power for military and security use amidst American and Chinese military AI development.

Due to these lobbying efforts, the EU AI Act allows law enforcement to use AI that is otherwise banned by the law.

Public Surveillance & Facial Recognition: Public surveillance systems use different AI models, including facial recognition algorithms, to monitor, analyze, and locate people and behaviors in public spaces covered by traditional public cameras. The EU AI Act regulates real-time and lagged surveillance systems differently.
Emotional & Biometric Identification: Emotional recognition systems focus on analyzing a person’s facial expressions and predicting their emotional state, while biometric identification systems aim to analyze unique biometric features like fingerprints to verify someone’s identity.
Predictive Policing: Predictive policing models can incorporate public surveillance and facial recognition AI systems, but typically focus on training algorithms with historical crime data to predict and mitigate new crime.

Fundamental Rights Impact:

There is significant concern that the EU AI Act’s exception for military and national security purposes allows EU governments to abuse these technologies and violate fundamental rights. For instance, Hungary banned Pride parades in March 2025 and then expanded its police’s legal use of AI biometric systems “to identify protestors who attend such events,” in the name of “protecting children from the LGBTQ+ agenda.” Further, new laws allowed for the increased use of facial recognition technology in the “context of minor infractions and peaceful assemblies,” instead of only in more serious infraction procedures.

The EU AI Act bans government use of facial recognition technology in public spaces “in real time,” meaning when a system identifies people instantly from live feeds, except in certain circumstances. Time-lag technology, in which a system analyzes video or images after the events have taken place, is classified as a high-risk system instead of a forbidden one. However, the EU AI Act has stated that “even systems that work with slight delays count as ‘real-time’ if the identification happens fast enough to still impact people’s behaviour during public events.”

To date, the EU has not officially declared the classification of Hungary’s surveillance technology or how it should be addressed. However, regardless of the classification, there are ways for Hungary to legally argue it can continue its surveillance use.

If Hungary successfully argues that it uses time-lag technology, then it will be able to continue surveilling peaceful assemblies and targeting LGBTQ gatherings, causing chilling effects that would affect fundamental rights. If the EU declares Hungary’s technology forbidden, Hungary can still argue for a national security exemption. This is because government use of real-time surveillance is banned except when searching for missing people or victims, preventing a significant threat to life, and identifying suspects in serious crimes. That means Hungary could still use its surveillance technology if it demonstrates that it is using it in an allowed circumstance. However, public surveillance inherently captures swaths of people, and thus could provide the Hungarian government with information that it could use to discriminatorily target those at assemblies.

This underscores the dangers of Article 2 of the EU AI Act and that the EU should not assume that all governments will use their national security systems in a way that does not harm fundamental rights. The protection of freedom of expression, peaceful assembly, and non-discrimination based on sexual orientation are threatened when a government can criminalize Pride protests, surveil citizens in attendance, and justify it as a national security or law enforcement concern.

Conclusion

The EU AI Act is a significant step in global AI governance, and, as implementation progresses, its ability to protect rights depends on how political and economic pressures are addressed. The EU recognizes that it is behind the U.S. and China on AI, and the AI Act was supposed to leverage the bloc’s regulatory power: if they can’t build the best models, they can at least set the rules for how they’re used, and make sure those rules protect people.

However, these two provisions are examples of how the Act undermines its high-level goal of protecting fundamental rights. The EU wanted to nurture its open-source ecosystem, so it carved out exemptions that leave users more exposed to phishing tools and jailbroken models. It wanted to keep up in the AI defense race, so it gave member states broad national security exceptions, providing Hungary with a valid legal basis to surveil Pride attendees.

These regulations may have their defenses, but they create significant room for fundamental rights violations, thus hollowing out the regulation’s core promise and the values the EU asserts it cares to protect. In order to protect those rights, the EU needs to address these gaps by applying capability-based regulation and security assessments to open-source models and placing enforceable limits on national security carve-outs so that law enforcement reasoning cannot become a blank check against the rights of EU citizens. The framework is there, but the question remains on whether a stronger political will exists to make the regulation stand true to its values.

GAIA Statement on the Anthropic and Department of Defense Dispute

Georgetown AI Association — Wed, 04 Mar 2026 22:24:40 GMT

TLDR:

GAIA urges that the administration reverse its decision and we encourage politicians, other AI companies, and civil society to stand with Anthropic in its efforts for responsible AI deployment.

Statement on Dispute between Anthropic and Department of Defense

Background:

In 2024, Anthropic reached an agreement with the Department of Defense (DoD) to use their models through a partnership with Palantir and Amazon. In fulfillment of this contract, Anthropic developed Claude Gov, the only model, at the time, authorized for use in classified networks (computing infrastructure designed for sensitive government, defense, and intelligence data). This agreement, which was re-upped by the Trump administration in the summer of 2025, included two stipulations from Anthropic: their models could under no circumstances be used for domestic surveillance or to control lethal autonomous weapons. On Tuesday, February 24th, Pete Hegseth, the Secretary of Defense, gave Anthropic an ultimatum: either give the Pentagon full and unfettered access to Claude Gov for any-and-all lawful purposes or lose its contract. Throughout negotiation, Anthropic refused to remove the safeguards in its models that enforced the aforementioned redlines. On Friday, February 27th, President Trump ordered that all federal agencies phase out the usage of Anthropic products over the next 6 months and Secretary Hegseth tweeted that he would move to designate Anthropic a supply chain risk to national security. This designation means that “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic.” Shortly after this announcement, OpenAI, the producer of ChatGPT and one of Anthropic’s biggest competitors, reached an agreement with the DoD, allowing for their products to be used in classified networks for “all lawful purposes.” In their statement, OpenAI claimed that the agreement prevented the government from crossing three redlines: usage for domestic surveillance, usage in autonomous weapons systems, and usage in high-stakes automated decisions (e.g. systems such as “social credit”).

Impact:

The fallout from this saga has sent shockwaves throughout the AI policy ecosystem. For starters, the designation of Anthropic as a supply chain risk to national security is an unprecedented action, since this designation is typically reserved for foreign companies known to have strong ties with antagonistic states (e.g. Huawei). Beyond this, Anthropic has historically aligned itself with US national security interests, making the DoD’s suggestion that Anthropic poses a threat to national security unfounded. Anthropic has previously advocated for stringent chip export controls against China and was the first company to offer its models to the government in a national defense context. Anthropic plans to fight the risk designation in court. Secretary Hegseth implied that the designation would force any companies with military contracts (e.g. Google, Nvidia, etc.) to cut ties with Anthropic, but this is legally inaccurate. Any supply chain risk designation under 10 USC 3252 can only extend to the use of Claude as part of DoD contracts and cannot affect how contractors use Claude to serve other customers. This means that contractors would only be restricted from using Anthropic’s models for contract work with the Pentagon involving national security systems. Companies like Google or Nvidia would still be able to use Claude for their services and internal use, even if they continue to have DoD contracts. Legal experts have also noted that any valid designation under Section 3252 requires sufficient evidence that “an adversary may sabotage, maliciously introduce unwanted function, or otherwise subvert” a covered system. There is a clear lack of evidence supporting this criterion in Anthropic’s case.

Amidst this political showdown, OpenAI swooped in and arranged a deal with the Pentagon, implying they had found a unique way to keep Anthropic’s same limits in OpenAI’s contract with the DoD. This led to an immense amount of public backlash, with many criticizing the language of their contract for not meaningfully enforcing their purported redlines. Sam Altman later came out and acknowledged that the deal “looked opportunistic and sloppy.” The movement to boycott ChatGPT — organized under the name “QuitGPT” — has been spreading rapidly in reaction to this deal, with approximately 1.5 million users leaving ChatGPT in the past few days. OpenAI has announced revisions to their contract — explicitly prohibiting the use of its technology for mass surveillance or for autonomous weapons systems — but they continue to be criticized, especially considering that Altman has seemingly agreed to soften the agreement’s redlines by permitting AI usage for “any lawful use.”

In an effort to take a stand against the DoD’s pressure, OpenAI and Google employees published an open letter, “We Will Not Be Divided,” attempting to create “solidarity in the face of this pressure from the Department of War.” As of now, 100 current OpenAI employees have signed the letter, along with 866 current Google employees.

GAIA’s Stance:

The Georgetown AI Association opposes the DoD’s decision to designate Anthropic as a supply chain risk to national security. Anthropic’s redline of AI-enhanced domestic surveillance is an example of AI safety in practice and highlights the need for AI to be deployed in accordance with democratic values. Mass surveillance enabled by AI systems can serve to undermine citizens’ privacy and freedom, and policy surrounding it must be settled, first, by Congress and the courts. Furthermore, Anthropic’s refusal to allow the usage of its models in autonomous weapons systems reflects the reality that today’s AI systems are not reliable enough to be deployed in this fashion. Though the DoD is firmly within its right to refuse Anthropic’s redlines, the designation sets a dangerous precedent of preventing companies from doing business unless they embrace the present administration’s policy positions. The United States’ advantage in AI has been driven by granting US companies large amounts of independence from government overregulation, something that is undermined by this ruling. This decision may end up undermining American innovators’ faith in the US system, introducing hesitancy to partner with the US government out of fear of retribution over noncompliance with the administration. Finally, Anthropic has been a leading light in AI safety research and advocacy, and this ruling punishes them for that foresight. In a world that is increasingly shaped by these companies’ decisions, it is vital that the US government incentivizes the frontier labs to actively participate in AI safety and policy rather than shun them for voicing their concerns.

GAIA urges the administration to reverse its decision, and we encourage politicians, other AI companies, and civil society to stand with Anthropic in its efforts for responsible AI deployment.

From Chips to Systems: Rethinking AI Export Controls for Long-Term National Security

Georgetown AI Association — Fri, 27 Feb 2026 15:48:43 GMT

Shardul Krishna Kumar is a sophomore at Georgetown University majoring in Government and Economics with a minor in Tech, Ethics, and Society. He is interested in studying the implications of AI on national security, market competition, and workforce development.

Vidyut Rajagopal is a sophomore at the Georgia Institute of Technology majoring in Computer Engineering. He is interested in cybersecurity and the intersection with AI.

On February 2nd, 2026, GAIA Member Shardul Krishnakumar and Georgia Tech student Vidyut Rajagopal submitted a policy memo in response to The Berkeley AI Safety Student Initiative’s (BASIS) US AI Policy Hackathon Competition. This is their response to the prompt:

“How should the United States adjust existing AI semiconductor export controls (or design new ones) to protect U.S. national security over the long term?”

Our Policy Recommendation:

U.S. export controls on AI semiconductors have largely focused on limiting access to the most advanced individual chips, operating under the theory that compute is the central driver of AI performance. However, the evolving binding constraint in advanced AI systems is memory bandwidth and data movement. Although high-bandwidth memory (HBM) is subject to current export controls, these restrictions retain a large focus on individual chip characteristics. In reality, foreign actors can achieve advanced AI capabilities by aggregating individually compliant components into memory-intensive clusters, thereby circumventing the intent of chip-level restrictions. To better protect U.S. national security, we propose that U.S. export controls should move to a system-level approach that targets the ability to scale AI systems. By constraining memory-intensive integration and cluster-level assembly, the U.S. can align its chip export control policy with the true drivers of long-term AI capability.

Importance of High-Bandwidth Memory and System-Level Regulations

Modern AI systems are increasingly memory-driven because of the demanding nature of machine learning workloads. Large language models with trillions of parameters require constant access to and movement of data during training and inference. Consequently, processors frequently sit idle waiting for data rather than performing computations. HBM alleviates this bottleneck by enabling faster, lower-energy data transfer between memory and the processor. The need for fast and efficient memory has triggered an industry-wide shift toward system-level AI architecture. NVIDIA’s recent AI platform, Rubin, exemplifies this shift. It is not a single chip but an integrated system combining AI-optimized CPUs, high-bandwidth memory, and high-speed mesh interconnects that keep data closer to the processor, minimizing energy-intensive data movement.

Given that AI innovation is increasingly driven by system-level capabilities, U.S. export controls must expand beyond a narrow focus on individual chips (like CPUs/GPUs) to cover memory-intensive system assembly that enables large-scale AI training and deployment.

While individual chip restrictions remain necessary, they are no longer sufficient on their own.

Criticisms of Current U.S. Export Policies

In January 2026, the Bureau of Industry and Security (BIS) outlined new regulations over the sale of advanced AI chips to China, loosening restrictions on the export of powerful chips. The regulations include revised thresholds on individual chip metrics, supply certifications that exports will not delay fulfillment of U.S. orders, and end-use certifications that safeguard national security interests. However, these measures lack a clear framework for regulating system-level AI capability, particularly amid global shortages of HBM that impact the global AI supply chain.

While U.S. export controls have partially succeeded in slowing China down in the short term, they have not deterred China from its ultimate objective of achieving semiconductor self-sufficiency. Previous U.S. restrictions on advanced lithography equipment prompted significant Chinese domestic investment. Similarly, current chip controls that allow the sale of Nvidia’s H200 chips do little to move Chinese authorities away from domestic system integration through alternative hardware. As AI capabilities become defined by memory-efficient, large-scale systems, the effectiveness of U.S. export controls will depend on whether policy can adapt to such realities.

Implementation, Impact, and Potential Challenges

To implement effective chip export controls, the BIS should shift export licensing from individual chip specifications to system-level capability thresholds. Rather than regulating individual chip metrics, the new export controls would establish cluster-level thresholds on aggregate memory bandwidth, interconnect throughput, and total system-scale compute and memory capacity. Licensing requirements would be triggered when an export package enables large-scale AI cluster assembly and exceeds those cluster-level thresholds. Exporters would be required to report and certify the intended export configuration. The BIS should count repeated shipments to the same receiver toward the threshold, preventing circumvention through incremental buildup. Thresholds should be updated annually, adapting to observed shifts in AI bottlenecks so that the focus stays on system-level capabilities rather than chip performance metrics.
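The cumulative-shipment rule described above can be illustrated with a minimal sketch. It assumes a hypothetical ledger that sums each receiver's shipped capability and flags a licensing review once any running total crosses a cluster-level threshold. The threshold values, metric names, and the `CumulativeLedger` class are illustrative assumptions, not figures or tools proposed by BIS or specified in this memo.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cluster-level thresholds, chosen only for illustration.
THRESHOLDS = {
    "memory_bandwidth_tbps": 500.0,  # aggregate memory bandwidth
    "interconnect_tbps": 200.0,      # aggregate interconnect throughput
    "memory_capacity_tb": 50.0,      # total memory capacity
}


@dataclass(frozen=True)
class Shipment:
    """One certified export configuration (all fields are assumed metrics)."""
    receiver: str
    memory_bandwidth_tbps: float
    interconnect_tbps: float
    memory_capacity_tb: float


class CumulativeLedger:
    """Tracks running totals per receiver across repeated shipments."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)
        self.totals = defaultdict(lambda: {m: 0.0 for m in self.thresholds})

    def record(self, shipment):
        """Add a shipment and return the metrics whose totals now exceed a threshold."""
        totals = self.totals[shipment.receiver]
        for metric in self.thresholds:
            totals[metric] += getattr(shipment, metric)
        return [m for m, limit in self.thresholds.items() if totals[m] > limit]


ledger = CumulativeLedger(THRESHOLDS)
batch = Shipment("receiver_a", 300.0, 50.0, 10.0)
print(ledger.record(batch))  # first shipment alone stays under every threshold
print(ledger.record(batch))  # second identical shipment pushes bandwidth over
```

Each shipment here is compliant on its own; only the running total per receiver triggers a review, which is the incremental-buildup case the memo is concerned with.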

System-level export controls institute a cohesive national security framework that better serves the needs of the US market. While the policy would not prevent China from developing domestic alternatives, it would force Chinese development to depend on smaller, less efficient AI clusters with higher energy and integration costs. Moreover, the creation of competitive domestic memory manufacturing capacity requires long development timelines and extensive capital investment, meaning substitution at scale would occur only over a decade-long period. With the rapid pace of technology development, this time disparity allows the United States and its allies to retain a durable advantage in system-scale AI deployment.

Our proposed export control policy raises challenges in enforcement, compliance burden, and attribution and circumvention. System-level control would require BIS to effectively track aggregate capabilities across numerous shipments, demanding more technical expertise, data infrastructure, and interagency coordination. Due to annual updates to cluster-level thresholds, actors in the global AI supply chain will deal with a level of uncertainty in long-term sales and supply-chain planning. Finally, the BIS would have to invest significant resources to prevent further circumvention of export control policies, restricting the ability of U.S. exports to accumulate into large AI clusters in China.

Policy Recommendations: Accelerating AI Data Center Development in the US

Georgetown AI Association — Fri, 20 Feb 2026 14:03:16 GMT

Vedant Srinivasan is a Sophomore at Georgetown University majoring in Science, Technology, and International Affairs. He is interested in studying the impact of AI on the labor market, how the technology will be diffused globally, and how “middle powers” can have a say in Responsible AI Development.

Bhumika Nebhnani is an MPP candidate at the McCourt School of Public Policy, Georgetown University. She currently works at the Center for Security and Emerging Technology (CSET) and Massive Data Institute (MDI) on AI policy and regulations in the US and globally.

On February 2nd, 2026, GAIA Members Bhumika Nebhnani and Vedant Srinivasan submitted a policy memo to The Berkeley AI Safety Student Initiative’s (BASIS) US AI Policy Hackathon Competition. This is their response to the prompt:

“How should the federal government accelerate AI data center development to maintain U.S. technological leadership while managing energy, environmental, and community concerns?”

The U.S. commitment to global AI dominance amid intensified competition with China has brought data centers to the forefront of federal policymaking. Data centers have been formally elevated to a national priority, with federal agencies directed to accelerate their buildout. While progress has been made, U.S. deployment remains constrained by a three-layered bottleneck stack: energy generation and electricity transmission, land allocation, and water concerns.

Energy and electricity:

Data centers consumed approximately 184 TWh, about 4% of total U.S. electricity use, in 2024 alone [1], making it critical to solve the bottlenecks of energy generation and grid congestion [2] to accelerate data centers’ speed-to-power [3]. To meet data centers’ energy demand without competing with other uses, the United States needs to ramp up its electricity generation. While the Federal Energy Regulatory Commission (FERC)’s July 2023 interconnection reforms reduced interconnection wait times [4] for new power plants, they have not resolved the underlying capacity and construction timing issues. Grid congestion costs American consumers $6 billion annually. Congestion also creates significant problems for the renewable sector, whose providers supply intermittent energy [5], leading to significant curtailment [6] and, in turn, losses. New transmission lines are necessary but expensive and take a minimum of a decade to build. Grid Enhancing Technologies [7] such as dynamic line ratings and power flow controls, software and hardware interventions that optimize existing electrical transmission, can be deployed immediately and deliver significant capacity gains at far lower cost. While the federal government has stepped in through programs such as the Grid Resilience and Innovation Partnerships program, more effort needs to be placed on encouraging local utilities to enhance the existing grid.

Land allocation:

On the roughly 60% of U.S. land under state jurisdiction, siting of data centers is primarily governed by state “police powers” delegated to county governments. While states initially competed to offer hefty financial incentives, a rising popular backlash has made approvals time-consuming or, in some cases, stalled them. In this light, the executive push to identify federal, brownfield [8], and Superfund [9] lands for siting data centers is pertinent. However, this process has been slow owing to the absence of clear timelines and a lack of site identification. Reusing brownfield and Superfund sites may be feasible but requires careful cleanup and community engagement. The Environmental Protection Agency’s (EPA) budget cuts may also slow down brownfields revitalization.

Water:

Hyperscale data centers can consume up to 1.8 billion gallons of water annually due to cooling requirements, equivalent to the annual consumption of a town of approximately 10,000 to 50,000 people. This has become a flashpoint in water-stressed areas like Arizona, where community skepticism about data centers is a significant barrier to the buildout. Several state assemblies have passed legislation demanding greater reporting on data center water usage, though these bills have been vetoed by state governors.
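To make the town-size comparison above concrete, the short sketch below converts the cited annual figure into the per-person daily water use it implies at each end of the population range. It only unpacks the arithmetic of the comparison; the function name is a hypothetical helper, and the calculation does not independently verify the underlying consumption estimate.

```python
GALLONS_PER_YEAR = 1.8e9  # upper-end annual figure cited in the text


def gallons_per_person_per_day(annual_gallons, population, days=365):
    """Per-capita daily use implied by spreading annual_gallons over a town."""
    return annual_gallons / (population * days)


small_town = gallons_per_person_per_day(GALLONS_PER_YEAR, 10_000)
large_town = gallons_per_person_per_day(GALLONS_PER_YEAR, 50_000)
print(f"{small_town:.0f} and {large_town:.0f} gallons per person per day")
# prints "493 and 99 gallons per person per day"
```

The wide population range in the comparison therefore corresponds to a fivefold spread in assumed per-person use, which is worth keeping in mind when the figure is quoted.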

Policy Recommendations

In the context of the highlighted problems with data center buildout, the following policy recommendations should be considered:

The Federal Energy Regulatory Commission (FERC) should initiate a notice of rule-making and finalize its reforms on incentivizing utilities’ adoption of Grid Enhancing Technologies (GETs):

Since utilities receive compensation as a percentage of their total assets, they have more incentive to build new lines than to improve the efficiency of existing ones. FERC should exercise its rule-making power under Section 219 of the Federal Power Act to enable utility companies to recoup their investments in congestion reduction [10]. A new rule would give additional equity to utilities that improve existing lines using GETs. This unlocks previously inaccessible power, lowering the cost of electricity for consumers, enabling speed-to-power for data centers, and resulting in savings for renewable energy plants.

The Department of Energy (DOE) should clarify areas so that FERC can site interstate transmission lines:

While states control most of the ability to site transmission lines, interstate lines can be approved by FERC only when DOE designates them as part of a national energy corridor. DOE must begin this process, focusing on areas with a high data center concentration. Once these areas are designated, FERC can quickly approve interstate lines that may otherwise receive pushback. Exercising this power would require an extensive consultation process with affected communities, which could delay implementation.

Streamline procedures to maximize the use of Federal Lands:

DOE and the Department of Defense (DOD) should set clear timelines for finalizing the proposals received on federal lands. They should also continue to identify more sites. The Department of the Interior (DOI) should develop and harmonize its sub-agencies’ site-identification criteria [11]. Further, building data centers on brownfield and Superfund sites may face less community opposition because such projects redevelop used and contaminated land. Congress should therefore explicitly protect and restore EPA’s Brownfields and Superfund-reuse funding. It should also set funding floors for EPA functions [12] tied to national data center development goals. This will ensure that federal lands are used to their full potential.

Supporting a Data Center Transparency Act to Clarify the Information Landscape Surrounding Data Centers:

Public skepticism over data centers has grown due to a lack of information about their impacts. The proposed Data Center Transparency Act, now before Congress, would require data centers to regularly report their water and energy consumption. The act would inform Congress of conditions on the ground, and the resulting transparency would help hold companies accountable. Concern over data centers’ community impacts transcends partisan boundaries, and the passage of this bill aligns with the spirit of cooperative federalism.

[1] The demand is projected to grow to 9% by 2030.

[2] The economic impact on electricity users that results from physical transmission constraints.

[3] The time it takes for data centers to access their needed supply of energy.

[4] Prior to construction, energy suppliers are required to submit an interconnection request, which enters them into a queue to study the feasibility of their joining the grid.

[5] Energy sources for which supply is neither constant nor predictable.

[6] The intentional reduction in energy supply due to grid complications.

[7] GETs are hardware and/or software that dynamically increase the capacity, efficiency, reliability, or safety of existing power lines, faster and at lower cost than traditional grid buildout.

[8] Land that is abandoned or underused.

[9] Allowances for the EPA to clean up contaminated sites.

[10] While FERC proposed such a rule in 2020, it failed to finalize it. If FERC does not exercise this power independently, Congress should step in and pass the Advancing GETs Act of 2025, which would mandate this change.

[11] This will ensure land under the Bureau of Land Management (BLM), U.S. Fish and Wildlife Service (FWS), and National Park Service (NPS) is also constructively assessed for the buildout.

[12] Some of these functions are site assessment, cleanup decision-making, and reuse technical assistance.

Follow the Money, Follow the Safety

Georgetown AI Association — Tue, 27 Jan 2026 14:44:22 GMT

Rita M. Perez is currently pursuing a Master's degree in Communication, Culture, and Technology (CCT) at Georgetown University, where she's developing expertise in design research, tech policy, and public interest technology.

TLDR:

This article explores the importance of frontier labs’ corporate governance structures in creating safe AI models and products by focusing on the different approaches taken by OpenAI, Anthropic, and xAI. While OpenAI’s evolution from non-profit to an uncapped for-profit Public Benefit Corporation has introduced financial pressures, the company has maintained some safeguards and transparency mechanisms. Anthropic’s unique Long-Term Benefit Trust, combined with its status as a Public Benefit Corporation, has enabled it to treat safety as a foundational principle. Even so, it has struggled with the implementation of robust safeguards. Finally, xAI’s commitment to a debt-financed structure leaves it more prone to immediate financial pressures, potentially explaining its lack of initiative in creating safe models. Fundamentally, these labs’ governance structures shape their priorities, and whether or not safety is one of them.

Can You Build Transformative Artificial Intelligence While Keeping Humanity Safe?

AI safety means many things to different people, but fundamentally, it is about building secure and reliable systems. Over the years, researchers have speculated on the potential catastrophic risks of AI systems, like rogue systems that exterminate humanity. But, beyond these catastrophic risks, we’ve already experienced AI-related harms including fraud, cyberattacks, hate speech, and even harm to women and children. These concrete incidents demonstrate the public safety risks posed by AI models and reinforce the need for safeguards.

In recent years, investors have poured trillions of dollars into AI companies racing to build models that could fundamentally reshape human civilization. As these systems are embedded into shopping, academic work, software development, and organizational operations, demand for safety and transparency has intensified. Yet, the funding structures enabling this development may, in fact, determine whether companies can afford to prioritize safety at all. While every AI company claims safety as a core value, the financial pressures in their corporate structures often tell a different story about what happens when billions are on the line.

Currently, a handful of companies like OpenAI, Anthropic, and xAI have dominated public discourse and reached the general public with their respective chatbots ChatGPT, Claude, and Grok. These systems are ubiquitous and unlikely to go away soon, meaning that it is vital for the average consumer to understand how frontier AI companies are balancing the safety implications of their models with their investors’ interests.

OpenAI: Balancing Safety with Scale

A group of individuals, led by Sam Altman and Elon Musk, co-founded OpenAI in December 2015 as a nonprofit with an initial $1 billion funding commitment. “Partly motivated by concerns about AI safety and existential risk from artificial general intelligence [1] [AGI],” they intended to build safe AGI and work towards long-term safety, according to their charter.

“Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.”
-OpenAI, “Our Vision for AGI”

OpenAI’s Certificate of Incorporation explicitly states that “the resulting technology will benefit the public and the corporation will seek to open source technology for the public benefit when applicable”, but does not specify the benefits expected from the resulting technology. As a mission-driven organization, OpenAI was able to align its AI research closely with its goal of developing beneficial AGI without needing to prioritize quick financial returns. But in 2019, OpenAI announced a capped-profit subsidiary, creating profit incentives in order to raise capital. A capped-profit company cuts off returns for investors past a certain point, and in the case of OpenAI, this cap was set at 100x the initial investment. In 2025, this approach changed drastically with the creation of a new for-profit Public Benefit Corporation (PBC) dubbed the OpenAI Group. A PBC is legally defined as:

“A for-profit corporation…intended to produce a public benefit or public benefits and to operate in a responsible and sustainable manner. To that end, a public benefit corporation shall be managed in a manner that balances the stockholders’ pecuniary interests, the best interests of those materially affected by the corporation’s conduct, and the public benefit or public benefits identified in its certificate of incorporation.”
- Delaware Code Title 8 (1.XV.362)
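As a rough illustration of the capped-profit arrangement described above, the sketch below splits a hypothetical gross return into the portion an investor keeps under a 100x cap and the portion that falls above the cap. The function name and dollar amounts are illustrative assumptions, not OpenAI's actual terms beyond the 100x multiple; passing an infinite cap models an uncapped structure.

```python
def capped_payout(investment, gross_return, cap_multiple=100.0):
    """Split gross_return into (amount paid to investor, amount above the cap)."""
    cap = investment * cap_multiple
    to_investor = min(gross_return, cap)
    return to_investor, gross_return - to_investor


# Hypothetical: a $10M investment whose stake would be worth $1,500M uncapped.
print(capped_payout(10.0, 1500.0))                # (1000.0, 500.0) under a 100x cap
print(capped_payout(10.0, 1500.0, float("inf")))  # (1500.0, 0.0) with no cap
```

Under the cap, everything past 100 times the original investment stops flowing to the investor; removing the cap sends the full return to the investor.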

The new PBC removed limits on investor returns, reduced the non-profit’s ability to fully guide the responsible development of AGI technology, and relieved the company of legal responsibilities it carried as a 501(c)(3). This allowed investors to receive traditional equity stakes (ownership of a company through shares of stock), enabling companies like SoftBank to support infrastructure development over the next few years. The structure also lets larger tech companies like Microsoft receive returns on their investment in OpenAI’s for-profit subsidiary, which develops all of OpenAI’s AI products and technologies. However, Microsoft’s licensing rights specifically exclude AGI-related intellectual property.

This restructuring raises a critical question: does removing the profit cap and nonprofit oversight come at the expense of safety? OpenAI’s safety infrastructure suggests the company is attempting to maintain safeguards even as financial pressures intensify.

Safety and Alignment Principles

Trillions of dollars have been funnelled into OpenAI’s products, leaving industry leaders to answer how their partnerships and investments will result in models that are safe.

The prevailing strategy employed by OpenAI has been safety testing, which remains common practice for most major companies. After all, unsafe products aren’t appealing to consumers. OpenAI’s AI Safety Team has agreed to the Frontier AI Safety Commitments, voluntary agreements to develop frontier models with safety-related best practices in mind (assessing risks, setting thresholds, and establishing risk mitigations). Its Safety Evaluation Hub provides public access to the safety evaluation results for its models, including their system cards [2], the Preparedness Framework, and specific research releases. OpenAI’s alignment research also aims to ensure that its models pursue human-intended goals rather than finding dangerous shortcuts that could cause catastrophic harm. The company has evolved from exclusively using human evaluators to label good and bad outputs (reinforcement learning from human feedback, or RLHF) to automated approaches where models evaluate their own reasoning against safety specifications, essentially teaching ChatGPT to police itself.

At a more technical level, OpenAI’s safety evaluation team focuses on hardening models against jailbreaks (user attempts to get around safeguards), hallucinations, and disallowed content through public benchmarks and the OpenAI Model Spec [3], which uses a combination of red lines (limits and boundaries that shouldn’t be crossed) and general principles focused on human rights and safety. Although the relationship between content and safety is not always visible to users, development practices for both trickle down to their interactions with GPT models and products.

What happens if their models are evaluated as unsafe?

Operationally, OpenAI’s Preparedness Framework states that various groups and individuals weigh in on model safety and deployment decisions. The Safety Advisory Group is responsible for overseeing adherence to the Framework, recommending longer-term changes and investments to keep model safety thresholds at acceptable levels, and assessing residual safety risks. While OpenAI’s CEO and leadership make all final deployment decisions, an independent safety board (previously the Safety and Security Committee) can reverse decisions and mandate changes. OpenAI evaluates models from two perspectives: malicious use by users and misalignment of the model itself. Each risk category has distinct safeguards tailored to the source of the threat. Safeguard actions may include permanently banning users who violate usage policies, limiting models’ internet access until issues are resolved, or logging the model’s actions in a database that is consistently monitored for evidence of harm.

Through these safety mechanisms, OpenAI has attempted to mitigate safety risks, but it has not always succeeded in combating the more immediate risks to its users. A report by Wired indicated that “about 1.2 million users may be expressing suicidal ideations” with ChatGPT, raising critical questions about whether current safety frameworks are adequate to address the risks these technologies pose to public health. In April 2025, 16-year-old Adam Raine died by suicide with the encouragement of GPT-4o, which lawyers argue was rushed through safety testing to meet a launch date, producing flawed design specifications that led the model to encourage his suicide attempts. Other parents have also filed wrongful death lawsuits against OpenAI for failing to create enough safeguards for users experiencing mental health emergencies.

The evolution of OpenAI’s funding structure shows an attempt to retain the fundamental mission in its work, while also providing financial flexibility to capture the massive investments needed to deliver on its AGI promises. Given the recent establishment of the PBC, only time will tell if this legal structure will be able to sufficiently constrain investor and commercial pressures that may de-prioritize thorough safety testing and protections for the most vulnerable users.

Anthropic: Establishing Foundational Safety

In December 2020, Dario Amodei, then the Vice President of Research at OpenAI, departed from the company alongside his sister Daniela and several other senior researchers to found Anthropic. The Amodeis had a different vision for safety in their models, treating it as a foundational design principle rather than something added after establishing basic functionality.

“Anthropic is dedicated to building systems that people can rely on and generating research about the opportunities and risks of AI.”
-Anthropic, “Our Purpose”

To bring their vision to life, Anthropic was structured as a PBC, legally required to balance public benefits and stakeholder interests. Similar to OpenAI, Anthropic’s investor base includes a mix of traditional venture capital firms (ICONIQ, Lightspeed Venture Partners, General Catalyst), institutional investors (Fidelity, BlackRock, T. Rowe Price), sovereign wealth funds (Qatar Investment Authority, GIC), and strategic technology partners (Amazon and Google), who provide both equity investment and multi-cloud computing infrastructure [4].

In 2023, the company created an experimental Long-Term Benefit Trust (LTBT) made up of individuals with backgrounds in AI safety, national security, public policy, and social enterprise, creating a corporate governance structure meant to meet the challenges of the unprecedented externalities arising from AI technology. The LTBT is granted the authority to select and remove Anthropic board members, and will, over a period of 4 years, become progressively more influential by appointing the majority of the company’s Board despite not having direct financial stakes in the company. Current members of the experimental LTBT include Kanika Bahl, the CEO of the non-profit Evidence Action, and Richard Fontaine, the CEO of the think tank Center for a New American Security, as well as other leaders from the non-profit sector. This combination of a PBC and Long-Term Benefit Trust is Anthropic’s attempt to provide financial incentives to its investors through uncapped, conventional equity capital and large cloud credits [5] while preserving its mission of creating safe and transformative AI.

Safety and Alignment

Unlike OpenAI’s primarily technical approach, Anthropic embeds normative principles directly into model training through Constitutional AI [6], a framework drawing from sources including the UN Universal Declaration of Human Rights (principles like respecting freedom, equality, and non-discrimination), global platform guidelines like Apple’s Terms of Service (protecting user privacy and preventing misinformation), DeepMind’s Sparrow Rules (avoiding harmful content and conspiracy theories), and Anthropic’s own research on existential safety (ensuring AI systems prioritize humanity’s wellbeing over self-preservation). The model learns to critique and revise its own responses against 58 explicit principles, making alignment more transparent and scalable than human-labeled feedback alone.

To align with these values, Anthropic’s models are tested through safety evaluations, risk assessments, and bias evaluations prior to deployment. The company uses approaches like the Unified Harm Framework, which aims to understand potentially harmful impacts across five dimensions (physical, psychological, economic, societal, and individual autonomy), and Policy Vulnerability Testing, which partners with external experts to identify areas of concern and stress-test them against Anthropic policies. The company also provides a Transparency Hub, Model Reports [7], and system cards that provide information on Agentic Safety and Malicious Use, automated behavioral audits, and the company’s Responsible Scaling Policy (RSP) evaluation process. The RSP assigns required safeguards for each capability threshold in order to mitigate safety risks. The company uses AI Safety Levels (ASL), inspired by biosafety-level frameworks, that evaluate models from ASL-1 through ASL-4+. The company has made a public commitment not to deploy models that are capable of causing catastrophic harm unless those risks are kept under acceptable levels.

In April 2025, OpenAI and Anthropic collaborated on a joint safety testing process where each company evaluated the other’s models across instruction hierarchy, jailbreaking, hallucination, and scheming behavior. Anthropic also has formal agreements for pre-deployment testing with both the U.S. AI Safety Institute and the U.K. AI Security Institute, which enable collaborative research on evaluating and mitigating safety risks. According to Anthropic, these organizations tested its constitutional classifiers to identify vulnerabilities, enabling the company to strengthen its safeguards. Anthropic’s alignment research appears to be focused on the urgent problem of interpretability, our ability to understand the models’ internal mechanisms to better predict their behaviors, which may also provide valuable insights into more effective risk mitigation strategies.

What happens if their models are evaluated as unsafe?

Anthropic’s Responsible Scaling Policy states that various teams and individuals are tasked with evaluating model safety and implementing safeguards. The Responsible Scaling Officer ensures compliance with the RSP, develops internal safety procedures, and approves training and deployment decisions. The company gives the public access to evaluation and deployment materials, including disclosures about changes to the RSP, and collaborates with external experts, third-party reviewers, and the U.S. government to improve safeguards and procedures. To be deployed, a model must meet ASL-2 Deployment and Security Standards, meaning it has been trained to refuse chemical, biological, radiological, and nuclear related requests.

Despite these protections, an August 2025 Threat Intelligence Report published by Anthropic shows that the company’s largest safety “incidents” have involved large-scale cybercrime and fraud, in which Claude is used to extort organizations into providing personal data and create elaborate false identities. Their report demonstrates commitments to transparency, stating: “We’re discussing these incidents publicly in order to contribute to the work of the broader AI safety and security community, and help those in industry, government, and the wider research community strengthen their own defences against the abuse of AI systems.”

Like OpenAI’s, Anthropic’s funding structure shows an attempt to retain its fundamental mission while providing the financial flexibility needed to scale AI systems. The company’s PBC and LTBT models seem to create long-term safeguards and transparency efforts but still fall short of fully protecting the public from malicious users who can override protections.

xAI: Financial Pressures and Safety Promises

xAI is an example of what happens when safety and financial incentives diverge.

After leaving OpenAI, Elon Musk sued the company for breach of contract, claiming that it had abandoned its non-profit status and was no longer seeking to benefit humanity. Since then, Musk’s targeting of OpenAI has been extensive and nonstop. Following Anthropic’s lead, Musk created and launched xAI in March of 2023 as a PBC, with the company’s mission boldly stating:

“To Understand the Universe.”
-xAI, “Our Mission”

In 2024, however, the company dropped the PBC obligations without sharing this change with the public. Without these requirements, xAI’s funding structure represents a distinct approach to frontier AI development, using massive equity rounds combined with GPU-backed debt[8] and special-purpose vehicles (SPVs). Essentially, instead of taking normal loans that appear on its financial statements, xAI has created a separate shell company (the SPV) that takes the loan instead. It’s like having your friend take out a loan in their name to buy something that you will actually use, and paying them rent for that usage. The structure allows xAI to secure computing infrastructure without direct financial liability. From a governance perspective, this model creates fundamentally different incentives than OpenAI’s or Anthropic’s funding structures. Unlike Anthropic’s PBC and LTBT framework or OpenAI’s initial capped-profit arrangement, debt-financed infrastructure requires short-term returns that allow xAI to make debt service payments on schedule.

Safety and Alignment

xAI signed the Frontier AI Safety Commitments alongside OpenAI and Anthropic, and established a Risk Management Framework (RMF) using public benchmarks, including internal biology and chemistry evaluations, to assess its models’ performance on restricted queries. However, unlike OpenAI’s and Anthropic’s detailed safeguards, xAI’s RMF describes only a “basic refusal policy” (training models to decline harmful prompts) for prompts involving violence, terrorism, WMDs, and cyberattacks, focusing primarily on “malicious use events” causing over 100 deaths or $1 billion in damages. The company also restricts models from producing information on biological and chemical weapons production methods.

More tellingly, xAI initially failed to publish system cards, now standard practice for transparency at model deployment, despite conducting dangerous capability evaluations. These documents appeared only in August 2025, following public scrutiny of content moderation failures. By November 2025, Grok 4.1’s model card showed improved refusal metrics, but rates of dishonesty and sycophancy had also risen. The pattern suggests that the technical capability to align the models existed but that external pressure was required before the fixes were implemented. When debt-financed infrastructure demands rapid monetization, safety measures become reactive rather than proactive.

What happens if their models are evaluated as unsafe?

xAI’s Risk Management Framework mentions that publicly available platforms, like its very own X, offer a space for the public to provide feedback and raise concerns about its models, though it later states that information about its models may be redacted from publications to protect public safety and national security. If an incident occurs, xAI states that it will collaborate with relevant law enforcement agencies, isolate and revoke access to user accounts, temporarily shut down the relevant system, and conduct a post-mortem of events. The RMF does not detail internal governance of deployment responsibilities and roles.

The RMF states that xAI trains its models to be “honest” and is exploring “truth-seeking AI tools.” But because xAI’s safety focus is limited to WMDs, cybersecurity, and biological and chemical weapons, safety incidents surrounding Grok have unsurprisingly made constant headlines in the past year. Most recently, the chatbot has allowed users to generate sexualized images of women and children, raising concerns that xAI’s chatbot is normalizing nonconsensual imagery and sexual violence. Grok has also been used to praise and impersonate Hitler after a July 2025 update (which was later removed by xAI) instructed the chatbot to “not shy away from making claims which are politically incorrect, as long as they are well substantiated.”

The financial pressure imposed by the company’s funding structure creates a misalignment between safety timelines and commercial success. xAI’s operational practices have drawn significant criticism from the research community, with safety researchers from both OpenAI and Anthropic characterizing xAI’s safety culture as “reckless” and independent evaluations rating the company as weak in most safety categories. The xAI trajectory suggests that when capital structures demand rapid monetization, safety infrastructure tends to follow market pressure rather than precede it.

Funding Determines Safety Capacity

The three dominant U.S. AI companies present three divergent answers to the same question: Can you build transformative AI while keeping humanity safe? Anthropic argues yes, if you engineer safety into your business model through governance structures that gain authority as AI capabilities increase. OpenAI’s trajectory from a capped-profit structure to an uncapped PBC suggests maybe, if voluntary safety commitments and board oversight can withstand market pressures and competition. Meanwhile, xAI’s debt-financed structure reveals that when investments demand short-term monetization, safety infrastructure becomes an afterthought. Ironically, the company that dropped its PBC status, delayed transparency measures until facing public criticism, and allowed its chatbot to generate sexualized images of children is the same company whose founder sued OpenAI for abandoning its safety mission, and whose chatbot is now being used inside the Pentagon’s network.

As AI companies race toward capabilities that could fundamentally reshape civilization, the delicate balance between financial returns and safety may determine whether artificial general intelligence benefits humanity or whether humanity becomes collateral damage.

The debate on Artificial General Intelligence (AGI) has characterized this phenomenon as AI systems that surpass human intelligence capabilities. However, the field of neuroscience also struggles to agree on what intelligence actually is and where it comes from.

System cards can be used alongside model cards to evaluate and publicly share the model’s operations and safety mechanisms. Meta’s explanation, for example, describes model cards as a “standardized way to document, track, and monitor” individual models, while system cards show how a group of models and other technologies within a system complete a task.

Model specifications (or specs) outline a model's explicit rules, objectives, and principles for public transparency. OpenAI’s Model Spec, for example, combines hard rules, like refusing illegal requests, with general principles, like being helpful to users, that guide how ChatGPT responds to prompts.

Anthropic’s deals with Amazon and Google give the company flexibility and access to multiple companies’ cloud infrastructures, potentially making Anthropic’s systems more resilient to outages.

Cloud credits are vouchers for free computing resources from providers like AWS or Google Cloud, allowing companies to train AI models without paying cash for infrastructure.

Constitutional AI is like giving the model a rulebook of values (based in part on the UN Universal Declaration of Human Rights) and teaching it to check its own answers against those rules before responding. Instead of humans labeling millions of examples of good vs. bad responses, the AI learns to evaluate and improve its own outputs using these principles.

Model reports are standardized disclosure documents that detail an AI system's capabilities, safety testing results, known limitations, and potential risks before public deployment.

GPU-backed debt means loans secured by physical graphics processing units (GPUs), the specialized chips required to train AI models, where lenders can seize the hardware if the borrower defaults, similar to how banks can repossess cars for unpaid auto loans.
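The self-critique idea in the Constitutional AI note above can be sketched as a small loop. This is a toy illustration only, not Anthropic’s implementation: the `Principle` class, the `TOY_CONSTITUTION` rulebook, and the `critique_and_revise` function are hypothetical names, and simple keyword checks stand in for the critiques and revisions that a real model would write in natural language.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Principle:
    """One rule in a toy 'constitution' (all names here are hypothetical)."""
    name: str
    violates: Callable[[str], bool]  # stand-in for a model-written critique
    revise: Callable[[str], str]     # stand-in for a model-written revision


# Toy rulebook: a single keyword rule standing in for a written principle.
TOY_CONSTITUTION: List[Principle] = [
    Principle(
        name="avoid_insults",
        violates=lambda text: "stupid" in text,
        revise=lambda text: text.replace("stupid", "mistaken"),
    ),
]


def critique_and_revise(
    draft: str, principles: List[Principle], max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Check a draft against each principle and revise until none is violated.

    Returns the final text and the names of the principles that triggered
    a revision, in the order they were applied.
    """
    text = draft
    triggered: List[str] = []
    for _ in range(max_rounds):
        violated = [p for p in principles if p.violates(text)]
        if not violated:
            break  # the draft passes every principle
        for p in violated:
            text = p.revise(text)
            triggered.append(p.name)
    return text, triggered


if __name__ == "__main__":
    final, triggered = critique_and_revise("That idea is stupid.", TOY_CONSTITUTION)
    print(final)
    print(triggered)
```

As Anthropic describes the method, the revised answers are then used as training examples, so the model learns the preferred behavior from its own corrections and does not depend on human labels for each response.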

Interested in writing about AI Policy as a Georgetown Student? Submit your pitch to georgetownaia@gmail.com and work with our editing team to bring your ideas to life! https://www.georgetownai.org/
