Global Push for AI Safety Enters Next Phase

Governments turn pledges into policy
Governments are moving from promises to action on artificial intelligence. In the past year, new rules, safety institutes, and global agreements have begun to take shape. The goal is simple, but urgent: keep powerful AI systems safe while supporting innovation.
Momentum grew after the 2023 UK AI Safety Summit at Bletchley Park. The resulting Bletchley Declaration set a common tone. It stated that AI should be "safe, human-centric, trustworthy and responsible." In May 2024, the AI Seoul Summit pushed this agenda forward. Countries pledged to keep working on model testing and information sharing.
Regulators are also building tools. The United States launched the U.S. AI Safety Institute at the National Institute of Standards and Technology (NIST). The United Kingdom created a counterpart, the AI Safety Institute in London. The two bodies agreed to cooperate on testing high-capability models. The European Union, meanwhile, adopted the AI Act, the first broad AI law by a major regulator. It uses a risk-based approach and will roll out in phases.
This is not only a story about governments. Big tech firms have signed voluntary commitments too. A 2023 White House fact sheet recorded pledges by major AI developers to carry out "internal and external security testing" before release. Companies also agreed to work on watermarking and abuse reporting.
What the new rules emphasize
Policymakers are converging on several priorities. Most are practical steps aimed at reducing near-term risks while preparing for future threats.
- Red-teaming and evaluation: Independent and adversarial testing is becoming standard for advanced models. Regulators want tests for safety, security, and misuse. They also want reporting on results and fixes.
- Transparency: Users should know when they are interacting with AI. This includes labels for AI-generated content in some settings. It also includes disclosures about model limits and intended uses.
- Data provenance: Watermarks and metadata can help trace synthetic media. The aim is to slow fraud, spam, and disinformation. Standards groups are working on common formats.
- Incident reporting: Developers may need to report serious failures or discovered risks. Policymakers are setting up channels for this, similar to cybersecurity practice.
- Human oversight and accountability: Systems that affect safety or rights should keep a human in the loop. Organizations must define who is responsible for outcomes.
Industry gears up for compliance
Companies are hiring safety engineers and investing in testing. Many now run internal "red teams" to probe models for security holes and harmful behavior. Some invite external researchers to do the same, with structured disclosure and rewards. The growth of evaluation startups shows how fast this niche is maturing.
Voluntary commitments remain a bridge to formal rules. The White House described a common baseline in 2023. Signatories agreed to "internal and external security testing" and to share risk information. Firms also said they would invest in cybersecurity and protect model weights. While not legally binding, these promises set expectations across the industry.
The EU AI Act will add binding duties for certain systems. Providers of high-risk AI will face documentation, quality, and monitoring requirements. They will need to keep logs, validate data, and track performance. General-purpose models that power many apps will also face transparency steps. The details will depend on implementing acts and standards that are still being drafted.
Experts warn about a measurement gap
Testing is only as good as the yardsticks. Many experts say the world lacks mature benchmarks for some of the most serious risks. Those include autonomy, deception, biological misuse, and large-scale social manipulation. NIST’s AI Risk Management Framework highlights key properties for "trustworthy AI." It calls for systems that are "valid and reliable" and "safe, secure, and resilient." It also stresses accountability, transparency, and fairness. Translating these aims into repeatable tests is the hard part.
Academic labs and nonprofits are working on evaluation science. They design probes for dangerous capabilities, bias, and robustness. But models change fast. Capabilities can emerge as systems grow or as users find new prompts. Regulators and industry may need shared testbeds and rapid update cycles. Safety institutes in the U.S. and U.K. say they will publish methods and reference evaluations to help.
Civil society calls for guardrails
Advocates want the benefits of AI to reach everyone and not just a few. The OECD AI Principles, backed by dozens of countries, urge "inclusive growth, sustainable development and well-being." Rights groups warn about risks in hiring, housing, policing, and healthcare. They seek strict bans on abusive uses, stronger privacy protections, and clear paths for redress when things go wrong.
Artists, newsrooms, and educators are also pressing for transparency on data and training. Some want opt-outs from model training or compensation for use of their work. Others push for provenance tools to protect reputations and fight scams. Industry says it is listening, but notes that one-size-fits-all rules could chill useful innovation.
Balancing innovation and caution
Policymakers face a classic trade-off. Too little oversight could allow harm at scale. Too much could slow medical research, climate modeling, and productivity tools. Many are trying a phased approach. They begin with transparency and testing while studying the highest risks. They then add tighter controls for uses that affect safety or fundamental rights.
A major challenge is global coordination. AI markets cross borders. If rules diverge, firms may face a patchwork of demands. International groups, including standards bodies, are working to align definitions and tests. The aim is to keep compliance predictable while raising the safety floor everywhere.
What to watch next
- Testing standards: Expect more technical guidance from safety institutes and standards groups. They will refine methods for red-teaming, content provenance, and model reporting.
- EU AI Act roll-out: Prohibitions on the most harmful practices will arrive first. Obligations for high-risk systems and general-purpose models will follow over time, with guidance to clarify scope.
- U.S. rulemaking: Agencies are drafting sector rules, from healthcare to critical infrastructure. The focus is on safety, non-discrimination, and transparency.
- Open science vs. security: Debates will intensify over when to open-source models. Researchers value transparency. Policymakers worry about misuse and are considering safeguards.
- Benchmarks and audits: Independent evaluations and audits may become routine for powerful models. The credibility of these checks will depend on access and methodology.
The core question is whether governance can keep pace with capability. The answer will hinge on measurement, transparency, and cooperation. For now, the direction is clear. The world is building a common playbook for AI safety. The next phase is execution.