From Voluntary Commitments to Mandatory Governance: What Anthropic's Policy Shift Reveals About AI Safety Architecture
- ggstoev
- Feb 26
Last month, I analyzed Anthropic's new constitution for Claude—an inspirational 47-page document mandating "radical honesty" and ethical standards they describe as "substantially higher than standard visions of human ethics." I contrasted this with their corporate transparency practices, where the 2025 Foundation Model Transparency Index showed Anthropic scoring just 46/100, with particularly poor disclosure on training data and compute resources.
I concluded that this split created a credibility problem: "You can't demand radical honesty from your product while maintaining strategic opacity in your corporate practices."
The situation has evolved significantly since then, in ways that show the problem runs deeper than a simple credibility gap.
Five Weeks That Changed Everything
January 21, 2026: Anthropic releases Claude's Constitution
"Claude should basically never directly lie or actively deceive anyone"
"As AIs become more capable, people need to be able to trust what AIs are telling us"
Extensive commitments to transparency, honesty, and ethical constraints
February 2026: Foundation Model Transparency Index (FMTI) 2025 Results
Anthropic scores 46/100 (down from 51 in 2024)
Upstream transparency (training data, compute): 3/34
Declined to prepare their own transparency report
Industry average declined from 58 to 41
February 24, 2026: Anthropic RSP v3.0 Published
Complete rewrite separating "commitments as a company" from "industry-wide recommendations"
Strongest safety measures moved to non-binding "recommendations" column
Explicit acknowledgment: "We cannot commit to following them unilaterally"
Concurrent: Reports of Pentagon pressure over military AI restrictions and related policy shifts
From Credibility Gap to Structural Problem
My initial analysis focused on trust: if a company won't disclose how it built its AI, why should we trust claims about that AI's ethical behavior?
But Anthropic's RSP v3.0 reveals something more fundamental. This isn't just about trust—it's about the impossibility of maintaining voluntary ethical commitments under competitive pressure.
The new RSP is remarkably candid:
"If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe—the developers with the weakest protections would set the pace, and responsible developers would lose their ability to do safety research."
This is the textbook definition of a collective action problem. When safety is costly and competitive pressure is intense, rational actors defect from voluntary commitments—even when they genuinely believe those commitments serve the public good.
The Data Confirms the Pattern
The 2025 FMTI reveals this dynamic across the entire industry:
Companies under competitive pressure maintain opacity:
Consumer-facing companies: 22.5/100 average
Hybrid business models: 30.3/100 average
Frontier Model Forum members (Amazon, Anthropic, Google, Meta, OpenAI): all scored between 31 and 46
Companies serving enterprise customers with accountability requirements demonstrate transparency:
Enterprise B2B companies: 68.0/100 average
IBM: 95/100
Writer: 72/100
AI21 Labs: 66/100
The difference isn't values—it's structural accountability. Enterprise buyers require transparency to manage vendor risk and compliance. That requirement creates binding pressure that voluntary commitments cannot match.
Anthropic's Own Solution
The most remarkable aspect of RSP v3.0 is this acknowledgment:
"Ultimately, the best way for these recommendations to be implemented is likely via governance of all relevant frontier AI developers by third parties that determine which developers need to provide risk analyses...and determine which such arguments are adequate."
They're explicitly calling for exactly what voluntary frameworks cannot provide: independent, binding, third-party governance.
This validates the central thesis of my earlier analysis—but goes further. It's not just that corporate transparency should match product ethics. It's that neither can be sustained without external enforcement.
Why the EU AI Act Isn't Enough
The EU AI Act represents important progress. Several provisions directly address transparency gaps identified in the FMTI:
Training data summaries required for all general-purpose AI models
Risk management obligations for high-risk AI systems
Transparency requirements for model capabilities and limitations
But three critical limitations remain:
1. Geographic Scope
The EU AI Act governs only the European market. Companies can:
Comply minimally for EU deployment
Maintain different (lower) standards for other markets
Engage in regulatory arbitrage between jurisdictions
The FMTI shows this dynamic already: companies optimize transparency disclosure based on perceived regulatory pressure, not universal principles.
2. Enforcement Timeline
Penalties don't begin until August 2, 2026
Regulatory capacity is still developing
Companies can delay substantive compliance
Meanwhile, the FMTI 2025 data shows transparency declining industry-wide (from 58 to 41 average), even as the EU AI Act was being finalized. Voluntary compliance isn't bridging the gap.
3. Scope Constraints
The Act focuses on deployment and use cases rather than comprehensive development transparency. Critical gaps include:
No requirements for compute cost disclosure (only IBM and Writer disclosed this in FMTI 2025)
Limited environmental impact reporting (10 of 13 companies disclosed nothing)
No mandated disclosure of data acquisition methods (most opaque subdomain in FMTI)
Minimal requirements for post-deployment monitoring
What Global Governance Requires
Anthropic's RSP v3.0 demonstrates why voluntary frameworks fail. The FMTI data shows where the gaps are. The EU AI Act shows what's possible.
But truly effective governance requires three components working together:
1. Industry-Wide Standards (Not Company-Specific Policies)
Frameworks like IEEE P7999 (Accountability and Transparency in AI Ethics) must provide:
Outcome-based requirements independent of competitive dynamics
No "competitor-contingent" escape clauses (like those in RSP v3.0 Appendix A)
Auditable, specific criteria—not flexible "make a strong argument" requirements
Binding commitments that survive commercial pressure
2. Third-Party Certification (Not Self-Assessment)
Independent certification must include:
Auditors with no financial interest in certified companies
Published findings, with redactions limited to legitimate security concerns
Authority to deny or revoke certification, not subject to override by the certified company
Standards enforceable through enterprise procurement and regulatory compliance
The FMTI methodology provides a blueprint: 100 specific, verifiable indicators across the AI development lifecycle. This level of specificity is what certification requires.
3. Multi-Jurisdictional Harmonization (Not Patchwork Regulation)
Anthropic's RSP v3.0 calls for this explicitly:
"Different countries should attempt to harmonize their governance, including standards of evidence, to avoid a race to the bottom."
Without harmonization:
Companies optimize for the least stringent jurisdiction
Regulatory arbitrage undermines all frameworks
The "weakest protections set the pace" (Anthropic's own stated concern)
FMTI scores remain suppressed by competitive dynamics
The Path Forward
We now have converging evidence:
Industry acknowledgment that voluntary frameworks fail (Anthropic RSP v3.0)
Empirical evidence of declining transparency under pressure (FMTI 2025: 58→41 average)
Demonstrated effectiveness of structural accountability (B2B companies: 68/100 vs B2C: 22.5/100)
Explicit calls for third-party governance from companies themselves
Regulatory foundation established (EU AI Act) requiring harmonization and extension
Conclusion
My initial analysis argued that product-level ethics without corporate-level transparency creates credibility gaps. Anthropic's RSP v3.0 proves the problem runs deeper: voluntary ethics—at any level—collapse under competitive pressure.
This isn't about any individual company's failures. Anthropic's candor about the collective action problem is valuable. Their explicit call for third-party governance is important. The constitution's ethical framework is genuinely sophisticated.
But good intentions and sophisticated frameworks aren't enough. The evidence is now overwhelming:
Transparency is declining industry-wide (FMTI data)
Voluntary commitments are weakening (RSP v3.0 conditionality)
Competitive dynamics dominate (B2B vs B2C transparency gap)
Companies acknowledge the solution (third-party governance)
The technology for global coordination exists. Regulatory precedent is established (EU AI Act). Standards frameworks are in development (IEEE P7999). Enterprise buyers have both leverage and incentive.
What's needed now is political will to create truly binding, harmonized, third-party governance before the next crisis forces the issue—and before competitive pressure erodes the safety measures that industry leaders themselves acknowledge are necessary.
The five weeks from Anthropic's constitution to their conditional commitments show how quickly voluntary frameworks fail. We need mandatory ones.
Sources:
Anthropic's Claude Constitution (January 21, 2026)
Anthropic Responsible Scaling Policy v3.0 (February 24, 2026)
2025 Foundation Model Transparency Index (Stanford CRFM, 2025)
EU Artificial Intelligence Act (Effective 2025)


