MEASUREMENT AND GOVERNANCE

AI is no longer just evolving; it is active in our daily lives. Safely navigating this shift requires more than intuition: it demands rigorous, quantifiable evaluation. Accurately measuring AI capabilities and enforcing accountable governance is not just a technical challenge; it is our responsibility to future generations.

This space is our dedicated repository for evidence-based AI safety, tracking the real-world benchmarks, red-teaming reports, and policy frameworks required to ensure secure, reliable intelligence.

03 // Corporate Governance Research

Frontier AI Developers Need an Internal Audit Function

Source: Risk Analysis // Jonas Schuett (Centre for the Governance of AI)

Focus Vector: Institutional Risk Controls & Board Oversight Infrastructure

As enterprise adoption scales and dangerous capabilities arise unpredictably, relying solely on ad-hoc safety patches is no longer viable. This paper argues that frontier AI deployment demands a formalized corporate governance structure. Rather than reinventing the wheel, it adapts traditional internal audit principles so that boards of directors receive independent, system-wide assurance over AI risks.

The Third Line of Defense: Adapting the Institute of Internal Auditors’ Three Lines Model so that the internal audit function is organizationally independent of senior management and reports directly to the board's audit committee.
Addressing Unpredictable Risks: Implementing continuous, structured evaluation frameworks to catch hidden operational, control, and system-level failures before deployment.
Bridging Technical & Executive Gaps: Providing senior leadership and corporate boards with a clear, audited understanding of the firm's true risk exposure, clarifying accountability across complex deployment pipelines.

[ Read Full Research Paper ]

02 // Threat Intelligence Case Study

Disrupting AI-Orchestrated Cyber Espionage

Source: Anthropic Research

Focus Vector: Threat Detection & Model Vulnerability Measurement

As frontier models scale, measuring their vulnerability to exploitation is critical to global security. This landmark report analyzes the first documented disruption of a state-sponsored cyber espionage campaign leveraging AI systems.

Key Insights:

Active Mitigation: A deep dive into how advanced model monitoring detected and neutralized coordinated threat actors.
The New Baseline: Why traditional cybersecurity metrics must evolve to measure AI-specific threat vectors.
Accountability in Action: Proving that frontier developers must move from passive safety guardrails to active, real-time defense.

[ Read Full Research Paper ]

01 // Autonomous Systems Research

Measuring AI Agent Autonomy

Source: NeurIPS SoLaR Workshop Research

Focus Vector: Code Inspection & Risk Assessment Frameworks

As AI systems transition from static language models to self-directing agents capable of executing complex workflows, defining and quantifying "decision autonomy" becomes a critical prerequisite for secure deployment. This foundational paper introduces a scalable, code-based inspection methodology designed to evaluate system risks and operational parameters without the inherent dangers or costs of runtime testing.

Evaluating Without Execution: A framework for scoring an agent's orchestration code against a structured taxonomy, so that early-stage audits do not need to run the agent and incur the risks of live execution.
The Autonomy Matrix: Categorizing agent independence across two core dimensions—Impact (what actions can it take and within what boundaries?) and Oversight (how are humans kept in the loop and how visible are the internal processes?).
Governance and Liability: Providing concrete metrics to help organizations safely implement operational guardrails, determine fallback protocols, and effectively allocate accountability across complex software scaffolding.
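To make the two-dimensional idea concrete, here is a minimal sketch of how an auditor might record Impact and Oversight scores and combine them into a coarse label. Everything in it is an assumption for illustration: the 0 to 2 scale, the names `AutonomyProfile` and `autonomy_level`, and the combination rule are hypothetical and are not taken from the paper's actual taxonomy.

```python
from dataclasses import dataclass

# Hypothetical 0-2 scale; the paper's real taxonomy levels are not reproduced here.
LOW, MEDIUM, HIGH = 0, 1, 2


@dataclass(frozen=True)
class AutonomyProfile:
    """Illustrative record of one agent's position on the two dimensions."""
    name: str
    impact: int      # how far-reaching the agent's permitted actions are
    oversight: int   # how strongly humans are kept in the loop

    def __post_init__(self):
        # Reject scores outside the assumed 0-2 scale.
        for field_name in ("impact", "oversight"):
            value = getattr(self, field_name)
            if value not in (LOW, MEDIUM, HIGH):
                raise ValueError(f"{field_name} must be 0, 1, or 2, got {value}")


def autonomy_level(profile: AutonomyProfile) -> str:
    """Combine the two scores into a coarse label (assumed rule, not the paper's)."""
    # The further impact exceeds oversight, the less contained the agent is.
    gap = profile.impact - profile.oversight
    if gap >= 2:
        return "high"
    if gap >= 1:
        return "elevated"
    return "contained"


if __name__ == "__main__":
    agent = AutonomyProfile(name="example-agent", impact=HIGH, oversight=LOW)
    print(agent.name, autonomy_level(agent))
```

The subtraction rule is only one possible way to collapse two dimensions into one label; a real assessment would more likely keep both scores visible rather than merge them.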

[ Read Full Research Paper ]

01 // Power without control: rethinking cybersecurity for the age of agentic AI

Source: The Economist

Focus Vector: Risk Management

“Conventional cybersecurity has long focused on external threats but this approach will not suffice when it comes to AI agents. External attackers may exploit those systems, but agents can also create risks from within by accessing data without proper authorisation, sharing sensitive information unintentionally or acting beyond their intended scope.” The report sets out to answer the following three questions:

How is agentic AI reshaping the cyber threat landscape?
Where are the gaps in governance, visibility and response that hinder safe deployment at scale?
What defines cyber resilience when failure is inevitable?

[ Read Full Research Paper ]

The Gradient Protocol
