Exclusive: EU AI Act Compliance Issues Exposed for Big Tech

Published:

AI Models Face Compliance Challenges with EU Regulations

By Martin Coulter

LONDON (Reuters) – As the European Union (EU) gears up to enforce its ambitious AI regulations, a recent analysis reveals that some of the most prominent artificial intelligence models are struggling to meet key compliance standards, particularly in areas such as cybersecurity resilience and the mitigation of discriminatory outputs. This revelation comes at a time when the EU is intensifying its scrutiny of AI technologies, especially following the public release of OpenAI’s ChatGPT in late 2022, which ignited widespread discussions about the potential risks associated with generative AI.

The EU’s Regulatory Landscape

The EU has been engaged in extensive deliberations over new AI regulations, culminating in the AI Act, which is set to roll out in stages over the next two years. This legislation aims to establish a comprehensive framework for the development and deployment of AI technologies, particularly focusing on "general-purpose" AIs (GPAI). The urgency for these regulations was amplified by the rapid adoption of generative AI tools, prompting lawmakers to create specific rules that address the unique challenges posed by these technologies.

The LatticeFlow AI Evaluation Tool

In a significant development, a new evaluation tool designed by Swiss startup LatticeFlow AI, in collaboration with ETH Zurich and Bulgaria’s INSAIT, has been introduced to assess generative AI models from major tech companies such as Meta and OpenAI. This tool, known as the "Large Language Model (LLM) Checker," evaluates AI models across a range of categories, including technical robustness, safety, and compliance with the forthcoming AI Act. The models are scored on a scale from 0 to 1, providing a clear benchmark for assessing their readiness for regulatory compliance.

On a recent leaderboard published by LatticeFlow, models from Alibaba, Anthropic, OpenAI, Meta, and Mistral achieved average scores of 0.75 or above. However, the LLM Checker also highlighted critical areas where these models fell short, indicating that companies may need to allocate additional resources to address compliance gaps.

Compliance Risks and Discriminatory Outputs

One of the most pressing concerns identified by the LLM Checker is the issue of discriminatory output, a persistent challenge in the development of generative AI models. These models often reflect human biases related to gender, race, and other sensitive areas, which can lead to harmful and unfair outcomes. In testing for discriminatory output, OpenAI’s "GPT-3.5 Turbo" received a score of 0.46, while Alibaba Cloud’s "Qwen1.5 72B Chat" model scored even lower at 0.37. These scores underscore the urgent need for companies to refine their models to mitigate bias and ensure fair treatment across diverse user groups.

Additionally, the LLM Checker assessed models for their resilience against "prompt hijacking," a type of cyberattack where malicious prompts are disguised as legitimate requests to extract sensitive information. Meta’s "Llama 2 13B Chat" model scored 0.42 in this category, while French startup Mistral’s "8x7B Instruct" model received a score of 0.38. These findings highlight the vulnerabilities that exist within current AI models and the critical need for enhanced cybersecurity measures.

The Path Forward

Despite the mixed results, the overall performance of the tested models suggests a positive trajectory for AI compliance. Claude 3 Opus, developed by Google-backed Anthropic, achieved the highest average score of 0.89, indicating that some models are better positioned to meet regulatory requirements than others. The LLM Checker is designed to evolve alongside the AI Act, with plans to incorporate further enforcement measures as they are established.

Petar Tsankov, CEO and cofounder of LatticeFlow, expressed optimism about the results, stating that they provide a roadmap for companies to fine-tune their models in alignment with the AI Act. "The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models," Tsankov noted. "With a greater focus on optimizing for compliance, we believe model providers can be well-prepared to meet regulatory requirements."

Conclusion

As the EU continues to refine its AI regulations, the findings from LatticeFlow’s evaluation tool serve as an early warning for tech companies. With potential fines of up to 35 million euros ($38 million) or 7% of global annual turnover for non-compliance, the stakes are high. While the European Commission has welcomed the LLM Checker as a "first step" in translating the AI Act into actionable technical requirements, it remains clear that significant work lies ahead for AI developers. The path to compliance will require concerted efforts to address biases, enhance cybersecurity, and ultimately ensure that AI technologies serve the public good without compromising ethical standards.

Related articles

Recent articles