
At this year’s Open Source Summit EU (OSS EU 2025) and AI_dev EU 2025, Stefano Maffulli, Executive Director of OSI, and Jordan Maris, OSI’s EU Policy Analyst, presented a summary of the 10 months since the release of the Open Source AI Definition v1.0. The two talks, at events hosted by the Linux Foundation in Amsterdam, drew questions from the audience that reflected the maturity of the debate.

The AI dilemma

The presentations quickly summarized why AI had to be treated differently from software. The emergence of AI systems capable of generating code, images, and text presented a fundamental challenge to traditional Open Source concepts. While it was relatively straightforward to determine that Open Source AI should provide the same four freedoms as Open Source software (use, study, modify, and share), defining the equivalent of “source code” for AI systems proved complex.

Unlike traditional software where humans write code, AI systems—particularly modern machine learning models—operate as black boxes with behavior that emerges from training processes rather than explicit programming. This raised the critical question: what constitutes the “preferred form” for users and developers to study and modify an AI system?

Enter the Open Source AI Definition

Over the past few years, OSI convened over 100 participants from 27 countries—many from the global south—with the support of the Sloan Foundation and other partners. Our goal was to define what “Open Source AI” should mean in practice. The process led to the Open Source AI Definition (OSAID), approved by the OSI Board in October 2024. According to this definition, an Open Source AI must provide unrestricted access to:

- Data information: sufficiently detailed information about the data used to train the system
- Code: the complete source code used to train and run the system
- Parameters: the model weights and other configuration settings

This ensures that the four essential freedoms of Open Source—use, study, modify, and share—apply meaningfully to AI.

Maffulli emphasized that while some people describe openness as a “spectrum,” OSI sees Open Source as a binary threshold: a gate a system either passes or does not. Just as Linux and BSD represent different licensing models but both qualify as Open Source, AI must meet a minimum set of criteria to be considered genuinely Open Source.

Why policymakers should care

Maris highlighted why this clarity is crucial for lawmakers worldwide. Governments are drafting AI legislation in the EU, Canada, the U.S., and China, often with special provisions for Open Source. Without a clear definition, however, “Open Source” risks being used as a marketing buzzword, undermining trust, competition, and safety.

The definition’s development was significantly influenced by regulatory needs, particularly in the European Union. The EU’s AI Act includes exemptions for Open Source AI, but lawmakers struggled with how to define such systems. This challenge became apparent during the Act’s negotiation process, where well-intentioned attempts to consider Open Source created complexity without clear definitional boundaries.

The Open Source AI Definition matters because it ensures:

1. True freedom to use

The OSAID ensures genuine freedom to use AI systems without hidden restrictions. Many models claiming to be “Open Source” actually impose usage limitations based on user numbers, commercial applications, or other criteria. Such restrictions contradict both the spirit of Open Source and the policy rationale for regulatory exemptions, which assume that Open Source AI can contribute to “research and innovation in the market” and “provide significant growth opportunities.”

2. Legal flexibility through data information

The definition introduces the concept of “data information” as an alternative to requiring complete dataset publication. This approach addresses critical legal challenges, such as training data that cannot be redistributed because of privacy laws, copyright, or other restrictions.

3. Downstream risk analysis and compliance

The definition enables critical downstream compliance verification. Developers building derivative AI systems need sufficient information to ensure their creations comply with applicable legal requirements.

Without transparency, derivative AI systems risk legal liability and potential harm to users—slowing innovation and adoption.

What we’ve seen in 2025

Several encouraging developments in 2025 suggest movement toward true Open Source AI.

But challenges remain. Many still misuse the term “Open Source AI” to mean “open weights only,” which OSI continues to push back against. Copyright and data provenance remain complex issues, especially given global legal variation.

The next frontier: data governance

Looking ahead, OSI is doubling down on data governance and interoperability. In October 2025, we will host Deep Dive: Data Governance (October 1–3), a free online event bringing together experts to explore data standards, legal frameworks, and best practices for building trustworthy AI systems.

Get involved

OSI is committed to monitoring the field and evolving the definition as technology and practices mature. We continue to seek broader community engagement, particularly from developers working with non-generative AI systems in fields like biotechnology, medical applications, and computer vision. These diverse use cases help inform the definition’s evolution and ensure broad applicability. Together, we can ensure that AI development remains open, fair, and empowering for everyone—developers, researchers, and society at large.

OSI is calling on the community to get involved and help shape the definition’s continued evolution.

Resources

Watch the full recordings here:
