
In October, the OSI hosted the State of the Source Track at All Things Open, designed to connect developers with the big policy conversations shaping our ecosystem. Katie Steen-James and Nick Vidal participated in a fireside chat (Policy: AI / Data Governance) to discuss the latest AI and data governance policy developments.

Policy: AI / Data Governance

Katie Steen-James and Nick Vidal (Open Source Initiative)

AI and discussions about what it should and shouldn’t do are everywhere. The same is true in the policy space as lawmakers try to make sense of the technology and what it means for society. Complex topics like training data and fair use, and disclosure and transparency in AI systems, are being debated in state capitols and in Washington. This session will provide an update on the latest policy developments and conversations at the intersection of Open Source and AI. It will include a debrief of OSI’s recent virtual event, Deep Dive: Data Governance, an overview of state legislation, and the White House’s AI Action Plan, among other topics.

Video summary

Opening Remarks

Nick:
Hello everyone, and welcome! We’re going to get started with a fireside chat — a relaxed conversation between Katie and me. We’ll be asking each other questions and exploring some important topics in AI and data governance. Hopefully, you’ll learn a few new things along the way.

Katie:
Sounds great! Hi everyone, I’m Katie Steen-James, Senior U.S. Policy Manager at the Open Source Initiative. I joined OSI in February this year.

For those unfamiliar with our policy work, OSI has been involved in policy for some time, though historically it was handled by volunteers or contractors. Over the past year, we’ve expanded that commitment by hiring full-time policy staff. My colleague, Jordan Maris, is based in Brussels and covers EU policy, while I focus on the U.S. side.

I’m really excited to be here to talk about AI and data governance — topics that sit right at the intersection of technology, transparency, and public interest.

Nick:
Wonderful. I’m Nick Vidal, Community Manager at OSI. I’ve been with the organization for six years, and it’s an exciting time to see our team growing. Together, we’re doing important work to protect developers and organizations that use Open Source software.

I’m based in Brazil — it took me about 15 hours to get here, but I’m thrilled to join this conversation in person!

We’ll be taking turns asking each other questions. Katie’s will be more policy-focused, and mine will lean toward community and collaboration. So, let’s dive in.


The Open Source AI Definition

Nick:
Katie, let’s start with the basics. What is the Open Source AI Definition, and why is it important?

Katie:
That’s a great question. The Open Source AI Definition was released about a year ago at this very conference — before I joined OSI. Since then, I’ve been learning about the process behind it and why it was such a critical step for the community.

Essentially, OSI set out to apply the four freedoms of Open Source software — the freedom to use, study, modify, and share — to AI systems. But AI systems are more than just software. They include training data, weights, models, and infrastructure. So the question was: how do we translate the spirit of Open Source into this broader context?

The Open Source AI Definition (version 1.0) was the first step in that journey. It provides a shared understanding of what “Open Source AI” should mean, helping developers, researchers, and policymakers know what to expect when something carries that label.

For me as a policy professional — not a programmer — this kind of definition is essential. It creates a foundation for community consensus and policy dialogue. As AI continues to influence society and public policy, having clarity around what Open Source AI entails will only become more important.

And of course, this is just version 1.0 — a starting point. The definition will evolve as our understanding of AI systems deepens.


Policy Developments Affecting Open Source

Nick:
What kinds of policies are currently in place that impact developers or companies using Open Source software?

Katie:
In the U.S., most federal policies focus on encouraging the use of Open Source software within government agencies — for example, making sure it’s considered equally in procurement and promoting it internally.

Under the previous administration, there was significant attention on software security, including Software Bills of Materials (SBOMs) and attestation requirements for vendors. Those discussions have quieted a bit under the current administration, but we’re still tracking how these ideas evolve.

In AI policy specifically, the federal government has been cautious about regulation. Instead, we’re seeing more activity at the state level. Some of these proposed laws include requirements for AI developers or systems, but they often fail to distinguish between developers and deployers.

Developers might release code publicly, but they don’t control how others use it downstream. So imposing legal responsibility on developers for those downstream uses is a real concern for us.

At OSI, we’re developing materials to help policymakers understand these nuances — making sure they don’t inadvertently harm Open Source development when drafting AI policies.


AI Policy Trends and Federal Initiatives

Nick:
Are there any major trends or proposals emerging around AI and Open Source?

Katie:
Yes, definitely. At the federal level, the White House released an AI Action Plan earlier this year. It includes a section on promoting “Open Source and open-weight AI.” The administration has expressed interest in bringing stakeholders together to help small businesses adopt open and transparent AI tools.

At the same time, as I mentioned, states are introducing their own AI legislation. We’re monitoring these closely to ensure they don’t unintentionally create burdens for Open Source developers.

Internationally, policy coordination is much harder these days given the geopolitical climate. Organizations like the United Nations are hosting discussions on digital public goods and Open Source, but most collaboration is happening directly between countries rather than through centralized frameworks.

That said, open collaboration remains essential. Open Source development, by nature, thrives on global cooperation, and maintaining that openness is critical — especially as nations develop their own AI governance strategies.


Fair Use and AI Training Data

Nick:
What’s happening in the U.S. around the debate on fair use and AI training?

Katie:
It’s a fascinating and complex issue. I’ll note that I’m not a lawyer, but I’ve worked on copyright-related policy before, especially in my previous role advocating for libraries and open access to publicly funded research.

In short, the courts have generally said that using copyrighted material to train AI systems qualifies as fair use — though the way the material was acquired, shared, or stored might still raise copyright concerns.

That position has created tension between major tech companies that rely on large datasets to train AI, and content rights holders — often large publishers or media companies — who argue this use infringes on their intellectual property.

Some of the strongest supporters of fair use are libraries and research institutions, because fair use enables access to knowledge and innovation. So, this debate isn’t just about tech versus artists — it’s also about the public interest.

Ultimately, the courts will continue to shape how fair use applies to AI training, but the outcomes will have major implications for openness and research.


The Importance of Data Governance

Nick:
Why is data governance such an important piece of building trustworthy AI?

Katie:
Because data is the foundation of every AI system. Right now, we’re seeing a lot of fragmented efforts — different groups introducing new licenses, frameworks, or data-sharing initiatives — but not much consensus across them.

Without clear frameworks for handling data provenance, copyright, and privacy, it’s difficult to ensure that AI systems are ethical and transparent.

Creating consensus frameworks around data governance — both legal and social — is essential. It helps everyone understand what can be done with data, how it should be managed, and what responsibilities come with it.

Ultimately, good data governance is what allows AI systems to be trusted by the public and policymakers alike.


The Deep Dive: Data Governance Conference

Katie:
Let’s switch gears, Nick. OSI recently hosted the Deep Dive: Data Governance virtual conference. Can you tell us more about it and why OSI organized it?

Nick:
Absolutely. The Deep Dive: Data Governance took place just a couple of weeks ago, and it was an incredible event.

We organized it as part of our ongoing journey to define Open Source AI. Early on, we realized that data is central to that conversation — but OSI isn’t a data governance organization. So, we partnered with groups like Creative Commons, the Open Knowledge Foundation, and the Mozilla Foundation to bring together experts from multiple communities.

The conference gathered developers, data scientists, practitioners, and policy specialists to explore how Open Source principles — transparency, collaboration, and shared knowledge — can inform responsible data practices.

We also discussed a key tension between Open Source development and data governance. For code, openness is straightforward: you can use, modify, and share it. But with data, it’s not that simple. You have to consider privacy, consent, and copyright. Sometimes data can be used but not shared.

That’s why this dialogue is so important — to bring these communities together and find common ground.


Open Source Principles for Trustworthy AI

Katie:
In your view, how can Open Source principles help create more transparent and trustworthy AI systems?

Nick:
Open Source has always been rooted in collaboration, transparency, and sharing — values that are essential for trustworthy AI.

When developing the Open Source AI Definition, we applied the four freedoms of free software — to use, study, modify, and share — to the different components of AI systems. While not every component (like data) can be made fully open, the methods, tools, and processes around data can be.

For example, the code used to clean, filter, or check data for bias should be open and inspectable. That allows others to review it, identify weaknesses, and suggest improvements.
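
As a rough sketch of what that kind of open, inspectable cleaning code can look like, here is a minimal Python example; the filtering rules, the length threshold, and the PII pattern are all hypothetical illustrations, not drawn from any real project's pipeline:

```python
# Hypothetical example of an openly published filtering step for a text dataset.
# The rules below are illustrative, not any real project's pipeline.
import re

MIN_LENGTH = 50  # drop fragments too short to be useful (illustrative threshold)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude email/PII check

def keep_record(text: str) -> bool:
    """Return True if a record passes the published filtering rules."""
    if len(text) < MIN_LENGTH:
        return False
    if EMAIL_RE.search(text):  # exclude records containing email-like strings
        return False
    return True

corpus = [
    "Short snippet.",
    "A long enough passage of text that contains no personal identifiers at all.",
    "Contact me at jane.doe@example.org for a copy of the dataset.",
]
filtered = [t for t in corpus if keep_record(t)]
print(f"kept {len(filtered)} of {len(corpus)} records")
```

Because the rules are published as code rather than described in a PDF, anyone can rerun them, question a threshold, or propose a stricter check.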

At the Deep Dive conference, two examples really stood out. One keynote emphasized that Open Source AI shouldn’t try to compete with closed AI, but instead raise the bar for transparency and ethics — showing what’s possible when openness guides development.

Another presentation introduced the Data Nutrition Project, which proposes labeling AI systems like food products — with “ingredients lists” showing what data and code were used. That kind of transparency helps build public trust.
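
As a purely illustrative sketch (not the Data Nutrition Project's actual schema), such a label could be little more than structured metadata published alongside a model; every field name below is hypothetical:

```python
# Illustrative "ingredients list" for an AI system, loosely inspired by the
# Data Nutrition Project idea; the field names are hypothetical, not a standard.
import json

model_label = {
    "model": "example-model-1.0",
    "training_data": [
        {"name": "public-domain-books", "license": "public domain", "share": 0.6},
        {"name": "web-crawl-subset", "license": "mixed", "share": 0.4},
    ],
    "preprocessing_code": "https://example.org/pipeline",  # placeholder URL
    "known_limitations": ["historical bias in pre-1950 texts"],
}
print(json.dumps(model_label, indent=2))
```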


Government Use and Public Sector Accountability

Katie:
Let’s talk about governments. What did you hear at the conference about AI use in the public sector and best practices for accountability?

Nick:
That was a big topic. One of our keynote speakers, Alek Tarkowski from Open Future, has been championing the idea of public AI — meaning governments should invest not only in large models but also in the full stack: from high-quality datasets to infrastructure and chips.

He and others argued that governments need to develop open, transparent AI systems that serve citizens responsibly. For example, researchers from the Open Data Institute in the U.K. analyzed how government data is used to train large language models — and found that it’s often underutilized or poorly structured. Improving data quality and accessibility could help models better support real public needs, such as social services.

Another fascinating presentation came from representatives of U.S. tribal nations. They discussed federated learning as a way to retain local control over sensitive data while still benefiting from AI insights. It’s a great example of balancing sovereignty, privacy, and collaboration.
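
For readers unfamiliar with the technique, here is a minimal sketch of the federated-averaging idea: each site trains on its own data and shares only model parameters, never the raw records. The toy least-squares model is an assumption for illustration; real deployments add secure aggregation and far richer models.

```python
# Minimal federated-averaging sketch: each site updates a shared model locally
# and sends back only parameters (a weight vector), never its raw data.
# Toy example for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights: np.ndarray, data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One gradient step of least-squares fitting y ~ w0*x + w1 on local data."""
    x, y = data[:, 0], data[:, 1]
    pred = weights[0] * x + weights[1]
    grad = np.array([np.mean((pred - y) * x), np.mean(pred - y)])
    return weights - lr * grad

# Each "site" keeps its own (x, y) records; only updated weights leave the site.
sites = [rng.normal(size=(20, 2)) for _ in range(3)]
global_w = np.zeros(2)
for _ in range(10):
    local_ws = [local_update(global_w, d) for d in sites]
    global_w = np.mean(local_ws, axis=0)  # the server averages the updates
print("aggregated weights:", global_w)
```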


Privacy and Provenance

Katie:
Privacy obviously plays a huge role in data governance. Were there any standout ideas or best practices shared on that front?

Nick:
Definitely. One presenter, from the University of Copenhagen, shared research on medical imaging datasets and found widespread “copycat” issues — data being shared and duplicated without proper metadata or privacy safeguards.

To address this, another speaker, Lisa Bobbit from Cisco, introduced the concept of data provenance standards. These standards help organizations track data throughout its entire lifecycle — where it came from, how it’s been used, and what privacy constraints apply.
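
One way to picture such a standard is an append-only provenance log attached to a dataset. The sketch below is illustrative only; the field names are hypothetical and not taken from Cisco's work or any published specification:

```python
# Illustrative provenance log for a dataset: an append-only record of where the
# data came from and what was done to it. Field names are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceEvent:
    action: str  # e.g. "collected", "filtered", "anonymized"
    actor: str   # who performed the step
    detail: str  # human-readable description
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

log: list[ProvenanceEvent] = []
log.append(ProvenanceEvent("collected", "hospital-a", "MRI scans, consented use"))
log.append(ProvenanceEvent("anonymized", "pipeline-v2", "patient IDs stripped"))
for event in log:
    print(event.timestamp, event.action, "by", event.actor, "-", event.detail)
```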

Provenance is now becoming a central part of the Open Source AI conversation. Knowing the origin and intended use of data is key to building systems that are both ethical and traceable.


Bias in Training Data

Katie:
Bias in training data is a well-known challenge. How did the conference tackle that issue?

Nick:
Bias came up in several talks. For instance, Stella Biderman from EleutherAI discussed how datasets composed of older, public-domain materials — like 19th-century books — often carry strong biases, particularly against women and people of color.

Similarly, researchers from Harvard talked about the difficulty of balancing copyright compliance with the need for modern, representative data. Using only old, bias-filled datasets creates serious distortions in models, especially those meant for real-world applications.

Everyone agreed that addressing bias requires both technical fixes and policy oversight — and that openness and transparency are crucial to identifying where biases exist in the first place.


International Collaboration in Open Source AI

Katie:
Can Open Source AI serve as a bridge for international collaboration, or is it being pulled into geopolitical tensions?

Nick:
That’s a great question. We actually heard a wonderful keynote from the Chinese Open Source AI community Kaiyuanshe — they’ve been translating OSI’s work into Chinese and actively engaging with the global Open Source movement.

What’s beautiful about Open Source is that it transcends borders. It doesn’t care about nationality, gender, or religion — it’s about collaboration and contribution.

Yes, there are geopolitical tensions between nations, but at the developer level, people just want to build and share. Open Source gives them a framework to do that.

Even within countries, there are challenges — such as data loss or restricted access — but open collaboration remains a unifying force. It reminds us that technology can serve humanity best when it’s built transparently and collectively.


Closing Remarks

Katie:
That’s a wonderful note to end on. Open Source has always been about shared knowledge and collaboration, and applying those principles to AI and data governance feels more important than ever.

Nick:
Absolutely. Thank you all for joining us for this conversation. We hope it’s given you insight into how Open Source principles can help shape a more transparent, trustworthy AI future.
