The schedule for the Deep Dive: Data Governance virtual conference is now live. Taking place October 1–3, 2025, this premier three-day event will bring together industry leaders and world-class experts to explore the latest advancements in data governance and Open Source AI.
From a strong pool of 50 proposals, we’ve curated 12 standout sessions across three key themes:
- Stewards of the data commons
- Frameworks for data governance
- Building and preserving public datasets
Explore the preliminary schedule below or access it via our mobile website:
| Time (EDT, UTC-4) | Session | Speaker |
| --- | --- | --- |
| October 1st | Stewards of the data commons | |
| 12:00 PM | Opening Keynote: Data is the key to Open Source AI | Stefano Maffulli |
| 12:15 PM | A data pathway to building public AI | Alek Tarkowski |
| 1:00 PM | Governments as data providers for AI | Neil Majithia |
| 1:45 PM | Copycats and the Commons: Governing Open Data for Trustworthy AI | Natalia-Rozalia |
| 2:30 PM | Sovereign by Design: A Blueprint for Federated, Consent-Based AI Systems | Sal Kimmich |
| 3:15 PM | Wrap-Up + Live Q&A | Nick Vidal |
| October 2nd | Frameworks for data governance | |
| 12:00 PM | Keynote: Trends and Insights of China Open Source Ecosystem in AI Era | Nadia Jiang, Emily Chen |
| 12:15 PM | New licensing initiatives for AI training data | Ramya Chandrasekhar |
| 1:00 PM | How Data Provenance Powers Trustworthy AI | Lisa Bobbitt |
| 1:45 PM | The CLeAR Documentation Framework for AI Transparency | Kasia Chmielinski |
| 2:30 PM | Bias Transparency in Human-AI Systems: Open Data Governance Frameworks for AIED | Chaeyeon Lim |
| 3:15 PM | Wrap-Up + Live Q&A | Nick Vidal |
| October 3rd | Building and preserving public datasets | |
| 12:00 PM | Keynote: What should open source AI aspire to be? | Stefan Baack, Kasia Odrozek |
| 12:15 PM | Building Public Data for LLMs | Stella Biderman |
| 1:00 PM | A new paradigm for publishing library collections: Institutional Books 1.0, a 242B token dataset | Greg Leppert, Matteo Cargnelutti, Catherine Brobston |
| 1:45 PM | Beyond Extraction: Building Community-Centered Speech Data | Jessica Rose |
| 2:30 PM | Saving What’s Ours: The Data Rescue Project and the Fight for Public Data | Lynda Kellam, Mikala Narlock |
| 3:15 PM | Live Q&A + Closing Remarks | Stefano Maffulli |
The Deep Dive: Data Governance conference builds on the momentum of past events organized by the OSI, including the Deep Dive: AI webinars held in 2023, the Data in Open Source AI workshop held in 2024, and the early-2025 white paper “Data Governance in Open Source AI: enabling responsible and systematic access.”
Data governance and Open Source AI are evolving rapidly, and this event is your opportunity to stay at the forefront. OSI’s Deep Dive brings together leading experts to share practical insights, emerging trends, and proven strategies that organizations of all sizes can apply. Registration is free, and we invite you to join us.
We would like to thank the authors who submitted proposals and the members of the Program Committee: Alek Tarkowski (Open Future), Anna Tumadóttir (Creative Commons), Carlo Piana (Open Source Initiative), Julie Hunter (Linagora), Masayuki Hatta (Surugadai University), Maximilian Gahntz (Mozilla Foundation), Nick Vidal (Open Source Initiative), Ramya Chandrasekhar (CNRS – Centre national de la recherche scientifique), Stefano Maffulli (Open Source Initiative), Shane Coughlan (OpenChain), and Malcolm Bain (Across Legal).