Oxylabs Unveils First-of-its-kind YouTube Datasets to Power Responsible AI

The datasets fast-track video data from creator consent to AI-readiness

VILNIUS, Lithuania, June 2, 2025 - Oxylabs, a leading web intelligence platform and proxy provider, introduces industry-first YouTube datasets composed entirely of consent-based data. All of the millions of original videos in the datasets have their creators' explicit consent to be used for AI training, helping to bridge the gap between creators and innovators.

"In the ecosystem aiming to find a fair balance between respecting copyright and facilitating innovation, YouTube streamlining consent giving for AI training and providing creators with flexibility is an important step forward. Many channel owners have already opted in for their videos to be used in developing the next generation of AI tools. This enables us to create and provide high-quality, structured video datasets. Meanwhile, AI developers have no trouble verifying the data's legitimate origin," said Julius Černiauskas, CEO at Oxylabs.

All datasets offered by Oxylabs include videos, transcripts, and rich metadata. While such data has many potential use cases, Oxylabs refined and prepared it specifically for AI training, which is the use that the content creators have knowingly agreed to.

Large volumes of high-quality video data are fundamental for developing multimodal AI capable of seamlessly handling text, audio, and visual data when performing tasks or generating different types of content. Acquiring such data in a convenient way that establishes a transparent link between creators and AI companies is a challenge the industry is still trying to solve. Structured, AI-ready YouTube datasets are now part of this emerging, improved model for training AI on public data.

Importantly, consent-based datasets also allow AI companies and creators to be on the same page regarding fair AI development. That development has been riddled with unanswered questions about how copyrighted material can fuel innovation rather than stall it.

"These datasets offer a breath of fresh air to a tense ecosystem in dire need of facilitating systematic cooperation between creators and AI companies based on mutual agreement. The next wave of tools that will shake the market can now be built on data that all can agree is right for AI training. Hopefully, this also marks a better, more sustainable way forward," concluded Černiauskas.

The release of ethically sourced YouTube datasets continues Oxylabs' longtime mission to establish and promote ethical industry practices, previously marked by co-founding the Ethical Web Data Collection Initiative (EWDCI) and introducing an industry-first transparent tier framework for proxy sourcing.

To learn more about creator-consent-based YouTube video datasets for AI training, visit the official website now.

About Oxylabs

Established in 2015, Oxylabs is a web intelligence platform and premium proxy provider, enabling companies of all sizes to utilise the power of big data. Constant innovation, an extensive patent portfolio, and a focus on ethics have allowed Oxylabs to become a global leader in the web intelligence collection industry and forge close ties with dozens of Fortune Global 500 companies. Oxylabs was named Europe's fastest-growing web intelligence acquisition company in the Financial Times FT 1000 list for several consecutive years. For more information, please visit: https://oxylabs.io/

Media Contacts

Vytautas Kirjazovas

Oxylabs.io

Tel: +370 655 34419

Email: [email protected]
