OpenAI Used YouTube Videos to Train Its AI Models

  • Editor
  • May 20, 2024

OpenAI has reportedly transcribed more than one million hours of YouTube videos to gather training data for its GPT-4 model. The New York Times reported that OpenAI may have overlooked YouTube's copyright rules, a concern for Google, which owns YouTube.

See what people are saying about this development:

Comment by u/Georgeika from a discussion in r/technology

To convert the videos' audio into text, OpenAI used Whisper, a speech recognition tool it developed in-house. The resulting transcripts were then used to improve ChatGPT's conversational abilities. Internally, OpenAI staff reportedly debated whether using YouTube videos this way was acceptable. Despite the potential policy issues, the company decided to proceed, driven by its need for diverse data to improve its AI.

As AI spreads across sectors, its role in education continues to spark significant debate. For a closer look at how AI technologies might reshape teaching, see our analysis in the blog Will AI Replace Or Assist Teachers?

Greg Brockman, OpenAI's president, was reportedly involved in choosing the videos for this project, even though YouTube's policies do not allow its videos to be used this way. In response, OpenAI spokesperson Lindsay Held said:

"We use various data sources, including public and partnership-obtained data, to teach our AI about the world."

Google's response was cautious, with spokesperson Matt Bryant pointing out that YouTube's rules do not allow this kind of use of its content. Neal Mohan, YouTube's CEO, also expressed concern, stating that if OpenAI had used YouTube videos to train Sora, its video-generation model, that would be a clear violation of the platform's policies.

Interestingly, The New York Times report also mentioned that Google may be doing something similar with its own AI model, Gemini, by training it on transcripts of YouTube videos. This raises further questions about copyright and the use of online content for AI development.

Some people have also voiced concerns about copyright:

Comment by u/Georgeika from a discussion in r/technology

The situation puts a spotlight on the complex issue of using online content to train AI technologies, raising important questions about copyright and the ethics of data use.

Because the quality of video data affects how effectively AI models can be trained, improving video resolution can help. For practical tips on improving video quality, see our blog on essential video upscaling hacks.

For more information about AI, visit allaboutai.com.


Dave Andre

Editor

Digital marketing enthusiast by day, nature wanderer by dusk. Dave Andre blends two decades of AI and SaaS expertise into impactful strategies for SMEs. His weekends? Lost in books on tech trends and rejuvenating on scenic trails.
