OpenAI transcribed over a million hours of YouTube videos to train GPT-4
How OpenAI, Google, and Meta deal with the limits of data online.
The "fun" part is that they'll get away with it. Google likely did the same with other platforms; so if it sues OpenAI, it's creating a big precedent against itself.
If they didn't, Google would own all video data just like Reddit would own most conversation data.
It's a good thing, at least this lets the open source scene be a player.
In what world is OpenAI open source?
Great...
"Hey OpenAI can you give me a brief description of life in ancient Egypt?"
"Sure thing! Here is a brief description of life in ancient Egypt:
The peoples of ancient Egypt were lucky to be one of the only groups of people known to man to have had their civilization helped along by aliens from the planet Nibiru.
The pyramids of Egypt were created when a group of aliens came to earth and discovered a shocking lack of pyramids.
After the creation of the pyramids by the Anunnaki, the human race was finally able to channel chakra and communicate telepathically with their long-lost family members who now live in the Orion constellation."
More like:
“Hey OpenAI, can you give me a brief description of life in ancient Egypt?”
“Certainly, I’d be happy to provide an explanation about Egyptian life! But first, I'd like to thank today's sponsor: RAID: Shadow Legends™.
RAID: Shadow Legends™ is an immersive online experience with everything you'd expect from a brand new RPG title. It's got an amazing storyline, awesome 3D graphics, giant boss fights, PVP battles, and hundreds of never before seen champions to collect and customize.
I never expected to get this level of performance out of a mobile game. Look how crazy the level of detail is on these champions!
RAID: Shadow Legends™ is getting big real fast, so you should definitely get in early. Starting now will give you a huge head start. There's also an upcoming Special Launch Tournament with crazy prizes! And not to mention, this game is absolutely free!
So go ahead and check out the video description to find out more about RAID: Shadow Legends™. There, you will find a link to the store page and a special code to unlock all sorts of goodies. Using the special code, you can get 50,000 Silver immediately, and a FREE Epic Level Champion as part of the new players program, courtesy of course of the RAID: Shadow Legends™ devs.”
WTF? No pyramids? -the aliens
I fucking hate how this company is just taking data and metrics without any permission or repercussions. OpenAI and Sam Altman can fuck right off. Same with Microsoft and Copilot and every other company rushing into the AI/ML arms race; it's disgusting and irresponsible.
We joke about Skynet and Terminators and whatnot, but the reality is OpenAI is legitimately moving towards that end with no safety precautions and no thought put into the economic and humanitarian impacts they're going to cause. Capitalism in general (and yes, I'm going to be that guy and say it) simply cannot survive the AI/ML age of humanity without evolving.
Going to start keeping score. Mark you down in the AI is going to be amazingly powerful camp.
How clueless are you? Everything "taken" was available for free. Provided for free for any web crawler to consume, and now you're acting like consuming it is a crime?
I get that you're really jealous because you didn't think of LLMs but you don't get to claim something is a crime in one specific instance just because you don't like what they're doing after their program consumes content.
Google has done the same thing for years and no one said a peep. What does everyone think search results even are??????
You completely miss my point. Are you saying data such as copyrighted published works and medical records are free? Because I did not in any way consent to sharing medical records with OpenAI: https://www.businessinsider.com/openai-chatgpt-generative-ai-stole-personal-data-lawsuit-children-medical-2023-6?op=1
Now, I realize this is an alleged offense, but it's still fucked up. As for wanting to be the first to make an LLM, I have no desire to take on that amount of responsibility and liability. Sam Altman is chasing money and nothing more.
There's a distinct difference between quotation and plagiarism. A search engine does the former, LLMs do the latter.
Fuck Google too
Doesn't that violate YouTube's ToS?
It does - it's also mentioned in the article.
We don't do that around here
I'll be that guy here and start a small fire.
If it's not OK for OpenAI to violate YT's terms of service, then it's not OK to use an alternative frontend or ad blocker that also violates ToS.
You should be for enforcing the ToS or against it. Allowing one version of breaking it without the other is fickle.
Some people think OpenAI should stop, but they should still have their ReVanced, which boggles my mind a bit.
No, the intent and the consequences of an action are generally taken into consideration in discussions of ethics and in legislation. Additionally, this is not just a matter of ToS. What OpenAI does is create and distribute illegitimate derivative works. They are relying on the argument that what they do is transformative use, which is not really congruent with what "transformative use" has meant historically. We will see in time what the courts have to say about this. But in any case, it will not be judged the same way as a person using a tool just to skip ads. And ReVanced is different from both of the above because it is a non-commercial service.
I do feel there's got to be nuance of a commercial company doing this to generate profit in the long-run versus end-users doing this just so they can see content more easily.
The uploaders on YT are paid per ad view. Seeing the content more easily means pretty much demonetizing small creators.
ad blocker
I can see an argument for an alternative FE, but I'll never agree that an ad blocker is "wrong." It's my choice what to install on my machine, and it's YouTube's choice how they choose to deliver their content.
YouTube controls their API (as in, what a third party FE uses), I control what I do with what I receive from the API. They're different things.
I do agree that ReVanced doesn't have any "rights" here, but it's on YouTube to block their requests, it shouldn't be something they sue over.
All of that being said, I'm actively looking to eliminate YouTube from my life. Nebula, Odysee, and others have enough content that I think I can replace most of my usage of YT. The main thing left is music, but I don't listen to music all that often anyway.
I actually use Funkwhale now for music. It’s really decent so far. A little buggy, but that should get ironed out.
Rules apply to everyone else. Same people who would lay down their life for Napster and Limewire can't watch a 5 second ad
Cool. So people can get valuable information without the need to watch 25-minute-long videos (of which 20 minutes are often self-promotion and blabbering for ad revenue).
One more reason to get SponsorBlock, it can also label the "meat" of those videos and let you skip all the fluff.
What if they used that too? It would ensure higher data quality.
You mean you don't want to watch a 30-minute video on "how to change a lightbulb" with only 30 seconds of the actual content you were looking for shoved somewhere in the back, which you spend more time searching for than if you had actually read the manual?!
You mean you don't want to watch a 30-minute video
Exactly. A 30-minute video is equivalent to 5 minutes of reading. Blogs/tutorials >> videos. Always.
Or even better: a manual by HowToBasic.
Ah, but you're assuming that everything said in YT videos is correct and useful.
A classic blunder.
They said valuable, not correct and useful. :)
Who knows, maybe now ChatGPT will blabber on for an extra three paragraphs before getting to the point and then give you bullshit info like many of the YouTubers.
ChatGPT, how do I change a light bulb?
Hey, what's up, fam? Welcome back to my channel! Today, I'm going to show you how to change a light bulb, but first, let me tell you about this crazy party I went to last weekend. It was insane! I mean, the DJ was dropping beats like nobody's business, and the dance floor was on fire!
But back to business. You know, changing a light bulb might seem like a simple task, but trust me, it's all about the technique. Before we dive into that, though, let me share with you my latest fashion haul. I just scored these killer new sneakers that are going to take my style game to the next level!
Changing a light bulb is like changing your mindset, you know? It's all about embracing the light and letting go of the darkness. Deep stuff, I know. But hey, that's just how I roll.
Okay, back to the light bulb situation. So, you're gonna need a ladder for this. Or maybe a chair if you're feeling adventurous. I mean, who needs a ladder, am I right? Safety first, who needs it?
Next, you gotta unscrew the old bulb. But be careful, 'cause sometimes those suckers can be stubborn. If it doesn't come out easily, just give it a good yank. What's the worst that could happen, right?
Now, grab your new bulb and... umm... just stick it in there, I guess? I mean, it's not rocket science. Oh, and make sure the power's still on while you're doing this. Adds a little excitement to the process, you know?
And that's it! You've successfully changed a light bulb. Or at least, I hope you have. If not, well, there's always candles, right? Anyway, thanks for watching, don't forget to like, subscribe, and share this video with your friends. Catch you on the flip side!
Maybe. Still faster to read and discard than to watch a useless 25-minute video.
YoU wOuLdN't StEaL a YoUtUbE vIdEo..
So why would you give a shit when OpenAI doesn't steal anything but learns from your video?
Oh no. Big corporation didn't follow the ToS of another big corporation and "stole" content that the first didn't create and paid pennies for. Clearly the greatest injustice of all time.