I started using ChatGPT regularly in mid-2023. Back then, I tried to find out whether OpenAI believed in a “transparency” principle because I wanted to know more about the sources of the data OpenAI used to develop its systems. I’m still looking for that answer because using scraped data as the foundation of machine learning surely must have copyright and intellectual property implications. That’s what the New York Times is arguing. There has to be a huge class action brewing over the origins of the scraped data because human writers have not given permission for their writing to be used. No writer has even been acknowledged for the ideas and thoughts that make up ChatGPT4.
As I turn the page of another journal I’m using to track my use of several AIs, I share a “4P Framework” with people (one of the four Ps) during my training sessions. Since Amazon, Apple, Google, Meta, and Microsoft are embedding AI that is sourced from humans in all their AI products, they are extracting value from these services at the expense of the very people who are likely to be thrown out of jobs that will ultimately cease to exist. This looks like an old playbook, one that social media and other technology companies have used for decades.
I’ve seen the results of the “release it fast” imperative with social networking apps. If people like what they’re getting, they let big tech ride roughshod over governments and individuals. In the case of scraping data and training large language models, surely these corporations have an obligation to be held accountable for violations of copyright and intellectual property. It looks like the EU is taking the first steps in this regard because Meta.ai is restricting the functionality of its Metaview app when used with Ray-Ban Meta Smart Glasses.
But there does not appear to be any legislation that forces the developers of large language models to reveal data scraping logs or to show the sources of their training data sets. And I wonder if that will ever happen. A lot of human intelligence was used to bring us to this very functional threshold of generative artificial intelligence. Surely there is a creative class with a case worth hearing in a Commercial Court.