A big thanks to my dear friend Holden Page who read a version of this essay, and helped me improve it. — Alex
OpenAI closed new content deals last week, inking agreements with Le Monde and Prisa Media. The agreements didn’t arrive in a vacuum: A recent Gartner report anticipates that AI will cut search volume by 25%, a drop that would dent traffic to publishers.
The juxtaposition of the two items underscores a common worry about modern AI models, their current training data, and what they will be able to consume in the future. The easy answer to this conundrum is for AI model makers to craft deals with publishers for access to their material, past and future.
If only it were that simple.
For OpenAI, every signed media deal is useful, certainly, but it also operates as a form of public relations and precedent-setting. By working with publishers and writing some seven-figure yearly checks, the Microsoft-backed AI behemoth pushes back against criticism that it built its empire atop stolen data. At the same time, OpenAI pulls the AI training data ladder up behind it.
This situation could create a world in which only the already-wealthy can create and maintain AI models. That smacks of a form of regulatory capture, and it is a brilliant move by OpenAI in two ways: First, the deals absolve OpenAI of some of its prior data-scraping sins; and, second, they make its path harder for others to follow and mimic.
Keeping the future open
The OpenAI model of paying some media companies now means that startups coming up in its wake that want to challenge it may not be able to afford to do so. Upfront payments for data access could lead to cost barriers too high for startups to climb.
What we do not want is an AI model future built and run entirely by the existing technology leaders. Microsoft can afford to ensure that OpenAI can afford data. So too can Alphabet, Meta, Amazon, and Apple for their own models and the companies they back. But for a few college kids in their dorm, a few young adults in their shared flat, or even a handful of later-career technologists, the costs to get data to build an AI model that might generate material revenues in the future could be just too high.
Hunter Walk, a venture capitalist, has an idea for how to resolve the matter: An “AI Safe Harbor” that would allow “AI startups to experiment without fear of legal repercussions so long as they meet certain conditions.” It’s a worthy concept, and one that I think you should read.
After chewing over Walk’s idea and the latest market news, I thought up a way to invert the problem in a manner that seems to resolve many of the well-explored issues that AI models, training data, and fair payment for use bring up.
It goes something like this:
Charging AI startups for upfront access to training data is unworkable, as the smallest companies have the least capital.
Instead of charging for upfront access, flip payouts to the back-end, with AI model startups and even larger entities of their ilk paying out a portion of their revenue in return for the data they use.
The cut would need to be material to matter, and measured against total revenues and not a lower income statement line. Something between 20% and 30% feels reasonable.
If that sounds high, keep in mind that other businesses have to pay for input materials. Apple has to pay for the metals that go into its phones, just as the companies it purchases those refined materials from pay for raw mining inputs; and the mining companies have their own costs. It feels uniquely perverse that tech companies more than happy to demand payment for their own work are racing to consume our work for free, because doing so is cheaper and more convenient than paying.
Theft is, after all, a perfect business in gross margin terms. All the upside, none of the icky costs.
At 20% of revenue, AI model companies would certainly have lower gross margins than, say, SaaS companies, but that is not a lethal difference. After compute and delivery costs, AI model companies would still be incredibly lucrative, and more so than the media companies they would be dealing with. A good business is no sin, even if input theft would allow that same business to be truly great.
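To make that margin comparison concrete, here is a rough, back-of-the-envelope sketch. All of the inputs are assumptions chosen for illustration (a hypothetical $100 million in annual revenue, a 20% SaaS cost of revenue, a 25% compute-and-delivery cost), not reported figures from any company:

```python
# Back-of-the-envelope gross margin comparison (all figures hypothetical).
revenue = 100_000_000  # assumed $100M in annual model revenue

# Typical SaaS business: cost of revenue is mostly hosting and support.
saas_cost_of_revenue = 0.20 * revenue
saas_gross_margin = (revenue - saas_cost_of_revenue) / revenue

# AI model company paying a 20% revenue share for training data,
# plus an assumed 25% of revenue in compute and delivery costs.
data_share = 0.20 * revenue
compute_and_delivery = 0.25 * revenue
ai_gross_margin = (revenue - data_share - compute_and_delivery) / revenue

print(f"SaaS gross margin: {saas_gross_margin:.0%}")                      # ~80%
print(f"AI model gross margin with 20% data share: {ai_gross_margin:.0%}") # ~55%
```

Under those assumed inputs, the AI model business lands well below SaaS margins but remains comfortably profitable on a gross basis, which is the point: the data payout trims the margin, it does not break the business.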
There are other ideas to consider. Walk linked to Ben Werdmuller’s argument in favor of an “ASCAP for AI,” using the music industry’s model of centralizing rights and reporting to let AI companies interact with a single entity instead of striking piecemeal deals. It’s a good read and one that I recommend.
Let’s be tech-forward but fair
I am more than happy to be generous with AI companies, because I am cognizant of the importance of allowing new technologies to bubble and agitate and iterate sans too much regulation or cost. But if we want AI models to reach their full height and bring to our species the benefits that many expect, they will require a never-ending stream of content to eat. It’s in the best interest of AI model companies to pay for what they use, not only because it meets the basic concept of fairness, but also because it would make their own future work better, and thus more useful.
And, therefore, more lucrative.