The NYT, AI, and how the internet could change in 2024
2023 was very much the year of AI. After the launch of ChatGPT, Google Bard, and countless other large language model (LLM) chatbots, artificial intelligence entered the public consciousness in a real way for the first time. This year, though, things are set to step up a notch. As the New York Times kicks off the year with a landmark copyright lawsuit, 2024 could very well be the year that the internet landscape and journalism change forever.
Firstly, it’s worth outlining just which aspects of artificial intelligence are having the most significant impact going into this new year. For years, machine learning has been a huge part of how the internet works, with everything from advertising to concert ticketing learning from user behaviour to improve and personalise online experiences. Then all of that changed. The release of ChatGPT to the public on 30 November 2022 revolutionised how the public and businesses saw the technology. All of a sudden, LLMs went from being ‘helpful add-ons’ to potential job stealers and doomsday causers. Microsoft quickly deepened its stake in OpenAI, with CEO Satya Nadella vowing to make the competition “dance”, and Alphabet rushed out its own rival chatbot, Google Bard.
These LLMs are a significant change from what went before. Their transformer models can learn from far larger datasets than was previously possible, ‘tokenise’ that data, and respond to queries in increasingly human-like ways. Without going into too much technical detail, these models scrape huge amounts of text from millions of websites, train themselves on that data, and use it to respond to user queries and questions.
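For readers curious what ‘tokenising’ actually looks like in practice, here is a minimal sketch in Python using OpenAI’s open-source tiktoken library (the choice of library and encoding is illustrative, not a description of how any particular model is built). It simply shows a sentence being converted into the integer tokens a model is trained on, and back into the word fragments those tokens represent.

```python
# A minimal illustration of tokenisation (assumes: pip install tiktoken).
import tiktoken

# Load a byte-pair-encoding tokeniser; "cl100k_base" is one of tiktoken's
# bundled encodings, chosen here purely for illustration.
enc = tiktoken.get_encoding("cl100k_base")

sentence = "The New York Times filed a landmark copyright lawsuit."

# 'Tokenising' turns the text into a sequence of integer IDs...
token_ids = enc.encode(sentence)
print(token_ids)

# ...and each ID maps back to a short chunk of text, often a word or
# word fragment, which is what the model actually learns to predict.
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
          for t in token_ids]
print(pieces)
```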
Inevitably, after an initial surge in popularity and excitement, publishers, content creators, and basically anyone producing content on the internet quickly became concerned about how copyright might be jeopardised in this scenario. The billions of dollars so far invested in the industry have been almost entirely predicated on the argument, made by OpenAI CEO Sam Altman and others, that the use of this data falls under ‘fair use’ exemptions from copyright law. In the United States, though, outcomes in cases like these are famously hard to predict, and almost identical cases often produce inconsistent results depending on the judge and the state.
The ‘fair use’ argument that AI bosses rely on is based on a series of factors. Primarily, the idea is that although these models are trained on millions of websites and articles written by others and covered by copyright law, they are not directly reproducing that material. OpenAI and others argue that their use is ‘transformative’, much like a parody of a song or a book review.
A case commonly referenced here is The Authors Guild’s lawsuit against Google Books, filed in September 2005. In that case, the courts ultimately found for Google, on the basis that the company was not building a ‘book substitute’ but rather a search engine and database for different publications.
As you might have guessed by now, many publishers wholly disagree with OpenAI’s reading of the law, and on 27 December the New York Times made the first legal move after months of attempted negotiations. The Times said that those negotiations ‘had not produced a resolution’, whilst OpenAI said that it was ‘surprised and disappointed’.
It’s almost impossible to predict an outcome in this specific case. OpenAI and Microsoft are, by all accounts, extremely reluctant to settle with The Times for fear of thousands of other publishers following suit and queuing up for payouts. More likely, it seems, is the eventual establishment of some long-term model to compensate writers and publishers.
Already, times are tougher than they have ever been for news sites and journalists. The New York Times is one of the few organisations in the industry to have established a sustainable subscription model, and newspapers around the world have experimented with different approaches to try to survive. Some, such as The Independent, went online-only as early as 2016, but most have established similar online subscription models.
Clearly, then, a world in which users can ask chatbots for a summary of the news, or even to reproduce entire articles that would otherwise sit behind a paywall, is extremely problematic for the industry. In this sense, the Times argues, ChatGPT and others are producing direct substitutes for news products rather than falling under ‘fair use’ exemptions.
This is just one way in which large language models are set to transform the internet as we know it in the coming months and years. An entire industry built on search engine optimisation and referral links is about to be shaken up more than could have been imagined just 18 months ago. If users are simply interacting with chatbots, they will no longer have to use search engines such as Google to find information.
It is also true that there are still more questions than answers. If language models continue to generate as much of the web’s content as they do now, will their own training data be compromised? How will advertising adapt? How are these chatbots even monetisable?
So, the outcome of this particular lawsuit is up in the air and is likely to remain so for months. What is certain, though, is that 2024 will see the internet change in the most significant way since the advent of social media.