ai training data Archives - Tech | Business | Economy

OpenAI, Anthropic May Use Investor Funds to Tackle Growing Copyright Lawsuits

Joan Aimuengheuwa — Wed, 08 Oct 2025 07:38:29 +0000

OpenAI and Anthropic are reportedly weighing the option of using investor money to cover potential multibillion-dollar copyright settlements.

A Financial Times report revealed that both companies are exploring alternative ways to handle the risks associated with how their AI models were trained. Copyright owners, including authors, publishers, and media houses, have filed more than a dozen lawsuits against tech companies including OpenAI, Microsoft, Meta, and Anthropic, accusing them of using protected works without authorisation to train their large language models.

To manage these legal threats, OpenAI has reportedly partnered with Aon, one of the world’s leading insurance firms, to secure coverage worth up to $300 million for emerging AI-related risks. However, some sources told the Financial Times that the actual figure could be lower, and regardless, it still falls far short of what would be required to cover the potential damages from ongoing lawsuits.

Kevin Kalinich, Aon’s Global Cyber Risk Head, explained that the insurance industry itself is finding it difficult to match the scale of risk posed by AI model providers. “The insurance sector broadly lacks enough capacity for (model) providers,” he said.

Because of this gap, OpenAI is reportedly considering “self-insurance”, essentially setting aside investor capital in a protected pool to absorb possible legal costs. Discussions have also surfaced about creating a “captive,” an internal insurance structure used by large firms to manage risks that the traditional market cannot handle.

Anthropic appears to be taking a similar route. According to the Financial Times, the company is using part of its own funds to cover a $1.5 billion settlement that was preliminarily approved by a California federal judge last month.

The case was filed by a group of authors who alleged that their works were used to train Anthropic’s AI system, Claude, without consent.

The growing number of copyright claims is forcing AI companies, and their backers, to confront questions about financial accountability and transparency. If investor funds are being used to offset legal risks, governance issues inevitably follow: who decides how much to reserve for potential liabilities, and how are investors’ interests safeguarded?

Analysts believe these developments could change how AI startups raise and allocate capital. Investors may soon demand clearer disclosures on data sources, litigation exposure, and risk management frameworks before funding new ventures.

Meanwhile, the U.S. Copyright Office is still assessing whether training AI systems on copyrighted content amounts to infringement, while the European Union’s AI Act could compel firms to reveal their training datasets, opening another front of legal vulnerability for AI developers.

Neither OpenAI, Anthropic, nor Aon has commented on the report.

The post OpenAI, Anthropic May Use Investor Funds to Tackle Growing Copyright Lawsuits appeared first on Tech | Business | Economy.

Cloudflare Blocks AI Bots, Launches Paywall to Help Publishers Get Paid

Joan Aimuengheuwa — Tue, 01 Jul 2025 14:04:55 +0000

Cloudflare has launched a new enforcement tool designed to stop unauthorised scraping of online content by artificial intelligence (AI) firms, a move that could dramatically alter the economics of the internet.

As of July 2025, the internet infrastructure giant blocks all AI crawlers by default unless they have either paid for access or received explicit permission from the content owner.

The company’s new product, Pay Per Crawl, lets publishers charge AI firms a fee every time they crawl their content. If a crawler doesn’t pay, it gets hit with a “402 Payment Required” response, a rarely used HTTP status code that could become the foundation of a new internet revenue model.
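The access rule described above (allow a crawler that has paid or been given permission, otherwise answer with 402) can be sketched in a few lines of Python. This is only an illustration of the decision logic as the article describes it, not Cloudflare's implementation; the function name and its three flags are hypothetical.

```python
from http import HTTPStatus


def crawl_response(is_ai_crawler: bool, has_permission: bool, has_paid: bool) -> HTTPStatus:
    """Pick the HTTP status for a request (hypothetical helper, not Cloudflare's API)."""
    # Ordinary visitors are not affected by the crawler policy.
    if not is_ai_crawler:
        return HTTPStatus.OK
    # An AI crawler gets through if the publisher allowed it or it has paid.
    if has_permission or has_paid:
        return HTTPStatus.OK
    # Otherwise the crawler is told that payment is required.
    return HTTPStatus.PAYMENT_REQUIRED


status = crawl_response(is_ai_crawler=True, has_permission=False, has_paid=False)
print(int(status), status.phrase)  # 402 Payment Required
```

HTTPStatus.PAYMENT_REQUIRED is the standard library's name for status code 402, which the HTTP specification has long reserved for future use.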

Cloudflare powers nearly 20% of the internet. This shift, if widely adopted, could cut off AI companies from training on a vast portion of the web, unless they pay.

Major publishers including Condé Nast, TIME, BuzzFeed, The Atlantic, and Gannett, along with platforms such as Reddit, Pinterest, and Stack Overflow, have already signed on.

Many of them have seen advertising and referral traffic dwindle in recent years as AI-generated summaries and chatbot answers have increasingly replaced the need to click through to the original source.

“This is just the beginning of a new model for the internet,” said Stephanie Cohen, Cloudflare’s Chief Strategy Officer. “The change in traffic patterns has been rapid, and something needed to change.”

That change is reflected in the data. According to Cloudflare, Google’s crawl-to-click ratio, the number of pages it crawls for every visitor it refers, has worsened from 6:1 to 18:1 in the last six months, suggesting users are increasingly finding what they need directly within the search results, especially via AI Overviews. OpenAI’s ratio is far more extreme at 1,500:1.
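To put those ratios in perspective, the short Python sketch below converts each one into referrals per 1,000 crawls. It uses only the three ratios quoted above; the function name is made up for illustration, and it assumes the ratio means crawls per referred click.

```python
def referrals_per_1000_crawls(crawls_per_click: float) -> float:
    """Convert a crawl-to-click ratio (crawls per referred click) into clicks per 1,000 crawls."""
    return 1000 / crawls_per_click


google_before = 6    # 6:1, six months earlier
google_now = 18      # 18:1, at the time of the article
openai_now = 1500    # 1,500:1

# Google crawls three times as many pages per referral as it did before.
print(google_now / google_before)

for label, ratio in [("Google (before)", google_before),
                     ("Google (now)", google_now),
                     ("OpenAI", openai_now)]:
    print(label, round(referrals_per_1000_crawls(ratio), 1))
```

On these figures, a publisher receives roughly 167 visits per 1,000 Google crawls at 6:1, about 56 at 18:1, and fewer than one per 1,000 OpenAI crawls.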

Historically, search engines indexed the web with the tacit understanding that referrals would follow, generating value for content creators. But AI firms have upended that agreement, lifting vast amounts of content to train their models and offer summarised responses, all while bypassing the original creators entirely.

Some firms go further by ignoring established web standards like robots.txt, the file that tells crawlers which parts of a site they may access. Because compliance is voluntary, it signals a publisher’s wishes but cannot enforce them. Despite publishers’ attempts to draw boundaries, many AI companies insist they haven’t broken any laws.
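For readers unfamiliar with robots.txt, the Python sketch below shows how a well-behaved crawler would check the file before fetching a page, using the standard library's urllib.robotparser. The rules and the example.com URL are made up for illustration, and "GPTBot" and "ExampleBrowser" stand in for any crawler name a publisher might list or omit.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one named AI crawler from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/news/article.html"

# A compliant crawler asks before fetching; nothing forces it to do so.
blocked = parser.can_fetch("GPTBot", url)
other = parser.can_fetch("ExampleBrowser", url)

print("GPTBot allowed:", blocked)
print("ExampleBrowser allowed:", other)
```

The check runs entirely on the crawler's side, which is why a publisher cannot rely on robots.txt alone to keep a non-compliant bot out.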

The legal pushback has already begun. The New York Times sued OpenAI and Microsoft in late 2023 for copyright infringement. Reddit recently took legal action against Anthropic for allegedly harvesting user comments without permission, even though scraping was explicitly prohibited via robots.txt. The BBC has threatened legal action against Perplexity, and Ziff Davis has filed a similar lawsuit.

The dispute is escalating alongside steep drops in web engagement. In the U.S., 60% of searches now end without a single click, and click-through rates fell by 30% between 2024 and 2025.

Publishers are being squeezed at both ends: their traffic is drying up, and their content is being repurposed without compensation.

Cloudflare’s CEO Matthew Prince framed the initiative as both a defensive and visionary move: “This is about safeguarding the future of a free and vibrant internet.” He pointed to a long-term plan to create a transparent and open marketplace for content access, where AI firms would be required to negotiate fair rates for crawling and training.

For now, Pay Per Crawl is available as a private beta. Cloudflare handles both authentication and payments, serving as the intermediary between web publishers and AI companies.

BBC Threatens Legal Action Against Perplexity Over Unauthorised Use of Its Content

Joan Aimuengheuwa — Fri, 20 Jun 2025 07:46:28 +0000

The British Broadcasting Corporation (BBC) has issued a legal threat to Perplexity, an AI-powered search startup, demanding that it cease using BBC content to train its AI models.

In a letter addressed to Perplexity CEO Aravind Srinivas, the BBC accused the company of scraping its online material without consent and warned of possible legal consequences if its demands were not met.

Specifically, the BBC is asking for the deletion of all copies of its content used for training purposes, an end to the practice, and a proposal for financial compensation. Failure to comply, it says, could lead to an injunction.

This escalation places Perplexity at the centre of a deepening rift between traditional media outlets and tech companies leveraging journalistic content to power artificial intelligence systems.

“The BBC’s letter outlines a potential injunction unless Perplexity halts its content scraping activities, purges existing data from its models, and presents a proposal for financial compensation,” reported the Financial Times, which had access to the communication.

Perplexity responded quickly. In a statement also quoted by the FT, it said, “The BBC’s claims are manipulative and opportunistic,” adding that the broadcaster “has a fundamental misunderstanding of technology, the internet and intellectual property law.”

This isn’t the first time Perplexity has drawn the ire of publishers. In October 2024, The New York Times served the company with a cease-and-desist notice over similar allegations. The paper demanded a full stop to the use of its articles in AI training and sought answers on how Perplexity bypassed anti-scraping mechanisms.

Other major publishers such as Forbes, Wired, and Axel Springer have raised similar complaints. While Perplexity insists it is indexing publicly accessible content rather than scraping it for training, publishers are not convinced.

To address such complaints, Perplexity launched a Publishers Program in mid-2024, offering revenue-sharing arrangements to selected media outlets.

Among the early participants are TIME, Der Spiegel, The Texas Tribune, and Fortune. These partners receive a share of ad revenue whenever their content appears in responses generated by the platform.

It is not yet known whether the BBC was ever approached to join the scheme, or whether it declined.

The broadcaster’s demand is a defensive legal move that underscores the need for a transparent, enforceable licensing structure governing how AI firms use journalistic content.

Perplexity, which is backed by high-profile investors including Amazon’s Jeff Bezos, continues to defend its operations. The company maintains it is not infringing on intellectual property rights and blames the issue on misinterpretations of how its system works.
