Reddit to update web standard to block automated scraping

Popular social media platform Reddit is taking steps to protect its content. On Tuesday, it will update a web standard the platform uses to block automated data scraping from its website. The move follows reports that AI companies were circumventing the rule to gather content for their systems.
Reddit's action is notable because AI companies have been accused of harming publishers by plagiarizing their content to build AI-generated summaries without credit or consent, undermining their work and rights.
Reddit said it would update its Robots Exclusion Protocol file, or "robots.txt", a widely used standard that tells automated crawlers which parts of a site they may access.
The company also said it would continue rate-limiting, a technique that caps the number of requests a single entity can make, and would block unknown bots and crawlers from scraping raw data from its website.
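Reddit has not described how its rate limiting works. As a rough illustration of the general idea only, the sketch below implements a per-client token bucket, one common rate-limiting technique: each client can make a short burst of up to `capacity` requests, after which its allowance refills at `rate` requests per second. The class name, the `client_id` key (which in practice might be an IP address or an API key), and the numbers are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TokenBucketLimiter:
    """Hypothetical per-client token bucket (not Reddit's implementation).

    Each client starts with `capacity` tokens; every request spends one,
    and tokens refill at `rate` per second, never exceeding `capacity`.
    """
    capacity: float
    rate: float
    # Injectable clock so the limiter can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    _tokens: Dict[str, float] = field(default_factory=dict, init=False, repr=False)
    _last: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        # New clients start with a full bucket.
        tokens = self._tokens.get(client_id, self.capacity)
        last = self._last.get(client_id, now)
        # Refill in proportion to the time elapsed since the last request.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        self._last[client_id] = now
        if tokens >= 1:
            self._tokens[client_id] = tokens - 1
            return True
        # Out of tokens: reject this request until the bucket refills.
        self._tokens[client_id] = tokens
        return False
```

With `capacity=2` and `rate=1.0`, a client's third request in the same instant is rejected, while a different client is unaffected; one second later the first client can make one more request. Keying the bucket by client is what lets a site slow a single heavy scraper without affecting everyone else.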
Recently, publishers have relied on robots.txt to stop tech companies from using their content for free to train AI algorithms and generate summaries in response to search queries.
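To illustrate how the protocol is meant to work, the sketch below uses Python's standard-library `urllib.robotparser` to check a hypothetical robots.txt that admits one named crawler and blocks everyone else. The rules, the bot names `ExampleResearchBot` and `UnknownScraper`, and the example.com URL are invented for illustration and are not Reddit's actual file. Note that the check runs on the crawler's side: robots.txt is a voluntary convention, and nothing in the file itself stops a crawler that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed everywhere,
# and every other user agent ("*") is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("ExampleResearchBot", "UnknownScraper"):
    # A well-behaved crawler asks before fetching each URL.
    allowed = parser.can_fetch(agent, "https://www.example.com/r/some_post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Here `ExampleResearchBot` is reported as allowed and `UnknownScraper` as blocked. A real crawler would instead call `set_url()` with the site's robots.txt address and then `read()` to download the live rules before checking URLs.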
Last week, content licensing company TollBit told publishers in a letter that several AI businesses were scraping publisher sites by circumventing the web standard.
This followed a Wired investigation which found that AI search company Perplexity had likely bypassed efforts to block its web crawler via robots.txt.
In June, Forbes accused Perplexity of copying its investigative reports for use in generative AI systems without attribution.
On Tuesday, Reddit said researchers and organizations such as the Internet Archive would still be able to use its content for non-commercial purposes.