Amazon Investigates Perplexity AI Over Potential Data-Scraping Violations

From PCMag: Amazon Web Services is investigating Perplexity AI over its data-scraping practices after multiple news outlets, including Forbes and Wired, reported that the AI startup is swiping their web archives to train its models without consent or compensation.

An AWS rep confirmed Thursday that Amazon is looking into Perplexity's behavior, Wired reports. The rep also said all AWS clients must follow the instructions in a website's robots.txt file. Robots.txt files are typically added to websites to ask bots and web crawlers not to scrape their data, whether for generative AI tools or other purposes. PCMag, for instance, has a robots.txt that disallows scraping by Perplexity, Anthropic's Claude, and OpenAI's GPTBot, to name a few.
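
For readers curious how a well-behaved crawler would honor such a file, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the is_allowed helper are illustrative assumptions, not taken from PCMag's or any other outlet's actual file; the user-agent tokens are shown only as examples of how a site might name the bots it wants to exclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# it is not copied from any real site.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    print(is_allowed("PerplexityBot", story))  # blocked by its own Disallow rule
    print(is_allowed("SomeOtherBot", story))   # falls through to the wildcard rule
```

A check like this is voluntary on the crawler's side: the file states a site's wishes, and nothing in it technically prevents a bot from ignoring them.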

“AWS's terms of service prohibit customers from using our services for any illegal activity, and our customers are responsible for complying with our terms and all applicable laws,” the AWS rep said in a statement.

This month, Perplexity sparked a frustrated response from Forbes over the AI firm's decision to publish AI-generated news articles that pull from human journalists' work. Forbes Chief Content Officer Randall Lane accused Perplexity of conducting "cynical theft," and further alleged that Perplexity is creating "knockoff stories" using "eerily similar wording" and "entirely lifted fragments" from its articles. Forbes is also taking issue with the lack of adequate citation and omission of the outlet's name in the AI-generated stories.
