{"id":4126,"date":"2025-08-05T02:24:26","date_gmt":"2025-08-05T02:24:26","guid":{"rendered":"https:\/\/jobuzo.com\/en\/perplexity-is-allegedly-scraping-websites-its-not-supposed-to-again\/"},"modified":"2025-08-05T02:24:26","modified_gmt":"2025-08-05T02:24:26","slug":"perplexity-is-allegedly-scraping-websites-its-not-supposed-to-again","status":"publish","type":"post","link":"https:\/\/jobuzo.com\/en\/perplexity-is-allegedly-scraping-websites-its-not-supposed-to-again\/","title":{"rendered":"Perplexity is allegedly scraping websites it&#8217;s not supposed to, again"},"content":{"rendered":"<div>\n<p>Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report from Cloudflare. Specifically, the report claims that the company&rsquo;s bots appear to be &ldquo;stealth crawling&rdquo; sites by disguising their identity to get around robots.txt files and firewalls.<\/p>\n<p>Robots.txt is a simple file websites host that lets web crawlers know whether they may scrape a website&rsquo;s content. Perplexity&rsquo;s official web crawling bots are &ldquo;PerplexityBot&rdquo; and &ldquo;Perplexity-User.&rdquo; In Cloudflare&rsquo;s tests, Perplexity was still able to display the content of a new, unindexed website, even when those specific bots were blocked by robots.txt. The behavior also extended to websites with specific Web Application Firewall (WAF) rules that restricted web crawlers.<\/p>\n<p>Cloudflare believes that Perplexity is getting around those obstacles by using &ldquo;a generic browser intended to impersonate Google Chrome on macOS&rdquo; when robots.txt prohibits its normal bots. 
In Cloudflare&rsquo;s tests, Perplexity&rsquo;s undeclared crawler could also rotate through IP addresses not listed in Perplexity&rsquo;s official IP range to get through firewalls. Cloudflare says that Perplexity appears to be doing the same thing with autonomous system numbers (ASNs) &mdash; identifiers for groups of IP addresses operated by the same organization &mdash; writing that it spotted the crawler switching ASNs &ldquo;across tens of thousands of domains and millions of requests per day.&rdquo;<\/p>\n<p>Engadget has reached out to Perplexity for comment on Cloudflare&rsquo;s report. We&rsquo;ll update this article if we hear back.<\/p>\n<p>Up-to-date information from websites is vital to companies training AI models, especially as services like Perplexity are used as replacements for search engines. Perplexity has also been caught in the past circumventing the rules to stay up to date. Multiple websites reported in 2024 that Perplexity was still accessing their content despite them forbidding it in robots.txt &mdash; something the company blamed on the third-party web crawlers it was using at the time. Perplexity later partnered with multiple publishers to share revenue earned from ads displayed alongside their content, seemingly as a make-good for its past behavior.<\/p>\n<p>Stopping companies from scraping content from the web will likely remain a game of whack-a-mole. In the meantime, Cloudflare has removed Perplexity&rsquo;s bots from its list of verified bots and implemented a way to identify and block Perplexity&rsquo;s stealth crawler from accessing its customers&rsquo; content.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report from Cloudflare. Specifically, the report claims that the company&rsquo;s bots appear to be &ldquo;stealth crawling&rdquo; sites by disguising their identity to get around robots.txt files and firewalls. 
Robots.txt is a simple file websites host that lets web crawlers&#8230;<\/p>\n<p class=\"more-link-wrap\"><a href=\"https:\/\/jobuzo.com\/en\/perplexity-is-allegedly-scraping-websites-its-not-supposed-to-again\/\" class=\"more-link\">Read More<span class=\"screen-reader-text\"> &ldquo;Perplexity is allegedly scraping websites it&#8217;s not supposed to, again&rdquo;<\/span> &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":4127,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-4126","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/4126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/comments?post=4126"}],"version-history":[{"count":0,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/4126\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media\/4127"}],"wp:attachment":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media?parent=4126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/categories?post=4126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/tags?post=4126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}