Google Crawler Documentation Has A New IP List
Google updated its Googlebot and crawler documentation to add a range of IPs for fetchers triggered by users of Google products. The names of the feeds have also switched, which matters for publishers who are whitelisting Google-controlled IP addresses. The change will be useful for publishers who want to block scrapers that use Google’s cloud and other crawlers not directly associated with Google itself.
New List Of IP Addresses
Google says that the list contains IP ranges that have long been in use, so they’re not new IP address ranges.
There are two kinds of IP address ranges:
- IP ranges for fetches that are initiated by users but controlled by Google and resolve to a google.com hostname.
These are tools like Google Site Verifier and presumably the Rich Results Tester Tool.
- IP ranges for fetches that are initiated by users but not controlled by Google and resolve to a gae.googleusercontent.com hostname.
These are apps that run on Google Cloud or Apps Script scripts that are called from Google Sheets.
The lists that correspond to each category are different now.
Previously the list that corresponded to Google IP addresses was this one: special-crawlers.json (resolving to gae.googleusercontent.com)
Now the “special crawlers” list corresponds to crawlers that are not controlled by Google.
“IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.”
The new list that corresponds to Google controlled crawlers is:
user-triggered-fetchers-google.json
“Tools and product functions where the end user triggers a fetch. For example, Google Site Verifier acts on the request of a user. Because the fetch was requested by a user, these fetchers ignore robots.txt rules.
Fetchers controlled by Google originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname.”
The list of IPs from Google Cloud and App crawlers that Google doesn’t control can be found here:
https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers.json
The list of Google IPs that are triggered by users and controlled by Google is here:
https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers-google.json
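For publishers who want to act on these two lists, the following is a minimal sketch of checking a visitor IP against a feed. It assumes each JSON file holds a "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string; that layout is an assumption and should be confirmed against the live files. The names FEEDS, fetch_feed, load_networks and ip_in_feed are hypothetical helpers, not part of any Google tooling.

```python
import ipaddress
import json
import urllib.request

BASE = "https://developers.google.com/static/search/apis/ipranges/"
# The two feeds named in the article, keyed by who controls the fetcher.
FEEDS = {
    "google-controlled": BASE + "user-triggered-fetchers-google.json",
    "user-controlled": BASE + "user-triggered-fetchers.json",
}


def fetch_feed(url: str) -> dict:
    """Download and parse one feed (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def load_networks(feed: dict) -> list:
    """Convert a parsed feed into ip_network objects.

    Assumed layout: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    networks = []
    for entry in feed.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def ip_in_feed(ip: str, feed: dict) -> bool:
    """Return True if the address falls inside any range listed in the feed."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in load_networks(feed))
```

A typical use would be to call fetch_feed(FEEDS["google-controlled"]) on a schedule, cache the result, and pass it to ip_in_feed for each request, since the published ranges can change over time.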
New Section Of Content
There is a new section of content that explains what the new list is about.
“Fetchers controlled by Google originate from IPs in the user-triggered-fetchers-google.json object and resolve to a google.com hostname. IPs in the user-triggered-fetchers.json object resolve to gae.googleusercontent.com hostnames. These IPs are used, for example, if a site running on Google Cloud (GCP) has a feature that requires fetching external RSS feeds on the request of the user of that site.”
Hostname patterns: ***-***-***-***.gae.googleusercontent.com or google-proxy-***-***-***-***.google.com
IP lists: user-triggered-fetchers.json and user-triggered-fetchers-google.json
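The hostname patterns quoted above can also be used to sort reverse DNS results into the two categories. The sketch below assumes that each *** in the patterns stands for one decimal number of up to three digits, which the documentation excerpt does not spell out; classify_hostname and the label strings are hypothetical names chosen for illustration.

```python
import re

# Assumption: each "***" in the quoted patterns is a decimal number (1-3 digits).
_OCTETS = r"\d{1,3}(?:-\d{1,3}){3}"
USER_CONTROLLED = re.compile(rf"^{_OCTETS}\.gae\.googleusercontent\.com\.?$")
GOOGLE_CONTROLLED = re.compile(rf"^google-proxy-{_OCTETS}\.google\.com\.?$")


def classify_hostname(hostname: str) -> str:
    """Label a reverse DNS hostname by which quoted pattern it matches."""
    name = hostname.strip().lower()
    if GOOGLE_CONTROLLED.match(name):
        return "google-controlled"
    if USER_CONTROLLED.match(name):
        return "user-controlled"
    return "unknown"
```

A hostname match on its own is not proof of origin, because reverse DNS records can be set by whoever controls the IP block. It is safer to confirm the result with a forward DNS lookup or with the published IP lists.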
Google Changelog
Google’s changelog explained the changes like this:
“Exporting an additional range of Google fetcher IP addresses
What: Added an additional list of IP addresses for fetchers that are controlled by Google products, as opposed to, for example, a user controlled Apps Script. The new list, user-triggered-fetchers-google.json, contains IP ranges that have been in use for a long time.
Why: It became technically possible to export the ranges.”
Read the updated documentation:
Verifying Googlebot and other Google crawlers
Read the old documentation:
Archive.org – Verifying Googlebot and other Google crawlers
Featured Image by Shutterstock/JHVEPhoto
Source link : Searchenginejournal.com