Google On How Googlebot Handles AI Generated Content

Vernon August 28, 2023

Google’s Martin Splitt was asked how Googlebot’s crawling and rendering were adapting to the increase in AI generated content.

Martin’s answer provided insights into how Google handles AI generated content and the role of quality control.

Googlebot Webpage Rendering

Webpage rendering is the process of creating the webpage in a browser by downloading the HTML, images, CSS and JavaScript then putting it all together into a webpage.

Google’s crawler, Googlebot, also downloads the HTML, images, CSS and JavaScript files to render the webpage.
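To make the rendering step concrete, here is a minimal sketch of the first thing any renderer has to do: read the HTML and work out which images, CSS and JavaScript files it must download. This is an illustration using Python's standard library, not a description of how Googlebot is implemented; the names ResourceCollector and list_render_resources are made up for this example.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects the sub-resources a renderer would need to download."""

    def __init__(self):
        super().__init__()
        self.resources = []  # list of (kind, url) tuples, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "img" and attrs.get("src"):
            self.resources.append(("image", attrs["src"]))
        elif tag == "script" and attrs.get("src"):
            # Inline scripts have no src and need no extra download.
            self.resources.append(("javascript", attrs["src"]))
        elif tag == "link" and "stylesheet" in rel_values and attrs.get("href"):
            self.resources.append(("css", attrs["href"]))


def list_render_resources(html):
    """Return the (kind, url) pairs referenced by an HTML document."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources
```

A real renderer would then fetch each of these URLs, apply the CSS and execute the JavaScript, which is the expensive part of the process.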

How Google Handles AI Generated Content

The context of Martin’s comments was a webinar called Exploring the Art of Rendering with Google’s Martin Splitt, which was produced by Duda.

One of the audience members asked whether the large amount of AI content had an effect on Google’s ability to render pages at the point of crawling.

Martin offered an explanation, and he also described how Google decides at crawl time whether a webpage is low quality and what it does after making that determination.

Ammon Johns asked the question, which was read by Ulrika Viberg.

Here is the question:

“So, we have one from Ammon as well, and this is something that is talked about a lot.

I see it a lot.

They said, content production increases due to AI, putting increasing loads on crawling and rendering.

Is it likely that rendering processes might have to be simplified?”

Ammon apparently wants to know whether Google is introducing any special processes in response to AI content in order to deal with the increased crawling and rendering load.

Martin Splitt replied:

“No, I don’t think so, because my best guess is…”

Martin next addressed the obvious issue with AI content that SEOs wonder about, which is detecting it.

Martin continued:

“So we are doing quality detection or quality control at multiple stages, and most s****y content doesn’t necessarily need JavaScript to show us how s****y it is.

So, if we catch that it is s****y content before, then we skip rendering, what’s the point?

If we see, okay, this looks like absolute.. we can be very certain that this is crap, and the JavaScript might just add more crap, then bye.

If it’s an empty page, then we might be like, we don’t know.

People usually don’t put empty pages here, so let’s at least try to render.

And then, when rendering comes back with crap, we’re like, yeah okay, fair enough, this has been crap.

So, this is already happening. This is not something new.

AI might increase the scale, but doesn’t change that much. Rendering is not the culprit here.”
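The decision flow Martin describes can be sketched in a few lines. This is only an illustration of the logic in his answer, not Google's code: the function process_page, the quality_score and render callables, and the 0.2 threshold are all hypothetical stand-ins.

```python
def process_page(raw_text, render, quality_score, threshold=0.2):
    """Illustrative two-stage quality gate around rendering.

    raw_text      -- text found in the HTML before any JavaScript runs
    render        -- hypothetical callable that returns the rendered text
    quality_score -- hypothetical callable returning 0.0 (worst) to 1.0 (best)
    threshold     -- assumed cut-off below which content counts as low quality
    """
    # Stage 1: if the un-rendered content is already clearly low quality,
    # skip rendering altogether. An empty page cannot be judged yet,
    # so it falls through to rendering.
    if raw_text.strip() and quality_score(raw_text) < threshold:
        return {"rendered": False, "low_quality": True}

    # Stage 2: render the page, then judge the rendered result.
    rendered_text = render(raw_text)
    return {
        "rendered": True,
        "low_quality": quality_score(rendered_text) < threshold,
    }
```

The point of the sketch is the ordering: the cheap check runs first, so the costly rendering step is only spent on pages that are empty or that have not already been judged low quality.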

Quality Detection Applies To AI

Martin Splitt did not say that Google was applying AI detection on the content.

He said that Google was using quality detection at multiple stages.

This is very interesting because Search Engine Journal published an article about a quality detection algorithm that also detects low quality AI content.

The algorithm was not created to find low quality machine generated content. But the researchers behind it found that it identified such content on its own.

Much about this algorithm tracks with everything Google announced about its Helpful Content system, which is designed to identify content that is written by people.

Danny Sullivan wrote about the Helpful Content algorithm:

“…we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.”

He didn’t mention content written by people just once, though. His article announcing the Helpful Content system mentioned it three times.

The algorithm was designed to detect machine generated content, and it also detects low quality content in general.

The research paper is titled, Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study.

In it the researchers observe:

“This paper posits that detectors trained to discriminate human vs. machine-written text are effective predictors of webpages’ language quality, outperforming a baseline supervised spam classifier.”

Circling back to what Martin Splitt said:

“…we are doing quality detection or quality control at multiple stages…

So, this is already happening. This is not something new.

AI might increase the scale, but doesn’t change that much.”

What Martin seems to be saying is that: