Controlling Googlebot through Log File Analysis — Serge Bezborodov // JetOctopus

Serge Bezborodov, CTO and Co-Founder of JetOctopus, talks about log files and Googlebot. Unlike a typical SEO crawler, Googlebot can crawl pages that are no longer part of your website's structure or that were never indexed. What you want to avoid is valuable pages going uncrawled and unindexed while Googlebot's crawl budget is wasted on outdated or low-quality pages. Today, Serge discusses how to control Googlebot through log file analysis.
About the speaker

Serge Bezborodov

JetOctopus

Serge is CTO and Co-Founder of JetOctopus

JetOctopus Free Trial

Show Notes

  • 02:10
    Large websites and crawlability issues
    For large websites, Googlebot often fails to crawl all of the pages, resulting in only a fraction of the website being indexed and fewer impressions from the SERP. By increasing the crawlability of your website, you can improve indexation and gain more organic traffic from Google.
  • 03:20
    When you should start thinking about your website's crawl budget
    When your website reaches 10,000 pages or more, it becomes increasingly important to pay attention to your crawl budget. For instance, if you've created one million pages through programmatic SEO, you must invest in technical SEO and log file analysis.
  • 04:22
    Understanding website crawl patterns
    Within Google Search Console, you can see crawl stats for the last 90 days. To identify exactly which URLs Googlebot crawled, you will need to ask your development team for the website's access logs.
  • 05:23
    Analyzing your website's crawl budget state with access logs
    Access logs record how Googlebot, other bots, and your end users visit your website. With access logs, you can determine how many pages Googlebot has crawled, how many pages have been crawled only once, and so on (see the first sketch below this list).
  • 06:30
    Leveraging log files and crawl data for actionable insights
    In addition to indexed and existing pages, Googlebot can crawl outdated and bug-generated pages on your website. By analyzing log files and combining them with crawl data, you can effectively prioritize valuable pages for Googlebot's crawling and indexing (see the second sketch below this list).
  • 09:50
    How to improve your crawl budget usage
    Analyze what Googlebot crawls to identify and remove unnecessary pages. For large websites with generated content, prioritizing quality and valuable content is crucial, as even well-designed, AI-generated pages may not be indexed if Google deems them irrelevant to users.
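
Here is a minimal sketch, in Python, of the access-log analysis described above: it filters log lines for the Googlebot user agent and counts how often each URL was requested. The "combined" log format, the access.log file name, and the regex are assumptions for illustration, not JetOctopus functionality.

```python
# Minimal sketch: count Googlebot requests per URL in a "combined"-format access log.
# The file name and regex are assumptions; adjust them to your server's log format.
import re
from collections import Counter

# Combined log line: IP - - [time] "METHOD /path HTTP/1.x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        # Keep only Googlebot requests. In production, also verify the requester
        # via reverse DNS, since the user-agent string can be spoofed.
        if "Googlebot" not in match.group("ua"):
            continue
        hits[match.group("path")] += 1

crawled_once = sum(1 for count in hits.values() if count == 1)
print(f"URLs crawled by Googlebot: {len(hits)}")
print(f"URLs crawled only once: {crawled_once}")
print("Most-crawled URLs:")
for path, count in hits.most_common(10):
    print(f"{count:>6}  {path}")
```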

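A second sketch covers combining log data with crawl data, as mentioned in the show notes: comparing the URLs your crawler found in the site structure with the URLs Googlebot actually requested. The file names crawl_urls.txt and googlebot_urls.txt are hypothetical exports with one URL path per line.

```python
# Minimal sketch: compare the site structure with Googlebot's actual crawl.
# Both input files are hypothetical plain-text exports, one URL path per line.

def load_paths(filename: str) -> set[str]:
    """Read one URL path per line into a set, skipping blank lines."""
    with open(filename, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

structure = load_paths("crawl_urls.txt")     # URLs found by your site crawler
crawled = load_paths("googlebot_urls.txt")   # URLs Googlebot requested (e.g. from the sketch above)

never_crawled = structure - crawled          # valuable pages Googlebot is ignoring
outside_structure = crawled - structure      # outdated or bug-generated URLs eating crawl budget

print(f"Pages in the structure never crawled by Googlebot: {len(never_crawled)}")
print(f"URLs Googlebot requests that are outside the structure: {len(outside_structure)}")
```
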
Quotes

  • "One of the biggest problems when it comes to big websites is, Googlebot doesn't crawl all of your website. You can generate millions of pages, but Googlebot will only crawl dozens of them." -Serge Bezborodov, Co-Founder, JetOctopus

  • "Increasing the crawl budget of your website is the first stage of any technical optimization to make your indexation better and get more organic traffic from Google." -Serge Bezborodov, Co-Founder, JetOctopus

  • "When your website has more than 10,000 pages, you can start analyzing your crawl budget. And, the more pages you have, the more attention you should pay to your crawl budget." -Serge Bezborodov, Co-Founder, JetOctopus

  • "If you have 1 million pages created, you should definitely work with your technical SEO and your log files analysis 100%." -Serge Bezborodov, Co-Founder, JetOctopus

  • "Googlebot wants to crawl and index only valuable content for Google users. Quality issues here is a very important topic for big websites with generated content." -Serge Bezborodov, Co-Founder, JetOctopus
