Yandex Source Code Leak and what it means for SEO pt.1 — Mike King // iPullRank

Mike King, Founder and Managing Director at iPullRank, talks about the Yandex source code leak. Yandex, the world’s fourth largest search engine, suffered a major source code leak when a former employee published parts of the company’s internal repository online. While Yandex isn’t Google, there is a lot SEOs can learn about how a modern search engine is built from reviewing the codebase. Today, Mike discusses the Yandex source code leak and what it means for SEO.
About the speaker

Mike King

iPullRank

Mike is Founder and Managing Director at iPullRank

Show Notes

  • 05:21
    Examining the Yandex source code
    The Yandex source code leak provides insight into how a search engine operates. By examining this codebase, we can expand our understanding of search engine ranking factors and help to improve our search engine optimization practices.
  • 08:13
    Analysis of Yandex search engine codebase
    Some directories referenced in the codebase were missing from the leak. Even so, it revealed details such as the initial weights for a series of ranking factors in Yandex's search algorithm, which can provide a better understanding of modern search.
  • 12:42
    Yandex's limited JavaScript rendering and its impact on relevance
    Yandex has only a beta version of JavaScript rendering for its search engine. That limits how relevant its results can be, because a large portion of the web is built with React, Angular, and Vue, which require JavaScript rendering for a search engine to understand pages fully.
  • 15:00
    Phrase
    Google has a limit of 32 grams in its phrases, while Yandex has a limit of 64 grams. Despite Yandex's longer phrase limit, the use of BERT in their indexing process could improve relevance, as queries are turned into embeddings rather than treated as combinations of words.
  • 18:47
    Exploring neural ranking algorithms in Yandex's codebase
    Mike is still investigating the ranking system in Yandex's search engine and trying to understand how it works. He's also started a Slack community called "The Index" to invite other technically inclined SEOs to help dig into the code and build wiki-style documentation.
  • 20:24
    Opportunity to join the Yandex decoding effort
    Mike is inviting interested SEOs to join a Slack community called "The Index" to contribute to the discussion on the Yandex search engine. The project's GitHub repository, called the "Yandex decoder ring," is publicly accessible, and contributions can be made via pull request.

Quotes

  • "So much of our understanding of SEO is based on things we learned in 2003. But Google is just so much more sophisticated now." -Mike King, Founder, iPullRank

  • "Understanding that by explicitly seeing in the Yandex codebase that there were about 18,000 different ranking factors allows us to expand our thinking about what Google might be considering." -Mike King, Founder, iPullRank

  • "People were using the AOL leak data from 2006 for a good five years to build CTR models for Google." -Mike King, Founder, iPullRank

  • "The way that an index is structured is it's not like you just go to one computer and hit a database, and then get all the documents back. It's distributed across 1000s of computers." -Mike King, Founder, iPullRank
