SEO

Satan, SEO and Subdomains – VOL VI. – Horsemen of the SEOcalypse

Zdeněk Nešpor

After a few trillion redirects and the deletion of tens of thousands of subdomains, one question remains unanswered. So what is the verdict? Are subdomains good or bad?

Introduction

Horsemen of the SEOcalypse is the sixth installment of the "Satan, SEO and Subdomains" text series. In the previous part, subtitled "SEOfirot of the Tree of Life", we described the issue of the category tree.

Let's quickly review the overview of the main types of Heureka subdomains:

  • Basic and System (www, blog, info) ~20 subdomains.
  • Mobile subdomain (m) 1 subdomain.
  • Category subdomains (laptops, mobile-phones, electronics) ~2500 subdomains.
  • Brand subdomains (sony, nikon, apple) ~60,000 subdomains.
  • Parametric subdomains (gaming-notebook, android-phones, xbox-360) ~1000 subdomains.

This text contains a summary of everything we've managed to do in five years of work. And it discusses the subdomain issue in more detail.

Hell of a Ride

We started and planned the work to "de-subdomainise" Heureka in early 2018. At that time, there were approximately 65,000 subdomains on Czech Heureka. Let's briefly recap the whole process until 2022.

Parametric sections took us approximately 18 months. We finally managed to resolve them in mid-2019. We got rid of 1.5% of subdomains and a lot of duplicate pages. Seemingly small progress and simplification of the user journey on the site, but one that has had a very nice impact on conversion rates and revenue. Read more: Satan, SEO, and Subdomains – VOL II. – Cancer SEOtherapy.

During 2020, after a year of preparation and six months of testing, brand nooks and especially brand categories disappeared from Heureka. We made a big dent by removing and redirecting 92.3% of subdomains. In the end, the work dragged on until mid-2022 due to unexpected complications. The business outcome was slightly positive, but the key outcome was that Googlebot's crawling intensity was maintained. Googlebot began crawling useful pages more frequently and indexing new content more quickly, so the information displayed in the SERPs was updated sooner. Read more: Satan, SEO and Subdomains – VOL III – Controlled SEOcide.

Between 2018 and 2021, responsive parts of Heureka were deployed gradually, until the mobile version of the site was eventually replaced by a fully responsive desktop site. It was only one subdomain, but together with mobile-first indexing (MFI) it drove us crazy. Here, too, we succeeded in the end. The biggest changes and their evaluation took place during the global COVID-19 pandemic. The data was less reliable, but in principle all indications were that everything went well, without losing valuable positions or organic traffic. Read more: Satan, SEO and Subdomains – VOL IV. – One To Rule MFI.

In the end, the category subdomains never got their turn; their migration would most likely have taken place in 2020–2022. We changed our plans because of major preparations for the launch of One Platform: a single platform and category tree for all Heureka Group countries, which will eventually get rid of the subdomains as well. It is just going to happen a little later, probably by 2025. Read more: Satan, SEO and Subdomains – VOL V. – SEOfirot of the Tree of Life.
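
The percentages quoted in this recap can be checked with a few lines of arithmetic. The sketch below assumes the rounded counts from the overview list in the introduction and the "approximately 65,000" total; the exact internal figures are not published, so treat the numbers as illustrative only.

```python
# Rounded counts from the overview list; the real figures are only approximate.
TOTAL_SUBDOMAINS = 65_000  # "approximately 65,000" on Czech Heureka in early 2018

removed = {
    "parametric": 1_000,   # ~1000 parametric subdomains
    "brand": 60_000,       # ~60,000 brand subdomains
}


def share(count: int, total: int = TOTAL_SUBDOMAINS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in removed.items():
    print(f"{name}: {share(count)} % of all subdomains")
```

With these assumed inputs, the shares come out at 1.5 % and 92.3 %, which matches the figures given in the text.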

Secrets from the Confessional

The attentive reader may have noticed that we have not yet mentioned a couple of other subdomains in the "basic, systems, and information" area. These include darky.heureka.cz, blog.heureka.cz, certifikace.heureka.cz, inspirace.heureka.cz or onas.heureka.cz.

We have also worked with them, but mostly it was nothing worth talking about in terms of SEO. Worth mentioning is the redesign of the gift guide, which you can read more about in the article (CZ only): https://www.heurekadevs.cz/nove-darky-na-heurece. We managed to close the old blog.heureka.cz on WordPress and move it to heureka.group and so on. It has always been rather minor and a few dozen redirects. A similar fate has already befallen and will probably befall some similar smaller subdomains.

But these do not bother us and never have. In all of these cases, the subdomains were used appropriately: for sites whose content is often thematically unrelated to the content of the main domain [1] and which run on various CMSs managed by external suppliers. Let's focus on this topic a bit now.

Resolving the Eternal Battle of Subdomains and Directories

So are subdomains or directories better? You'll probably find people who are adamantly convinced of one or the other. As with many things in SEO, it always simply depends on the specific situation. Subdomains can be useful for several purposes. For example:

  1. Creating space for a service or interface. For example, a mail server (mail.domena.cz) or an API (api.domena.cz).
  2. Separating two thematically different domains or services for different customers. For example, www.domena.cz will run an e‑shop with white goods and eco.domena.cz will run an educational project about ecology. Or on wholesale.domena.cz there will be an administration interface for the B2B segment.
  3. Creating a test site (test.domena.cz) or staging environment (stage.domena.cz). For this one you need to be careful that nothing leaks out and the content is not accessed by search engine robots.
  4. One of the typical uses of subdomains is to separate different technologies. I have a customized website at www.domena.cz. Then I run a WordPress blog on blog.domena.cz.
  5. Subdomains are often used for multilingual sites, where cs.domena.cz runs the Czech version, en.domena.cz runs the English version, and so on.

The last two points are borderline cases, because both can just as easily be handled within the directory structure. Alternatively, the subdomain can be routed [2] so that it looks like a directory of the main domain: blog.domena.cz -> www.domena.cz/blog. Language versions are the subject of many other discussions, so let's not go into that now.
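
To make the blog.domena.cz -> www.domena.cz/blog idea concrete, here is a minimal sketch of how a redirect target could be computed. The mapping table and the function name `to_directory_url` are hypothetical and exist only for illustration; in practice this kind of routing is usually configured in the web server or reverse proxy rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "www.domena.cz"  # placeholder domain used in the examples above

# Hypothetical mapping: subdomain label -> directory on the main host.
SUBDOMAIN_TO_DIRECTORY = {"blog": "/blog", "cs": "/cs", "en": "/en"}


def to_directory_url(url: str) -> str:
    """Rewrite a mapped subdomain URL to its directory form on MAIN_HOST.

    URLs on other hosts, or on subdomains without a mapping, are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    label, _, rest = host.partition(".")
    if f"www.{rest}" != MAIN_HOST or label not in SUBDOMAIN_TO_DIRECTORY:
        return url
    path = SUBDOMAIN_TO_DIRECTORY[label] + (parts.path or "/")
    return urlunsplit((parts.scheme, MAIN_HOST, path, parts.query, parts.fragment))


print(to_directory_url("https://blog.domena.cz/clanek?x=1"))
```

The path and query string are carried over so that each old URL maps to exactly one new URL, which is what a permanent redirect needs.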

The classic directory structure www.domena.cz/kategorie or www.domena.cz/blog is more common and certainly more suitable for a normal website. The three big benefits of a directory structure are clearer internal linking, a stronger link profile, and easier overall management without subdomain confusion, as the search engine sees subdomains as separate sites. Or does it? This is discussed in the following section.

Circles of Hell

We couldn't find much clear and relevant information on how Google perceives subdomains. In 2007 [3], the article "Subdomains and subdirectories" [4] by Matt Cutts [5] explained Google's use of "host crowding." That is, it displays multiple search results from a single domain because it includes pages from subdomains. The same article also states that Google is trying to eliminate host crowding.

Host crowding was on the rise at the time, and many very effective black hat SEO techniques were still working. [6] Basically, a well-ranking site (www.domena.cz) was replicated to one or more subdomains (replica1.domena.cz, replica2.domena.cz…). As a result, the SERP displayed many identical results, one below the other, all from a single site and its duplicates.

In our country, this technique was called "SERP wallpapering". [7] That is, covering as much area and as many results as possible for a particular search phrase. Thanks to subdomains, Heureka excelled in this "wallpapering" and benefited from it for a long time, even though Google was already fighting it and started to limit host crowding at the end of 2007 (the year Heureka was founded). Anyway, this is probably the only clear indication that Google viewed subdomains as separate sites until then.

We'll take a big leap now to 2019, when Google released a "site diversity" core update. [8] The update focused on diversity of results so that no more than two results from the same site would show up for most search queries. The site diversity system also hints at the issue of subdomains.

We quote from the Google Developers site, from the section on Site diversity [9]:

"Site diversity generally treats subdomains as part of a root domain. IE: listings from a subdomain (subdomain.example.com) and the root domain (example.com) will all be considered from the same single site. However, sometimes subdomains are treated as separate sites for diversity purposes when deemed relevant to do so."
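
The rule described in the quote can be illustrated with a toy filter. This is not Google's algorithm, only a sketch of the stated principle under two assumptions: the root domain is naively taken as the last two labels of the hostname (which breaks for suffixes such as co.uk), and the limit of two results per site comes from the description of the update above. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def root_domain(host: str) -> str:
    """Naive root domain: the last two labels of the hostname (assumption)."""
    return ".".join(host.lower().split(".")[-2:])


def limit_per_site(hosts: Iterable[str], max_per_site: int = 2) -> List[str]:
    """Keep results in ranked order, but at most max_per_site per root domain."""
    seen = Counter()
    kept = []
    for host in hosts:
        site = root_domain(host)
        if seen[site] < max_per_site:
            seen[site] += 1
            kept.append(host)
    return kept


ranked = [
    "sony.heureka.cz",
    "www.heureka.cz",
    "nikon.heureka.cz",
    "www.example.com",
    "apple.heureka.cz",
]
print(limit_per_site(ranked))
```

In this sketch every subdomain counts towards the same site, so only the first two heureka.cz results survive. The exception mentioned in the quote, where a subdomain is treated as a separate site, is exactly the part that cannot be modelled from the outside.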

Next, we have some ambiguous hints from John Mueller of Google.

John Mueller's quote from the Google Search Central video [10]:

"… Some servers make it easier to set up parts of a website as subdirectories. That's fine for us. This helps us with crawling since we understand that everything's on the same server and crawl it in a similar way. … You'll need to verify subdomains separately in Search Console… We do have to learn how to crawl them separately but for the most part that's just a formality for the first few days."

Quoting John Mueller from the Google Webmaster Central hangout [11]:

"In general, we see these the same. I would personally try to keep things together as much as possible. So if it's the same site, then try to put them on the same site essentially, and use subdomains where things are really kind of slightly different."

Quoting John Mueller from Google SEO office-hours [12]:

"We, from our point of view, when we talk with the search quality team, they say subdomains and subdirectories are essentially equivalent. You can put your content however you want. … With some kinds of websites, it's also something where we might treat things on a subdomain slightly differently because we think maybe this is more like a separate website versus all of the same website."

Let us summarise the information gained from previous sources of more recent dates:

  • Sometimes Google treats subdomains as one website and sometimes as separate websites.
  • Directories help when first crawling a site because it is clearer that it is a functional unit and everything can be crawled in a similar way (easier to identify crawling patterns). But it doesn't matter, because, within a few days, the crawler will also get to grips with subdomains.
  • Google sees subdomains and directories the same way and works with them the same way. Except when it perceives them differently.
  • Directories and subdomains are equivalent. But for some sites, we can treat subdomains as separate sites.

No one is probably much wiser from this vague information. If we were to sum it up in one sentence, it would go something like this, "It doesn't matter if you use directories or subdomains, it's the same, except in undefined situations where it's not the same." Welcome to the world of SEO.

Reading Sacred Texts

Reading information and statements like this sometimes makes you want to quit your job and go kill yourself. But a few things can be gleaned, as they say, "between the lines." Almost all of the sources cited mention or imply in various contexts that directories are preferable and easier to manage. That it is probably better to keep everything within a single (sub)domain. And also that subdomains are more likely to bring disadvantages.

Furthermore, we often find mention of the Google Search Console tool, where it is necessary to verify subdomains "separately". This is probably where the myth of separately perceived subdomains comes from. If we ignore the fact that it is already possible to create Domain Level properties [13], this information does not confirm anything. Rather, it falls into the "subdomain management is more complicated" category.

The strange equivalence of subdomains and directories also has its explanation. The search engine really doesn't care whether someone runs a blog on blog.domena.cz, uplna-hovadina.domena.cz, www.domena.cz/blog or domena.cz/blog. It doesn't matter, because we're primarily talking about only one specific and independently existing instance of some content.

Although the search engine considers subdomains and directories to be the same in most cases, this is not true 100% of the time. Google itself chooses how it will work and rank in a particular case. Two or more subdomains with thematically strongly different content are likely to be evaluated as separate. Two subdomains with thematically the same or similar content will likely be evaluated as one site.

However, we have no way of determining or knowing when a subdomain will be perceived as a separate site. Will the categories herni-konzole.heureka.cz and barvy-laky.heureka.cz be seen as very different separate websites? Or is it sufficiently clear from the URL structure, sitemap, and structured data that they are the same site? Do links between subdomains have the same weight as between directories or not? These questions will probably never be resolved.

Discussions about subdomains and directories have been going on for years. And will continue to be for a long time, because finding a clear implication [14] is virtually impossible. So let's rather go back to the beginning of the first article [15] and reiterate one important thing that has not changed: Subdomains are not the cause of success or failure. They may have once helped in wallpapering the SERPs, but that is no longer the case. What is important is that at the beginning of Heureka's journey to success was a quality and smartly built service that covered a real need for users. [16]

Archetypes of Good and Evil in the Basic Concepts of Web Optimization

The word "Satan" in the title of the Heureka subdomain articles was not chosen at random. At first glance, it evokes something negative, repulsive, and evil.

The devil is portrayed as the personification of evil. A clear opposite of good and an element of destruction. It appears in dozens of variations in many cultures. But usually representing completely different aspects.

In Hebrew, the term satan (śāṭān, שָׂטָן) is translated as "opponent" or "adversary." In Arabic (Shaitan, شيطان), it is in turn translated as "distant" or "errant". [17] The New Testament then uses the term "Tempter." All of these terms are much better suited to our needs.

Indeed, search engines do not distinguish between concepts such as good and evil. At the lowest level, websites are just networks that we are trying to optimize. However, without adequate care, they can easily go astray and become our adversaries.

The Temptation of Subdomains

Subdomains are not a priori [18] good or bad. It depends on their specific use. Rather, they are temptations. In our case, they once represented internal temptations. If the first 10 worked well for us, why shouldn't the next 10,000 or 100,000 subdomains? As the saying goes, the road to hell is paved with good intentions.

We need to be even warier of external temptations. Lots of people have tried to copy Heureka thinking that subdomains are cool. There is only one thing to write about this. Never copy someone else's site without having a detailed insight into why they do the thing and how exactly it works. Don't fall for myths and false illusions.

Splitting a site into several subdomains completely unnecessarily doesn't make much sense these days. In general, of course, it depends on the capabilities of the given site or hosting. The vast majority of commonly available boxed solutions, rented e‑shops or websites, and especially hosting plans offer customers very limited options that leave little room for creativity. If a subdomain is the only way to run a blog, that's fine. It's probably not optimal, but you don't have to worry too much about it. It'll work, too. The main thing is not to try to "outsmart" or trick the search engine.

The temptation is strong. You have to learn to resist and make decisions logically, with thought and a vision of a long-term strategy. Abandon all hope if you don't. And as we've already explained, subdomains alone do not determine success. Let's take a look at the factors that actually block website success.

The Four Horsemen

Many companies are teetering on the edge of a potential apocalypse with their websites. Some may even be reveling in it. There are four ways to meet the doom. Let's call it the PWHD framework.

Pestilence

Websites tend to be literally littered with hundreds of bugs, outdated code, and half-baked stuff. And of course we all want to keep growing and making more money. These two paths are somewhat incompatible.

Bugs are a blocker to effective progress. It's useless to create content when there are so many duplicates on the site that the bot virtually doesn't crawl the new pages and indexing takes months. It's useless to have a bunch of products when the site takes 10 seconds to load and users can't use it. There are a lot of examples.

At some point you need to put up a stop sign, cut back on a few innovations, and clear out the various nonsense and blockers. Further progress will then be easier and faster to achieve.

Pestilence was basically the biggest problem at Heureka. Problems and unfinished areas had been piling up for a long time. Consequently, a chain of bugs formed that could no longer be addressed easily and quickly. It's pretty certain that this held us back unpleasantly in the development of other areas.

Bugs and holes need to be fixed as soon as possible, while it is still easy. You can apply a quick patch to something, but you never know when it will stop working. In SEO, it doesn't pay to underestimate even small mistakes. Especially with a large website. Of course, we're not talking about one 404 found, but large error patterns.

War

People within companies often wage war against each other instead of banding together to fight problems. Scrap any internal squabbles. They lead nowhere. Everyone's goal should be to build a quality website/service.

If you have to fight over every priority and every detail, effectiveness is severely reduced. SEOs need to be trusted and given enough time for development. And truly effective SEO is created by combining many other channels. PPC partially supports organic and vice versa. Social and PR help generate links. PR and brand help build brand awareness and thus increase searches. SEO is a jigsaw puzzle of many pieces that the whole company must participate in.

For example, when an article is created, it should ideally be supported by backlinks, driven through social and newsletter and mined for PR activities. Content that is handled in this way is more quickly indexed by search engines and gains ranking faster. It requires some processes and coordination, but it pays off.

Fortunately, Heureka has a friendlier-than-average environment and a good company culture. But we see it sometimes elsewhere, where people don't fight problems, they fight each other. In such an environment, you can't get anything done and there's almost no point in trying.

Hunger

Few resources, poor-quality people, poor-quality work, a freeze on innovation. In an environment where nothing works, there is no money for anything, and everything is a problem, SEO simply cannot be done at all.

Organic traffic is not free. It is the result of long term work and a lot of effort. You need to have dedicated resources, development capacity, budget for tools and links. And most importantly, to do everything properly. A job done halfway won't have half the effect, but maybe only a quarter. Sometimes none at all, because SEO-related tasks are often linked to each other and must be handled in the right order. Similarly, "we'll do something now and finish the rest in six months" doesn't work very well.

Typically, better or faster indexing of content requires fixing the sitemap, deleting duplicates from the site, or improving the site's internal linking. If you just blindly chase traffic and don't solve the technical blockers because there's no time, money, appetite, energy, or will, you're not going to get anywhere.

You can't move anywhere without adequate resources.

Death

Neglect the website for long enough and it will eventually be over. Quietly and without warning. SEO is very tricky and has a lot of inertia. If you turn off PPC, the traffic from advertising stops instantly. Stop sending newsletters and you logically lose that traffic too. In contrast, SEO can give a more stable impression. You stop working on it and the organic traffic keeps coming. For a year, two, maybe three. However, the drop to zero can come very suddenly.

You need to work on SEO continuously. Search engines don't sleep and they are always changing, and your work needs to reflect that. Every year of neglected SEO will take you the next two years to make up for. This is the flip side of the inertia mentioned above. It simply takes time to get organic traffic going (again). Sometimes it takes a long time, and it can be really expensive.

Don't expect an SEO specialist to save something in a month. Especially with big sites, it's a multi-year job of picking up and gluing together many little pieces. The pieces won't deliver good results on their own, but when it's all glued together, you'll be glad you invested in organic traffic.

Purgatory


We spent many months preparing and planning for everything we've described in this series of articles. Hundreds of hours went into analyzing, evaluating, creating scenarios, drafting briefs, and discussing with stakeholders and developers in the company. The hard work has paid off. We can't say it always went like clockwork. But in the end, we fought most things through.

Of course, we didn't work only on subdomain issues over the five years. The range of things to solve was very diverse, from classic keyword analysis and experiments with GTM to data scraping, preparing reports, and various automations.

Worth mentioning are the four tools we developed to help us in our work.

  • Redirect Tool and Traffic Admin are tools that allow for massive URL redirection and management. They were created for us by a dedicated "Traffic Acquisition" team of developers. The tools are not public and are for internal Heureka purposes only.
  • The Junior System is a set of SQL databases and queries that allowed us to mass process data for hundreds to thousands of keyword analyses simultaneously. The tool is not public.
  • URL Regexator was created as a tool to easily create regular expressions with multiple subdomains or URL slugs for Google Analytics and Google Search Console.
  • Robots.txt Validator helped us validate new directives in robots.txt and revisions to existing ones.
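
To give an idea of what a regular expression covering multiple subdomains can look like, here is a minimal sketch. It is not the actual URL Regexator; the function name `subdomain_regex` and its parameters are hypothetical, and the two category subdomains are simply the examples used elsewhere in this article.

```python
import re


def subdomain_regex(subdomains, domain="heureka.cz"):
    """Build one anchored regex matching URLs on any of the given subdomains of `domain`."""
    labels = sorted({s.strip().lower() for s in subdomains if s.strip()})
    if not labels:
        raise ValueError("at least one subdomain is required")
    # Escape every label so that characters like "." or "-" are matched literally.
    alternation = "|".join(re.escape(label) for label in labels)
    return rf"^https?://({alternation})\.{re.escape(domain)}(/|$)"


pattern = subdomain_regex(["herni-konzole", "barvy-laky"])
print(pattern)
print(bool(re.search(pattern, "https://herni-konzole.heureka.cz/")))
```

The trailing `(/|$)` keeps the pattern from matching look-alike hosts such as barvy-laky.heureka.cz.example.com. The pattern uses only anchors, groups, and alternation, so it should be portable to most regex dialects, but that is worth checking in the specific tool before relying on it.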


Sacred Optimization

A year in the life of an SEO specialist in a large company with several giant websites is no mean feat. As we have described in previous texts, that one year can be taken up with planning and negotiating before any changes can be made. In our case, in the end, even the five years we worked on de-subdomainising Heureka were not enough to do everything. But regardless, we still successfully got rid of tens of thousands of subdomains and hundreds of billions of duplicates, and completed a lot of other useful tasks and innovations. Although the work originally planned is not yet complete, we believe we still have a lot to be proud of.

It doesn't really matter much. After all, terms like "beginning" or "end" are irrelevant in the context of SEO and web optimization. We're always jumping on the digital bandwagon. And no one will ever have it all done and dusted. True "optimization" [19], i.e., generally across the board and steadily achieving the best possible result, is virtually unattainable in the extremely dynamic and highly competitive world of online marketing and websites, and certainly unsustainable in the long term. Time passes, new challenges keep coming and things are always changing. That "sacred" optimization is just a never-ending journey to get the best possible result in a specific situation and conditions. This is a basic fact that we all have to accept and learn to live with somehow. Panta Rhei! [20]

We, above all, now hope that we have been able to shed enough light on the issue of subdomains to help prevent more people from falling into this trap. We hope you have enjoyed and been enlightened by our articles. Goodbye and see you next time.

Memorial

Finally, we would like to thank all our colleagues who helped us. Especially the developers and production guys who did most of the work and without whom we couldn't have gotten anywhere! And also the management of Heureka Group, who gave us enough trust and resources to make our epic campaign a reality. Thank you! We could never have done it without you, your cooperation, and your support. ❤️


Disclaimer

Approach the text with caution. This article and the entire series are not intended as a guide. The texts do not contain any "universal" truths. Each site represents a unique system with different starting conditions. An individual approach and perfect knowledge of the specific site and the subject matter are required.

The article describes our website. We do not evaluate the general effectiveness of subdomains or directories. Nor do we recommend any specific solution. Again, this is a highly individual matter, influenced by many factors.

Strategies and detailed plans for some of the activities described here have been in the works for over a year. Everything has been discussed, tested, and validated many times. Please keep this in mind when you do similar activities yourself.

Some of the data presented may be inaccurate or purposefully distorted. For obvious reasons, we do not plan to disclose specific numbers such as organic traffic stats, revenue, conversions, and the like. However, key information such as subdomain counts, URLs, and our practices is presented truthfully, without embellishment.

The text may contain advanced concepts and models that are not entirely standard in SEO. The articles are therefore supplemented with footnotes with sources where everything is explained in detail.


Footnotes:

  1. Respectively (sub)domains.

  2. Routing https://en.wikipedia.org/wiki/Routing

  3. The same year that Heureka was founded.

  4. Article Subdomains and subdirectories https://www.mattcutts.com/blog/subdomains-and-subdirectories/

  5. Matt Cutts https://en.wikipedia.org/wiki/Matt_Cutts

  6. Black hat SEO https://en.wikipedia.org/wiki/Search_engine_optimization#White_hat_versus_black_hat_techniques

  7. SERP "wallpapering" is actually a broader term and can involve wallpapering using a variety of techniques. Not just subdomains. "Host crowding" is just one type of wallpapering.

  8. Site diversity core update https://twitter.com/searchliaison/status/1136739062843432960

  9. Site diversity system, published 6. 6. 2019, quoted 7. 3. 2023 https://developers.google.com/search/docs/appearance/ranking-systems-guide#site-diversity-system

  10. Subdomain or subfolder, which is better for SEO? Google Search Central, published 21.12.2017, quoted 7. 3. 2023 https://www.youtube.com/watch?v=uJGDyAN9g-g

  11. English Google Webmaster Central office-hours hangout, Google Search Central, published 18. 5. 2018, quoted 7. 3. 2023 https://www.youtube.com/watch?v=kQIyk-2-wRg&t=672s

  12. English Google SEO office-hours from September 17, 2021, Google Search Central, published 17. 9. 2021, quoted 7. 3. 2023 https://www.youtube.com/watch?v=iXRhfPhEv2Y&t=2725s

  13. Announcing domain-wide data in Search Console https://developers.google.com/search/blog/2019/02/announcing-domain-wide-data-in-search

  14. Implication https://en.wikipedia.org/wiki/Logical_consequence

  15. See article https://www.heurekadevs.cz/satan-seo-subdomeny-i-chaos

  16. See article https://www.heurekadevs.cz/satan-seo-subdomeny-i-chaos

  17. Satan https://en.wikipedia.org/wiki/Satan#Hebrew_Bible https://en.wikipedia.org/wiki/Satan#Islam

  18. A priori https://en.wikipedia.org/wiki/A_priori_and_a_posteriori

  19. Optimization comes from the Latin "optimus", which translates as "best". https://en.wiktionary.org/wiki/optimus

  20. Panta Rhei -> Everything flows. https://en.wikipedia.org/wiki/Panta_Rhei

Author

Zdeněk Nešpor

SEO

Technically focused SEO specialist and webmaster. 
