How to properly index the pages of your website?

Google, like any other search engine (Bing, Qwant, Ecosia, Yandex, Baidu…), relies on an algorithm to index web pages and rank them in its results. But how does indexing work? What algorithm is it based on? And, finally, here are 12 tips to optimize the indexing of your website.

What is indexing and how does it work?

The indexing of a web page is what makes it appear among the search results when a user types a query. Google compares its index to a library catalog, which allows users to find the information they need when they look for it.

“The Google index is like a library catalog, which provides information about all the books available in the library. However, instead of book information, the Google index contains a list of all web pages known to Google systems.” Source: Google Support

Indexing is thus used to list the various sites and web pages so that users can reach the information they are looking for once they have typed their query into the search bar. Indexing therefore refers to the act of cataloging and ranking the content that matches a search intent.

Indexing and positioning: are they the same thing?

Most of the time, the indexing of a page and its positioning are treated as the same phenomenon. To be precise, however, the two must be distinguished. Indexing corresponds to a page being taken into account by a search engine, whereas positioning refers to its placement among the results. Positioning therefore implies that indexing has already taken place. Indexing can be thought of as the act of recording the page, its “tracking” by the algorithm, while positioning concerns its ranking once indexing has been carried out and the way that ranking evolves over time.

Some information about Google indexes

Did you know? Since 2003, Google has used 2 separate indexes.

The first Google index, or main index, lists the pages we access through the SERPs during a standard search.

The second Google index, called the secondary or supplemental index, is used by Google to classify “second-choice” pages, that is to say pages that are only displayed at the user's request, when they click on “repeat the search with the omitted results included”. Indeed, the web is full of all kinds of pages, and a large number of them do not respect the optimization criteria recommended by Google, such as:

  • the ban on duplicate content,
  • the ban on link farms,
  • the ban on over-optimizing content by stuffing it with keywords,
  • etc.

“Second-choice” content is content judged to be of no interest by Google: content that is irrelevant, too short, or hosted on sites with insufficient technical performance (a loading time that is too long, for example).

But these pages do exist and can be useful to many users, especially on niche topics or for very exhaustive research. That is why Google still indexes them and allows consenting users to access them. The big difference between Google and its competitors (Bing and Yahoo) is precisely this selectivity, since the competition mixes second-choice results in with the others. Clearly, the competition does not use such a precise selection algorithm for indexing its content: where Google separates “first-quality” content from second-choice content, competing search engines do not make the distinction.

Some figures on Google indexing

  • There were 130 billion pages indexed by Google in 2021!
  • 20 billion websites are crawled and indexed by Google on a daily basis.
  • 80,000 searches are made every second, which corresponds to 6.9 billion per day.
  • Among them, 500 million queries are new and have never been formulated before.

Source

How to optimize the indexing of my website?

If you find that your website is not being indexed by Google, it is possible to fix this. Several factors need to be monitored, and a number of concrete actions can be put in place.

How do I know if my site is indexed by Google?

To find out whether a site is indexed by Google, nothing could be simpler: enter the query “site:www.yoursite.com” in the search engine. If the website appears in the results, then it is indeed indexed. Otherwise, it is not.

How do I get my website indexed by Google?

You own a website, but despite your efforts, Google is not indexing it. Here are some ways to get Google to index your site.

1. Make the site indexable with the robots.txt file

Sometimes certain sites do not appear among Google's search results simply because they do not meet the basic indexability requirements. Indexability is the way the webmaster shows Google that the site is available for indexing. To do this, it is first of all necessary to provide crawler robots with a robots.txt file, which tells them which pages of the site they may crawl and which to leave alone. Because yes, not every page of a site is worth indexing (duplicates, cart summaries, institutional pages…). This avoids indexing pages that do not need to be indexed and which could penalize the site's SEO overall.
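
As an illustration, here is a minimal robots.txt sketch; the blocked paths (cart, internal search) are hypothetical examples and should be adapted to your own site structure:

```txt
# robots.txt placed at the root of the site (https://www.yoursite.com/robots.txt)
# The blocked paths below are illustrative examples only.
User-agent: *
Disallow: /cart/      # cart summary pages: no value in the index
Disallow: /search     # internal search result pages
Allow: /

# Point crawlers to the sitemap (see the next tip)
Sitemap: https://www.yoursite.com/sitemap.xml
```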

2. Make it easier for robots to crawl with the sitemap.xml

The sitemap is a file that allows robots to find their way around the site, to access the pages one after the other and to understand the navigation logic between them. It facilitates the robots' crawling and thus speeds up the indexing of a site's pages.
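
For illustration, a minimal sitemap.xml could look like the sketch below; the URLs and dates are placeholders, and in practice many CMSs and SEO plugins can generate this file automatically:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap.xml; the URLs and dates are placeholders. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.yoursite.com/</loc>
    <lastmod>2021-06-01</lastmod>
  </url>
  <url>
    <loc>https://www.yoursite.com/blog/sample-article</loc>
    <lastmod>2021-05-15</lastmod>
  </url>
</urlset>
```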

3. Use Google Search Console to monitor the indexing of pages

The number-one tool for webmasters, Google Search Console allows you to inspect each URL and find out its indexing status, thanks to the URL inspection tool. All you have to do is paste the URL whose status you want to check into the search form.


Request for indexing on Google Search Console

4. Request a re-indexing of pages from Google

Google Search Console also allows you to request re-indexing of content when one or more pages have recently been modified. Changes made to a page can alter its optimization and therefore its positioning, which is why it is important to ask Google to index it again, in other words to carry out a new analysis. To request re-indexing, simply click on the button of the same name. If you have a large number of pages, request a re-crawl of the sitemap instead so that Google performs a more global analysis.

5. Offer responsive pages

Since November 4, 2016, Google has indexed pages on a mobile-first basis, that is to say by analyzing the mobile version of each page rather than the desktop version, as was the case until then. As a result, non-responsive websites, those that do not offer an optimized mobile version, inevitably see their positioning in the SERPs suffer. This is why it is extremely important to offer a mobile version of your site that is optimized and adaptable to all devices, so that the UX (user experience) remains good whatever the device.
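
As a minimal, purely illustrative starting point for a responsive page, the snippet below combines the viewport meta tag with flexible CSS; the .sidebar class name and the 600px breakpoint are arbitrary examples:

```html
<!-- Minimal, illustrative starting point for a responsive page -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  /* Let images shrink with the screen instead of overflowing it */
  img { max-width: 100%; height: auto; }
  /* Arbitrary example breakpoint: hide a secondary column on narrow screens */
  @media (max-width: 600px) {
    .sidebar { display: none; }
  }
</style>
```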

6. Update your pages regularly

The more a website is updated, the larger its crawl budget, i.e. it will be crawled and indexed more regularly if it is refreshed from time to time. It is therefore a good idea to publish news, enrich existing articles and edit pages as data changes: so many modifications that encourage robots to crawl regularly and to re-index your content.

7. Optimize internal linking

Did you know? Crawler robots are called spiders, and the web forms, well, a web. Each site also forms its own web, and it is easier for spiders to move around if that web has already been woven for them. It is therefore very important to create links between your pages in order to offer a clear path to the robots: they will crawl you all the better and indexing will be made easier.
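
For example, a few contextual internal links inside your content are enough to weave that web; the URLs below are hypothetical:

```html
<!-- Illustrative contextual internal links (the URLs are hypothetical) -->
<p>
  To go further, read our guide to
  <a href="/blog/robots-txt-guide">configuring robots.txt</a>
  or see <a href="/blog/sitemap-xml">how to build a sitemap.xml</a>.
</p>
```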

8. Optimize your headers

Defining and prioritizing your headings and subheadings is not a cosmetic detail: it allows Google's robots to find their way around your content. Do not forget that crawler robots do not have our human intelligence, which is why you need to help them analyze your pages by providing the editorial framework of each one. As such, the H1 must be representative of the page's content (and fully transparent about it), and the subheadings must indicate to the robots which specific question each section addresses.
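
A simple, illustrative heading hierarchy (the titles are placeholders) could look like this:

```html
<!-- Illustrative heading hierarchy (the titles are placeholders) -->
<h1>How to get your website indexed by Google</h1>
  <h2>What is indexing?</h2>
  <h2>How to optimize indexing</h2>
    <h3>Make the site indexable with robots.txt</h3>
    <h3>Submit a sitemap.xml</h3>
```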

9. Make your pages load fast

Did you know? One in two users stops browsing a site if its pages take more than 3 seconds to load. The same logic applies to indexing robots: if a page takes too long to load, the robot abandons its analysis and does not index the page. It is therefore important that the pages of your site load as quickly as possible.
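
If you want a quick, rough check, the Python sketch below measures the server response time of a page using the third-party requests library; it only measures the time to receive the HTML response, not the full rendering time, and the URL is a placeholder:

```python
# Rough server response-time check (time to receive the HTML, not full rendering).
# Requires the third-party "requests" library; the URL is a placeholder.
import requests

url = "https://www.yoursite.com/"
response = requests.get(url, timeout=10)

seconds = response.elapsed.total_seconds()
print(f"{url} answered in {seconds:.2f} s (status {response.status_code})")
if seconds > 3:
    print("Warning: slower than the 3-second threshold mentioned above.")
```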

10. Optimize your authority

The more a site is recognized by its peers for its seriousness, relevance and reliability, the more the crawler robots will analyze its pages. It is therefore necessary to work on acquiring backlinks and on obtaining positive reviews on Google My Business or on verified review platforms. Clearly, it is imperative to optimize your site's E-A-T criteria to attract robots and increase your crawl budget.

11. Sort it out

Let's imagine a big house full of beautiful objects, but also many uninteresting trinkets. These trinkets do not show off the beautiful objects, and the house is unlikely to end up on the cover of a decorating magazine. Once the old trinkets have been removed, the house gains in elegance and every visitor notices the beautiful objects all the more. It is much the same principle for the indexing of a site's pages: pages with no value tend to drag the indexing of all the others down. The robots conclude that the site is not relevant enough, and therefore not trustworthy, and will index it less, or not at all. It is essential to sort through your site and remove poor, irrelevant pages in order to highlight those that bring real added value.

12. Monitor your 404s

A 404 page is a bit like a locked door: it creates disappointment and, above all, it wastes the site's crawl budget. So set up a plugin that lets you identify and manage 404s, and be careful not to leave a single one on your site.
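
Alongside a plugin, a small script can also do the job. The sketch below, written with the third-party requests library, checks a hypothetical list of URLs and reports any that return a 404; in practice the list could be extracted from your sitemap:

```python
# Minimal 404 monitor: check a list of known URLs and report the broken ones.
# The URLs are placeholders; a real list could be extracted from your sitemap.
import requests

urls = [
    "https://www.yoursite.com/",
    "https://www.yoursite.com/blog/sample-article",
    "https://www.yoursite.com/old-page-that-may-be-gone",
]

for url in urls:
    status = requests.get(url, timeout=10).status_code
    if status == 404:
        print(f"404 found: {url}")
    else:
        print(f"OK ({status}): {url}")
```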