An Introduction To Crawling Your E-Commerce Website

For an e-commerce website, earning a better ranking is a matter of survival. A good ranking brings better traffic, so site administrators and management must ensure that web pages are crawled properly and seamlessly. Search marketing professionals constantly monitor how quickly and how thoroughly a site is crawled, because crawling is the first step towards rankings and, ultimately, sales.

Understanding How Your Website Gets Crawled

Before worrying too much about search engine spidering, be confident about the quality of your web pages. Search engines have been known to crawl even non-existent pages, but crawling achieves nothing if the pages fail to show up in rankings.

Typically, the crawling of pages is guided by an XML sitemap. However, because competition is so heavy, an XML sitemap alone cannot guarantee a better ranking for your web pages. You still have to answer the fundamental question of why search engines should crawl your e-commerce site and rank its pages.
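As a starting point, it helps to know exactly which URLs your sitemap advertises. The following is a minimal sketch, assuming a standard sitemaps.org `urlset` file; the shop URLs and the `sitemap_urls` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content for an example shop.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example.com/</loc></url>
  <url><loc>https://shop.example.com/products/blue-widget</loc></url>
</urlset>"""

for url in sitemap_urls(SAMPLE_SITEMAP):
    print(url)
```

The resulting list is the set of pages you are asking search engines to crawl, which makes it a useful baseline for the checks that follow.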

Many other factors also influence crawling. One central element is the SEO performance of the website, which is reflected in how deeply the site gets crawled: the share of pages crawled indicates how well the site has been optimised. Now let us look at the practical steps that help your site get crawled.

For faster crawling, having the right pages matters most. Assess whether the products you want crawled actually exist on your site. At this point, running a crawler of your own, such as Screaming Frog’s SEO Spider, will show whether you are getting the desired crawling results. With its array of features, SEO Spider stands out as a strong choice; free tools such as Xenu Link Sleuth and GSite Crawler are alternatives.

To check crawl progress, scan the crawler’s output file for missing pages. If you find a crawl blockage, start by checking your filter pages, such as those for colour, style, and size.
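Tracing missing pages comes down to comparing the URLs you expected with the URLs the crawler reached. Here is a rough sketch, assuming both lists are already available as plain URL strings (for example, from a sitemap and from a crawler export); the `normalise` rules are a simplifying assumption, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Lower-case scheme and host and drop a trailing slash so trivial
    variants of the same URL compare as equal (a simplifying assumption)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def missing_pages(expected, crawled):
    """Return expected URLs that never appear in the crawl output."""
    crawled_set = {normalise(u) for u in crawled}
    expected_set = {normalise(u) for u in expected}
    return sorted(expected_set - crawled_set)


# Hypothetical data: what the sitemap lists versus what the crawler found.
expected = [
    "https://shop.example.com/products/a/",
    "https://shop.example.com/products/b",
    "https://Shop.example.com/products/c",
]
crawled = [
    "https://shop.example.com/products/a",
    "https://shop.example.com/products/c",
]

print(missing_pages(expected, crawled))
```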

Now look for a pattern in the missing pages. Do the missing URLs share a particular combination of letters? Is a robots.txt rule disallowing more than was intended? Is a whole section of the site being missed? Check for a Disallow rule in robots.txt or a meta robots noindex directive.
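Both checks can be scripted. The sketch below uses Python's standard `urllib.robotparser` and `html.parser` modules; the robots.txt content, URLs, and helper names are hypothetical. Note that the standard-library parser does simple prefix matching and does not understand wildcard patterns, so treat its verdict as an approximation of how a given search engine reads the file.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The second rule was meant to block one product,
# but as a prefix match it blocks every path starting with /products/b.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /products/b
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if tag == "meta" and name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


urls = [
    "https://shop.example.com/products/blue-widget",
    "https://shop.example.com/products/red-widget",
    "https://shop.example.com/checkout/basket",
]
print(blocked_urls(ROBOTS_TXT, urls))
print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
```

In this example the blue widget page is blocked even though only the checkout was supposed to be, which is exactly the kind of over-broad rule worth looking for.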

Identifying Site Errors

Find 404 Errors: Most e-commerce sites throw up 404 errors for discontinued products. Error pages should not be reachable through the site’s navigation. In other words, when a product is discontinued, there is no need to keep linking to it.
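To see which pages still link to dead products, you can group a crawl export by broken target. This is an illustrative sketch only: the `(source, target, status)` row format is an assumption, and real crawler exports will need mapping into it.

```python
from collections import defaultdict


def links_to_dead_pages(link_rows):
    """Group internal links by the 404 page they point at.

    link_rows: iterable of (source_url, target_url, status_code) tuples,
    a hypothetical format for a crawler's link report.
    """
    dead = defaultdict(list)
    for source, target, status in link_rows:
        if status == 404:
            dead[target].append(source)
    return {target: sorted(sources) for target, sources in dead.items()}


# Hypothetical crawl rows.
rows = [
    ("/category/widgets", "/products/old-widget", 404),
    ("/", "/products/old-widget", 404),
    ("/", "/products/blue-widget", 200),
]

for target, sources in links_to_dead_pages(rows).items():
    print(target, "is still linked from", sources)
```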

Identify Your Redirects: In addition to 404 errors, crawlers also come across redirects. A figure often attributed to Google is that every 301 redirect carries a risk of “leaks”, with around 15 per cent of the authority lost in the transfer to the receiving page. Whatever the exact figure, it is advisable to limit the number of redirects.
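One way to limit redirects is to find chains, where one redirect leads to another, and point the first URL straight at the final destination. A minimal sketch, assuming the redirects have already been collected into a simple source-to-target mapping (the paths are invented):

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow a URL through a {source: target} redirect map.

    Returns the list of URLs visited. Stops at the first URL that does not
    redirect, when a loop is detected, or after max_hops redirects.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:  # redirect loop
            break
        seen.add(target)
    return chain


def multi_hop_redirects(redirects):
    """Return chains that take more than one hop to resolve."""
    chains = (redirect_chain(source, redirects) for source in redirects)
    return [chain for chain in chains if len(chain) > 2]


# Hypothetical redirect map gathered from a crawl.
redirects = {
    "/old-widget": "/older-widget",
    "/older-widget": "/products/blue-widget",
}

print(multi_hop_redirects(redirects))
```

Here `/old-widget` takes two hops to resolve, so it could be updated to redirect directly to `/products/blue-widget`.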

Meta Data: Identify poorly written title tags, meta descriptions, and meta keywords; this is where a crawler is at its best. If weak or missing meta tags go unnoticed, the cost to the site owner can be high.
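A basic version of this audit can be scripted with the standard library. The sketch below flags missing or overlong titles and meta descriptions; the 60 and 160 character limits are common rules of thumb used here as assumptions, not official thresholds, and the helper names are invented.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collects the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_head(html, max_title=60, max_description=160):
    """Return a list of issues found in the page's title and description."""
    parser = HeadAudit()
    parser.feed(html)
    title = parser.title.strip()
    issues = []
    if not title:
        issues.append("missing title")
    elif len(title) > max_title:
        issues.append("title too long")
    if not parser.description:
        issues.append("missing meta description")
    elif len(parser.description) > max_description:
        issues.append("meta description too long")
    return issues


# Hypothetical product page with a title but no description.
page = "<html><head><title>Blue Widget</title></head><body></body></html>"
print(audit_head(page))
```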

Analyse Canonical Tags: Canonical tags are another problem area where blunders proliferate. Many sites have canonical tags placed on almost every page, regardless of whether the page duplicates another. This defeats the very purpose of canonical tags, which are meant to tell search engines which version of duplicated content is the preferred one. Review your canonical tags and correct them where necessary. Newer crawling tools can also pull in analytics data, which helps the auditing process.
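To review canonical tags at scale, it helps to classify each page by what its canonical tag says. The following is a rough sketch; the status labels, the trailing-slash comparison, and the example URLs are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonicals.append(attributes.get("href") or "")


def canonical_status(page_url, html):
    """Classify a page as 'missing', 'conflicting', 'self' or 'points elsewhere'."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    if len(set(parser.canonicals)) > 1:
        return "conflicting"
    # Resolve relative hrefs against the page's own URL.
    target = urljoin(page_url, parser.canonicals[0])
    if target.rstrip("/") == page_url.rstrip("/"):
        return "self"
    return "points elsewhere"


# Hypothetical filtered product URL that canonicalises to the main product page.
filtered_url = "https://shop.example.com/products/blue-widget?colour=navy"
html = '<head><link rel="canonical" href="https://shop.example.com/products/blue-widget"></head>'
print(canonical_status(filtered_url, html))
```

A filter page that "points elsewhere" is usually what you want; a duplicate page reported as "self" or "missing" is a candidate for review.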

Google’s crawlers, or Googlebots, work on the understanding that good content answers a searcher’s query and will ultimately rank better. When your content matches the phrase a searcher uses, Google can put it in front of that searcher, who may eventually become a customer.

Site Errors: Before concluding, let us be clear that periodic audits are essential for every e-commerce site. For web bots to crawl smoothly, the site’s pages must be live and working.

There are many common site errors to work on, including 404 (file not found). The most reliable way to find and address 404 errors is with Google Webmaster Tools (now called Google Search Console).

Discontinued Products: There are two options for products that have been discontinued. Either leave the 404 error page in place or replace it with a 301 redirect, which is more search engine friendly.
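The choice between the two options can be recorded per product. This is only a sketch under one assumption: a discontinued product gets a 301 when you have somewhere sensible to send visitors, and keeps its 404 otherwise. The `replacement_url` field and the product data are hypothetical.

```python
def plan_discontinued(products):
    """Decide, per discontinued product URL, between a 301 and a 404.

    products: iterable of dicts with a 'url' key and an optional
    'replacement_url' key (a hypothetical data shape).
    Returns {url: (status_code, redirect_target_or_None)}.
    """
    plan = {}
    for product in products:
        replacement = product.get("replacement_url")
        if replacement:
            plan[product["url"]] = (301, replacement)
        else:
            plan[product["url"]] = (404, None)
    return plan


# Hypothetical discontinued products.
discontinued = [
    {"url": "/products/old-widget", "replacement_url": "/products/blue-widget"},
    {"url": "/products/one-off-item"},
]

for url, (status, target) in plan_discontinued(discontinued).items():
    print(url, status, target)
```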

David Irvine
