Thursday, May 29, 2014

How search engines index content

How Search Engines Index

It's important to understand how search engines discover new content on the web, as well as how they interpret the locations of these pages. One way that search engines identify new content is by following links. Much like you and I will click through links to go from one page to the next, search engines do the exact same thing to find and index content, only they click on every link they can find. If you want to make sure that search engines pick up your new content, an easy thing you can do is just make sure you have links pointing to it. Another way for search engines to discover content is from an XML sitemap.
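To make the link-following idea concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented; the tiny site in `PAGES`, the `LinkCollector` class and the `discover` function are all made up for illustration, and the pages live in memory so nothing is fetched from the web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site, kept in memory so the sketch needs no network access.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog/">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/": '<a href="new-post">New post</a>',
    "https://example.com/blog/new-post": "<p>Fresh content</p>",
    "https://example.com/orphan": "<p>No page links here</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover(start_url, pages):
    """Follow links breadth-first and return URLs in the order they are found."""
    seen = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        collector = LinkCollector()
        collector.feed(pages.get(url, ""))
        for href in collector.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute in pages and absolute not in seen:
                seen.append(absolute)
                queue.append(absolute)
    return seen


found = discover("https://example.com/", PAGES)
for url in found:
    print(url)
```

Notice that the `/orphan` page never shows up in `found`, because no other page links to it. That is the practical reason to make sure new content has at least one link pointing to it.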

Sitemaps | Site Maps

An XML site map is really just a listing of your pages' URLs in a special format that search engines can easily read through. The format has its own specific syntax, which you or your webmaster can look up before creating your XML site maps. Once you've generated your site maps, you can submit them directly to the search engines, and this gives you one more way to let them know when you add or change things on your site. Search engines will still try to crawl your links for as much additional content as they can.
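As a rough illustration, the sketch below builds a small XML site map with Python's standard library. The `build_sitemap` function and the example.com URLs are assumptions made for this example; the `urlset`, `url`, `loc` and `lastmod` element names and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_or_None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # lastmod is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace with the real URLs of your own site.
entries = [
    ("https://example.com/", "2014-05-29"),
    ("https://example.com/blog/new-post", None),
]
sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

The resulting text would typically be saved as a file such as sitemap.xml on your site and then submitted to the search engines.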

And while this is generally a good thing, there are plenty of times when you might have pages up that you don't want search engines to find. Think of test pages, or members-only areas of your site that you don't want showing up on the search engine results pages. To control how search engines crawl through your website, you can set rules in what's called a robots.txt file. This is a plain text file that you or your webmaster can create in the root folder of your site, and when well-behaved search engine crawlers see it, they'll read it and follow the rules that you've set.
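Here is a small sketch of what such a file might contain, checked with Python's built-in `urllib.robotparser` module. The paths and the crawler names `ExampleBot` and `OtherBot` are hypothetical; only the `User-agent` and `Disallow` directives are standard robots.txt syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the test and
# members-only areas, and keep one made-up crawler out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /members/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given crawler may fetch a given URL under these rules.
blog_for_other = parser.can_fetch("OtherBot", "https://example.com/blog/new-post")
members_for_other = parser.can_fetch("OtherBot", "https://example.com/members/profile")
blog_for_example = parser.can_fetch("ExampleBot", "https://example.com/blog/new-post")

print(blog_for_other, members_for_other, blog_for_example)
```

Keep in mind that these rules are requests that crawlers choose to honor. They are not a lock, so anything truly private still needs a login in front of it.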

You can set rules that are specific to different search engine crawlers, and you can specify which areas of your website they can and can't see. This can get a bit technical, so it's worth reading up on how robots.txt rules are written before you create your own.

Once search engines discover your content, they'll index it by URL. URLs are basically the locations of web pages on the Internet. It's important that each page on your site has a single, unique URL, so that search engines can differentiate that page from all the others.
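One way to check the "single, unique URL" rule is to look for several spellings of an address that all lead to the same page. The sketch below does this with Python's `urllib.parse`. The `normalize` and `find_duplicates` helpers and the normalization rules (lowercase host, no fragment, empty path treated as `/`) are simplifying assumptions for illustration, not the rules any particular search engine applies.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Reduce a URL to one canonical spelling (illustrative rules only)."""
    parts = urlsplit(url)
    path = parts.path or "/"  # treat an empty path as the site root
    # Lowercase scheme and host, keep the query, drop the #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def find_duplicates(urls):
    """Group URLs by canonical form and keep groups with several variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical list of addresses found on a site.
urls = [
    "https://example.com/products",
    "https://EXAMPLE.com/products#reviews",
    "HTTPS://example.com/products",
    "https://example.com/contact",
]
duplicates = find_duplicates(urls)
for canonical, variants in duplicates.items():
    print(canonical, "<-", variants)
```

If a page shows up here with more than one variant, it is worth picking one address and linking to it consistently.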

The structure of each URL can also help search engines understand the structure of your entire website. There are lots of ways that search engines can find your pages, and while you can't control how the crawlers actually do their job, you can help them along. By creating links and unique, structured URLs for them to follow, site maps for them to read, and robots.txt files to guide them, you'll be doing everything you can to get your pages into the index as fast as possible.

Steve Steinberger