Search Engine Robots - How They Work, What They Do (Part I)


Automated search engine robots, sometimes called "spiders" or "crawlers", are the seekers of web pages. How do they work? What is it they really do? Why are they important?

You'd think, with all the fuss about indexing web pages to add to search engine databases, that robots would be great and powerful beings. Wrong. Search engine robots have only basic functionality, like that of early browsers, in terms of what they can understand in a web page. Like early browsers, robots just can't do certain things. Robots don't understand frames, Flash movies, images or JavaScript. They can't enter password-protected areas and they can't click all those buttons you have on your website. They can be stopped cold while indexing a dynamically generated URL and slowed to a stop by JavaScript navigation.

How Do Search Engine Robots Work?

Think of search engine robots as automated data retrieval programs, traveling the web to find information and links.

When you submit a web page to a search engine at the "Submit a URL" page, the new URL is added to the robot's queue of websites to visit on its next foray out onto the web. Even if you don't directly submit a page, many robots will find your site because of links from other sites that point back to yours. This is one of the reasons why it is important to build your link popularity and to get links from other topical sites back to yours.

When arriving at your website, the automated robots first check to see if you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these may be directories containing only binaries or other files the robot doesn't need to concern itself with.
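As a rough illustration of that check, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt rules, the `ExampleBot` name and the example.com URLs are assumptions made up for this example; real robots download the file from the root of your site before fetching anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directories are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/binaries/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved robot asks before fetching each URL.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))         # True
print(parser.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/search.cgi"))  # False
```

Note that robots.txt is only a request: it keeps polite robots out of the listed areas, but it does not protect them.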

Robots collect links from each page they visit, and later follow those links through to other pages. The entire World Wide Web is made up of links, the original idea being that you could follow links from one place to another. This is how robots get around.
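The collect-then-follow behavior can be sketched in a few lines of Python. This is a simplified illustration, not any search engine's actual crawler: the `PAGES` dictionary stands in for the web so the example runs without network access, and the `LinkCollector` and `crawl` names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


# Hypothetical in-memory "web": URL -> HTML source.
PAGES = {
    "http://www.example.com/": '<a href="/about.html">About</a> <a href="/products.html">Products</a>',
    "http://www.example.com/about.html": '<a href="/">Home</a>',
    "http://www.example.com/products.html": '<a href="/about.html">About</a> <a href="/contact.html">Contact</a>',
    "http://www.example.com/contact.html": "No links here.",
}


def crawl(start_url, pages):
    """Visit pages in the order their links were discovered, each only once."""
    queue = deque([start_url])  # URLs waiting to be visited
    seen = {start_url}          # URLs already queued, to avoid loops
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:        # page not reachable in this toy web
            continue
        visited.append(url)
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


for url in crawl("http://www.example.com/", PAGES):
    print(url)
```

Starting from the home page alone, the sketch reaches the contact page, which is linked only from the products page. A page that no other page links to would never be found this way.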

The "smarts" about indexing pages online comes from the search engine engineers, who devise the methods used to evaluate the information the search engine robots retrieve. When introduced into the search engine database, the information is available for searchers querying the search engine. When a search engine user enters their query into the search engine, there are a number of quick calculations done to make sure that the search engine presents just the right set of results to give their visitor the most relevant response to their query.

You can see which pages on your site the search engine robots have visited by looking at your server logs or the results from your log statistics program. Identifying the robots will show you when they visited your website, which pages they visited and how often they visit. Some robots are readily identifiable by their user agent names, like Google's "Googlebot"; others are a bit more obscure, like Inktomi's "Slurp". Still other robots may be listed in your logs that you cannot readily identify; some of them may even appear to be human-powered browsers.
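If you prefer to check the raw logs yourself, a short script can do the counting. The sketch below assumes your server writes the common "combined" log format; the sample log lines, IP addresses and simplified user agent strings are invented for illustration, and the `KNOWN_ROBOTS` table only covers the two robots named above.

```python
import re
from collections import Counter

# Pattern for one line of the "combined" access log format (an assumption
# about your server's configuration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substring to look for in the user agent -> robot name to report.
KNOWN_ROBOTS = {"googlebot": "Googlebot", "slurp": "Slurp"}

# Hypothetical log lines, made up for this example.
LOG_LINES = [
    '192.0.2.10 - - [12/Mar/2005:06:25:14 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '192.0.2.10 - - [12/Mar/2005:06:25:20 -0800] "GET /about.html HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
    '192.0.2.20 - - [12/Mar/2005:07:02:41 -0800] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Slurp)"',
    '192.0.2.30 - - [12/Mar/2005:08:15:03 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [13/Mar/2005:06:30:55 -0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]


def identify_robot(user_agent):
    """Return the robot's name, or None if the user agent is not recognized."""
    lowered = user_agent.lower()
    for marker, name in KNOWN_ROBOTS.items():
        if marker in lowered:
            return name
    return None


def robot_visits(lines):
    """Count visits per (robot, page) pair, skipping lines that do not parse."""
    visits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        robot = identify_robot(match.group("agent"))
        if robot:
            visits[(robot, match.group("path"))] += 1
    return visits


for (robot, path), count in sorted(robot_visits(LOG_LINES).items()):
    print(robot, path, count)
```

Matching on the user agent alone is not proof of identity, since any program can claim to be a well-known robot. That is why lists of robot IP addresses are also useful.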

Along with identifying individual robots and counting the number of their visits, the statistics can also show you aggressive bandwidth-grabbing robots or robots you may not want visiting your website. In the resources section at the end of this article, you will find sites that list names and IP addresses of search engine robots to help you identify them.

How Do They Read The Pages On Your Website?

When the search engine robot visits your page, it looks at the visible text on the page, the content of the various tags in your page's source code (title tag, meta tags, etc.), and the hyperlinks on your page. From the words and the links that the robot finds, the search engine decides what your page is about. Many factors are used to figure out what "matters", and each search engine has its own algorithm to evaluate and process the information. Depending on how the robot is set up through the search engine, the information is indexed and then delivered to the search engine's database.
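To make the robot's view of a page concrete, here is a minimal sketch that pulls out the same pieces: the title tag, named meta tags, hyperlinks and visible text. The `PageReader` class and the sample page are hypothetical, and the sketch simply skips script and style content, which is one plausible reading of "robots don't understand JavaScript" rather than a description of any particular robot.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Gathers the title, named meta tags, links and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page source, made up for this example.
SAMPLE_PAGE = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made blue widgets.">
<script>var menu = "hidden from robots";</script>
</head><body><h1>Our Widgets</h1>
<p>We sell <a href="/catalog.html">blue widgets</a>.</p></body></html>"""

reader = PageReader()
reader.feed(SAMPLE_PAGE)
reader.close()

print("Title:", reader.title)
print("Meta:", reader.meta)
print("Links:", reader.links)
print("Text:", " ".join(reader.text_parts))
```

Anything your navigation produces only through JavaScript never shows up in the links list, which is why plain HTML links matter for getting the rest of your site visited.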

The information delivered to the databases then becomes part of the search engine and directory ranking process. When the search engine visitor submits their query, the search engine digs through its database to give the final listing that is displayed on the results page.
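Search engines do not publish how their databases and ranking calculations work, so the following is only a toy sketch of the general idea: an inverted index that maps each word to the pages containing it, and a lookup that returns pages containing every query word. The page texts and the `build_index` and `search` names are assumptions for illustration, and no ranking is attempted.

```python
from collections import defaultdict

# Hypothetical page texts, as a robot might have delivered them.
PAGE_TEXT = {
    "/widgets.html": "Blue widgets for sale.",
    "/gadgets.html": "Red gadgets and blue gadgets.",
    "/about.html": "About our widgets shop.",
}


def build_index(pages):
    """Map each lowercased word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word.strip('.,!?"')].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(PAGE_TEXT)
print(search(index, "blue widgets"))
print(search(index, "blue"))
```

Because the index is built ahead of time from what the robots delivered, a query only has to look words up, not reread every page. That is also why changes to your pages are invisible to searchers until a robot has revisited them.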

The search engine databases update at varying times. Once you are in the search engine databases, the robots keep visiting you periodically, to pick up any changes to your pages, and to make sure they have the latest info. The number of times you are visited depends on how the search engine sets up its visits, which can vary per search engine.

Sometimes visiting robots are unable to access the website they are visiting. If your site is down, or you are experiencing huge amounts of traffic, the robot may not be able to access your site. When this happens, the website may not be re-indexed, depending on the frequency of the robot visits to your website. In most cases, robots that cannot access your pages will try again later, hoping that your site will be accessible then.
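How soon a robot comes back is up to each search engine, but the general pattern of "revisit periodically, retry sooner after a failure" can be sketched as follows. The weekly revisit interval, the one-hour first retry and the doubling delay are all assumptions chosen for illustration, not values used by any real robot.

```python
def next_visit_time(now, fetch_succeeded, consecutive_failures,
                    revisit_interval=7 * 24 * 3600, retry_delay=3600):
    """Return the time (in seconds) at which a robot might next visit a site.

    After a successful fetch, the site is revisited after the normal interval.
    After a failure, the robot retries sooner, doubling the wait with each
    consecutive failure but never waiting longer than the normal interval.
    All default values are hypothetical.
    """
    if fetch_succeeded:
        return now + revisit_interval
    delay = retry_delay * 2 ** max(consecutive_failures - 1, 0)
    return now + min(delay, revisit_interval)


# Site was up: come back in a week.
print(next_visit_time(0, True, 0))
# Site was down once: try again in an hour.
print(next_visit_time(0, False, 1))
# Site was down three times in a row: wait four hours before the next try.
print(next_visit_time(0, False, 3))
```

Under a scheme like this, a short outage costs you little, while a site that is down for days keeps missing its chances to be re-indexed.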

Resources

* SpiderSpotting - Search Engine Watch. http://searchenginewatch.com/webmasters/spiders.html

* Robotstxt.org - List of robots and protocols for setting up a robots.txt file. http://www.robotstxt.org/

* Spider-Food - Tutorials, forums and articles about Search Engine spiders and Search Engine Marketing. http://spider-food.net/

* Spiderhunter.com - Articles and resources about tracking Search Engine spiders. http://www.spiderhunter.com/

* Sim Spider Search Engine Robot Simulator - Search Engine World has a spider that simulates what the Search Engine robots read from your website. http://www.searchengineworld.com/cgi-bin/sim_spider.cgi

Daria Goetsch is the founder and Search Engine Marketing Consultant for Search Innovation Marketing, a Search Engine Optimization company serving small businesses. She has specialized in Search Engine Promotion since 1998, including three years as the Search Engine Specialist for O'Reilly Media, Inc., a technical book publishing company.

Copyright © 2002-2005 Search Innovation Marketing. http://www.searchinnovation.com All Rights Reserved.

Permission to reprint this article is granted if the article is reproduced in its entirety, without editing, including the bio information. Please include a hyperlink to http://www.searchinnovation.com when using this article in newsletters or online.
