What is a Search Engine Spider?

The term search engine spider can be used interchangeably with the term search engine crawler. A spider is a program that a search engine uses to seek out information on the World Wide Web and to index what it finds, so that relevant results appear when a user enters a search query for a keyword. The spider “reads” the text on a web page or collection of web pages and records any hyperlinks it finds. It then follows those URLs, spiders the pages they point to, and saves copies of them into the search engine's index for use by visitors.

Search engine spiders are always working, sometimes to index new web pages and sometimes to update ones that change frequently. The goal of a spider is to keep the search engine it belongs to supplied with the most up-to-date material possible. To do this, and to update pages as soon as possible, a good spider needs a way to prioritize its work. Because the World Wide Web is perpetually changing and expanding, this is difficult without an accurate, well-thought-out architecture.

Every search engine structures its spider differently, but many adhere to a basic architecture designed to be as efficient and fast as possible. The spider first selects web pages from the World Wide Web and passes them through a multi-threaded downloading system. The URLs found on those pages, now separated into lists, go to a queue to wait to be prioritized. A scheduler then prioritizes the URLs and feeds them back through the downloading system in smaller batches, ordered by whether they are new pages, pages that need to be updated, or pages that only need to be checked to be certain they have not become spam. In the final steps, each page's text and metadata are put into storage after the page has been scanned.

This architecture also determines the behavior of the search engine spider. There are four distinct styles of behavior of search engine spiders, which include:

Selection – The spider decides which pages need to be downloaded and which already have a version within the index.
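The queue, scheduler, downloader, and storage steps described above can be illustrated with a minimal Python sketch. It is only one possible reading of that architecture, not how any particular search engine builds its spider. Everything named here is an assumption made for illustration: the in-memory FAKE_WEB pages on the made-up example.test domain, the fetch stand-in for a real downloader, the two priority classes, and the crawl function itself. The spam check mentioned above is not modeled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": "no links here",
    "http://example.test/c": '<a href="/">home</a>',
}


class LinkParser(HTMLParser):
    """Records the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Stand-in downloader; a real spider would issue an HTTP request here."""
    return url, FAKE_WEB.get(url)


# Assumed priority classes: a lower number is scheduled first.
NEW_PAGE, NEEDS_UPDATE = 0, 1


def crawl(seeds, index=None, batch_size=2, workers=4):
    index = {} if index is None else index  # storage: URL -> saved page text
    frontier = []                           # the scheduler's priority queue
    queued = set()                          # URLs already scheduled this run

    def schedule(url):
        # Selection: new pages go ahead of pages already held in the index.
        if url not in queued:
            queued.add(url)
            priority = NEEDS_UPDATE if url in index else NEW_PAGE
            heapq.heappush(frontier, (priority, url))

    for url in seeds:
        schedule(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Take a small prioritized batch and download it on several threads.
            batch = [heapq.heappop(frontier)[1]
                     for _ in range(min(batch_size, len(frontier)))]
            for url, html in pool.map(fetch, batch):
                if html is None:
                    continue                # page could not be downloaded
                index[url] = html           # save a copy into storage
                parser = LinkParser()
                parser.feed(html)           # "read" the page, record its links
                for href in parser.links:
                    schedule(urljoin(url, href))
    return index
```

Calling crawl(["http://example.test/"]) walks the four sample pages and returns the index as a dictionary of URL to saved page text. Passing an existing index makes already-stored pages lower priority than newly discovered ones, which is one simple way to express the new-versus-update ordering.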
While by no means a simple process, search engine spidering is what makes it possible for people who use search engines to find the information they need. The search engine spider keeps the index up to date and makes sure it includes the most recently added pages as well.
© 2008 Brick Marketing | 877-295-0620 | 200 Boston Ave. Medford MA 02155 USA

