
Search Engines

Introduction

The World Wide Web is a service found on the Internet: a hypertext document system that uses hyperlinks to interlink documents. A hyperlink has a URL embedded within it, and a URL is a unique address that points to the location (server/computer) where a hypertext document is stored. The World Wide Web is based upon a client-server model: a client (browser) retrieves documents stored on servers (computers) connected to the Internet. The client and the servers use the Hypertext Transfer Protocol (HTTP) to communicate and transfer data between one another.
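To illustrate that client-server exchange, the following Python sketch plays the role of the client and retrieves a hypertext document over HTTP; the URL is an example address used purely for illustration.

    # A minimal sketch of the client-server exchange: the client (here a script
    # rather than a browser) requests a document from a server over HTTP.
    # The URL below is an example address, not a specific real resource.
    from urllib.request import urlopen

    url = "http://example.com/index.html"  # a URL: protocol, server, document path

    with urlopen(url) as response:          # the client sends an HTTP GET request
        status = response.status            # the server replies with a status code
        html = response.read().decode("utf-8", errors="replace")  # and the document

    print(status)          # e.g. 200 (OK)
    print(html[:200])      # first few characters of the hypertext document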

The problem that the client (user) has is finding documents relevant to the information they are looking for. When the World Wide Web was launched in 1991, there were not many servers and therefore not many websites. When Internet access expanded in the mid-1990s, the World Wide Web was probably the most popular service found on the Internet, alongside email. The problem users had was navigating around the web and finding what they wanted, due to the ever growing number of new websites. Directories had been the original way to locate web documents: a directory is a website that contains a list of hyperlinks, usually organised around a specific subject. The problem the major directories had was that they could not keep pace with the number of new websites being created, and their listings suffered from linkrot.

Search engines were created as a remedy to this problem. Search engines used an automated tool - called a crawler or a robot - that automatically followed hyperlinks and collected information about the documents it visited. The search engines then stored all the information they collected in a database. Users could then visit a search engine's website and query the database to find hyperlinks that matched their search term. A search engine performs three functions: 1) it crawls the World Wide Web for content, 2) it indexes this content into a database, 3) it provides a search facility at a website which allows clients to query the database. Further reading: Basics of using a search engine and how search engines differ from directories.
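The sketch below illustrates the second and third of those functions in miniature: indexing content into a database (here a simple inverted index) and answering a query against it. The example pages and URLs are assumed stand-ins for content a crawler would already have fetched.

    # A minimal sketch of the index/query side of a search engine: documents are
    # indexed into an inverted index (word -> set of URLs), and a query returns
    # the URLs whose documents contain every query term.
    from collections import defaultdict

    pages = {  # assumed example content; a real engine obtains this by crawling
        "http://example.com/a": "search engines crawl and index the web",
        "http://example.com/b": "directories list hyperlinks by subject",
    }

    index = defaultdict(set)
    for url, text in pages.items():          # 2) index the content
        for word in text.lower().split():
            index[word].add(url)

    def search(query):                       # 3) query the database
        terms = query.lower().split()
        results = set(index.get(terms[0], set()))
        for term in terms[1:]:
            results &= index.get(term, set())
        return sorted(results)

    print(search("index the web"))           # -> ['http://example.com/a']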

History

When the World Wide Web was first developed, a list of new web servers was edited by Tim Berners-Lee. As time went by, the number of new servers grew exponentially and it became impossible to maintain the list by hand. It was at this point that it became apparent an automated system was needed to discover and catalogue new servers and websites.

In the early 1990s, Gopher was a popular protocol for accessing documents on the Internet, and was a direct competitor to the World Wide Web. Veronica and Jughead were search engines/indexers developed for Gopher, and they predate web search engines. One of the first programs used to 'crawl' the World Wide Web was the World Wide Web Wanderer. However, the Wanderer was designed to survey the size of the web and was not, strictly speaking, a search engine. The first web search engines were launched in 1993, and included: ALIWEB, JumpStation and W3Catalog.

The early web crawlers and search engines were basic in their scope; WebCrawler was the first crawler able to crawl and index every word of a web document. In 1994, WebCrawler was the pinnacle of search engine technology, but it was soon rivalled by the robots developed by Lycos and Yahoo!; these two companies continue to exert a large presence on the World Wide Web. Other search engines, whose prominence was short-lived, included Magellan, Infoseek, Northern Light, and AltaVista. Most of these search engines initially crawled and indexed content for free; however, as time passed and they could not find a suitable revenue model, many switched to paid inclusion.

One problem that hampered early search engines was how to make money. The early crawlers (also referred to as spiders) indexed websites for free, so there was no income from that. Some engines flirted with 'paid only' inclusion, or 'paid listings' that promised a higher ranking, but this negatively affected the user's experience, with commercial rather than educational resources dominating the search results. The 'paid' engines thus soon found their popularity in jeopardy.

The answer lay in a new search engine named Google: Google's 'organic' search results were free for inclusion, satisfying both users and web developers. However, Google also displayed paid listings alongside its 'organic' results, enabling the company to generate revenue from its web service. Google also provided some additional innovations, notably PageRank: a link-analysis algorithm that helped it fight web spam, the bane of early engines.
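A much simplified sketch of the idea behind PageRank is shown below: a page's score is derived from the scores of the pages that link to it, computed by repeated iteration. The tiny link graph is invented for illustration, and the sketch is not Google's production algorithm.

    # A simplified sketch of the idea behind PageRank: a page's score depends on
    # the scores of the pages linking to it. The link graph below is invented.
    links = {
        "a": ["b", "c"],   # page "a" links out to "b" and "c"
        "b": ["c"],
        "c": ["a"],
    }

    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    for _ in range(50):                       # power iteration until scores settle
        new_rank = {}
        for page in links:
            incoming = sum(rank[p] / len(links[p]) for p in links if page in links[p])
            new_rank[page] = (1 - damping) / len(links) + damping * incoming
        rank = new_rank

    print({page: round(score, 3) for page, score in rank.items()})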

Google was one of the first search engines to focus primarily on search results, with a minimalist and fast loading homepage: basically a search box and a logo. When Google was launched, the majority of worldwide users were on dialup connections; a fast loading homepage, in comparison to the cluttered offerings from competitors, therefore gave Google an 'edge' in performance and a unique selling point.

Google has continued, to the present day (2015), to be the most popular search engine on the World Wide Web, simply because, for most users, it provides the most relevant results. This is due to a number of factors, amongst them: the largest database of webpages, and a superior algorithm (the mathematical system used to determine which web pages are displayed, and in what order, in search results).

Although Google continues to dominate the online search business, it is not without competitors, most notably Yahoo! and MSN. Since 2004, Yahoo! and MSN have launched new search engines. Rather than invent a new paradigm, Yahoo! and MSN have attempted to beat Google at its own game: with an 'organic' search index that is free, and paid listings placed in the same positions as Google's. In 2009, MSN renamed its search engine Bing.

Search engines have continued (1994-2015) to be the primary tool used to find content on the World Wide Web. Search engines now maintain huge online databases of web content - with billions of documents stored - that allow users to search for queries on every subject imaginable.

Search Engines: Timeline

The following list includes search engines that provide/provided traditional search results, 'pay per click' search results and 'meta' search results.

Timeline years covered: 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2004.

Search Engines: Total Search Queries

A range of companies analyse and measure digital data; while the figures they collate may differ, they show a similar trend when it comes to the number of worldwide search queries. comScore is considered a global leader in measuring search engines and the digital world, and has released data on worldwide search queries covering 2007-2012.

As comScore's data highlights, from 2007-2012 the number of monthly search queries grew substantially. While Google's total search queries have grown year on year, Yahoo!'s appear to have stagnated. The other notable trend is the success of Baidu and Yandex: Chinese and Russian language search engines that dominate their domestic markets. While eBay, the Time Warner Network, Alibaba.com Corporation and the Ask.com Network make it into the top ten worldwide search properties, the dominant search engines are: Google, Baidu, Yahoo!, Yandex and Bing.

Search Engines: Robots, Spiders and Crawlers

For search engines to fill their search results with meaningful and up-to-date information, they need to use automated software programs that browse the World Wide Web and collect data. These programs are classified as robots, and are referred to as: Internet bots, web bots, Internet robots, web robots, www bots, www robots, or, simply, bots.

The primary use of robots is web spidering, also referred to as web crawling. Therefore, an Internet robot that spiders/crawls the World Wide Web is referred to as either:

  1. Web Crawler
  2. Web Spider

Web crawlers and web spiders perform the same function: a web crawler is used by a search engine to find and collect information on the World Wide Web. The earliest web crawlers were created for academic purposes: to analyse how large the early World Wide Web was. Web crawlers begin with a core list of seed URLs; whatever hyperlinks they find at these URLs, they follow, and in time they should be able to collect information about the majority of popular websites. The problem web crawlers have is finding data in the 'deep web': content whose URLs are not linked to from other web content.
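The sketch below shows that crawl loop in miniature: start from a seed URL, fetch the page, extract its hyperlinks, and queue any addresses not seen before. The seed URL and page limit are assumed values for illustration; a real crawler would also add politeness delays, error handling and robots.txt checks.

    # A minimal sketch of a crawl loop: follow hyperlinks outward from seed URLs.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":                        # hyperlinks live in <a href="..."> tags
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=10):
        frontier, seen = deque([seed]), {seed}
        while frontier and len(seen) <= max_pages:
            url = frontier.popleft()
            try:
                html = urlopen(url).read().decode("utf-8", errors="replace")
            except OSError:
                continue                           # skip unreachable pages
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                absolute = urljoin(url, href)      # resolve relative hyperlinks
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
            print("crawled:", url)

    crawl("http://example.com/")                   # assumed seed URL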

Some of the most famous web crawlers include: Googlebot, Bingbot, WebCrawler, ExaBot, Yahoo! Slurp, AskJeeves, Baidu Spider, Yandex Bot, Scooter, Mercator, Facebook External Hit, Atomz, ArchitectSpider, and Lycos_Spider_T-Rex. The robots exclusion standard, initiated by Martijn Koster, led to the creation of robots.txt files, which enable websites to limit the access web crawlers have to their content.
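As a sketch of how a well-behaved crawler honours the robots exclusion standard, the snippet below downloads a site's robots.txt file and checks whether a given path may be fetched; the site address and the 'ExampleBot' user-agent name are illustrative assumptions.

    # Check robots.txt before crawling a URL, using Python's standard library.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser("http://example.com/robots.txt")
    robots.read()                                   # download and parse robots.txt

    if robots.can_fetch("ExampleBot", "http://example.com/private/page.html"):
        print("allowed to crawl this URL")
    else:
        print("robots.txt disallows this URL")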

Search Engines: Revenue Models

The majority of search engines offer free submission, but they do not guarantee to include a website within their search results. Search engines sometimes employ a sandbox for new websites submitted to their 'organic' search results. There are search engines, like Overture, that have charged a fee for an express, or 'superior', listing. Generally speaking, however, search engines use discreet 'pay per click' advertising to generate revenue. Pay per click adverts are displayed on search results pages, and the search engine charges a fee for every click on an advert. Conversion rate and ROI (from advert impressions) play a critical role in how publishers and advertisers rate a search engine's advert network.
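The arithmetic behind that judgement can be sketched as follows; every figure in the example (impressions, click rate, cost per click, conversion rate, revenue per sale) is hypothetical.

    # A worked sketch of the pay-per-click arithmetic an advertiser might use to
    # judge a search engine's advert network. All of the figures are hypothetical.
    impressions = 100_000        # times the advert is shown on results pages
    click_rate = 0.02            # 2% of impressions result in a click
    cost_per_click = 0.50        # fee the search engine charges per click
    conversion_rate = 0.05       # 5% of clicks become a sale
    revenue_per_sale = 40.00     # value of each sale to the advertiser

    clicks = impressions * click_rate
    cost = clicks * cost_per_click
    sales = clicks * conversion_rate
    revenue = sales * revenue_per_sale
    roi = (revenue - cost) / cost

    print(f"clicks: {clicks:.0f}, cost: ${cost:.2f}, revenue: ${revenue:.2f}, ROI: {roi:.0%}")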

Search Engines: Submitting a Website

Submitting a website to a search engine is extremely simple: all a user needs to do is follow the 'submit a site' (manual submission) link located on a search engine's homepage. The owner of the website then fills in the relevant information about the website, and the search engine will crawl it at a later date. The amount of time it takes to get a website listed in a search engine varies greatly.

The quickest way to get a website listed in a search engine is to get a link (typically a deep link) from a website that is already listed in that search engine. The prominence of a website in search results depends upon external websites linking to it. Once a website is 'live' and receiving traffic, there are a number of companies who record and analyse website traffic, categorising it as: hits, page views, referrers and unique visitors.
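A minimal sketch of how those categories could be derived from raw traffic records is shown below; the log entries, IP addresses and referring pages are invented for illustration.

    # Categorise raw traffic records into hits, page views, unique visitors and
    # referrers. The log entries below are invented.
    from collections import Counter

    log = [  # (visitor IP, path requested, referring page)
        ("1.2.3.4", "/index.html", "http://example-search.com/results"),
        ("1.2.3.4", "/logo.png",   "http://example.com/index.html"),
        ("5.6.7.8", "/index.html", "http://example-directory.com/links"),
    ]

    hits = len(log)                                               # every request
    page_views = sum(1 for _, path, _ in log if path.endswith(".html"))
    unique_visitors = len({ip for ip, _, _ in log})
    referrers = Counter(ref for _, _, ref in log)

    print(hits, page_views, unique_visitors)
    print(referrers.most_common(2))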

Website traffic analysis can be used to optimise a website to rank higher in a search engine. Typically, a webpage's ranking in a search engine is based upon its link popularity; because of this, search engines ban websites that buy or sell links. A reciprocal link policy is likewise frowned upon if used extensively; the selling, buying and exchanging of links is viewed as 'gaming' a search engine, as it is not an 'organic' practice.

If a website has been banned by a search engine, it is possible to submit a re-submission request if you feel the website was wrongly penalised, or if the offending content has been changed. Websites can be penalised by search engines for a range of practices, the most obvious of which is cloaking.