The World Wide Web is a service found on the Internet. The World
Wide Web is a hypertext document system that uses hyperlinks (a
hypertext element) to interlink documents. Hyperlinks have URLs
embedded within them, and a URL is a unique address that points to
the location (server/computer) that contains the hypertext
documents. The World Wide Web is based upon a client-server model:
a client (browser) retrieves documents stored on servers (computers)
connected to the Internet. The client (browser) and servers use
the Hypertext Transfer Protocol (HTTP) to communicate and transfer
data between one another.
The problem that the client has is finding documents that are
relevant to the information they are looking for. When the World
Wide Web was launched in 1991, there were not many servers and therefore
not many websites. When Internet access expanded in the mid-1990s,
the World Wide Web was probably the most popular service found on
the Internet, alongside email. The problem that clients (users)
had was navigating around the web and finding what they wanted,
due to the ever-growing number of new websites. Directories had
been the original way to locate web documents: a directory is a
website that contains a list of hyperlinks, usually based on a specific
subject. The problem the major directories had was that they
could not keep pace with the number of new websites being created.
Search engines were created as a remedy to this problem. Search
engines used an automated tool - called a crawler or a robot - that
automatically followed hyperlinks and collected information about
the documents it visited. The search engines then stored all the
information they collected in a database. Users could then visit
a search engine's website and query its database to find hyperlinks
that matched their search term. A search engine performs three functions:
1) it crawls the World Wide Web for content, 2) it indexes this
content into a database, 3) it provides a search facility at a website
which allows clients to query the database.
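The three functions above can be sketched, in miniature, with an
in-memory inverted index; the URLs and document text here are
invented purely for illustration:

```python
# A minimal sketch of the three search-engine functions: crawl,
# index, and query. The "crawled" pages are a hand-made stand-in
# for real fetched documents.
from collections import defaultdict

# 1) Crawled content: URL -> document text (illustrative data).
pages = {
    "http://example.com/a": "hypertext documents on the world wide web",
    "http://example.com/b": "search engines crawl and index web documents",
}

# 2) Index the content into an inverted index: word -> set of URLs.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# 3) Search facility: return URLs containing every query term.
def search(query):
    results = [index[word] for word in query.lower().split()]
    return set.intersection(*results) if results else set()

print(search("web documents"))  # both example pages match
```

Real engines add ranking, stemming, and persistent storage on top,
but the crawl/index/query split is the same.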
When the World Wide Web was first developed, a list of new web
servers was edited by Tim Berners-Lee. As time went by, the number
of new servers grew exponentially and it became impossible to manually
track every new server and website created. This is when it became
apparent that an automated computer system was needed to discover
and index new servers.
In the early 1990s, Gopher was a popular protocol for accessing
documents on the Internet, and was a direct competitor to the World
Wide Web. Veronica and Jughead were search engines/indexers
developed for Gopher, and predate web search engines. One of
the first programs used to 'crawl' the World Wide Web was
the World Wide Web Wanderer.
However, the Wanderer was a program designed to survey the size
of the web and was not, strictly speaking, a search engine. The
first web search engines were launched in 1993, and included ALIWEB,
JumpStation and W3Catalog.
The early web crawlers and search engines were basic in their scope;
WebCrawler was the first crawler able to crawl and index
every word of a web document. In 1994, WebCrawler was the pinnacle
of search engine technology, but it was soon rivalled by the robots
developed by Lycos and Yahoo!; these two companies continue to exert
a large presence on the World Wide Web. Other search engines,
whose prominence was short-lived, were Magellan, Infoseek, Northern
Light, and AltaVista. Most of these search engines crawled and indexed
content for free; however, as time passed and they could not find
a suitable revenue model, they changed to paid models.
One problem that hampered early search engines was how to make
money. The early crawlers (also referred to as spiders) indexed
websites for free, so obviously there was no income from that.
Some engines flirted with 'paid only' inclusion, or 'paid listings'
- which promised a higher ranking - but this negatively affected
the user's experience, with commercial rather than educational
resources dominating the search results. The 'paid' engines thus
found their popularity soon in jeopardy.
The answer lay in a new search engine named Google; Google's 'organic'
search results were free for inclusion - thus satisfying users and
web developers. However, Google also included paid listings
alongside its 'organic' results, enabling the company to generate
revenue from its web service. Google also provided some additional
innovations, notably PageRank, which helped fight web spam - the
bane of early engines.
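PageRank's core idea - a page matters if pages that matter link to
it - can be sketched with a few lines of power iteration. The
four-page link graph below and the damping factor of 0.85 are
illustrative assumptions, not Google's actual data:

```python
# Toy PageRank via power iteration over a hand-made link graph.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
n = len(pages)
rank = {p: 1.0 / n for p in pages}  # start with a uniform rank
damping = 0.85                      # probability of following a link

for _ in range(50):  # iterate until the ranks settle
    new_rank = {p: (1 - damping) / n for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)  # split rank among outlinks
        for target in outlinks:
            new_rank[target] += damping * share
    rank = new_rank

# C is linked to by three pages, so it ends with the highest rank.
print(max(rank, key=rank.get))  # → C
```

Spammers could stuff keywords into a page, but faking many genuine
inbound links is much harder, which is why link analysis resisted
the spam that plagued purely text-based ranking.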
Google was one of the first search engines to focus primarily on
search results, with a minimalist and fast-loading homepage:
basically a search box and a logo. When Google was launched, the
majority of worldwide users were using a dial-up connection;
therefore, a fast-loading homepage - in comparison to the cluttered
offerings from competitors - gave Google an 'edge' in performance
and a unique selling point.
Google has continued, to the present day (2015), to be the most
popular search engine used on the World Wide Web, simply because,
for most users, it provides the most relevant results. This is due
to a number of factors, amongst them the largest database of webpages
and a superior algorithm (the mathematical system used to determine
which web pages are displayed in search results).
Although Google continues to dominate the online search business,
it is not without competitors, most notably Yahoo! and MSN.
Since 2004, Yahoo! and MSN have launched new search engines. Rather
than invent a new paradigm, Yahoo! and MSN have attempted to beat
Google at its own game: with an 'organic' search index that is
free, and paid listings in the same position as Google's. In 2009,
MSN renamed its search engine Bing.
Search engines have continued (1994-2015) to be the primary tool
used to find content on the World Wide Web. Search engines now
provide huge online databases of Web content - with billions of
documents stored - against which users can run their queries.
Search Engines: Timeline
The following list includes search engines that provide/provided
traditional search results, 'pay per click' search results and
'meta' search results.
- ALIWEB: Considered the first search
engine, developed by Martijn Koster.
- W3Catalog: One of the earliest search engines, launched by Oscar
Nierstrasz.
- JumpStation: Launched by Jonathon Fletcher at the University
of Stirling.
- WebCrawler: The first crawler to provide a full-text search
engine.
- Infoseek: A search engine that was
founded by Steve Kirsch.
- AltaVista: AltaVista was
one of the largest and most popular search engines.
- Magellan: Search engine and directory purchased by Excite in 1996.
- Yahoo!: One of the Web's most popular
websites, providing a range of services.
- Ask Jeeves: Ask Jeeves is a meta
search engine; renamed Ask.com in 2006.
- Dogpile: Meta search engine, popular
in the 1990s.
- Hotbot: Popular search engine in the 1990s.
- Infospace (Blucora): Company which
powers Dogpile and WebCrawler.
- Inktomi: Provider of pay per click search results.
- Fast: Norwegian company who provided the AlltheWeb search engine.
- Northern Light: Defunct search
results provider; named after the clipper ship.
- Yandex: The most popular Russian search
provider, launched by Arkady Volozh.
- Google: The most widely used search
engine on the Web.
- GoTo: Former name of the Overture search engine.
- MSN Search (Bing): Microsoft's search
engine; renamed to Bing in 2009.
- AlltheWeb: An online search engine provided by Fast.
- Teoma: Search engine created by Apostolos Gerasoulis.
- Baidu: Chinese-language search engine.
- Espotting: Espotting was a European
pay per click advertising search engine.
- A9.com: Provides search results for the Amazon.com website.
- Overture: Pay per click search engine
that was purchased by Yahoo! in 2003.
Search Engines: Total Search Queries
A range of companies analyse and measure digital data; while the
figures they collate may differ, they show a similar trend when
it comes to the number of worldwide search queries. ComScore is
considered a global leader in measuring search engines and the digital
world. They have released the following data:
- December 2007: 66.2
billion total search queries
- Google: 41.3 billion search queries
- Yahoo!: 8.5 billion search queries
- Baidu: 3.4 billion search queries
- MSN: 1.9 billion search queries
- Ask: 0.7 billion search queries
- Yandex: 0.55 billion search queries
- July 2009: 113.6
billion total search queries
- Google: 76.6 billion search queries
- Yahoo!: 8.8 billion search queries
- Baidu: 7.9 billion search queries
- MSN: 3.3 billion search queries
- Ask: 1.29 billion search queries
- Yandex: 1.2 billion search queries
- December 2012: 175.9
billion total search queries
- Google: 114.7 billion search queries
- Baidu: 14.5 billion search queries
- Yahoo!: 8.6 billion search queries
- Yandex: 4.8 billion search queries
- MSN (Bing): 4.5 billion search queries
As ComScore's data highlights, from 2007 to 2012 the number of
monthly search queries grew exponentially. While Google's total
search queries have grown year on year, it would appear that
Yahoo!'s have stagnated. The other notable trend is the success of
Baidu and Yandex: Chinese and Russian language search engines that
dominate their domestic markets. While eBay, Time Warner Network,
Alibaba.com Corporation and the Ask.com Network make it into the
top 10 worldwide search properties, the dominant search engines
are Google, Baidu, Yahoo!, Yandex and Bing.
Search Engines: Robots, Spiders and Crawlers
For search engines to fill their search results with meaningful
and up-to-date information, they need to use automated software
programs that browse the World Wide Web and collect data. These
programs are classified as robots, and are referred to as an
Internet bot, web bot, Internet robot, web robot, www bot, www
robot, or, simply, a bot.
The primary use of robots is web spidering, also referred to as
web crawling. Therefore, an Internet robot that spiders/crawls
the World Wide Web is referred to as either:
- Web Crawler
- Web Spider
Web crawlers/spiders perform the same function: a web crawler is
used by a search engine to find and collect information on the World
Wide Web. The earliest web crawlers were created for academic purposes:
to analyse how large the early World Wide Web was. Web crawlers
begin with a core list of URLs; whatever hyperlinks they find at
these URLs, they follow, and in time they should be able to
collect information about the majority of popular websites. The
problem web crawlers have is finding data in the 'deep web',
where URLs are not linked to from other web content.
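The seed-list crawl described above can be sketched as a
breadth-first traversal over a simulated link graph - no live HTTP
fetches, and every URL is invented - including a 'deep web' page
that no chain of links can reach:

```python
# Sketch of a frontier-based crawl: start from seed URLs and follow
# every hyperlink found, skipping pages already visited.
from collections import deque

# Simulated web: URL -> hyperlinks found on that page (illustrative).
web = {
    "http://seed.example/": ["http://a.example/", "http://b.example/"],
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": [],
    "http://c.example/": ["http://seed.example/"],
    "http://deep.example/": [],  # 'deep web': nothing links here
}

def crawl(seeds):
    frontier = deque(seeds)  # URLs waiting to be visited
    visited = set()
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        for link in web.get(url, []):  # follow the page's hyperlinks
            if link not in visited:
                frontier.append(link)
    return visited

print(crawl(["http://seed.example/"]))  # deep.example is never found
```

A real crawler replaces the dictionary lookup with an HTTP fetch
plus link extraction, and adds politeness delays and per-host
queues, but the frontier/visited loop is the same.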
Some of the most famous web crawlers: Googlebot,
Bingbot, WebCrawler, ExaBot, Yahoo!
Slurp, AskJeeves, Baidu Spider, Yandex Bot, Scooter,
Mercator, Facebook External Hit, Atomz, ArchitectSpider, and Lycos_Spider_T-Rex.
The robots exclusion standard,
initiated by Martijn Koster, led to the creation of robots.txt files,
which enable websites to limit access by web crawlers.
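Python's standard library ships a parser for the robots exclusion
standard, so a polite crawler's check might look like the sketch
below; the robots.txt rules shown are an invented example:

```python
# Checking robots.txt before fetching, using Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Example robots.txt content: block all user agents from /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A crawler would call can_fetch() before requesting each URL.
print(rp.can_fetch("ExampleBot", "http://example.com/public/page.html"))   # True
print(rp.can_fetch("ExampleBot", "http://example.com/private/page.html"))  # False
```

In practice the crawler fetches `/robots.txt` from each host (e.g.
via `rp.set_url(...)` and `rp.read()`) rather than parsing a string.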
Search Engines: Revenue Models
The majority of search engines have free submission, but they do
not guarantee to include a website within their search results.
Search engines sometimes employ a sandbox
for new websites submitted to their 'organic' search results. There
are search engines, like Overture, who have charged a fee for an
express, or "superior", listing. However, generally speaking,
search engines use discreet 'pay per click' advertising to generate
revenue. Pay per click adverts are displayed on search results pages,
and the search engine charges a fee for every click
on the advert. Conversion rate
and ROI (from advert impressions)
play a critical role in how publishers and advertisers rate a search
engine's advert network.
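The pay per click arithmetic can be shown with a back-of-the-envelope
sketch; every figure below is invented, not real advertiser data:

```python
# Toy pay-per-click economics: what an advertiser pays the search
# engine versus what the resulting sales bring in.
clicks = 1000          # clicks the advert received
cost_per_click = 0.50  # fee the search engine charges per click
conversions = 25       # clicks that turned into a sale
revenue_per_sale = 40.0

spend = clicks * cost_per_click            # total advertising cost
conversion_rate = conversions / clicks     # fraction of clicks that convert
revenue = conversions * revenue_per_sale   # income from those sales
roi = (revenue - spend) / spend            # return on the ad spend

print(f"conversion rate {conversion_rate:.1%}, ROI {roi:.0%}")
```

An advertiser whose ROI stays positive keeps bidding; one whose
clicks rarely convert abandons the network, which is why these two
numbers drive how an advert network is rated.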
Search Engines: Submitting a Website
Submitting a website to a search engine is extremely simple: all
a user needs to do is follow the 'submit a site' (manual
submission) link located on a search engine's homepage. The
owner of a website will then fill in the relevant information about
their website, and the search engine will crawl the website at a
later date. The amount of time it takes to get a website listed
in a search engine varies greatly.
The quickest way to get a website listed in a search engine is
to get a link (typically a deep link)
from a website that is already listed in that search engine. The
prominence of a website in search results depends upon external
websites linking to it. Once a website is "live"
and receiving traffic, there are a number
of companies who record and analyse website traffic, categorising
traffic as hits, page views, referrers and unique visitors.
Website traffic analysis can be used to optimise
a website to rank higher in a search
engine. Typically, a webpage's ranking in a search engine is based
upon its link popularity; therefore, search engines ban websites
that buy or sell links. Likewise,
a reciprocal link policy is frowned
upon if it is used extensively; the selling, buying and exchanging
of links is viewed as 'gaming' a search engine, as it is not an
organic way of gaining popularity.
If a website has been banned by a search engine, it is possible
to submit a re-submission request
if you feel the website was wrongly penalised, or the content has
been changed. Websites can be penalised by search engines for a
range of practices, the most obvious of which is cloaking.