World Wide Web

Introduction

The World Wide Web is a non-proprietary hypertext document system that is available on the Internet, and is also referred to as: w3, web, www, and 'the web'. The World Wide Web was not the first hypertext system, but it was the first hypertext system to be successfully interconnected with the software systems (TCP/IP) of the Internet. Designed with the aim of giving users unfettered access to information, and with no centre, the web aims to provide a technological levelling of hierarchy (horizontal); earlier information systems tended to be vertical, with a strong power structure. This is due, in part, to the World Wide Web supporting unidirectional links, where a resource can be linked to without the owner of the resource having given permission. The Web's inventor, Tim Berners-Lee, managed to 'tie the knot' between hypertext and the Internet by inventing or co-inventing three technologies:

  1. HTML (universal hypertext language to design Web documents)
  2. HTTP (protocol, using URLs, to retrieve data from Web servers)
  3. URLs (unique identifier for documents hosted on Web servers)

The World Wide Web is designed around a client-server model, which splits the workload between a server (which provides the data) and a client (which requests the data). While the server shares its resources with the client, the client usually only accesses those resources and does not share any of its own. The World Wide Web is therefore a system that consists of web servers (computers that host files) and web browsers (client programs that retrieve the files). The World Wide Web is named a 'Web' because hypertext documents (webpages) are connected together - in a system likened to a 'Web' - through the use of hyperlinks. Hyperlinks are pieces of text or images that are embedded into hypertext documents (webpages), and each one includes a URL. Uniform Resource Locators (URLs) are a type of Uniform Resource Identifier (URI), and are used by the Hypertext Transfer Protocol (HTTP) to locate the computer a webpage is stored upon.
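To make the idea of a hyperlink concrete, the short Python sketch below pulls the URLs out of the links embedded in a small hypertext document. It is only an illustration: the page text, the `LinkCollector` class and the `extract_links` function are hypothetical names invented for this example, and a real browser does considerably more work than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the URL held in the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the URLs of all hyperlinks found in an HTML document."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# A made-up webpage containing two hyperlinks (example.com is a placeholder).
PAGE = """<html><body>
<p>Read the <a href="http://example.com/history.html">history</a> page
or the <a href="/protocols.html">protocols</a> page.</p>
</body></html>"""

for url in extract_links(PAGE):
    print(url)
```

Each URL that is printed is the address a browser would request if the user clicked on the corresponding link.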

The World Wide Web is not the only online document retrieval system: Gopher is an example of another system. The World Wide Web is commonly confused with the Internet, but the World Wide Web is a service accessed on the Internet; the Internet existed before the World Wide Web and could continue to exist without it. That said, the World Wide Web is the most popular service used on the Internet, and is essential to many 'real world' civil and business services.

History

Sir Tim Berners-Lee is credited with inventing the World Wide Web: from 1989 to 1991, he proposed and developed the software systems of the Web while he was employed at CERN. Berners-Lee had attempted to build a document system for CERN in the early 1980s: the system was named ENQUIRE, and its purpose was to help CERN scientists share information, and to avoid losing information. Tim Berners-Lee was in the 'right place at the right time' to invent the World Wide Web: from 1983, the CERN Networking Group - whose members included Brian Carpenter, Giorgio Heiman, François Flückiger, and Jean-Michel Jouanigot - had begun to build an internal and external networking infrastructure. The CERN Networking Group decided to implement TCP/IP instead of the ISO networking standard, and by 1991 the CERN external network was a hub for international Internet traffic, responsible for handling up to 80% of Europe's international Internet traffic. Therefore, CERN was the ideal location to launch a new Internet service.

After the failure of ENQUIRE, Berners-Lee proposed a new hypertext project - taking advantage of CERN's IP network - that would be available to 'everyone'. When Berners-Lee proposed this new project to his boss Mike Sendall, he referred to the system as a "Mesh"; he later decided upon "World Wide Web" when writing the code for the system in 1990. Berners-Lee's proposal was submitted in March 1989, and was titled "Information Management: A Proposal"; Mike Sendall wrote 'Vague, but exciting' on this paper. In May 1989, Berners-Lee appears to have published the same proposal but titled it "a large hypertext database with typed links". This proposal was not successful, but it led Berners-Lee to ask Robert Cailliau for assistance in developing a more 'concrete' proposal: the proposal they developed was published on the 12th of November 1990 and was titled "WorldWideWeb: Proposal for a HyperText Project". This proposal was 'green lit', and Cailliau and Berners-Lee began the process of building a development team to launch their new hypertext project. The project was originally referred to as the CERN WWW Project, and the members of the WWW Project included: Alain Favre, Arthur Secret, Bebo White, Bernd Pollermann, Carl Barker, Dan Connolly, David Foster, Eelco van Asperen, James Whitescarver, Jean-Francois Groff, Jonathan Streets, Nicola Pellow, Peter Dobberstein, Paul Kunz, Pei Wei, Robert Cailliau, Tim Berners-Lee, Tony Johnson, and Willem van Leeuwen.

Berners-Lee decided that the hypertext documents of the World Wide Web would be read-only, and accessed through a client-server architecture (with browsers as the clients). The first World Wide Web server (used by Berners-Lee) was a NeXT computer, and the first web server software was named CERN httpd, which was designed by Ari Luotonen, Henrik Frystyk Nielsen, and Tim Berners-Lee. Since then, a plethora of HTTP server software, such as Apache, has simplified the process of hosting web servers and has helped to popularise the World Wide Web. Berners-Lee was responsible for designing the first web browser, unsurprisingly named WorldWideWeb. The second browser created - a version of the WorldWideWeb browser that was ported to several operating systems - was the Line Mode Browser, and it was designed by Tim Berners-Lee, Henrik Frystyk Nielsen and Nicola Pellow. In 1992, Robert Cailliau and Nicola Pellow developed the first browser for the Macintosh platform, named MacWWW. Pei Wei, another member of the WWW project, developed the ViolaWWW hypertext browser.

The World Wide Web first became available as an Internet service on the 6th of August 1991, when Berners-Lee released information about his "Hypertext project" on the newsgroup alt.hypertext. However, Berners-Lee had launched the first web server on the 25th of December 1990; some simple webpages were available for download, but only a select number of people knew the project existed. On the 30th of April 1993, CERN made the World Wide Web's software - such as a library of code - publicly available, with the aim of increasing its popularity. CERN also announced that the World Wide Web would be free to use, and that no licence fee would be charged to developers (unlike Gopher, for which a licence fee was charged to host a server). Alongside CERN's decision to disclaim ownership of the Web, another key factor in making the Web the Internet's most popular information system was the Mosaic web browser: this browser was easy to use, stable, and capable of displaying graphics and text on the same page. Most early web browsers were used by thousands of users; the Mosaic web browser was the first browser to be used by millions of users.

The World Wide Web was fortunate to be launched at the same time that the Internet was transitioning from a U.S. government funded network to a commercial network. By 1995, the World Wide Web was the Internet's most popular service, and it was responsible for the creation of large tech companies like Amazon, Yahoo!, PayPal, and eBay. The early growth of the Internet and the World Wide Web led to the dot com bubble, in which the shares of Internet companies soared in value (1997-2000) and then crashed. In 1994, Berners-Lee left CERN and founded the World Wide Web Consortium (October 1994); the purpose of the World Wide Web Consortium is to create new web standards and to educate web developers.

HTTP and Internet Protocols

The World Wide Web is a service/application found on the Internet: the Internet is a system of interconnected computer networks that uses TCP/IP (the Internet protocol suite). HTTP (Hypertext Transfer Protocol) is the protocol that the Web uses, and it is located in the application layer of the Internet protocol suite. The World Wide Web could not function without HTTP: HTTP is a 'request-response' protocol, which means that one computer sends a request for data and another computer responds to the request.
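The request-response exchange can be tried out on a single computer. The Python sketch below, which uses only the standard library, starts a tiny web server on the local machine and then acts as the client by sending it one HTTP GET request. The handler class, the `fetch` function and the page content are hypothetical names chosen for this illustration; it is a minimal demonstration rather than a description of how any production web server is built.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello, Web</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch(path):
    """Client side: send one GET request and return (status, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)       # the request
        response = conn.getresponse()   # the response
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch("/index.html")
print(status, body.decode("utf-8"))
```

Here the same program plays both roles; on the real Web the browser and the server normally run on different computers connected by the Internet.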

As with most application layer protocols, the World Wide Web is based upon a client-server computing model: a client application program (browser), residing on a user's computer, will use HTTP to send a request to retrieve data from a web server connected to the Internet. Computer files are retrieved by a client program (browser) using a Uniform Resource Identifier (URI); the World Wide Web uses a type of URI named the Uniform Resource Locator (URL). URLs are embedded within hyperlinks: hyperlinks, usually referred to as 'links', are embedded within webpages (hypertext documents), and a user simply has to click on a hyperlink and the browser will use HTTP to locate the resource.
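As a rough illustration of what a URL contains, the Python sketch below splits a URL into its parts and resolves links found on a page against the address of that page, which is broadly what a browser must do before it can request the linked resource. The URLs are placeholders on example.com and the `describe` function is a hypothetical helper written for this example.

```python
from urllib.parse import urljoin, urlsplit


def describe(url):
    """Split a URL into the parts a browser needs to locate a resource."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # which protocol to use, e.g. http
        "host": parts.hostname,  # which computer to contact
        "path": parts.path,      # which file to ask that computer for
    }


page_url = "http://example.com/docs/file.html"
print(describe(page_url))

# Links inside a page are often relative to the page that contains them.
print(urljoin(page_url, "other.html"))  # same directory as the page
print(urljoin(page_url, "/top.html"))   # relative to the site root
```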

When a client program (browser) uses HTTP to locate and retrieve a computer file, more than one Internet protocol will be used in the process of retrieving data from a server. HTTP, through a process of encapsulation, typically uses the Transmission Control Protocol (TCP) of the transport layer of the Internet protocol suite: TCP ensures that application layer data is reliably sent and received. The TCP data segments will then be encapsulated (enveloped) into an Internet Protocol (IP) packet, which will then be encapsulated in a link layer frame as it 'hops' across the Internet from host to host. The process is likened to a letter being placed inside an envelope that is placed inside another envelope that is placed within a final envelope.
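The envelope analogy can be sketched as a toy Python program. Note the assumptions: the bracketed labels below are stand-ins invented for this illustration and are not real TCP, IP or link layer headers (real headers are binary structures carrying addresses, port numbers, checksums and so on), and the function names are hypothetical.

```python
# Toy "headers": placeholders only, not real protocol header formats.
LAYERS = [b"[TCP]", b"[IP]", b"[LINK]"]

HTTP_MESSAGE = b"GET /file.html HTTP/1.1\r\nHost: example.com\r\n\r\n"


def encapsulate(message):
    """Sender: wrap the HTTP message in one 'envelope' per lower layer."""
    data = message
    for header in LAYERS:
        data = header + data  # each layer adds its header in front
    return data


def decapsulate(frame):
    """Receiver: open the envelopes in the reverse order."""
    data = frame
    for header in reversed(LAYERS):
        if not data.startswith(header):
            raise ValueError("unexpected layer header")
        data = data[len(header):]
    return data


frame = encapsulate(HTTP_MESSAGE)
print(frame)               # the outermost envelope (link layer) comes first
print(decapsulate(frame))  # the original HTTP message is recovered
```

The receiver unwraps the layers in the opposite order to the sender, which is why the link layer label appears first in the frame and the HTTP message last.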

Content on the World Wide Web

If a user wants to upload content to the World Wide Web, then they need to upload it to a web server: a web server is a computer system that will process requests via HTTP. The most common files uploaded to a web server are image files (gif, jpeg) and html documents. The next issue is how users access the files / content located on a web server: one option is via the web server's IP address, but the most common way is through the Domain Name System (DNS). A domain name is registered, such as example.com, then DNS records are created that tie the domain name to the web server: all users need to do is enter the domain name address, such as example.com/file.html, to locate files / content uploaded to a web server.
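The way a DNS record ties a domain name to a web server can be illustrated with a toy lookup table in Python. This is a sketch under stated assumptions: a real browser asks DNS servers over the network rather than reading a dictionary, the IP address shown is a made-up value from a range reserved for documentation, and the `locate` function is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Hypothetical DNS record: domain name -> IP address of the web server.
# 192.0.2.10 is a placeholder taken from a documentation-only range.
DNS_RECORDS = {"example.com": "192.0.2.10"}


def locate(address):
    """Turn an address typed by a user into (server IP, file path)."""
    # Users often type addresses without a scheme, e.g. example.com/file.html
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    ip_address = DNS_RECORDS[parts.hostname]  # KeyError if not registered
    return ip_address, parts.path or "/"


print(locate("example.com/file.html"))
print(locate("http://example.com"))
```

The domain name decides which computer is contacted, and the remainder of the address decides which file is requested from it.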

When a collection of webpages (typically html documents, including an index.html file) is uploaded to a web server that is tied to a domain name, the overall resource of content is termed a website. If a user wishes to create their own website, all they need to do is set up or rent a web server, register a domain name, and upload web content to the web server. Users can then access the web content by entering its URL (which includes a domain name) into a browser, and HTTP will retrieve the content. The web server will usually have a bandwidth limit per month; if this bandwidth (download limit) is exceeded, then the content will be unreachable and requests for the content will be served with an error message.
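A miniature version of this arrangement can be run on one machine with Python's standard library. The sketch below creates a temporary folder containing an index.html file, serves that folder as a website on the local computer, and then requests pages from it. The function and class names and the page content are hypothetical, and it assumes Python 3.7 or later; a rented web server would run dedicated server software instead.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def serve_and_fetch(path):
    """Host a one-page website locally and request `path` from it."""
    with tempfile.TemporaryDirectory() as site_dir:
        # "Upload" the web content: a single index.html file.
        Path(site_dir, "index.html").write_text(
            "<html><body>Home page</body></html>", encoding="utf-8"
        )
        handler = functools.partial(QuietHandler, directory=site_dir)
        server = HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            port = server.server_address[1]
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", path)
            response = conn.getresponse()
            result = (response.status, response.read().decode("utf-8"))
            conn.close()
            return result
        finally:
            server.shutdown()
            server.server_close()


print(serve_and_fetch("/")[0])              # index.html is served for "/"
print(serve_and_fetch("/missing.html")[0])  # an error status for a missing file
```

Requesting the bare address returns the index.html file, while requesting a file that was never uploaded is answered with an error status instead of content.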

Web content 'falls' into two broad categories: commercial and non-commercial. The World Wide Web has spawned many successful online commercial businesses, which are referred to as e-tailers (e-tailing) and e-commerce businesses. Some notable e-commerce businesses are Amazon, eBay, and PayPal - most of these businesses are located in California, the state in which ARPANET was launched (the network that evolved into the Internet). Virtual currencies, such as Bitcoin, have also emerged online. While the world's largest technology corporations are typically located in a tech centre named Silicon Valley (California), UK technology companies can be found in Silicon Glen (Scotland) and Silicon Roundabout (East London Tech City).

Accessing and finding Web content

When the World Wide Web was launched in the early 1990s, the general public required three things to access the web: 1) a computer with an operating system and hardware that supported TCP/IP network access; 2) a web browser (client) that could retrieve content from web servers; 3) an access account with an Internet Service Provider (ISP). Due to the relatively high cost of meeting these requirements, cybercafes became a new business venture that provided access for those curious about the newfangled 'information superhighway'. The difficulty of accessing the Internet (TCP/IP) is highlighted by the fact that Windows 95 did not originally ship with a default TCP/IP network installation or a browser. Therefore, in 1995, the first edition of the world's most popular operating system did not come Internet ready; later editions rectified this issue and packaged the Internet Explorer browser with the operating system.

Due to the painfully slow download/upload speed of dialup - the leading access technology before the year 2000 - the content available to web users was fairly basic and mostly consisted of text and images. From 1992 to 1994, the web was extremely small in comparison to the present day, and content was found either through word of mouth or from lists of links (directories) such as the one provided by the newly founded Yahoo! website. As the web expanded in the mid 1990s, it became clear that maintaining lists of links by hand was not feasible and that an automated alternative was required: thus the search engine was born. AltaVista was launched in 1995, and was the most popular search engine before the creation of Google in 1998.

By the late 1990s, it had become easier to access the web: computer and modem costs had fallen, Windows editions were now Internet ready, and new UK ISPs, like Freeserve, were offering 'pay as you use' access accounts with no expensive startup costs. When broadband (DSL technology) was launched in the UK (year 2000), the Internet and the World Wide Web started to be viewed as more than just a novelty, or the preserve of boffins or nerds. The potential inherent in broadband (over 10 times faster than dialup) gave content creators the ability to rival established media technologies such as television and radio (YouTube, podcasts etc). From 2000 to 2005, the World Wide Web was establishing itself as a place to do business and access media: content was increasing, search engine algorithms were becoming more sophisticated, computer technology was expanding to offer voice and video capabilities, and broadband was enabling this evolution.

The biggest difference between web access in 2005 and in 2017 is the access device being used: in 2005 it was a desktop computer, whereas in 2017 access is split evenly between computers, tablets and smartphones. The software used is still the same, a browser, although mobile browsers have had to be developed and websites designed to be mobile friendly. Early websites were often designed with frames, and tend to be unsuited to rendering on a smartphone screen. Search engines are still the most popular way to find web content: Google has indexed billions of webpages into its search results, and the problem search engines now face is dealing with the amount of new content that is created. Social media sites are beginning to rival Google as a 'hub' from which to access and locate content, with most individuals and businesses having created a Twitter and Facebook page. Video content (primarily YouTube) is competing with text pages as the dominant form of content on the web; prior to the launch of broadband, video content was not feasible. Access accounts have largely remained the same in terms of cost, but download speeds and usage limits have improved considerably with the launch of superfast broadband (fibre optic technology). Wireless access is now the norm: through Wi-Fi in the home or business, and mobile network access 'on the move'.

Efforts have been made to improve access for people with disabilities: the World Wide Web Consortium (W3C) launched the Web Accessibility Initiative (WAI) in 1997, with the goal of improving Web accessibility for people with auditory disabilities, cognitive / intellectual disabilities, visual disabilities, motor / mobility disabilities, and seizure disorders. Persuading webmasters not to use flashing or strobing effects on their websites - which can affect people who suffer from photosensitive epileptic seizures - is one example of how initiatives like the WAI can help ensure that web content is designed appropriately for people with disabilities.

Privacy and the World Wide Web

While there is no requirement to record browsing history on the World Wide Web, the majority of browsing is recorded to some degree. The World Wide Web is based upon a client-server model: a client program (browser) requests and retrieves files (webpages, pictures) from a web server (a computer connected to the Internet). Therefore, whenever a file is requested and retrieved on the World Wide Web, the client and the server usually both record the session. Web servers are installed with software that usually logs the IP address of all incoming requests for data. Likewise, the user's browser (client) will usually record the data transmission by keeping a copy of the retrieved files in its cache (directory) and keeping a record in the history feature of the browser. Internet Service Providers - the networks through which users access the Internet - will also keep a record of each user's usage. The internal policies of ISPs differ: it is difficult to know precisely what an ISP will record and store, and for how long. ISPs will usually only share their user logs when this is demanded by a legal authority.
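To show the kind of information a server log can hold, the Python sketch below reads one log entry and picks out the IP address of the requester. It assumes the entry is written in the widely used Common Log Format; the entry itself is invented for this example (the IP address comes from a range reserved for documentation), and servers can be configured to log more or less than this.

```python
import re

# An invented log entry in Common Log Format (an assumption of this sketch).
LOG_LINE = (
    '192.0.2.44 - - [10/Oct/2017:13:55:36 +0000] '
    '"GET /file.html HTTP/1.1" 200 2326'
)

LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line):
    """Return the fields of one log entry as a dictionary of strings."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line is not in the expected log format")
    return match.groupdict()


entry = parse_log_line(LOG_LINE)
print(entry["ip"], "requested", entry["path"], "at", entry["time"])
```

A single entry of this kind records who asked (by IP address), what they asked for, and when.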

Alongside server logs, websites can also record the browsing habits of their users by using HTTP cookies (invented by Louis Montulli). Cookies are small files, stored on the user's computer, that store information such as: username authentication, password authentication, past browsing history. Therefore, if a user returns to a commercial website - for example eBay - the user will not be required to enter their username and password again, and the website can serve content to the user that is related to the content they viewed the last time they visited the website. Users can delete cookies whenever they wish, and the typical (first party) cookie does not pose a serious privacy risk, especially if the user has not provided personally identifiable information to the website. However, tracking cookies, referred to as third party cookies, can compile a long term record of a user's browsing history - as they record browsing habits at multiple websites - and are sometimes viewed as malware.
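The mechanics of a cookie can be sketched with Python's standard `http.cookies` module. In the example below a website builds the value of a `Set-Cookie` header to send to the browser, and later reads back the `Cookie` header that the browser returns on its next visit. The cookie names (`session_id`, `last_viewed`), their values and the two helper functions are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie_value(session_id):
    """Website side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"        # valid for the whole site
    cookie["session_id"]["max-age"] = 3600    # expires after one hour
    return cookie["session_id"].OutputString()


def read_cookie_header(header_value):
    """Website side: read the Cookie header sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


# First visit: the website asks the browser to store a cookie.
print("Set-Cookie:", make_set_cookie_value("abc123"))

# Later visit: the browser returns what it stored, so the website can
# recognise the user and recall what they viewed last time.
print(read_cookie_header("session_id=abc123; last_viewed=cameras"))
```

Because the browser sends the stored values back with each request to the same website, the website can link separate visits together.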

Most websites have a 'privacy policy' that typically promises to keep users' personal details and usage history secret; though sometimes they will share this data with third parties, which should be disclosed in the 'terms of use' of the website. Social media websites (Twitter, Facebook, LinkedIn, Google+) are by their nature more open when it comes to privacy, with most users openly sharing information publicly. While social media websites do include privacy options, many users are probably unaware that the information they upload to these sites is often data mined to identify patterns and establish relationships (usually related to targeted adverts). Additionally, due to the extensive amount of personal information a user shares on social media, the long term impact it may have upon a person's 'real life' is far greater than with other types of websites.

Further Reading:

1. WWW Error Messages: understand the error messages for unavailable Web pages.