LIS 505 - HTML introduction

HTML (HyperText Markup Language) is the language, or coding scheme, used to create Web pages. There are several different versions. Modern browsers will generally support almost all the features of earlier versions. Internet Explorer supports some additional features which are not part of the approved HTML standards. The following is a fairly basic introduction to traditional HTML.

Tags

An HTML file is basically just a text file, which may contain references to other files. Angle brackets (<, >) are used to set off tags, which give information about the structure of the page. For example, the tag <p> indicates the beginning of a new paragraph. Tags often come in pairs, with the second repeating the first but with a slash (/) after the <; for example,
<p>This is a short paragraph.</p>
The paired tags and the material between them are referred to as an element; the illustration given above is an instance of a p element.

Not all tags have to be paired; some can just stand by themselves. For example, the tag <br> indicates a line break. So, while the HTML code

Books
Periodicals
will be rendered by a browser as
Books Periodicals
the HTML code
Books
<br>Periodicals
will be rendered as
Books
Periodicals

Some validating software does object to unpaired tags. For tags like <br>, for which a paired tag is meaningless, a way round this problem is to insert a space plus slash ( /) just before the >; for example

Books
<br />Periodicals

This style is still relatively rare on actual Web pages.

Basic structure of an HTML file

Generally, the extension .htm or .html on the file name is enough for the browser to recognize the file as being in HTML and to render it accordingly. Still, it is considered good practice to enclose all the HTML code in an <html> and </html> tag pair. Another good practice, which will avoid odd problems with some browsers, is to divide your HTML code into a head element and a body element. The head element is typically used for information about the page that will not appear in the main display area of the browser. The body element, on the other hand, contains material that the browser will render in the main display area.

Thus, the basic template for a normal HTML file is

<html>
<head>
</head>
<body>
</body>
</html>

Again, some validation software will consider this a bit too basic. In addition to the <html> tag, it will expect to see a special !doctype tag before the HTML code. This tag, which is not actually in HTML, serves to identify what version of HTML is being used in what follows. HTML editing software usually supplies this tag automatically when you start a new file. For example, Arachnophilia starts each file with

<!doctype html public "-//w3c//dtd html 3.2//en">
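
Putting the two pieces together, a file that declares its version might look something like the sketch below. The HTML 4.01 Transitional doctype is only an assumption chosen for illustration (it is a version that still allows the presentational attributes used later on this page); your editing software may well supply a different one.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- A title element, described in the next section, normally goes here. -->
</head>
<body>
</body>
</html>
```

Note that the doctype line comes before the <html> tag, not inside it.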

The page title

The head element should normally contain at least a title element; this defines the title of the page, which will appear in the caption bar in most browsers when the page is displayed and is often used by search engines. For example, the HTML code
<html>
<head>
<title>Books and Periodicals</title>
</head>
<body>
</body>
</html>
will show up as empty in the main display in a browser, but with the title Books and Periodicals in the caption.

Attributes

For many types of HTML tags, you can specify one or more attributes. These specifications are inserted in the tag after the tag name and before the > (or />). Each specification should be in the form of a space or end of line, followed by an attribute name, an equals sign (=), and an attribute value. Attribute values should generally be enclosed in quotation marks, though the quotation marks can be omitted for very simple values that contain no spaces or other special characters. The <body> tag has various attributes that can be used to specify characteristics of the page as a whole. The bgcolor attribute specifies the background color of the page. So, for example,
<body bgcolor="red">
would result in the page's being displayed with a red background.

There are a limited number of color names that are recognized as valid color values (16 in official HTML 4, though browsers may recognize additional names, such as "puce"). If a color name is not valid, you may get black or white or something else, depending on the browser and the structure of the invalid color value. For finer control of colors, you can use RGB (red-green-blue) numeric codes. These start with a number sign (#) and then use two hexadecimal digits (00 to FF) for each of the three color elements. So, for example,

<body bgcolor="#FF7F3F">
gives a background that is a slightly muted orange (red at 100%, green at 50%, and blue at 25%). Most browsers will forgive you if you omit the #.
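
As a rough worked example of where those percentages come from: each pair of digits is read as a base-16 number between 00 (0) and FF (255), and the percentage is that number divided by 255. The comment in the sketch below spells out the arithmetic for the same code.

```html
<!-- red   FF = 15*16 + 15 = 255  (100%)
     green 7F =  7*16 + 15 = 127  (about 50%)
     blue  3F =  3*16 + 15 =  63  (about 25%) -->
<body bgcolor="#FF7F3F">
```

By the same reasoning, #000000 is black (all three components off), #FFFFFF is white (all three at full strength), and #FF0000 is pure red.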

Another attribute of the <body> tag is background. This is an example of an attribute that is a reference to another file. In this case, the file referred to should contain an image that you want to appear as a tiled background when the page is displayed. If the image file is in the same directory as the HTML file, it can just be referred to by its name and extension. For instance,

<body background="505.jpg">
would cause copies of the image in 505.jpg to appear in the background on the page. Widely supported image file formats are GIF, JPEG, and PNG. If a format is not supported, most browsers will just ignore the attribute.
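
The two attributes can also be used together. In the sketch below, which assumes the same 505.jpg file, the background color is what the reader sees if the image cannot be loaded or its format is not supported.

```html
<body bgcolor="#FFFFFF" background="505.jpg">
```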

Headings

A heading (not to be confused with the head element) can be coded in HTML with the <h1>, <h2>, <h3>, <h4>, <h5>, or <h6> tag, together with a matching </h1>, etc. Typically, the <h1> is used for the main title that you want to appear at the top of the page display. Most browsers display headings in boldface sized according to their level. For example,
<h1>Books and Periodicals</h1>
causes the browser to display

Books and Periodicals

whereas
<h6>Books and Periodicals</h6>
causes it to display
Books and Periodicals
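
Heading levels are usually combined to give a page an outline, with <h2> for the main sections under the <h1> and <h3> for subsections. A minimal sketch, in which the section names and paragraph text are made up for illustration:

```html
<h1>Books and Periodicals</h1>

<h2>Books</h2>
<p>Notes on the book collection.</p>

<h2>Periodicals</h2>
<h3>Journals</h3>
<p>Notes on the journals.</p>
```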

Lists

If you want to include a list in an HTML file, you can use either an ol element, for an ordered (numbered) list, or a ul element, for an unnumbered (bulleted) list. To mark the start of each item within the list, use the <li> tag. For example,
<ul>
<li>Books.
<li>Periodicals.
<li>Software.
</ul>
would be rendered as
  • Books.
  • Periodicals.
  • Software.
whereas
<ol>
<li>Books.
<li>Periodicals.
<li>Software.
</ol>
would be rendered as
  1. Books.
  2. Periodicals.
  3. Software.
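
The </li> closing tag is optional in traditional HTML, which is why the examples above omit it, but validating software may prefer to see it. Lists can also be nested by putting a whole list inside an item of another list. The following is a sketch of both points; the item names are invented for illustration.

```html
<ul>
<li>Books
  <ol>
  <li>Fiction</li>
  <li>Non-fiction</li>
  </ol>
</li>
<li>Periodicals</li>
<li>Software</li>
</ul>
```

Here the numbered list belongs to the Books item, so it is closed before the </li> that ends that item.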

Typography

For simple variation of type style within text, you can use tag pairs such as <i> and </i> (for italics), and <b> and </b> (for bold). (For more advanced typographical variation, you can use either the font element or the style attribute that can be specified for a number of other tags.)

Simple centering of text can be accomplished with the tag pair <center> and </center>. (A number of elements also have an align attribute that can be given the value of center.)
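
A short sketch of how these tags might be combined (the wording of the text is made up for illustration). When tag pairs are nested, the inner pair should be closed before the outer one.

```html
<center><b>Books</b> and <i>Periodicals</i></center>

<p align="center">This paragraph is centered with the align attribute.</p>

<p>Tag pairs can be nested: <b><i>bold italic</i></b>.</p>
```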

Images

To include an image on your page, use the <img> tag. By itself, this tag just results in a generic error image. To be useful, its src (source) attribute has to identify a file containing an image in a supported format. Again, as with the background image, this is quite simple if the image file is in the same directory. It is also possible to refer to an image file in another directory. For example,
<img src="images/portrait.jpg">
tells the browser to look for the file portrait.jpg in the subdirectory images of the directory containing the HTML file.
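
The sketch below shows two further variations. The alt attribute, not discussed above, supplies text to be shown if the image cannot be displayed. The directory name shared and the file logo.gif are hypothetical, used only to illustrate a path that goes up one directory.

```html
<!-- Image in a subdirectory of the directory containing the HTML file -->
<img src="images/portrait.jpg" alt="Portrait">

<!-- Image in a sibling directory: ".." means "go up one directory" -->
<img src="../shared/logo.gif" alt="Logo">
```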

Links

The <a> </a> tag pair serves two main purposes. The first is to mark a hypertext hot link to another file or to another place in the same page. To do this, the <a> tag has to have the value of its href attribute identify the link's destination. If this destination is just another file in the same directory, the value can be just the file's name and extension. For example,
<a href="index.htm">Home</a>
just specifies a link to the home page index.htm, which the browser renders as
Home
To point to a file on a server somewhere else, you should set the value to the file's URL. For example,
<a href="http://www.uwo.ca/">University of Western Ontario</a>
which the browser renders as
University of Western Ontario

The second main purpose of the <a> tag is to mark a target location (or bookmark) within a page to which a hot link can point. This is done with the name attribute, to which you give an arbitrary (possibly mnemonic) value. For example,

<a name="name_ex">A named anchor</a>
which the browser renders as
A named anchor
(Note how the enclosed text looks plain and is not underlined or its color changed, because it is not itself a link.)

To make a link point to such a target location, set the value of the link's href attribute to a number sign (#) plus the target's name. For example,

<a href="#name_ex">Go to the named anchor.</a>
which the browser renders as
Go to the named anchor.

You can also make a link to a named location in another file; for example,

<a href="505t10.htm#HTML">HTML</a>
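
A common use of named anchors is a short table of contents at the top of a long page. The following is only a sketch; the anchor names top, books, and periodicals are arbitrary choices for illustration.

```html
<h1><a name="top">Books and Periodicals</a></h1>
<ul>
<li><a href="#books">Books</a></li>
<li><a href="#periodicals">Periodicals</a></li>
</ul>

<h2><a name="books">Books</a></h2>
<p>Text about books.</p>
<p><a href="#top">Back to top</a></p>

<h2><a name="periodicals">Periodicals</a></h2>
<p>Text about periodicals.</p>
<p><a href="#top">Back to top</a></p>
```

Each href value is the # sign plus the name given in the matching name attribute; the name attribute itself does not include the #.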

In addition to the a element's main uses, there are some other things that you can do with it. For example, if you make the value of the href attribute into mailto: plus an e-mail address, you get a link that the user can click on to send an e-mail to that address (assuming there is a default e-mail program set up on the user's computer):

<a href="mailto:craven@uwo.ca">E-mail</a>

Special characters

You may notice how often < and > have appeared in the examples on this page. If you tried to include these characters on your own page by just typing them into your HTML code, however, the browser would generally treat them as marking the beginnings and endings of tags; they would not be displayed, and other parts of your text might be hidden as well. For such special characters, you need to use character entity references; these all start with an ampersand (&) and end with a semicolon (;), with a mnemonic for the special character between the two. Commonly useful character entity references are
&lt;     <    less than sign, or left angle bracket
&gt;     >    greater than sign, or right angle bracket
&amp;    &    ampersand
&quot;   "    double quote
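
As an illustration, this is roughly how the references would be typed in practice; the sentences are made up for the example.

```html
<p>The tag &lt;p&gt; indicates the beginning of a new paragraph.</p>
<p>Books &amp; Periodicals</p>
<p>The sign said &quot;Reference only&quot;.</p>
```

In the browser, the first paragraph is displayed with a visible <p> in the text, the second shows Books & Periodicals, and the third shows the phrase in ordinary double quotes.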

Naming your files

Two guidelines that you should follow when naming web pages and other files for a web site are
  • Do not use spaces in filenames.
  • Keep filenames in lower case.

Spaces in filenames are a problem because, like some other special characters, they have to be "escaped" (translated into special three-character codes beginning with %) before they can be used in URLs, and also because some editing software may change them into ends of lines.

Keeping filenames in lower case is a good idea because Windows (the operating system on GRC workstations) is case insensitive but Unix (the operating system on publish.uwo.ca) is case sensitive. So, for example, when you test a Web site locally, a reference to "index.htm" can be recognized by Windows as referring to the file Index.htm; when the Web site is published, however, the server looks only for index.htm and treats Index.htm as a different file.
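
The sketch below shows what these guidelines mean for the links you write. The filenames reading list.htm and reading_list.htm are hypothetical, chosen only to illustrate the point; %20 is the escape code for a space.

```html
<!-- A file named "reading list.htm": the space has to be written as %20 -->
<a href="reading%20list.htm">Reading list</a>

<!-- A file named "reading_list.htm": nothing needs to be escaped -->
<a href="reading_list.htm">Reading list</a>

<!-- If the file is actually named Index.htm, this link works when tested
     on Windows but fails on a case-sensitive Unix server -->
<a href="index.htm">Home</a>
```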

Last updated August 28, 2003.
This page maintained by Prof. Tim Craven
E-mail (text/plain only): craven@uwo.ca
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7