LIS 525 - Logging
A log file is an automatically updated file
that lists actions that have occurred.
A Web server maintains a log file
listing every request made to it.
Additional information about requesters
may be logged by means of cookies.
Since log files can be very long
and mix information about different requesters,
it is useful to have some log file analysis tools,
which can summarize such things as site performance,
visitor origins and return rates,
and navigation patterns.
Some Web hosting services
(6775 in a search on HostIndex.com, October 2007)
allow clients to access their own log files.
Many provide access to output from log file analysis tools.
Some Things To Do With Log Files
- Look for incomplete hits by noting the bytes transferred.
- Look for visitors who never click past your homepage
or never get to some page that you consider a target.
- Look for visitors repeatedly entering your site
on pages other than your homepage
(you may want to add more keywords or meta tags to the homepage).
- Look for "file not found" errors
(files may be missing or links or search engine entries
may need updating).
- See whether visitors are using browsers
that support features on your site.
- If you have an on-site search engine,
look for patterns of keywords that visitors use.
- Check for evidence of inappropriate behavior
by content providers
(such as using small images in spam HTML mail
to gather recipients' IP numbers surreptitiously).
- See how your visitors are finding your site
(if you have referrer logging enabled,
and with the caveat that this information may be faked by spammers).
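The last item above, seeing how visitors find the site, can be sketched in a few lines of Python. This is only an illustration: the function name referrer_summary and the query_param argument are invented here, and the parameter name "p" is an assumption that fits Yahoo!-style search URLs but not every search engine. Since referrer values may be faked, the counts should be read as hints rather than facts.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def referrer_summary(referrers, query_param="p"):
    """Count referring hosts and collect search phrases from referrer URLs.

    query_param is an assumption: "p" matches Yahoo!-style search URLs;
    other search engines use other parameter names.
    """
    hosts = Counter()
    phrases = []
    for ref in referrers:
        if not ref or ref == "-":        # "-" marks a missing value in the log
            continue
        parts = urlsplit(ref)
        hosts[parts.netloc] += 1         # tally the referring host
        query = parse_qs(parts.query)    # decodes "+" and %xx escapes
        if query_param in query:
            phrases.append(query[query_param][0])
    return hosts, phrases

# Hypothetical referrer values, shortened for illustration.
sample = [
    "http://search.yahoo.com/search?p=proxy+lis&ei=UTF-8",
    "http://525.fims.uwo.ca/~craven/525prx.htm",
    "-",
]
hosts, phrases = referrer_summary(sample)
print(hosts.most_common())
print(phrases)
```

Referrers from the site's own host (the second sample value) show internal navigation rather than how a visitor arrived, so a fuller version would probably filter them out.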
Some Features of Log Analyzers
- Highlighting text passages that meet given criteria.
- Summarizing statistics
(successful requests,
successful requests per day,
successful requests for pages,
successful requests for pages per day,
failed requests,
redirected requests,
distinct files requested,
corrupt logfile lines,
unwanted logfile entries,
data transferred in Gbytes,
data transferred per day,
requests per TLD,
requests per directory,
most requested files,
most requesting hosts,
files cached, etc.).
- 2-D (text or image) and 3-D graphing of statistics.
- Transferring log entries to a database.
- Outputting results in HTML format.
- Playing sounds when certain events happen.
- Identifying paths taken by individual visitors.
- Measuring "stickiness"
(repeat visits).
- Handling cookies.
- Automatic visitor categorization.
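To make the summary statistics in the list above more concrete, here is a minimal Python sketch that computes a few of them. It does not reproduce how any particular analyzer works. The tuple layout (host, path, status, bytes), the function name summarize, and the choice to count only 2xx status codes as successful are all assumptions made for illustration.

```python
from collections import Counter

# Already-parsed entries in an assumed layout: (host, path, status, bytes).
entries = [
    ("205.152.129.34", "/~craven/525prx.htm", 200, 2412),
    ("205.152.129.34", "/~craven/525s.jpg", 200, 740),
    ("68.142.250.47", "/robots.txt", 404, 1046),
    ("68.142.251.42", "/~craven/525est.htm", 200, 4142),
]

def summarize(entries):
    """Return a few summary statistics for a list of parsed entries."""
    statuses = [status for _, _, status, _ in entries]
    return {
        # Assumption: "successful" means a 2xx status code.
        "successful requests": sum(200 <= s < 300 for s in statuses),
        "redirected requests": sum(300 <= s < 400 for s in statuses),
        "failed requests": sum(s >= 400 for s in statuses),
        "distinct files requested": len({path for _, path, _, _ in entries}),
        "bytes transferred": sum(size for _, _, _, size in entries),
        "most requesting hosts": Counter(h for h, _, _, _ in entries).most_common(1),
    }

stats = summarize(entries)
for name, value in stats.items():
    print(f"{name}: {value}")
```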
Log Files In Apache
By default,
log files in Apache
are in Common Log Format.
This format contains a separate line for each request,
composed of several values separated by spaces,
in the form
host ident authuser date request status bytes
A missing value is represented by a hyphen (-).
The values are as follows:
| host | host and domain name or IP number of the client |
| ident | identity information reported by the client, if this is enabled on client and server |
| authuser | userid, if a password-protected document is requested |
| date | date and time of request |
| request | client request line in double quotes (") |
| status | three-digit status code returned to the client |
| bytes | number of bytes in the object returned to the client, not including headers |
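As an illustration of the format just described, the following Python sketch splits one Common Log Format line into its seven values. The regular expression and the names CLF_PATTERN and parse_clf are assumptions introduced here, not part of Apache; the sketch assumes well-formed lines and returns None for anything it cannot match.

```python
import re

# One named group per value: host ident authuser date request status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    # A hyphen stands for a missing value, so map it to None.
    fields = {name: (None if value == "-" else value)
              for name, value in match.groupdict().items()}
    fields["status"] = int(fields["status"])
    if fields["bytes"] is not None:
        fields["bytes"] = int(fields["bytes"])
    return fields

line = ('205.152.129.34 - - [16/Jan/2005:06:03:16 -0500] '
        '"GET /~craven/525prx.htm HTTP/1.0" 200 2412')
entry = parse_clf(line)
print(entry["host"], entry["status"], entry["bytes"])
```

Because the pattern is anchored only at the start of the line, it also accepts lines that carry extra fields after the byte count.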
Various custom log formats can also be defined.
A common alternative is Combined Log Format,
which adds referral information
and browser information.
Here is an example of a few lines from a Combined Log Format file
(reformatted as paragraphs for easier viewing):
205.152.129.34 - - [16/Jan/2005:06:03:16 -0500]
"GET /~craven/525prx.htm HTTP/1.0" 200 2412
"http://search.yahoo.com/search?p=proxy+lis&sm=Yahoo%21+Search&fr=FP-tab-web-t&toggle=1&ei=UTF-8"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
205.152.129.34 - - [16/Jan/2005:06:03:17 -0500]
"GET /~craven/525s.jpg HTTP/1.0" 200 740
"http://525.fims.uwo.ca/~craven/525prx.htm"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
68.142.250.47 - - [16/Jan/2005:06:05:16 -0500]
"GET /robots.txt HTTP/1.0" 404 1046
"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
68.142.251.42 - - [16/Jan/2005:06:05:23 -0500]
"GET /~craven/525est.htm HTTP/1.0" 200 4142
"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
IP numbers are recorded
and not resolved into host and domain names.
ident and authuser
values are generally missing.
Note the "404" (file not found) error in the third entry,
in response to a spider's request for the site's robots.txt file,
which does not exist
(the spider in this case is Slurp,
the Web robot that Yahoo! acquired with Inktomi).
The other responses are all "200" (OK),
for files that can be delivered with no problems.
There is no referrer information in the third and fourth entries;
in the first entry,
we can see the Yahoo! query that led to the page requested;
in the second entry,
the referrer is an HTML page
that requires the requested JPEG file
in order to display completely.
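The reading of these entries can be mirrored in a short Python sketch that pulls out the requested path, the status code, the referrer and the browser string from entries that carry the two extra quoted fields. The pattern ENTRY and the function describe are assumptions introduced for illustration; each sample entry is written on one line here, as it would appear in the actual log file.

```python
import re

# Pattern for one entry with referrer and browser fields appended.
ENTRY = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def describe(line):
    """Return path, status, referrer and agent for one entry, or None."""
    match = ENTRY.match(line)
    if match is None:
        return None
    parts = match.group("request").split()   # method, path, protocol
    referrer = match.group("referrer")
    return {
        "path": parts[1] if len(parts) > 1 else match.group("request"),
        "status": int(match.group("status")),
        "referrer": None if referrer == "-" else referrer,
        "agent": match.group("agent"),
    }

first = ('205.152.129.34 - - [16/Jan/2005:06:03:16 -0500] '
         '"GET /~craven/525prx.htm HTTP/1.0" 200 2412 '
         '"http://search.yahoo.com/search?p=proxy+lis&sm=Yahoo%21+Search'
         '&fr=FP-tab-web-t&toggle=1&ei=UTF-8" '
         '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
third = ('68.142.250.47 - - [16/Jan/2005:06:05:16 -0500] '
         '"GET /robots.txt HTTP/1.0" 404 1046 '
         '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
         'http://help.yahoo.com/help/us/ysearch/slurp)"')

for line in (first, third):
    info = describe(line)
    note = "file not found" if info["status"] == 404 else "delivered"
    print(info["path"], info["status"], note, info["referrer"])
```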
A copy of a sample log file is available on the course SharePoint site
at http://faculty.fims.uwo.ca/craven/lis525/Shared%20Documents/access_log.
The logresolve utility, which is distributed with Apache,
will attempt to translate IP numbers in a log file
into host and domain names.
This utility was used to create the sample resolved log file
at http://faculty.fims.uwo.ca/craven/lis525/Shared%20Documents/access_log_r
(using the command
/usr/bin/logresolve <access_log >access_log_r).
For example,
resolving the log shows that the first two requests
in the sample above came from ns2.co.escambia.fl.us.
In this case,
looking up the IP number with ARIN
is actually more informative,
telling us that it belongs to BellSouth.net, Inc.,
in Atlanta.
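The idea behind resolving a log can be sketched in Python with a reverse DNS lookup on the first value of each line. This is not how logresolve itself is implemented; the function names resolve_host and resolve_line are invented here, and the demonstration uses a stand-in lookup table (an assumption, so that it runs without network access) in place of the real socket.gethostbyaddr call.

```python
import socket

def resolve_host(ip, lookup=socket.gethostbyaddr):
    """Return the host name for ip, or ip unchanged if the lookup fails."""
    try:
        return lookup(ip)[0]        # first element is the primary host name
    except OSError:                 # covers socket.herror and socket.gaierror
        return ip

def resolve_line(line, lookup=socket.gethostbyaddr):
    """Replace the leading IP number of one log line with its host name."""
    host, sep, rest = line.partition(" ")
    return resolve_host(host, lookup) + sep + rest

# Stand-in for DNS so the demonstration needs no network access.
KNOWN = {"205.152.129.34": "ns2.co.escambia.fl.us"}

def fake_lookup(ip):
    if ip not in KNOWN:
        raise socket.herror("host not found")
    return (KNOWN[ip], [], [ip])    # same tuple shape as gethostbyaddr

resolved = resolve_line(
    '205.152.129.34 - - [16/Jan/2005:06:03:17 -0500] '
    '"GET /~craven/525s.jpg HTTP/1.0" 200 740', fake_lookup)
print(resolved)
```

Calling resolve_line without the second argument performs real lookups, which can be slow on a long log; caching results per IP number would be a natural refinement.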
Since log files continue to grow,
they need to be pruned, deleted, or rotated periodically.
Apache has a program rotatelogs
that rotates the log file
without having to restart the server.
Using Analog to Analyze a Log File
Analog is a free log analyzer
with versions that can be run under a variety of operating
systems,
including Windows and Linux.
You can download Analog from
one of many mirror sites.
To install it,
just unzip the downloaded file
and extract the contents to a new directory.
Read the introduction in how-to/startwin/index.html
and/or the guide in docs/Readme.html.
For practical use,
you will need to edit the analog.cfg configuration file,
at least by changing the HOSTNAME
and HOSTURL commands;
unless you replace Analog's sample log file logfile.log with your own log,
you should also change the LOGFILE command
to name the log file that you want to analyze.
Run analog.exe
from Windows or from the Command (DOS) prompt.
To view the report,
open report.html in a browser.
One section that is included in the report by default
is actually of no use for the sample log files:
the Monthly Report,
because each file covers only a few days.
The Domain and Organization reports
are also useless if the program is applied to the raw log file,
which contains only IP numbers, not domain names.
Using Webalizer
Webalizer is another free package for log file analysis.
For this program,
you have to specify parameters
following webalizer.exe in the command line;
for example,
webalizer.exe -n 525.fims.uwo.ca -o u:\ u:\access_log_r
Webalizer will create a file index.html
and some other files
(whose names will all contain the word usage)
in the directory specified
(the root directory of the U drive, in the example above).
It will also create, or update, a file webalizer.hist.
For More Information
- Barrett, B.L. 2006.
Home of the Webalizer.
http://www.mrunix.net/webalizer/.
- Delio, M. 2002.
"When the Spam Hits the Blogs".
Wired News.
http://www.wired.com/news/culture/0,1284,56017,00.html.
(Notes that the referrer field may be faked.)
- Hostway. 2007.
Urchin Web Statistics and Log Analysis Tool.
http://www.hostway.com/smb/web-analytics/index.html.
(US$5/month to add this service to a Hostway hosting plan.)
- McDunn, R.A. 2007.
Web Server Log File Analysis.
http://www.si.umich.edu/Classes/540/Readings/ServerLogFileAnalysis.htm.
(A basic introduction.)
- Scheeres, J. 2001.
"Follow your e-mail everywhere".
Wired News.
http://www.wired.com/news/privacy/0,1848,41686,00.html.
(How to trace HTML e-mail recipients.)
- Turner, S. 2005.
Analog: WWW Logfile Analysis.
http://www.analog.cx/
(and many mirror sites).
("The most popular logfile analyser in the world";
free.)
- University of Western Ontario. 2006.
Information Technology Services - Web Usage Statistics.
http://www.uwo.ca/Usage/servers.html.
(Links to various current summaries of requests for files
on the university's Web servers.)
- Uppsala Universitet. 2004.
Access Log Analyzers.
http://www.uu.se/Software/Analyzers/Access-analyzers.html.
(Links to information on log file analyzers
for Unix and Windows.)
- Winett, B. 2007.
Tracking Tutorial.
Lycos.
http://hotwired.lycos.com/webmonkey/e-business/tracking/tutorials/tutorial2.html.
(What tracking visitors to a site can tell you,
and its limitations.)
Last updated October 31, 2007.
This page maintained by Prof. Tim Craven
E-mail (text/plain only): craven@uwo.ca
Faculty of Information and Media Studies
University of Western Ontario,
London, Ontario
Canada, N6A 5B7