Webalizer – Apache web server log file analysis tool

The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser.

Install Webalizer in Ubuntu

sudo apt-get install webalizer
This will complete the installation.

Configuring Webalizer

Enable Apache2 hostname resolution. To do this, edit the /etc/apache2/apache2.conf file and change

sudo vi /etc/apache2/apache2.conf

HostnameLookups Off

into

HostnameLookups On
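
The same change can be scripted. The following is only a minimal sketch, assuming the directive sits on its own line as "HostnameLookups Off"; the function name enable_hostname_lookups and the scratch file are made up for illustration. It keeps a .bak copy of the file it edits.

```shell
#!/bin/sh
# Flip "HostnameLookups Off" to "HostnameLookups On" in the given file.
# A copy of the original is kept next to it with a .bak suffix.
enable_hostname_lookups() {
    conf="$1"
    cp "$conf" "$conf.bak" || return 1
    sed 's/^\([[:space:]]*HostnameLookups[[:space:]]\{1,\}\)Off[[:space:]]*$/\1On/' \
        "$conf.bak" > "$conf"
}

# Try it on a scratch file first (hypothetical sample content).
sample=$(mktemp)
printf '%s\n' 'Timeout 300' 'HostnameLookups Off' > "$sample"
enable_hostname_lookups "$sample"
grep HostnameLookups "$sample"
```

To apply it to the real file, call the function as root on /etc/apache2/apache2.conf. Keep in mind that hostname lookups add a DNS query to each request, so this setting may slow down a busy server.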

By default the package installs a daily cron job that processes the log files once a day. It always runs after the default Apache log file rotation, which means that instead of examining the current log file /var/log/apache2/access.log it uses the previous one, /var/log/apache2/access.log.1

If you want Webalizer to read the current Apache log file instead, you need to edit /etc/webalizer/webalizer.conf

sudo vi /etc/webalizer/webalizer.conf

Change the LogFile parameter from

LogFile /var/log/apache2/access.log.1

to

LogFile /var/log/apache2/access.log

and set the remaining parameters as follows:

OutputDir /home/www/webalizer

Incremental yes

PageType htm*

PageType cgi

PageType php

HideURL *.gif

HideURL *.GIF

HideURL *.jpg

HideURL *.JPG

HideURL *.ra

IgnoreURL /taskbar*
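
To double-check the result of these edits, it can help to list only the active lines of the configuration file. This is a small sketch, assuming comments start with "#" as they do in the stock webalizer.conf; the function name active_settings and the sample content are made up for illustration.

```shell
#!/bin/sh
# Print only the active directives of a Webalizer config file,
# skipping blank lines and lines that start with "#".
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demonstration on a scratch file (hypothetical sample content).
conf=$(mktemp)
printf '%s\n' '# sample' 'LogFile /var/log/apache2/access.log' '' \
    'OutputDir /home/www/webalizer' '#Incremental no' 'Incremental yes' > "$conf"
active_settings "$conf"
```

On the server, the argument would be /etc/webalizer/webalizer.conf.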

Each parameter in detail

LogFile /var/log/apache2/access.log

The option LogFile specifies the log file to use with Webalizer. The default is expected to be the access log of the Apache web server, but you can specify a different one, such as the access.log that the Squid proxy server writes when you use it in httpd-accelerator mode.

OutputDir /home/www/webalizer

The option OutputDir specifies the location of the output directory to use for the reports of Webalizer. All present and future report files generated by the Webalizer program will be hosted in this directory. It is recommended that you create this directory where your Apache web site resides.
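
The output directory has to exist and be writable by whoever runs Webalizer. A minimal sketch of that check follows; the function name prepare_output_dir is made up, and the demonstration uses a scratch location rather than the real /home/www/webalizer.

```shell
#!/bin/sh
# Create the report directory if it is missing and confirm that the
# current user can write to it.
prepare_output_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ -w "$dir" ]; then
        echo "ready: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}

# Demonstration in a scratch location; on the server the argument would be
# the OutputDir from webalizer.conf, e.g. /home/www/webalizer.
scratch=$(mktemp -d)
prepare_output_dir "$scratch/webalizer"
```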

Incremental yes

The option Incremental, if set to yes, tells the program to save its state between runs so that it can process partial log files. This allows you to rotate your log files as often as you want without losing access information. It's recommended to set this option to yes.

PageType htm* cgi php

The option PageType specifies which file extensions Webalizer should count as a page. Each file extension must be specified on its own line, as shown in the Webalizer configuration file above.

HideURL *.gif *.GIF *.jpg *.JPG *.ra

The option HideURL specifies which items, such as graphics, audio files or other non-HTML files, to hide from the reports. Each item must be specified on its own line, as shown in the Webalizer configuration file above.

IgnoreURL /taskbar*

The option IgnoreURL specifies URLs to be left out of the generated statistics reports completely. This option can be used to ignore directories that are not important in our statistics reports. It's also useful when you want to decide which URLs should be monitored and which should be ignored.

After configuring all the required options, restart the Apache2 server using

sudo /etc/init.d/apache2 restart

Running Webalizer manually the first time

Now it's time to run the program to generate the reports (HTML pages and graphics) in the Webalizer output directory so that we can view them in a web browser. This manual step is needed only the first time you install and use Webalizer; afterwards it is preferable to let a cron job automate the task. To run Webalizer manually and generate the reports, use the following command

webalizer

At this stage, we should verify that Webalizer is working on the system. To do that, point your web browser to the following address:

http://my-web-server/webalizer/

Here my-web-server is the address where your Apache web server lives, and webalizer is the directory that hosts all the Webalizer report files.
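
The same check can be made from the command line before opening a browser. This is a sketch under the assumption that the default HTML extension is in use, so the summary page is called index.html; the function name check_report and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Report whether a non-empty summary page exists in the given directory.
# Assumes the default HTML extension, so the summary page is index.html.
check_report() {
    dir="$1"
    if [ -s "$dir/index.html" ]; then
        echo "report found: $dir/index.html"
    else
        echo "no report in $dir" >&2
        return 1
    fi
}

# Demonstration on a scratch directory standing in for the OutputDir.
out=$(mktemp -d)
check_report "$out" || echo "run webalizer first"
echo '<html></html>' > "$out/index.html"
check_report "$out"
```

On the server, the argument would be the OutputDir from webalizer.conf, for example /home/www/webalizer.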

This works well for single sites, but if you have a group of websites all on the same machine you might need to make some changes.

One way to handle multiple websites on one host is to place all the files beneath a common directory, /home/www, such as:

/home/www/

www.domain1.com/logs/ and www.domain1.com/stats/

www.domain2.com/logs/ and www.domain2.com/stats/

Here we have two sites, www.domain1.com and www.domain2.com. Each has its own logs/ and stats/ subdirectories, holding the Apache log files and the generated stat files.

To handle this, simply copy the default webalizer.conf file from /etc/webalizer into each of the log directories:

cp /etc/webalizer/webalizer.conf /home/www/www.domain1.com/logs

cp /etc/webalizer/webalizer.conf /home/www/www.domain2.com/logs

Now edit each copy of the configuration file so that it contains:

LogFile access.log

OutputDir ../stats/

You can then update the stats by running

cd /home/www/www.domain1.com/logs

webalizer -q

cd /home/www/www.domain2.com/logs

webalizer -q

(The -q flag merely makes the program run quietly).
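
The per-site edit to LogFile and OutputDir described above can also be scripted. This is only a sketch, assuming both lines are already present and uncommented in the copied file; the function name localize_conf and the sample content are made up for illustration. The original is kept with a .orig suffix.

```shell
#!/bin/sh
# Rewrite the LogFile and OutputDir lines of a copied webalizer.conf so
# that they use the per-site relative paths.
localize_conf() {
    conf="$1"
    cp "$conf" "$conf.orig" || return 1
    sed -e 's|^[[:space:]]*LogFile[[:space:]].*|LogFile access.log|' \
        -e 's|^[[:space:]]*OutputDir[[:space:]].*|OutputDir ../stats/|' \
        "$conf.orig" > "$conf"
}

# Demonstration on a scratch copy (hypothetical sample content).
site=$(mktemp -d)
printf '%s\n' 'LogFile /var/log/apache2/access.log.1' \
    'OutputDir /var/www/webalizer' 'Incremental yes' > "$site/webalizer.conf"
localize_conf "$site/webalizer.conf"
cat "$site/webalizer.conf"
```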

These two commands can be placed inside a shell script and invoked automatically by a cron job belonging to a user who can write to the stats directory. You can then remove the default job by running

rm /etc/cron.daily/webalizer
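
A shell script of the kind mentioned above might look like the following. It is a sketch rather than a finished tool: it assumes the /home/www/<site>/logs layout from this article and a webalizer.conf in each logs directory. The function name update_all_sites and the WEBALIZER override variable are made up for illustration.

```shell
#!/bin/sh
# Run "webalizer -q" inside every <root>/<site>/logs directory that holds
# its own webalizer.conf. WEBALIZER can be overridden for testing.
WEBALIZER="${WEBALIZER:-webalizer}"

update_all_sites() {
    root="$1"
    for logs in "$root"/*/logs; do
        # Skip directories without a per-site configuration file.
        [ -f "$logs/webalizer.conf" ] || continue
        # The subshell keeps the cd from leaking into the next iteration.
        ( cd "$logs" && "$WEBALIZER" -q ) ||
            echo "webalizer failed in $logs" >&2
    done
}

# Default to the directory layout used in this article.
update_all_sites "${1:-/home/www}"
```

Saved under a name of your choice and made executable, the script can be called from the crontab of a user who can write to the stats directories. Sites added later under the same root are picked up without editing the script.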


15 Responses

  1. Lee Dickey says:

    Poor choice of Web stat applications. Webalizer is no longer in development and hasn’t been since 2002. AWstats would have been a more logical choice for this tutorial.

  2. Jose Garcia says:

    Lee is mistaken, and even if he were not, a good tool is valuable regardless of its development status.

  3. Scott says:

    Webalizer was just updated in mid July 2008, and the Ubuntu package will probably not be getting this update for a while. Any chance you would be willing to tackle upgrading to the new version? According to the readme, installing the new version is as easy as copying it over the old release. That would be an interesting how-to. 😉

  4. ienabellamy says:

    Hello! I followed the whole tutorial, but when I run “webalizer -q” (with sudo) in the “logs” directory, I get this error:
    Error: Can’t open log file access.log

    So I created the file with “touch access.log”, but now I get: “No valid records found!”

    Any help? Thanks.

  5. ienabellamy says:

    I resolved it. First, I told the virtual host to save the error log and the access log in the virtual host's own directory, and not in /var/log/apache2/……

    Second, the “No valid records found” error appears because Webalizer cannot create stats from an access.log that has not recorded any connections.

    So, connect to your virtual host from a proxy, and afterwards run sudo webalizer -q again 🙂

  6. lee.j says:

    Poor choice of Web stat applications. Webalizer is no longer in development and hasn’t been since 2002. AWstats would have been a more logical choice for this tutorial.

    I have to disagree there. I have found it very powerful, and it returns exactly what I want.
    As for AWStats, I found it overly complex for a log display system.

  7. Rob says:

    Hello
    thanks for this page.

    I did some tests, and at least with Debian Etch and webalizer version 2.01.10-32, the daily cron script checks the current log when LogFile is set to /var/log/apache2/access.log.1

    so this part could be changed above:

    vi /etc/webalizer/webalizer.conf

    ## the cron script seems to check current log file always.
    # this WILL get current and 1-st rotated log:
    LogFile /var/log/apache2/access.log.1

  8. Rob says:

    I was wrong, ignore the previous post.

    The current log is not read when LogFile is set to /var/log/apache2/access.log.1

    So I think webalizer needs to be run before the logs are rotated. To do so, logrotate in /etc/cron.daily must be run after webalizer; otherwise access.log could get rotated before webalizer processes it.
    I think /etc/cron.daily scripts are run in alphabetical order, so /etc/cron.daily/webalizer should be renamed to /etc/cron.daily/00webalizer. At least that is what I’ll try.

  9. Rob says:

    I forgot to write: set this
    LogFile /var/log/apache2/access.log

  10. John says:

    Hello Everyone,

    I finally got a chance to configure Webalizer and I'm running into a bit of a dilemma. Here is my scenario:
    I’m running 2 websites (virtual server) on my ubuntu 8.04 server. both sites are already up and running http://www.johnlauder.com & http://www.statewideinvestors.com
    My server hostname = server01.statewideinvestors.com

    both sites have their own subdirectory …/site1/ & …/site2/
    I have copied webalizer.conf together with access.log to …/site1/webstats/ & /site2/webstats/

    I have also edited webalizer.conf on each site changing:
    – Logfile access.log
    – OutputDir …/site1/webstats (for 1st site webalizer.conf) & …/site2/webstats (for 2nd site webalizer.conf)
    – Incremental yes
    – HostName http://www.statewideinvestors.com (site1) & http://www.johnlauder.com (site2)

    my problem, when i check the sites:
    http://www.johnlauder.com/webstats —-> it shows server01.statewideinvestors.com
    http://www.statewideinvestors.com/webstats —-> it also shows server01.statewideinvestors.com

    I want it to show the individual site statistics NOT the server01 stats.

    As I understand it, HostName sets the hostname shown in the report if specified (which I did), and I don't know why it is not doing so. I have also run webalizer and webalizer -q (which I think have the same effect <– please confirm) in each site's directory, and I have rebooted and restarted the server, but no luck.

    The other thing I don't understand is that when I ran Webalizer, it showed “Hostname for report is ‘www.johnlauder.com'” and “Hostname for report is ‘www.statewideinvestors.com'” respectively.

    Anything I missed? Please help, and thank you in advance.

    John

  11. John says:

    Resolved.
    Just in case someone in the future is in the same scenario as mine and has the same problem: change the Logfile directory from:

    Logfile access.log (as mentioned above)

    to Logfile …/site1/log/access.log
    & Logfile …/site2/log/access.log

    I also edited the …/sites-available/site1 and …/sites-available/site2 config files.

    On the CustomLog line, I pointed to the directory above.

    in my case: CustomLog …/site1/log/access.log
    and
    CustomLog …/site2/log/access.log

    restart everything and it works… Thanks for the howto page.

  12. gauri says:

    whenever I run webalizer it shows:
    Webalizer V2.01-10 (Linux 2.6.27-9-generic) locale: en_IN
    Using logfile /var/log/apache2/access.log (clf)
    Using default GeoIP database
    Creating output in /var/www/webalizer
    Hostname for reports is ‘bios-4’
    Reading history file… webalizer.hist
    Generating report for March 2009
    Generating summary report
    Saving history information…
    670 records (660 ignored) in 1.00 seconds, 670/sec

    But no new report is shown in the HTML pages.
    Why are the records getting ignored?

  13. Djoh says:

    Is it really necessary to enable Apache2 hostname resolution?

    OMFG, what about the website's performance??? This is a really bad idea…

  14. lol says:

    for multiple sites, set up a different config file and specify the hostname that you want displayed and the logs for each site, and then schedule a cron job to run with webalizer -c webalize-this-site.conf
    You will also want to configure different output directories or subdirectories for each site so it doesn’t overwrite things.

  15. Hentry says:

    Hi, I am new to Webalizer. This may be a basic or even silly question for you, but it is important to me. One of my colleagues was working on my Linux-based web server, and he has left the job.

    How do I know whether my Webalizer is working or not? It has been a year since I generated a report.

    If it is, where does it keep the data file for hits and views (not the HTML output), the file that is used to generate the output?

    Please, guys, it's very urgent.

    Thanks
