Fast web log analyzer and interactive viewer.
goaccess [-f input-file][-c][-r][-d][-m][-q][-o][-h][...]
goaccess is a free (GPL) real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems. It provides fast and valuable HTTP statistics for system administrators who require a visual server report on the fly. GoAccess parses the specified web log file and outputs the data to the X terminal. Features include:
Number of valid requests, number of invalid requests, time to analyze the data, unique visitors, unique requested files, unique static files (css, ico, jpg, js, swf, gif, png), unique HTTP referrers (URLs), unique 404s (not found), size of the parsed log file, bandwidth consumption.
HTTP requests having the same IP, same date and same agent will be considered a unique visit. This includes crawlers.
Hit totals are based on total requests. This module will display hits, percent, bandwidth, [time served], [protocol] and [method].
Hit totals are based on total requests. Includes files such as: jpg, css, swf, js, gif, png etc. This module will display hits, percent, bandwidth, [time served], [protocol] and [method].
Hit totals are based on total requests. This module will display hits, percent, bandwidth, [time served], [protocol] and [method].
Hit totals are based on total requests. This module will display hits, percent, [bandwidth, time served]. The expanded module can display extra information such as reverse DNS and country. If -a is enabled, a list of user agents will be displayed by selecting the IP and hitting the return key.
Hit totals are based on unique visitors. This module will display hits and percent. The expanded module shows all available versions of the parent node.
Hit totals are based on unique visitors. This module will display hits and percent. The expanded module shows all available versions of the parent node.
The URL where the request came from. Hit totals are based on total requests. This module will display hits and percent.
The URL where the request came from. This module will display only the host, not the whole URL. Hit totals are based on total requests. This module will display hits and percent.
This module will report keyphrases used on Google search, Google cache, and Google translate. Hit totals are based on total requests. This module will display hits and percent.
Determines where an IP address is geographically located. It outputs the continent and country. If it's unable to determine the country, location will be marked as unknown.
The numeric status codes returned in response to HTTP requests. Hit totals are based on total requests. This module will display hits and percent.
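The unique-visitor rule described earlier (same IP, same date and same agent) can be approximated outside GoAccess with a short awk pipeline. This is only a sketch and not GoAccess's own implementation. It assumes a log in Combined Log Format, and the file name sample_access.log and the function name count_unique_visitors are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample log in Combined Log Format (three requests, two visitors).
cat > sample_access.log <<'EOF'
10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [05/Dec/2010:10:05:00 +0000] "GET /a.css HTTP/1.1" 200 128 "-" "Mozilla/5.0"
10.0.0.2 - - [05/Dec/2010:11:00:00 +0000] "GET / HTTP/1.1" 404 64 "-" "curl/7.0"
EOF

# Count distinct (IP, date, user agent) triples.
# Splitting on double quotes puts the user agent in field 6 and the
# leading "IP - - [date:time zone] " text in field 1.
count_unique_visitors() {
    awk -F'"' '
    {
        split($1, head, " ")          # head[1] = IP, head[4] = [05/Dec/2010:10:00:00
        split(head[4], stamp, ":")    # stamp[1] = [05/Dec/2010
        key = head[1] SUBSEP stamp[1] SUBSEP $6
        if (!(key in seen)) { seen[key] = 1; n++ }
    }
    END { print n + 0 }' "$1"
}

count_unique_visitors sample_access.log
```

The first two requests share IP, date and agent, so they collapse into a single visit, while the third counts separately.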
There are three storage options that can be used with GoAccess. Choosing one will depend on your environment and needs.
GLib Hash Tables
By default GoAccess uses GLib Hash Tables. If your dataset can fit in memory, then this will perform fine. It has average memory usage and pretty good performance. For better performance, with a memory trade-off, see the Tokyo Cabinet on-memory hash database.
Tokyo Cabinet On-Disk B+ Tree
Use this storage method for large datasets where it is not possible to fit everything in memory. The B+ tree database is slower than any of the hash databases since it has to hit the disk. However, using an SSD greatly increases the performance. You may also use this storage method if you need data persistence to quickly load statistics at a later date.
Tokyo Cabinet On-Memory Hash Database
Although this may vary across different systems, in general the on-memory hash database should perform slightly better than GLib Hash Tables.
Multiple options can be used to configure GoAccess. For a complete up-to-date list of configure options, run ./configure --help
--enable-debug
Compile with debugging symbols and turn off compiler optimizations.
--enable-utf8
Compile with wide character support. Ncursesw is required.
--enable-geoip
Compile with GeoLocation support. MaxMind's GeoIP is required.
--enable-tcb=<memhash|btree>
Compile with Tokyo Cabinet storage support. memhash will utilize Tokyo Cabinet's on-memory hash database. btree will utilize Tokyo Cabinet's on-disk B+ Tree database.
--disable-zlib
Disable zlib compression on B+ Tree database.
--disable-bzip
Disable bzip2 compression on B+ Tree database.
The following options can be supplied on the command line. The long options can also be set in the configuration file.
--date-format=<dateformat>
The date_format variable, followed by a space, specifies the date format used in the log. It may contain any combination of regular characters and special format specifiers, all of which begin with a percent (%) sign. See `man strftime`.
Note that there is no need to use time specifiers since they are not used by GoAccess. It's recommended to use only date specifiers, e.g., %Y-%m-%d.
--log-format=<logformat>
The log_format variable, followed by a space or \t for tab-delimited, specifies the log format string.
Note that if there are spaces within the format, the string needs to be enclosed in double quotes. Inner quotes need to be escaped.
-c --config-dialog
Prompt log/date configuration window on program start.
--color-scheme=<1|2>
Choose among color schemes. 1 for the default grey scheme. 2 for the green scheme.
--no-color
Turn off colored output. This is the default output on terminals that do not support colors.
-f --log-file=<logfile>
Specify the path to the input log file. If set in the config file, it will take priority over -f from the command line.
--debug-file=<debugfile>
Send all debug messages to the specified file. GoAccess needs to be configured with --enable-debug.
--config-file=<configfile>
Specify a custom configuration file to use. If set, it will take priority over the global configuration file (if any).
--no-global-config
Do not load the global configuration file. The global configuration file is normally located under /usr/local/etc, unless a different directory was specified with --sysconfdir=/dir.
-e --exclude-ip=<IP|IP-range>
Exclude one or more IPv4/IPv6 addresses, including IP ranges, e.g., 192.168.0.1-192.168.0.10
-a --agent-list
Enable a list of user-agents by host. For faster parsing, do not enable this flag.
-M --http-method
Include HTTP request method if found. This will create a request key containing the request method + the actual request.
-H --http-protocol
Include HTTP request protocol if found. This will create a request key containing the request protocol + the actual request.
-q --no-query-string
Ignore the request's query string, e.g., www.google.com/page.htm?query => www.google.com/page.htm
-r --no-term-resolver
Disable IP resolver on terminal output.
-o --output-format=<json|csv>
Write output to stdout in one of the following formats:
csv : Comma-separated values (CSV)
json : JSON (JavaScript Object Notation)
--real-os
Display real OS names, e.g., Windows XP, Snow Leopard.
--static-file=<extension>
Add static file extension, e.g., .mp3. Extensions are case sensitive.
--ignore-crawlers
Ignore crawlers.
--no-progress
Disable progress metrics [total requests/requests per second].
-m --with-mouse
Enable mouse support on main dashboard.
-d --with-output-resolver
Enable IP resolver on HTML|JSON output.
-g --std-geoip
Standard GeoIP database for less memory usage.
--geoip-city-data=<geocityfile>
Specify the path to the GeoIP City database file, e.g., GeoLiteCity.dat. The file needs to be downloaded from maxmind.com.
--keep-db-files
Persist parsed data to disk. This should be set when parsing the first dataset, prior to using `load-from-disk`. Setting it to false will delete all database files when exiting the program.
Only if configured with --enable-tcb=btree
--load-from-disk
Load previously stored data from disk. Database files need to exist. See keep-db-files.
Only if configured with --enable-tcb=btree
--db-path=<dir>
Path where the on-disk database files are stored. The default value is the /tmp directory.
Only if configured with --enable-tcb=btree
--xmmap=<num>
Set the size in bytes of the extra mapped memory. The default value is 0.
Only if configured with --enable-tcb=btree
--cache-lcnum=<num>
Specifies the maximum number of leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 1024. Setting a larger value will improve speed but increase memory consumption. A lower value will decrease memory consumption.
Only if configured with --enable-tcb=btree
--cache-ncnum=<num>
Specifies the maximum number of non-leaf nodes to be cached. If it is not more than 0, the default value is used. The default value is 512.
Only if configured with --enable-tcb=btree
--tune-lmemb=<num>
Specifies the number of members in each leaf page. If it is not more than 0, the default value is used. The default value is 128.
Only if configured with --enable-tcb=btree
--tune-nmemb=<num>
Specifies the number of members in each non-leaf page. If it is not more than 0, the default value is used. The default value is 256.
Only if configured with --enable-tcb=btree
--tune-bnum=<num>
Specifies the number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 32749. The suggested size of the bucket array is about 1 to 4 times the number of all pages to be stored.
Only if configured with --enable-tcb=btree
--compression=<zlib|bz2>
Specifies that each page is compressed with ZLIB|BZ2 encoding.
Only if configured with --enable-tcb=btree
-h --help
Display the help text.
-V --version
Display version information and exit.
-s --storage
Display the current storage method, e.g., B+ Tree, Hash.
GoAccess can parse virtually any web log format.
Predefined options include Common Log Format (CLF), Combined Log Format (XLF/ELF), including virtual host, Amazon CloudFront (Download Distribution) and W3C format (IIS).
GoAccess allows any custom format string as well.
There are two ways to configure the log format. The easiest is to run GoAccess with -c to prompt a configuration window. Otherwise, it can be configured under ~/.goaccessrc.
The date_format variable, followed by a space, specifies the date format used in the log. It may contain any combination of regular characters and special format specifiers, all of which begin with a percent (%) sign. See http://linux.die.net/man/3/strftime
Note that there is no need to use time specifiers since they are not used by GoAccess. It's recommended to use only date specifiers, e.g., %Y-%m-%d.
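One quick way to sanity-check a date format string before handing it to GoAccess is to render the current date with the same specifiers and compare its shape with the dates in the log. The following is a minimal sketch, assuming a POSIX date utility; the helper name render_date is hypothetical.

```shell
#!/bin/sh
# Render today's date with a strftime-style format string.
# LC_ALL=C keeps month abbreviations in English (Jan, Feb, ...).
render_date() {
    LC_ALL=C date "+$1"
}

render_date '%d/%b/%Y'    # same shape as 05/Dec/2010
render_date '%Y-%m-%d'    # same shape as 2010-12-05
```

If the printed shape differs from the date portion of a log line, the format string most likely needs adjusting.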
The log_format variable, followed by a space or \t, specifies the log format string.
%d date field matching the date_format variable.
%h host (the client IP address, either IPv4 or IPv6)
%r The request line from the client. This requires specific delimiters around the request (such as single quotes, double quotes, or anything else) to be parsable. If not, a combination of special format specifiers such as %m %U %H has to be used.
%m The request method.
%U The URL path requested (including any query string).
%H The request protocol.
%s The status code that the server sends back to the client.
%b The size of the object returned to the client.
%R The "Referer" HTTP request header.
%u The user-agent HTTP request header.
%D The time taken to serve the request, in microseconds.
%T The time taken to serve the request, in seconds or milliseconds. Note: %D will take priority over %T if both are used.
%^ Ignore this field.
GoAccess requires the following fields:
%h a valid IPv4/6
%d a valid date
%s server status code
%r the request
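Putting the required fields together, a configuration for Apache's Combined Log Format might look like the sketch below. The file name goaccessrc.sample is an assumption for illustration, and the format string should be checked against the actual log lines before use. In the format string, %^ skips a field, while %b, %R and %u stand for the response size, the referrer and the user agent.

```shell
#!/bin/sh
# Write a sample configuration file (hypothetical name) for logs shaped like:
# 10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
cat > goaccessrc.sample <<'EOF'
date_format %d/%b/%Y
log_format %h %^[%d:%^] "%r" %s %b "%R" "%u"
EOF

# Show the result for review.
cat goaccessrc.sample
```

The file could then be passed with the --config-file option, for example `goaccess -f access.log --config-file=goaccessrc.sample`.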
Main help.
Redraw main window.
Quit the program, current window or collapse active module
Expand selected module or open window
Set selected module to active
Scroll down within expanded module
Scroll up within expanded module
Set or change scheme color.
Forward iteration of modules. Starts from current active module.
Backward iteration of modules. Starts from current active module.
Scroll forward one screen within an active module.
Scroll backward one screen within an active module.
Sort options for active module
Search across all modules (regex allowed)
Find the position of the next occurrence across all modules.
Move to the first item or top of screen.
Move to the last item or bottom of screen.
The simplest and fastest usage would be:
# goaccess -f access.log
That will generate an interactive text-only output.
To generate full statistics we can run GoAccess as:
# goaccess -f access.log -a
To generate an HTML report:
# goaccess -f access.log -a > report.html
To generate a JSON file:
# goaccess -f access.log -a -d -o json > report.json
To generate a CSV file:
# goaccess -f access.log -o csv > report.csv
The -a flag indicates that we want to process an agent-list for every host parsed.
The -d flag indicates that we want to enable the IP resolver on the HTML | JSON output. (It will take longer to output since it has to resolve all queries.)
The -c flag will prompt the date and log format configuration window, but only when curses is initialized.
Now if we want to add more flexibility to GoAccess, we can do a series of pipes. For instance:
If we would like to process all access.log.*.gz we can do:
# zcat access.log.*.gz | goaccess
OR
# zcat -f access.log* | goaccess
Another useful pipe filters the web log by date.
The following will get all HTTP requests starting on 05/Dec/2010 until the end of the file.
# sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a
If we want to parse only a certain time-frame from DATE a to DATE b, we can do:
# sed -n '/05\/Nov\/2010/,/05\/Dec\/2010/ p' access.log | goaccess -a
Note that this could take longer to parse depending on the speed of sed.
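To see what a range filter passes along before piping it into GoAccess, the same kind of sed expression can be tried on a small sample. The log lines, the file name and the helper name below are invented for illustration. Note that a sed range ends at the first line matching the end pattern, so only the first request from the end date is included.

```shell
#!/bin/sh
# Hypothetical sample log spanning several days.
cat > sample_range.log <<'EOF'
10.0.0.1 - - [04/Nov/2010:09:00:00 +0000] "GET /old HTTP/1.1" 200 10 "-" "-"
10.0.0.1 - - [05/Nov/2010:09:00:00 +0000] "GET /a HTTP/1.1" 200 10 "-" "-"
10.0.0.2 - - [20/Nov/2010:09:00:00 +0000] "GET /b HTTP/1.1" 200 10 "-" "-"
10.0.0.3 - - [05/Dec/2010:09:00:00 +0000] "GET /c HTTP/1.1" 200 10 "-" "-"
10.0.0.3 - - [05/Dec/2010:09:30:00 +0000] "GET /d HTTP/1.1" 200 10 "-" "-"
10.0.0.4 - - [06/Dec/2010:09:00:00 +0000] "GET /e HTTP/1.1" 200 10 "-" "-"
EOF

# Print from the first line matching the start date through the first
# line matching the end date (both inclusive).
filter_range() {
    sed -n "/$1/,/$2/ p" "$3"
}

filter_range '05\/Nov\/2010' '05\/Dec\/2010' sample_range.log
```

On this sample the requests for /a, /b and /c are printed. To include every request from the end date, one option is to use the following day as the end pattern and accept one extra line.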
To exclude a list of virtual hosts you can do the following:
# grep -v "`cat exclude_vhost_list_file`" vhost_access.log | goaccess
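A variant that may be more robust reads the patterns directly from the file with grep's -f option and treats them as fixed strings with -F, so that dots in host names are not interpreted as regex wildcards. This is a sketch with made-up file contents; the file names follow the command above.

```shell
#!/bin/sh
# Hypothetical virtual hosts to exclude, one per line.
cat > exclude_vhost_list_file <<'EOF'
staging.example.com
internal.example.com
EOF

# Hypothetical log with the virtual host as the first field.
cat > vhost_access.log <<'EOF'
www.example.com 10.0.0.1 - - [05/Dec/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "-"
staging.example.com 10.0.0.2 - - [05/Dec/2010:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "-"
internal.example.com 10.0.0.3 - - [05/Dec/2010:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "-"
EOF

# -v inverts the match, -F disables regex interpretation,
# -f reads one pattern per line from the given file.
grep -v -F -f exclude_vhost_list_file vhost_access.log
```

Only the www.example.com line remains, and the output can be piped to goaccess in the same way as before.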
Also, it is worth pointing out that if we want to run GoAccess at lower priority, we can run it as:
# nice -n 19 goaccess -f access.log -a
and if you don't want to install it on your server, you can still run it from your local machine:
# ssh root@server 'cat /var/log/apache2/access.log' | goaccess -a
For now, each active window has a total of 300 items. Eventually this will be customizable.
Piping a log to GoAccess will disable the real-time functionality. This is due to a portability issue in determining the actual size of STDIN. However, a future release *might* include this feature.
If you think you have found a bug, please send an email to [email protected] or use the issue tracker at https://github.com/allinurl/goaccess/issues
Gerardo Orellana <[email protected]>
For more details, or for new releases, please visit http://goaccess.prosoftcorp.com