Analyzing nginx logs with goaccess on CentOS 7

Posted by 云谷计算 on January 16, 2018

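Access logs from web servers such as Nginx and Apache contain a lot of key information that can be used for performance or business analysis. The EPEL repository for CentOS 7 ships goaccess, a very handy access-log analysis tool. This post records how to use goaccess to analyze nginx logs.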


Installation

# Install the EPEL repository
yum install -y epel-release && yum -y update

# Install goaccess
yum install -y goaccess

Configuration

By default nginx writes its access log to /var/log/nginx/access.log. The basic usage is:

goaccess /var/log/nginx/access.log

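The first time you run goaccess, a dialog pops up asking for the time and log format of the access log. The default nginx log format is NCSA, so picking the first entry is usually enough. Once the analysis is done, goaccess shows the results in the terminal.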

Goaccess can also generate a report in HTML. Unfortunately it currently seems to support English only. The command is very simple:

goaccess /var/log/nginx/access.log > goaccess.html

The generated report includes statistics such as pv, hits and agents.
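The report can also be produced without the interactive format dialog. The sketch below assumes goaccess 1.x, where --log-format accepts the predefined name COMBINED and -o takes an output file name; the LOG and REPORT variables are illustrative names, not something goaccess requires.

```shell
# Assumed paths; adjust them to your system.
LOG=/var/log/nginx/access.log
REPORT=goaccess.html

# COMBINED is goaccess's predefined name for the NCSA Combined Log Format.
# -o names the output file; the .html extension selects the HTML report.
# The guard skips the run when goaccess or the log file is not available.
if command -v goaccess >/dev/null 2>&1 && [ -r "$LOG" ]; then
    goaccess "$LOG" --log-format=COMBINED -o "$REPORT"
fi
```

Passing the format on the command line keeps the run scriptable, for example from a cron job.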

Advanced usage

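Goaccess also supports custom log formats. For example, most nginx deployments today use vhosts, i.e. one physical nginx server serves virtual hosts for several domains. However, the default NCSA log format that nginx configures in /etc/nginx/nginx.conf, shown below, has no vhost field, so the report cannot distinguish requests to different virtual hosts: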

log_format main
'$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';

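To support vhosts, I customized nginx's log_format as follows. The main changes are the added $server_name field, which records the vhost, and the trailing $request_time and $upstream_response_time fields, which record how long each request took: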

log_format main
'[$time_local] $remote_addr $server_name "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'$upstream_addr $request_time $upstream_response_time';

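For goaccess to parse this format, change the log-format setting in /etc/goaccess.conf as follows (nginx also needs to be reloaded for the new log_format to take effect):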

# Custom log format matching the nginx log_format above
# (NCSA Combined fields, reordered, plus the virtual host)
log-format %^[%d:%t %^] %h %v "%r" %s %b "%R" "%u"
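The same settings can be tried from the command line before editing the configuration file. This is a minimal sketch: the log-format string is copied from the configuration above, while the date and time formats are my assumption for nginx's $time_local layout (for example 16/Jan/2018:10:00:00 +0800). The variable names and the vhost-report.html file name are illustrative.

```shell
# Assumed log path; adjust it to your system.
LOG=/var/log/nginx/access.log

# Custom format from /etc/goaccess.conf, quoted so the shell leaves it alone.
LOG_FORMAT='%^[%d:%t %^] %h %v "%r" %s %b "%R" "%u"'
DATE_FORMAT='%d/%b/%Y'   # matches e.g. 16/Jan/2018
TIME_FORMAT='%H:%M:%S'   # matches e.g. 10:00:00

# The guard skips the run when goaccess or the log file is not available.
if command -v goaccess >/dev/null 2>&1 && [ -r "$LOG" ]; then
    goaccess "$LOG" \
        --log-format="$LOG_FORMAT" \
        --date-format="$DATE_FORMAT" \
        --time-format="$TIME_FORMAT" \
        -o vhost-report.html
fi
```

If goaccess reports parse errors with these values, the date-format and time-format settings are a likely place to look, since %d and %t in the log-format are interpreted through them.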

The individual specifiers are described in the official goaccess documentation; %^, for example, means the corresponding field is ignored:

SPECIFIERS

  • %x A date and time field matching the time-format and date-format variables. This is used when a timestamp is given instead of the date and time being in two separate variables.

  • %t time field matching the time-format variable.

  • %d date field matching the date-format variable.

  • %v The server name according to the canonical name setting (Server Blocks or Virtual Host).

  • %e This is the userid of the person requesting the document as determined by HTTP authentication.

  • %h host (the client IP address, either IPv4 or IPv6)

  • %r The request line from the client. This requires specific delimiters around the request (single quotes, double quotes, etc) to be parsable. Otherwise, use a combination of special format specifiers such as %m, %U, %q and %H to parse individual fields.

  • Note: Use either %r to get the full request OR %m, %U, %q and %H to form your request, do not use both.

  • %m The request method.

  • %U The URL path requested.

  • Note: If the query string is in %U, there is no need to use %q. However, if the URL path, does not include any query string, you may use %q and the query string will be appended to the request.

  • %q The query string.

  • %H The request protocol.

  • %s The status code that the server sends back to the client.

  • %b The size of the object returned to the client.

  • %R The “Referer” HTTP request header.

  • %u The user-agent HTTP request header.

  • %D The time taken to serve the request, in microseconds.

  • %T The time taken to serve the request, in seconds with milliseconds resolution.

  • %L The time taken to serve the request, in milliseconds as a decimal number.

  • %^ Ignore this field.

  • %~ Move forward through the log string until a non-space (!isspace) char is found.

  • ~h The host (the client IP address, either IPv4 or IPv6) in a X-Forwarded-For (XFF) field.

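Goaccess can also read input from a pipe, which allows more flexible processing. For example, the request field in the access log does not include server_name. To make the request contain the full URL, we can use the command below: the fourth whitespace-separated field in awk holds server_name, and the command prepends it to the request path (field 6) to form the full URL before passing the result to goaccess: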

awk '$6=$4$6' access.log | goaccess -a -
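To see what the awk step does before piping into goaccess, it can be run on a single line. The sample line below is made up to follow the custom log_format shown earlier, and the prefix_vhost function name is only for illustration.

```shell
# Hypothetical log line in the custom log_format (not real traffic).
sample='[16/Jan/2018:10:00:00 +0800] 203.0.113.5 www.example.com "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.29.0" "-" 127.0.0.1:8080 0.005 0.004'

# Prepend field 4 (server_name) to field 6 (the request path), then print.
# Same rewrite as the one-liner above, with the print made explicit.
prefix_vhost() {
    awk '{ $6 = $4 $6; print }'
}

rewritten=$(printf '%s\n' "$sample" | prefix_vhost)
printf '%s\n' "$rewritten"
```

The request in the rewritten line reads "GET www.example.com/index.html HTTP/1.1", so the report can group requests by vhost and path. Note that the field numbers depend on the log layout: with a different log_format, the positions of server_name and the request path change.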