
The Chemistry Between Nginx and Logrotate

z0z0r4
2026-05-17 2026-05-21

A record of Nginx logs filling up the disk because Logrotate stopped working.

TL;DR: After an update, I found Nginx writing to access.log.1, while the Logrotate rule only covers /var/log/nginx/*.log, so rotation stopped working.
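To see why the rule misses the file: logrotate expands the path in its config as a shell-style glob, and `*.log` only matches names that end in `.log`. A minimal sketch using the shell's own pattern matching, which follows the same wildcard rules (the function name `covered_by_rule` is made up for illustration):

```shell
# covered_by_rule NAME: report whether NAME matches the *.log pattern
# used in /etc/logrotate.d/nginx. Hypothetical helper, for illustration only.
covered_by_rule() {
  case "$1" in
    *.log) echo "covered" ;;
    *)     echo "not covered" ;;
  esac
}

covered_by_rule access.log     # prints "covered"
covered_by_rule access.log.1   # prints "not covered"
```

So once Nginx keeps appending to access.log.1, nothing in the rule will ever rotate or cap that file.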

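Warning:

This post is incomplete. It is only a record, with no verification or proper diagnosis.

This post was updated on 2026-05-21. The earlier inferences are deprecated; the factual descriptions are kept for reference.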

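After reducing how often cron runs logrotate to once every half hour (previously every five minutes), Nginx no longer ends up writing to access.log.1, and journalctl -u logrotate no longer shows errors. (Previously I observed logrotate errors appearing together with the writes to access.log.1. I did not keep those logs, but the timing should match.)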

It has not triggered again in the 3 days since, so the cause may be running logrotate too frequently rather than the Nginx update.

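Also, after reinstalling I found that the official Logrotate config differs from my previous one, which was either copied from somewhere I no longer remember or came from an older version. The official one is as follows: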

❯ cat /etc/logrotate.d/nginx.dpkg-dist
/var/log/nginx/*.log {
daily
missingok
rotate 52
compress
delaycompress
notifempty
create 640 nginx adm
sharedscripts
postrotate
if [ -f /run/nginx.pid ]; then
kill -USR1 `cat /run/nginx.pid`
fi
endscript
}

Here the script runs kill -USR1 `cat /run/nginx.pid` directly instead of invoke-rc.d nginx rotate, which may also be one of the variables behind the errors going away.

I still have not handled files matching *\.log\.[0-9]+$ … waiting and watching for now 😐
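Until that case is handled, one option is a small check that flags numbered logs that have grown past a limit. This is only a sketch: `oversized_rotated_logs` is a hypothetical helper, the byte threshold is chosen by the caller, and nothing like it exists in the packaged config.

```shell
# oversized_rotated_logs DIR LIMIT_BYTES
# Print every file in DIR whose name ends in .log.<digits> and whose size
# exceeds LIMIT_BYTES. Hypothetical helper, not part of any Nginx package.
oversized_rotated_logs() {
  dir=$1
  limit_bytes=$2
  for f in "$dir"/*.log.[0-9]*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -f "$f" ] || continue
    # Keep only names ending in .log.<digits>; this skips access.log.2.gz.
    printf '%s\n' "$f" | grep -Eq '\.log\.[0-9]+$' || continue
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -gt "$limit_bytes" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Example: list numbered Nginx logs larger than 1 GiB.
#   oversized_rotated_logs /var/log/nginx 1073741824
```

Because a freshly rotated access.log.1 is legitimately large for up to a day under `delaycompress`, the threshold should sit well above a normal day's volume.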

The inferences below are deprecated; the factual descriptions are kept for reference

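Recently, because of Nginx CVE-2026-42945 (NGINX Rift), I needed to update Nginx, which then triggered a strange corner case that filled the disk with logs. This records what I guessed the cause to be.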

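Background: on Thursday I ran apt update && apt install --only-upgrade nginx to update Nginx. After it finished I confirmed that the version reported by nginx -v had indeed changed, and then left it alone.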

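The symptom: every few dozen hours, the /var/log/nginx directory grew large enough to fill the disk, and MongoDB went down. The first time it happened I did not look closely. I saw that /var/log/nginx/access.log.1 and /var/log/nginx/error.log.1 took up more than ten GB, ran echo "" > /var/log/nginx/access.log.1 and echo "" > /var/log/nginx/error.log.1, and things recovered.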
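A side note on the cleanup itself: `echo "" > file` leaves a one-byte file containing a newline, while `: > file` leaves it at exactly zero bytes. Either way the file is truncated in place rather than deleted, so the disk space is released without restarting the process that is writing to it. A small sketch, with `empty_log` as a hypothetical helper name:

```shell
# empty_log FILE: truncate FILE to zero bytes without removing it.
# Hypothetical helper; the redirection alone does the work.
empty_log() {
  : > "$1"
}

# Example, using the paths from this post:
#   empty_log /var/log/nginx/access.log.1
#   empty_log /var/log/nginx/error.log.1
```

Deleting the file with rm instead would not free the space while the writer still holds it open, which is why truncation is the safer emergency fix.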

I really should have noticed at this point: why *.log.1 and not *.log?

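The second time was Sunday at 8 a.m. An alert reported 524, and the disk was full again. I did an emergency cleanup first, then poked around (checking the /etc/logrotate.d/nginx config, checking crontab -l) and noticed that /var/log/nginx/access.log and /var/log/nginx/error.log were both empty 0-byte files while the *.log.1 files took up more than ten GB, and realized the update might be the cause. (At that point I still suspected that the new version's rotated logs had overwritten my earlier logs. Still not sharp enough =-=)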

So I tried systemctl restart nginx, and access.log and error.log finally started recording normally again.
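One way to confirm which files the running Nginx actually holds open, before restarting it, is to read the master process's file descriptors from /proc. This is a sketch, assuming Linux and a pid file at /run/nginx.pid; `open_files_under` is a made-up helper name, and reading another user's process usually requires root.

```shell
# open_files_under PID DIR: print the files under DIR that process PID has open.
# Relies on the Linux /proc filesystem. Hypothetical helper for diagnosis only.
open_files_under() {
  pid=$1
  dir=$2
  for fd in /proc/"$pid"/fd/*; do
    # Each entry is a symlink to the open file; skip ones we cannot read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$dir"/*) printf '%s\n' "$target" ;;
    esac
  done | sort -u
}

# Example against the Nginx master process:
#   open_files_under "$(cat /run/nginx.pid)" /var/log/nginx
```

If the output lists access.log.1 instead of access.log, Nginx never reopened its logs after the rename, which would match the 0-byte access.log seen here.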


For reference, the Logrotate config file:

z0z0r4@z0z0r41:~$ cat /etc/logrotate.d/nginx
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi \
endscript
postrotate
invoke-rc.d nginx rotate >/dev/null 2>&1
endscript
}

postrotate must have failed, but whether the update caused the failure I can't say: I tried and could not reproduce it =-=

These are the scripts the Nginx package runs on update:

z0z0r4@z0z0r41:~$ cat /var/lib/dpkg/info/nginx.prerm
#!/bin/sh
set -e

case "$1" in
remove|remove-in-favour|deconfigure|deconfigure-in-favour)
if [ -x /etc/init.d/nginx ]; then
invoke-rc.d nginx stop || exit $?
fi
;;

upgrade|failed-upgrade)
;;

*)
echo "prerm called with unknown argument \`$1'" >&2
exit 1
;;
esac



exit 0
z0z0r4@z0z0r41:~$ cat /var/lib/dpkg/info/nginx.postinst
#!/bin/sh
set -e

case "$1" in
abort-upgrade|abort-remove|abort-deconfigure|configure)
;;
triggered)
if invoke-rc.d --quiet nginx status >/dev/null; then
echo "Triggering nginx reload ..."
invoke-rc.d nginx reload || true
fi
exit 0
;;
*)
echo "postinst called with unknown argument \`$1'" >&2
exit 1
;;
esac

if invoke-rc.d --quiet nginx status >/dev/null; then
invoke-rc.d nginx upgrade || invoke-rc.d nginx restart
exit $?
else
if ! invoke-rc.d nginx start; then
echo "Failed to start NGINX in postinst script, please check the logs" >&2
exit 0
fi
fi



exit 0

After an update, it calls invoke-rc.d nginx upgrade.

This is /etc/init.d/nginx:

z0z0r4@z0z0r41:~$ cat /etc/init.d/nginx
#!/bin/sh

### BEGIN INIT INFO
# Provides: nginx
# Required-Start: $local_fs $remote_fs $network $syslog $named
# Required-Stop: $local_fs $remote_fs $network $syslog $named
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: starts the nginx web server
# Description: starts nginx using start-stop-daemon
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx

# Include nginx defaults if available
if [ -r /etc/default/nginx ]; then
. /etc/default/nginx
fi

STOP_SCHEDULE="${STOP_SCHEDULE:-QUIT/5/TERM/5/KILL/5}"

test -x $DAEMON || exit 0

. /lib/init/vars.sh
. /lib/lsb/init-functions

# Try to extract nginx pidfile
PID=$(cat /etc/nginx/nginx.conf | grep -Ev '^\s*#' | awk 'BEGIN { RS="[;{}]" } { if ($1 == "pid") print $2 }' | head -n1)
if [ -z "$PID" ]; then
PID=/run/nginx.pid
fi

if [ -n "$ULIMIT" ]; then
# Set ulimit if it is set in /etc/default/nginx
ulimit $ULIMIT
fi

start_nginx() {
# Start the daemon/service
#
# Returns:
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON --test > /dev/null \
|| return 1
start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON -- \
$DAEMON_OPTS 2>/dev/null \
|| return 2
}

test_config() {
# Test the nginx configuration
$DAEMON -t $DAEMON_OPTS >/dev/null 2>&1
}

stop_nginx() {
# Stops the daemon/service
#
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME
RETVAL="$?"
sleep 1
return "$RETVAL"
}

reload_nginx() {
# Function that sends a SIGHUP to the daemon/service
start-stop-daemon --stop --signal HUP --quiet --pidfile $PID --name $NAME
return 0
}

rotate_logs() {
# Rotate log files
start-stop-daemon --stop --signal USR1 --quiet --pidfile $PID --name $NAME
return 0
}

upgrade_nginx() {
# Online upgrade nginx executable
# http://nginx.org/en/docs/control.html
#
# Return
# 0 if nginx has been successfully upgraded
# 1 if nginx is not running
# 2 if the pid files were not created on time
# 3 if the old master could not be killed
if start-stop-daemon --stop --signal USR2 --quiet --pidfile $PID --name $NAME; then
# Wait for both old and new master to write their pid file
while [ ! -s "${PID}.oldbin" ] || [ ! -s "${PID}" ]; do
cnt=`expr $cnt + 1`
if [ $cnt -gt 10 ]; then
return 2
fi
sleep 1
done
# Everything is ready, gracefully stop the old master
if start-stop-daemon --stop --signal QUIT --quiet --pidfile "${PID}.oldbin" --name $NAME; then
return 0
else
return 3
fi
else
return 1
fi
}

case "$1" in
start)
log_daemon_msg "Starting $DESC" "$NAME"
start_nginx
case "$?" in
0|1) log_end_msg 0 ;;
2) log_end_msg 1 ;;
esac
;;
stop)
log_daemon_msg "Stopping $DESC" "$NAME"
stop_nginx
case "$?" in
0|1) log_end_msg 0 ;;
2) log_end_msg 1 ;;
esac
;;
restart)
log_daemon_msg "Restarting $DESC" "$NAME"

# Check configuration before stopping nginx
if ! test_config; then
log_end_msg 1 # Configuration error
exit $?
fi

stop_nginx
case "$?" in
0|1)
start_nginx
case "$?" in
0) log_end_msg 0 ;;
1) log_end_msg 1 ;; # Old process is still running
*) log_end_msg 1 ;; # Failed to start
esac
;;
*)
# Failed to stop
log_end_msg 1
;;
esac
;;
reload|force-reload)
log_daemon_msg "Reloading $DESC configuration" "$NAME"

# Check configuration before stopping nginx
#
# This is not entirely correct since the on-disk nginx binary
# may differ from the in-memory one, but that's not common.
# We prefer to check the configuration and return an error
# to the administrator.
if ! test_config; then
log_end_msg 1 # Configuration error
exit $?
fi

reload_nginx
log_end_msg $?
;;
configtest|testconfig)
log_daemon_msg "Testing $DESC configuration"
test_config
log_end_msg $?
;;
status)
status_of_proc -p $PID "$DAEMON" "$NAME" && exit 0 || exit $?
;;
upgrade)
log_daemon_msg "Upgrading binary" "$NAME"
upgrade_nginx
log_end_msg $?
;;
rotate)
log_daemon_msg "Re-opening $DESC log files" "$NAME"
rotate_logs
log_end_msg $?
;;
*)
echo "Usage: $NAME {start|stop|restart|reload|force-reload|status|configtest|rotate|upgrade}" >&2
exit 3
;;
esac

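The rotate function there (rotate_logs) gets Nginx to reopen its log files by sending it the USR1 signal, but this all happens silently and it always returns 0, so if it fails there is really no way to tell.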
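A variant that does report failure could look like the sketch below. It is not the packaged script: `reopen_logs` is a hypothetical function, it uses plain `kill` instead of start-stop-daemon, and the pid file path is passed in by the caller.

```shell
# reopen_logs PIDFILE: send USR1 to the process named in PIDFILE.
# Returns 0 on success, 1 if the pid file is missing or empty,
# 2 if the signal could not be delivered. Hypothetical helper.
reopen_logs() {
  pidfile=$1
  if [ ! -s "$pidfile" ]; then
    echo "reopen_logs: pid file $pidfile is missing or empty" >&2
    return 1
  fi
  pid=$(cat "$pidfile")
  if ! kill -s USR1 "$pid" 2>/dev/null; then
    echo "reopen_logs: could not signal pid $pid" >&2
    return 2
  fi
  return 0
}

# Example, with the pid file path used in the official config:
#   reopen_logs /run/nginx.pid || echo "rotation signal failed" >&2
```

A non-zero exit only proves the signal was not delivered. A zero exit still does not prove Nginx reopened the files, so checking that access.log grows afterwards remains worthwhile.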

The rotate must have failed, but I can't reproduce it. Recording it here; I'll trace it the next time it happens =-=.

DevOps
  • DevOps
  • Nginx
Creative Commons License All website licensed under CC BY 4.0
2025-2026 z0z0r4