背景

我负责的广告下发业务. upstream模块配置转发到4台机器,后来我停止了1台机器,但是忘记改upstream配置了。

这个错误到配置持续了2天多,直到今天周一才发现,但是业务却没有任何问题。

这引起了我的困惑。

和同事简单讨论,了解到nginx有重试机制。

猜想应该是这个原因才使得业务没有出现。

另外,我也考虑增加一些监控,使得能够及时发现错误的配置信息,主要思路是监控nginx的错误日志(error.logk)

错误配置时,错误日志如下

2021/10/18 16:48:29 [error] 2828#0: *88397155 connect() failed (111: Connection refused) while connecting to upstream, client: 49.7.38.70, server: open.aplum.com, request: "GET /adds/weibo-notify?微博广告监测链接", host: "xxx.xxx.com"

重试的具体过程(待验证)

If an error occurs during communication with a server, the request will be passed to the next server, and so on until all of the functioning servers will be tried. If a successful response could not be obtained from any of the servers, the client will receive the result of the communication with the last server.

附录

1.Nginx失败重试机制 2.Module ngx_http_upstream_module

原创文章转载请注明出处: Nginx失败重试机制