2012-03-23

Redis被bgsave和bgrewriteaof阻塞的解决方法

Views: 64028 | Add Comments

Redis 是一个性能非常高效的内存 Key-Value 存储服务, 同时它还具有两个非常重要的特性: 1. 持久化; 2. Value 数据结构. 这两个特性让它在不少场景轻松击败了 Memcached 和 Casandra 等.

Redis 的持久化在两种方式: Snapshotting(快照) 和 Append-only file(aof). 在一个采用了 aof 模式的 Redis 服务器上, 当执行 bgrewriteaof 对 aof 进行归并优化时, 出现了 Redis 被阻塞的问题, 此时, Redis 无法提供任何读取和写入操作.

按字面理解, bgrewriteaof 是在后台进行操作, 不应该影响 Redis 的正常服务. 原理也确实是这样的, Redis 首先 fork 一个子进程, 并在该子进程里进行归并和写持久化存储设备(如硬盘)的. 按照正常逻辑, 在一台多核的机器上, 即使子进程占满 CPU 和硬盘, 也不应该导致 Redis 服务阻塞啊!

其实, 问题就出在硬盘上.

Redis 服务设置了 appendfsync everysec, 主进程每秒钟便会调用 fsync(), 要求内核将数据"确实"写到存储硬件里. 但由于子进程同时也在写硬盘, 从而导致主进程 fsync()/write() 操作被阻塞, 最终导致 Redis 主进程阻塞了.

解决方法便是设置 no-appendfsync-on-rewrite yes, 在子进程处理和写硬盘时, 主进程不调用 fsync() 操作. 注意, 即使进程不调用 fsync(), 系统内核也会根据自己的算法在适当的时机将数据写到硬盘(Linux 默认最长不超过 30 秒), 同样会千万 IO 阻塞.

不过, 虽然某个 Redis A 并没有在 rewriteaof, 这时如果进行 fsync(), 系统里的其它进程, 如其它的 Redis 实例也可能导致 A 的 fsync 阻塞.

相关讨论:
https://groups.google.com/group/redis-db/browse_thread/thread/a9c5a6019108319e
http://antirez.com/post/fsync-different-thread-useless.html
http://redis.io/topics/latency

* When appendfsync is set to the value of no Redis performs no fsync. In this configuration the only source of latency can be write(2). When this happens usually there is no solution since simply the disk can't cope with the speed at which Redis is receiving data, however this is uncommon if the disk is not seriously slowed down by other processes doing I/O.

* When appendfsync is set to the value of everysec Redis performs an fsync every second. It uses a different thread, and if the fsync is still in progress Redis uses a buffer to delay the write(2) call up to two seconds (since write would block on Linux if an fsync is in progress against the same file). However if the fsync is taking too long Redis will eventually perform the write(2) call even if the fsync is still in progress, and this can be a source of latency.

* When appendfsync is set to the value of always an fsync is performed at every write operation before replying back to the client with an OK code (actually Redis will try to cluster many commands executed at the same time into a single fsync). In this mode performances are very low in general and it is strongly recommended to use a fast disk and a file system implementation that can perform the fsync in short time.

Related posts:

  1. Linux 核心编程 – fsync, write
  2. 企业级SSD硬盘fsync速度
  3. 数据库的持久化等级
  4. LevelDB 会丢数据吗?
  5. 用mplayer,toolame提取rmvb等视频文件中的音频为mp3
Posted by ideawu at 2012-03-23 13:50:34 Tags: , , , , , , ,

Leave a Comment