blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx

I just ordered a new server with a 1TB Samsung SSD and installed Ubuntu 14.04.5 LTS on it.

After booting into the freshly installed system, I see the following in dmesg and /var/log/syslog. Output of grep error /var/log/syslog:

May 12 03:47:34 lf5 kernel: [    0.373789] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:47:34 lf5 kernel: [   10.382147] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.382152]          res 40/00:e0:f8:69:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   10.712517] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.712521]          res 40/00:d0:38:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.119541] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.119545]          res 40/00:40:30:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526336] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.526341]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526345]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526348]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526351]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   21.349950] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 03:51:10 lf5 kernel: [    0.389787] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:51:10 lf5 kernel: [   10.906423] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   10.906429]          res 40/00:80:08:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.488276] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.488281]          res 40/00:c0:28:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.960792] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.960796]          res 40/00:b8:b0:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   12.366482] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   12.366486]          res 40/00:60:e0:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   20.918620] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:07:19 lf5 kernel: [    0.390011] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:07:19 lf5 kernel: [   10.349119] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.349124]          res 40/00:88:a8:6d:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   10.738449] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.738453]          res 40/00:20:60:6b:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.072972] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.072976]          res 40/00:60:50:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.471777] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.471781]          res 40/00:48:c8:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   20.651217] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:18:16 lf5 kernel: [    0.389808] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:18:17 lf5 kernel: [   10.762352] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   10.762360]          res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   11.338565]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338569]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338572]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338576]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   20.087229]          res 41/84:08:b8:14:7d/00:00:63:00:00/00 Emask 0x410 (ATA bus error) <F>
May 12 17:18:17 lf5 kernel: [   20.298295] ata8.00: error: { ICRC ABRT }
May 12 17:18:17 lf5 kernel: [   21.176551] sd 7:0:0:0: [sda] tag#0 Add. Sense: Scsi parity error
May 12 17:18:17 lf5 kernel: [   21.316632] blk_update_request: I/O error, dev sda, sector 1669074520
May 12 17:18:17 lf5 kernel: [   21.542013] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   21.759477]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.052681]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.347138]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.642363]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.938868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.239764]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.542336]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.840288]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.138769]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.439063]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.740494]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.047057]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.354884]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.662079]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.967498]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.273208]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.579035]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.884890]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.190868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.496523]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.801825]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.106876]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.412223]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.717662]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.022620]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.326675]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.629826]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.932271]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.234666]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.537024]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [   32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [   32.527171] blk_update_request: I/O error, dev sda, sector 1669071512
May 12 17:18:17 lf5 kernel: [   32.915371] blk_update_request: I/O error, dev sda, sector 1669071544
May 12 17:18:17 lf5 kernel: [   33.308218] blk_update_request: I/O error, dev sda, sector 1669071552
May 12 17:18:17 lf5 kernel: [   33.706503] blk_update_request: I/O error, dev sda, sector 1669071520
May 12 17:18:17 lf5 kernel: [   34.108892] blk_update_request: I/O error, dev sda, sector 1669071528
May 12 17:18:17 lf5 kernel: [   34.516541] blk_update_request: I/O error, dev sda, sector 1669071536
May 12 17:18:17 lf5 kernel: [   34.929267] blk_update_request: I/O error, dev sda, sector 1669071368
May 12 17:18:17 lf5 kernel: [   35.347838] blk_update_request: I/O error, dev sda, sector 1669071376
May 12 17:18:17 lf5 kernel: [   36.004437]          res 41/04:a8:90:d2:89/00:00:5f:00:00/00 Emask 0x401 (device error) <F>
May 12 17:18:17 lf5 kernel: [   36.257143] ata8.00: error: { ABRT }
May 12 17:18:17 lf5 kernel: [   37.681581] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 17:18:17 lf5 kernel: [   37.681586]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681590]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681593]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681596]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681599]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681602]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681605]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681608]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681611]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681615]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681618]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681621]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681624]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681627]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681630]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681633]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681636]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681639]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681642]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681645]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681649]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681652]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681655]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [   38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
May 12 17:18:17 lf5 kernel: [   38.005013] blk_update_request: I/O error, dev sda, sector 1891370128
May 12 17:18:17 lf5 kernel: [   38.005017] blk_update_request: I/O error, dev sda, sector 1891370136
May 12 17:18:17 lf5 kernel: [   38.005021] blk_update_request: I/O error, dev sda, sector 1891370144
May 12 17:18:17 lf5 kernel: [   38.005025] blk_update_request: I/O error, dev sda, sector 1891370152
May 12 17:18:17 lf5 kernel: [   38.005029] blk_update_request: I/O error, dev sda, sector 1891370160
May 12 17:18:17 lf5 kernel: [   38.005032] blk_update_request: I/O error, dev sda, sector 1891370168
May 12 17:18:17 lf5 kernel: [   38.005036] blk_update_request: I/O error, dev sda, sector 1891370176
May 12 17:18:17 lf5 kernel: [   38.005040] blk_update_request: I/O error, dev sda, sector 1891370184
May 12 17:18:17 lf5 kernel: [   49.093973] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
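A log this repetitive is easier to read as a tally. The following is a minimal sketch, not a definitive tool: it counts the libata error classes shown in parentheses after each Emask value and collects the sectors reported by blk_update_request. The function name summarize and the two regular expressions are my own and only assume the line layout seen in the excerpt above.

```python
import re
from collections import Counter

# Both patterns are assumptions based on the line layout in the excerpt above.
EMASK_RE = re.compile(r"Emask (0x[0-9a-fA-F]+) \(([^)]+)\)")
BLK_RE = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")


def summarize(lines):
    """Count libata error classes and collect failing sectors per device."""
    classes = Counter()
    sectors = {}
    for line in lines:
        m = EMASK_RE.search(line)
        if m:
            # group(2) is the human-readable class, e.g. "ATA bus error".
            classes[m.group(2)] += 1
            continue
        m = BLK_RE.search(line)
        if m:
            sectors.setdefault(m.group(1), []).append(int(m.group(2)))
    return classes, sectors


if __name__ == "__main__":
    # A few lines copied from the log above, shortened to the relevant part.
    sample = [
        "res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)",
        "res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)",
        "blk_update_request: I/O error, dev sda, sector 1669074520",
    ]
    classes, sectors = summarize(sample)
    for name, count in classes.most_common():
        print(name, count)
    for dev, hits in sectors.items():
        print(dev, len(hits), "failed requests, first sector", min(hits))
```

To run it against the real log, the lines could be read with open("/var/log/syslog") and passed to summarize unchanged.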

These are the entries I am most concerned about: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx

I ran badblocks -v /dev/sda, which reported no errors.

I then ran smartctl --all /dev/sda, which also reported no errors. See the output below; it includes a short self-test.

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 850 EVO 1TB
Serial Number:    S3PHNF0JC00710K
LU WWN Device Id: 5 002538 d428254a0
Firmware Version: EMT03B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat May 12 19:08:22 2018 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 512) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       8
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       31
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   067   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       25
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       55078112

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

SMART Selective self-test log data structure revision number 1
 SPAN    MIN_LBA    MAX_LBA  CURRENT_TEST_STATUS
    1          0          0  Not_testing
    2          0          0  Not_testing
    3          0          0  Not_testing
    4          0          0  Not_testing
    5          0          0  Not_testing
  255  116055040  116120575  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
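One detail in the output above is easy to overlook: the "SATA Version is" line reports a 6.0 Gb/s drive running at a current speed of 3.0 Gb/s. The sketch below pulls both numbers out of smartctl text so the comparison can be scripted. The helper names link_speeds and link_is_downshifted are hypothetical, and the pattern only assumes the line format shown above. A lower current speed does not prove a fault by itself: the port may simply be a 3.0 Gb/s port, or the link may have been renegotiated at a lower speed after errors.

```python
import re

# Matches e.g. "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)".
LINK_RE = re.compile(
    r"SATA Version is:.*?([\d.]+) Gb/s \(current: ([\d.]+) Gb/s\)"
)


def link_speeds(smartctl_text):
    """Return (max_gbps, current_gbps) from smartctl output, or None."""
    m = LINK_RE.search(smartctl_text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def link_is_downshifted(smartctl_text):
    """True when the negotiated speed is below the drive's maximum."""
    speeds = link_speeds(smartctl_text)
    return speeds is not None and speeds[1] < speeds[0]


if __name__ == "__main__":
    line = "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)"
    print(link_speeds(line), link_is_downshifted(line))
```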

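My question is simple: what do you think could be wrong? The SSD is supposed to be brand new. I cannot in good conscience put this machine into production with these errors in the logs. The box otherwise behaves normally.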

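The errors you are seeing are interface errors. They do not come from the disk itself but from the connection to the disk. The cause could be the cable or any port along that connection.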
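The Emask value in each res line is a bitmask, and decoding it shows which side the kernel blames. The following is a minimal sketch, assuming the bit assignments of enum ata_completion_errors in the kernel's include/linux/libata.h; they should be checked against the source of the running kernel. decode_emask is a hypothetical helper name.

```python
# Bit positions assumed from enum ata_completion_errors in
# include/linux/libata.h; verify against your kernel version.
AC_ERR_BITS = [
    (0, "AC_ERR_DEV", "device reported error"),
    (1, "AC_ERR_HSM", "host state machine violation"),
    (2, "AC_ERR_TIMEOUT", "timeout"),
    (3, "AC_ERR_MEDIA", "media error"),
    (4, "AC_ERR_ATA_BUS", "ATA bus error"),
    (5, "AC_ERR_HOST_BUS", "host bus error"),
    (6, "AC_ERR_SYSTEM", "system error"),
    (7, "AC_ERR_INVALID", "invalid argument"),
    (8, "AC_ERR_OTHER", "unknown"),
    (9, "AC_ERR_NODEV_HINT", "polling device detection hint"),
    (10, "AC_ERR_NCQ", "marker for offending NCQ command"),
]


def decode_emask(emask):
    """Return the names of all flags set in a libata Emask value."""
    return [name for bit, name, _ in AC_ERR_BITS if emask & (1 << bit)]


if __name__ == "__main__":
    # The four distinct Emask values that appear in the log.
    for value in (0x52, 0x1, 0x410, 0x401):
        print(hex(value), decode_emask(value))
```

Under these assumptions, the dominant value 0x52 decodes to AC_ERR_HSM, AC_ERR_ATA_BUS and AC_ERR_SYSTEM, with no media error bit set, which fits a link problem rather than bad flash.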

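Since the CRC error count on the drive is not increasing, I can only assume the problem is on the receiving end, in the machine you are using. Check the cable and try a different SATA port on the server.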
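Whether the CRC counter is increasing can be checked by comparing two snapshots of smartctl --all /dev/sda, for example one taken before and one after a cable or port swap. A minimal sketch, assuming the attribute table layout shown in the question (ten whitespace-separated columns, raw value last); udma_crc_raw and crc_errors_added are hypothetical helper names, and the AFTER row is made-up sample data.

```python
def udma_crc_raw(smartctl_text):
    """Return the raw value of attribute 199 (UDMA_CRC_Error_Count), or None."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_added(before_text, after_text):
    """Return how many CRC errors were logged between two snapshots."""
    before = udma_crc_raw(before_text)
    after = udma_crc_raw(after_text)
    if before is None or after is None:
        raise ValueError("attribute 199 not found in one of the snapshots")
    return after - before


if __name__ == "__main__":
    # BEFORE is the row from the question; AFTER is hypothetical.
    BEFORE = "199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20"
    AFTER = "199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20"
    print("new CRC errors:", crc_errors_added(BEFORE, AFTER))
```

A difference that stays at zero across reboots and heavy reads would support the reading above; a growing difference would suggest the link is still corrupting transfers.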