Is my disk failing and causing high load on Linux?

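Today I ran apt-get update && apt-get upgrade on an Ubuntu 12.04.5 LTS server. Everything went smoothly. Four hours later, my monitoring tools alerted me to disk I/O overload. On an 8-core system, I/O wait has reached 10-40% and the load average has risen from 1 to 20. The website has become very slow.
It looks like a bad disk or bad hardware, but I'm not sure. Where should I dig? Any help is appreciated.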

uname -a:

Linux p-de-www 3.2.0-77-generic #114-Ubuntu SMP Tue Mar 10 17:26:03 UTC 2015 x86_64     x86_64 x86_64 GNU/Linux

top:

top - 16:19:59 up  1:38,  3 users,  load average: 11.54, 7.46, 5.76
Tasks: 217 total,   1 running, 216 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.2%sy,  0.0%ni, 80.9%id, 17.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16126212k total,  4153684k used, 11972528k free,   193392k buffers
Swap:  8387568k total,        0k used,  8387568k free,  2281864k cached

There are a bunch of ACPI errors in syslog.

In /var/log/messages:

root@p-de-www:~# tail -n 100 /var/log/messages
Mar 19 15:51:01 p-de-www kernel: [ 4184.716158] ata1: hard resetting link
Mar 19 15:51:02 p-de-www kernel: [ 4185.763378] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:51:02 p-de-www kernel: [ 4185.882753] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:51:02 p-de-www kernel: [ 4185.882761] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883514] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883523] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883842] ata1.00: configured for UDMA/133
Mar 19 15:51:02 p-de-www kernel: [ 4185.883860] ata1: EH complete
Mar 19 15:52:19 p-de-www kernel: [ 4262.752244] ata1: hard resetting link
Mar 19 15:52:24 p-de-www kernel: [ 4268.109057] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 15:52:26 p-de-www kernel: [ 4269.676180] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:52:26 p-de-www kernel: [ 4269.769475] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:52:26 p-de-www kernel: [ 4269.769483] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770244] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770251] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770483] ata1.00: configured for UDMA/133
Mar 19 15:52:26 p-de-www kernel: [ 4269.770496] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 15:52:26 p-de-www kernel: [ 4269.770587] ata1.00: device reported invalid CHS sector 0
Mar 19 15:52:26 p-de-www kernel: [ 4269.770604] ata1: EH complete
Mar 19 15:54:39 p-de-www kernel: [ 4402.577394] ata1.00: limiting speed to UDMA/100:PIO4
Mar 19 15:54:39 p-de-www kernel: [ 4402.577557] ata1: hard resetting link
Mar 19 15:54:44 p-de-www kernel: [ 4407.934367] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 15:54:49 p-de-www kernel: [ 4412.579786] ata1: hard resetting link
Mar 19 15:54:51 p-de-www kernel: [ 4415.362269] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:54:51 p-de-www kernel: [ 4415.475792] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:54:51 p-de-www kernel: [ 4415.475800] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476645] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476653] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476905] ata1.00: configured for UDMA/100
Mar 19 15:54:51 p-de-www kernel: [ 4415.476934] ata1: EH complete
Mar 19 15:55:13 p-de-www kernel: [ 4436.542443] ata1: hard resetting link
Mar 19 15:55:15 p-de-www kernel: [ 4438.876963] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959075] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959084] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959905] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959914] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:55:15 p-de-www kernel: [ 4438.960212] ata1.00: configured for UDMA/100
Mar 19 15:55:15 p-de-www kernel: [ 4438.960235] ata1: EH complete
Mar 19 16:17:32 p-de-www kernel: [ 5774.861347] ata1: hard resetting link
Mar 19 16:17:33 p-de-www kernel: [ 5776.132497] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:17:33 p-de-www kernel: [ 5776.248345] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:17:33 p-de-www kernel: [ 5776.248353] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249163] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249172] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249441] ata1.00: configured for UDMA/100
Mar 19 16:17:33 p-de-www kernel: [ 5776.249445] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:17:33 p-de-www kernel: [ 5776.249538] ata1.00: device reported invalid CHS sector 0
Mar 19 16:17:33 p-de-www kernel: [ 5776.249547] ata1: EH complete
Mar 19 16:18:34 p-de-www kernel: [ 5836.778503] ata1: hard resetting link
Mar 19 16:18:37 p-de-www kernel: [ 5840.400297] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:18:37 p-de-www kernel: [ 5840.500401] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:18:37 p-de-www kernel: [ 5840.500409] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501223] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501231] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501468] ata1.00: configured for UDMA/100
Mar 19 16:18:37 p-de-www kernel: [ 5840.501481] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:18:37 p-de-www kernel: [ 5840.501589] ata1: EH complete
Mar 19 16:19:38 p-de-www kernel: [ 5900.742501] ata1: hard resetting link
Mar 19 16:19:40 p-de-www kernel: [ 5903.077048] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:19:40 p-de-www kernel: [ 5903.077537] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:19:40 p-de-www kernel: [ 5903.077546] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078334] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078342] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078579] ata1.00: configured for UDMA/100
Mar 19 16:19:40 p-de-www kernel: [ 5903.078582] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:19:40 p-de-www kernel: [ 5903.078679] ata1: EH complete
Mar 19 16:21:24 p-de-www kernel: [ 6006.666736] ata1.00: limiting speed to UDMA/33:PIO4
Mar 19 16:21:24 p-de-www kernel: [ 6006.666867] ata1: hard resetting link
Mar 19 16:21:29 p-de-www kernel: [ 6012.023734] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 16:21:34 p-de-www kernel: [ 6016.669145] ata1: hard resetting link
Mar 19 16:21:39 p-de-www kernel: [ 6022.026105] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 16:21:44 p-de-www kernel: [ 6026.671575] ata1: hard resetting link
Mar 19 16:21:46 p-de-www kernel: [ 6028.726319] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:21:46 p-de-www kernel: [ 6028.824829] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:21:46 p-de-www kernel: [ 6028.824836] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825575] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825579] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825811] ata1.00: configured for UDMA/33
Mar 19 16:21:46 p-de-www kernel: [ 6028.825815] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:21:46 p-de-www kernel: [ 6028.825918] ata1.00: device reported invalid CHS sector 0
Mar 19 16:21:46 p-de-www kernel: [ 6028.825925] ata1: EH complete
Mar 19 16:22:07 p-de-www kernel: [ 6049.650737] ata1: hard resetting link
Mar 19 16:22:12 p-de-www kernel: [ 6055.007538] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 16:22:17 p-de-www kernel: [ 6059.652963] ata1: hard resetting link
Mar 19 16:22:22 p-de-www kernel: [ 6065.009914] ata1: link is slow to respond,please be patient (ready=0)
Mar 19 16:22:23 p-de-www kernel: [ 6065.849433] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:22:23 p-de-www kernel: [ 6065.978240] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:23 p-de-www kernel: [ 6065.978248] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979084] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979092] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979403] ata1.00: configured for UDMA/33
Mar 19 16:22:23 p-de-www kernel: [ 6065.979424] ata1: EH complete
Mar 19 16:22:51 p-de-www kernel: [ 6093.626046] ata1: hard resetting link
Mar 19 16:22:51 p-de-www kernel: [ 6094.113597] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:22:51 p-de-www kernel: [ 6094.226485] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:51 p-de-www kernel: [ 6094.226492] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227269] ACPI Error: [DSSP] Namespace lookup failure,AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227276] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8),AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227513] ata1.00: configured for UDMA/33
Mar 19 16:22:51 p-de-www kernel: [ 6094.227541] ata1: EH complete

There are 2 disks in software RAID1:

root@p-de-www:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb2[1] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      8387572 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sdb4[1] sda4[0]
      1847608639 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      1073740664 blocks super 1.2 [2/2] [UU]

iotop looks fine; spikes like this one are rare:

377 be/3 root        0.00 B/s   82.29 K/s  0.00 %  7.28 % [jbd2/md2-8]

Output of smartctl -a /dev/sda:

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F1YLLX
LU WWN Device Id: 5 000c50 05dd292d0
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical,4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Mar 19 16:55:48 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   102   099   006    Pre-fail  Always       -       195880648
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       502482545
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18486
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       10
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   097   095   000    Old_age   Always       -       197571510318
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   061   045    Old_age   Always       -       30 (Min/Max 27/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       877
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 20 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       80
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       79985175971891
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       32009289003
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       178724571355

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         5         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans,do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up,resume after 0 minute delay.

Output of smartctl -a /dev/sdb:

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F1VM8Q
LU WWN Device Id: 5 000c50 05dbdcafe
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Mar 19 16:57:57 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       178849088
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       498642529
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18467
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       10
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always       -       18
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   062   045    Old_age   Always       -       30 (Min/Max 26/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       876
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 20 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       44448616564768
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       55043480738
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       154979931141

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on,and printed as
DDd+hh:mm:SS.sss where DD=days,hh=hours,mm=minutes,SS=sec,and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours)
  When the command that caused the error occurred,the device was active or idle.

  After command completion occurred,registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a0 7e 17 05  Error: UNC at LBA = 0x05177ea0 = 85425824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a0 7e 17 45 00  16d+20:43:03.906  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  16d+20:43:03.905  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  16d+20:43:03.905  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  16d+20:43:03.905  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  16d+20:43:03.905  SET FEATURES [Set transfer mode]

Error 17 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a0 7e 17 05  Error: UNC at LBA = 0x05177ea0 = 85425824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a0 7e 17 45 00  16d+20:43:01.000  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  16d+20:43:01.000  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  16d+20:43:01.000  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  16d+20:43:01.000  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  16d+20:43:01.000  SET FEATURES [Set transfer mode]

Error 16 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a0 7e 17 05  Error: UNC at LBA = 0x05177ea0 = 85425824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a0 7e 17 45 00  16d+20:42:58.104  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  16d+20:42:58.104  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  16d+20:42:58.104  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  16d+20:42:58.104  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  16d+20:42:58.104  SET FEATURES [Set transfer mode]

Error 15 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a0 7e 17 05  Error: UNC at LBA = 0x05177ea0 = 85425824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a0 7e 17 45 00  16d+20:42:55.196  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  16d+20:42:55.196  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  16d+20:42:55.196  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  16d+20:42:55.196  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  16d+20:42:55.196  SET FEATURES [Set transfer mode]

Error 14 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 a0 7e 17 05  Error: UNC at LBA = 0x05177ea0 = 85425824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 a0 7e 17 45 00  16d+20:42:52.257  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  16d+20:42:52.257  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  16d+20:42:52.257  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  16d+20:42:52.256  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  16d+20:42:52.256  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         5         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Solution

You haven't run a SMART self-test in a while. Try running smartctl -t long <device>.
It should take a few hours, and you can watch its progress in the output of smartctl -a:
Self-test execution status:      (   0) The previous self-test routine completed
                                    without error or no self-test has ever
                                    been run.

If it does not complete without error, the way the last run did back when the drive was new:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         5         -

just get rid of the drive.
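As a rough sketch of that check, the self-test log section of smartctl -a can be parsed and scanned for failed runs. The first sample row is copied from the output above; the second row is hypothetical and only illustrates the failing case. The function names and the rule "a status containing 'failure' means a failed test" are my own assumptions, not part of smartmontools.

```python
import re

# Rows in the format printed under "SMART Self-test log" by smartctl -a.
# Row 1 is copied from the question; row 2 is a HYPOTHETICAL failing row.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         5         -
# 2  Extended offline    Completed: read failure       90%     18490         85425824
"""

# Columns are separated by runs of two or more spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the text."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            entries.append({
                "num": int(m.group("num")),
                "test": m.group("test"),
                "status": m.group("status"),
                "hours": int(m.group("hours")),
                "lba": m.group("lba"),
            })
    return entries


def failed_tests(entries):
    """Assumption: any status mentioning 'failure' counts as a failed test."""
    return [e for e in entries if "failure" in e["status"].lower()]


if __name__ == "__main__":
    for entry in failed_tests(parse_selftest_log(SAMPLE_LOG)):
        print("failed:", entry["test"], "at", entry["hours"], "h, LBA", entry["lba"])
```

Note that the only real entry in the logs above was recorded at 5 power-on hours, while both drives are now past 18,000 hours, so it says nothing about their current state.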

My guess is that @kasperd is right: the drive showing SATA errors and errors in the SMART log is broken.
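To make the SMART side of that concrete, here is a small sketch that scans attribute rows for raw values that would normally be zero on a healthy drive. The rows are copied from the two smartctl outputs in the question; the watch list and the function name are my own illustrative choices, not an official threshold table.

```python
# Attributes whose raw value is normally 0 on a healthy drive.
# This watch list is an illustrative assumption.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# Rows copied from the smartctl -a output for /dev/sda.
SDA = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       80
"""

# Rows copied from the smartctl -a output for /dev/sdb.
SDB = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def suspicious_attributes(table_text):
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    found = {}
    for line in table_text.splitlines():
        fields = line.split()
        # An attribute row has 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


if __name__ == "__main__":
    print("sda:", suspicious_attributes(SDA))
    print("sdb:", suspicious_attributes(SDB))
```

On the data above, both disks report something nonzero: sda has 80 pending and 80 offline-uncorrectable sectors, and sdb has 18 reported uncorrectable errors, which matches the 18 UNC entries in its error log.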

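By the way, the relation between high load and a broken drive comes from how load is measured. Load is the number of processes waiting to execute. On Linux that count includes processes in uninterruptible sleep, which usually means blocked on disk I/O, so a process waiting for a drive to return data really does count as waiting to execute, even while the CPUs are mostly idle.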
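A minimal sketch of that counting, assuming input in the format of `ps -eo state,comm`. The sample process list is made up for illustration and does not come from the server in the question; the function name is hypothetical.

```python
from collections import Counter

# HYPOTHETICAL output of `ps -eo state,comm`; these processes are invented
# to illustrate the counting.
SAMPLE_PS = """\
S COMMAND
S init
D jbd2/md2-8
D apache2
D apache2
R top
S sshd
"""


def load_contributors(ps_output):
    """Count tasks in the two states that feed the Linux load average:
    R (runnable) and D (uninterruptible sleep, typically waiting on I/O)."""
    states = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        state = line.split(None, 1)[0][0]    # first letter of the state column
        states[state] += 1
    return {"runnable": states["R"], "uninterruptible": states["D"]}


if __name__ == "__main__":
    print(load_contributors(SAMPLE_PS))
```

In this made-up sample only one task wants the CPU, yet four tasks count toward load, because three are stuck in state D waiting for the disk. That is the same shape as the top output in the question: about 80% idle CPU, 17.6% I/O wait, and a load average above 11.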
