May 26, 2012
 

My home network has two servers running RAID on 3ware 9650SE-4LPML cards. One of them, the machine that also runs the MySQL instance behind this blog, sent me the following mail from 3DM2.

May 24, 2012 02:02.03PM - Controller 0
ERROR - Drive timeout detected: port=1

The drive only timed out and the array is not degraded, so this is not an outright HDD failure. I therefore pulled the S.M.A.R.T. data with smartmontools.

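With a hardware RAID card, the processor on the card normally handles all reads and writes to the individual HDDs. As a result, the host sees the RAID array as a single disk drive and can even boot from it without a driver, but it cannot access the physical drives directly. With 3ware (now LSI) RAID cards, however, adding the option -d 3ware,N to smartctl retrieves the S.M.A.R.T. data of a physical drive. N is the port number, and the device file to specify is /dev/twaX, where X is the controller number.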
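As a rough illustration of how the controller number and port number map onto the command line, here is a minimal Python sketch. The function name is made up for this example, and the port range 0 to 3 is an assumption based on the 4-port 9650SE-4LPML; the sketch only builds the argument list and does not run smartctl.

```python
def smartctl_argv(controller, port):
    """Build the smartctl arguments for one drive behind a 3ware controller."""
    if controller < 0 or port < 0:
        raise ValueError("controller and port must be non-negative")
    # /dev/twaX selects the controller; "-d 3ware,N" selects the port on it.
    return ["smartctl", "-a", f"/dev/twa{controller}", "-d", f"3ware,{port}"]


if __name__ == "__main__":
    # Assumption: a 4-port card, so ports 0 through 3 on controller 0.
    for port in range(4):
        print(" ".join(smartctl_argv(0, port)))
```

The resulting list can be handed to something like subprocess.run as root if you want to collect the data for every port in one go.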

The timeout occurred on port 1, so that is the drive to check.

# smartctl -a /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500
Device Model:     Hitachi HDP725050GLA360
Serial Number:    GEA5**********
LU WWN Device Id: 5 000cca 32cd9c149
Firmware Version: GM4OA52A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri May 25 01:18:33 2012 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 7890) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 131) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       139
  3 Spin_Up_Time            0x0007   120   120   024    Pre-fail  Always       -       330 (Average 307)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       117
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       144
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       35613
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       117
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       145
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       145
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 15/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
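The Power_On_Hours raw value in the table above (35613) is easier to judge as years and days. A minimal Python sketch of the conversion, assuming 365-day years and ignoring leap days; the function name is hypothetical.

```python
def split_hours(hours, hours_per_day=24, days_per_year=365):
    """Split a power-on hour count into (years, days, hours)."""
    days, rem_hours = divmod(hours, hours_per_day)
    years, rem_days = divmod(days, days_per_year)
    return years, rem_days, rem_hours


if __name__ == "__main__":
    years, days, hours = split_hours(35613)
    print(f"{years} years {days} days {hours} hours")
```

For 35613 hours this gives 4 years, 23 days and 21 hours.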

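The SMART overall-health self-assessment test result is PASSED, so the drive does not seem to have a fatal problem. At 35,613 power-on hours (4 years, 23 days and 21 hours) it has been running for quite a long time, but the real concern is that Current_Pending_Sector is 3. This is apparently the number of bad sectors that are still waiting to be reallocated. To check whether the sectors really are faulty, I ran a self-test.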

# smartctl --test=long /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 131 minutes for test to complete.
Test will complete after Fri May 25 03:40:28 2012

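How long this takes depends on capacity and I/O performance, but it is generally slow, so go to sleep or get some work done until the estimated completion time. Once the test finishes, the result is logged inside the drive and can be read back.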

# smartctl -l selftest /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     35615         948956112

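So there is a problem after all. According to the Bad block HOWTO for smartmontools, the remedy is apparently to force a write onto the failing sector (for example by creating a file over it) so that the bad sector is no longer read. The ancient MS-DOS CHKDSK offered something similar. The catch is that this drive hangs off a RAID card and is a member of a RAID5 array.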

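In a case like this, the RAID card can apparently deal with it. On the 3ware 9650SE, verifying the RAID array repairs such sectors automatically. To start a verify, either press "Verify Unit" under Management → Maintenance in 3DM2, or run the following from tw_cli: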

# tw_cli
//poseidon> maint verify c0 u0 start

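Either method starts the verify immediately. In this case, though, the 9650SE was already set to run a scheduled verify at 0:00 every Saturday (the default setting), so that took care of it.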

If the verify reallocates any sectors, that shows up in the log.

[Figure: log showing the bad sectors being reallocated]

The same can be checked from tw_cli.

//poseidon> info c0 diag

### CLI Version:      x86_64 (64 bit)
### Time Stamp:       22:31.19 26-May-2012
### Host Name:        poseidon
### OS Version:       Linux 3.2.0-gentoo
### Driver Version:   2.26.02.014
### Controller ID:    0
### Model:            9650SE-4LPML
### Firmware:         FE9X 4.10.00.007
### BIOS:             BE9X 4.08.00.002
### Serial #:         L326***********
### Available Memory: 224MB

==========================================================================
Diagnostic Information on Controller //poseidon/c0 ...
--------------------------------------------------------------------------
(output omitted)

The timeout shows up in the diagnostic output as follows:

E=0204 T=14:55:57     : Port timeout (ext)
task file written out : cd dh ch cl sn sc ft
                      : 61 68 43 AD 0A 08 08
Send AEN (code, time): 0009h, 05/24/2012 14:55:57
Drive timeout detected
(EC:0x09, SK=0x04, ASC=0x00, ASCQ=0x00, SEV=01, Type=0x71)
port=1
  task file read back : st dh ch cl sn sc er
                      : 50 A0 C2 4F 00 00 00
E=0204 T=14:55:57 P=1 : Soft reset drive
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0207 T=14:55:57 P=1 : ResetDriveWait
E=0204 T=14:55:57 P=1 : Inserting Set UDMA command
E=0204 T=14:55:57 P=1 : Check drive swap, same drive
E=0204 T=14:55:57 P=1 : Check power cycles, initial=117, current=117, port=1
E=0204 T=14:56:02 P=1h: exitCode = 0
Retrying chain

The sector repairs are logged like this:

[A:78][A:79][A:7a]
[A:7b][A:7c]
E=0202 T=07:33:32     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:33:32 P=1 : Soft reset drive
E=0207 T=07:33:32 P=1 : ResetDriveWait
E=0207 T=07:33:32 P=1 : ResetDriveWait
E=0202 T=07:33:32 P=1 : Inserting Set UDMA command
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc6...OK
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc7...OK
E=0202 T=07:33:32 P=1h: Repair LBA 0x388febc8...OK
Send AEN (code, time): 0023h, 05/26/2012 07:33:32
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBC7
E=0202 T=07:33:32 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba0fb0 (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:33:57     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:33:57 P=1 : Soft reset drive
E=0207 T=07:33:57 P=1 : ResetDriveWait
E=0207 T=07:33:57 P=1 : ResetDriveWait
E=0202 T=07:33:57 P=1 : Inserting Set UDMA command
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febc8...OK
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febc9...OK
E=0202 T=07:33:57 P=1h: Repair LBA 0x388febca...OK
Send AEN (code, time): 0023h, 05/26/2012 07:33:57
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBC9
E=0202 T=07:33:57 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba1941 (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:34:17     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:34:17 P=1 : Soft reset drive
E=0207 T=07:34:17 P=1 : ResetDriveWait
E=0207 T=07:34:17 P=1 : ResetDriveWait
E=0202 T=07:34:17 P=1 : Inserting Set UDMA command
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febca...OK
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febcb...OK
E=0202 T=07:34:17 P=1h: Repair LBA 0x388febcc...OK
Send AEN (code, time): 0023h, 05/26/2012 07:34:17
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBCB
E=0202 T=07:34:17 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba228c (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:34:42     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:34:42 P=1 : Soft reset drive
E=0207 T=07:34:42 P=1 : ResetDriveWait
E=0207 T=07:34:42 P=1 : ResetDriveWait
E=0202 T=07:34:42 P=1 : Inserting Set UDMA command
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcd...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febce...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcf...OK
Send AEN (code, time): 0023h, 05/26/2012 07:34:42
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBCE
E=0202 T=07:34:42 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba2c2e (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:35:07     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:35:07 P=1 : Soft reset drive
E=0207 T=07:35:07 P=1 : ResetDriveWait
E=0207 T=07:35:07 P=1 : ResetDriveWait
E=0202 T=07:35:07 P=1 : Inserting Set UDMA command
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febcf...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd0...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd1...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:07
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBD0
E=0202 T=07:35:07 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba358d (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:35:32     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:35:32 P=1 : Soft reset drive
E=0207 T=07:35:32 P=1 : ResetDriveWait
E=0207 T=07:35:32 P=1 : ResetDriveWait
E=0202 T=07:35:32 P=1 : Inserting Set UDMA command
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd1...OK
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd2...OK
E=0202 T=07:35:32 P=1h: Repair LBA 0x388febd3...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:32
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBD2
E=0202 T=07:35:32 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba3f05 (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:35:57     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:35:57 P=1 : Soft reset drive
E=0207 T=07:35:57 P=1 : ResetDriveWait
E=0207 T=07:35:57 P=1 : ResetDriveWait
E=0202 T=07:35:57 P=1 : Inserting Set UDMA command
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd3...OK
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd4...OK
E=0202 T=07:35:57 P=1h: Repair LBA 0x388febd5...OK
Send AEN (code, time): 0023h, 05/26/2012 07:35:57
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBD4
E=0202 T=07:35:57 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba488d (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:36:27     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:36:27 P=1 : Soft reset drive
E=0207 T=07:36:27 P=1 : ResetDriveWait
E=0207 T=07:36:27 P=1 : ResetDriveWait
E=0202 T=07:36:27 P=1 : Inserting Set UDMA command
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febd8...OK
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febd9...OK
E=0202 T=07:36:27 P=1h: Repair LBA 0x388febda...OK
Send AEN (code, time): 0023h, 05/26/2012 07:36:27
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBD9
E=0202 T=07:36:27 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba53f3 (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:36:52     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:36:52 P=1 : Soft reset drive
E=0207 T=07:36:52 P=1 : ResetDriveWait
E=0207 T=07:36:52 P=1 : ResetDriveWait
E=0202 T=07:36:52 P=1 : Inserting Set UDMA command
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd5...OK
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd6...OK
E=0202 T=07:36:52 P=1h: Repair LBA 0x388febd7...OK
Send AEN (code, time): 0023h, 05/26/2012 07:36:52
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBD6
E=0202 T=07:36:52 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba5d6c (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:37:12     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:37:12 P=1 : Soft reset drive
E=0207 T=07:37:12 P=1 : ResetDriveWait
E=0207 T=07:37:12 P=1 : ResetDriveWait
E=0202 T=07:37:12 P=1 : Inserting Set UDMA command
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febda...OK
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febdb...OK
E=0202 T=07:37:12 P=1h: Repair LBA 0x388febdc...OK
Send AEN (code, time): 0023h, 05/26/2012 07:37:12
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBDB
E=0202 T=07:37:12 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba66db (ErrorCode): 202
 Start Stripe #: 00711fd7

E=0202 T=07:37:42     : Data ECC error (int)
task file written out : cd dh ch cl sn sc ft
                      : 60 78 8F EB 80 80 80
E=0202 T=07:37:42 P=1 : Soft reset drive
E=0207 T=07:37:42 P=1 : ResetDriveWait
E=0207 T=07:37:42 P=1 : ResetDriveWait
E=0202 T=07:37:42 P=1 : Inserting Set UDMA command
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febdf...OK
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febe0...OK
E=0202 T=07:37:42 P=1h: Repair LBA 0x388febe1...OK
Send AEN (code, time): 0023h, 05/26/2012 07:37:42
Sector repair completed
(EC:0x23, SK=0x01, ASC=0x11, ASCQ=0x00, SEV=02, Type=0x71)
port=1, LBA=0x388FEBE0
E=0202 T=07:37:42 P=1h: Complete IPRs in error

A-Verifier ERROR, time: 39ba7256 (ErrorCode): 202
 Start Stripe #: 00711fd7

That is roughly what the entries look like.
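The self-test log reports LBA_of_first_error in decimal (948956112), while the controller log prints repaired LBAs in hexadecimal, so it is not obvious at a glance that they refer to the same place. A small Python sketch for cross-checking them, using a few lines copied from the log above as sample input; the function and variable names are my own and nothing here is part of tw_cli.

```python
import re

# A few lines copied from the controller diagnostic log above.
DIAG_LOG = """\
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcd...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febce...OK
E=0202 T=07:34:42 P=1h: Repair LBA 0x388febcf...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febcf...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd0...OK
E=0202 T=07:35:07 P=1h: Repair LBA 0x388febd1...OK
"""

REPAIR_RE = re.compile(r"Repair LBA (0x[0-9a-fA-F]+)\.\.\.OK")


def repaired_lbas(log_text):
    """Return the set of LBAs (as integers) that the log reports as repaired."""
    return {int(m.group(1), 16) for m in REPAIR_RE.finditer(log_text)}


if __name__ == "__main__":
    failed_lba = 948956112  # LBA_of_first_error from the self-test log
    print(hex(failed_lba))
    print(failed_lba in repaired_lbas(DIAG_LOG))
```

948956112 is 0x388febd0, which is one of the LBAs the verify repaired at 07:35:07.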

Once the verify has finished, fetch the S.M.A.R.T. data with smartmontools again.

# smartctl -a /dev/twa0 -d 3ware,1
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.0-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar P7K500
Device Model:     Hitachi HDP725050GLA360
Serial Number:    GEA5**********
LU WWN Device Id: 5 000cca 32cd9c149
Firmware Version: GM4OA52A
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat May 26 22:40:34 2012 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		( 7890) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 131) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   120   120   024    Pre-fail  Always       -       330 (Average 307)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       117
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       144
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       35658
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       117
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       147
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       147
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 15/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     35650         -
# 2  Extended offline    Completed: read failure       10%     35622         948956110
# 3  Short offline       Completed without error       00%     35620         -
# 4  Extended offline    Completed: read failure       10%     35615         948956112
2 of 2 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Current_Pending_Sector is now 0, and Reallocated_Sector_Ct and Reallocated_Event_Count have each become 1 instead. That completes the repair.
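Comparing two smartctl attribute tables by eye is error-prone, so here is a minimal Python sketch that reports which raw values changed. The sample rows are copied from the two outputs above; the parsing assumes the ten-column layout shown there, and the function names are hypothetical.

```python
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""


def raw_values(table_text):
    """Map attribute name to RAW_VALUE for each row of an attribute table."""
    values = {}
    for line in table_text.splitlines():
        # Split into at most 10 fields so a raw value containing spaces stays whole.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            values[fields[1]] = fields[9].strip()
    return values


def changed(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    old, new = raw_values(before), raw_values(after)
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}


if __name__ == "__main__":
    for name, (old, new) in changed(BEFORE, AFTER).items():
        print(f"{name}: {old} -> {new}")
```

Saving the full smartctl -a output before and after a verify and feeding both files through this would show the same three attributes changing.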

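This machine also stores a fair amount of other data, including a PostgreSQL database holding more than 100 million rows of market-price data, so I keep one spare drive on hand. I considered swapping it in, but the array is not degraded, and my son sometimes plays with the spare, so it would have to be checked before being put into service. This time I settled for the sector reallocation alone.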

