Below is a list of smartctl commands I frequently use to quickly verify disk health and status, especially when smartd is logging errors to the messages log file.
Print all SMART (Self-Monitoring, Analysis and Reporting Technology) information for drive /dev/sda (Primary Master).
smartctl -a /dev/sda
Enable SMART on device.
smartctl --smart=on /dev/sda
Get info about the device:
smartctl -i /dev/sda
Show the capabilities of the drive. This also reports progress while a test is running.
smartctl -c /dev/sda
Basic health status:
smartctl -H /dev/sda
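For scripting, the exit status of smartctl can be inspected as well as its text output. The smartctl man page documents the exit status as a bitmask. The sketch below decodes only the health-related bits; the function name decode_smartctl_status is my own, and bits 6 and 7 are only meaningful when the corresponding logs were actually read (for example with -a).

```shell
# Hypothetical helper: decode the health-related bits of a smartctl exit status.
# Bit meanings are taken from the EXIT STATUS section of the smartctl man page.
decode_smartctl_status() {
    status=$1
    if [ $((status & 8)) -ne 0 ]; then
        echo "SMART status check returned DISK FAILING"
    fi
    if [ $((status & 16)) -ne 0 ]; then
        echo "prefail attribute at or below threshold"
    fi
    if [ $((status & 64)) -ne 0 ]; then
        echo "device error log contains errors"
    fi
    if [ $((status & 128)) -ne 0 ]; then
        echo "self-test log contains errors"
    fi
    return 0
}

# Usage (needs root and a real disk):
#   smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?
```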
Display attributes. The attributes to watch for on a failing disk are Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector and Offline_Uncorrectable. Their RAW_VALUE should normally be "0".
smartctl -A /dev/sda
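A minimal sketch of checking those four attributes automatically. It assumes the usual ten-column ATA attribute table, with RAW_VALUE as a plain number in the last column. The function name check_smart_attrs is made up for illustration.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# watched attribute whose RAW_VALUE (column 10) is not 0.
# Returns 1 if any such attribute was found, 0 otherwise.
check_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct"   ||
        $2 == "Reallocated_Event_Count" ||
        $2 == "Current_Pending_Sector"  ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 != 0) { print $2 "=" $10; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (needs root and a real disk):
#   smartctl -A /dev/sda | check_smart_attrs || echo "suspect attributes found"
```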
Run an immediate offline test, which updates the attribute values. It is good to run this after a badblocks/fsck check and before looking at the attribute values.
smartctl -t offline /dev/sda
Run a thorough long test if the -A option described above shows suspect attributes.
smartctl -t long /dev/sda
Examine self-test log. Shows if tests failed or passed.
smartctl -l selftest /dev/sda
Display most recent error log.
smartctl -l error /dev/sda
There are more examples in man smartctl.
Resolving sector errors on raid partition
On software raid partitions, CurrentPendingSector or OfflineUncorrectableSector errors logged in syslog can be corrected by failing and removing the drive, then re-attaching it, so that the drive is rebuilt and the problem sectors get overwritten.
Below, I have 4 CurrentPending and OfflineUncorrectable sectors:
# smartctl -A /dev/sdb | grep "Current_Pending_Sector\|Offline_Uncorrectable"
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 4
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 4
The self-test log shows the LBA of the first failing sector:
# smartctl -l selftest /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 18654 3166126
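To pull that LBA out in a script, something like the following may work. It is only a sketch: it assumes the log layout shown above, where a failed test ends with a numeric LBA_of_first_error and a passed test ends with "-". The name first_error_lba is hypothetical.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# the LBA_of_first_error of the most recent self-test that reported one.
first_error_lba() {
    awk '/^# *[0-9]+ / && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# Usage (needs root and a real disk):
#   smartctl -l selftest /dev/sdb | first_error_lba
```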
Sector 3166126 lies in the second partition:
# fdisk -lu /dev/sdb
Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders, total 1465149168 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 63 401624 200781 fd Linux raid autodetect
/dev/sdb2 401625 1465144064 732371220 fd Linux raid autodetect
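The comparison against the Start and End columns can be scripted. This is a rough sketch that assumes the fdisk -lu layout shown above (sector units, optional "*" boot flag in the second column). The name partition_for_lba is made up for illustration.

```shell
# Hypothetical helper: given an LBA as $1 and `fdisk -lu` output on stdin,
# print the partition whose Start..End range contains that LBA.
# Returns 1 if no partition matches.
partition_for_lba() {
    awk -v lba="$1" '
        $1 ~ /^\/dev\// {
            # Skip the optional boot flag "*" in column 2.
            i = ($2 == "*") ? 3 : 2
            first = $i + 0
            last  = $(i + 1) + 0
            if (lba + 0 >= first && lba + 0 <= last) {
                print $1
                found = 1
                exit
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (needs root and a real disk):
#   fdisk -lu /dev/sdb | partition_for_lba 3166126
```

With the table above, 3166126 falls between 401625 and 1465144064, so the match is /dev/sdb2.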
Locate the raid partition:
# grep sdb2 /proc/mdstat
md1 : active raid10 sdb2[4] sdd2[3] sdc2[2] sda2[0]
Make the partition faulty and remove:
# mdadm --manage /dev/md1 -f /dev/sdb2
# mdadm --manage /dev/md1 -r /dev/sdb2
Re-attach the partition and let it rebuild:
# mdadm --manage /dev/md1 -a /dev/sdb2
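The rebuild takes a while, and its progress appears in /proc/mdstat as a "recovery = ...%" line under the array. A small sketch for waiting on it from a script follows. The function name md_rebuilding is hypothetical, and the parsing assumes the usual mdstat layout, with arrays separated by blank lines.

```shell
# Hypothetical helper: $1 is the array name (e.g. md1), mdstat text on stdin.
# Returns 0 while a recovery or resync line is present for that array,
# 1 otherwise.
md_rebuilding() {
    awk -v md="$1" '
        $1 == md && $2 == ":"            { in_md = 1; next }
        in_md && NF == 0                 { in_md = 0 }
        in_md && /(recovery|resync) *=/  { busy = 1 }
        END { exit (busy ? 0 : 1) }
    '
}

# Usage: poll once a minute until the rebuild of md1 has finished.
#   while md_rebuilding md1 < /proc/mdstat; do sleep 60; done
```

mdadm also has a --wait option that blocks until resync or recovery activity on the given array finishes, which may be simpler if your version supports it.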
Once the array has rebuilt, redo the self-test and check for errors:
# smartctl -t long /dev/sdb
# smartctl -A /dev/sdb | grep "Current_Pending_Sector\|Offline_Uncorrectable"
# smartctl -l selftest /dev/sdb
The drive keeps spare space available to "remap" bad sectors, and this happens automatically. If uncorrectable sector errors do not resolve, or keep coming back, it likely means the spare sectors are used up and the drive will probably fail soon, so it is best to just replace the drive.