Device: /dev/sda, 1 Currently unreadable (pending) sectors

Current version as of 2 May 2011, 14:13

I sometimes get these annoying messages from smartd. They only occur with Samsung HD103SI SATA disks. According to my Google research, the messages are harmless and a Samsung specialty, which is one reason for me to avoid Samsung in the future. They should disappear after a reboot, but since the disks are part of a RAID1 on a 24x7 server, I want to avoid rebooting it. Instead, I can get rid of the messages without a reboot by overwriting the whole disk with zeroes, as described below.
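
To see whether a disk is currently affected, the pending-sector count can be read directly from the SMART attributes. A minimal check, assuming smartmontools is installed and the command is run as root:

```shell
# Print the relevant SMART attributes for the disk (IDs 197 and 198).
# A non-zero raw value of Current_Pending_Sector triggers the smartd message.
smartctl -A /dev/sda | grep -E 'Current_Pending_Sector|Offline_Uncorrectable'
```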


== Removing the disk from the array ==

The first step is to remove the corresponding disk from the array: first fail the disk, then remove it. In my case, the disk is part of three RAID1 volumes, md0, md1 and md2:

mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1

mdadm --manage /dev/md2 --fail /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda3

mdadm --manage /dev/md1 --fail /dev/sda4
mdadm --manage /dev/md1 --remove /dev/sda4
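
Before zeroing anything, it is worth double-checking that no array still references the disk. A quick sanity check (a sketch; adapt the array names to your setup):

```shell
# /dev/sda1, /dev/sda3 and /dev/sda4 must no longer appear in any array.
cat /proc/mdstat
# Each array should report the failed member as removed and run degraded.
mdadm --detail /dev/md0
```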

== Zeroing the disk ==

Since the disk has been removed from the RAID, it is now safe to overwrite it with zeroes. As soon as the disk is completely zeroed out, the messages disappear. In my case, the disk is 1TB:

cat /dev/zero | pv -s 1000G | dd of=/dev/sda bs=100M
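
If pv is not available: recent GNU coreutils (8.24 or newer, an assumption about your system) lets dd report progress itself, which avoids the pipe. Triple-check the target device first, since this destroys all data on it:

```shell
# DANGER: irreversibly overwrites /dev/sda with zeroes.
dd if=/dev/zero of=/dev/sda bs=100M status=progress
```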

== Recreate partition layout ==

Now you have to recreate the partition layout with fdisk or cfdisk. In my case, the two RAID1 disks are identical, so I can copy the partition layout from the other disk:

sfdisk -d /dev/sdb | sfdisk /dev/sda 
sfdisk -R /dev/sda
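
To verify the copy, the two partition dumps can be compared after normalizing the device names; apart from those, the output should be identical. A sketch using bash process substitution:

```shell
# Dump both partition tables, rename the devices to a common placeholder,
# and diff the result; no output means the layouts match.
diff <(sfdisk -d /dev/sda | sed 's/sda/DISK/') \
     <(sfdisk -d /dev/sdb | sed 's/sdb/DISK/')
```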

== Adding the disk to the RAID ==

This is as simple as before:

mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md2 --add /dev/sda3
mdadm --manage /dev/md1 --add /dev/sda4
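
After re-adding the partitions, md resynchronizes each mirror. The rebuild progress can be followed like this:

```shell
# Refresh the RAID status every 5 seconds; each resyncing array
# shows a progress bar and an estimated time to completion.
watch -n 5 cat /proc/mdstat
```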

== Reinstall GRUB ==

In my case, I installed GRUB on both disks, so the system remains fully bootable even if a single disk fails.

grub-install /dev/sda

Or if you use symbolic names (like me), just reinstall on both devices:

grub-install /dev/disk/by-id/scsi-SATA_SAMSUNG_HD103SIS1VSJ1KS300499
grub-install /dev/disk/by-id/scsi-SATA_SAMSUNG_HD103SIS1VSJ1KS300505
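
If you are unsure which by-id name belongs to which disk, the symlinks can be listed; each entry points at the corresponding sd* kernel device:

```shell
# Map stable by-id names to kernel device names (sda, sdb, ...),
# so grub-install targets the right physical disk.
ls -l /dev/disk/by-id/ | grep -i samsung
```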

= Comments =

<comments />

Niki Hammler wrote …

<comment date="2012-08-30T08:30:04Z" name="Niki Hammler"> One additional comment: The problem is NOT solved by rebooting. It seems that zeroing out is necessary :-( </comment>

mike wrote …

<comment date="2012-12-26T17:07:31Z" name="mike"> hello! i have /dev/sdb 3TB how to perform Zeroing the disk for 3TB

cat /dev/zero | pv -s 3000G | dd of=/dev/sda bs=100M

or this is not right ? please show the correct command

I'm afraid to make a mistake

</comment>

Niki wrote …

<comment date="2012-12-29T17:29:49Z" name="Niki"> Are you sure you have the same config as I (RAID1)? Only in this case my howto makes sense, otherwise you will destroy all your data!!

If you are sure (really sure!) then the disk in the command must match:

cat /dev/zero | pv -s 3000G | dd of=/dev/sdb bs=100M 


</comment>

James Hightower wrote …

<comment date="2013-06-13T15:37:50Z" name="James Hightower"> I think you can skip the zero-ing part. Unless you have write-intent bitmaps enabled, then just removing and re-adding a drive will cause MD to overwrite the entire disk, including your pending-bad blocks. This has worked for me many times. Remember, though, that the SMART Offline_Uncorrectable won't get updated until the next offline collection occurs, like with smartctl -t offline. </comment>

Niki wrote …

<comment date="2013-06-14T05:27:11Z" name="Niki"> Dear James,

Thank you for your comment! I will try it out the next time I encounter the message!

Niki </comment>