
HDD S.M.A.R.T. Monitoring


I needed to monitor the state of the hard drives on remote servers. For this I used smartmontools (S.M.A.R.T. disk monitoring tools), installed from ports:

[sourcecode language="js"]
# cd /usr/ports/sysutils/smartmontools && make install clean
[/sourcecode]

After installation, add the following to rc.conf:

# SMART
smartd_enable="YES"
smartd_flags="-l local2 --interval=300"

Let's try querying the disk:

[sourcecode language="js"]
# atacontrol list

ATA channel 0:
Master:      no device present
Slave:       no device present
ATA channel 2:
Master: acd0 <Optiarc DVD RW AD-5200S/1.09> SATA revision 1.x
Slave:       no device present
ATA channel 3:
Master:  ad6 <ST3200820AS/3.AAC> SATA revision 1.x
Slave:       no device present

# /usr/local/sbin/smartctl -a /dev/ad6

smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3200820AS
Serial Number:    5QE0ER7T
Firmware Version: 3.AAC
User Capacity:    200,049,647,616 bytes [200 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jun  6 09:16:11 2012 MSD
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  74) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   088   006    Pre-fail  Always       -       155087889
  3 Spin_Up_Time            0x0003   091   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1113
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       416806164
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14299
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1130
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   048   045    Old_age   Always       -       37 (Min/Max 31/37)
194 Temperature_Celsius     0x0022   037   052   000    Old_age   Always       -       37 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   077   066   000    Old_age   Always       -       218276512
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[/sourcecode]
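
Besides the overall PASSED verdict, the attribute table in this output can be checked by a script. Below is a minimal sketch, not part of the original setup: the function name `check_smart_attrs` is hypothetical, and the choice of attributes 5, 197 and 198 (reallocated, pending and uncorrectable sectors) is my assumption about what is worth watching. It reads `smartctl -A` output on stdin and reports any of those attributes with a non-zero raw value.

```shell
# Hypothetical helper: reads `smartctl -A` (or `-a`) output on stdin and
# prints every watched attribute whose raw value is non-zero.
# Exit status is 1 if at least one such attribute was found, 0 otherwise.
check_smart_attrs() {
    awk '
        # Attribute rows have at least 10 fields and start with a numeric ID;
        # the raw value is field 10. IDs 5, 197 and 198 are an assumed watch list.
        NF >= 10 && $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 197 || $1 == 198) {
            if ($10 + 0 > 0) {
                printf "WARNING: %s (ID %s) raw value is %s\n", $2, $1, $10
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage on the server (not run here):
#   /usr/local/sbin/smartctl -A /dev/ad6 | check_smart_attrs
```

For the disk shown here all three raw values are 0, so the function would print nothing and return 0.
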
The output shows that SMART is supported and enabled:
[sourcecode language="js"]
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
[/sourcecode]

If it is disabled, enable it:

[sourcecode language="js"]
# /usr/local/sbin/smartctl -s on /dev/ad6
[/sourcecode]

Create the configuration file from the sample, then add a line for the disk:
[sourcecode language="js"]
# cp /usr/local/etc/smartd.conf.sample /usr/local/etc/smartd.conf
[/sourcecode]

/dev/ad6 -a -I 194 -W 4,45,55 -R 5 -m monit@serkas.pp.ru -o on -S on -s (S/../.././02|L/../../6/03)
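
The `-s` option in the line above schedules self-tests with a regular expression that smartd compares against a string of the form `T/MM/DD/d/HH` (test type, month, day, day of week with 1 = Monday, hour). The sketch below only imitates that matching with `grep -Ex` to show which slots the pattern selects; it is an approximation for illustration, not smartd's own code, and the function name `schedule_matches` is hypothetical.

```shell
# The -s pattern from the smartd.conf line above.
SCHEDULE='(S/../.././02|L/../../6/03)'

# Succeeds if the slot string "T/MM/DD/d/HH" matches the whole pattern.
# T: S = short test, L = long test; d: 1 = Monday ... 7 = Sunday; HH: hour.
schedule_matches() {
    printf '%s\n' "$1" | grep -Eqx "$SCHEDULE"
}

# Wednesday 6 June at 02:00 (short), Saturday 9 June at 03:00 (long),
# and Wednesday 6 June at 03:00 (long, wrong day of week).
for slot in S/06/06/3/02 L/06/09/6/03 L/06/06/3/03; do
    if schedule_matches "$slot"; then
        echo "$slot: test would run"
    else
        echo "$slot: no test"
    fi
done
```

In other words, the pattern asks for a short self-test every day at 02:00 and a long self-test every Saturday at 03:00.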

Set up logging and log rotation:
[sourcecode language="js"]
# ee /etc/syslog.conf
[/sourcecode]

# SMART monitoring
!smartd
*.*                                        /var/log/smartd.log

[sourcecode language="js"]
# touch /var/log/smartd.log
[/sourcecode]

Add rotation with compression for the new log:

[sourcecode language="js"]
# ee /etc/newsyslog.conf
[/sourcecode]

/var/log/smartd.log                     644  2     500  *     JC

[sourcecode language="js"]
# killall -1 syslogd
# /etc/rc.d/newsyslog restart
# /usr/local/etc/rc.d/smartd start
Starting smartd.
# ps -ax | grep smartd
67455  ??  I      0:00.01 /usr/local/sbin/smartd -p /var/run/smartd.pid -l local2 --interval=300
67467   0  S+     0:00.00 grep smartd
[/sourcecode]
Check that entries are now being written to the log:
[sourcecode language="js"]
# cat /var/log/smartd.log
Jun  6 09:44:29 ramenskoe smartd[67453]: smartd 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE amd64] (local build)
Jun  6 09:44:29 ramenskoe smartd[67453]: Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Jun  6 09:44:29 ramenskoe smartd[67453]: Opened configuration file /usr/local/etc/smartd.conf
Jun  6 09:44:29 ramenskoe smartd[67453]: Configuration file /usr/local/etc/smartd.conf parsed.
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, opened
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, ST3200820AS, S/N:5QE0ER7T, FW:3.AAC, 200 GB
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, found in smartd database: Seagate Barracuda 7200.10
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, enabled SMART Attribute Autosave.
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, enabled SMART Automatic Offline Testing.
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, is SMART capable. Adding to "monitor" list.
Jun  6 09:44:29 ramenskoe smartd[67453]: Monitoring 1 ATA and 0 SCSI devices
Jun  6 09:44:29 ramenskoe smartd[67453]: Device: /dev/ad6, initial Temperature is 38 Celsius (Min/Max ??/38)
Jun  6 09:44:29 ramenskoe smartd[67455]: smartd has fork()ed into background mode. New PID=67455.
Jun  6 09:44:29 ramenskoe smartd[67455]: file /var/run/smartd.pid written containing PID 67455
[/sourcecode]
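
With more than one disk in smartd.conf it is handy to confirm from the log which devices smartd actually picked up. A minimal sketch, assuming the log lines keep the `is SMART capable. Adding to "monitor" list.` wording seen above; the function name `monitored_devices` is hypothetical.

```shell
# Hypothetical helper: reads a smartd log on stdin and prints each device
# that smartd added to its monitor list, one per line, without duplicates.
monitored_devices() {
    sed -n 's/.*Device: \([^,]*\), is SMART capable\. Adding to "monitor" list\..*/\1/p' | sort -u
}

# Usage on the server (not run here):
#   monitored_devices < /var/log/smartd.log
```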

If a problem occurs, smartd sends an email to the administrator at the address given with -m.

