Subject: Bug#405919: marked as forwarded mdadm: checkarray does not report or fix mismatch_cnt issues
From: Debian Bug Tracking System
Date: Sun, 07 Jan 2007 05:18:54 -0800
Your message dated Sun, 7 Jan 2007 14:07:52 +0100
with message-id <[email protected]>
has caused the Debian Bug report #405919,
regarding mdadm: checkarray does not report or fix mismatch_cnt issues
to be marked as having been forwarded to the upstream software
author(s) [email protected]

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--- Begin Message ---
Subject: Re: Bug#405919: mdadm: checkarray does not report or fix mismatch_cnt issues
From: martin f krafft
Date: Sun, 7 Jan 2007 14:07:52 +0100
tags 405919 confirmed moreinfo

Forwarding this to Neil Brown, the md upstream maintainer. Neil, if
you don't feel like reading all of this, skip to "*** Neil:" below.
The full log is at http://bugs.debian.org/405919 .

also sprach Michel Lespinasse <[email protected]> [2007.01.07.1258 +0100]:
> I have noticed that /sys/block/md1/md/mismatch_cnt reports a count of 
> 128 unsynchronized blocks. checkarray does not report or fix this issue.
> Doing the same manually (echo check >/sys/block/md1/md/sync_action) does
> not fix the issue either - mismatch_cnt is reset to 0 at the start of 
> the resync, and goes up to 128 somewhere between 40% and 50% of the 
> resync

Well, checkarray is called checkarray, not fixarray. Anyway, I agree
that it should report any problems and I thank you for pointing me
to this problem -- I thought that the kernel would log problems
itself, but apparently it does not. Could you please verify this by
checking all your logs?

I am going to address two points in turn: first, repairing the
array, then user notification:

This is the relevant information from md.txt:

      This can be used to monitor and control the resync/recovery
      process of MD. In particular, writing "check" here will cause
      the array to read all data block and check that they are
      consistent (e.g. parity is correct, or all mirror replicas are
      the same). Any discrepancies found are NOT corrected.

      A count of problems found will be stored in md/mismatch_count.

      Alternately, "repair" can be written which will cause the same
      check to be performed, but any errors will be corrected. 

So you can easily repair the array yourself with 'repair' instead of
'check'. checkarray could be doing this, but I'd much rather not
have checkarray write to the array every first Sunday of a month
while the admin may be sleeping. Thus, I am convinced that repairing
should be the job of 'repairarray', which I'll add in a future release.
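
A minimal sketch of the check/repair distinction described in md.txt,
in Python. The function name request_sync_action and the sysfs_root
parameter are assumptions made for illustration (the latter only so
the code can be tried against a scratch directory); the only md
interface it relies on is writing "check" or "repair" to
md/sync_action. Writing to the real file requires root.

```python
from pathlib import Path

# The two actions discussed in md.txt: "check" only counts mismatches,
# "repair" performs the same pass but also corrects them.
VALID_ACTIONS = ("check", "repair")


def request_sync_action(array: str, action: str,
                        sysfs_root: str = "/sys/block") -> Path:
    """Ask md to start a check or repair pass on one array.

    array is a device name such as "md1". sysfs_root is a hypothetical
    parameter, not part of any md tool; it exists so the function can be
    pointed at a test directory instead of the live sysfs tree.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = Path(sysfs_root) / array / "md" / "sync_action"
    # The write returns immediately; the pass itself runs in the kernel.
    path.write_text(action + "\n")
    return path
```

Keeping "repair" behind an explicit argument mirrors the split
between checkarray and a separate repair tool: nothing is written to
the array unless the caller asks for it.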

So this leaves user notification. The problem here is that
checkarray is asynchronous (sync_action is asynchronous), meaning
that it just tells the array to run a check and quits -- it does not
actually know when the check finishes.
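
To illustrate what "knowing when the check finishes" would involve, a
rough Python sketch: poll md/sync_action until it reads "idle", then
read md/mismatch_cnt. The function name wait_for_check and the
sysfs_root, poll_interval and timeout parameters are assumptions for
illustration, not anything checkarray provides. The sketch also
assumes the check has already started by the time polling begins.

```python
import time
from pathlib import Path


def wait_for_check(array: str, sysfs_root: str = "/sys/block",
                   poll_interval: float = 5.0,
                   timeout: float = 86400.0) -> int:
    """Block until the array's sync_action reads "idle".

    Returns the value of mismatch_cnt once the array is idle.
    Raises TimeoutError if the array is still busy after `timeout`
    seconds. All parameters other than `array` are hypothetical knobs.
    """
    md_dir = Path(sysfs_root) / array / "md"
    deadline = time.monotonic() + timeout
    while True:
        state = (md_dir / "sync_action").read_text().strip()
        if state == "idle":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{array} still in state {state!r}")
        time.sleep(poll_interval)
    # Only meaningful once the pass is over; the counter is reset
    # when a new pass starts.
    return int((md_dir / "mismatch_cnt").read_text().strip())
```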

*** Neil:

I see three solutions, which I will present in decreasing order of
preference. I am not opposed to combining 1&2:

  1. IMHO, the best solution would be if the md kernel driver would
     tell klogd if it finds a mismatch on an array. This would then
     end up with syslog and thus hopefully reach the admin.
     Alternatively, the kernel could be told to call a user-space
     programme specified via /proc, similar to how hotplug works.

  2. I introduce another cron job or daemon, which doesn't do
     anything but monitor the /sys/block/*/md/mismatch_cnt files and
     report any non-zero contents via email. Obviously, it could
     also write a log entry so the kernel would not have to. The
     problem is simply the delay caused by the period of the checks
     (e.g. only every 5 minutes), and the extra system load, which
     I guess is negligible.

     2b. The monitoring *could* well be done by mdadm --monitor.
         I would greatly favour that.

  3. I make checkarray synchronous by looping until a check
     completes, then checking mismatch_cnt and taking similar action
     to (2). I am not totally opposed to this, apart from the extra
     complexity, since it would advance checkarray from being
     a simple helper to a user-space tool that the admin could use
     at will and be kept up to date on the progress. checkarray
     could even crop /proc/mdstat and keep displaying it for visual
     feedback.
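
The monitoring job from solution 2 could be sketched roughly as
follows, in Python. The names find_mismatches and format_report and
the sysfs_root parameter are made up for this example; delivery by
email or syslog is deliberately left out. The only md interface
assumed is the /sys/block/*/md/mismatch_cnt files mentioned above.

```python
from pathlib import Path
from typing import Dict


def find_mismatches(sysfs_root: str = "/sys/block") -> Dict[str, int]:
    """Return {array name: mismatch_cnt} for arrays with a non-zero count.

    sysfs_root is a hypothetical parameter so the scan can be run
    against a test directory instead of the live sysfs tree.
    """
    mismatches = {}
    for cnt_file in sorted(Path(sysfs_root).glob("*/md/mismatch_cnt")):
        count = int(cnt_file.read_text().strip())
        if count != 0:
            # <root>/<array>/md/mismatch_cnt: the array is two levels up.
            mismatches[cnt_file.parent.parent.name] = count
    return mismatches


def format_report(mismatches: Dict[str, int]) -> str:
    """Render one line per affected array, suitable for a mail body."""
    return "\n".join(
        f"{array}: mismatch_cnt = {count}"
        for array, count in sorted(mismatches.items())
    )
```

Run from cron, a wrapper would call find_mismatches() and send the
output of format_report() only when the result is non-empty, so a
clean system stays quiet.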

I am interested in what Neil has to say. Also, Michel, what are
your thoughts?

> I just noticed today that mdadm.conf only lists 3 of my 5 RAID1
> volumes, I do not know why (I did not edit the file after it was
> auto-generated).

Did you add the two arrays after it was auto-generated? What does
/usr/share/mdadm/mkconf output?

 .''`.   martin f. krafft <[email protected]>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems

--- End Message ---