Your message dated Sun, 7 Jan 2007 14:07:52 +0100
with message-id <[email protected]>
has caused the Debian Bug report #405919,
regarding mdadm: checkarray does not report or fix mismatch_cnt issues
to be marked as having been forwarded to the upstream software
author(s) [email protected]
(NB: If you are a system administrator and have no idea what I am
talking about, this indicates a serious mail system misconfiguration
somewhere. Please contact me immediately.)
Debian bug tracking system administrator
(administrator, Debian Bugs database)
--- Begin Message ---
Re: Bug#405919: mdadm: checkarray does not report or fix mismatch_cnt issues
martin f krafft
Sun, 7 Jan 2007 14:07:52 +0100
tags 405919 confirmed moreinfo
Forwarding this to Neil Brown, the md upstream maintainer. Neil, if
you don't feel like reading all of this, skip to "*** Neil:" below.
The full log is at http://bugs.debian.org/405919 .
also sprach Michel Lespinasse <[email protected]> [2007.01.07.1258 +0100]:
> I have noticed that /sys/block/md1/md/mismatch_cnt reports a count of
> 128 unsynchronized blocks. checkarray does not report or fix this issue.
> Doing the same manually (echo check >/sys/block/md1/md/sync_action) does
> not fix the issue either - mismatch_cnt is reset to 0 at the start of
> the resync, and goes up to 128 somewhere between 40% and 50% of the
Well, checkarray is called checkarray, not fixarray. Anyway, I agree
that it should report any problems, and I thank you for pointing me
to this one. I thought that the kernel would log problems
itself, but apparently it does not. Could you please verify this by
checking all your logs?
I am going to address two points in turn: first, repairing the
array; then, user notification.
This is the relevant information from md.txt, describing md/sync_action:
  This can be used to monitor and control the resync/recovery
  process of MD. In particular, writing "check" here will cause
  the array to read all data blocks and check that they are
  consistent (e.g. parity is correct, or all mirror replicas are
  the same). Any discrepancies found are NOT corrected.
  A count of problems found will be stored in md/mismatch_cnt.
  Alternatively, "repair" can be written, which will cause the same
  check to be performed, but any errors will be corrected.
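For illustration, here is a minimal Python sketch of driving this interface. It does the equivalent of `echo check >/sys/block/md1/md/sync_action`. It assumes the sysfs layout named in this report (`/sys/block/<array>/md/sync_action` and `mismatch_cnt`). The function names are hypothetical and are not part of mdadm or checkarray. The `root` parameter exists only so the sketch can be exercised against a scratch directory. On a real system the write needs root and starts the pass at once.

```python
from pathlib import Path

# Standard sysfs location of block devices; overridable for testing.
SYSFS_BLOCK = Path("/sys/block")


def md_dir(array: str, root: Path = SYSFS_BLOCK) -> Path:
    """Return the md control directory of an array, e.g. /sys/block/md1/md."""
    return root / array / "md"


def start_sync_action(array: str, action: str, root: Path = SYSFS_BLOCK) -> None:
    """Start a 'check' (report only) or 'repair' (report and fix) pass.

    Like `echo check >.../sync_action`, this returns immediately;
    the kernel runs the pass in the background.
    """
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported sync action: {action!r}")
    (md_dir(array, root) / "sync_action").write_text(action + "\n")


def read_mismatch_cnt(array: str, root: Path = SYSFS_BLOCK) -> int:
    """Return the mismatch count of the current or most recent check/repair pass."""
    return int((md_dir(array, root) / "mismatch_cnt").read_text().strip())
```

`start_sync_action("md1", "check")` matches what checkarray does today. Passing `"repair"` is presumably what a repairarray tool would do. As the report above notes, mismatch_cnt is reset when a pass starts, so its value is only meaningful once that pass has finished.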
So you can easily repair the array yourself by writing 'repair'
instead of 'check'. checkarray could do this, but I would much rather
not have checkarray write to the array on the first Sunday of every
month while the admin may be asleep. I therefore think that repairing
should be the job of 'repairarray', which I'll add in a future
That leaves user notification. The problem here is that
checkarray is asynchronous (because writing to sync_action is):
it merely tells the array to start a check and exits, so it never
learns when the check finishes.
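Because the write to sync_action returns at once, a synchronous checkarray (solution 3 below) would have to poll. Below is a minimal sketch. It assumes sync_action reads back as `idle` once no check, repair or resync is running. `wait_for_idle`, the poll interval and the timeout are illustrative choices and do not describe existing mdadm behaviour. The `sleep` parameter is injectable only so the loop can be tested without waiting.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def wait_for_idle(md_dir: Path,
                  poll_seconds: float = 60.0,
                  timeout_seconds: Optional[float] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll <md_dir>/sync_action until it reads 'idle', then return mismatch_cnt.

    md_dir is an array's control directory, e.g. Path("/sys/block/md1/md").
    Raises TimeoutError once roughly timeout_seconds of polling have passed.
    """
    waited = 0.0
    while (md_dir / "sync_action").read_text().strip() != "idle":
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"{md_dir} still busy after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
    # Nothing is running any more; mismatch_cnt holds the last pass's result.
    return int((md_dir / "mismatch_cnt").read_text().strip())
```

If this is called right after starting a check, it relies on the kernel already reporting the new action rather than `idle`. A real tool might want to confirm that first.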
I see three solutions, listed in decreasing order of preference.
I am not opposed to combining 1 and 2:
1. IMHO, the best solution would be for the md kernel driver to
tell klogd when it finds a mismatch on an array. This would then
end up in syslog and thus hopefully reach the admin.
Alternatively, the kernel could be told to call a user-space
programme specified via /proc, similar to how hotplug works.
2. I introduce another cron job or daemon that does nothing but
monitor the /sys/block/*/md/mismatch_cnt files and report any
non-zero values via email. It could also write a log entry, so
the kernel would not have to. The drawbacks are the delay
imposed by the polling interval (e.g. checking only every 5
minutes) and the extra system load, which I expect to be
negligible.
2b. The monitoring *could* well be done by mdadm --monitor.
I would greatly favour that.
3. I make checkarray synchronous: it would loop until the check
completes, then read mismatch_cnt and take action similar to (2).
Apart from the extra complexity, I am not opposed to this, since
it would turn checkarray from a simple helper into a user-space
tool that the admin could run at will and that would keep them up
to date on the progress. checkarray could even crop /proc/mdstat
and keep displaying it for visual
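Solution 2 could be as small as the following Python sketch, run periodically from cron. It assumes the `/sys/block/*/md/mismatch_cnt` layout named above. For delivery, it relies on cron mailing any output of a job to its owner (or `MAILTO`). `find_mismatches` and `report` are hypothetical names, and the mdadm --monitor route from 2b is not modelled.

```python
from pathlib import Path
from typing import Dict


def find_mismatches(sysfs_block: Path = Path("/sys/block")) -> Dict[str, int]:
    """Return {array name: mismatch_cnt} for every md array with a non-zero count."""
    found = {}
    for cnt_file in sorted(sysfs_block.glob("*/md/mismatch_cnt")):
        try:
            count = int(cnt_file.read_text().strip())
        except (OSError, ValueError):
            continue  # array vanished or file unreadable; skip it
        if count:
            # .../<array>/md/mismatch_cnt -> <array>
            found[cnt_file.parent.parent.name] = count
    return found


def report(mismatches: Dict[str, int]) -> str:
    """Format one line per affected array; empty string if every array is clean."""
    return "".join(f"{name}: mismatch_cnt={count}\n"
                   for name, count in sorted(mismatches.items()))


if __name__ == "__main__":
    # Print only when something is wrong, so cron only sends mail then.
    text = report(find_mismatches())
    if text:
        print(text, end="")
```

Staying silent when all counts are zero keeps cron quiet, so mail goes out only when an array needs attention. Because mismatch_cnt is reset when a new check starts, a run that lands mid-check may see a partial count.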
I am interested in what Neil has to say. Also, Michel, what are
> I just noticed today that mdadm.conf only lists 3 of my 5 RAID1
> volumes, I do not know why (I did not edit the file after it was
Did you add the two arrays after it was auto-generated? What does
.''`. martin f. krafft <[email protected]>
: :' : proud Debian developer, author, administrator, and user
`. `'` http://people.debian.org/~madduck - http://debiansystem.info
`- Debian - when you have better things to do than fixing systems
--- End Message ---