Most ECC setups are as you describe. The memory hardware detects and corrects
all one-bit errors, and detects all two-bit errors, on its own. What should
happen is that the OS gets an interrupt when either occurs, so it has
the opportunity to note the error in logs and to do higher-level stuff if needed:
map out the memory in question, call an operator, halt and catch fire, etc. But
the hardware must have that interrupt line connected to something to even make
this possible. And the OS doesn't have to do anything, necessarily.
Although, as mentioned, if the OS has a low-level read-through-memory routine,
it does guarantee that memory is scrubbed of one-bit errors and that bad pages
are found.
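
To make the correct-one/detect-two behaviour and the scrub pass concrete, here
is a toy sketch in Python. It is only an illustration under my own assumptions:
it uses an extended Hamming (8,4) code on 4-bit values, whereas real ECC memory
typically protects 64-bit words with 8 check bits, and it does all of this in
hardware. The names encode, decode and scrub are made up for the example and
are not any OS or hardware interface.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of bits 0..7)."""
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7; parity bits at 1, 2, 4.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p gives even parity over all positions with bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    # Position 0 is an overall parity bit; it is what lets us tell
    # a one-bit error from a two-bit error.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (data, status, repaired_codeword).

    status is "ok", "corrected" (one bit was flipped and has been fixed)
    or "uncorrectable" (two bits were flipped; data is None).
    """
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i            # XOR of set positions is 0 when valid
    overall_bad = sum(code) % 2      # 1 if an odd number of bits flipped
    fixed = list(code)
    if overall_bad:
        # One-bit error: the syndrome is the flipped position
        # (0 means the overall parity bit itself was hit).
        fixed[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Overall parity looks fine but the syndrome is non-zero: two flips.
        return None, "uncorrectable", fixed
    else:
        status = "ok"
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return data, status, fixed


def scrub(memory):
    """Read through every word, rewrite corrected ones, report bad addresses."""
    corrected, bad = [], []
    for addr, word in enumerate(memory):
        _, status, fixed = decode(word)
        if status == "corrected":
            memory[addr] = fixed     # write the repaired word back
            corrected.append(addr)
        elif status == "uncorrectable":
            bad.append(addr)         # candidate for mapping out
    return corrected, bad
```

The scrub function is the software analogue of the read-through-memory routine:
by touching every word it turns latent one-bit errors into repaired words
before a second flip in the same word can make them uncorrectable, and it
returns the addresses the OS would want to log or map out.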
This message posted from opensolaris.org
zfs-discuss mailing list