On Wed, 18 Mar 2009, Joerg Schilling wrote:
> The problem in this case is not whether rename() is atomic but whether the
> file that replaces the old file in an atomic rename() operation is in a
> stable state on the disk before calling rename().
This topic is quite disturbing to me ...
> The calling sequence of the failing code was:
>
> f = open("new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
> write(f, "dat", size);
> close(f);
> rename("new", "old");
>
> The only guaranteed way to have the file "new" in a stable state on the disk
> is to call:
>
> f = open("new", O_WRONLY|O_CREAT|O_TRUNC, 0666);
> write(f, "dat", size);
> fsync(f);
> close(f);
But the problem is not that the file "new" is in an unstable state.
The problem is that it seems that some filesystems are not preserving
the ordering of requests. Failing to preserve the ordering of
requests is fraught with peril.
POSIX does not care about "disks" or "filesystems". The only correct
behavior is for operations to be applied in the order that they are
requested of the operating system. This is a core function of any
operating system. It is therefore ok for some (or all) of the data
which was written to "new" to be lost, or for the rename operation to
be lost, but it is not ok for the rename to end up with a corrupted
file with the new name.
In summary, I don't agree with you that the misbehavior is correct,
but I do agree that copious expensive fsync()s should be used to
work around the problem.
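For illustration, the conservative sequence discussed above (write, fsync(),
close(), then rename()) might be sketched as follows. This is only a minimal
sketch assuming a POSIX system; the helper name replace_file and its
parameters are hypothetical and are not taken from any application mentioned
in this thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes of `data` to the temporary file
 * `tmp`, force it to stable storage, then rename it over `path`.
 * Returns 0 on success, -1 on any error (the temporary file is removed).
 */
static int replace_file(const char *tmp, const char *path,
                        const void *data, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop. */
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the OS to flush the file data before the rename is issued. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Only now replace the old file with the fully written new one. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A call such as replace_file("new", "old", "dat", 3) corresponds to the
sequence quoted earlier, with the return value of each step checked. The
sketch does not retry on EINTR and does not fsync() the containing
directory, which some applications also do to make the rename itself
durable.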
As it happens, current versions of my own application should be safe
from this Linux filesystem bug, but older versions are not. There is
even a way to request fsync() on every file close, but that could be
quite expensive so it is not the default.
Bob
--
Bob Friesenhahn
bfriesen@xxxxxxxxxxxxxxxxxxx, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss