On 05/31/2010 02:32 PM, Sandon Van Ness wrote:
> Well, it seems that tweaking the txg sync times and similar settings
> did make the transfer smoother, but it didn't actually help speeds:
> the hangs just became shorter and more frequent. Lowering the time
> between writes actually seemed to make things (slightly) worse.
>
> I think I have come to the conclusion that the problem here is CPU,
> since it only happens with parity RAID. If it were I/O-bound, I would
> expect the same behavior without parity; if anything, non-parity RAID
> is heavier on I/O because it is no longer CPU-bottlenecked (a dd write
> test gives me nearly 700 MB/s vs. 450 MB/s with parity raidz2).
>
> So, if I am understanding things correctly, the issue I am seeing
> should be fixed, but apparently it's not (in my case), because CPU
> usage from the parity/ZFS calculations is taking precedence over my
> process doing the writing (rsync)?
>
> I am now nearly 100% sure that the issue is CPU-based, because I saw
> the same dips even when using mbuffer.
And here is some top output. The slowdowns occur when zpool-data starts
using CPU and rsync gets CPU-starved:
Normal activity shows:
last pid: 22635; load avg: 2.17, 2.18, 2.16; up 0+18:04:42 14:53:29
59 processes: 57 sleeping, 1 running, 1 on cpu
CPU states: 54.7% idle, 23.4% user, 21.9% kernel, 0.0% iowait, 0.0% swap
Kernel: 37646 ctxsw, 193 trap, 20914 intr, 45295 syscall
Memory: 4027M phys mem, 190M free mem, 2013M total swap, 2013M free swap
   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  1326 root        1  59  -20  383M   44M run    496:47 28.87% rsync
  1322 root        1  59  -20  383M  357M sleep   11:21  0.70% rsync
     3 root        1  60  -20    0K    0K sleep    1:24  0.06% fsflush
When starved:
last pid: 22636; load avg: 2.16, 2.18, 2.16; up 0+18:05:16 14:54:03
59 processes: 57 sleeping, 2 on cpu
CPU states: 24.9% idle, 10.5% user, 64.6% kernel, 0.0% iowait, 0.0% swap
Kernel: 17855 ctxsw, 18 trap, 12831 intr, 21090 syscall
Memory: 4027M phys mem, 198M free mem, 2013M total swap, 2013M free swap
   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
   604 root       39  99  -20    0K    0K cpu/0  316:55 53.36% zpool-data
  1326 root        1  59  -20  383M   44M sleep  497:03 13.49% rsync
  1322 root        1  59  -20  383M  357M sleep   11:21  0.33% rsync
 22635 root        1  59    0 3852K 1912K cpu/1    0:00  0.06% top
     3 root        1  60  -20    0K    0K sleep    1:24  0.06% fsflush
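As an illustration (not part of the original measurements), the two "CPU states" lines above can be compared mechanically. This is a minimal Python sketch; the helper name parse_cpu_states is hypothetical, and it only handles lines in exactly the format top printed here.

```python
import re

# The two "CPU states" lines copied verbatim from the top snapshots above.
NORMAL = "CPU states: 54.7% idle, 23.4% user, 21.9% kernel, 0.0% iowait, 0.0% swap"
STARVED = "CPU states: 24.9% idle, 10.5% user, 64.6% kernel, 0.0% iowait, 0.0% swap"


def parse_cpu_states(line):
    """Hypothetical helper: map each field name to its percentage,
    e.g. {'idle': 54.7, 'user': 23.4, ...}. Assumes the 'NN.N% name' layout."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", line)}


normal = parse_cpu_states(NORMAL)
starved = parse_cpu_states(STARVED)

# Print each field's value in both snapshots and the change between them.
for key in ("idle", "user", "kernel", "iowait"):
    delta = starved[key] - normal[key]
    print(f"{key:>7}: {normal[key]:5.1f}% -> {starved[key]:5.1f}% ({delta:+.1f})")
```

Run as-is, it shows kernel time rising by about 43 points (21.9% to 64.6%) while iowait stays at 0.0% in both snapshots, which is consistent with (though not proof of) a CPU-bound rather than I/O-bound stall.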
The stall actually lasts less than a second, but the Solaris version of
top doesn't seem to accept values below 1 (other than 0) for -s, the way
you can on Linux (-d .5). Otherwise I think zpool-data would show near
100% CPU during the stall.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss