zfs-discuss@opensolaris.org

Re: [zfs-discuss] periodic slow responsiveness

Subject: Re: [zfs-discuss] periodic slow responsiveness
From: Ross Walker
Date: Mon, 7 Sep 2009 12:01:35 -0400
On Sep 7, 2009, at 1:32 AM, James Lever <j@xxxxxxxxxxxx> wrote:


On 07/09/2009, at 10:46 AM, Ross Walker wrote:

zpool is RAIDZ2 comprised of 10 * 15kRPM SAS drives behind an LSI 1078 w/ 512MB BBWC exposed as RAID0 LUNs (Dell MD1000 behind PERC 6/E) with 2x SSDs each partitioned as 10GB slog and 36GB remainder as l2arc behind another LSI 1078 w/ 256MB BBWC (Dell R710 server with PERC 6/i).

This config might lead to heavy sync writes (NFS) starving reads, due to the fact that the whole RAIDZ2 behaves as a single disk on writes. How about two 5-disk RAIDZ2s or three 4-disk RAIDZs?

Just one or two additional vdevs to spread the load can make a world of difference.

This was a management decision. I wanted to go with the striped mirrored pairs solution, but the amount of space lost was considered too great. RAIDZ2 was considered the best value option for our environment.

Well, an MD1000 holds 15 drives; a good compromise might be two 7-drive RAIDZ2s with a hot spare. That should provide 320 IOPS instead of 160, a big difference.
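To make the arithmetic behind those numbers explicit, here is a back-of-envelope sketch. It assumes (as Ross does) that each RAIDZ/RAIDZ2 vdev delivers roughly the random IOPS of a single member disk, and that a 15k RPM SAS drive does about 160 random IOPS; both figures are rules of thumb, not measurements of this pool:

```python
# Rule of thumb: each RAIDZ/RAIDZ2 vdev delivers roughly the random
# IOPS of ONE member disk, because every small write touches the
# whole stripe. A 15k RPM SAS drive does ~160 random IOPS.
IOPS_PER_15K_DISK = 160

def pool_write_iops(num_vdevs: int, iops_per_disk: int = IOPS_PER_15K_DISK) -> int:
    """Rough random-write IOPS for a pool built from RAIDZ vdevs."""
    return num_vdevs * iops_per_disk

# Current layout: one 10-disk RAIDZ2 vdev.
print(pool_write_iops(1))  # 160
# Proposed layout: two 7-drive RAIDZ2 vdevs (plus a hot spare).
print(pool_write_iops(2))  # 320
```

This is why adding vdevs, rather than disks to a single vdev, is what raises random-write throughput.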


The system is configured as an NFS (currently serving NFSv3), iSCSI (COMSTAR) and CIFS (using the SUN SFW package running Samba 3.0.34) with authentication taking place from a remote openLDAP server.

There are a lot of services here, all off one pool? You might be trying to bite off more than the config can chew.

That's not a lot of services, really. We have 6 users doing builds on multiple platforms and using the storage as their home directories (Windows and UNIX).

Ok, six users, but what happens during a build?

The issue is interactive responsiveness, and whether there is a way to tune the system to preserve it while still having good performance for builds when they are run.

Look at the write IOPS of the pool with the zpool iostat -v and look at how many are happening on the RAIDZ2 vdev.
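For reference, a sketch of that check; the pool name `tank` is a placeholder for whatever the pool is actually called:

```shell
# Report per-vdev I/O every 5 seconds. The write "operations" column
# under the raidz2 vdev shows how many write IOPS it is absorbing;
# compare that against the ~160 IOPS a single-vdev RAIDZ2 can sustain.
# "tank" is a placeholder pool name.
zpool iostat -v tank 5
```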

Try taking a particularly bad problem station and configuring it with static mounts for a bit to see if that is the cause.

That has been considered also, but the issue has also been observed locally on the fileserver.

Then I suppose you have eliminated the automounter as a culprit at this point.

That doesn't make a lot of sense to me; the L2ARC is a secondary read cache, so if writes are starving reads then the L2ARC would only help here.

I was suggesting that slog writes were possibly starving reads from the l2arc, as they were on the same device. This appears not to have been the issue, as the problem has persisted even with the l2arc devices removed from the pool.

The SSD will handle a lot more IOPS than the pool, and the L2ARC is a lazy reader; it mostly just holds on to read cache data.

It just may be that the pool configuration can't handle the write IOPS needed, and reads are starving.

Possible, but hard to tell. Have a look at the iostat results I've posted.

The busy times of the disks while the issue is occurring should let you know.
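A sketch of how to watch those busy times on Solaris-derived systems while a stall is in progress:

```shell
# Extended per-device statistics every 5 seconds. The %b column is
# the percentage of time the device was busy with a transaction
# outstanding; pool disks or the slog SSD pegged near 100 %b while
# responsiveness drops point at the bottleneck.
iostat -xn 5
```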

-Ross

_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss