netbsd-help@netbsd.org

Core2 Duo 1.8 NetBSD 4BETA SLOWER than Celeron M 1.3 NetBSD3 - Help!

From: Lasse Hillerøe Petersen
Date: Wed, 18 Oct 2006 22:41:58 +0200
Help!

This is really beyond me.

I have posted a few times about this "great fast" new Core2 Duo machine I bought a while ago. After a problem with a defective 512 MB RAM module, which I swapped for two 1 GB modules of a better-known brand, I thought my problems were solved. The machine runs NetBSD 4.0BETA; from dmesg (which I have posted before, so here are only some relevant excerpts):
NetBSD 4.0_BETA (GENERIC.MPACPI) #0: Fri Sep 15 03:25:05 UTC 2006
builds@xxxxxxxxxxxxx:/home/builds/ab/netbsd-4/i386/200609140000Z-obj/home/builds/ab/netbsd-4/src/sys/arch/i386/compile/GENERIC.MPACPI
total memory = 2039 MB
avail memory = 1994 MB

The machine is equipped with a Samsung 80 GB SATA II disk, and I added an older Maxtor because I need to clean up a lot of old "garbage".

atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 1: <LITE-ON DVD SOHD-16P9S, , FS09> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd0 at atabus0 drive 0: <Maxtor 6Y080L0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
cd0(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
wd1 at atabus1 drive 0: <SAMSUNG HD080HJ>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd1(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
boot device: wd1
root on wd1a dumps on wd1b
root file system type: ffs

I also have a slightly old, mediocre ThinkPad R50e, with a 1.3 GHz Celeron M CPU and 1270 MB RAM. I have upgraded its disk to a 120 GB WD Scorpio. It is this machine that has accumulated the "garbage" (music CD images, ripped music from my CD collection, huge Maildir mail archives with thousands of small mail files, disk dumps, source code, etc.), 43722 MB in total.

Thinking that the new machine would be better suited for sorting all this out, and because I want to clean the ThinkPad and reinstall it differently, I have moved all this data to the new machine. As I intend to buy bigger SATA disks when my finances allow, and as the two disks are dissimilar, I have not configured RAID-1, so to be safe, I copied the data twice, once to each disk.

Because I really am an insane, paranoid nut, I use this little script (md5dir) to verify data integrity:
#! /bin/sh
cd $1
find . -type f |(IFS="" ; while read f ; do echo `md5 <"$f"`"##$f" ; done)

It's perhaps not as fast/efficient/smart as mtree, but it does precisely what I need. Generates a simple list that can easily be manipulated with sed, cut, uniq, sort, perl, diff etc. It's also great for identifying duplicate files and so on.
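
As an illustration of the duplicate hunting mentioned above, here is a minimal sketch. The function name find_dups is hypothetical, and it assumes every line of the list has the md5dir format: a 32-character MD5 digest, then "##", then the path.

```shell
#!/bin/sh
# find_dups LIST: print every entry whose digest occurs more than once.
# Assumes the md5dir output format: <32 hex digits>##<path>
find_dups() {
    # Sorting puts identical digests on adjacent lines; the C locale
    # keeps the order independent of 8-bit characters in the paths.
    LC_ALL=C sort "$1" | awk '
        { hash = substr($0, 1, 32) }
        NR > 1 && hash == prev {
            if (!shown) print prevline   # first member of the group
            print
            shown = 1
        }
        hash != prev { shown = 0 }
        { prev = hash; prevline = $0 }
    '
}
```

Something like "find_dups /tmp/lhp.md5sums" would then list groups of files with identical content, one entry per line.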

Here is the time from the ThinkPad (which is called "able". I use the old phonetic alphabet to name my machines, but as you will see later, this has turned ironic on me):

able:/home $ time sudo md5dir >/tmp/lhp.md5sums
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina Kavtaradze/Piano Music (Disc 2)/2-12 Limoges. Le Marché (La Grande Nouvelle).m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina Kavtaradze/Piano Music (Disc 2)/2-17 Hopak De Jeunes Ukrainiens (De L'opéra) _La Foire De Sorotchintsy_.m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina Kavtaradze/Piano Music (Disc 2)/2-18 Scène De Foire (Fragment De L'opéra) _La Foire De Sorotchintsy_.m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MAIL_NEWS/News/Archive/Re Tintins "far" Hergé i Horn: no such file
4264.78s real  1336.45s user  1363.77s system

As you see, it took 71 minutes to completely hash everything. /bin/sh has problems with some filenames, but that's unimportant.

Of course I ran the same on the two copies on the Core2Duo machine. Now, I *have* had problems with /bin/sh giving segmentation faults now and then, even after replacing the RAM, but no more memory faults. As a temporary fix, I did "mv /bin/sh /bin/osh ; ln /bin/ksh /bin/sh", which helped a bit when I built stuff from pkgsrc.

This had the added bonus of not giving errors with 8-bit characters in filenames as seen above. When I ran my script on the copy on the Maxtor disk, it ran OK. I let it run overnight, so I don't know how long it took. (I just reran on the Maxtor with /bin/osh, and it crashed after 20 minutes. I then timed the Maxtor disk with ksh, and this time it ran, again without any fault:

dog:/disk2/usr/ablehome $ time sudo md5dirKSH lhp >ksh.md5sums
3909.83s real  1189.11s user  1362.91s system

It is worth noting that this was barely faster than "able". Presumably this just means that I/O is the bottleneck in this operation.)
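
To put rough numbers on the two timings, the arithmetic can be sketched as below. The helper names rate and cpu_share are made up for this example, and it assumes that both runs covered the full 43722 MB, which the output above does not confirm.

```shell
#!/bin/sh
# rate MB SECONDS: average throughput in MB/s, two decimals.
rate() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", mb / s }'
}

# cpu_share REAL USER SYSTEM: percentage of wall-clock time spent on the CPU.
cpu_share() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", 100 * (u + s) / r }'
}

# Assumption: each run hashed the same 43722 MB of data.
echo "able: $(rate 43722 4264.78) MB/s, CPU busy $(cpu_share 4264.78 1336.45 1363.77)%"
echo "dog (Maxtor): $(rate 43722 3909.83) MB/s, CPU busy $(cpu_share 3909.83 1189.11 1362.91)%"
```

Under that assumption both machines average roughly 10 to 11 MB/s and spend about a third of the wall-clock time off the CPU. That fits a run that is partly waiting on the disk, but it does not prove it, since md5 is also started once per file.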

But when I tried to do the same on the Samsung SATA disk, I got *memory fault* errors after processing about 250000 of the 1.4 million files. Sometimes sooner, sometimes later. I tried switching to /bin/osh, and to /rescue/sh and /rescue/ksh, but I would still get a memory fault after some (fairly long) time. Also, it would take noticeably longer. After thinking up a way to do a shorter list of files at a time, and then cat together the complete list, I remembered that I had bash installed in /usr/pkg/bin. I have now been running the script for more than 6 hours, but at least bash didn't crash! (It just finished right now.)
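
The "shorter list at a time" idea could look roughly like the sketch below. This is an illustration, not the method actually used: the function names (hash_file, hash_tree, md5dir_chunked) and the chunk file names are hypothetical, the output directory must be an absolute path, and the fallback to md5sum is only an assumption so the sketch also runs on systems without md5.

```shell
#!/bin/sh
# hash_file FILE: print only the hex digest of FILE.
# Assumption: use md5 as on NetBSD if present, otherwise md5sum.
hash_file() {
    if command -v md5 >/dev/null 2>&1; then
        md5 <"$1"
    else
        md5sum <"$1" | cut -d' ' -f1
    fi
}

# hash_tree PATH: same output format as md5dir, <digest>##<path>
hash_tree() {
    find "$1" -type f | while IFS= read -r f; do
        printf '%s##%s\n' "$(hash_file "$f")" "$f"
    done
}

# md5dir_chunked TOP OUTDIR: write one list per top-level entry of TOP,
# then concatenate the pieces. OUTDIR must be absolute because of the cd.
md5dir_chunked() (
    top=$1
    out=$2
    cd "$top" || exit 1
    n=0
    # ./.[!.]* also picks up dot files (but not names starting with "..").
    for d in ./* ./.[!.]*; do
        [ -e "$d" ] || continue      # skip a glob that matched nothing
        n=$((n + 1))
        hash_tree "$d" >"$out/chunk.$n"
    done
    [ "$n" -gt 0 ] && cat "$out"/chunk.* >"$out/all.md5sums"
)
```

If the shell dies partway through, only the chunk that was being written is incomplete, and the finished chunk files can be kept.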

So, to sum things up:

I have a supposedly "wicked fast" machine, which turns out to live up to the name I happened to bestow upon it: dog. I get occasional segfaults with /bin/sh, whereas /bin/ksh works slightly better, but in some situations, it also segfaults - at least when doing stuff with the SATA disk. Bash seems to work better, but is slow as hell.

The whole mess seems to be related to the system it's running: i386-MPACPI, 4.0BETA build 200609140000Z, the size of its memory (?), and the type of task I try to perform: a shell script going through 1,405,214 files of varying size, doing an MD5 sum on each. This I suppose implies large pipes, lots of memory mapped file I/O, etc.

However I don't really have the knowledge to even find out where to begin debugging this mess. I can barely come up with a few questions, which I hope some knowledgeable persons may have answers for:

* Am I correct in assuming that the RAM is not necessarily to blame here, IOW, can memory faults occur due to other reasons than bad RAM?
* Is there a more suitable system/kernel than i386-GENERIC.MPACPI I could use? Switch to amd64 perhaps? Others have been talking about XEN in connection with Core2Duo machines?
* Why is there such a difference between the SATA disk and the PATA disk? Running the same script on the same data on the PATA is fine, on the other I eventually get memory faults. Consistently.
* Am I doing myself a disservice by running 4.0BETA rather than 3.x? I had hoped I would gain support for the Realtek 8168B ethernet device on the motherboard (ASRock ConRoe 945G-DVI), but I haven't had any luck there either.
* What I am most concerned about is whether there is still a hardware fault, which only shows up under heavy load. But after having replaced the RAM, I feel this is rather unlikely? Am I being too optimistic?

Any suggestions as to what I should do with this machine (well, obviously excluding suggestions to donate it, trash it etc) would be most welcome! And if I have accidentally stumbled upon some rare - maybe even subtle - bug, that only shows up under special circumstances and loads, I sure would like to help get this fixed. I would file a PR - if I wasn't so unsure as to what to write in it! If someone could suggest some tests to run, I would be delighted to do so!

My plan was for this machine to replace the Pentium II 233 MHz with its whining 40 GB drive, which is my current home server. (This machine was set up quickly to stand in for a 350 MHz machine, which didn't come up after a power outage.) The high-pitched howling of "fox" is getting on my nerves, but before I put the "dog" on its watch, I want to be sure it can handle the job!

-Lasse
