> How do you intend to determine if a component needs to be case or format
> canonicalized without parsing it?
I didn't say we wouldn't parse a DN given to us by a user. A DN stored
on disk should not need to be parsed in the same way, as long as we
always store on disk in a canonicalised format.
> Remember, we can't just upper or lower case the whole string. Parts may
> be case sensitive, while other parts may not be.
I know, and the old code that treated the dn as a 'char *' knew about
this and did the right thing. Just because it's a char* doesn't mean
the only operation you can do on it is strcmp() :-)
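
To make that concrete, here is a minimal sketch of a component-aware comparison done directly on two 'char *' DNs. It is not the old ldb code: the function names and the hard-coded list of case-insensitive attributes are made up for illustration (a real implementation would ask the schema), and it assumes simple single-valued RDNs with no quoting or padding whitespace. strncasecmp() comes from POSIX <strings.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical rule: attributes whose values compare case-insensitively.
 * A real implementation would consult the schema instead. */
static bool value_is_case_insensitive(const char *attr, size_t len)
{
    return len == 2 &&
           (strncasecmp(attr, "cn", 2) == 0 ||
            strncasecmp(attr, "dc", 2) == 0);
}

/* Length of s up to the first unescaped 'stop' character or the NUL. */
static size_t span_until(const char *s, char stop)
{
    size_t i = 0;
    while (s[i] != '\0' && s[i] != stop) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;   /* skip the escaped character */
        }
        i++;
    }
    return i;
}

/* Compare two DN strings component by component: attribute names are
 * always case-insensitive, values only when the attribute says so. */
bool dn_string_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        size_t an = span_until(a, '=');
        size_t bn = span_until(b, '=');
        if (an != bn || strncasecmp(a, b, an) != 0) {
            return false;
        }
        bool fold = value_is_case_insensitive(a, an);
        a += an;
        b += bn;
        if (*a != '=' || *b != '=') {
            return false;
        }
        a++;
        b++;

        size_t av = span_until(a, ',');
        size_t bv = span_until(b, ',');
        if (av != bv) {
            return false;
        }
        if (fold ? strncasecmp(a, b, av) != 0 : memcmp(a, b, av) != 0) {
            return false;
        }
        a += av;
        b += bv;

        /* Both must continue with another component, or both must end. */
        if ((*a == ',') != (*b == ',')) {
            return false;
        }
        if (*a == ',') {
            a++;
            b++;
        }
    }
    return *a == '\0' && *b == '\0';
}
```

With these made-up rules, dn_string_equal("CN=Admin,DC=Example", "cn=admin,dc=example") is true, while "uid=Bob,..." and "uid=bob,..." stay distinct because uid is not in the case-insensitive list. No allocation and no intermediate structure is needed.
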
The point is to optimise the common paths. At the moment the ldb_dn.c
code makes the worst case assumption at just about every point, which
is why it's slow. We can be just as correct while being fast.
A simple example is the DNs stored on disk. If they have been stored
on disk then they must already be canonicalised and have been
validated. There is no point in validating them again when they are
read back from disk.
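
As a rough sketch of that split (not the ldb API: struct sketch_dn and both constructors are hypothetical names, and the validation is a placeholder that only checks for attr=value components), the user-facing path validates while the from-disk path simply trusts what was stored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DN handle: borrows the string, does not copy it. */
struct sketch_dn {
    const char *linearized;
    size_t length;
};

/* Placeholder validation: every component must look like attr=value.
 * A real validator would also check attribute names, escapes, etc. */
static bool sketch_dn_validate(const char *s)
{
    while (*s != '\0') {
        const char *start = s;
        while (*s != '\0' && *s != '=' && *s != ',') {
            s++;
        }
        if (*s != '=' || s == start) {
            return false;              /* missing or empty attribute name */
        }
        s++;
        start = s;
        while (*s != '\0' && *s != ',') {
            if (*s == '\\' && s[1] != '\0') {
                s++;                   /* skip the escaped character */
            }
            s++;
        }
        if (s == start) {
            return false;              /* empty value */
        }
        if (*s == ',') {
            s++;
            if (*s == '\0') {
                return false;          /* trailing comma */
            }
        }
    }
    return true;
}

/* Slow path: a DN handed to us by a user is untrusted, so validate it.
 * (Canonicalisation would also happen here; omitted in this sketch.) */
bool sketch_dn_from_user(const char *s, struct sketch_dn *out)
{
    assert(s != NULL && out != NULL);
    if (!sketch_dn_validate(s)) {
        return false;
    }
    out->linearized = s;
    out->length = strlen(s);
    return true;
}

/* Fast path: a DN read back from disk was validated and canonicalised
 * before it was written, so just wrap it. */
void sketch_dn_from_disk(const char *s, struct sketch_dn *out)
{
    assert(s != NULL && out != NULL);
    out->linearized = s;
    out->length = strlen(s);
}
```

The fast path is only safe if every write to disk goes through the validating, canonicalising path, which is exactly the invariant described above.
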
Please don't assume I'm going to be throwing away LDAP semantics in
this quest for speed :) I just happen to believe that it is possible
to achieve those semantics at about a fifth of the CPU cost we have now. The
code will end up looking quite different from the current code, but it
should have the same semantics.
I also think we could probably gain about a 2x speedup while keeping
the current ldb_dn structure, and that is certainly an
alternative. Just look at routines like seek_to_separator(),
get_quotes_position(), ldb_dn_unescape_value() and you can easily find
some fat that could be trimmed to make the current DN handling much