At Fri, 08 Oct 2004 16:04:03 +0200,
Bas Wijnen <b.wijnen@xxxxxxxxxxx> wrote:
> > It consumes kernel memory.
> > (The Hurd design doesn't even use recursive mappings - all mappings
> > are done flat by physmem.)
> I tend to agree with Neal that using redirectors for all IPC operations
> is too much overhead, however, it would be necessary if this problem
> cannot be solved.
Well, you wouldn't need to do it for all tasks, just for tasks from
users you don't trust. And putting a redirector into a small space
(on ia32) which just checks if there are any map items in the message
and if not, propagates the message could be pretty fast, I assume.
The actual speed loss depends on the architecture, of course, but on
arches where many message registers fit into physical registers, you
wouldn't even need to reload them. You'd probably need one redirector
per CPU, though.
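
A minimal sketch of the filtering decision such a redirector would make, under loud assumptions: the message model below (`struct message`, `enum item_kind`) is a hypothetical simplification, not the real L4 message layout, and treating grant items like map items is an assumption that goes beyond what the text above says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the typed items in a message.
   This is NOT the real L4 message register encoding. */
enum item_kind { ITEM_UNTYPED, ITEM_STRING, ITEM_MAP, ITEM_GRANT };

struct message {
    size_t n_items;
    const enum item_kind *items;
};

/* Returns true if the redirector may propagate the message unchanged,
   i.e. if it carries no map items. Grant items are rejected as well,
   which is an assumption added for this sketch. */
bool redirector_may_forward(const struct message *msg)
{
    assert(msg != NULL);
    for (size_t i = 0; i < msg->n_items; i++) {
        if (msg->items[i] == ITEM_MAP || msg->items[i] == ITEM_GRANT)
            return false;
    }
    return true;
}
```

The check is a single pass over the items, which is why the cost should mostly be the extra IPC hop rather than the inspection itself.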
In any case, this is a paranoia security fix, along the lines of fixing
the fork bomb problem in Unix systems. For me it's enough to know
that our design does not prevent such security enhancements.
> However, I think it can, there are two ways that I
> can think of (both of which involve changing the kernel, which is not
> nice, but IMO much better than redirecting all IPCs.)
> - One solution is to make IPC of map items a privileged action, which
> can only be done by threads in the root server's address space
I don't like this very much, because it is like cutting off the whole
arm. But you could actually extend the meaning of the redirector
field by saying: If the redirector is set to "any local thread" then
map items (in inter-AS IPC) are not allowed, otherwise the message
will be processed normally.
Sort of a built-in redirector with the behaviour I described :)
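
The proposed rule can be written down as a small predicate. This is only a sketch of the suggested semantics: the enum values and the function name are hypothetical, and no existing kernel is claimed to implement this check.

```c
#include <stdbool.h>

/* Hypothetical redirector settings for a thread or address space. */
enum redirector_setting {
    REDIRECTOR_NONE,              /* no redirection configured */
    REDIRECTOR_ANY_LOCAL_THREAD,  /* the special value discussed above */
    REDIRECTOR_SPECIFIC_THREAD    /* messages go to a real redirector */
};

/* Proposed kernel-side rule: with the redirector set to "any local
   thread", inter-address-space IPC carrying map items is refused.
   Everything else is processed normally. */
bool kernel_accepts_ipc(enum redirector_setting setting,
                        bool crosses_address_space,
                        bool has_map_items)
{
    if (setting == REDIRECTOR_ANY_LOCAL_THREAD
        && crosses_address_space
        && has_map_items)
        return false;
    return true;
}
```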
> - The other solution is to put the page tables in user space, like the
> UTCB area. In that case, an address space will have a maximum number of
> mappings. This seems like a theoretically nice solution, however I
> don't know if it fits at all into the current design of the kernel, and
> I expect it to be a huge change. I also don't know the performance
> implications. I'm guessing the L4 people thought of it and rejected it
> for some reason. It's not something we should do, anyway.
Yeah, this is the holy grail here, I guess. I think EROS has
something like that. You could ask the L4 people, of course. Maybe
it's planned for a future version.
> So I think there are two options here: use redirectors or make mapping a
> privileged operation. I prefer the last one.
I think a kernel change like the first one you described (well, I
prefer my idea, but anyway, a small non-intrusive kernel change) would
be acceptable. We don't need to do it. We just need to ensure the
following:
1. We understand the security limitations of our design, and their
2. We can offer strategies to people who want to go beyond these
limitations that are practical.
At least for this issue, we have done 1 (by understanding how L4
works) and 2 (redirectors, or a small kernel extension + corresponding
small change in the OS code), so we are fine.
> > String items can be supported, but one must be careful.
> > The idea is that you allow page faults on the server side to happen,
> > but not on the client side. The client must then lock down all
> > memory needed for string items to or from the server. This is easier
> > said than done, especially if you don't know up-front how long the
> > reply is going to be. There is a whole paper on this issue (by
> > someone else, in fact, the paper that made me start to worry about
> > this in detail), and I have written about it at length before. Look
> > at the xfer timeout parameter of the IPC system call for how to avoid
> > being blocked when page faults occur.
> If the server can not trust the client, how can it know about the memory
> it uses? I thought the server could only send a string item (or accept
> one), not know where it's coming from.
I don't understand your question. String items are copied from the
address space of the sender to the address space of the receiver at
IPC time. You specify the source of the string item in the sender via
StringItems and the destination buffers in the receiver via
In fact, I might have made a mistake in the above paragraph you
replied to. It may indeed be the case that you cannot differentiate
between "local pagefaults" and "remote pagefaults" when specifying the
xfer timeout. Which is a pity. However, in some situations you can
compensate for that by retrying the IPC (not in all, for example it would
not be possible in reply messages). In any case, the whole issue is a
bit messy, and I don't have all the details in my head.
This is an area where somebody could make himself useful by analysing
this carefully, and writing all possible scenarios and useful
applications down, picking a set of likely RPC candidates for string
items as test cases.
> > It's thus a bit complicated to use string items, but they are an
> > important optimization. You can't just always pass container handles
> > to mappable memory around, for example for every filename you want to
> > lookup. We have to decide how to use this on a case-by-case basis.
> I was expecting to use the untyped words for short messages (perhaps
> using some follow-up code to use slightly more space if needed by using
> several IPCs) and map containers for large messages. There are 64 MRs,
> of which at least 1, but likely 2 will be needed to encode the purpose
> of the transfer. That leaves (on a 32-bit system) 62*4=248 bytes per
> IPC for a short message. Things like filenames should usually fit in it.
I was at one time thinking the same. However, using many message
registers for this can actually be slower than using string items, and
the cut-off point is architecture specific (and even kernel and IDL
compiler specific). So, such decisions are usually better left to the
IDL compiler.
However, as noted above, string items are peculiar in their use, so I
think that it's not a good idea if the IDL compiler feels free to use
them instead of MRs whenever it wants.
In any case, it doesn't even matter: Certainly there are messages
longer than 200 and a couple bytes, and we _will_ have to deal with
them properly. Resistance is futile, we might just as well embrace
Using containers is _very_ expensive and may be even more complicated
to use properly than string items.
> Of course at some point using a container will be a better idea. There
> can be a library function which considers using IPC and a container, and
> just uses whichever is appropriate. The caller doesn't need to know how the
> transfer was done :-)
That's easier said than done! :) The whole question of how to wrap
RPCs in glibc properly and all issues around the IDL compiler is not
something I have worked much on yet, although I have given it some
thought. It's complicated!
> I understand that string items can speed things up, especially if the
> size is just a little more than 62 words. However, doing all the
> checking for correct string buffers probably slows the whole thing down
String items are usually faster for more than around 40 words on ia32
(IIRC). And, as I said above, you are underestimating the overhead
for containers, which is many times the overhead of dealing
with string items.
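
To make the size trade-off concrete, here is a rough sketch of a per-call transport choice. Everything in it is an assumption made for illustration: the names are hypothetical, the cut-offs are supplied by the caller because they are architecture specific, and the container cut-off in particular is not a figure from this thread. Only the value of about 40 words for ia32 comes from the recollection above.

```c
#include <assert.h>
#include <stddef.h>

enum transport { USE_MESSAGE_REGISTERS, USE_STRING_ITEM, USE_CONTAINER };

/* Hypothetical per-architecture policy. */
struct transfer_policy {
    size_t string_cutoff_words;    /* above this, string items win
                                      (around 40 on ia32, IIRC) */
    size_t container_cutoff_words; /* above this, map a container
                                      (placeholder, not a measured value) */
};

/* Picks a transport for a payload of the given size in bytes. */
enum transport choose_transport(size_t payload_bytes, size_t word_size,
                                const struct transfer_policy *p)
{
    assert(p != NULL && word_size > 0);
    assert(p->string_cutoff_words <= p->container_cutoff_words);

    /* Round the payload up to whole machine words. */
    size_t words = (payload_bytes + word_size - 1) / word_size;

    if (words <= p->string_cutoff_words)
        return USE_MESSAGE_REGISTERS;
    if (words <= p->container_cutoff_words)
        return USE_STRING_ITEM;
    return USE_CONTAINER;
}
```

With a string cut-off of 40 words and 4-byte words, the 248-byte payload from the calculation quoted earlier is 62 words and would already go through a string item.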
To do string items properly is in fact not too complicated: You only
need to wire down the memory areas involved. That's just a call to
the internal pager library which just has to mark those pages as
currently not available for swap out. That's fast.
In fact, in some scenarios, you don't even need to do that, it's
probably sufficient to touch the pages before sending the message and
hope they are not swapped out immediately. If they are, you just do
it again. (And if that fails, say, three times, you can still try to
lock them down and try again, although you are probably in deep
trouble if you are thrashing the swap so hard).
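
The touch-then-retry strategy can be sketched as follows. This is a model under stated assumptions, not Hurd code: the `send_fn` and `wire_fn` callbacks are hypothetical stand-ins for the real IPC call and the pager library's lock-down call, `fake_send` and `fake_wire` exist only to exercise the logic, and the 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption; query the real value in practice */
#define TOUCH_ATTEMPTS 3        /* "like three times", as suggested above */

/* Hypothetical hooks. send returns false if the transfer was aborted
   by a page fault; wire returns false if the pages could not be locked. */
typedef bool (*send_fn)(const void *buf, size_t len, void *ctx);
typedef bool (*wire_fn)(const void *buf, size_t len, void *ctx);

/* Read one byte from every page of the buffer so it is resident now. */
static void touch_pages(const void *buf, size_t len)
{
    const volatile unsigned char *p = buf;
    for (size_t off = 0; off < len; off += ASSUMED_PAGE_SIZE)
        (void)p[off];
    if (len > 0)
        (void)p[len - 1]; /* covers the last page of an unaligned buffer */
}

/* Touch and send up to TOUCH_ATTEMPTS times; if every attempt is
   aborted, lock the memory down and try once more. A real version
   would also release the lock afterwards. */
bool send_with_touch_retry(const void *buf, size_t len,
                           send_fn send, wire_fn wire, void *ctx)
{
    assert(buf != NULL && send != NULL && wire != NULL);
    for (int i = 0; i < TOUCH_ATTEMPTS; i++) {
        touch_pages(buf, len);
        if (send(buf, len, ctx))
            return true;
    }
    if (!wire(buf, len, ctx))
        return false;
    return send(buf, len, ctx);
}

/* Stand-ins for experimentation: the first faults_left sends fail. */
struct fake_ipc { int faults_left; int sends; int wired; };

bool fake_send(const void *buf, size_t len, void *ctx)
{
    struct fake_ipc *f = ctx;
    (void)buf; (void)len;
    f->sends++;
    if (f->faults_left > 0) {
        f->faults_left--;
        return false;
    }
    return true;
}

bool fake_wire(const void *buf, size_t len, void *ctx)
{
    struct fake_ipc *f = ctx;
    (void)buf; (void)len;
    f->wired++;
    return true;
}
```

The cheap path never calls into the pager; the lock-down only happens after repeated aborts, which matches the expectation that it is needed only when the system is already thrashing.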
> Just one more idea: Allow transfers to be done using a third task, which
> holds a permanent container with the server.
A variant of this is described in the paper I mentioned in my mail (I
should dig out a proper reference for you). I think it's a model
considered by the EROS people. In their case it is a system service,
I think, so it has mutual trust with the server (which is the
important thing to have, so that you can allow page faults to happen
in transit). However, this approach corrupts our self-paged model.
It's really great you are thinking about all these issues at this