On Mon, Dec 05, 2005 at 11:22:44AM -0500, Jonathan S. Shapiro wrote:
> We're getting closer, but your answer leads me to think that we may have a
> terminology problem.
> When you dig down deep, all of the real-time scheduling mechanisms that I
> have seen are scheduling "hard" resource. The resource may be provisioned
> probabilistically, but whatever the details there is an assumption that the
> RT scheduler has some amount of hard guarantee that it is dividing.
That makes sense.
> It follows immediately that nothing else can be permitted to violate the
> *schedulers* guarantee. In practice, this tends to mean that *anything* that
> schedules hard resource commitments is real time (in the sense that it needs
> to talk to the RT scheduler to get the guarantee).
> When you talk about not swapping a process out, you are talking about
> pinning its pages. This is a hard guarantee, so indeed you are proposing a
> real-time situation.
No, I'm not. What I'm saying is that its priority is so high compared to the
others that the result is that the pages will not be swapped out. However, there's
no guarantee at all: if the user lowers the priority, or sets the priority of
some other process to a comparable value (or much higher), then it will be
swapped out after all.
> > In my proposal, every process (or address space, really) has a physical
> > memory quota. It may be 0, which means it is fully swapped out. A
> > process cannot have more pages in memory than it has quota. When the
> > quota shrinks, there are immediately pages swapped out (if the process was
> > using its full quota). Because we don't want to ask the process which
> > pages that should be (and in particular, we don't want to wait for an
> > answer), the answer has to be prepared beforehand. That's what the list
> > is all about.
> You need to pick one system or the other. Either the process makes the
> decisions or the system does. If the list is provided in advance it needs to
> be provided **to the kernel**, which becomes very tricky in the face of
Well, it needs to be provided to whoever does the swapping. We'll assume that
is the kernel, although I think the space bank is really a better place.
However, I may completely misunderstand what a space bank is; see below for my
interpretation. I don't see why this is tricky at all.
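To make the "prepared list" idea concrete, here is a minimal sketch (all structure and function names are hypothetical, not actual Hurd or EROS interfaces) of a per-address-space quota with a pre-ordered eviction list, so that whoever does the swapping never has to ask the process, or wait for it, at eviction time:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16

/* Hypothetical per-address-space record: a physical memory quota and
 * a list of the space's pages, ordered so that the page at the
 * highest index is the first one to be evicted.  The process keeps
 * this list up to date in advance, so the swapper never blocks on it. */
struct address_space {
    size_t quota;                  /* max pages allowed in physical memory */
    size_t resident;               /* pages currently in physical memory   */
    int eviction_order[MAX_PAGES]; /* page numbers; last entry goes first  */
};

/* Shrink (or grow) the quota; on a shrink, immediately evict pages
 * from the tail of the prepared list until the space fits again.
 * Returns the number of pages evicted. */
static size_t set_quota(struct address_space *as, size_t new_quota)
{
    size_t evicted = 0;
    as->quota = new_quota;
    while (as->resident > as->quota) {
        int victim = as->eviction_order[as->resident - 1];
        (void)victim;   /* a real swapper would write this page out here */
        as->resident--;
        evicted++;
    }
    return evicted;
}
```

The point of the pre-ordered list is visible in `set_quota`: shrinking the quota needs no round trip to the process, only a walk down the list.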
> > Why must this be without user intervention? Because it'd be very annoying
> > for the user to turn the knobs all the time. What the user wants to do is
> > give the game high priority.
> Nonsense. What the user wants to do is say "make that one run better". The
> user has probably never heard of priority.
If the user doesn't know this is called "giving it higher priority", that
doesn't mean it's not what he wants.
> And in any case, this *is* user
> intervention. Now we are just arguing about the UI.
What I call user intervention is that the user actually has to do something.
I don't want the user to do something every time the physical memory quota
needs to change (which is on every big allocation, _and_ deallocation).
> The point I'm trying to make is that tuning knobs is actually a good
> metaphor. The user wants to say "make that one go faster", but the problem
> with this is that they don't really know why it is slow. Turning up the CPU
> allotment on a swap-bound process won't help and vice versa. Ideally, we
> don't even want to have to talk to the user about this; we want to just have
> the user's statement that a certain thing is important.
Right. This is why I said the physical memory priority and the CPU priority
may be the same variable. As Antrik wrote, a market-based approach is
another option which aims at the same target.
> > ... Then the system should just give it the memory it wants when it asks
> > for it (at the cost of others, which have a lower priority).
> I think you are confusing two things here: the allotment vs. the allocation.
> The part that is tricky from the scheduling point of view is the allotment.
> That was done when the user changed the tuning knob (at least indirectly, in
> the sense that the user has told the long-term scheduler how to rebalance).
> The allocation is then done according to process demand.
I don't think I was confusing things. What you write here is exactly what I
thought. Which part of it did you think was not what I wrote?
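The allotment/allocation split being discussed here might be sketched like this (hypothetical names; the idea is only that the allotment is changed by long-term policy, while allocation follows demand and can never exceed it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical split between the two quantities: the allotment is the
 * upper bound set by the long-term scheduler when the user turns the
 * knob; the allocation is how much the process has actually faulted
 * in, and may never exceed the allotment. */
struct mem_account {
    size_t allotment;   /* set by policy (the "tuning knob")         */
    size_t allocation;  /* set by demand (page faults), <= allotment */
};

/* Demand allocation: succeeds only while under the allotment. */
static bool fault_in_page(struct mem_account *a)
{
    if (a->allocation >= a->allotment)
        return false;   /* over allotment: must evict or wait */
    a->allocation++;
    return true;
}
```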
> > > I understand that you want these things to be fast. That is not the same
> > > as wanting them to be real-time.
> > Indeed. I wasn't talking about real-time (although giving it all the
> > memory it asks for may hinder other applications which do want
> > real-time)...
> Please re-read your sentence. Do you now agree that you *were* talking about
> real time?
No. I was talking about having them done fast. If real-time is the way to do
that, then it should be real-time. But that is not essential. If it gets
done as fast without being real-time, that's fine too. And in particular, if
some other process received a real-time guarantee (from the user, implicitly),
then that guarantee should not be violated. This may mean that it's not even
possible to make the background thing real-time.
> > The difference is between ping-time and throughput on a network line. I
> > was talking about giving the process high throughput, not about giving it
> > a low ping-time.
> This is a good goal. There's about a billion papers on this, none
> conclusive. It's a deep research problem, and at this point it seems safe to
> speculate that there simply *isn't* a general solution.
> I think that our short term challenge is to come up with the right kernel
> mechanisms so that people can experiment successfully with scheduling.
> > I wasn't thinking kernel-level at all, I'd think this would all be in user
> > space (in physmem and some global pager). Is it just that those tasks
> > need to be in the kernel for you, or are we misunderstanding each other?
> > I'll assume the former for now, that you need physmem and the global pager
> > in the kernel. I'd be interested to know why though.
> In EROS/Coyotos, the eviction decisions are made in the kernel guided by
> application-defined policy. This is largely because of checkpoint. In
> practice, it doesn't seem to restrict the feasible policies.
That sounds ok then. On the other hand, this seems to be a part which can be
moved out of the kernel to user space now that you no longer demand that the
kernel can handle any misbehaving application, even one in the TCB.
> The problem with your approach is that we must now ask what keeps the
> *pager* in core (because the whole point here was response time). Your
> answer will be "well, the pager's pager makes this guarantee...", and it's
> "turtles all the way down". Somewhere, something in the kernel has to be in
> on the joke.
I see no problem in having no pager's pager at all. If the process sets a
limit on the number of pages in physical memory before it wants to run at all,
the global pager can make sure these pages are mapped into memory before the
scheduler gives it a new time slice. In effect this makes the global pager a
fallback pager for every process. This can be moved to some other process if
pagers having pagers is desirable. However, in the end there will be a pager
without a pager, and it must have its pages pinned. One way to do that is by
moving it into the kernel, but I don't think that's required.
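The fallback-pager arrangement described above might look roughly like this (a sketch with invented names, not a real scheduler hook): the process declares in advance the minimum set of pages it needs resident before it is worth running at all, and the global pager maps them in before the scheduler grants a time slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_SET 4

/* Hypothetical process record: the pages the process has declared it
 * needs resident before it wants to run at all, and which of them the
 * global (fallback) pager currently has in memory. */
struct process {
    int  min_resident[MIN_SET]; /* page numbers declared by the process */
    bool resident[MIN_SET];     /* tracked by the global pager          */
};

/* Called by the scheduler before handing out a new time slice: the
 * global pager maps in any missing page of the declared minimum set,
 * so the process never runs only to fault immediately.  Returns how
 * many pages had to be brought in. */
static size_t ensure_runnable(struct process *p)
{
    size_t paged_in = 0;
    for (size_t i = 0; i < MIN_SET; i++) {
        if (!p->resident[i]) {
            p->resident[i] = true; /* a real pager would swap it in here */
            paged_in++;
        }
    }
    return paged_in;
}
```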
> > First of all, note that there is a list of pages per process. So
> > "position 4" is meaningless to the process who puts it at position 10 in
> > its own list. The page will be stored in two lists, at its own position
> > for each.
> Yes, the problem is that the kernel is now going to have to ask two parties
> for an eviction policy, and (whichever one is chosen) it will pick the wrong
> one to ask...
There is no "wrong one". When the kernel wants to increase the physical
memory quota of a process (and there are no unused pages), it will have to
decrease someone else's quota. It can do this one page at a time, according
to their priorities, until enough pages are freed. That is, if the kernel
wants 1 page, and address space A is the first one to lose one, then A's quota
is lowered by one. This means A loses the last page from the list which was
still in memory. If this was a shared page, then it didn't actually free a
page. So the kernel will take another one, from whoever is the next to lose
one (possibly A again, possibly some other process).
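The page-at-a-time rebalancing loop described above might be sketched as follows (hypothetical names; the victim order here is a simple round robin standing in for "next to lose one by priority"):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the rebalancing: to free `wanted` frames, the
 * kernel repeatedly lowers the quota of whichever space is next in
 * line by one page.  If the evicted page was shared, no frame
 * actually became free, so the loop simply continues with the next
 * victim (possibly the same space again). */
struct space {
    size_t quota;
    size_t shared;  /* how many of the next evictions hit shared pages */
};

static size_t free_frames(struct space spaces[], size_t n, size_t wanted)
{
    size_t freed = 0, i = 0;
    while (freed < wanted) {
        /* round robin stands in for the real priority order */
        struct space *s = &spaces[i % n];
        i++;
        if (s->quota == 0)
            continue;
        s->quota--;         /* one page at a time */
        if (s->shared > 0)
            s->shared--;    /* shared page: no frame was freed */
        else
            freed++;
    }
    return freed;
}
```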
> > Every space bank has a certain number of "active" pages. Those can be of
> > three types:
> > - currently in memory
> > - currently in swap
> > - currently nonexistent ("swapped-out" cache)
> > The third one exists because the process doesn't need to reallocate it
> > when it is "swapped out", it can just remap it, however there's no
> > guarantee about the contents.
> This has nothing whatsoever to do with space banks! Space banks allocate
> disk storage, and the space bank data structure is a disk data structure!
Ok, perhaps I didn't understand what a space bank is. Here's what I thought:
A space bank gives space to store things. Storage is either on disk, or in
memory, or both. Storage on disk is "swapped out". Storage in memory and on
disk is normal memory, with a reserved place to put it when it will be swapped
out. Storage only in memory is cache, which is lost when it would be swapped
out.
What I'm proposing is to add a list to this type of space bank (which may
be a bit more than what you call space bank) so it knows in which order to
swap out the pages when the kernel asks for that. Because the space bank is
part of the TCB, it is no problem for the kernel to do a blocking IPC to the
space bank asking it for this information.
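The extended space bank I have in mind could be modelled like this (a sketch under my interpretation above, with invented names; the blocking IPC is stood in for by a plain function call): each active page is in one of the three states, and the bank keeps the eviction-order list so it can answer the kernel's query immediately.

```c
#include <assert.h>
#include <stddef.h>

#define BANK_PAGES 4

/* The three states of an active page, per the interpretation above. */
enum page_state {
    IN_MEMORY,   /* backed by a frame and by reserved disk space       */
    IN_SWAP,     /* on disk only: "swapped out"                        */
    CACHE_ONLY,  /* frame with no disk backing: contents lost on evict */
};

struct space_bank {
    enum page_state state[BANK_PAGES];
    int order[BANK_PAGES];  /* page numbers; first in list = evict first */
};

/* Stand-in for the blocking IPC from the kernel: return the first
 * page in the prepared order that still occupies a frame, or -1 if
 * the whole bank is already swapped out. */
static int next_victim(const struct space_bank *b)
{
    for (size_t i = 0; i < BANK_PAGES; i++) {
        int page = b->order[i];
        if (b->state[page] != IN_SWAP)
            return page;  /* IN_MEMORY or CACHE_ONLY: holds a frame */
    }
    return -1;
}
```

Because the list is maintained in advance, `next_victim` never blocks on the client process, which is what makes the blocking kernel-to-bank IPC acceptable.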
> I'll reply to the rest later -- an appointment just showed up.
Ok, thanks so far.
L4-hurd mailing list