Currently, zero-copy on MP systems is disabled because TLB flushes and IPIs
are expensive on modern SMP hardware (at least on x86). The following patch
implements ephemeral mapping (emap), which should address the performance
issues. For now, only x86 uses these improvements, since pmap changes are
required.

The concept is based on the idea that the activity of other threads will, as
a side effect, perform the TLB flush for the processes using emap. To track
that, global and per-CPU generation numbers are used. This idea was suggested
by Andrew Doran; I have made various improvements to it.
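
To make the generation-number idea concrete, here is a minimal userspace
sketch. It is not the code from the patch: all names (global_gen, cpu_gen,
emap_produce, emap_consume, cpu_tlb_flushed) are hypothetical, the TLB flush
is only simulated, and locking, atomics and counter wraparound are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU 2

/* Hypothetical state: one global generation, one generation per CPU. */
static uint64_t global_gen;
static uint64_t cpu_gen[NCPU];
static unsigned forced_flushes[NCPU];	/* flushes done only for emap */

/*
 * Assumed to be called whenever a CPU flushes its TLB for any reason
 * (context switch, unrelated pmap activity, ...).  The CPU records the
 * newest generation that it can no longer hold stale entries for.
 */
static void
cpu_tlb_flushed(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	cpu_gen[cpu] = global_gen;
}

/* Creating an ephemeral mapping starts a new generation. */
static uint64_t
emap_produce(void)
{
	return ++global_gen;
}

/*
 * Before touching a mapping created at generation 'gen', a CPU checks
 * whether it has flushed since then.  Returns true if an explicit
 * (simulated) local flush was needed, false if earlier activity had
 * already done the work.
 */
static bool
emap_consume(int cpu, uint64_t gen)
{
	assert(cpu >= 0 && cpu < NCPU);
	if (cpu_gen[cpu] >= gen)
		return false;		/* already flushed as a side effect */
	forced_flushes[cpu]++;		/* simulated local TLB flush */
	cpu_tlb_flushed(cpu);
	return true;
}
```

In this sketch, a consumer on an otherwise idle CPU always takes the slow
path in emap_consume(), while a CPU that has flushed for any unrelated
reason after emap_produce() skips the flush entirely.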

The graph illustrates the performance improvement using a pipe with two
threads bound to different CPUs. With few pages, it is better than a regular
copy, although the increase is not very significant, for the following
reasons:
- The test application triggers the worst-case scenario, in which nobody
  performs a TLB flush because the other threads are inactive. In a
  real-world workload, it is more likely that somebody will perform the
  TLB flush.
- The pipe uses zero-copy on the write side, but the read side still performs
  a copy. Results are expected to be better with a TCP socket, but enabling
  that requires additional changes to the networking code.

I would like to get this into the tree. Comments?

* Additional UVM locking fixes are needed, which I hope to finish soon.