If I use the following call in C++, I would expect the WorkingSet of the process to never drop below 100MB.
However, the OS still trims the working set back to 16MB, even if I make this call.
Setting the WorkingSet to 100MB would dramatically speed up my application by eliminating soft page faults (see the diagram below).
What am I doing wrong?
SIZE_T workingSetSizeMB = 100;
BOOL errorCode = SetProcessWorkingSetSizeEx(
    GetCurrentProcess(),
    (workingSetSizeMB - 1) * 1024 * 1024,   // dwMinimumWorkingSetSize
    workingSetSizeMB * 1024 * 1024,         // dwMaximumWorkingSetSize
    QUOTA_LIMITS_HARDWS_MIN_ENABLE | QUOTA_LIMITS_HARDWS_MAX_DISABLE
);
// errorCode returns 1 (TRUE), so the call worked.
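As a sanity check (my own sketch, not part of the original test project), the limits the OS actually recorded can be read back with GetProcessWorkingSetSizeEx right after the call above:

#include <windows.h>
#include <cstdio>

int main()
{
    SIZE_T minSize = 0, maxSize = 0;
    DWORD  flags   = 0;

    // Read back the working-set limits currently recorded for this process.
    if (GetProcessWorkingSetSizeEx(GetCurrentProcess(), &minSize, &maxSize, &flags))
    {
        printf("min = %zu MB, max = %zu MB, hard minimum enabled = %d\n",
               minSize / (1024 * 1024),
               maxSize / (1024 * 1024),
               (flags & QUOTA_LIMITS_HARDWS_MIN_ENABLE) != 0);
    }
    else
    {
        printf("GetProcessWorkingSetSizeEx failed: %lu\n", GetLastError());
    }
    return 0;
}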
(extra for experts) Experimental Methodology
I wrote a C++ test project that allocated 100MB of data to bring the WorkingSet over 100MB (as viewed in Process Explorer), then deallocated that memory. However, the OS trimmed the WorkingSet back to 16MB as soon as the memory was deallocated. I can provide the test C++ project I used if you wish.
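A condensed sketch of that kind of test (assuming plain new/delete and a Sleep between steps; the actual project is longer) looks roughly like this:

#include <windows.h>
#include <cstring>
#include <cstdio>

int main()
{
    const size_t bytes = 100 * 1024 * 1024;

    // Allocate 100MB and touch every page so it is brought into the working set.
    char* block = new char[bytes];
    std::memset(block, 1, bytes);
    printf("Allocated and touched 100MB - check the working set now.\n");
    Sleep(10000);

    // Free the memory; the OS is then free to trim the working set again.
    delete[] block;
    printf("Freed the memory - watch the working set drop back down.\n");
    Sleep(10000);
    return 0;
}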
Why is Windows providing a call to SetProcessWorkingSetSizeEx() if it doesn't appear to work? I must be doing something wrong.
The diagram below shows the dramatic increase in the number of soft page faults (the red spikes) when the green line (the working set) dropped from 50MB to 30MB.
Update
In the end, we ended up ignoring the problem, as it didn't impact performance that much.
More importantly, SetProcessWorkingSetSizeEx does not control the current WorkingSet, and is not related in any way to soft page faults. All it does is prevent hard page faults, by preventing the current WorkingSet from being paged out to disk.
In other words, if one wants to reduce soft page faults, SetProcessWorkingSetSizeEx has no effect at all, as it only deals with hard page faults.
There is a great writeup in "Windows via C/C++" (Richter) which describes how Windows deals with memory.
Page faults are cheap and are to be expected. Real-time applications, high-end games, high-intensity processing and BluRay playback all happily work at full-speed with page-faults. Page faults are not the reason your application is slow.
To find out why your application is slow, you need to profile it.
To specifically answer your question - the page faults that occur just after a GC.Collect() aren't page-in faults; they're demand-zero page faults caused by the GC allocating a huge new block of demand-zero pages to move your objects to. Demand-zero pages aren't serviced from your pagefile and incur no disk cost, but they are still page faults, which is why they show up on your graph.
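To see this concretely, a freshly committed region produces one demand-zero fault per page the first time each page is touched, with no disk I/O at all. The sketch below (my own illustration, assuming psapi.lib for the fault counter, not code from the question) shows the process-wide PageFaultCount climbing just from touching new pages:

#include <windows.h>
#include <psapi.h>   // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>

// Returns the cumulative page-fault count for this process (soft + hard).
static DWORD CurrentPageFaultCount()
{
    PROCESS_MEMORY_COUNTERS pmc = {};
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    return pmc.PageFaultCount;
}

int main()
{
    const SIZE_T bytes = 50 * 1024 * 1024;

    // Commit fresh pages; they are demand-zeroed, so no physical pages
    // are assigned until each one is first touched.
    char* p = static_cast<char*>(
        VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!p)
        return 1;

    DWORD before = CurrentPageFaultCount();

    // Touch one byte per 4KB page; each touch triggers a demand-zero fault.
    for (SIZE_T i = 0; i < bytes; i += 4096)
        p[i] = 1;

    DWORD after = CurrentPageFaultCount();
    printf("Page faults from touching fresh pages: %lu (no disk reads involved)\n",
           after - before);

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}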
As a general rule, Windows is better at managing your system resources than you are, and its defaults are highly tuned for the average case of normal programs. It is clear from your example that you are using a garbage collector, so you have already offloaded the task of managing working sets and virtual memory to the GC implementation. If SetProcessWorkingSetSize were a good API call to improve GC performance, the GC implementation would already do it.
My advice to you is to profile your app. The main cause of slowdown in managed applications is writing bad managed code - not the GC slowing you down. Improve the big-O performance of your algorithms, offload expensive work through the use of things like Future and BackgroundWorker and try to avoid doing synchronous requests to the network - but above all, the key to getting your app fast is to profile it.