I had to learn a bit about the GC in .NET for my current project. I'm going to be processing chunks of data of various sizes, ranging from a few KB to a few hundred MB. I'm thinking I want to keep each chunk in memory, since I need to parse it and that would be the fastest way. I could write it to disk and read it back via a FileStream, but I'm hoping to avoid the cost of the disk write until I've parsed out the chunks I'm interested in saving.
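To make the trade-off concrete, here's a minimal sketch of the two options, assuming the data arrives as a Stream; `BufferInMemory` and `SpillToDisk` are just illustrative names:

```csharp
using System;
using System.IO;

class ChunkBuffering
{
    // Option 1: buffer the whole chunk in memory so parsing never waits
    // on disk. Any chunk over the ~85,000-byte threshold will land on
    // the large object heap.
    static byte[] BufferInMemory(Stream source, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        int offset = 0, read;
        while (offset < chunkSize &&
               (read = source.Read(buffer, offset, chunkSize - offset)) > 0)
        {
            offset += read;
        }
        return buffer;
    }

    // Option 2: spill the chunk to a temp file up front and parse it
    // later through a FileStream, paying the disk write immediately.
    static string SpillToDisk(Stream source)
    {
        string path = Path.GetTempFileName();
        using (var file = File.Create(path))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                file.Write(buffer, 0, read);
        }
        return path;
    }
}
```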

Keeping them in memory concerns me because of the possible performance hit from creating a bunch of objects over the 85,000-byte threshold, which qualifies them for storage on the large object heap. We're limiting the code to run only on x64, which gives us some breathing room in the amount of memory we can access, and that should help. I'm not sure how the pause that will occur when the GC does its full Gen-2 collection, of possibly a few gigabytes of memory, will affect us. How long will it take? Will it even matter? A one-second pause might not be too bad, but a five-second pause every five minutes could be unacceptable, especially if there is a better design. I'm sure this falls under premature optimization, and I know the best way to get a handle on it is to build a test app and see for myself. I'm still early in the design, so changing it isn't a huge deal, and I think I have a possible workaround.
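Something like this could be the skeleton of that test app; the chunk size and count below are made up, just enough to push a couple of gigabytes onto the large object heap and time a forced full collection:

```csharp
using System;
using System.Diagnostics;

class GcPauseTest
{
    static void Main()
    {
        var chunks = new byte[64][];
        var rand = new Random();

        // Allocate 64 chunks of 32 MB each (~2 GB total). Anything over
        // the ~85,000-byte threshold goes straight to the large object heap.
        for (int i = 0; i < chunks.Length; i++)
        {
            chunks[i] = new byte[32 * 1024 * 1024];
            rand.NextBytes(chunks[i]); // touch the pages so they're committed
        }

        // LOH objects are collected along with Gen 2, so this reports 2.
        Console.WriteLine("Chunk generation: {0}", GC.GetGeneration(chunks[0]));

        // Drop the references and time a full collection to approximate
        // the pause a Gen-2 sweep of this much memory would cost.
        chunks = null;
        var sw = Stopwatch.StartNew();
        GC.Collect(2);
        GC.WaitForPendingFinalizers();
        sw.Stop();
        Console.WriteLine("Full Gen-2 collection took {0} ms", sw.ElapsedMilliseconds);
    }
}
```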

Currently this is being written in C#. I've been asking myself whether C# is really the best tool for this job. If I just manage the memory myself in C++, then memory is freed right away, in a manner that I control. Another option I've looked into a little bit is unsafe code in C#. That would give me the benefit of managing some of the memory myself and avoiding the Gen-2 cleanup. I just don't know what the negatives of unsafe code are, other than the full-trust requirement, which doesn't really affect us.
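For what it's worth, here's a rough sketch of what that middle ground might look like, allocating the buffer with Marshal.AllocHGlobal so the GC never tracks it at all; `ParseChunk` is a hypothetical name, and this requires compiling with /unsafe:

```csharp
using System;
using System.Runtime.InteropServices;

class UnmanagedChunk
{
    // Allocate the chunk outside the managed heap entirely, so no Gen-2
    // sweep ever has to walk it. Freeing it is our responsibility, just
    // as it would be in C++.
    static unsafe void ParseChunk(int size)
    {
        IntPtr block = Marshal.AllocHGlobal(size);
        try
        {
            byte* p = (byte*)block;
            for (int i = 0; i < size; i++)
                p[i] = 0; // fill/parse the buffer through the raw pointer
        }
        finally
        {
            Marshal.FreeHGlobal(block); // deterministic release, no GC pause
        }
    }
}
```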

I guess I really just need to test this out.