On a normal day in a developer's life, I was hunting a performance issue and a memory leak. It sounds mysterious, but it was just another bug, another issue to solve.
When it comes to performance and memory issues, PerfView is the tool to reach for. It gives a very detailed picture of what is going on in memory, at a level of abstraction a developer can understand.
The system is a WCF service built on DataContract. From the profiler, I found that a returned value of 10MB costs the OS about 50MB, several times the payload size in overhead. That does not even count the memory consumed by the WCF framework to serialize the contract.
Note that I do not judge the architecture as good or bad. There were good reasons why it was designed that way.
A very simplified version looks like this:
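The original listing is not reproduced here, so the following is a minimal sketch of what such a service might look like. The article only names `BinaryDataContractSerializer.Serialize`; everything else (`FileData`, `IFileService`, `DownloadFile`) is a hypothetical name for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.ServiceModel;

// A hypothetical data contract carried inside the response.
[DataContract]
public class FileData
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public byte[] Content { get; set; }
}

// Helper that pre-serializes the contract into a byte array.
// This is the pattern discussed here: every response body is turned
// into a byte[] before WCF serializes the envelope again.
public static class BinaryDataContractSerializer
{
    public static byte[] Serialize<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            // ToArray() copies the internal buffer into a new array.
            return stream.ToArray();
        }
    }
}

[ServiceContract]
public interface IFileService
{
    [OperationContract]
    byte[] DownloadFile(string fileName);
}

public class FileService : IFileService
{
    public byte[] DownloadFile(string fileName)
    {
        var data = new FileData
        {
            Name = fileName,
            Content = File.ReadAllBytes(fileName)
        };
        return BinaryDataContractSerializer.Serialize(data);
    }
}
```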
With that service in place, here is a simple console app that consumes it:
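The client code is also not shown in the original, so this is a sketch under the same assumptions: a hypothetical `IFileService` contract with a `DownloadFile` operation, and a placeholder binding and address.

```csharp
using System;
using System.ServiceModel;

// Hypothetical client-side copy of the service contract.
[ServiceContract]
public interface IFileService
{
    [OperationContract]
    byte[] DownloadFile(string fileName);
}

class Program
{
    static void Main()
    {
        // Binding and address are placeholders; the real service
        // configuration is not part of the article.
        var binding = new BasicHttpBinding
        {
            MaxReceivedMessageSize = 200 * 1024 * 1024
        };
        var address = new EndpointAddress("http://localhost:8080/FileService");

        var factory = new ChannelFactory<IFileService>(binding, address);
        IFileService client = factory.CreateChannel();

        byte[] payload = client.DownloadFile("BigFile.zip");
        Console.WriteLine($"Downloaded {payload.Length} bytes");

        ((IClientChannel)client).Close();
        factory.Close();
    }
}
```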
Here is the result of downloading a 74MB file: the total memory consumed on the heap is 146MB.
Where is that extra cost coming from? It comes from the BinaryDataContractSerializer.Serialize method:
- The memory consumed by the DataContractSerializer.
- The memory consumed by the MemoryStream used to produce the array of bytes.
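Although the original method is not shown, a Serialize helper like the one described typically looks like the sketch below, and both costs are visible in it:

```csharp
using System.IO;
using System.Runtime.Serialization;

public static class SerializeSketch
{
    public static byte[] Serialize<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            // Cost #1: the serializer writes the serialized form of the
            // object graph into the MemoryStream's internal buffer. The
            // buffer grows by doubling, so peak usage can exceed the
            // final payload size.
            serializer.WriteObject(stream, value);

            // Cost #2: ToArray() allocates a brand-new array and copies
            // the buffer into it. For a 10MB payload that is another
            // 10MB+ allocation, landing on the Large Object Heap.
            return stream.ToArray();
        }
    }
}
```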
In many cases, with modern hardware, this is not a big problem. The Garbage Collector takes care of reclaiming the memory, and if both request and response are small, you do not even notice. That is, until one day in production there are many requests.
There are a couple of potential issues with consuming this much extra memory:
- If an allocation is 85K (85,000 bytes) or more, it goes on the Large Object Heap (LOH), which is collected only as part of Gen 2. I suggest reading more about memory allocation, especially the LOH; I am too much of an amateur to explain it well.
- It causes memory fragmentation. Memory usage keeps increasing, and the GC has a very hard time reclaiming it.
- As a result, the system is not in good shape.
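The 85,000-byte threshold is easy to see for yourself. In this small demo, `GC.GetGeneration` reports generation 2 for a fresh large array, because the LOH is collected together with Gen 2:

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        // Arrays below the 85,000-byte threshold start in Gen 0
        // on the small object heap.
        var small = new byte[80_000];

        // Arrays at or above the threshold go straight to the
        // Large Object Heap, reported as generation 2.
        var large = new byte[85_000];

        Console.WriteLine(GC.GetGeneration(small));
        Console.WriteLine(GC.GetGeneration(large));
    }
}
```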
How could we solve the problem without changing the design, with less impact?
We know that some operations consume lots of memory, such as downloading a file or returning a data set. For those, instead of returning a byte array, we extend the response to carry the object itself. We could do that for all operations and get rid of byte arrays entirely; however, there are hundreds of operations, and we want to keep the contract simple, with as few changes as possible.
So an improved version looks like this:
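Again, the original listing is not available, so this is a sketch of the idea: the response contract keeps its existing byte-array field but gains an object field, so heavy payloads are serialized only once, by WCF itself, straight to the wire. All type and member names here (`Response`, `Body`, `FileData`, `IFileService`) are assumptions.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class FileData
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public byte[] Content { get; set; }
}

// The response is extended rather than redesigned: existing operations
// keep using Data, heavy operations switch to Body.
[DataContract]
[KnownType(typeof(FileData))]
public class Response
{
    // Original design: payload pre-serialized into a byte array.
    [DataMember]
    public byte[] Data { get; set; }

    // Extension: the payload travels as an object, so there is no
    // intermediate MemoryStream and no extra byte[] copy.
    [DataMember]
    public object Body { get; set; }
}

[ServiceContract]
public interface IFileService
{
    [OperationContract]
    Response DownloadFile(string fileName);
}

public class FileService : IFileService
{
    public Response DownloadFile(string fileName)
    {
        return new Response
        {
            Body = new FileData
            {
                Name = fileName,
                Content = File.ReadAllBytes(fileName)
            }
        };
    }
}
```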
Run the application and look at the memory again.
Comparing the two runs, there is a big win: 2,784 objects versus 366 objects; 146MB versus 73MB.
With the increasing power of hardware, RAM and disk rarely feel like constraints anymore. With the support of a managed language such as C# on .NET, developers can code without thinking much about memory and memory allocation. Not all developers, of course, but I believe many do not pay much attention to the issue.
It is about time we cared about every single line of code we write. We do not have to learn and understand every detail of these topics; the following are good enough to start with:
- Memory allocation in Heap, Gen 0, Gen 1, and Gen 2.
- Memory fragmentation. Just like disk fragmentation.
- Memory profilers, at least at a high level, such as dotMemory and PerfView.
- The Garbage Collector. Just getting a feel for it is a good start.
I am sure you will be surprised by how much fun it is, and how far it takes you.