another technical blog...technically

Monday, April 25, 2016

How to search for a memory leak and save your marriage

Someone said (on a Friday afternoon): "your application has a memory leak and it's your fault".
Since my wife gets very nervous when I have to work during the weekend, I asked for some help to understand how to work this problem out as fast as possible, so let's start with credits to Iron Man and Mr. Pumpkin (only nicknames here: privacy is privacy).
This is what I learned from the pros who helped me do a drill-down analysis.
Pre-conditions:
  1. You know the farm sometimes goes down
  2. You must have the front-end IIS logs
  3. You know there are lots of objects in memory (as I could see in w3wp.dmp)
  4. You need the ULS logs

The bug-hunting process deals with:
  1. Find the use case that reproduces the memory issue
  2. Replicate the case in your dev environment
  3. Analyze the issue
  4. Conclusions

Step 1 - Find the use case that reproduces the memory issue
To accomplish the first task, I took the IIS logs in order to find out "who was doing what" on the web application. IIS logs are far too big for Notepad, so Mr. Pumpkin said: "Use Lizard, download it from here http://www.lizard-labs.com/log_parser_lizard.aspx".
Lizard lets you run SQL queries against log files. IIS logs contain lots of rows with these fields:
  • date
  • time
  • s-sitename
  • s-computername
  • s-ip
  • cs-method
  • cs-uri-stem 
  • cs-uri-query 
  • s-port 
  • cs-username 
  • c-ip
  • cs-version
  • cs(User-Agent) 
  • cs(Cookie) 
  • cs(Referer) 
  • cs-host 
  • sc-status 
  • sc-substatus 
  • sc-win32-status 
  • sc-bytes 
  • cs-bytes 
  • time-taken
and so on. But as Mr. Pumpkin said, please be aware that IIS logs are in UTC, so take care to use this query (maybe restricting the WHERE conditions to a particular time slot near the memory issue):
SELECT
   TO_LOCALTIME(TO_TIMESTAMP(date, time))
   ,date
   ,time
   ,s-sitename
   ,s-computername
   ,s-ip
   ,cs-method
   ,cs-uri-stem
   ,cs-uri-query
   ,s-port
   ,cs-username
   ,c-ip
   ,cs-version
   ,cs(User-Agent)
   ,cs(Cookie)
   ,cs(Referer)
   ,cs-host
   ,sc-status
   ,sc-substatus
   ,sc-win32-status
   ,sc-bytes
   ,cs-bytes
   ,time-taken
FROM 'C:\Users\Varro\Downloads\iis_frontend47.log'
In this way, the TO_LOCALTIME(TO_TIMESTAMP(date, time)) column will give you the local time. Now it's search time: filter on whatever you want. In my case, I looked at sc-status (the request status code) and time-taken in order to find the most time-consuming calls, and I found some interesting data which helped me replicate the behaviour users had before memory allocation went out of control.
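For example, a query along these lines (same log file as above; the 200 status filter and the 10-second threshold are just an illustration, and time-taken is in milliseconds) lists the slowest successful requests first:
SELECT TOP 50
   TO_LOCALTIME(TO_TIMESTAMP(date, time)) AS LocalTime
   ,cs-method
   ,cs-uri-stem
   ,cs-uri-query
   ,sc-status
   ,time-taken
FROM 'C:\Users\Varro\Downloads\iis_frontend47.log'
WHERE sc-status = 200 AND time-taken > 10000
ORDER BY time-taken DESC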

Step 2 - Replicate the case in your dev environment
This is one of the most interesting parts of the job: now that I had the users' behaviour, I performed the same actions on my system with the Visual Studio memory profiler attached. Below are the steps Iron Man explained to me:

1. Create a new performance session
 

2. Change the properties of the performance session and choose these performance counters (if you also want to watch them outside Visual Studio, see the typeperf note after step 5)

  • .NET CLR Memory/# Bytes in all Heaps
  • .NET CLR Memory/Large Object Heap Size
  • .NET CLR Memory/Gen 2 heap size
  • .NET CLR Memory/Gen 1 heap size
  • .NET CLR Memory/Gen 0 heap size
  • .NET CLR Memory/Induced GC
  • Process/Private Bytes
  • Process/Virtual Bytes

3. Attach the performance session to the SharePoint process (w3wp.exe)


4. Execute the use case and monitor Task Manager; when you see memory going up, create some memory dumps of the process (a command-line alternative with ProcDump is sketched at the end of this step)


5. When the use case ends, stop everything and look at the results ;)
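If you also want to keep an eye on the same counters outside Visual Studio, typeperf can sample them from the command line; the w3wp instance name and the 5-second interval here are just an example (with more than one worker process the instances become w3wp#1, w3wp#2, and so on):
   rem sample two of the counters above every 5 seconds
   typeperf "\.NET CLR Memory(w3wp)\# Bytes in all Heaps" "\Process(w3wp)\Private Bytes" -si 5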

Pay attention to this: you may need to run these commands from the Visual Studio command prompt in order to make the profiler work
   vsperfclrenv /globalsampleon
   iisreset
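About the dumps themselves: right-clicking w3wp.exe in Task Manager and choosing "Create dump file" is enough, but if you prefer the command line, Sysinternals ProcDump can capture a full memory dump as well. The PID and file name here are just placeholders:
   rem 1234 is a placeholder: use the real PID of the w3wp.exe you are profiling
   procdump -ma 1234 w3wp_highmem.dmp
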
Step 3 - Analyze the issue
What I discovered is that there was indeed high memory consumption, so I tried to find out which method could be the guilty one.
I did this using the memory profiler report and DebugDiag (https://www.microsoft.com/en-us/download/details.aspx?id=49924) to process the DMP files.

Memory DMP report
Memory profiler report
Iron Man also explained that I needed to replicate the same pattern of memory allocation (as I could see in the customer's w3wp.dmp) in order to have a smoking gun. So I also used the debugger to freeze the code at some interesting points and took some specific memory dumps from Task Manager.
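If you like to poke around the DMP files by hand as well, WinDbg with SOS gives a quick summary of what is filling the managed heap. This is just a minimal sketch, assuming a .NET 4.x worker process (on .NET 2.0/3.5 it would be .loadby sos mscorwks):
   $$ load SOS from the folder the CLR was loaded from, then group the heap contents by type
   .loadby sos clr
   !dumpheap -stat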

Step 4 - Conclusions
This pretty complex analysis helped me discover that our application is quite slow in some particular circumstances, but even if some operations stress memory allocation and the garbage collector quite a lot (a method like SubMethod1 could be more efficient), it's also true that memory is released after those operations. So, if the farm falls down when users perform this operation... maybe it's time to give your farm just a little bit more RAM.

That's all folks.
