another technical blog...technically

Showing posts with label SP2010. Show all posts
Showing posts with label SP2010. Show all posts

Monday, April 25, 2016

Howto search for memory leak and save your marriage

Someone said (on friday afternoon): "your application got a memory leak and it's your fault".
Because of my wife became very nervous when i gotta work during the weekend i asked some help in order to understand how to work this problem as fast as possible, so let's start with credits to Iron Man and Mr. Pumpkin (only nicknames here: privacy is privacy).
This is what i learned from pro-people which helped me to do a drill-down analysis.
Pre-conditions:
  1. You know farm sometimes goes down
  2. You must have front end IIS logs
  3. You know there are lots of objects in memory (as i can see in w3wp.dmp)
  4. You need ULS logs

The bug-hunting process deals with:
  1. Find the use case that reproduce the memory issue
  2. Replicate the case in your dev envinronment
  3. Analyze the issue
  4. Conclusions

Step 1 - Find the use case that reproduce the memory issue
To accomplish the first task, i have taken the IIS logs in order to find out "who was doing what" on the web application. IIS logs are too long to use a notepad, so Mr. Pumpkin said: "Use lizard, download it from here http://www.lizard-labs.com/log_parser_lizard.aspx"
Lizard helps you to make query with SQL on log files, IIS logs contain lots of rows with those informations:
  • date
  • time
  • s-sitename
  • s-computername
  • s-ip
  • cs-method
  • cs-uri-stem 
  • cs-uri-query 
  • s-port 
  • cs-username 
  • c-ip cs-version 
  • cs(User-Agent) 
  • cs(Cookie) 
  • cs(Referer) 
  • cs-host 
  • sc-status 
  • sc-substatus 
  • sc-win32-status 
  • sc-bytes 
  • cs-bytes 
  • time-taken
etc But as Mr. Pumpkin said, please be aware IIS log are in UTC, so take care of using this query (maybe restricting where conditions on a particular time-slot near to the memory issue)
SELECT
   TO_LOCALTIME(TO_TIMESTAMP(date, time))
   ,date
   ,time
   ,s-sitename
   ,s-computername
   ,s-ip
   ,cs-method
   ,cs-uri-stem
   ,cs-uri-query
   ,s-port
   ,cs-username
   ,c-ip
   ,cs-version
   ,cs(User-Agent)
   ,cs(Cookie)
   ,cs(Referer)
   ,cs-host
   ,sc-status
   ,sc-substatus
   ,sc-win32-status
   ,sc-bytes
   ,cs-bytes
   ,time-taken
FROM 'C:\Users\Varro\Downloads\iis_frontend47.log'
in this way, column TO_LOCALTIME(TO_TIMESTAMP(date, time)) will give you the localtime. Now it's search time, filter on whatever you want. In my case, i take a look around sc-status (request status code) and time-taken in order to find out the most time consuming call, and i found some interesting data which helped me to replicate the behaviour users had before memory allocation went outta control.

Step 2 - Replicate the case in your dev envinronment
This is one of the most interesting part of the job, now i have the users behaviour i did the same actions on my system, attaching the memory profiler on Visual Studio, below the steps Iron Man explained to me:

1. Create a new performance session
 

2.  Change properties of performance session and choose performance counters

  • NET CLR Memory/# Bytes in all Heaps
  • .NET CLR Memory/Large Object Heap Size
  • .NET CLR Memory/Gen 2 heap size
  • .NET CLR Memory/Gen 1 heap size
  • .NET CLR Memory/Gen 0 heap size
  • .NET CLR Memory/Induced GC
  • Process/Private Bytes
  • Process/Virtual Bytes

3. Attach performance session  to SP process


4. Execute use case and monitor task manager, when you see memory going up, create some memory dumps of the process


5. When you ends the use case, stop everything and watch results ;)

Pay attention to this, you could need to run this command from che Visual Studio command prompt in order to make the profiler work
   vsperfclrenv /globalsampleon
   iisreset
Step 3 - Analyze the issue
What i discovered is that, effectively, there was a high memory consumption, so i tried to discover who could be the guilty method.
I can do this using memory profiler report and debug2diag (https://www.microsoft.com/en-us/download/details.aspx?id=49924) in order to process DMP files.

Memory DMP report
Memory profiler report
Iron Man also explained that i need to replicate the same pattern of memory allocation (as i could see on the customer w3wp.dmp) in order to have a smoking gun. So, using also the debugger to freeze the code in some interesting points in order to make some specific memory dump from task mager.

Step 4 - Conclusions
This pretty complex analysis helped me out to discover our application is quite slow in some particular circumstances, but even if some operations stressed so much memory allocation and garbage collector (some method like SubMethod1 could be more efficient), it's also true memory is released after those operation, so, if farm fall down when users do this operation... maybe it's time to give your farm just a little bit RAM.

That's all folks.

Wednesday, July 29, 2015

Deleting a host named site collection... for real

A simple problem: you have to delete a host named site collection in order to recreate it.
When you try to recreate it, you get an error: SharePoint says the site collection already exists... if you restart the machine you'll be able to recreate it.
What if you cannot restart the machine whenever you want (read production environment).
$variable = Get-SPSite "http://already.deleted.site"
$variable
You'll get an output like this


Now recreate the site collection without problems ;)

Monday, September 8, 2014

OnTaskCreated and DelayActivity epic fail: a inconsistent farm story

Some weeks ago a SP2010 workflow was driving me mad. I got a parallel activity and i need to wait the OnTaskCreated event on every task i was creating.
Workflow started having some strange behaviours:
  • Workflow suspended states;
  • OnTaskCreated never firing on created tasks;
I was trying to figure out what to do, and i start thinking about a porkaround, why don't simulate tha bugged OnTaskCreated with a while and a DelayActivity?
A while condition checks if task id has a value, if not, wait 5 seconds and re-check until condition is met.
Why not? It's not so elegant but it works... don't laugh please, i almost save a workflow project with this workaround.
In Italy we say: "Misfortunes always come in twos"
Delay activity seemed not to work because it put the workflow to sleep... forever. This leads me to think there were problems on customer server... and so it was (http://support.microsoft.com/kb/2553031/en-us) , in other words: update to the most recent cumulative update and cross your fingers.
In my case, i continued to have issues, so i have to develop a parachute solution to make my PM happy. When we switched to pre-production environment everything works without parachutes.
What can i say? Two subsequential MS bugs in a row... Murphy's law rulez.
P.S.:
Who laughed reading this workaround?
Read this: http://support.microsoft.com/kb/970548
No more laughs ah?

Me, myself and I

My Photo
I'm just another IT guy sharing his knowledge with all of you out there.
Wanna know more?