another technical blog...technically

Tuesday, July 1, 2014

Is FAST query a data retrieval silver bullet?

Yesterday, a customer asked me pros and cons of Search in SharePoint 2013 and how it can help with FASTer queries.
I already used it on another project and i was able to make some cross site collections and cross web applications queries using just one statement, so i told him about the benefits this can imply.
Then, the customer asked me also: "And what about cross lists queries? It's better to use CAML or FAST?"
"Ehm... uhm.. sorry i have another call on hold, i'll call you back later" was the answer.
/jokes mode off
Everybody knows there are many ways to query contents across multiple SharePoint lists and, that's clear, writing code is a "depends on" work, but i started asking myself if FAST query could rule them all and be the best query method also for a simple task like cross list query (which is the CAML query dark reign).
I compared pure LINQ queries, SPSiteDataQuery, CrossListQueryInfo and KeywordQuery on the same battlefield (cross lists query), so expects some strong assumptions (borrowed from the project business case i'm following right now):
  • 1 Content Type containing 10 columns;
  • 3 List instances containing the same number of items (which are randomly generated) and the same content type;
  • Query will select all content type fields;
  • Usage of SharePoint like a DB (wrong usage, but some customers seems to love this scenario anyway);
  • Whenever it's possible, use just one query to extract data using date range;
I compared queries execution times populating lists with 2.000, 5.000, 10.000, 20.000, 50.000, 100.000, 500.000, 1.000.000 random items and i queryed just 1 list, then all 3 lists.

If you want to test on your own, as usual, download the last source code and deploy example 2.
Please note you have to follow steps below to enable your farm to support query on many items per lists:
1.  Allow object model override in web application's resource throttling options (Central administration > Manage web applications > <Your web application> > General settings > Resource throttling)
This will avoid CAML query to explode

2.  Using powershell, raise search service application max row limit.
$searchApp = Get-SPEnterpriseSearchServiceApplication
$searchApp.MaxRowLimit = 10000
$searchApp.Update()
iisreset
This will avoid KeywordQuery to extract less results than expected.

After that you can use my web part to populate list instances, launch crawler (you have to launch crawlers everytime you add new items if you haven't enabled continous crawling).
Please launch full crawl from the web part UI  the first time you add items on lists since it's necessary to install also managed property on search service application.

This is your example 02 dashboard, really fine uh? For sure, i made it.
Yes, i know you are thinking a script or a console application would be the best option to make massive import process, but i preferred to use a webpart in order to create a dummy-proof test dashboard for this example.
And now, last but not least, the numbers:

Query on single list
List items Retrieved items LINQ SPSite DataQuery CrossList QueryInfo Keyword Query
2.000 60 44ms 31ms 1ms 236ms
5.000 140 77ms 68ms 2ms 415ms
10.000 270 144ms 117ms 6ms 749ms
20.000 550 283ms 247ms 18ms 1436ms
50.000 1.370 708ms 322ms 32ms 5220ms
100.000 2.740 1387ms 747ms 70ms 10579ms
500.000 N/A N/A N/A N/A N/A
1.000.000 N/A N/A N/A N/A N/A

 

Query on 3 lists 
Items per list Retrieved items LINQ SPSite DataQuery CrossList QueryInfo Keyword Query
2.000 180 115ms 84ms 3ms 375ms
5.000 420 225ms 206ms 8ms 1123ms
10.000 810 438ms 375ms 16ms 1736ms
20.000 1.650 824ms 776ms 47ms 4084ms
50.000 4.110 1953ms 941ms 103ms 15042ms
100.000 8.220 3980ms 2635ms 198ms 29487ms
500.000 N/A N/A N/A N/A N/A
1.000.000 N/A N/A N/A N/A N/A



Results
I haven't done any tests for 500.000 and 1.000.000 items per list, this is due to the KeywordQuery boundary of 10.000 max results rows and the execution time that must be under 60 seconds per query.
The data i'm showing you are the average queries execution times, i excluded the first query because it represents the warm up (and slowest) execution time.
Well, despite my hopes, KeywordQuery doesn't seem to be the silver bullet i was waiting for. This doesn't mean it isn't a good technology, on the contrary i think it comes with a lot of advantages, but, simply, in some old-fashioned scenarios, could be better to use the old-fashioned CrossListQueryInfo way.
Download the source code
written in: Milano, Italia

0 commenti:

Post a Comment

Because of a lot of SPAM about courses, I need to moderate all comments here.
I ensure you that I will answer whenever possible (if you are not a spammer).

Me, myself and I

My Photo
I'm just another IT guy sharing his knowledge with all of you out there.
Wanna know more?