Wednesday, October 1, 2008

SQL Server 2008 Full Text Search Weirdness

Sorry to the non-tech readers but I just had to post this one.

After hearing Greg Low's session at Tech.Ed on Full-Text Searches in SQL Server 2008 I thought I'd upgrade from 2005 to take advantages of the new features. The biggest difference between 2008 and 2005's FTS is that the indexing etc. is now done in the database instead of palming it off to the operating system (2005 used Windows Search for FTS).

So after a switch to SQL Server 2008 CTP on new hardware running Windows Server 2008 I found that one of my searches that ran well under 1 second in 2005 was taking over 10 seconds in 2008. As you may have guessed I was a little flabergasted.

Basically the query is searching for cars (models) that have the search string in their name or the search string is in the car's manufacturer name. Using "lambo" as a test it should return all the Lamborghinis in the database.

Original Inner Join Query

SELECT * FROM Model WITH (NOLOCK)

INNER JOIN Manufacturer WITH (NOLOCK) ON Model.ManufacturerId = Manufacturer.ManufacturerId

WHERE CONTAINS(Model.*, '"*lambo*"') OR CONTAINS(Manufacturer.*, '"*lambo*"')

ORDER BY Model.StartOfProduction DESC, Model.ModelName


Now before you get carried away with those "SELECT *"s in there, this is early days and I'm using LINQ to SQL.

Anyway, my first thought was that the FTS indexes were out of whack after coming across from 2005. I remembered Greg saying something about 2008 not rebuilding the indexes on import unless you explicitly told it to. I set the server to rebuild indexes on import and reimported the database. The query was down to a scorching 5 seconds or so. Still far from an improvement over the >1 seconds on 2005.

A few emails to and fro with Greg, an upgrade to the hardware (more RAM) and to SQL Server 2008 RTM and the very same query was now taking over 20 seconds! And it was in this darkest hour that I stepped back and looked at the problem from afar and realised I should just stop bashing my head against the wall.

What I needed from the query was a list of cars that either have a matching name or a matching manufacturer name. So instead of using an inner join with a Contains() I used Union which allows you to take the results from multiple queries and output them as one table.

New Union Query

SELECT ModelId ... (explicit list of columns)

FROM Model WITH (NOLOCK)

WHERE CONTAINS(Model.*, '"*lambo*"')

UNION

SELECT ModelId ... (explicit list of columns again)

FROM Model WITH (NOLOCK)

INNER JOIN Manufacturer WITH (NOLOCK) ON Model.ManufacturerId = Manufacturer.ManufacturerId

WHERE CONTAINS(Manufacturer.*, '"*lambo*"')

ORDER BY Model.StartOfProduction DESC, Model.ModelName


It's a little more complex than the original query but now it's back to running at under 1 second. *phew*

Big thanks to Greg for putting up with my nattering and for passing this on to the SQL Server product group to see if they can figure out what's going on with the inner join query.

THE END

No comments: