SQL Server 2008 Full Text Indexing or Lucene.Net?
I’m at the point where I need to choose how I’m going to implement the search functionality for our project. My first impulse was to use the Full Text Indexing (FTI) built into SQL Server (2005 when we first discussed the project). I’ve seen other projects similar to ours use it, but I haven’t really heard much about the pros and cons in a production environment other than “we’re using it”. I’ve read all about the improvements in SQL Server 2008 and they sound good. We were on the fence about requiring SQL Server 2008 over 2005, but I think 2008 is the right way to go.
Researching recommendations and pitfalls of FTI in 2008 keeps pointing to the StackOverflow.com Blog, where Jeff Atwood discussed a problem they ran into with FTI 2008. They got Microsoft involved, and it turned out to be a minor bug that was also fixable by changing the structure of the query. Once you filter out all the re-posts about this incident, there aren’t a lot of articles beyond the “this is how to turn it on” tutorials. Either people aren’t using FTI in 2008, or they just aren’t writing a lot about it. In the end, it sounds like SQL Server 2008 should be fully functional and scalable for our needs.
The other option that gets a lot of praise is Lucene.Net. Like many people, I was unsure about Lucene.Net’s production-readiness while it was in Apache’s Incubator status. Some searching shows that it’s in use in many production environments much larger than my project will ever get to. I also ran across some good explanations of how the Lucene.Net project is generally more stable than the native Java version because of the delay in porting over to C#. It makes sense to me. They are porting the last released version, not the daily build. You might not have every new feature the Java folks are enjoying, but you get the benefit of some testing before porting. I get the impression that the API itself is good stuff. It’s up to you not to screw it up.
We’re going to be using Oracle’s Outside-In Search Export API to get the text rendering of documents. That removes the pain of trying to find IFilters for the document types we might want to search through on the SQL Server 2008 side, and of writing my own text extraction app on the Lucene.Net side. From here it really boils down to the amount of work it takes to get things running.
For now, we’re going to give SQL Server 2008 FTI a shot. I already know the basics of FTI in SQL Server, so it shouldn’t take much learning to get up and running. It is comforting to know that we have Lucene.Net ready as a replacement if we need it. Maybe we’ll include it as a configurable option in later versions.
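A minimal sketch of what a configurable search option could look like, assuming the rest of the application only ever talks to a small search interface. Python is used here purely for brevity, and nothing in it comes from the actual project: `SearchBackend`, `InMemoryBackend`, `BACKENDS`, `create_backend`, and the configuration keys are all hypothetical names. The in-memory class is only a stand-in for a real adapter that would wrap SQL Server FTI or Lucene.Net.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface the application would code against."""

    def index(self, doc_id: str, text: str) -> None: ...

    def search(self, term: str) -> list[str]: ...


class InMemoryBackend:
    """Stand-in backend: a tiny inverted index kept in a dict.

    A real adapter would issue full-text queries to SQL Server or
    call into Lucene.Net here instead.
    """

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Map each lower-cased word to the set of documents containing it.
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, term: str) -> list[str]:
        # Single-word, case-insensitive lookup; results sorted by doc id.
        return sorted(self._postings.get(term.lower(), set()))


# Hypothetical configuration keys; both point at the stand-in for now.
BACKENDS = {
    "sqlserver_fti": InMemoryBackend,
    "lucene_net": InMemoryBackend,
}


def create_backend(name: str) -> SearchBackend:
    """Build the backend named in configuration, or fail loudly."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown search backend: {name!r}") from None
```

With something along these lines, swapping engines would come down to changing the name passed to `create_backend`, provided each real adapter can honor the same two methods. Ranking, phrase queries, and stemming differ between engines, so a real interface would likely need more thought than this.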