Shabir has 12 years of experience in IT, working with ASP, ASP.NET, VB6/.NET/C#, MOSS, WMI, HTML/DOM, Ajax, XML, XSL, and SharePoint.
This article grew out of my recent experience developing a search engine for one of my clients. You have probably seen many websites that provide search capabilities: you type several words, press a "Search" button, and receive a list of pages that contain those words. It's simple. But how can you implement this feature in your own web application? You need an indexing service that indexes your files or web pages. After that, you can use full text search features.
Many solutions can provide this functionality in your application. One of them is Microsoft Indexing Service, which is part of Windows 2000 and later Windows versions. So, if you only build Windows solutions (ASP.NET web applications, Windows Forms applications, etc.), this Microsoft product is worth a look.
One of the biggest advantages of Indexing Service is that it's totally free. You can use it without any restrictions or additional licenses. I think this matters a great deal, because other indexing products cost a lot of money. If you are developing a small or medium-sized application, you don't want to pay thousands of dollars for a full text search tool.
If you choose to use the Indexing Service, remember that it can only index file systems. For example, you can't use it to index files stored in your database. This is a significant drawback of the Microsoft Indexing Service, but I believe that you can work around this limitation without much trouble.
In this article, I'll try to describe how to install, configure, and use the Microsoft Indexing Service. We'll develop a simple application which will allow us to use full text search features for web pages located on our local file system.
If you are using Windows XP or later, you'll be using Microsoft Indexing Service 3.0. If you're still using Windows 2000, you'll be using Microsoft Indexing Service 2.0. The service is installed on your machine by default, but it may have been deselected when the operating system was installed. To check, go to "Add or Remove Programs" in your Control Panel and choose "Add/Remove Windows Components". Make sure that "Indexing Service" is checked. If it isn't installed, install it.
Now, Microsoft Indexing Service has been installed, and you can configure it. Open the "Computer Management" configuration tool. Choose "Services and Applications", "Indexing Service". In this entry, you can manage your Microsoft Indexing Service.
First of all, you should create a new catalog in Indexing Service and pick the folder which will hold its index files. Open the context menu for "Indexing Service" and choose "Catalog" in the "New" submenu. Enter a name, choose a location, and press "OK".
After that, you have to add the folders which will be indexed. To do this, choose the "Directories" entry, open its context menu, and choose "Directory" from the "New" submenu. In the dialog box that opens, choose the folder with your documents and press "OK" to include the selected directory in the index. If you want to exclude a folder from the existing index instead, choose "No" for the "Include in Index?" parameter in this dialog window. This parameter is "Yes" by default.
If your Indexing Service is already started, it will index the new catalog. Otherwise, start Indexing Service and it will index the catalog automatically. You can also build or rebuild the index for a folder manually. To do this, open the context menu for that folder in the existing catalog and choose "Rescan (Full)" or "Rescan (Incremental)" in the "All Tasks" submenu. Of course, Microsoft Indexing Service has to be running at this time.
If you select the "Indexing Service" entry in "Computer Management", you will see the state of the Indexing Service. This information can help when you have a large storage and a file you expect to find does not turn up in your searches.
There is another important setting for Indexing Service: "Indexing Service Usage". This setting tells Indexing Service how often it should update the indexes. For example, if your application only uses static storage, the service does not need to update the index as often as it would for dynamic storage, where the data changes frequently. To configure this parameter, open the context menu for the "Indexing Service" entry and choose "Tune Performance" in the "All Tasks" submenu.
Now, you can check the index. To do this, choose "Query the Catalog" in your catalog. You'll see a form which allows you to search your index. First of all, you can test a simple full text search. Enter something in the query field and press the "Search" button. You will see the files which contain the entered words. Of course, you can execute more complex queries using this tool: choose "Advanced query" and use the Microsoft Indexing Service query language to get the required information. This query language is similar to SQL, but it contains some syntax extensions.
You can use SQL to query Microsoft Indexing Service. But, there are several extensions for Indexing Service's SQL dialect which you have to know about.
The most useful command, when you use the Microsoft Indexing Service, is the SELECT command. That makes sense, because you don't add, delete, or update information in your indexes through queries. You use SELECT to query the Indexing Service and retrieve information about indexed files. Let's see an example query:
SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, 'Hello World')
This query returns the paths of all files whose contents match the text "Hello World". It also serves as a good starting point for describing the SQL extensions of Microsoft Indexing Service.
First of all, let's look at the FROM expression. In this example, we query all the data which the index contains. The SCOPE() function tells the Indexing Service which data you want to examine. By default, if you don't pass any parameters, it examines all the data in your index. This function can optimize your queries, because it can limit the part of the index that is searched. For example, you can use SCOPE('"/books"'). Here, you query only the "/books" folder, not all the folders in your index, so the query will run faster than it would with a plain SCOPE() call. To narrow the search further, you can specify a traversal type. For example, with SCOPE('DEEP TRAVERSAL OF "/books"'), Indexing Service will search in the "/books" directory and in all the directories beneath it. With SHALLOW TRAVERSAL, Microsoft Indexing Service will examine only the "/books" directory itself. For example, SCOPE('SHALLOW TRAVERSAL OF "/books"').
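As a minimal sketch of how these SCOPE clauses might be assembled in C#, the helper below produces the empty, DEEP, and SHALLOW forms described above. ScopeBuilder and Build are hypothetical names introduced only for illustration; they are not part of Indexing Service or the .NET Framework. The sketch assumes the virtual path never needs to contain quote characters and simply rejects any that do.

```csharp
using System;

// Hypothetical helper for illustration only; not part of any Microsoft API.
public static class ScopeBuilder
{
    // Returns SCOPE() when no path is given; otherwise a DEEP or SHALLOW
    // traversal clause for the given virtual path.
    public static string Build(string virtualPath, bool deep)
    {
        if (string.IsNullOrEmpty(virtualPath))
        {
            return "SCOPE()";
        }

        // Quote characters would break out of the clause, so refuse them.
        if (virtualPath.IndexOf('"') >= 0 || virtualPath.IndexOf('\'') >= 0)
        {
            throw new ArgumentException(
                "Path must not contain quote characters.", "virtualPath");
        }

        string traversal = deep ? "DEEP" : "SHALLOW";
        return string.Format(
            "SCOPE('{0} TRAVERSAL OF \"{1}\"')", traversal, virtualPath);
    }
}
```

For example, "SELECT Path FROM " + ScopeBuilder.Build("/books", false) would limit a query to the "/books" directory without descending into its subdirectories.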
The WHERE expression works as it does in SQL, but there are a few extensions for it too. The comparison predicates are:
Equal to (=): WHERE DocAuthor = 'John Doe'
Not equal to (!=): WHERE DocTitle != 'Finance'
Less than (<): WHERE WordCount < 1000
Greater than (>): WHERE WordCount > 500
Less than or equal to (<=): WHERE WordCount <= 500
Greater than or equal to (>=): WHERE WordCount >= 500
You can also combine predicates with the Boolean operators AND, OR, and NOT. As in standard SQL, NOT is evaluated before AND, and AND is evaluated before OR.
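There is a LIKE predicate too. In addition, several predicates extend the SQL language:
ARRAY (compares a column against a set of values): ... WHERE username = SOME ARRAY ['Admin', 'root']
CONTAINS (searches for specific words or phrases): ... WHERE CONTAINS(country, '"USA" OR "Russia"')
FREETEXT (searches for the best match to the given text): ... WHERE FREETEXT(Contents, 'Hello World !!!')
MATCHES (searches using a regular-expression pattern): ... WHERE MATCHES(Contents, '|(USA|)|{1|}')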
For additional information, see the Indexing Service articles on the MSDN website.
Now you know how to prepare queries for the Microsoft Indexing Service, but you still need a list of the properties which can be used in your queries. Each index has many default properties, which are listed below.
A_HRef (DBTYPE_WSTR | DBTYPE_BYREF), corresponds to HtmlHRef
Access (VT_FILETIME)
All
AllocSize (DBTYPE_I8)
Attrib (DBTYPE_UI4)
ClassId (DBTYPE_GUID)
Characterization
Contents
Create
Directory
DocAppName
DocAuthor
DocByteCount (DBTYPE_I4)
DocCategory (DBTYPE_STR)
DocCharCount (DBTYPE_I4)
DocComments
DocCompany
DocCreatedTm
DocEditTime
DocHiddenCount
DocKeywords
DocLastAuthor
DocLastPrinted
DocLastSavedTm
DocLineCount
DocManager
DocNoteCount
DocPageCount
DocParaCount
DocPartTitles (DBTYPE_VECTOR)
DocPresentationTarget
DocRevNumber
DocSlideCount
DocSubject
DocTemplate
DocTitle
DocWordCount
FileIndex
FileName
HitCount
HtmlHeading1
HtmlHeading2
HtmlHeading3
HtmlHeading4
HtmlHeading5
HtmlHeading6
Img_Alt (alternate text of <IMG> tags)
Path
Rank
RankVector
ShortFileName
Size
USN
VPath
WorkId
Write
As you can see, many properties are indexed for each file, but sometimes you will want to extend this list.
Note first that this feature works only for web pages, because it is based on the HTML <meta> tag.
Let's say you have several indexed web pages and you want to add some special properties to them. For example, to add "country" and "city" properties, add <meta> tags like these to every file which should carry the new properties:
<meta name="country" content="Russia" /> <meta name="city" content="Moscow" />
After these changes, you have to restart Indexing Service. Now, you can open the "Properties" entry and see that Microsoft Indexing Service knows about the new properties of your files. However, you still can't use these new properties in your queries.
Select the "Properties" node of your catalog and choose the property which you added to the files using the <meta> tag. Double click on the property, switch on the "Cached" checkbox, and choose the data type for the new property from the opened dialog box.
After that, you should create a Column Definition File which contains information about your newly added properties. The file could have an ".idq" extension, but the extension isn't important. A Column Definition File uses the following format:
[Names]
Propertyname( Data type ) = GUID ["Name" | Property ID]
The data type parameter is optional. If you don't define it, Microsoft Indexing Service will take the data type from the property definition in your catalog.
For my example, it contains this:
[Names]
country = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "country"
city = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "city"
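If you have more than a couple of meta properties, the file contents can be generated rather than typed by hand. The following is a rough sketch under stated assumptions: ColumnFileWriter is a hypothetical class name, every property is assumed to use the GUID shown in the example above, each entry is assumed to sit on its own line, and the optional data type is left out.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class ColumnFileWriter
{
    // GUID copied from the example above; assumed to apply to every
    // property defined through an HTML <meta> tag.
    public const string MetaPropertySetGuid =
        "d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1";

    // Builds the text of a Column Definition File with one line per
    // property, in the form: name = GUID "name"
    public static string Build(params string[] metaNames)
    {
        StringBuilder text = new StringBuilder();
        text.Append("[Names]\r\n");
        foreach (string name in metaNames)
        {
            text.AppendFormat(
                "{0} = {1} \"{0}\"\r\n", name, MetaPropertySetGuid);
        }
        return text.ToString();
    }
}
```

The result of ColumnFileWriter.Build("country", "city") could then be saved with System.IO.File.WriteAllText to whatever path you plan to register for the file.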
All of this data can be taken from the properties configuration dialog box.
After the Column Definition File is created, information about this file has to be added to the Indexing Service Registry settings. Add a string entry named "DefaultColumnFile" to the Registry key "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndexCommon". "DefaultColumnFile" should contain the full path to your Column Definition File.
Restart Microsoft Indexing Service. After that, run a full rescan of your indexed folder. Now, you will be able to use the new properties in your queries.
Microsoft Indexing Service exposes itself to the developer as an OLE DB provider. Its name is MSIDXS. You can use ADO.NET for querying your Indexing Service. To do this, you have to create a new System.Data.OleDb.OleDbConnection object using this sample connection string:
Provider= "MSIDXS";Data Source="Documents"
In the Data Source parameter, you should use the name of your catalog in Indexing Service.
Let's write some sample code which queries Indexing Service for a few words from the file contents. In this sample, there is a queryString variable. It is an instance of the SearchParameters structure, which holds the name of the data source and the query string. Here is the definition of this structure:
struct SearchParameters
{
    private string storage;
    public string Storage
    {
        get { return storage; }
        set { storage = value; }
    }
    private string query;
    public string Query
    {
        get { return query; }
        set { query = value; }
    }
}
First of all, you create a new OleDbConnection object:
string connectionString = string.Format(
    "Provider= \"MSIDXS\";Data Source=\"{0}\";", queryString.Storage);
OleDbConnection connection = new OleDbConnection(connectionString);
After that, you have to create a new OleDbCommand associated with this connection:
string query = string.Format(
    @"SELECT Path FROM scope() " +
    @"WHERE FREETEXT(Contents, '{0}')", queryString.Query);
OleDbCommand command = new OleDbCommand(query, connection);
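Note that the MSIDXS provider doesn't support commands with parameters. This is unfortunate, because the search text has to be concatenated directly into the query, so any user input should be validated or escaped first. I hope that Microsoft will fix this issue in the next version of the Microsoft Indexing Service.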
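Because the search text ends up inside a quoted literal, a stray apostrophe in user input would break the query. One possible mitigation is sketched below. FreeTextQuery, EscapeLiteral, and Build are hypothetical names, and the sketch assumes the Indexing Service SQL dialect follows the usual SQL convention of doubling a single quote inside a string literal; please verify that against the MSDN documentation before relying on it.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class FreeTextQuery
{
    // Doubles single quotes so user text cannot end the string literal.
    // Assumption: the dialect uses the standard SQL quote-doubling rule.
    public static string EscapeLiteral(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException("text");
        }
        return text.Replace("'", "''");
    }

    // Builds the same FREETEXT query used in this article, with the
    // user-supplied text escaped before it is inserted.
    public static string Build(string userText)
    {
        return string.Format(
            "SELECT Path FROM SCOPE() WHERE FREETEXT(Contents, '{0}')",
            EscapeLiteral(userText));
    }
}
```

With this in place, the query text for the command could be produced by FreeTextQuery.Build(queryString.Query) instead of a bare string.Format call.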
You are now able to execute this command and retrieve a list of files which contain the selected text:
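connection.Open();
ArrayList result = new ArrayList();
try
{
    OleDbDataReader reader = command.ExecuteReader();
    while (reader.Read())
    {
        // Column 0 is the Path column requested in the SELECT list.
        result.Add(reader.GetString(0));
    }
    reader.Close();
}
finally
{
    // Close the connection even if the query fails.
    connection.Close();
}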
In this code, checking the returned value for NULL is not necessary, because Indexing Service always returns a path to a found file.
Microsoft Indexing Service is a totally free and powerful product which is included with Windows 2000 and later versions. It's very simple to use. You can easily create indexes, and you can query them through the OLE DB data provider. If you are working with Microsoft .NET, it is really easy to use. In this article, I have tried to describe how to install, configure, and query the Microsoft Indexing Service. I believe that this article will help you start using Indexing Service effectively.