The open-source Apache Lucene project allows developers to create powerful search solutions.
Here are some insights for planning and carrying out a standard search solution.
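Indexing in Lucene is based on creating Documents, each made up of Fields. Search engines generally rely on an inverted file data structure to store the index: each term is mapped to the list of documents that contain it, so a query looks up its terms directly instead of scanning every document. This reduces the size of the index and speeds up searches.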
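To make the inverted file idea concrete, here is a minimal sketch in plain Python. It is not Lucene's API or its on-disk format; the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "lucene builds an inverted index",
    2: "an index speeds up search",
    3: "lucene search is fast",
}
index = build_inverted_index(docs)

print(index["lucene"])                 # [1, 3]
print(search(index, "lucene search"))  # [3]
```

Each term is stored once, however many documents contain it, and a query only touches the posting lists of its own terms.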
However, Lucene is flexible in that it gives the indexer several choices.
There is some advantage in creating a single full-text field: a query then needs to search only one field to cover the whole document.
Adding multiple fields can make the index look more like a structured data file. If the document contains information beyond pure text, placing it into separate fields can enable more advanced search solutions.
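The trade-off between a single full-text field and several separate fields can be sketched as follows. This is an illustration in plain Python rather than Lucene code; the field names (`title`, `author`, `body`, `contents`), the helper functions, and the sample data are hypothetical.

```python
from collections import defaultdict


def with_catch_all(fields):
    """Return a copy of the fields plus a 'contents' field holding all the text."""
    return {**fields, "contents": " ".join(fields.values())}


def index_documents(docs):
    """Build one posting map per field: field -> term -> set of document ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for term in value.lower().split():
                index[field][term].add(doc_id)
    return index


def search_field(index, field, term):
    """Return the sorted ids of documents whose given field contains the term."""
    return sorted(index.get(field, {}).get(term.lower(), set()))


# Hypothetical documents that carry structured information beyond pure text.
docs = {
    1: {"title": "Lucene in Action", "author": "Hatcher",
        "body": "indexing and searching with lucene"},
    2: {"title": "Search Basics", "author": "Smith",
        "body": "lucene is one search library"},
}
index = index_documents({doc_id: with_catch_all(f) for doc_id, f in docs.items()})

print(search_field(index, "contents", "lucene"))  # [1, 2]
print(search_field(index, "title", "lucene"))     # [1]
print(search_field(index, "author", "smith"))     # [2]
```

The catch-all field answers "does this word appear anywhere?", while the separate fields allow narrower questions such as "which documents have this word in the title?".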
In every language some words appear more frequently than others. Examples in English are the words "and", "or", and "the". A list of such words is called a stop word list. Removing the stop words has the advantage of reducing index size and speeding up search. A more subtle advantage is the improvement in accuracy: this can be understood when one considers stop words as noise within the document's information. In standard IR practice they offer few advantages and are therefore stripped at indexing time and also removed from user queries.
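Stop word removal can be sketched in a few lines. The example below is plain Python, not a Lucene analyzer; the tiny `STOP_WORDS` set and the `analyze` function are assumptions for illustration, and a real stop word list would be much longer. The key point is that the same filter is applied to documents at indexing time and to user queries.

```python
# A deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = frozenset({"and", "or", "the"})


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop stop words.

    Punctuation is not handled here; a real analyzer would strip it as well.
    """
    return [term for term in text.lower().split() if term not in stop_words]


# The same function is used for both sides, so index terms and query terms match.
document_terms = analyze("The quick fox and the lazy dog")
query_terms = analyze("the fox or the dog")

print(document_terms)  # ['quick', 'fox', 'lazy', 'dog']
print(query_terms)     # ['fox', 'dog']
```

Here three of the seven document tokens are never indexed, which is where the savings in index size come from.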
Excluding stop words is common practice among major search engines.