The search module configuration file (cnsearch.conf by default) should be stored in the same directory as 'search.exe' ('search.cgi' on Unix). It is a text file optimized for fast processing. Cnsearch.conf consists of two parts: the configuration part, made up of ::CONFIG parameters, and the template part, which holds the HTML templates.
The structure of the configuration file looks as follows:
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::HTMLTOP
<HTML>
<HEAD>
<TITLE>This is the top part of the HTML document</TITLE>
</HEAD>
<BODY>
::HTMLRESULT
<P>This is the description of the found page. 10 such descriptions will be displayed.
::HTMLNOTFOUND
<P>This text will be displayed if no search results are found
::HTMLBOTTOM
This is the bottom part of the HTML document
</BODY>
</HTML>
Single-line comments may be used in the configuration file. Each comment starts with the symbol "#".
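The layout described above can be read with a few lines of code. The following is a minimal sketch, not the parser CNSearch itself uses: the function name parse_conf is hypothetical, and it assumes that a comment is a line whose first non-blank character is "#" and that every "::NAME" line other than "::CONFIG" opens a template section.

```python
def parse_conf(text):
    """Split cnsearch.conf-style text into CONFIG parameters and template sections."""
    config = {}
    sections = {}
    current = None  # name of the template section currently being collected
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # assumption: a comment occupies a whole line
        if line.startswith("::CONFIG"):
            # "::CONFIG name = value" is split at the first "="
            name, _, value = line[len("::CONFIG"):].partition("=")
            config[name.strip()] = value.strip()
        elif line.startswith("::"):
            # assumption: any other "::NAME" line starts a template section
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(raw)
    return config, {name: "\n".join(lines) for name, lines in sections.items()}
```

Calling parse_conf on the text of a configuration file returns a dictionary of parameters and a dictionary of template sections keyed by names such as HTMLTOP and HTMLRESULT.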
The configuration part of cnsearch.conf contains the following parameters:
The path parameter sets the path to the search index. It can be used if you do not intend to store the search index in the 'cgi-bin' directory or if you plan to use several search indexes.
For example:
::CONFIG path=/home/www/search/en/
For MS Windows:
::CONFIG path=d:\www\search\en\
The content-type parameter defines the Content-type field of the header. The default value is "text/html". Search results can also be generated as an XML file.
For example:
::CONFIG content-type = text/xml
The SearchType parameter sets the search logic:
"And" logic is the fastest and is recommended when the search index size exceeds 100 MB.
"Combined" logic is recommended for small sites with fewer than 50 pages in total.
For example:
::CONFIG SearchType = Combined
The stats parameter sets the password for access to the statistics interface (see Statistics).
For example:
::CONFIG stats = secret
The regcode parameter sets the product registration code (see the official site for detailed information).
For example:
::CONFIG regcode = JF7KF-KFJEP-4KSFT-K49GN-FJ40F
The StopWords parameter specifies the label that introduces the stop-words displayed in search results (provided that the %P parameter, which outputs the found stop-words, is enabled).
For example:
::CONFIG StopWords =, Ignored Words :
The MaxRelevance parameter sets the maximum relevance of pages displayed in search results. Pages with a relevancy value greater than MaxRelevance are ignored. This improves search quality by discarding pages with suspiciously high relevancy. As a rule, such pages contain little text or repeat keywords too often.
For example:
::CONFIG MaxRelevance = 4000
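The effect of MaxRelevance can be pictured as a simple filter over the result list. This is only a sketch under stated assumptions: the function name and the dictionary layout with a "relevance" key are hypothetical and are not taken from CNSearch.

```python
def drop_overrelevant(results, max_relevance):
    """Keep only results whose relevance does not exceed max_relevance."""
    # Pages with relevance greater than the limit are ignored, as described above.
    return [r for r in results if r["relevance"] <= max_relevance]


# Hypothetical sample data for illustration only.
pages = [
    {"url": "http://www.mysite.com/a.html", "relevance": 1200},
    {"url": "http://www.mysite.com/spam.html", "relevance": 9000},
    {"url": "http://www.mysite.com/b.html", "relevance": 4000},
]
kept = drop_overrelevant(pages, 4000)
```

With the limit set to 4000, the page with relevance 9000 is discarded, while the page with relevance exactly 4000 is kept.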
The NonStrictMatch parameter specifies the label marking search results that do not strictly match the search request (provided that the %S parameter is enabled). It is used only with "Combined" search logic.
For example:
::CONFIG NonStrictMatch = [non strict match]
The template part contains the HTML code from which the HTML document with the search results is generated. Special symbols should be used within this code; they are replaced by the corresponding text when the HTML document is generated.
For example:
-- cnsearch.conf ----------------------------------------
# This is a cnsearch configuration file
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words :
::CONFIG SearchType = Combined
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
</td></form></tr></table>
Documents found: %O <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A>
<small> <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
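The replacement of special symbols can be illustrated with a short sketch. It is not the code CNSearch runs: the function name render and the sample values are hypothetical, and the sketch assumes that a special symbol is always "%" followed by one case-sensitive Latin letter, as in the template above.

```python
import re


def render(template, values):
    """Replace %X symbols in a template; symbols without a value are left untouched."""
    def substitute(match):
        letter = match.group(1)
        # Fall back to the original "%X" text when no value is supplied.
        return str(values.get(letter, match.group(0)))

    return re.sub(r"%([A-Za-z])", substitute, template)


# Hypothetical values for illustration only.
title = render("<TITLE>Search results - %Q</TITLE>", {"Q": "news"})
link = render('<a href="%U">%u</a>', {"U": "http://www.mysite.com/", "u": "www.mysite.com"})
```

Because the lookup is case-sensitive, %U and %u can carry different values, which matches the way both appear in the ::HTMLRESULT section.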
The system allows several templates to be used, both to create different variants of the search interface and to search different indexes. To use several templates, set the 'template' parameter in the source code of the search form. If 'template' is not set, the standard 'cnsearch.conf' template is used.
Any name may be chosen for a template. A template name must consist only of Latin letters (upper or lower case) and Arabic digits; it is not necessary to add 'conf'.
Correct variant:
<input type="hidden" name="template" value="black">
Incorrect variant:
<input type=hidden name="template" value='../black'>
<input type=hidden name="template" value='red.htm'>
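The naming rule can be checked mechanically. The following sketch is an illustration, not part of CNSearch; the function name is hypothetical, and it assumes that an empty name is also invalid.

```python
import re

# Latin letters (either case) and Arabic digits only, at least one character.
_TEMPLATE_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_template_name(name):
    """Return True if name consists only of Latin letters and digits."""
    return _TEMPLATE_NAME.fullmatch(name) is not None
```

Under this check "black" is accepted, while "../black" and "red.htm" are rejected because they contain characters other than letters and digits.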
Below is an example of a template that lets a user choose the English, Spanish or Russian template from the search form.
The path to the index files is defined in the template as follows:
::CONFIG path=/home/www/search/en
Example:
-- en.conf ---------------------------------------------
::CONFIG path=/home/www/search/en
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::CONFIG content-type = text/html
::CONFIG NonStrictMatch = [non strict match]
::CONFIG StopWords =, Ignored Words :
::CONFIG SearchType = Combined
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<select name=template>
<option value="en">English
<option value="es">Spanish
<option value="ru">Russian
</select>
</td></form></tr></table>
Documents found: %O <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A>
<small> <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end of en.conf ---------------------------------------
Starting with version 1.3, the system supports searching through selected sites. Each site is assigned a sequence number at the indexing stage, starting with '0', for example:
[job localhost]
[Index]
URL http://www.mysite.com/
Statistic Append
CharSet ByHTTPHeader
MaxFiles 10000
StopWordsFile stopwords.txt
Exclude search/,mail/,.zip,.gif,.jpg
[Index]
URL http://www.second.com/
Statistic Append
CharSet ByHTTPHeader
[Index]
URL http://www.test.com/
Statistic Append
CharSet ByHTTPHeader
Site numbers are assigned as follows:
0 - http://www.mysite.com/
1 - http://www.second.com/
2 - http://www.test.com/
Note that after re-indexing, the same number may be assigned to two different sites. For instance, after re-indexing with the following configuration file:
[job addon]
[Index]
URL http://www.newsite.com/
Statistic Append
CharSet ByHTTPHeader
MaxFiles 10000
StopWordsFile stopwords.txt
Exclude search/,mail/,.zip,.gif,.jpg
the site http://www.newsite.com/ will be assigned number "0", giving:
0 - http://www.mysite.com/
0 - http://www.newsite.com/
1 - http://www.second.com/
2 - http://www.test.com/
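The numbering above can be reproduced with a small sketch. It assumes, based only on the two examples, that every indexing job numbers its [Index] sections from 0 independently of earlier jobs; the function name and the list-of-lists input are hypothetical.

```python
from collections import defaultdict


def assign_site_numbers(jobs):
    """Map each site number to the URLs that share it.

    jobs is a list of indexing jobs, each a list of site URLs in [Index] order.
    Assumption: numbering restarts from 0 for every job.
    """
    sites = defaultdict(list)
    for urls in jobs:
        for number, url in enumerate(urls):
            sites[number].append(url)
    return dict(sites)


numbers = assign_site_numbers([
    ["http://www.mysite.com/", "http://www.second.com/", "http://www.test.com/"],  # job localhost
    ["http://www.newsite.com/"],                                                    # job addon
])
```

In the resulting mapping, number 0 refers to both http://www.mysite.com/ and http://www.newsite.com/, which is why a search restricted to site 0 covers both of them.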
To search selected sites, the "d" parameter must be used. If the parameter is not set (default), the search is performed on all sites.
For example:
-- cnsearch.conf ----------------------------------------
::CONFIG regcode = Enter Your registration code here
::CONFIG stats = password
::HTMLTOP
<HTML>
<HEAD>
<TITLE>Search results - %Q</TITLE>
</HEAD>
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type=text name=q size=40 maxlength=64 value="%Q">
<input type=submit value="Search">
<br>
<select name=d>
<option value="0">www.mysite.com, www.newsite.com
<option value="1">www.second.com
<option value="2">www.test.com
</select>
</td></form></tr></table>
Documents found: %O <B>%O</B><font color=gray>%W<B>%P</B></font><br>
<br>
<div align=right>
Sort by: <a href="%A">date</a> | <a href="%L">relevancy</a>
</div>
::HTMLRESULT
<HR>
<UL>
<LI>%N. <a href="%U" target=_new>%T</A>
<small> <font color=red>%S</font> [Relevancy: %R]</small>
<UL>
<LI>%E
<LI>%D
<LI>%C
<LI><a href="%U" target=_new>%u</A>
</UL>
</UL>
::HTMLNOTFOUND
<P><font color=red>%Q not found</font>
::HTMLBOTTOM
%B
</BODY>
</HTML>
-- end cnsearch.conf ------------------------------------
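The behaviour of the "d" parameter can be summarised as a filter over the results. This is a sketch only: the function name and the "site" key holding the site number are hypothetical, and how CNSearch applies the restriction internally is not described here.

```python
def filter_by_site(results, d=None):
    """Return results from site number d, or all results when d is not set."""
    if d is None:
        return list(results)  # default: search all sites
    return [r for r in results if r["site"] == d]


# Hypothetical sample data for illustration only.
found = [
    {"url": "http://www.mysite.com/news.html", "site": 0},
    {"url": "http://www.second.com/news.html", "site": 1},
    {"url": "http://www.newsite.com/news.html", "site": 0},
]
only_site_0 = filter_by_site(found, d=0)
```

Selecting the first option of the form above (d=0) would return pages from both sites that share number 0.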
When searching through a large number of sites, search results are often cluttered with pages from a single site. For example, for the search phrase "news", all the pages of a news site ending with " // Local news" will be found, and the results from other sites will be pushed back by hundreds or even thousands of positions.
To prevent this situation, large search engines such as Google, Yandex and Rambler display only one result from each site. Starting from version 1.5, this option is implemented in CNSearch as well.
To enable grouping by sites, add a hidden field named "group" to the search request form:
-- cnsearch.conf ----------------------------------------
....
<BODY>
<table width=400 height=40 align=center bgcolor=#C0C0C0>
<form action="%F" method=get><tr><td align=center>
<input type="text" name="q" size="40" maxlength="64" value="%Q">
<input type="hidden" name="group" value="1">
<input type="submit" value="Search">
</td></form></tr></table>
....
-- end cnsearch.conf ------------------------------------
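Grouping by sites amounts to keeping one result per site. The sketch below illustrates the idea and is not CNSearch's implementation: the function name and the "site" key are hypothetical, and it assumes the input list is already sorted so that the best result of each site comes first.

```python
def group_by_site(results):
    """Keep the first result for each site, preserving the incoming order."""
    seen = set()
    grouped = []
    for result in results:
        if result["site"] not in seen:
            seen.add(result["site"])
            grouped.append(result)
    return grouped


# Hypothetical sample data, already ordered by relevancy.
ranked = [
    {"url": "http://www.mysite.com/1.html", "site": 0},
    {"url": "http://www.mysite.com/2.html", "site": 0},
    {"url": "http://www.second.com/1.html", "site": 1},
    {"url": "http://www.mysite.com/3.html", "site": 0},
]
top_per_site = group_by_site(ranked)
```

Here the three pages of site 0 collapse into one entry, so the result from site 1 moves up to second place.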
To let users search in more detail within one site from the search results, a "more from the site" link can be used. It is implemented by means of the special symbol %I:
-- cnsearch.conf ----------------------------------------
....
::HTMLRESULT
....
<LI>%N. <a href="%U" target=_new>%T</A>
<small> <font color=red>%S</font> [Relevancy: %R]</small>
[ <a href="%F?d=%I&q=%G">more from the site</a> ]
<UL>
....
-- end cnsearch.conf ------------------------------------
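The link pattern %F?d=%I&q=%G can be mimicked outside a template as follows. This is a sketch under assumptions: the function name and the script address "/cgi-bin/search.cgi" are hypothetical, and it assumes that the link carries the site number in "d" and the URL-encoded search request in "q".

```python
from urllib.parse import urlencode


def more_from_site_link(script_url, site_number, query):
    """Build a link that repeats the query restricted to one site number."""
    # urlencode escapes the query text so that it is safe inside a URL.
    return f"{script_url}?{urlencode({'d': site_number, 'q': query})}"


link = more_from_site_link("/cgi-bin/search.cgi", 1, "local news")
```

The resulting address combines the "d" parameter for site selection with the original search request, so following it shows further results from the same site only.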