elements defined by <tagname> . . </tagname>
4.9 Ignored words
4.10 Use of Whitelist
4.11 Use of Blacklistlist
4.12 Ignored files
4.13 Canonical <link> tag
5. UTF-8 Support and 'Preferred charset'
6. Search modes
6.1 Search with wildcards *
6.2 Strict search !
6.3 Tolerant search
6.4 Link search
6.5 Media search
6.6 Search . . .
. . .
must be in a separate row.
- One word per row.
- No blank rows.
- No blank row at the end of the file.

4.11 Use of Blacklistlist

Sphider-plus can control the index / re-index procedure with a list of words called the 'Blacklistlist'. If the content of a page contains any word of the Blacklistlist, the page will not be indexed / re-indexed. The list is placed in the file /include/common/Blacklistlist.txt

In Admin / Settings / Spider settings, the use of the Blacklistlist may be activated / deactivated by the checkbox:

Use Blacklistlist to prevent index / re-index of pages that contain any of the words in Blacklistlist?
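The indexing check can be sketched in Python as follows. This is an illustrative sketch, not Sphider-plus's own code: the helper names `parse_blacklist`, `load_blacklist` and `should_index` are hypothetical, the UTF-8 file encoding and the case-insensitive comparison are assumptions, and each entry is compared as a plain substring of the page text, in line with the note on 'kinder' and 'kindergarten' in this section.

```python
from pathlib import Path


def parse_blacklist(text: str) -> list[str]:
    """Split the file content into entries, one word per row.

    Blank rows are skipped here, although the manual asks for a file
    without any blank rows.
    """
    return [row.strip() for row in text.splitlines() if row.strip()]


def load_blacklist(path: str) -> list[str]:
    """Read a list file such as /include/common/Blacklistlist.txt (UTF-8 assumed)."""
    return parse_blacklist(Path(path).read_text(encoding="utf-8"))


def should_index(page_text: str, blacklist: list[str]) -> bool:
    """Return False if any entry occurs anywhere in the page text.

    Entries are compared as plain substrings and case-insensitively
    (an assumption of this sketch), so 'kinder' also matches 'kindergarten'.
    """
    haystack = page_text.lower()
    return not any(entry.lower() in haystack for entry in blacklist)


if __name__ == "__main__":
    entries = parse_blacklist("kinder\ncasino")
    print(should_index("Welcome to our kindergarten", entries))  # False
    print(should_index("Opening hours of the library", entries))  # True
```

With substring matching, blank rows are more than a cosmetic problem: an empty string is a substring of every text, so an empty entry would block every page. The sketch therefore drops empty rows; whether Sphider-plus itself does so is not stated here.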
A second setting in the same settings section enables the rejection of queries that contain a word of the Blacklistlist, even if the blacklisted word is only part of the query. If the checkbox:

Use Blacklistlist to delete queries that contain any of the words in Blacklistlist?

is activated, the complete query is deleted and a blank search is performed.
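The query-side option can be sketched the same way. In this hypothetical Python helper, `sanitize_query` returns an empty string when any entry occurs anywhere in the query, which stands for the blank search described above; the case-insensitive substring comparison is again an assumption, not documented Sphider-plus behavior.

```python
def sanitize_query(query: str, blacklist: list[str]) -> str:
    """Return the query unchanged, or "" if it contains a blacklisted entry.

    An entry only needs to occur somewhere in the query (substring match);
    comparing case-insensitively is an assumption of this sketch.
    """
    lowered = query.lower()
    # Empty entries are ignored, since "" would match every query.
    if any(entry and entry.lower() in lowered for entry in blacklist):
        # The complete query is deleted; the caller then runs a blank search.
        return ""
    return query


if __name__ == "__main__":
    entries = ["casino"]
    print(repr(sanitize_query("best online casinos", entries)))  # ''
    print(repr(sanitize_query("opening hours", entries)))  # 'opening hours'
```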
Please keep in mind that 'Use of Blacklistlist' is implemented differently from 'Use of Whitelist'. The Blacklistlist interprets each of its entries as a string, so an entry also matches when it is only part of a longer word. For example, the word 'kinder' in the Blacklistlist will also prevent indexing of a page containing the word 'kindergarten'.

Be careful not to place blank rows into the Blacklistlist. Also, the list should end with the last word, not with a line feed or a blank row.

- Each word in the list must be in a separate row.
- One word per row.
- No blank rows.
- No blank row at the end of the file.

4.12 Ignored files
. . .
to be indexed. All file types that are not to be followed for text indexing must be placed in 'ext.txt', which acts as a blacklist for file suffixes. In contrast, 'image.txt', 'audio.txt' and 'video.txt' are whitelists that contain the suffixes of files to be indexed, according to the type of media.

4.13 Canonical <link> tag

As defined by Google, Microsoft and Yahoo! . . .
. . .
the corresponding search queries will be rejected. The first option is controlled by the file /include/common/Blacklist_uas.txt, which holds the user agents (UAs) of well-known evil bots. Additionally, a list of well-known good bots is stored in /include/common/white_uas.txt. If the user's UA is part of this white list, the comparison with all UAs listed in Blacklist_uas.txt is skipped.

Meta search engines are identified by their IP address. The IPs can be entered as single IPs as well as IP ranges into the file /include/common/Blacklist_ips.txt

Prevented queries are answered with the text . . .