sitemap.xml output sitemap list offering file/page suffixes
- IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion.
- Flood attempts log offering: IP, query, date and time of flood attempt.
- Auto Re-index log file
- Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini . . .
. . .
standard, either by putting a robots.txt file into the root directory of the server or by adding the necessary meta tags to the page headers. This directive can be temporarily overridden, per site, for the next index procedure with the advanced option: Temporary ignore 'robots.txt'

4.2 Must include / must not include string list

A powerful . . .
. . .
for each URL individually.

4.3 Ignoring links

Sphider-plus respects the rel="nofollow" attribute in <a href..> tags, so for example the link foo.html in <a href="foo.html" rel="nofollow"> is ignored. Also, if the nofollow flag is set in the header of a site, its links will not be followed. This directive can be temporarily overridden . . .
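The link rule described above can be illustrated with a minimal sketch. This is not Sphider-plus code (Sphider-plus is written in PHP); it is a standalone Python illustration using only the standard library, and the names LinkCollector and followable_links are hypothetical. It assumes the header flag is a robots meta tag whose content contains "nofollow".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values, skipping links that carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []
        # Assumption: the page-level flag is <meta name="robots" content="... nofollow ...">.
        self.page_nofollow = False

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; normalise them to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            tokens = [t.strip().lower() for t in attrs.get("content", "").split(",")]
            if "nofollow" in tokens:
                self.page_nofollow = True
        elif tag == "a" and "href" in attrs:
            # rel may hold several space-separated tokens, e.g. "nofollow noopener".
            if "nofollow" not in attrs.get("rel", "").lower().split():
                self.links.append(attrs["href"])


def followable_links(html):
    """Return the links a crawler honouring nofollow would follow."""
    collector = LinkCollector()
    collector.feed(html)
    # A page-level nofollow suppresses every link on the page.
    return [] if collector.page_nofollow else collector.links
```

For a page containing <a href="foo.html" rel="nofollow"> and <a href="bar.html">, followable_links returns only bar.html.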
. . .
a header, footer or a menu). Any part of a page between the <!--sphider_noindex--> and <!--/sphider_noindex--> tags is not indexed; however, links in it are followed.

4.5 Ignoring parts of a page by <div id='abc'>

Ignoring parts of a page by the <!--sphider_noindex--> tags requires direct access to the page, because the tags need to . . .
. . .
will be used to delete the content between <div id='abc'> and </div>. Nevertheless, links inside the div tags will be followed. The values in the list-file are not case sensitive. Values in this common list may end with a wildcard, so that 'menu*' matches ids like menu1, menu2, menu_left, etc. Multiple and nested . . .
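The matching rule described above (case-insensitive values, optional trailing wildcard) can be sketched as follows. This is an illustrative Python sketch, not the Sphider-plus implementation; the function name id_matches is hypothetical, and it assumes that '*' is only meaningful as the last character of a list entry.

```python
def id_matches(div_id, patterns):
    """Return True if div_id matches any entry of the list-file.

    Matching ignores case. An entry ending in '*' matches every id that
    starts with the text before the '*'; any other entry must match exactly.
    """
    candidate = div_id.lower()
    for pattern in patterns:
        pattern = pattern.strip().lower()
        if not pattern:
            continue  # skip blank entries
        if pattern.endswith("*"):
            if candidate.startswith(pattern[:-1]):
                return True
        elif candidate == pattern:
            return True
    return False
```

With the entries 'menu*' and 'footer', the ids menu1, Menu_Left and FOOTER match, while content does not.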
. . .
be used to index only the content between <div id='abc'> and </div>. Nevertheless, links outside the div tags will be followed. The values in the list-file are not case sensitive. Values in this common list may end with a wildcard, so that 'menu*' matches ids like menu1, menu2, menu_left, etc. Multiple and nested . . .
. . .
the HTML element and tag references at http://www.w3schools.com/html/html_elements.asp and http://www.w3schools.com/tags/. If enabled in Admin settings, the values defined in the list-file /include/common/elements_not.txt will be used to delete the content between <tagname> and </tagname>. Nevertheless, links inside of the tags . . .
. . .
a relative path, but is not allowed to refer to a different domain. Unfortunately, canonical link tags need to be created manually, so special care has to be taken that other directives like robots.txt or rel="nofollow" do not prevent the crawling of the canonical origin.

5. UTF-8 Support and 'Preferred charset'

Starting with . . .
. . .
be powerful. First of all, the complete full text and all header information, like title, keywords and description tags, need to be converted into Unicode. As a consequence, indexing takes longer. As also suggested by Yiannes [pikos], three steps are integrated to realize this procedure:
1. Detect the charset of site, page or . . .