Sphider-plus



Displaying results 1 - 20 of 63 matches

1.   Sphider-plus - The PHP Search Engine


Sphider-plus is a search engine based on the scripts of the original Sphider. [ About Sphider-plus ] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexing and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. Unicode support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapts the size of the search form, result listing and addurl form to the display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. YouTube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip-compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, multiple sitemap files are also processed. Periodical Re-index Re-indexing can be performed . . .
. . .
multiple sitemap files are also processed. Periodical Re-index Re-indexing can be performed automatically, repeated at a selected time interval. Admin-selectable intervals of 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads can be activated . . .
. . .
1-10 parallel running threads can be activated in Admin settings. Preferred re-index When invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs containing the corresponding level will be re-indexed. Erase, Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, including formatting of the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, including RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: - By relevance (weight %) - By hit counts in full text - Most popular links on top - By index date - By URL names - By file suffix - Main URL (domain) on top - Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: - By title (alphabetic) - By file suffix - By image size - By 'Last queried' - By 'Most popular'. Same results for queries typed with pure vowels or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information, dei'largi <-> largi. Also Admin selectable: equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur. Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking medieval ligatures into consideration. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106,800 radicals. Segmentation of Japanese words Segmentation of 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Also converts non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offers detailed information during index/re-index: new links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs are handled. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Intended to work with HTML5 elements like: section, nav, aside, hgroup, article, header, footer. The reverse function is also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP; PHP v.5.5 is also supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment The latest version of Sphider-plus, version 4.2024a, is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25,206 sites, 324,595 page links, 1,260,698 keywords, 169,251 media links.
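
The accent and ligature equalization mentioned in the feature list (cafe and café, cœur and coeur) can be illustrated with a short sketch. This is not Sphider-plus code: the function name `fold_query` and the `LIGATURES` table are hypothetical, and the approach (expand ligatures, then strip Unicode combining marks) is only one plausible way to get the described behaviour.

```python
import unicodedata

# Hypothetical ligature table for illustration; the real engine's
# coverage of Latin and phonetic ligatures is not reproduced here.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def fold_query(text: str) -> str:
    """Return a comparison key that ignores accents and listed ligatures."""
    # Expand ligatures first, since Unicode decomposition leaves them intact.
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", expanded)
    # Drop the combining marks and compare case-insensitively.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    for query in ("café", "cafe", "cœur", "coeur"):
        print(query, "->", fold_query(query))
```

If both the indexed keywords and the incoming query are passed through the same folding step, the accented and unaccented spellings map to one key, which matches the "same results" behaviour the listing describes.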


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed.. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title (alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .
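
The feature list above mentions equal results for queries with and without accents, with different quote characters, and with and without ligatures (café / cafe, cœur / coeur). As a rough illustration only, the following Python sketch shows one common way such folding can be done before comparing a query with indexed text. It is not taken from Sphider-plus (which is written in PHP); the function name `fold`, the `LIGATURES` table and the `QUOTES` mapping are assumptions made for this example, and the ligature table is deliberately incomplete.

```python
import unicodedata

# Hypothetical, non-exhaustive table of Latin ligatures that Unicode
# normalization leaves untouched and that therefore need explicit expansion.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}

# Hypothetical mapping that equalizes the backtick and acute accent
# to a plain apostrophe, as in the "Equalize the different quotes" option.
QUOTES = str.maketrans({"`": "'", "´": "'"})


def fold(text: str) -> str:
    """Return a comparison key without accents, ligatures or quote variants."""
    # 1. Equalize quote characters first, because NFKD would otherwise
    #    split the acute accent into a space plus a combining mark.
    text = text.translate(QUOTES)
    # 2. Expand ligatures that have no Unicode decomposition.
    text = "".join(LIGATURES.get(ch, ch) for ch in text)
    # 3. Decompose characters (é -> e + combining acute) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Case-insensitive comparison.
    return stripped.casefold()


if __name__ == "__main__":
    for query in ("café", "cœur", "d´information"):
        print(query, "->", fold(query))
```

Applying the same `fold` function to both the indexed keywords and the incoming query is what makes the two forms match; whether Sphider-plus folds at index time, at query time, or both is not stated in the listing.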


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http//президентрф andhttp//президент.рф/' and 'http//президентрф andhttp//müller.de/' are accepted and . . .
. . .
'http//президентрф andhttp//müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed.. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable//президентрф andhttp -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable//президентрф andhttp -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like//президентрф andhttp caf e and caf é . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like//президентрф andhttp d'information <-> information dei'largi <-> largi Also Admin selectabe//президентрф andhttp Equalize the different quotes like//президентрф andhttp ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like//президентрф andhttp cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like//президентрф andhttp <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatical forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like//президентрф andhttp document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of//президентрф andhttp document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like//президентрф andhttp Arabic, Cyrillic, Chinese, Greece and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index//президентрф andhttp New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: caf e and caf é . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .

URL: http://sphider-plus.eu/ - 25.6 kb

2.   Sphider-plus - The PHP Search Engine Visit in a new window


Sphider-plus is a search engine based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods functions template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese Cyrillic Georgian Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form result listing and addurl form to display size of computers tablets smartphones etc. Media support Index and search for images (incl. Open Graph images) audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin' 'Search User' and 'Suggest URL'. Support of multiple table sets in each db MySQL query cache individual index for each db individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically repeated every selected time interval. Admin selectable intervals for 3 hours 12 hours 1 day 1 week or 1 month. Multithreaded indexing In order to reduce the time for indexing 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option the admin may select a suitable level for the next index procedure. Thus only those URLs containing the according level will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content inclusive formatting the search results. RDF RSD RSS and Atom feed support Index and search of feed content inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards Tolerant search Search strict Search only in one domain Search all links of a site Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels or with accents Will deliver the same results for queries like: caf e and caf é . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new old and half width) hiragana katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301 302 303 and 307 status codes. Also obeying JavaScript sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF DOCX XLSX ODT ODS CSV PPTX and XLS files Converting also non-Latin text like: Arabic Cyrillic Chinese Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links keywords frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic Bengali Bulgarian Catalan Chinese Cyrillic Czech Danish Dutch English Farsi Finnish French Greek German Hindi Hungarian Italian Norwegian Polish Portuguese Romanian Spanish Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section nav aside hgroup article header footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: caf e and caf é . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: caf e and caf é . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs containing the corresponding level will be re-indexed. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title(alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café . To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .


Sphider-plus is a search engine, based on the scripts of original Sphider. [ About Sphider-plus] More than 400 new features (additional mods, functions, template designs and debugging) have been added to the original Sphider. For details about all the improvements and changes, please read the Documentation section. [ Main features ] Item . . .
. . .
section. [ Main features ] Item Description UTF-8 and UTF-16 support Indexation and search procedure for Chinese, Cyrillic, Georgian, Hebrew etc. charsets. UNICODE support including astral symbols. Support for non-ASCII domains 'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and . . .
. . .
'http://müller.de/' are accepted and processed. Responsive design Automatically adapting the size of search form, result listing and addurl form to display size of computers, tablets, smartphones, etc. Media support Index and search for images (incl. Open Graph images), audio and video (incl. Youtube videos). EXIF and ID3 information are also . . .
. . .
player software. Multiple database support Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of . . .
. . .
always in cache. Separate caches for text and media results. Admin configurable. Follow sitemap files If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed . . .
. . .
also multiple sitemap files are processed. Periodical Re-index Re-indexing could be performed automatically, repeated every selected time interval. Admin selectable intervals for 3 hours, 12 hours, 1 day, 1 week, or 1 month. Multithreaded indexing In order to reduce the time for indexing, 1-10 parallel running threads might be activated . . .
. . .
1-10 parallel running threads might be activated in Admin settings. Preferred re-index While invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs, containing the according level, will be re-indexed.. Erase Re-index and Continue suspended index procedures Individual (site specific) or . . .
. . .
(site specific) or bulk update of database. Support of XML product feeds Index and search of feed content, inclusive formatting the search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, inclusive RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various . . .
. . .
ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific). Add thumbnails to each page presented in text results Admin selectable, this feature will present a web shot . . .
. . .
suspected to contain malware or phishing content. 11 different modes of sorting the text results Admin selectable: -By relevance (weight % ) -By hit counts in full text -Most popular links on top -By indexdate -By URL names -By file suffix -Main URL (domain) on top -Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted . . .
. . .
on top - Links holding promoted catchwords on top. 5 different modes of sorting the media results Admin selectable: -By title (alphabetic) -By file suffix -By image size -By 'Last queried' -By 'Most popular'. Same results for queries typed with pure vowels, or with accents Will deliver the same results for queries like: cafe and café. To be . . .
. . .
in Admin backend. Same results for queries with and without quotes Will deliver the same results for queries like: d'information <-> information dei'largi <-> largi Also Admin selectable: Equalize the different quotes like: ' ` ´ Same text results for queries with and without ligatures Admin selectable; will deliver the same results . . .
. . .
results for queries with and without ligatures Admin selectable; will deliver the same results for queries like: cœur and coeur . Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures. Present all results for singular and . . .
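
As a rough illustration of how accented and ligature spellings can be mapped to one search key, here is a minimal Python sketch. It is not the Sphider-plus implementation; the function name `fold_query` and the small `LIGATURES` table are assumptions, and only a handful of Latin ligatures are covered.

```python
import unicodedata

# Hypothetical ligature table; a real one would cover many more entries.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ﬁ": "fi", "ﬂ": "fl"}


def fold_query(text: str) -> str:
    """Return a lowercase form of text without accents or ligatures."""
    # Expand ligatures first, because most of them have no Unicode decomposition.
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()
```

Applying the same folding to both the indexed keywords and the query makes "café" match "cafe" and "cœur" match "coeur".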
. . .
all will become searchable. Dictionaries with 106.800 radicals. Segmentation of Japanese words Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Transliterate Latin characters into their Greek equivalents Transforms query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the . . .
. . .
and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript. Accept gzip formatted transmission In order . . .
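
The document.write handling described in this snippet can be approximated without a JavaScript engine by scanning the script source as text. The Python sketch below only illustrates that idea and is not the Sphider-plus code; the name `scan_document_write` and the regular expressions are assumptions, and only simple single-string arguments are recognised.

```python
import re

# Matches document.write('...') or document.write("...") with one string argument.
WRITE_RE = re.compile(r"document\.write\(\s*(['\"])(.*?)\1\s*\)", re.S)
HREF_RE = re.compile(r"href\s*=\s*[\"']([^\"']+)[\"']", re.I)
TAG_RE = re.compile(r"<[^>]+>")


def scan_document_write(script: str):
    """Return (links, texts) found in document.write() string arguments."""
    links, texts = [], []
    for _quote, payload in WRITE_RE.findall(script):
        # Links to follow.
        links.extend(HREF_RE.findall(payload))
        # Visible text to index: drop tags, collapse whitespace.
        text = " ".join(TAG_RE.sub(" ", payload).split())
        if text:
            texts.append(text)
    return links, texts
```

Because the script is never executed, content that is assembled at run time stays invisible, which matches the limitation stated in the snippet.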
. . .
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files Converting also non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed. Debug mode Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be . . .
. . .
Included for 33 languages. Common word lists holding stop words. Included for 25 languages Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish. . . .
. . .
links outside are followed. Multiple and nested divs will be attended. Do not index parts of a page defined by HTML5 elements <tag> . . . </tag> Foreseen to cooperate with the HTML5 elements like: section, nav, aside, hgroup, article, header, footer Vice versa function also included in order to index only parts of a page between . . .
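
Excluding the content of selected HTML5 elements from the index can be sketched in Python as follows. This is an illustration only, not the Sphider-plus code; the class name `SkipElements`, the function `indexable_text` and the default element list are assumptions.

```python
from html.parser import HTMLParser


class SkipElements(HTMLParser):
    """Collect text, ignoring everything inside the given elements."""

    def __init__(self, skip):
        super().__init__()
        self.skip = set(skip)
        self.depth = 0      # > 0 while inside a skipped element (handles nesting)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def indexable_text(html: str, skip=("nav", "aside", "footer")) -> str:
    parser = SkipElements(skip)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The inverse mode mentioned in the snippet (index only the parts between given elements) would flip the condition in `handle_data`.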
. . .
Extension implemented SQLi connector implemented between PHP and a MySQL database. Performed by OOP, also PHP v.5.5 is supported. Compatible with MySQL and MariaDB Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28 sp_executesql Ready to run in PHP 8 environment Latest version of Sphider-plus version 4.2024a is proven up to PHP . . .
. . .
8.3.2 [ Proven ] Successfully implemented as search engine on a customer site with a database capacity such as: 25.206 sites 324.595 page links 1.260.698 keywords 169.251 media links. Imprint Private Notice Private Policy . . .



3.   Sphider-plus - The PHP Search Engine Visit in a new window

4.   Sphider-plus - The PHP Search Engine Visit in a new window


1571 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Intro ] Sphider-plus is a search engine based on the original Sphider scripts created by Ando Saabas. Compared with the original Sphider, additional mods, functions, template designs and debugging have been added. For details about all . . .
. . .
additional mods, functions, template designs and debugging have been added. For details about all changes, please see the Change Log chapter. The names of Sphider-plus folders and scripts are often the same as those of the original Sphider. But the scripts are not interchangeable between Sphider and Sphider-plus. Compared with the original . . .
. . .
files. You are invited to translate your native language and then to share the files with the community. Also mods, improvements and of course bug fixes are very welcome for future releases of Sphider-plus. Sphider-plus offers a wide range of customizing the index and search procedures. By means of an Admin backend, all settings are presented. . . .
. . .
the index and search procedures. By means of an Admin backend, all settings are presented. As stated above, this search engine uses some PHP libraries and extensions. When opening the Settings interface, the existence of these libraries is tested by software, and in case a library is not part of the server environment, the . . .
. . .
is not part of the server environment, the corresponding option is not presented in the Settings interface. For example, if the 'rar' extension is not available, it will not be possible to index RAR archives and the associated checkbox will not be presented in 'Spider Settings'. In order to check the availability of all required libraries and . . .
. . .
not be presented in 'Spider Settings'. In order to check the availability of all required libraries and extensions, the Debug mode will present the corresponding messages. Sphider-plus does not contain a JavaScript engine. Consequently all content created in real-time, while loading the page, will not be indexed. Indexing with a search engine . . .
. . .
like Sphider-plus is problematic on a 'Shared Hosting' server. Indexing a huge number of links might be interrupted, because the granted time slice might end before the index procedure is finished, especially if you intend to index not only text, but also media content like images, as well as audio and video streams. Sphider-plus tries 3 times to . . .
. . .
and video streams. Sphider-plus tries 3 times to reconnect to the database. But if the server canceled the script, it will become necessary to manually invoke the index procedure again to continue. Sphider-plus will remember the last indexed link and continue the suspended process. Some special functions like e.g. 'cyclical indexing' in any . . .
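
The "tries 3 times to reconnect" behaviour can be pictured with a small retry loop. The Python sketch below shows only the pattern; `connect_with_retry`, the `connect` callable and the `delay` parameter are assumptions and do not reflect how Sphider-plus implements it in PHP.

```python
import time


def connect_with_retry(connect, attempts=3, delay=0.0):
    """Call connect() up to `attempts` times and return its result.

    `connect` is any callable that raises ConnectionError on failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    # All attempts failed: hand the last error back to the caller.
    raise last_error
```

If every attempt fails, the error propagates, which corresponds to the case where the index procedure has to be restarted manually.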
. . .
'cyclical indexing' in any case will fail on a 'Shared Hosting' server. Severe problems were reported by customers who tried to install Sphider-plus on 'Shared Hosting' servers offered by Hostinger, Hosting24 and A2Hosting. They even seem not to supply the PECL library to their hosting packages, which is obligatory for a Sphider-plus . . .

5.   Sphider-plus - The PHP Search Engine Visit in a new window


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Release ] Name: Sphider-plus Version: 4.2024a Released: February 12, 2024 Based on original Sphider version 1.3.5, released 2009-12-13 [ Legal Info ] This program is licensed under the GNU GPL v.3 by Rolf Kellner [Tec], tec(a t)sphider-plus.eu . . .
. . .
the GNU GPL v.3 by Rolf Kellner [Tec], tec(a t)sphider-plus.eu Original Sphider GNU GPL licence by Ando Saabas, ando(a t)cs.ioc.ee We distribute software in the hope that it will be useful, but without any warranty. No author or distributor of this software accepts responsibility to anyone for the consequences of using it or for whether it . . .
. . .
to anyone for the consequences of using it or for whether it serves any particular purpose or works at all, unless he says so in writing. This is exactly the same warranty that proprietary software companies offer: none. Imprint Privacy Policy [ Donation ] If you want to use Sphider-plus and also want to promote further development, your . . .


6.   Sphider-plus - The PHP Search Engine Visit in a new window


Legal Info Installation Documentation Change Log [ Installation Summary ] Preconditions Sphider-plus requires PHP 5.3 - 8.x (proven up to version 8.3.2) with installed GD, mbstring, PECL and zlib libraries. Additionally, if RAR compressed files should be indexed, the RAR extension is required. Also a MySQL (proven up to version 8.0.32) database . . .
. . .
following PHP and Apache settings should be adjusted on the server before installing Sphider-plus PHP safe_mode : Off (removed in PHP 5.4) register_globals : Off (removed in PHP 5.4) allow_url_fopen : On allow_url_include : On (deprecated since PHP 7.4) Webserver mod_rewrite : On display_errors : On error_reporting : E_ALL & . . .
. . .
: On display_errors : On error_reporting : E_ALL & ~E_DEPRECATED & ~E_WARNING & ~E_NOTICE & ~E_STRICT memory_limit : 256M (minimum) phpinfo : enabled ( disable_functions = phpinfo - change it to: disable_functions = ) AllowOverride : All (to be found in apache sub folder /conf/httpd.ini) Additional note: The .htaccess files supplied with . . .
. . .
on some servers and might need to be disabled by renaming them. In order to enable correct function of Sphider-plus, please follow the instructions as described below. Additional (but important) note: Please do not edit/modify any of the Sphider-plus scripts and files during first installation. Everything must be performed by means of the admin . . .
. . .
installation. Everything must be performed by means of the admin backend. There are several self tests implemented, which should not be bypassed by altering the scripts, because as long as the admin interface shows error messages and warnings, later index procedures and the search algorithm may fail. - Installation - - New installation - . . .
. . .
Updating from 4.x to 4.y [ Installation of version 4.2024a ] New installation In order to get Sphider-plus running, perform the following steps: 1. Unzip the downloaded file, and upload all folders and files to the server, for example to: C:\programms\xampp\htdocs\public\sphider-plus\ 2. Create at minimum one database as part of the MySQL . . .
. . .
to: C:\programms\xampp\htdocs\public\sphider-plus\ 2. Create at minimum one database as part of the MySQL server, which will hold the Sphider-plus data tables. Collation of the SQL server connection must be defined to utf8mb4_unicode_ci already before creating any database. In order to get full UNICODE support, the collation of Sphider-plus . . .
. . .
order to get full UNICODE support, the collation of Sphider-plus databases also must be set to utf8mb4_unicode_ci, which is available as of MySQL server version 5.5.3. Creation of this database needs to be done outside of the Sphider-plus scripts, before entering into the admin backend the first time. For example with a tool like phpMyAdmin, . . .
. . .
scripts, before entering into the admin backend the first time. For example with a tool like phpMyAdmin, PLESK, or something similar. During this step you already define - Name of database - Username - Password - Database host which will be required later on in step 5 of the installation process. Please keep in mind that Sphider-plus . . .
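
For readers who create the database from a script rather than with phpMyAdmin, the collation requirement above corresponds to a statement like the one built by this small Python sketch. The helper name `create_database_sql` and the example database name are assumptions; the generated SQL uses standard MySQL syntax.

```python
import re


def create_database_sql(name: str) -> str:
    """Build a CREATE DATABASE statement with utf8mb4_unicode_ci collation."""
    # Restrict the name to simple characters so it is safe to embed in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("unsupported database name: %r" % name)
    return (
        f"CREATE DATABASE `{name}` "
        "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    )


# Example with a hypothetical database name.
print(create_database_sql("sphider_db"))
```

The statement would be run with an account that is allowed to create databases, before the admin backend is opened for the first time.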
. . .
not be usable for Sphider-plus. Additionally take care that your password should not contain special characters, because some MySQL versions do not process them. Sometimes the 'Database host' is also called 'Server' or 'Database server', like in the tool phpMyAdmin. 3. Open the Admin interface with your browser by addressing the Admin with . . .
. . .
in the tool phpMyAdmin. 3. Open the Admin interface with your browser by addressing the Admin with something like: http://localhost/public/sphider/admin/admin.php First access to the admin backend is granted without login. Later access to the admin backend will only be granted after login. Use admin as user name and also as password. On first . . .
. . .
to the admin backend will only be granted after login. Use admin as user name and also as password. On first login, there will be several warning messages, because no database is allocated to Sphider-plus. By means of a self test performed each time the Admin is called, also write permission to sub folders, converter availability, etc. are . . .
. . .
converter availability, etc. are checked. It might become necessary to follow some warning messages like: chmod 777 the folder /admin/tmp/ Additional note: Admin backend login may fail for several server configurations. Please see the FAQ chapter for several examples of how to fix such an issue. 4. Open the sub-menu 'Database' and . . .
. . .
into this section, there will be again several warning messages. At least one database has to be declared for: Name of database User name Password Database host Prefix for tables as defined before in step 2 externally, when you created the database. Additionally you need to add a freely selectable name as table_prefix. This table prefix is . . .
. . .
database, the settings for the non-required databases may remain blank. A corresponding message will be displayed: Mysql server for database 2 is not available! Trying to reconnect to database 2 . . . Cannot connect to this database. Never mind if you don't need it. Installation of multiple databases is described in documentation chapter . . .
. . .
work will be the activation of the database. There are three settings available in the 'Activate / Disable' section: - Select active database for Admin - Select active database for 'Search' user - Select active database for 'Suggest URL' user Each setting allows activating of one database. So, if multiple databases are configured, an independent . . .
. . .
Sphider-plus version. The additional line is something like: .submitBox { background: #fff; text-align: center; width:50% ; border:1px solid #070; border-radius: 10px; box-shadow: 10px 10px 3px #777; } Add this line to your userstyle.css and modify it for your individual requirements. Afterwards it is obligatory to enter the admin backend = . . .

[ Installation Summary ] Preconditions Sphider-plus requires PHP 5.3 - 8.x (proven up to version 8.3.2) with installed GD, mbstring, PECL and zlib libraries. Additionally, if RAR compressed files should be indexed, the RAR extension is required. Also a MySQL (proven up to version 8.0.32) database . . .
. . .
following PHP and Apache settings should be adjusted on the server before installing Sphider-plus. PHP safe_mode : Off (removed as of PHP 5.4) register_globals : Off (removed as of PHP 5.4) allow_url_fopen : On allow_url_include : On (deprecated since PHP 7.4) Webserver mod_rewrite : On display_errors : On error_reporting : E_ALL & . . .
. . .
: On display_errors : On error_reporting : E_ALL & ~E_DEPRECATED & ~E_WARNING & ~E_NOTICE & ~E_STRICT memory_limit : 256M (minimum) phpinfo : enabled ( disable_functions = phpinfo - change it to: disable_functions = ) AllowOverride : All (to be found in apache sub folder /conf/httpd.conf) Additional note: The .htaccess files supplied with . . .
. . .
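The PHP settings listed above can be verified before installation. The following is only a minimal sketch in Python, not part of Sphider-plus: the function names (parse_ini, to_bytes, check_settings) and the sample php.ini text are assumptions made for illustration, and only a few of the listed settings are checked.

```python
def parse_ini(text):
    """Collect key = value pairs from php.ini style text (sections are ignored)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def to_bytes(value):
    """Convert a PHP shorthand size such as '256M' to bytes; '-1' means unlimited."""
    value = value.strip()
    if not value:
        return 0
    if value == "-1":
        return float("inf")
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def check_settings(settings):
    """Return a list of deviations from the values recommended above."""
    problems = []
    for key in ("allow_url_fopen", "display_errors"):
        if settings.get(key, "").lower() not in ("on", "1", "true", "yes"):
            problems.append(f"{key} should be On")
    if to_bytes(settings.get("memory_limit", "0")) < to_bytes("256M"):
        problems.append("memory_limit should be at least 256M")
    disabled = [f.strip() for f in settings.get("disable_functions", "").split(",")]
    if "phpinfo" in disabled:
        problems.append("phpinfo must not be listed in disable_functions")
    return problems


# Hypothetical php.ini excerpt, used only to demonstrate the check.
sample = """
[PHP]
allow_url_fopen = On
display_errors = Off ; switched off by the hoster
memory_limit = 128M
disable_functions = phpinfo,exec
"""
print(check_settings(parse_ini(sample)))
# ['display_errors should be On', 'memory_limit should be at least 256M', 'phpinfo must not be listed in disable_functions']
```

An empty list means that the checked settings match the recommendations. The self tests of the admin backend remain the authoritative check.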
on some servers and might need to be disabled by renaming them. In order to enable correct function of Sphider-plus, please follow the instructions as described below. Additional (but important) note: Please do not edit/modify any of the Sphider-plus scripts and files during first installation. Everything must be performed by means of the admin . . .
. . .
installation. Everything must be performed by means of the admin backend. There are several self tests implemented, which should not be bypassed by altering the scripts. As long as the admin interface shows error messages and warnings, later index procedures and the search algorithm may fail. - Installation - - New installation - . . .
. . .
Updating from 4.x to 4.y [ Installation of version 4.2024a ] New installation In order to get Sphider-plus running, perform the following steps: 1. Unzip the downloaded file, and upload all folders and files to the server, for example to: C:\programms\xampp\htdocs\public\sphider-plus\ 2. Create at least one database as part of the MySQL . . .
. . .
to: C:\programms\xampp\htdocs\public\sphider-plus\ 2. Create at least one database as part of the MySQL server, which will hold the Sphider-plus data tables. The collation of the SQL server connection must be set to utf8mb4_unicode_ci before creating any database. In order to get full UNICODE support, the collation of Sphider-plus . . .
. . .
order to get full UNICODE support, the collation of Sphider-plus databases also must be set to utf8mb4_unicode_ci , which is available for MySQL server version 5.5.3 and later. Creation of this database needs to be done outside of the Sphider-plus scripts, before entering the admin backend for the first time. For example with a tool like phpMyAdmin, . . .
. . .
scripts, before entering the admin backend for the first time. For example with a tool like phpMyAdmin, PLESK, or something similar. During this step you already define - Name of database - Username - Password - Database host which will be required later on in step 5 of the installation process. Please keep in mind that Sphider-plus . . .
. . .
not be usable for Sphider-plus. Additionally, make sure that your password does not contain special characters, because some MySQL versions do not process them. Sometimes the 'Database host' is also called 'Server' or 'Database server', like in the tool phpMyAdmin. 3. Open the Admin interface with your browser by addressing the Admin with . . .
. . .
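Step 2 can be prepared with a small helper that builds the SQL statement to be pasted into phpMyAdmin or a similar tool. This is a sketch under assumptions: the database name sphider_plus, both function names, and the interpretation of "no special characters" as "letters and digits only" are illustrative and not taken from the Sphider-plus documentation.

```python
import re


def create_database_sql(name, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a CREATE DATABASE statement with the collation required above."""
    # Restrict the name to simple characters so it can be quoted safely.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("database name may only contain letters, digits and '_'")
    return f"CREATE DATABASE `{name}` CHARACTER SET {charset} COLLATE {collation};"


def password_is_plain(password):
    """True if the password consists of letters and digits only (assumed rule)."""
    return re.fullmatch(r"[A-Za-z0-9]+", password) is not None


# 'sphider_plus' is a hypothetical database name.
print(create_database_sql("sphider_plus"))
# CREATE DATABASE `sphider_plus` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
print(password_is_plain("Secret123"))   # True
print(password_is_plain("p@ss word!"))  # False
```

The database name, user name, password and database host chosen here are the values that have to be entered later in the 'Database' sub-menu of the admin backend.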
in the tool phpMyAdmin. 3. Open the Admin interface with your browser by addressing the Admin with something like: http://localhost/public/sphider/admin/admin.php First access to the admin backend is granted without login. Later access to the admin backend will only be granted after login. Use admin as user name and also as password. On first . . .
. . .
to the admin backend will only be granted after login. Use admin as user name and also as password. On first login, there will be several warning messages, because no database is allocated to Sphider-plus. By means of a self test performed each time the Admin is called, also write permission to sub folders, converter availability, etc. are . . .
. . .
converter availability, etc. are checked. It might become necessary to follow some warning messages like: chmod 777 the folder /admin/tmp/ Additional note: Admin backend login may fail for several server configurations. Please see the FAQ chapter for several examples of how to fix such an issue. 4. Open the sub-menu 'Database' and . . .
. . .
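The write permission warning mentioned above can be checked in advance from the command line. The following Python sketch is an assumption-based illustration, not the self test used by the admin backend: the function name writable_report and the installation path in the usage line are hypothetical, and only the folder admin/tmp named in the warning is checked by default.

```python
import os


def writable_report(base_dir, subfolders=("admin/tmp",)):
    """Map each sub folder to True if it exists and is writable for this process."""
    report = {}
    for sub in subfolders:
        path = os.path.join(base_dir, sub)
        report[sub] = os.path.isdir(path) and os.access(path, os.W_OK)
    return report


if __name__ == "__main__":
    # Hypothetical installation folder; adjust it to the real upload location.
    print(writable_report("/var/www/html/sphider-plus"))
```

Note that the script reports permissions for the user who runs it, which may differ from the user account of the web server.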
into this section, there will be again several warning messages. At least one database has to be declared for: Name of database User name Password Database host Prefix for tables as defined before in step 2 externally, when you created the database. Additionally you need to add a freely selectable name as table_prefix. This table prefix is . . .
. . .
database, the settings for the non-required databases may remain blank. A corresponding message will be displayed: Mysql server for database 2 is not available! Trying to reconnect to database 2 . . . Cannot connect to this database. Never mind if you don't need it. Installation of multiple databases is described in documentation chapter . . .
. . .
work will be the activation of the database. There are three settings available in the 'Activate / Disable' section: - Select active database for Admin - Select active database for 'Search' user - Select active database for 'Suggest URL' user Each setting allows activating of one database. So, if multiple databases are configured, an independent . . .
. . .
Sphider-plus version. The additional line is something like: .submitBox { background: #fff; text-align: center; width:50% ; border:1px solid #070; border-radius: 10px; box-shadow: 10px 10px 3px #777; } Add this line to your userstyle.css and modify it for your individual requirements. Afterwards it is obligatory to enter the admin backend = . . .

7.   Sphider-plus - The PHP Search Engine Visit in a new window


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
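The naming scheme of the thread log files can be unpacked mechanically. The following is a minimal Python sketch, assuming the pattern db<number>_<yymmdd>-<hh.mm.ss>_<thread>.html as inferred from the two example names; the helper name parse_log_name is hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# Pattern inferred from the example file names (an assumption,
# not taken from the Sphider-plus scripts).
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\d+)\.html$")


def parse_log_name(name):
    """Split a thread log file name into database number, start time and thread number."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    db, date, time, thread = match.groups()
    # 100524 is read as yymmdd, i.e. May 24, 2010.
    started = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": int(thread)}


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

Sorting the parsed entries by "started" and "thread" would give a chronological view of a multithreaded index run.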
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
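The effect of the noindex markers can be illustrated with a short Python sketch. It is not the Sphider-plus implementation. The marker spelling used as default (HTML comment form) is an assumption, since the markers are printed above as <! sphider_noindex >; pass the exact strings your pages use. The function name strip_noindex is hypothetical.

```python
import re


def strip_noindex(html,
                  open_tag="<!--sphider_noindex-->",
                  close_tag="<!--/sphider_noindex-->"):
    """Return html with every region between open_tag and close_tag removed.

    The default marker spelling is an assumption; adjust it to your pages.
    """
    pattern = re.compile(
        re.escape(open_tag) + r".*?" + re.escape(close_tag),
        re.DOTALL,  # excluded regions may span several lines
    )
    return pattern.sub("", html)


page = "<p>Article</p><!--sphider_noindex--><ul><li>Menu</li></ul><!--/sphider_noindex--><p>More</p>"
print(strip_noindex(page))
```

Because links inside an excluded region are still followed, a crawler built along these lines would collect links from the original page first and apply the stripping only to the text that goes into the index.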
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
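How a list-file entry such as */ menu[0-5] / could be told apart from a plain div id is sketched below in Python. This is only an illustration under assumptions: entries that start with */ and end with a slash are treated as regular expressions, all other entries as literal ids, and the pattern is applied as a substring search. The helper name compile_entry is hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a test function for div ids."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Drop the leading '*/' and the closing '/', plus surrounding blanks.
        regex = re.compile(entry[2:-1].strip())
        return lambda div_id: regex.search(div_id) is not None
    # Anything else is compared literally (assumption).
    return lambda div_id: div_id == entry


matches_menu = compile_entry("*/ menu[0-5] /")
print(matches_menu("menu3"), matches_menu("menu7"))
```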
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
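A canonical URL is usually announced by a link element with rel="canonical" in the head of each duplicate page. The following Python sketch shows how such an element can be read with the standard library; how Sphider-plus itself evaluates it is not shown here, and the names CanonicalFinder and find_canonical are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"></head>'
print(find_canonical(page))
```

An indexer could store a page under the returned URL, so that all duplicates collapse into a single entry.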
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
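What converting text from one of these charsets into UTF-8 means can be shown with a few lines of Python. The sketch uses Python's built-in codecs, not the ConvertCharset function mentioned above, and not every name in the list is guaranteed to have a Python codec. The function name to_utf8 is hypothetical.

```python
def to_utf8(raw, charset):
    """Decode raw bytes from the declared charset and re-encode them as UTF-8."""
    text = raw.decode(charset)  # raises LookupError for an unknown charset name
    return text.encode("utf-8")


# Bytes of the word 'Привет' as stored in windows-1251.
cyrillic = b"\xcf\xf0\xe8\xe2\xe5\xf2"
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))
```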
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <xsize>635</xsize>; <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
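The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not code from Sphider-plus: the regular expression is inferred only from the two example names, and the function name parse_log_name is hypothetical. The date part is read as YYMMDD, as in the example (100524 = May 24, 2010).

```python
import re
from datetime import datetime

# Pattern inferred from the example file names only (assumption):
# db<number>_<YYMMDD>-<HH.MM.SS>_<thread id>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread id."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    stamp = datetime.strptime(f"{match['date']} {match['time']}", "%y%m%d %H.%M.%S")
    return {"db": int(match["db"]), "stamp": stamp, "thread": match["thread"]}

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["stamp"].isoformat(), info["thread"])
```

The thread part is kept as a string so that a name carrying a custom ID in that position is accepted as well as a plain thread number.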
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
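The rule from 4.4 (text between the markers is not indexed, links inside it are still followed) could be sketched as below. This is an illustration in Python, not the Sphider-plus implementation. The exact marker spelling is an assumption: the pattern accepts the form shown in this listing as well as an HTML-comment form with dashes. The function name split_page and the simple href pattern are hypothetical.

```python
import re

# Assumption: markers may appear as "<! sphider_noindex >" or as an HTML
# comment with dashes; both spellings are accepted here.
NOINDEX = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>.*?<!\s*-*\s*/sphider_noindex\s*-*\s*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are collected from the whole page
    indexable = NOINDEX.sub(" ", html)  # marked regions are dropped from the index text
    return indexable, links

page = 'Intro <! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex > Body'
text, links = split_page(page)
print(text)
print(links)
```

Collecting the links before removing the marked regions is what keeps excluded menus and footers usable for crawling.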
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
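How such list entries might be interpreted can be sketched as follows. This is a hedged Python illustration, not the Sphider-plus code: the helper names compile_entry and is_listed are hypothetical, and it is an assumption that a pattern has to match the whole element name and that surrounding blanks inside the slashes are ignored.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    # Entries introduced by "*/" and ended by "/" are treated as regexps.
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    # Everything else is taken as a literal name.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the name matches any entry (full match, assumption)."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)

entries = ["aside", "*/nav[0-5]/"]
print(is_listed("nav3", entries))
print(is_listed("NAV3", entries))
```

No IGNORECASE flag is set, which mirrors the note that names in the list file are processed case-sensitively.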
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
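A page usually announces its canonical URL with a link element carrying rel="canonical" in the HTML head. The following Python sketch shows how such a declaration could be read. It relies only on the standard library, is not taken from Sphider-plus, and the names CanonicalFinder and canonical_url are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]

def canonical_url(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish"></head></html>')
print(canonical_url(page))
```

An indexer using this idea would store the duplicate URLs under the returned address instead of indexing each variant separately.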
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
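The conversion step itself (bytes in a declared charset in, UTF-8 out) can be illustrated with a few lines of Python. This sketch uses Python's built-in codecs, not the ConvertCharset function named above, so the set of accepted charset names differs from the list; the function name to_utf8 is hypothetical.

```python
def to_utf8(raw, charset):
    """Re-encode bytes from the given source charset to UTF-8.

    Raises LookupError for a charset name Python does not know and
    UnicodeDecodeError if the bytes are not valid in that charset.
    """
    text = raw.decode(charset)
    return text.encode("utf-8")

cyrillic = "Привет".encode("cp1251")       # bytes as a windows-1251 page would deliver them
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))
```

Decoding with the wrong charset often succeeds silently and yields garbled text, which is why a correctly declared source charset matters more than the conversion itself.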
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non-ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
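A consumer of the XML output could read such media_result blocks as sketched below in Python. Only the element names visible in the fragment above are taken from the listing. The enclosing root element results and the num and type values of the first entry are assumptions made to get a well-formed sample, and the function name media_results is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed sample modelled on the fragment; <results> is an assumed root.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def media_results(xml_text):
    """Return one dict per <media_result>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("media_result"):
        rows.append({child.tag: (child.text or "").strip() for child in node})
    return rows

for row in media_results(SAMPLE):
    print(row["title"], row["x_size"], "x", row["y_size"])
```

All values arrive as strings, so sizes need an explicit int() before they are compared or summed.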
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
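The JSON output shown above can be consumed with any standard JSON parser. The following Python sketch uses a shortened sample that keeps only field names and values visible in the listing (the fulltxt extracts are left out); the function name summarize is hypothetical.

```python
import json

# Shortened sample; field names and values are copied from the output above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(payload):
    """Return the query and a list of (weight, title, url), best hit first."""
    data = json.loads(payload)
    hits = [
        (float(result["weight"]), result["title"].strip(), result["url"])
        for result in data["text_results"]
    ]
    return data["query"], sorted(hits, reverse=True)

query, hits = summarize(SAMPLE)
print(query)
for weight, title, url in hits:
    print(weight, title, url)
```

Two details of the format matter when parsing: weight is delivered as a string and has to be converted before sorting, and the escaped slashes in url are resolved by the JSON parser, so no manual unescaping is needed.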


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
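The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch and not part of Sphider-plus: the function name parse_log_name and the regular expression are assumptions derived only from the example names shown, with the date read as YYMMDD.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html and db2_100524-21.47.56_ID1.html.
# The pattern is an assumption based only on the examples in the text.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {"db": int(match["db"]), "started": started, "thread": match["thread"]}


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_ID1.html"))
```

The thread part is kept as a string because the IDs may be chosen freely.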
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
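The MD5 check mentioned above can be illustrated with a short Python sketch. This is not the Sphider-plus code; the function names and the idea of passing the stored checksum as a plain string are assumptions made for the example.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 checksum of a page body."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(content: bytes, stored_md5) -> bool:
    """A page is re-indexed if no checksum is stored or the checksum differs."""
    return stored_md5 is None or md5sum(content) != stored_md5


if __name__ == "__main__":
    old = md5sum(b"<html>old text</html>")
    print(needs_reindex(b"<html>old text</html>", old))  # unchanged page
    print(needs_reindex(b"<html>new text</html>", old))  # changed page
```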
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
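The behaviour of the sphider_noindex tags (section 4.4: enclosed text is not indexed, enclosed links are still followed) can be sketched in Python as follows. This is an illustration only; the regular expressions, the tolerance for comment dashes and the function name split_page are assumptions, not the Sphider-plus implementation.

```python
import re

# Assumption: the markers may appear as "<! sphider_noindex >" or as an HTML
# comment "<!--sphider_noindex-->", so optional dashes and spaces are tolerated.
NOINDEX = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are taken from the whole page
    indexable = NOINDEX.sub(" ", html)  # excluded parts are dropped before indexing
    return indexable, links


if __name__ == "__main__":
    page = 'Intro <! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex > Body'
    print(split_page(page))
```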
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
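For the list-file entries described in sections 4.5 to 4.7, the distinction between plain names and regexp entries can be sketched in Python. The helper names compile_entry and matches are hypothetical, and treating an entry as a match against the complete name is an assumption; the text only states that a regexp is introduced by */ and ended with another slash.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the trailing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def matches(entry, name):
    """True if the complete name is covered by the entry (assumed semantics)."""
    return compile_entry(entry).fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("nav3", "nav7", "Nav3"):
        print(candidate, matches("*/nav[0-5]/", candidate))
```

Python regular expressions are case-sensitive by default, which agrees with the note on /include/common/elements_not.txt.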
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
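How a crawler could resolve duplicates to one canonical URL is sketched below in Python. The sketch assumes the canonical URL is announced by a link element with rel="canonical" in the page head, which the excerpt does not show; the class and function names are hypothetical and this is not the Sphider-plus code.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html, page_url):
    """Return the announced canonical URL, or the page URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url


if __name__ == "__main__":
    head = '<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"></head>'
    duplicate = "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy"
    print(canonical_url(head, duplicate))
```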
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
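The conversion of a legacy charset into UTF-8 can be illustrated in Python. The sketch uses Python's built-in codecs as a stand-in for the ConvertCharset function; the function name to_utf8 is hypothetical, and not every charset name in the list above is known to Python.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from the given source charset and re-encode them as UTF-8."""
    text = raw.decode(charset)  # raises LookupError for charsets Python does not know
    return text.encode("utf-8")


if __name__ == "__main__":
    word = "Привет"
    for charset in ("windows-1251", "cp866", "koi8-r", "iso-8859-5"):
        legacy = word.encode(charset)
        print(charset, legacy, to_utf8(legacy, charset).decode("utf-8"))
```

The same word yields different byte sequences in each Cyrillic charset, which is why the source charset has to be known before conversion.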
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
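Reading the XML result output can be sketched in Python. The element names inside media_result are taken from the fragment above; the root element name results and the assembled sample document are assumptions, because the excerpt does not show the complete output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample assembled from the element names visible in the excerpt.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</results>"""


def media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]


if __name__ == "__main__":
    for result in media_results(SAMPLE):
        print(result)
```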
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
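A client consuming the JSON result output could look like the following Python sketch. The sample string is shortened from the output shown above (the fulltxt and page_size fields are left out), and the function name summarize is hypothetical.

```python
import json

# Shortened sample; field names and values are taken from the output above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,'
    r'"text_results":['
    r'{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},'
    r'{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}'
    r']}'
)


def summarize(payload):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)  # the escaped "\/" is decoded to a plain "/"
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

The weight arrives as a string in the output and is converted to a number here so that results can be sorted or filtered.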


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 45 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
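The naming scheme above can be illustrated with a minimal Python sketch that splits a log-file name into its parts. The helper name parse_log_name and the regular expression are assumptions made for illustration and are not part of Sphider-plus; the pattern is inferred only from the example names shown, and the two-digit year is assumed to mean 20xx.

```python
import re
from datetime import datetime

# Hypothetical pattern inferred from names such as db2_100524-21.47.56_1.html:
# db<number>_<YYMMDD>-<HH.MM.SS>_<thread suffix>.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")


def parse_log_name(name):
    """Split a log-file name into database number, start time and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("unexpected log-file name: " + name)
    db, date, time, thread = match.groups()
    # %y maps the two-digit year 10 to 2010.
    started = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": thread}


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

The thread suffix is kept as a string so that both numeric suffixes and free-form IDs fit the same pattern.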
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
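A client could consume JSON output of this shape roughly as follows. This is a minimal Python sketch, not an official client: the field names (text_results, num, weight, url, title, page_size) are taken from the excerpt above, the sample string is shortened to the fields that are fully visible there, and the helper name summarize is an assumption for illustration.

```python
import json

# Shortened sample using field names from the excerpt above; the truncated
# "fulltxt" values are left out rather than guessed.
sample = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb"}]}'''


def summarize(raw):
    """Return (num, weight, title, url) tuples for each text result."""
    data = json.loads(raw)
    rows = []
    for result in data["text_results"]:
        # "weight" arrives as a string and titles carry a leading blank.
        rows.append((result["num"], float(result["weight"]),
                     result["title"].strip(), result["url"]))
    return rows


for num, weight, title, url in summarize(sample):
    print(num, weight, title, url)
```

Note that the escaped slashes (\/) in the output are valid JSON and decode to plain slashes.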


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 165 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 165 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
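A minimal sketch of how a canonical URL could be read from a page, assuming the page declares it with the standard <link rel="canonical"> element in its head. The class CanonicalFinder and the function find_canonical are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL of a page, or None if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    "<html><head>"
    '<link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
canonical = find_canonical(page)
```

All duplicate URLs that return this same value can then be treated as one page.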
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
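The conversion itself can be pictured with a short sketch. It uses Python's built-in codecs rather than the ConvertCharset function, so it only illustrates the principle of decoding bytes from a named legacy charset and re-encoding them as UTF-8; the function name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from the given charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")


# The same Cyrillic word as delivered by a windows-1251 encoded page.
legacy_bytes = "Привет".encode("windows-1251")
utf8_bytes = to_utf8(legacy_bytes, "windows-1251")
```

The legacy form needs one byte per character here, while the UTF-8 form needs two.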
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
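A client consuming the JSON result output might read it along these lines. The sample below is an abbreviated copy of the output shown above (only a few of the fields, without the text extracts), and the function name summarize is hypothetical.

```python
import json

# Abbreviated sample; raw strings keep the escaped slashes (\/) of the output.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",'
    r'"title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)


def summarize(payload: str):
    """Return (num, weight, url, title) for each text result."""
    data = json.loads(payload)
    return [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data["text_results"]
    ]


rows = summarize(SAMPLE)
```

Note that the weight arrives as a string and the title carries a leading blank, so both are normalized in the sketch.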


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
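The naming scheme of the log files can be taken apart with a small sketch. It assumes the pattern seen in the examples above (database number, date as YYMMDD, time as HH.MM.SS, then the thread number or ID); the function name parse_log_name and the returned keys are hypothetical.

```python
import re
from datetime import datetime

LOG_NAME = re.compile(
    r"db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>\w+)\.html"
)


def parse_log_name(file_name: str) -> dict:
    """Split a thread log file name into its documented items."""
    match = LOG_NAME.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not a thread log file name: {file_name}")
    started = datetime.strptime(
        f"{match['date']} {match['time']}", "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_ID1.html")
```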
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
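Starting several threads with individual IDs could be scripted roughly as shown below. The sketch only assembles the command lines in the form shown above (php spider.php -erased1 and so on); that each thread runs as an independent process started from the folder holding spider.php is an assumption, and the function names are hypothetical.

```python
import subprocess


def thread_commands(option: str, ids):
    """Build one 'php spider.php -<option><id>' command per thread ID."""
    return [["php", "spider.php", f"-{option}{thread_id}"] for thread_id in ids]


def launch(commands, working_dir: str):
    """Start all threads in parallel (assumes php is on the PATH)."""
    return [subprocess.Popen(cmd, cwd=working_dir) for cmd in commands]


commands = thread_commands("erased", [1, 2, 3])
# launch(commands, "/path/to/admin")  # hypothetical folder, not executed here
```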
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
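The XML media output can be read with any standard XML parser. The following is a minimal Python sketch, not the project's own client code. It assumes the records are wrapped in a root element, here given the hypothetical name media_results, and that the first record carries type and url fields like the second one does. The element names inside each media_result are taken from the extract above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened document. Only the element names inside
# <media_result> come from the extract; the root name is an assumption.
SAMPLE = """<media_results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <url>http://www.abc.de/index.php</url>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</media_results>"""


def parse_media(xml_text):
    """Return one dict per <media_result>, with numeric fields converted."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter("media_result"):
        item = {child.tag: (child.text or "").strip() for child in node}
        # Size fields are optional (audio results have no dimensions).
        for key in ("num", "x_size", "y_size"):
            if key in item:
                item[key] = int(item[key])
        items.append(item)
    return items


media = parse_media(SAMPLE)
for entry in media:
    print(entry["num"], entry["type"], entry.get("link", entry["url"]))
```

Treating the size fields as optional keeps the same loop usable for image, audio and video records.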
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
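A client can decode the JSON output with a standard JSON library. The sketch below is a minimal Python example under stated assumptions: the payload is a shortened, hypothetical sample that reuses only field names visible in the extract above (the example.com URLs are placeholders), and the page_size strings are assumed to always be kilobyte values written like "5,661.6kb" or "26.4 kb".

```python
import json

# Shortened, hypothetical payload; field names follow the extract above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.example.com\/info.pdf","title":" Info_eng","page_size":"5,661.6kb"},
{"num":2,"weight":"50.0","url":"http:\/\/www.example.com\/description.html","title":" Description","page_size":"26.4 kb"}
]}'''


def size_in_kb(text):
    """Convert strings such as '5,661.6kb' or '26.4 kb' to a float."""
    return float(text.lower().replace("kb", "").replace(",", "").strip())


def summarize(payload):
    """Return a cleaned list of text results from a JSON result string."""
    data = json.loads(payload)
    rows = []
    for hit in data.get("text_results", []):
        rows.append({
            "num": hit["num"],
            "weight": float(hit["weight"]),   # weights arrive as strings
            "url": hit["url"],                # '\/' is decoded to '/'
            "title": hit["title"].strip(),    # titles carry a leading blank
            "size_kb": size_in_kb(hit["page_size"]),
        })
    return rows


results = summarize(SAMPLE)
for row in results:
    print(f'{row["num"]}. {row["title"]} ({row["weight"]:.1f}) {row["url"]}')
```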


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
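The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, assuming the date is written as two-digit year, month and day (as the "May 24, 2010" example suggests) and that the last item before .html identifies the thread. The function name parse_log_name is hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# db<number>_<yymmdd>-<hh.mm.ss>_<thread>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[^_.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    started = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "db": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["started"].isoformat(), info["thread"])
```

Sorting the parsed entries by the started value lists the logs of all threads in chronological order.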
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
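The MD5-based change check mentioned above can be illustrated in a few lines. This is a minimal Python sketch of the general technique, not the Sphider-plus implementation. The dictionary standing in for the stored checksums and the function names are hypothetical.

```python
import hashlib


def md5sum(content):
    """Return the hexadecimal MD5 digest of the page content (bytes)."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(url, content, stored):
    """Compare the fresh digest with the stored one and update the store.

    `stored` is a plain dict mapping URL to digest; it stands in for
    whatever table holds the checksums in a real installation.
    """
    digest = md5sum(content)
    if stored.get(url) == digest:
        return False          # unchanged page, skip the expensive re-index
    stored[url] = digest      # new or changed page
    return True


checksums = {}
first = needs_reindex("http://www.example.com/", b"<html>v1</html>", checksums)
again = needs_reindex("http://www.example.com/", b"<html>v1</html>", checksums)
changed = needs_reindex("http://www.example.com/", b"<html>v2</html>", checksums)
print(first, again, changed)
```

Because only the digest is compared, an unchanged page costs one download and one hash, which is what makes the re-index pass fast.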
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
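The behaviour described in 4.4 (marked regions are skipped for indexing, but their links are still followed) can be sketched as follows. This is a minimal Python illustration, not the indexer's actual code. As an assumption, the pattern accepts both the marker spelling shown in this text and an HTML-comment spelling with double hyphens. The link extraction is deliberately simplistic.

```python
import re

# Assumption: the markers may appear as '<! sphider_noindex >' (as printed
# here) or as an HTML comment '<!--sphider_noindex-->'.
NOINDEX = re.compile(
    r"<!(?:--)?\s*sphider_noindex\s*(?:--)?>"
    r".*?"
    r"<!(?:--)?\s*/sphider_noindex\s*(?:--)?>",
    re.IGNORECASE | re.DOTALL,
)
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links come from the whole page
    indexable = NOINDEX.sub(" ", html)  # marked regions are dropped
    return indexable, links


page = (
    "<p>Body text</p>"
    '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
    "<p>More text</p>"
)
indexable, links = split_page(page)
print(indexable)
print(links)
```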
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
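The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) can be sketched like this. It is a minimal Python illustration under assumptions: an entry is matched against the complete element name or id, matching is case-sensitive, and blanks around the pattern are ignored, as in the examples with spaces. The function names are hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading '*/' and the trailing '/'.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """True if `name` matches any non-empty entry of the list."""
    return any(
        compile_entry(line).fullmatch(name) for line in entries if line.strip()
    )


entries = ["aside", "*/nav[0-5]/", "*/ menu[0-5] /"]
for candidate in ("nav3", "nav7", "menu0", "aside", "Aside"):
    print(candidate, is_listed(candidate, entries))
```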
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
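The log file naming scheme quoted above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the function name parse_log_name and the returned keys are hypothetical, and it assumes the date part is always YYMMDD (as in the example 100524 = May 24, 2010) and that the last part is either the thread number or a user-defined ID.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html or
# db2_100524-21.48.12_ID2.html (layout taken from the examples above).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread ID.

    Returns None when the name does not follow the assumed layout.
    """
    m = LOG_NAME.match(name)
    if m is None:
        return None
    # Assumption: the date is YYMMDD and the time is HH.MM.SS.
    stamp = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"db": int(m["db"]), "time": stamp, "thread": m["thread"]}
```

Calling parse_log_name("db2_100524-21.48.12_ID2.html") yields database 2, the timestamp 2010-05-24 21:48:12 and the thread ID "ID2". The thread part is kept as a string, because IDs may be chosen freely.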
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
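The list-file entries described above are either plain names or regexp patterns introduced by */ and closed by a slash. The following Python sketch shows one way such entries could be interpreted. It is not the Sphider-plus implementation: the function names are hypothetical, and matching against the whole id or element name is an assumption. Matching is case-sensitive, as stated for /include/common/elements_not.txt.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern.

    Entries like '*/ menu[0-5] /' are treated as regexps; anything else
    is treated as a literal name. Whitespace around the pattern is dropped.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))

def entry_matches(entry, name):
    """Return True if the id or element name is covered by the entry.

    Assumption: the pattern has to match the whole name.
    """
    return compile_entry(entry).fullmatch(name) is not None
```

With the examples from the text, entry_matches("*/ menu[0-5] /", "menu3") is true and entry_matches("*/nav[0-5]/", "nav7") is false. A plain entry such as "footer" matches only the name "footer" itself.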
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
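The JSON result output quoted above can be read with any JSON parser. The following is a minimal Python sketch, assuming the field names seen in the excerpt (text_results, num, weight, url, title). The SAMPLE string is a shortened stand-in modeled on that excerpt, not real output, and the function name summarize is hypothetical.

```python
import json

# Shortened sample modeled on the excerpt above; only a few fields are kept.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(raw):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(raw)
    rows = []
    for item in data.get("text_results", []):
        # The weight is delivered as a string, so it is converted here.
        rows.append((item["num"], float(item["weight"]),
                     item["url"], item["title"].strip()))
    return rows

for row in summarize(SAMPLE):
    print(row)
```

The escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by the parser. Titles are stripped because the excerpt shows them with a leading blank.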




5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
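The log file naming scheme described above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the function name parse_log_name and the returned dictionary keys are hypothetical, and it assumes the name always has the form db<number>_<yymmdd>-<hh.mm.ss>_<thread or ID>.html as shown in the examples.

```python
import re
from datetime import datetime

# Pattern derived from the example names in the text, e.g.
# db2_100524-21.47.56_1.html or db2_100524-21.48.12_ID2.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    # 100524 is read as yymmdd (May 24, 2010), 21.47.56 as hh.mm.ss
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],  # "1", "2", or a user-defined ID such as "ID1"
    }
```

Calling parse_log_name("db2_100524-21.47.56_1.html") yields database 2, a start time of May 24, 2010, 21:47:56, and the thread label "1".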
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
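The behaviour described in 4.4 (text between the noindex markers is skipped, but links inside it are still followed) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation: the function name split_indexable is hypothetical, the marker strings are copied as they appear in this text and may differ in the real page source, and the href extraction is deliberately simplistic.

```python
import re

def split_indexable(html,
                    open_tag="<! sphider_noindex >",
                    close_tag="<! /sphider_noindex >"):
    """Return (text to index, links to follow) for one page.

    The marker spellings are assumptions taken from the text above.
    """
    # Remove everything between the markers from the text that gets indexed
    excluded = re.compile(
        re.escape(open_tag) + r".*?" + re.escape(close_tag), re.DOTALL
    )
    text_to_index = excluded.sub(" ", html)
    # Links are collected from the whole page, including excluded parts
    links = re.findall(r'href="([^"]+)"', html)
    return text_to_index, links
```

For a page whose menu is wrapped in the markers, the menu words are absent from the returned text while the menu links still appear in the returned link list.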
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
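The list-file rule above (an entry introduced by */ and closed by another slash is a regexp, for example */ menu[0-5] /) can be sketched in Python as follows. The function name entry_matches is hypothetical, and two details are assumptions that the text does not confirm: surrounding blanks are stripped from the pattern, and the pattern must match the whole value.

```python
import re

def entry_matches(entry, value):
    """Check one list-file entry against a div id or element name."""
    entry = entry.strip()
    # Entries of the form */ pattern / are treated as regular expressions
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        pattern = entry[2:-1].strip()
        # Assumption: the pattern has to cover the complete value
        return re.fullmatch(pattern, value) is not None
    # Everything else is compared literally and case-sensitively
    return entry == value
```

With the example entry */ menu[0-5] /, the ids menu0 to menu5 match, while menu7 does not.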
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
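A client consuming the JSON result output shown above might read it as in the following Python sketch. The sample string is an abbreviated copy of the example, reduced to a few of its keys with the text extracts left out; the function name summarize_results is hypothetical, and how the JSON is fetched from the search script is deliberately not shown.

```python
import json

# Abbreviated copy of the example output above (text extracts omitted)
SAMPLE = (
    r'{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,'
    r'"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)

def summarize_results(raw):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(raw)  # the escaped slashes \/ are decoded to /
    return [
        # weight arrives as a string, titles carry a leading blank
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for row in summarize_results(SAMPLE):
    print(row)
```

Note that "weight" is delivered as a string in the example, so it is converted before any numeric comparison.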


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
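The behaviour described for the sphider_noindex tags (text between the tags is skipped, links inside are still followed) can be sketched in Python as below. This is an illustration only, not the Sphider-plus implementation: the function name is hypothetical, the tag pattern tolerates spaces or dashes after `<!` because the exact tag spelling is not certain from this excerpt, and link extraction is reduced to a simple href regex.

```python
import re

# Matches an opening and closing noindex tag pair and everything in between.
# Spaces or dashes around the tag name are tolerated (an assumption).
NOINDEX = re.compile(
    r"<![\s-]*sphider_noindex[\s-]*>.*?<![\s-]*/sphider_noindex[\s-]*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def split_for_indexing(html):
    """Return (indexable_html, links) for one page.

    Links are collected from the full page, so links inside a noindex
    block are still followed; the text of the block is removed.
    """
    links = HREF.findall(html)
    indexable = NOINDEX.sub(" ", html)
    return indexable, links


if __name__ == "__main__":
    page = ('<p>Body</p><! sphider_noindex ><a href="/menu.html">Menu</a>'
            '<! /sphider_noindex ><p>More</p>')
    print(split_for_indexing(page))
```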
a regexp pattern. The regexp must be introduced by */ and must end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp must be introduced by */ and must end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended for use with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must be introduced by */ and must end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
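The list-file convention described above (plain names, or a regexp introduced by */ and closed by another slash) could be interpreted along these lines. This is a minimal Python sketch with hypothetical function names; it assumes that spaces around the pattern are insignificant and that an entry must match the whole element name or id, neither of which is confirmed by this excerpt.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the trailing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the literal text only.
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """True if name matches any non-empty entry (full match, case-sensitive)."""
    return any(compile_entry(e).fullmatch(name) for e in entries if e.strip())


if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/"]
    for candidate in ("nav3", "nav7", "Nav3", "footer"):
        print(candidate, is_listed(candidate, entries))
```

With these entries, nav0 to nav5 and footer are matched, while nav7 and Nav3 are not, which reflects the case-sensitive processing noted above.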
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
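Converting text from one of the listed charsets into UTF-8 amounts to decoding the raw bytes with the source charset and re-encoding them. The sketch below shows the idea with Python's built-in codecs; it is not the ConvertCharset function itself, the name `to_utf8` is hypothetical, and Python's codec set does not cover every charset in the list (mazovia, for instance, is not a standard Python codec).

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode raw bytes from source_charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # A Cyrillic sample stored as windows-1251 and as koi8-r.
    text = "Привет"
    for charset in ("windows-1251", "koi8-r", "cp866"):
        converted = to_utf8(text.encode(charset), charset)
        print(charset, converted.decode("utf-8"))
```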
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management presents only those databases that are correctly configured, assigned, and have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be searched. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
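A client could consume the JSON result output shown above roughly as follows. This is a minimal Python sketch; the sample payload is an abbreviated reconstruction that uses only field names and values visible in the excerpt (the truncated fulltxt fields and several other keys are left out), and the function name `summarize` is hypothetical.

```python
import json

# Abbreviated sample built from the fields visible in the excerpt above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}'''


def summarize(payload):
    """Return (weight, title, url) tuples for all text results."""
    data = json.loads(payload)
    return [
        # Weights arrive as strings, titles carry a leading space.
        (float(result["weight"]), result["title"].strip(), result["url"])
        for result in data["text_results"]
    ]


if __name__ == "__main__":
    for weight, title, url in summarize(SAMPLE):
        print(f"{weight:6.1f}  {title}  {url}")
```

The escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain slashes by the parser.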


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
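To illustrate what converting one of the listed charsets into UTF-8 involves, here is a minimal Python sketch. It does not reproduce the ConvertCharset function mentioned above; it relies on Python's built-in codecs, which cover many but not necessarily all of the listed names, and the helper name `to_utf8` is hypothetical.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from source_charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)  # raises LookupError for unknown charsets
    return text.encode("utf-8")

# Six bytes that spell a Russian word in windows-1251.
cyrillic_bytes = b"\xcf\xf0\xe8\xe2\xe5\xf2"
utf8_bytes = to_utf8(cyrillic_bytes, "windows-1251")
```

The same bytes decoded with the wrong charset would yield unreadable text, which is why the source charset of a page has to be known before conversion.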
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, no output was available during index / re-index because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not shown during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases that are correctly configured and assigned and that have a set of installed tables, as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
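A consumer of the XML result output might read the media_result elements roughly as sketched below. This is an assumption-laden illustration: only the element names visible in the excerpt above are used, the enclosing `<results>` root element is hypothetical (the real wrapper is not shown here), and the function name `read_media_results` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample built only from the tags visible in the excerpt;
# the <results> root element is a hypothetical wrapper.
SAMPLE_XML = """<results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</results>"""

def read_media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]

media = read_media_results(SAMPLE_XML)
```

All values arrive as strings, so numeric fields such as x_size and y_size would need an explicit conversion before any arithmetic.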
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
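The JSON result output shown above can be consumed with any standard JSON parser. The following Python sketch uses a shortened sample that keeps only field names appearing in the excerpt (the fulltxt values are left out); the function name `summarize` is hypothetical, and converting weight to a number is a choice made for this example, since the output delivers it as a string.

```python
import json

# Shortened sample using only field names from the excerpt above.
SAMPLE_JSON = r'''{"query":"colour","total_results":2,"num_of_results":2,
"from":1,"to":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
 "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
 "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(json_text):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(json_text)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

hits = summarize(SAMPLE_JSON)
```

The escaped slashes in the url values are resolved by the parser, so the extracted URLs can be used directly.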


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
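The naming scheme of the log files can be taken apart as sketched below in Python. The database number, date (YYMMDD) and time fields follow the description above; treating the trailing field as the thread number or thread ID is an assumption based on the two example names, and the function name `parse_log_name` is hypothetical.

```python
import re
from datetime import datetime

LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_"                # db2      -> number of database
    r"(?P<date>\d{6})-"               # 100524   -> date as YYMMDD
    r"(?P<time>\d{2}\.\d{2}\.\d{2})_" # 21.47.56 -> time as HH.MM.SS
    r"(?P<thread>\w+)\.html$"         # 1 or ID1 -> assumed thread number / ID
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("unexpected log file name: " + name)
    started = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
```

Sorting the parsed entries by the started value would list the logs of all threads in chronological order.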
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be defined according to personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
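The idea behind the MD5sum check mentioned above, skipping pages whose content has not changed since the last index run, can be sketched in Python as follows. This only illustrates the principle; how Sphider-plus stores and compares the checksum is not shown here, and the names `md5sum` and `needs_reindex` are hypothetical.

```python
import hashlib

def md5sum(content):
    """Return the hexadecimal MD5 digest of the page content (bytes)."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(content, stored_md5):
    """A page needs re-indexing only if its checksum differs from the stored one."""
    return md5sum(content) != stored_md5

# Checksum remembered from an earlier (hypothetical) index run.
previous = md5sum(b"<html><body>Hello</body></html>")
```

Comparing a short digest instead of the full page text is what keeps the re-index procedure fast for unchanged pages.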
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
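The effect of the sphider_noindex markers on the text that gets indexed might be sketched in Python like this. It is an illustration only: the pattern accepts the marker as written above and, as an assumption, also an HTML-comment style spelling with dashes; the function name `strip_noindex` is hypothetical. The sketch covers the text side only, whereas the indexer, as stated above, still follows links inside the excluded part.

```python
import re

# Matches an opening marker, everything up to the closing marker, and the
# closing marker itself. Optional dashes are tolerated (an assumption).
NOINDEX_BLOCK = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)

def strip_noindex(html):
    """Return the page text with all marked blocks replaced by a space."""
    return NOINDEX_BLOCK.sub(" ", html)

page = "Intro <! sphider_noindex >Menu Home About<! /sphider_noindex > Article text"
indexable = strip_noindex(page)
```

Because the match is non-greedy, several marked blocks on one page are removed separately and the text between them is kept.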
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
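A minimal sketch of how a client might read the JSON result output shown above. The sample payload is a shortened stand-in that keeps only field names visible in the excerpt; the "fulltxt" values are placeholders, and the helper name `summarize` is hypothetical.

```python
import json

# Shortened stand-in for the JSON output above. Field names are taken from
# the excerpt; the "fulltxt" values are placeholders, not real page text.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":"...","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":"...","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize(payload):
    """Return (num, weight, url, title) for each text result in the payload."""
    data = json.loads(payload)
    rows = []
    for hit in data.get("text_results", []):
        # The weight arrives as a string and titles carry a leading blank,
        # so both are normalised here.
        rows.append((hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip()))
    return rows


for row in summarize(SAMPLE):
    print(row)
```

The escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain slashes by the parser.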


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
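The log file naming scheme described above can be taken apart again as sketched below. This is only an illustration: the helper name `parse_log_name` is hypothetical, and the date is assumed to be in YYMMDD order, which matches the documented example 100524 = May 24, 2010.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<stamp>\d{6}-\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into (database number, start time, thread label)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    # Assumption: two-digit year, month, day, then the start time with dots.
    started = datetime.strptime(match.group("stamp"), "%y%m%d-%H.%M.%S")
    return int(match.group("db")), started, match.group("thread")


print(parse_log_name("db2_100524-21.47.56_1.html"))
print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

The thread label is returned as text, because IDs given on the command line are free-form and need not be numeric.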
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
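The behaviour described for the sphider_noindex tags (text between the tags is not indexed, links inside are still followed) could be approximated as follows. This is a sketch, not the Sphider-plus implementation: the function name is hypothetical, and the pattern deliberately accepts both the spelling shown here and an HTML-comment spelling with dashes, since the exact tag syntax is an assumption.

```python
import re

# Matches an opening tag, everything up to the closing tag, and the closing
# tag itself. Optional dashes are tolerated (assumption about the syntax).
NOINDEX = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>.*?<!\s*-*\s*/sphider_noindex\s*-*\s*>",
    re.IGNORECASE | re.DOTALL,
)
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def split_for_indexing(html):
    """Return (html without the excluded regions, all links of the full page)."""
    # Links are collected from the complete page, so links inside an
    # excluded region are still followed.
    links = HREF.findall(html)
    indexable = NOINDEX.sub(" ", html)
    return indexable, links


page = (
    "<p>Body text</p>"
    '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
    "<p>More text</p>"
)
text, links = split_for_indexing(page)
print(text)
print(links)
```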
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
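The `*/ ... /` convention for regexp entries in the list files can be illustrated with a small sketch. Several points are assumptions that are not confirmed by the documentation: blanks around the pattern are trimmed, the pattern must match the whole value, and plain entries are compared literally. Matching is case-sensitive, as stated for /include/common/elements_not.txt. The function names are hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the trailing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the literal text only.
    return re.compile(re.escape(entry))


def is_listed(value, entries):
    """True if value matches any non-empty entry of the list."""
    return any(compile_entry(e).fullmatch(value) for e in entries if e.strip())


entries = ["footer", "*/ menu[0-5] /"]
for candidate in ("menu3", "menu7", "footer", "Footer"):
    print(candidate, is_listed(candidate, entries))
```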
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
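As an illustration of how duplicate URLs can be mapped to one canonical URL, the sketch below reads the canonical address from a page, assuming it is declared with the standard `<link rel="canonical" href="...">` element. The class and function names are hypothetical and do not describe the Sphider-plus code.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def canonical_url(html, fallback):
    """Return the declared canonical URL, or the fetched URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or fallback


page = '<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"></head>'
fetched = "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy"
print(canonical_url(page, fetched))
```

Storing a page under the returned URL would make both duplicate addresses from the example end up as a single index entry.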
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
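The conversion of text from one of the listed charsets into UTF-8 can be sketched as below. The example relies on Python's built-in codecs rather than on the ConvertCharset function, so not every name in the list is necessarily available there; the function name `to_utf8` is hypothetical.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from source_charset and re-encode them as UTF-8."""
    # A LookupError is raised if the charset name is unknown to Python,
    # a UnicodeDecodeError if the bytes are not valid in that charset.
    return raw.decode(source_charset).encode("utf-8")


cyrillic = "Привет".encode("cp1251")        # bytes as a windows-1251 page would send them
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))
print(to_utf8(b"M\xfcller", "iso-8859-1").decode("utf-8"))
```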
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .




5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
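The naming scheme can be unpacked programmatically. The following Python sketch is only an illustration: it assumes the pattern db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html, and the function name parse_log_name and the regular expression are hypothetical helpers, not part of Sphider-plus.

```python
import re
from datetime import datetime

# Assumed pattern: db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    # Date and time are combined and parsed as two-digit year, month, day.
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

Because the thread part is matched as a generic word, the same sketch also accepts names carrying a user-defined ID such as _ID1.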
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
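The behaviour described for the sphider_noindex tags (content is skipped, links are still followed) can be sketched in Python as below. This is an illustration only, not the Sphider-plus implementation: the tag spelling is taken from the text as printed, the pattern additionally tolerates optional dashes and spaces as an assumption, and split_page is a hypothetical helper.

```python
import re

# Tag spelling follows the text above; optional dashes and spaces
# inside the tags are tolerated (assumption).
NOINDEX_BLOCK = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>.*?<!\s*-*\s*/sphider_noindex\s*-*\s*>",
    re.IGNORECASE | re.DOTALL,
)
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def split_page(html):
    """Return (html_to_index, links_to_follow) for one page."""
    links = HREF.findall(html)                 # links come from the whole page
    indexable = NOINDEX_BLOCK.sub(" ", html)   # excluded blocks are dropped
    return indexable, links

page = (
    "<p>Main article</p>"
    "<! sphider_noindex ><ul><li><a href='/menu.html'>Menu</a></li></ul>"
    "<! /sphider_noindex >"
    "<p>More text</p>"
)
indexable, links = split_page(page)
print(indexable)
print(links)
```

Collecting the links before removing the excluded blocks is what keeps the menu link reachable while its text stays out of the index.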
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
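How such list-file entries might be interpreted is sketched below in Python. The sketch assumes that an entry starting with */ and ending with / is a regexp, that any other entry is a literal name, and that a name must match an entry completely; compile_entry and is_listed are hypothetical helpers, and the real matching rules of Sphider-plus may differ.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading '*/' and the closing '/'.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the name fully matches any non-empty entry (assumed semantics)."""
    return any(compile_entry(e).fullmatch(name) for e in entries if e.strip())

entries = ["aside", "*/nav[0-5]/"]
for candidate in ("nav3", "nav7", "aside", "Aside"):
    print(candidate, is_listed(candidate, entries))
```

No IGNORECASE flag is set, so the comparison stays case-sensitive, in line with the note above.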
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
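The principle behind such a conversion (decode the bytes using the declared source charset, then re-encode as UTF-8) can be illustrated with Python's built-in codecs. This is a stand-in sketch and does not use the ConvertCharset function; to_utf8 is a hypothetical helper, and only four charsets from the list are exercised.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the declared charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

word = "Привет"
for charset in ("windows-1251", "koi8-r", "cp866", "iso-8859-5"):
    legacy_bytes = word.encode(charset)      # same word, different byte values
    converted = to_utf8(legacy_bytes, charset)
    print(charset, legacy_bytes.hex(), "->", converted.hex())
```

Each legacy charset yields different input bytes for the same word, while the UTF-8 output is identical, which is why the source charset has to be known before converting.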
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
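A client could consume this JSON output roughly as follows. The Python sketch uses a shortened sample modelled on the output above (the fulltxt extracts and some header fields are left out); summarize is a hypothetical helper, and the field names are taken from the sample as shown.

```python
import json

# Shortened sample; field names and values follow the output shown above.
sample = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}
]}'''

def summarize(raw):
    """Return (query, [(weight, title, url), ...]) from a JSON result string."""
    data = json.loads(raw)
    rows = [
        # Weights arrive as strings and titles carry a leading blank.
        (float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data.get("text_results", [])
    ]
    return data["query"], rows

query, rows = summarize(sample)
for weight, title, url in rows:
    print(f"{weight:5.1f}  {title}  {url}")
```

The escaped slashes in the URLs are valid JSON and are turned back into plain slashes by the parser.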


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
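The naming scheme described above can be taken apart mechanically. The following is only a minimal sketch in Python, not part of Sphider-plus: the function name parse_log_name and the returned dictionary keys are hypothetical, and the date is assumed to be in YYMMDD order, as the example "100524 - Date (May 24, 2010)" suggests.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html or
# db2_100524-21.48.12_ID2.html (database number, date, start time, thread ID).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log file name: {name!r}")
    # Assumption: the date part is YYMMDD and the time part is HH.MM.SS.
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

A thread ID given on the command line (for example ID2) ends up in the same position of the name, so the same pattern covers both forms.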
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
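The MD5-based change check mentioned above can be illustrated as follows. This is a sketch in Python under stated assumptions, not the Sphider-plus implementation: the function names md5sum and has_changed are hypothetical, and the checksum from the previous index run is assumed to be available as a hex string.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the MD5 checksum of the page content as a hex string."""
    return hashlib.md5(content).hexdigest()


def has_changed(new_content: bytes, stored_md5: str) -> bool:
    """True if the freshly fetched page differs from the indexed version."""
    return md5sum(new_content) != stored_md5


# Checksum assumed to be kept from the previous index run.
stored = md5sum(b"<html><body>old text</body></html>")

print(has_changed(b"<html><body>old text</body></html>", stored))  # unchanged page
print(has_changed(b"<html><body>new text</body></html>", stored))  # modified page
```

Only pages for which the check reports a change need to be parsed and indexed again, which is what makes the re-index run fast.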
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
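The effect of the noindex markers can be sketched like this. It is an illustration in Python only, not the Sphider-plus parser: the function names are hypothetical, and the pattern is deliberately tolerant, accepting the marker both as printed here and in HTML comment form, because the exact spelling expected by Sphider-plus is an assumption. Links are collected from the full page, since links inside an excluded part are still followed.

```python
import re

# Assumption: the marker may appear as printed in this document or as an
# HTML comment such as <!--sphider_noindex-->; both forms are accepted.
NOINDEX_BLOCK = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def text_to_index(html):
    """Return the page with all excluded parts replaced by a space."""
    return NOINDEX_BLOCK.sub(" ", html)


def links_to_follow(html):
    """Collect link targets from the whole page, excluded parts included."""
    return HREF.findall(html)


page = (
    "<p>Welcome</p>"
    '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
    "<p>Body</p>"
)
print(text_to_index(page))
print(links_to_follow(page))
```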
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
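The entry format described above (a regexp introduced by */ and closed by another slash, anything else a plain value) could be interpreted along these lines. This is a sketch in Python under assumptions that the documentation excerpt does not confirm: surrounding blanks are trimmed, the pattern must match the whole id, and the names compile_entry and id_is_listed are hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the closing slash.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the value literally.
    return re.compile(re.escape(entry))


def id_is_listed(div_id, entries):
    """True if the div id matches any non-empty entry of the list file."""
    return any(
        compile_entry(entry).fullmatch(div_id)
        for entry in entries
        if entry.strip()
    )


entries = ["*/ menu[0-5] /", "footer"]
for candidate in ("menu3", "menu7", "footer", "footer2"):
    print(candidate, id_is_listed(candidate, entries))
```

Matching the whole id keeps a plain entry such as footer from also catching footer2; if substring matching is wanted instead, search() could replace fullmatch().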
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
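What converting one of the listed charsets into UTF-8 amounts to can be shown in a few lines. The sketch below uses Python's built-in codecs rather than the ConvertCharset function named above, so it only illustrates the idea; the function name to_utf8 is hypothetical, and not every name in the list (for example mazovia or gsm0338) is available as a Python codec.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given legacy charset and re-encode as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# The same Russian word stored in three of the listed Cyrillic charsets.
word = "Привет"
for charset in ("windows-1251", "koi8-r", "cp866"):
    legacy_bytes = word.encode(charset)
    utf8_bytes = to_utf8(legacy_bytes, charset)
    print(charset, legacy_bytes.hex(), "->", utf8_bytes.decode("utf-8"))
```

The byte sequences differ per charset, but after conversion all three yield the same UTF-8 text, which is why the indexer only needs to know the source charset of a page.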
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
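A client consuming the JSON result output could read it as sketched below. This is an illustration in Python under assumptions: SAMPLE is an abbreviated payload that keeps only some of the keys visible in the excerpt above (the truncated "fulltxt" values and the remaining fields are left out), and the function name summarize is hypothetical.

```python
import json

# Abbreviated payload built from keys shown in the excerpt above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",'
    r'"title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)


def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data["text_results"]
    ]


for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, url, title)
```

The escaped slashes in the URLs are plain JSON escapes, so the parser returns ordinary URLs; the weight arrives as a string and is converted to a number here.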


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non-ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
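A client consuming the JSON result output could read it along these lines. This is a minimal Python sketch: the `SAMPLE` string is abridged from the output shown above (the long `fulltxt` extracts and several fields are left out), and the helper name `summarize` is hypothetical.

```python
import json

# Abridged from the JSON output shown above; fulltxt and other fields omitted.
SAMPLE = r'''
{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0",
  "url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng"},
 {"num":2,"weight":"50.0",
  "url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description"}]}
'''


def summarize(raw: str):
    """Return (num, weight, url, title) tuples for each text result."""
    data = json.loads(raw)
    return [
        # Weights arrive as strings; titles carry a leading blank.
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]


rows = summarize(SAMPLE)
for row in rows:
    print(row)
```

The escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain slashes by the parser.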


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
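The naming scheme of the log files can be taken apart with a short Python sketch. It rests on the items listed above (database number, date as YYMMDD, time as HH.MM.SS, thread number or ID); the regular expression and the helper name `parse_log_name` are assumptions made for illustration and are not part of Sphider-plus.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread number or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>\w+)\.html$"
)


def parse_log_name(filename: str) -> dict:
    """Split a thread log file name into database, start time and thread."""
    match = LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a thread log file name: {filename!r}")
    started = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

The thread part is matched loosely so that names carrying a command-line ID, such as db2_100524-21.48.12_ID2.html, are accepted as well.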
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - Sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
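The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus; the function name `parse_log_name` and the regular expression are assumptions for illustration, and the last field is treated as an opaque thread label.

```python
import re
from datetime import datetime

# Hypothetical pattern for names such as db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError("not a recognised log file name: " + name)
    # 100524 is read as YYMMDD (May 24, 2010), 21.47.56 as HH.MM.SS
    started = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"db": int(m["db"]), "started": started, "thread": m["thread"]}

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["started"].isoformat(), info["thread"])
```

Sorting a directory listing by the parsed `started` value would give the threads in the order they were launched.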
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
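The MD5-based change check mentioned above can be illustrated as follows. This is a minimal Python sketch of the general idea only; the function names and the way the stored checksum is passed in are assumptions, not the Sphider-plus implementation.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Hex MD5 digest of the raw page content."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page is re-indexed only if its checksum differs from the stored one."""
    return stored_md5 != md5sum(content)

old_page = b"<html><body>Hello</body></html>"
stored = md5sum(old_page)                      # value kept from the last index run
print(needs_reindex(stored, old_page))         # unchanged page
print(needs_reindex(stored, old_page + b"!"))  # modified page
```

Because only the checksum has to be compared, unchanged pages can be skipped without parsing them again.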
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
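The behaviour described in 4.4 (text between the tags is skipped, links inside it are still followed) can be sketched in Python as below. The marker strings are copied as they appear in this text and are passed as parameters, since the exact tag syntax is an assumption here; the function names are hypothetical.

```python
import re

def strip_noindex(html, start="<! sphider_noindex >", end="<! /sphider_noindex >"):
    """Return the page text with every start...end region removed."""
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.sub("", html)

def find_links(html):
    """Collect href values from the complete page, including excluded regions."""
    return re.findall(r'href=["\'](.*?)["\']', html)

page = 'Intro<! sphider_noindex ><a href="/menu">Menu</a><! /sphider_noindex >Body'
print(strip_noindex(page))   # text that would be indexed
print(find_links(page))      # links that would still be followed
```

Link extraction runs on the unmodified page, while only the stripped text is handed to the indexer.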
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
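The list-file convention above (plain names, or a regexp introduced by */ and closed by another slash) can be sketched in Python. This is an illustration under assumptions: whitespace around the pattern is ignored, a name must match the whole entry, and the helper names are hypothetical. Matching is case-sensitive, as stated for elements_not.txt.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the trailing slash
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the name is matched completely by any entry."""
    return any(compile_entry(e).fullmatch(name) for e in entries)

entries = ["footer", "*/nav[0-5]/"]
print(is_listed("nav3", entries))    # covered by the regexp entry
print(is_listed("Footer", entries))  # differs in case from the plain entry
```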
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
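How a crawler might read the canonical URL from a duplicate page can be sketched as follows. The sketch assumes the usual `<link rel="canonical" href="...">` element in the page head; the text above does not show the mechanism, so both this assumption and the names `CanonicalFinder` and `canonical_url` are illustrative only.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")

def canonical_url(html):
    """Return the canonical URL announced by the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish"></head></html>')
print(canonical_url(page))
```

Pages that report the same canonical URL could then be stored once under that URL.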
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
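The conversion step itself, from a legacy charset into UTF-8, can be illustrated in Python. Python's built-in codecs stand in for the ConvertCharset function here, which is an assumption made only for the sake of a runnable example; not every name from the list above necessarily exists as a Python codec, so the sketch checks availability first. The function name `to_utf8` is hypothetical.

```python
import codecs

def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from the given legacy charset and re-encode them as UTF-8."""
    try:
        codecs.lookup(charset)
    except LookupError:
        raise ValueError("charset not available in this Python build: " + charset)
    return raw.decode(charset).encode("utf-8")

russian = "Привет".encode("cp866")   # bytes as a cp866 page would deliver them
print(to_utf8(russian, "cp866").decode("utf-8"))
print(to_utf8(b"M\xfcller", "iso-8859-1").decode("utf-8"))
```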
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
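A client could read the media part of the XML output along these lines. This is a minimal Python sketch: the element names are taken from the fragment above, while the `<results>` root element, the `image` type of the first entry and the function name `media_results` are assumptions, because the fragment is truncated.

```python
import xml.etree.ElementTree as ET

# Small well-formed sample built from the element names shown above.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("media_result"):
        rows.append({child.tag: (child.text or "").strip() for child in node})
    return rows

for row in media_results(SAMPLE):
    print(row["title"], row["x_size"], "x", row["y_size"])
```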
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
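Consuming the JSON output can be sketched in Python as follows. The sample is a shortened stand-in that reuses only keys and values visible in the output above; the elided `fulltxt` parts are left out rather than guessed, and the function name `summarize` is hypothetical.

```python
import json

# Shortened sample; the escaped slashes (\/) are kept as in the original output.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that the weights arrive as strings and the titles carry a leading blank, so both are normalised before use.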


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen to suit personal requirements, but the OS limitations on file names should be taken . . .
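The naming scheme quoted above can be taken apart again when sorting or filtering log files. The following Python sketch is only an illustration: the pattern is inferred from the example names in this section, and the function name parse_log_name is hypothetical, not part of Sphider-plus.

```python
import re
from datetime import datetime

# Pattern inferred from the example names quoted in this section; the real
# naming code in Sphider-plus may allow other variants.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>.+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised log file name: {name!r}")
    # 100524 is read as year 10, month 05, day 24; the time uses dots as separators.
    started = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"database": int(m["db"]), "started": started, "thread": m["thread"]}
```

The thread label is kept as a string, so both the plain counter form (_1) and the user-defined ID form (_ID1) pass through unchanged.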
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
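As an illustration of the noindex markers described in 4.4, the Python sketch below removes every marked block from a page before its text would be indexed. It is not the Sphider-plus implementation: the markers are assumed to be HTML comments, the dashes are treated as optional because the text above shows them as "<! sphider_noindex >", and the function name strip_noindex is hypothetical.

```python
import re

# Assumption: the markers are HTML comments. The dashes are optional in the
# pattern because this page prints the markers without them.
NOINDEX = re.compile(
    r"<!(?:--)?\s*sphider_noindex\s*(?:--)?>"
    r".*?"
    r"<!(?:--)?\s*/sphider_noindex\s*(?:--)?>",
    re.DOTALL | re.IGNORECASE,
)

def strip_noindex(html):
    """Return the page with every noindex-marked block removed."""
    return NOINDEX.sub("", html)
```

The sketch covers only the text that would be indexed. As stated above, links inside a marked block are still followed, so a crawler would have to collect links from the unmodified page first.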
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
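The list-file convention described above (plain entries, plus regexp entries introduced by */ and closed by a slash) could be interpreted roughly as in the following Python sketch. How Sphider-plus actually evaluates the entries is not shown here, so whole-name matching, the trimming of spaces around the pattern, and the names compile_entry and is_listed are all assumptions.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a case-sensitive matcher."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: use the text between the leading "*/" and the final "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the name matches any entry as a whole (assumed semantics)."""
    return any(compile_entry(e).fullmatch(name) for e in entries)
```

With the entry */nav[0-5]/ this accepts nav3 but rejects nav7 and Nav3, which mirrors the case-sensitive handling mentioned for elements_not.txt.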
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
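A canonical URL is normally announced with a link element carrying rel="canonical" in the head of each duplicate page. The Python sketch below shows one way to read that element with the standard library. It is an illustration only, not the code Sphider-plus uses, and the names CanonicalFinder and canonical_url are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]

def canonical_url(html):
    """Return the canonical URL declared by the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

An indexer could call canonical_url on each fetched page and store the duplicate under the returned address when it is not None.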
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
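The conversion step itself, decoding bytes in the declared charset and re-encoding them as UTF-8, can be illustrated in a few lines of Python. This sketch uses Python's built-in codecs rather than the ConvertCharset function, so the set of supported names differs from the list above (mazovia and gsm0338, for instance, are not available in the Python standard library). The function name to_utf8 is hypothetical.

```python
def to_utf8(raw, charset):
    """Decode bytes in the declared charset and re-encode them as UTF-8."""
    # Raises LookupError for an unknown charset name and
    # UnicodeDecodeError for bytes that are invalid in that charset.
    return raw.decode(charset).encode("utf-8")
```

For example, to_utf8(page_bytes, "windows-1251") would convert a Cyrillic page before its words are stored, where page_bytes stands for the raw bytes of the fetched page.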
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
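A client that receives the XML media result output could read it as in the following Python sketch. The element names inside media_result are taken from the fragment above; the wrapping results root element, the type value image for the first entry, and the names XML_SAMPLE and media_results are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample rebuilt from the fragment above. The <results> root element and the
# "image" type of this entry are assumptions, not taken from Sphider-plus.
XML_SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <url>http://www.abc.de/index.php</url>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def media_results(xml_text):
    """Return one dict per <media_result>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("media_result"):
        rows.append({child.tag: (child.text or "").strip() for child in node})
    return rows
```

All values are returned as strings; a caller that needs the image dimensions as numbers would convert x_size and y_size itself.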
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
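The JSON result output shown above can be consumed with any JSON library. The Python sketch below parses a shortened copy of that example and lists the text results. The field names come from the example; the fulltxt field is left out for brevity, and the names JSON_SAMPLE and summarize are hypothetical.

```python
import json

# Shortened copy of the example output above (the "fulltxt" field is omitted).
JSON_SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for each entry in text_results."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]
```

Note that the weight arrives as a string and the titles carry a leading space in the example, so the sketch converts and trims them before use. The escaped slashes in the URLs are decoded by the JSON parser.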


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
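The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, not part of Sphider-plus: it assumes the date is encoded as YYMMDD (as suggested by "100524 - Date (May 24, 2010)") and that the last item is the thread number or a user-defined ID. The function name parse_log_name is hypothetical.

```python
import re
from datetime import datetime

# Pattern for names like db2_100524-21.47.56_1.html or db2_100524-21.47.56_ID1.html.
# Assumption: date is YYMMDD, time is HH.MM.SS, last item is the thread number or ID.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

Such a helper could be used, for example, to sort the files in /admin/log/ by database and start time before reviewing them.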
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
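The behaviour of the sphider_noindex markers (text between them is skipped, links inside them are still followed) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation: the function name split_page and both regular expressions are assumptions, and the marker pattern deliberately accepts both the spelling shown in this listing and an HTML comment form.

```python
import re

# Matches "<! sphider_noindex > ... <! /sphider_noindex >" as printed in this listing
# and, as an assumption, the HTML comment spelling "<!--sphider_noindex-->".
NOINDEX = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)
# Simplified link extraction; a real crawler would use an HTML parser.
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are collected from the whole page
    indexable = NOINDEX.sub(" ", html)  # marked regions are dropped from the index text
    return indexable, links

if __name__ == "__main__":
    page = ('Intro <! sphider_noindex ><a href="/menu.html">Menu</a>'
            '<! /sphider_noindex > Body')
    print(split_page(page))
```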
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
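The list-file convention described above (plain names, or a regexp introduced by */ and ended with another slash) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Sphider-plus code: the function names are hypothetical, whitespace around the pattern is assumed to be ignored, and a regexp entry is assumed to have to match the whole name. Matching is case-sensitive, as the text states for /include/common/elements_not.txt.

```python
import re

def compile_entries(lines):
    """Turn list-file lines into matcher functions (literal name or regexp)."""
    matchers = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
            # Regexp entry: strip the leading "*/" and the trailing "/".
            pattern = re.compile(entry[2:-1].strip())
            matchers.append(lambda name, p=pattern: p.fullmatch(name) is not None)
        else:
            # Literal entry: exact, case-sensitive comparison.
            matchers.append(lambda name, e=entry: name == e)
    return matchers

def is_listed(name, matchers):
    """True if the element name or div id is covered by any list entry."""
    return any(match(name) for match in matchers)

if __name__ == "__main__":
    rules = compile_entries(["footer", "*/nav[0-5]/", "*/ menu[0-5] /"])
    for candidate in ("nav3", "nav7", "Footer", "menu0"):
        print(candidate, is_listed(candidate, rules))
```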
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
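What such a conversion does can be shown with a few lines of Python. The sketch below uses Python's built-in codecs rather than the ConvertCharset function, so it only illustrates the principle; the function name to_utf8 is hypothetical, and not every charset in the list above necessarily has a Python codec of the same name.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode bytes from a legacy charset into UTF-8.

    Raises LookupError for an unknown charset name and
    UnicodeDecodeError if the bytes are not valid in that charset.
    """
    return raw.decode(source_charset).encode("utf-8")

if __name__ == "__main__":
    # The same Russian word stored in two legacy charsets from the list above.
    word = "Привет"
    for charset in ("windows-1251", "koi8-r"):
        legacy = word.encode(charset)
        print(charset, legacy, to_utf8(legacy, charset).decode("utf-8"))
```

The byte sequences differ per charset, but after conversion both yield the same UTF-8 text, which is what allows a single index to serve pages in different encodings.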
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases that are correctly configured and assigned and that have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
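A client can read this JSON output with any standard JSON parser. The following Python sketch is an illustration only: it uses a shortened sample that keeps just a few of the keys visible in the output above ("query", "total_results", "text_results", "num", "weight", "url"), and the function name list_hits is hypothetical. Note that "weight" arrives as a string and is converted to a number here.

```python
import json

# Shortened sample using only keys that appear in the output above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf"},'
    r'{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html"}'
    r']}'
)

def list_hits(payload):
    """Return (position, weight, url) for every text result in the JSON payload."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"])
        for hit in data.get("text_results", [])
    ]

if __name__ == "__main__":
    for num, weight, url in list_hits(SAMPLE):
        print(f"{num}. {url} (weight {weight})")
```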




5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
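As an illustration of the rule "excluded text is not indexed, but its links are still followed", here is a minimal Python sketch. It is not the Sphider-plus implementation. The names NOINDEX, HREF and split_for_indexing are hypothetical, and the pattern deliberately tolerates both the tag spelling shown above and an HTML comment form (<!--sphider_noindex-->), because the exact spelling is an assumption.

```python
import re

# Matches an opening marker, the excluded content, and the closing marker.
NOINDEX = re.compile(
    r"<!(?:--)?\s*sphider_noindex\s*(?:--)?>.*?<!(?:--)?\s*/sphider_noindex\s*(?:--)?>",
    re.DOTALL | re.IGNORECASE,
)
# Very simple link pattern, sufficient for this sketch only.
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)

def split_for_indexing(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)      # links are collected from the whole page
    text = NOINDEX.sub(" ", html)   # excluded parts are dropped before indexing
    return text, links
```

Collecting the links before removing the excluded parts is what keeps a menu or footer crawlable while its words stay out of the index.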
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
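The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) could be interpreted as in the following Python sketch. The helper names compile_entry and is_listed are hypothetical, and matching the whole name rather than a substring is an assumption, not documented behavior.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the trailing slash.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the id or element name matches any entry of the list."""
    return any(compile_entry(e).fullmatch(name) for e in entries)
```

With the entries "footer" and "*/nav[0-5]/", the names footer and nav3 are listed, while nav7 and Footer are not.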
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
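One common way for a page to announce its canonical URL is a <link rel="canonical"> element in the page head. The Python sketch below assumes that mechanism; the excerpt above does not show how Sphider-plus detects it, and the names CanonicalFinder and canonical_url are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")

def canonical_url(html, fetched_url):
    """Return the declared canonical URL, or the fetched URL if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or fetched_url
```

An indexer using such a helper would store a page under the returned URL, so that the tracking and session variants collapse into one entry.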
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
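The conversion step itself can be pictured with a short Python sketch. It uses Python's built-in codecs rather than the ConvertCharset function named above, and the helper name to_utf8 is hypothetical. Python's codec set does not cover every charset in this list.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the page's declared charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# The Russian word for "hello", as stored in a windows-1251 encoded page.
cyrillic_page = b"\xcf\xf0\xe8\xe2\xe5\xf2"
utf8_bytes = to_utf8(cyrillic_page, "windows-1251")
```

Decoding with the wrong source charset does not necessarily raise an error. It often just produces wrong characters, which is why the charset of each page has to be known before conversion.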
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
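A client could consume such JSON output as in the following Python sketch. The SAMPLE string is an abridged copy of the output above, reduced to a subset of its keys with the truncated "fulltxt" extracts left out. The function name summarize is hypothetical.

```python
import json

# Abridged sample using only keys visible in the output above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(payload):
    """Return (rank, weight, title, url) for every text result in the payload."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data["text_results"]
    ]
```

The weights arrive as strings and the titles carry a leading blank, so the sketch converts and trims them. The escaped slashes in the URLs are resolved by the JSON parser.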


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not shown during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non-ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - Sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
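The naming scheme above can be illustrated with a short Python sketch that splits a log file name into its parts. The helper name and the regular expression are assumptions made for this example and are not part of Sphider-plus. The date is assumed to be in YYMMDD order, as suggested by 100524 = May 24, 2010, and the final item is assumed to be the thread number or thread ID.

```python
import re
from datetime import datetime

# Hypothetical pattern for names such as db2_100524-21.47.56_1.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("unexpected log file name: " + name)
    db, date, time, thread = match.groups()
    # Assumption: date is YYMMDD and time is HH.MM.SS
    started = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": thread}


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The same pattern also accepts names carrying a user-defined ID, such as db2_100524-21.47.56_ID1.html.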
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen to suit personal requirements, but the file-name limitations of the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
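The MD5 check mentioned above can be pictured with the following Python sketch. It only illustrates the general idea of comparing a stored checksum with a freshly computed one. The function names are hypothetical, and how Sphider-plus stores and compares the sums internally is not shown in this excerpt.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()


def has_changed(content: bytes, stored_md5: str) -> bool:
    """True if the page content no longer matches the stored checksum."""
    return md5sum(content) != stored_md5


if __name__ == "__main__":
    old_sum = md5sum(b"<html>old page</html>")
    # An unchanged page can be skipped during a fast re-index.
    print(has_changed(b"<html>old page</html>", old_sum))
    print(has_changed(b"<html>new page</html>", old_sum))
```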
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
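As a rough sketch of the noindex behaviour described in 4.4, the following Python function drops everything between the two marker tags before the text is indexed. It assumes the markers are HTML comments, written either as shown in the snippet or with leading and trailing dashes. The function name is hypothetical and the code is not taken from Sphider-plus. Link following inside the excluded part, which the documentation says still happens, is outside the scope of this sketch.

```python
import re

# Assumption: markers look like <! sphider_noindex > or <!--sphider_noindex-->
NOINDEX = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>"
    r".*?"
    r"<!\s*-*\s*/sphider_noindex\s*-*\s*>",
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html: str) -> str:
    """Replace every noindex block with a single space."""
    return NOINDEX.sub(" ", html)


if __name__ == "__main__":
    page = "Intro<!--sphider_noindex-->Menu Footer<!--/sphider_noindex-->Body"
    print(strip_noindex(page))
```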
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp must be introduced by */ and must end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements such as section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
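The entry syntax described above (a regexp introduced by */ and ended with another slash, otherwise a plain name) could be interpreted along the following lines. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, plain entries are assumed to match literally and case-sensitively, and a full match against the element name is assumed. The actual matching rules of Sphider-plus may differ.

```python
import re


def compile_entry(entry: str):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the trailing slash.
        pattern = entry[2:-1].strip()
    else:
        # Plain entry: match the name literally.
        pattern = re.escape(entry)
    return re.compile(pattern)


def entry_matches(entry: str, name: str) -> bool:
    """True if the element or div name is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None


if __name__ == "__main__":
    print(entry_matches("*/nav[0-5]/", "nav3"))
    print(entry_matches("*/nav[0-5]/", "nav7"))
```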
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
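To illustrate how a canonical URL might be picked up from a page, the following Python sketch looks for a link element with rel="canonical" in the HTML. That the canonical URL is declared this way is an assumption, since the snippet above does not show the markup. The class and function names are hypothetical and do not come from Sphider-plus.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html: str):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = ('<head><link rel="canonical" '
            'href="http://www.example.com/product.php?item=swedish-fish"></head>')
    print(find_canonical(page))
```

An indexer could then store the duplicates under the returned URL instead of their own addresses.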
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
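The conversion of a legacy charset into UTF-8 can be illustrated in a few lines of Python. The sketch uses Python's built-in codecs rather than the ConvertCharset function named above, so it only demonstrates the principle. The function name is hypothetical, and not every charset in the list has a Python codec of the same name.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # Cyrillic text as it would arrive from a windows-1251 encoded page.
    legacy = "Привет".encode("windows-1251")
    converted = to_utf8(legacy, "windows-1251")
    print(converted.decode("utf-8"))
```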
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not shown during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non-ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
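A client consuming the JSON result output could read it as in the following Python sketch. The sample string is a shortened reconstruction that uses only field names visible in the excerpt above (the truncated fulltxt values are left out), and the function name is hypothetical.

```python
import json

# Shortened sample built from the fields shown in the excerpt above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,'
    r'"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",'
    r'"title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)


def list_text_results(payload: str):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for row in list_text_results(SAMPLE):
        print(row)
```

Note that the weight is delivered as a string and the slashes in URLs are escaped, so both are normalised while parsing.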


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
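As an illustration only, the log file names shown above can be taken apart with a short Python sketch. The function name parse_log_name and the dictionary keys are hypothetical and not part of Sphider-plus; the pattern assumes the layout db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html seen in the examples.

```python
import re
from datetime import datetime

# Layout assumed from the examples above: db2_100524-21.47.56_1.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread suffix.

    Returns None when the name does not follow the assumed layout.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return {
        "database": int(match.group(1)),
        # 100524-21.47.56 is read as year/month/day, then hour.minute.second
        "timestamp": datetime.strptime(match.group(2), "%y%m%d-%H.%M.%S"),
        # Either a plain thread number ("1") or a user-defined ID ("ID1")
        "thread": match.group(3),
    }


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_ID1.html"))
```

Such a helper might be useful when sorting the files in /admin/log/ by database or by thread.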
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
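The effect of the sphider_noindex markers on the indexed text can be sketched in Python as follows. This is not the Sphider-plus implementation: the function name strip_noindex is hypothetical, the marker spelling is copied from the text above, and the tolerance for HTML comment dashes is an assumption. The sketch only covers removal of the text; following the links inside the excluded part is not shown.

```python
import re

# Marker spelling follows the text above; optional dashes and extra
# whitespace are tolerated as an assumption.
NOINDEX = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>"    # opening marker
    r".*?"                                    # excluded content (non-greedy)
    r"<!\s*-*\s*/sphider_noindex\s*-*\s*>",  # closing marker
    re.DOTALL | re.IGNORECASE,
)


def strip_noindex(html):
    """Return the page text with every marked section removed."""
    return NOINDEX.sub("", html)


if __name__ == "__main__":
    page = "Intro <! sphider_noindex >Menu Home About<! /sphider_noindex > Body"
    print(strip_noindex(page))
```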
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
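How a list entry written as a regexp might be told apart from a plain entry can be sketched in Python. The helper names compile_entry and id_matches are hypothetical, and several points are assumptions rather than documented behavior: blanks around the pattern (as in the example */ menu[0-5] /) are ignored, plain entries are compared literally, and the whole id must match.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern.

    Entries introduced by */ and ended with another slash are treated as
    regexps; everything else is matched literally.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Drop the leading */ and the trailing /, then the surrounding blanks
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def id_matches(entry, div_id):
    """Check whether a div id is covered by a list-file entry."""
    return compile_entry(entry).fullmatch(div_id) is not None


if __name__ == "__main__":
    for candidate in ("menu0", "menu5", "menu7"):
        print(candidate, id_matches("*/ menu[0-5] /", candidate))
```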
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
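A client consuming the JSON output could read it roughly as in the following Python sketch. The field names (text_results, num, weight, url, title) are taken from the sample above; the sample string is shortened, and the function name summarize is hypothetical.

```python
import json

# Shortened copy of the sample output above (long text fields omitted)
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}'''


def summarize(payload):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # weight arrives as a string, so it is converted for sorting or filtering
        (row["num"], float(row["weight"]), row["url"], row["title"].strip())
        for row in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for num, weight, url, title in summarize(SAMPLE):
        print(num, weight, url, title)
```

Note that the escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by the parser.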


. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
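The naming scheme described above can be illustrated with a minimal sketch. It assumes log names of the form db2_100524-21.47.56_1.html (database number, date as YYMMDD, time, thread number or ID); the function name parse_log_name and the returned keys are hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# Assumed layout: db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread ID.

    Returns None when the name does not follow the assumed layout.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    db, date, time, thread = match.groups()
    # The documentation reads 100524 as May 24, 2010, i.e. YYMMDD.
    timestamp = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "timestamp": timestamp, "thread": thread}


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The thread part is matched with \w+ so that both plain thread numbers and user-defined IDs such as ID1 are accepted.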
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
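The MD5-based change check mentioned above could look roughly like the following sketch. It only illustrates the general idea of comparing a stored checksum with the checksum of freshly fetched content; the function names md5sum and has_changed are hypothetical, and how Sphider-plus stores and compares its checksums is not shown in this excerpt.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()


def has_changed(stored_md5: str, new_content: bytes) -> bool:
    """True when the fetched content differs from what was indexed before."""
    return md5sum(new_content) != stored_md5


if __name__ == "__main__":
    old = md5sum(b"<html>first version</html>")
    print(has_changed(old, b"<html>first version</html>"))   # unchanged page
    print(has_changed(old, b"<html>second version</html>"))  # modified page
```

A page whose checksum is unchanged can be skipped, which is what makes the re-index procedure fast.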
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
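The behaviour described for the noindex tags (content skipped, links still followed) can be sketched as below. This is an illustration only, not the Sphider-plus implementation: the tag spelling is taken from this excerpt, the pattern additionally tolerates the HTML comment form as an assumption, and the function name split_page is hypothetical.

```python
import re

# Matches "<! sphider_noindex > ... <! /sphider_noindex >" as printed in this
# excerpt; the optional "--" also accepts an HTML comment spelling (assumption).
NOINDEX = re.compile(
    r"<!\s*(?:--)?\s*sphider_noindex\s*(?:--)?\s*>"
    r".*?"
    r"<!\s*(?:--)?\s*/sphider_noindex\s*(?:--)?\s*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)


def split_page(html):
    """Return (indexable_html, links).

    Links are collected from the whole page, so links inside a noindex
    block are still followed; only the text between the tags is dropped.
    """
    links = HREF.findall(html)
    indexable = NOINDEX.sub(" ", html)
    return indexable, links


if __name__ == "__main__":
    page = ('<p>Article text</p><! sphider_noindex >'
            '<a href="/menu.html">Menu</a><! /sphider_noindex >')
    print(split_page(page))
```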
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
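How a list-file entry with the */ ... / notation might be interpreted can be sketched as follows. The sketch assumes that an entry starting with */ and ending with a slash carries a regular expression, that any other entry is a literal name, and that the whole name must match; the function names compile_entry and matches are hypothetical, and the real matching rules of Sphider-plus may differ.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the trailing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the literal text only.
    return re.compile(re.escape(entry))


def matches(entry, name):
    """True when the element or id name is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None


if __name__ == "__main__":
    print(matches("*/nav[0-5]/", "nav3"))
    print(matches("*/nav[0-5]/", "nav7"))
    print(matches("aside", "Aside"))
```

No re.IGNORECASE flag is set, which mirrors the note that element names are processed case-sensitive.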
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
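A client consuming the JSON output could read it along the following lines. This is a minimal sketch: the sample string is a shortened copy of the excerpt above (the fulltxt, page_size and other fields are left out), and the function name summarize is hypothetical.

```python
import json

# Shortened sample based on the excerpt above; only a few fields are kept.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0",
  "url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng"},
 {"num":2,"weight":"50.0",
  "url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description"}]}'''


def summarize(raw):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(raw)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

The escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by the parser. The weight is delivered as a string, so it is converted to a number before use.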


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2100524-2147561html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
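The naming scheme above (database number, date, time, thread number or ID) can be taken apart again when log files are post-processed. The following is only a minimal sketch in Python, not part of Sphider-plus; the function name parse_log_name and the returned keys are hypothetical, and it assumes the date is written as YYMMDD and the time as HH.MM.SS, as in the example names.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html or
# db2_100524-21.47.56_ID1.html (layout assumed from the examples above).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    # Date is assumed to be YYMMDD, time HH.MM.SS.
    stamp = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {"db": int(match["db"]), "time": stamp, "thread": match["thread"]}


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The thread part is kept as a string so that both plain thread numbers and personal IDs like ID1 are accepted.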
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
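The behaviour described in 4.4 (text between the sphider_noindex tags is skipped, links inside are still followed) can be illustrated with a short sketch. This is not the Sphider-plus implementation; it is a Python illustration that assumes the tags appear exactly as printed above, and the names NOINDEX, HREF and split_page are hypothetical.

```python
import re

# Tag spelling is taken from the text above; the real markup may differ.
NOINDEX = re.compile(
    r"<!\s*sphider_noindex\s*>.*?<!\s*/sphider_noindex\s*>", re.DOTALL
)
# Deliberately simple link pattern, sufficient for this illustration only.
HREF = re.compile(r'href="([^"]+)"')


def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are collected from the whole page
    indexable = NOINDEX.sub("", html)   # excluded blocks are dropped from the text
    return indexable, links


if __name__ == "__main__":
    page = (
        'Intro <! sphider_noindex ><a href="/menu.html">Menu</a>'
        "<! /sphider_noindex > Body"
    )
    print(split_page(page))
```

Collecting the links before removing the excluded blocks is what keeps them available for following.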
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
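What a conversion from one of the listed charsets into UTF-8 amounts to can be shown in a few lines. The sketch below uses Python's built-in codecs and not the ConvertCharset function named above; the function name to_utf8 is hypothetical, and not every charset in the list is necessarily known to Python under the same name.

```python
def to_utf8(raw, source_charset):
    """Re-encode bytes from a legacy charset (e.g. windows-1251) as UTF-8."""
    # Decode with the declared source charset, then encode as UTF-8.
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    legacy = "Привет".encode("windows-1251")   # 6 bytes, one per letter
    converted = to_utf8(legacy, "windows-1251")
    print(len(legacy), len(converted))
```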
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
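A client receiving the JSON result output shown above could read it roughly as follows. This is a hedged sketch in Python: the field names (query, total_results, text_results, num, weight, url, title) are taken from the sample, the payload is shortened and leaves out the elided fulltxt parts, and the function name summarize is hypothetical.

```python
import json

# Shortened sample using only fields visible in the output above.
SAMPLE = """
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
   {"num": 1, "weight": "100.0",
    "url": "http://www.english.le-piaggie.info/Info_eng.pdf",
    "title": " Info_eng"},
   {"num": 2, "weight": "50.0",
    "url": "http://www.english.le-piaggie.info/html/description.html",
    "title": " Description"}
 ]}
"""


def summarize(payload):
    """Return the query, the total hit count and one tuple per text result."""
    data = json.loads(payload)
    rows = []
    for hit in data.get("text_results", []):
        # weight is delivered as a string, so it is converted for sorting or maths
        rows.append((hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"]))
    return data["query"], data["total_results"], rows


if __name__ == "__main__":
    query, total, rows = summarize(SAMPLE)
    print(query, total)
    for row in rows:
        print(row)
```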


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2100524-214812ID2html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
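Converting text from one of the listed charsets into UTF-8 can be illustrated with Python's built-in codecs. This is a sketch of the conversion step only and does not use the ConvertCharset function named above; `to_utf8` is a hypothetical name, and not every charset in the list is necessarily known to Python (an unknown name raises LookupError).

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# The same Cyrillic word stored in three of the listed charsets.
word = "Привет"
for charset in ("windows-1251", "cp866", "koi8-r"):
    legacy_bytes = word.encode(charset)
    assert to_utf8(legacy_bytes, charset).decode("utf-8") == word
print("all conversions round-trip")
```

The byte sequences differ per charset, but after conversion all of them yield the same UTF-8 text, which is the point of normalising before indexing.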
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
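A consumer of the JSON output quoted above might read it as follows. This is a minimal sketch: the field names (`text_results`, `num`, `weight`, `url`, `title`) are taken from the sample, while the sample string itself is shortened (the `fulltxt` extracts and several top-level fields are left out) and `summarize` is a hypothetical helper.

```python
import json

# Shortened copy of the quoted output; "\/" is a valid JSON escape for "/".
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,
"from":1,"to":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
 "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
 "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize(payload: str):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data["text_results"]
    ]


for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that `weight` arrives as a string in the sample and is converted to a number here, and that titles carry a leading space that is stripped.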


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management presents only those databases that are correctly configured and assigned and that have a set of installed tables, as described in chapter 'Definition and configuration'. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
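A minimal sketch of how a client might read such media results, using Python's standard xml.etree.ElementTree. The element names (media_result, num, type, url, link, title, x_size, y_size) are taken from the fragment above. The enclosing <results> root and the num/type values of the first entry are placeholders assumed for illustration, because those parts are elided in the fragment.

```python
import xml.etree.ElementTree as ET

# Sample built from the fragment above. The <results> wrapper and the
# <num>/<type> values of the first entry are assumptions for illustration.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</results>"""


def read_media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("media_result"):
        results.append({child.tag: (child.text or "").strip() for child in node})
    return results


for item in read_media_results(SAMPLE):
    # Not every entry carries every field, so use .get() for optional ones.
    print(item["num"], item["type"], item.get("title", "(no title)"))
```

Reading each entry into a plain dict keeps the sketch tolerant of entries that omit fields such as x_size and y_size.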
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
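A short sketch of how the JSON output shown above could be consumed, using Python's standard json module. The field names are the ones visible in the sample. The "fulltxt" values are replaced by placeholders here, and only a subset of the top-level fields is included. Note that "weight" arrives as a string and "title" carries a leading space in the sample, so the sketch normalises both.

```python
import json

# Shortened sample using the field names from the output above.
# The "fulltxt" values are placeholders, not the original text extracts.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","fulltxt":"...","page_size":"5,661.6kb",
  "domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","fulltxt":"...","page_size":"26.4 kb",
  "domain_name":"www.english.le-piaggie.info"}
]}'''


def summarize(payload):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(payload)
    return [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data["text_results"]
    ]


for num, weight, url, title in summarize(SAMPLE):
    print(f"{num}. {title} ({weight:.1f}) {url}")
```

The escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by json.loads.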


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen according to personal requirements, but the operating system's limitations on file names should be taken . . .
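The naming scheme above can be taken apart mechanically. The following is a minimal sketch in Python, assuming only the pattern visible in the examples: db number, six-digit date (YYMMDD), time with dots, and a trailing thread suffix before .html. The function name and the dictionary keys are hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# Matches names such as "db2_100524-21.47.56_1.html".
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    # "100524" is read as YYMMDD, i.e. May 24, 2010, as stated in the text.
    timestamp = datetime.strptime(
        match["date"] + "-" + match["time"], "%y%m%d-%H.%M.%S"
    )
    return {
        "db": int(match["db"]),
        "timestamp": timestamp,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["timestamp"].isoformat(), info["thread"])
# prints: 2 2010-05-24T21:47:56 1
```

Because the thread suffix is matched as any text without a dot, the same sketch also accepts suffixes that are not plain numbers.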
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
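The behaviour of the noindex tags described in 4.4 (text between the tags is skipped, links are still followed) can be illustrated with a small Python sketch. This is not the Sphider-plus implementation. The regular expressions are assumptions of this example and are written to accept both the tag form printed above and an HTML comment form with dashes.

```python
import re

# Assumed tag patterns: tolerate "<! sphider_noindex >" as well as
# "<!--sphider_noindex-->" (and the matching closing tags).
OPEN_TAG = r"<!-*\s*sphider_noindex\s*-*>"
CLOSE_TAG = r"<!-*\s*/sphider_noindex\s*-*>"
NOINDEX = re.compile(OPEN_TAG + r".*?" + CLOSE_TAG, re.DOTALL | re.IGNORECASE)
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def split_page(html):
    """Return (links, indexable_html) for one page.

    Links are collected from the whole page, so links inside an excluded
    part are still followed; the excluded part itself is removed from the
    text that would be indexed.
    """
    links = HREF.findall(html)
    indexable = NOINDEX.sub(" ", html)
    return links, indexable


page = (
    '<p>Intro</p>'
    '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
    '<p>Body</p>'
)
links, text = split_page(page)
print(links)
print(text)
```

Collecting the links before stripping the excluded block is what keeps navigation menus crawlable while their keywords stay out of the index.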
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
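How list-file entries with the */ ... / notation might be interpreted can be sketched as follows. This is an illustration in Python, not the Sphider-plus code. It assumes that entries without the */ prefix are matched literally and that an entry must match the whole id or element name. Both points are assumptions of this example. Matching is case-sensitive, in line with the note above.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a compiled pattern.

    Lines of the form "*/ ... /" are treated as regular expressions,
    any other line is matched literally (an assumption of this sketch).
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Drop the leading "*/" and the trailing "/", then surrounding blanks.
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def matches_any(name, entries):
    """Return True if name fully matches at least one list entry."""
    return any(compile_entry(e).fullmatch(name) for e in entries)


entries = ["footer", "*/ menu[0-5] /", "*/nav[0-5]/"]
print(matches_any("menu3", entries))   # True
print(matches_any("menu7", entries))   # False
print(matches_any("Footer", entries))  # False, matching is case-sensitive
```

With the entries shown, ids menu0 to menu5 and names nav0 to nav5 match through the patterns, while "footer" matches only as an exact string.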
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http//wwwexamplecom/productphpitem=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
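The naming scheme of the log files can be taken apart again with a few lines of Python. This is only a sketch under the assumption that the last item of the name is the thread number or the user-defined ID, as suggested by the examples above; the names LogName and parse_log_name are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class LogName(NamedTuple):
    database: int      # number of the database, e.g. 2 for "db2"
    started: datetime  # date and time encoded in the file name
    thread: str        # thread number or user-defined ID, e.g. "1" or "ID1"


# db<number>_<yymmdd>-<hh.mm.ss>_<thread or ID>.html
_LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    database, stamp, thread = match.groups()
    started = datetime.strptime(stamp, "%y%m%d-%H.%M.%S")
    return LogName(int(database), started, thread)


first = parse_log_name("db2_100524-21.47.56_1.html")
second = parse_log_name("db2_100524-21.48.12_ID2.html")
```

Both the plain thread numbers and the ID variant are accepted, because the last item is read as free text up to the .html suffix.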
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
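Starting several threads by hand could also be scripted. The following Python sketch only builds the command lines in the form shown above (option name directly followed by the individual ID) and, optionally, starts them as parallel processes. The function names and the default paths "php" and "spider.php" are assumptions; the calls need to be issued from the folder that holds spider.php.

```python
import subprocess


def build_commands(option, thread_ids, php="php", script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-erased1']."""
    return [[php, script, "-{}{}".format(option, tid)] for tid in thread_ids]


def launch(commands):
    """Start every command as its own process and return the process handles."""
    return [subprocess.Popen(command) for command in commands]


commands = build_commands("erased", [1, 2, 3])
# processes = launch(commands)          # uncomment on a host with PHP installed
# for process in processes:
#     process.wait()
```

Keeping the construction of the command lines separate from the launch makes it easy to print and check the calls before any indexing is started.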
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
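The behaviour described in 4.4 (content between the tags is not indexed, while links inside are still followed) might be sketched in Python like this. It is not the code used by Sphider-plus. The tag spelling is tolerant on purpose, accepting both the form printed above and an HTML comment form with dashes, which is an assumption; the simple href pattern is an assumption as well.

```python
import re

# Matches "<! sphider_noindex > ... <! /sphider_noindex >", optionally
# written as HTML comments with leading and trailing dashes.
_NOINDEX = re.compile(
    r"<!\s*-*\s*sphider_noindex\s*-*\s*>.*?<!\s*-*\s*/sphider_noindex\s*-*\s*>",
    re.DOTALL,
)
_HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def split_for_indexing(html):
    """Return (text to index, links to follow) for one page."""
    links = _HREF.findall(html)           # links are taken from the complete page
    indexable = _NOINDEX.sub(" ", html)   # excluded parts are dropped before indexing
    return indexable, links


page = (
    "<p>Body text</p>"
    "<! sphider_noindex ><a href=\"/menu.html\">Menu</a><! /sphider_noindex >"
    "<p>More text</p>"
)
indexable, links = split_for_indexing(page)
```

The links are collected before the excluded parts are removed, so a menu inside the tags still leads the spider to further pages without its words entering the index.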
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
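How an entry of such a list-file could be interpreted is sketched below in Python. An entry introduced by */ and ended with another slash is treated as a regexp, every other entry as a literal name. Ignoring blanks around the pattern, matching the complete name, and the function names are assumptions of this sketch and not confirmed behaviour of Sphider-plus. The comparison is case-sensitive, as stated for /include/common/elements_not.txt.

```python
import re


def compile_entry(entry):
    """Turn one line of a list-file into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the closing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """True if the id or element name is covered by any entry of the list."""
    return any(
        compile_entry(entry).fullmatch(name) is not None
        for entry in entries
        if entry.strip()
    )


entries = ["*/ menu[0-5] /", "*/nav[0-5]/", "footer"]
```

With these entries, is_listed("menu3", entries) and is_listed("footer", entries) are true, while "menu7" and "Footer" are not covered.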
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
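The principle of the conversion into UTF-8 can be illustrated in Python. The sketch relies on the codecs shipped with Python and not on the ConvertCharset function, so some names of the list above (for example mazovia or gsm0338) are not available here; the function name to_utf8 is hypothetical.

```python
def to_utf8(raw, charset):
    """Decode bytes from the given charset and return them encoded as UTF-8.

    Raises LookupError for a charset unknown to Python and
    UnicodeDecodeError for bytes that are invalid in that charset.
    """
    text = raw.decode(charset)
    return text.encode("utf-8")


cyrillic = to_utf8("Привет".encode("windows-1251"), "windows-1251")
latin = to_utf8(b"M\xfcller", "iso-8859-1")
```

A caller would usually catch both exceptions and skip or log the page instead of storing wrongly decoded text.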
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non iso-8859-1 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
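The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the function name parse_log_name and the regular expression are assumptions made for illustration, and the date is read as YYMMDD because the text gives 100524 as May 24, 2010.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html (format taken from the text above).
# The last group accepts any word characters so that suffixes like "ID1" also match.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<stamp>\d{6}-\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    # 100524-21.47.56 is read as year-month-day, then hour.minute.second.
    stamp = datetime.strptime(match.group("stamp"), "%y%m%d-%H.%M.%S")
    return int(match.group("db")), stamp, match.group("thread")

db_number, stamp, thread = parse_log_name("db2_100524-21.47.56_1.html")
```

Sorting the log files of one index run by the returned timestamp and thread suffix is one possible use.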
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
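As a rough illustration of the command pattern shown above, the argument lists for several threads can be generated instead of typed by hand. This Python sketch only builds the lists; the helper name thread_commands is hypothetical, and it assumes the ID is appended directly to the option, as in -new1 and -new2.

```python
def thread_commands(option, ids):
    """Return one argument list per thread, e.g. ["php", "spider.php", "-new1"]."""
    return [["php", "spider.php", f"-{option}{thread_id}"] for thread_id in ids]

# Two threads for the 'new' option, matching the commands quoted in the text.
commands = thread_commands("new", [1, 2])
```

Each list could then be handed to subprocess.Popen so that the threads run in parallel.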
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
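The marker rule from section 4.4 can be sketched in a few lines of Python. This is an illustration only, not the Sphider-plus implementation: the function name strip_noindex is hypothetical, and the pattern assumes the markers appear as written in the text above, optionally with comment dashes and flexible whitespace.

```python
import re

# Assumption: markers are matched as shown in the text, optionally written
# with comment dashes and with flexible whitespace around the name.
NOINDEX_BLOCK = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL,
)

def strip_noindex(html):
    """Remove every region enclosed by the sphider_noindex markers."""
    return NOINDEX_BLOCK.sub(" ", html)

page = (
    "<p>Intro</p><! sphider_noindex ><ul><li>Menu</li></ul>"
    "<! /sphider_noindex ><p>Body</p>"
)
indexable = strip_noindex(page)
```

Because links inside the excluded region are still followed, link extraction would have to run on the original page rather than on the stripped text.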
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
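The list-file convention described in sections 4.5 to 4.7 (plain entries, or regexp entries introduced by */ and closed by a slash) might be interpreted roughly as follows. This Python sketch is an assumption-laden illustration: the helper names compile_entry and is_listed are hypothetical, whitespace around the pattern is ignored, and a full match against the id or element name is assumed.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the closing slash.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the text literally.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """Return True if name fully matches any entry of the list."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)

entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
```

With these entries, nav3 and menu2 are matched, while nav7 and Nav3 are not, the latter because matching is case-sensitive.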
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
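The conversion step itself, from a declared source charset into UTF-8, can be illustrated with Python's built-in codecs. This sketch does not use the ConvertCharset function named above, and the helper name to_utf8 is hypothetical; charset names from the list that Python does not know would raise a LookupError.

```python
def to_utf8(raw, charset):
    """Decode bytes from the declared charset and re-encode them as UTF-8."""
    # Undecodable bytes are replaced instead of aborting the conversion.
    return raw.decode(charset, errors="replace").encode("utf-8")

# A Cyrillic word stored as windows-1251 bytes, converted to UTF-8.
cyrillic = "Привет".encode("windows-1251")
utf8 = to_utf8(cyrillic, "windows-1251")
```

After this step, indexing and search can work on a single encoding regardless of how the page was delivered.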
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters at the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
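A client consuming the JSON output might read it along these lines. This is a minimal Python sketch: the field names (query, total_results, text_results, num, weight, url, title) are taken from the excerpt above, while the sample values and example.com URLs are invented for illustration.

```python
import json

# Shortened sample in the shape of the excerpt above; values are made up.
sample = r'''{"query": "colour", "total_results": 2, "text_results": [
  {"num": 1, "weight": "100.0", "url": "http:\/\/www.example.com\/a.pdf", "title": " A"},
  {"num": 2, "weight": "50.0", "url": "http:\/\/www.example.com\/b.html", "title": " B"}]}'''

data = json.loads(sample)

# Weights arrive as strings and titles carry a leading blank, so normalize both.
hits = [
    (item["num"], float(item["weight"]), item["url"], item["title"].strip())
    for item in data["text_results"]
]
```

The escaped slashes in the URLs are valid JSON and are decoded to plain slashes by the parser.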


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
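The entry format for regexp patterns (introduced by */ and ended with another slash) can be illustrated with a small Python sketch. This is not the Sphider-plus implementation. It assumes that entries without the */ prefix are literal ids, that spaces around the pattern are ignored, and that a pattern must match the whole id. The helper names `compile_entry` and `id_matches` are hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern.

    Entries of the form */ pattern / are treated as regular expressions;
    anything else is treated as a literal id (assumption).
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def id_matches(div_id, entries):
    """Return True if the id is covered by any entry of the list."""
    return any(compile_entry(e).fullmatch(div_id) for e in entries)


if __name__ == "__main__":
    entries = ["footer", "*/ menu[0-5] /"]
    for candidate in ("menu3", "menu7", "footer"):
        print(candidate, id_matches(candidate, entries))
```

With the example entry from the text, ids menu0 to menu5 are covered, while menu7 is not.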
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
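As an illustration of what converting such a charset into UTF-8 means, here is a minimal Python sketch. It uses Python's built-in codecs, not the ConvertCharset function named above, and the function name `to_utf8` is hypothetical. Python knows many of the listed names (for example windows-1251, koi8-r, cp866, iso-8859-2), but not all of them; it has no built-in codec for mazovia or gsm0338.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the given charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # Cyrillic sample text as it would arrive from a windows-1251 page.
    raw = "Привет".encode("windows-1251")
    print(to_utf8(raw, "windows-1251").decode("utf-8"))
```

Decoding with the wrong source charset usually does not raise an error for single-byte encodings; it silently produces wrong characters, which is why the charset of the page has to be known or detected first.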
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
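A consumer of the JSON result output could read it as sketched below. This is a minimal Python example, not part of Sphider-plus. The field names (total_results, text_results, num, weight, url, title) are taken from the excerpt above. The sample payload is a shortened, hypothetical stand-in with example.com URLs, and the function name `summarize` is an assumption.

```python
import json

# Hypothetical, shortened payload using the field names from the excerpt.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.example.com\/a.pdf","title":" A"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.example.com\/b.html","title":" B"}]}'''


def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    rows = []
    for item in data.get("text_results", []):
        rows.append(
            (
                item["num"],
                float(item["weight"]),  # weight is delivered as a string
                item["url"],            # json.loads turns \/ back into /
                item["title"].strip(),  # titles carry a leading space
            )
        )
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Note that the weight arrives as a string and the slashes in URLs are escaped, so a standard JSON parser is preferable to hand-written string handling.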


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
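The naming scheme described above can be taken apart mechanically. The following is a minimal sketch in Python, not part of Sphider-plus: the function name `parse_log_name` and the returned dictionary keys are hypothetical, and it assumes the date part is always `yymmdd` and the time part `hh.mm.ss`, as in the two example names.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html
# (assumed layout: db<number>_<yymmdd>-<hh.mm.ss>_<thread>.html).
LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("unexpected log file name: " + name)
    db_number, stamp, thread = match.groups()
    return {
        "db": int(db_number),
        "started": datetime.strptime(stamp, "%y%m%d-%H.%M.%S"),
        "thread": thread,
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["started"].isoformat(), info["thread"])
```

The thread part is kept as a string so that names carrying a user-defined ID instead of a plain number are also accepted.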
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
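The entry format described above (a plain element name, or a regexp introduced by */ and closed by a slash) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation: the helper names `parse_entry` and `matches` are hypothetical, and it is an assumption that a regexp entry may match anywhere in the element name rather than having to cover it completely.

```python
import re


def parse_entry(line):
    """Classify one list-file entry as a regexp or a literal element name."""
    entry = line.strip()
    # A regexp entry is introduced by "*/" and ended with another slash.
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return ("regexp", entry[2:-1].strip())
    return ("literal", entry)


def matches(line, element_name):
    """Case-sensitive test of an element name against one list entry."""
    kind, value = parse_entry(line)
    if kind == "regexp":
        return re.search(value, element_name) is not None
    return value == element_name


print(parse_entry("*/nav[0-5]/"))
print(matches("*/nav[0-5]/", "nav3"), matches("*/nav[0-5]/", "Nav3"))
```

No case-folding flag is passed to the regexp, which mirrors the note that entries are processed case-sensitive.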
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
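A client could consume the JSON output shown above along the following lines. This is a minimal sketch in Python under stated assumptions: the sample string is a shortened stand-in for the real response (the `fulltxt` extracts and several top-level fields are omitted), and the function name `summarize` is hypothetical. The field names are the ones visible in the output above.

```python
import json

# Shortened sample response; field names as in the output above.
SAMPLE = r'''
{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
 "text_results":[
  {"num":1,"weight":"100.0",
   "url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title":" Info_eng","page_size":"5,661.6kb",
   "domain_name":"www.english.le-piaggie.info"},
  {"num":2,"weight":"50.0",
   "url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title":" Description","page_size":"26.4 kb",
   "domain_name":"www.english.le-piaggie.info"}]}
'''


def summarize(raw):
    """Return (weight, title, url) for each text result in the response."""
    data = json.loads(raw)
    return [
        (float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data["text_results"]
    ]


for weight, title, url in summarize(SAMPLE):
    print(weight, title, url)
```

The weight arrives as a string and the title carries a leading space, so both are normalized here; the escaped slashes in the URLs are resolved by the JSON parser itself.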


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
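The JSON result output shown above can be read with any JSON parser. The following is a minimal sketch in Python, assuming a shortened payload that reuses the field names of the sample. The `fulltxt` values are elided in the sample and are replaced here by placeholder text, and the helper name `summarize_results` is hypothetical.

```python
import json

# Shortened payload using the field names of the sample output above.
# The "fulltxt" values are placeholders, because the sample elides them.
SAMPLE = r"""
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
  {"num": 1, "weight": "100.0",
   "url": "http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title": " Info_eng", "fulltxt": "placeholder",
   "page_size": "5,661.6kb", "domain_name": "www.english.le-piaggie.info"},
  {"num": 2, "weight": "50.0",
   "url": "http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title": " Description", "fulltxt": "placeholder",
   "page_size": "26.4 kb", "domain_name": "www.english.le-piaggie.info"}
 ]}
"""


def summarize_results(payload):
    """Return (num, weight, url) for every text result in the payload."""
    data = json.loads(payload)
    # "weight" is delivered as a string in the sample, so convert it to float.
    return [(r["num"], float(r["weight"]), r["url"]) for r in data["text_results"]]


for num, weight, url in summarize_results(SAMPLE):
    print(num, weight, url)
```

Note that the escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain `/` by the parser.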


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
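The naming scheme of the log files can be taken apart again with a short script. The following is a minimal sketch in Python, assuming the date is written as YYMMDD and that the last item of the name is the thread number or ID. The function name `parse_log_name` and the regular expression are illustrative only and are not part of Sphider-plus.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html, e.g. db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<stamp>\d{6}-\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, time stamp and thread item."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    # Assumption: two-digit year, month, day, then hour.minute.second.
    started = datetime.strptime(match.group("stamp"), "%y%m%d-%H.%M.%S")
    return {
        "database": int(match.group("db")),
        "started": started,
        "thread": match.group("thread"),
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

The same pattern also accepts names that carry an individual ID in place of the thread number, such as db2_100524-21.47.56_ID1.html.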
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
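The entry convention for the list-files (a plain name, or a regexp introduced by */ and ended with another slash) can be illustrated with a small matcher. The following is a minimal sketch in Python and not the Sphider-plus implementation. It assumes that a regexp entry has to match the complete name and that surrounding blanks are ignored, neither of which is confirmed by the text above. The function name `compile_entry` is hypothetical.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a function that tests a name."""
    text = entry.strip()
    # A regexp entry is introduced by "*/" and ended with another slash.
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        pattern = re.compile(text[2:-1].strip())
        # Assumption: the whole name has to match, case-sensitively.
        return lambda name: pattern.fullmatch(name) is not None
    # Any other entry is compared literally, also case-sensitively.
    return lambda name: name == text


is_ignored = compile_entry("*/nav[0-5]/")
print(is_ignored("nav3"), is_ignored("nav7"), is_ignored("Nav3"))
```

Because no case-insensitive flag is set, the sketch follows the case-sensitive processing described for /include/common/elements_not.txt.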
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
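What a conversion from one of the listed charsets into UTF-8 means in practice can be shown in a few lines. The following is a minimal sketch in Python that uses Python's built-in codecs instead of the ConvertCharset function, so it only illustrates the principle. The helper name `to_utf8` is hypothetical, and Python's codec names may not cover every entry of the list above.

```python
def to_utf8(raw, charset):
    """Decode bytes from the given source charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")


# The same Cyrillic word stored in two different legacy charsets.
word = "Привет"
from_windows = to_utf8(word.encode("windows-1251"), "windows-1251")
from_koi8 = to_utf8(word.encode("koi8-r"), "koi8-r")

# Both conversions end in identical UTF-8 bytes.
print(from_windows == from_koi8)
print(from_windows.decode("utf-8"))
```

The byte sequences differ between windows-1251 and koi8-r, which is why the source charset has to be known before the text can be converted.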
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with an unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with an unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: To enable integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
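The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the pattern is inferred only from the two sample names shown, the date is assumed to be YYMMDD (consistent with "100524 - Date (May 24, 2010)"), and the function name parse_log_name is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample names only (assumption, not taken from the
# Sphider-plus source): db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    match = LOG_NAME.match(name)
    if match is None:
        return None  # name does not follow the assumed scheme
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {"db": int(match["db"]), "started": started, "thread": match["thread"]}

print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The thread label is kept as text so that a label such as ID1 would be accepted as well as a plain number.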
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
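Starting several indexer threads by hand can be scripted. The sketch below is only an illustration in Python under stated assumptions: it builds command lines of the form shown above (php spider.php -new1, php spider.php -new2) and starts them in parallel. The helper names thread_commands and start_threads are hypothetical, and the script is assumed to be run from the folder that holds spider.php.

```python
import subprocess

def thread_commands(option, ids, php="php", script="spider.php"):
    """Build one command line per thread, e.g. ['php', 'spider.php', '-new1']."""
    return [[php, script, "-{}{}".format(option, thread_id)] for thread_id in ids]

def start_threads(option, ids):
    """Start all indexer processes in parallel and wait until each has finished."""
    processes = [subprocess.Popen(command) for command in thread_commands(option, ids)]
    return [process.wait() for process in processes]

# Only prints the command lines; call start_threads("new", [1, 2]) to run them.
for command in thread_commands("new", [1, 2]):
    print(" ".join(command))
```

Because the ID becomes part of the log file name, it is safest to restrict IDs to characters that are valid in file names on the operating system in use.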
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
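The effect of the noindex tags can be pictured with a short Python sketch. It is not the Sphider-plus implementation. The default marker spelling used here, the HTML comments <!--sphider_noindex--> and <!--/sphider_noindex-->, is an assumption based on classic Sphider; the markers are parameters so that another spelling can be passed in. The function name strip_noindex is hypothetical.

```python
import re

def strip_noindex(html,
                  open_tag="<!--sphider_noindex-->",    # assumed marker spelling
                  close_tag="<!--/sphider_noindex-->"):  # assumed marker spelling
    """Return the page text with every marked region removed before indexing."""
    pattern = re.compile(re.escape(open_tag) + r".*?" + re.escape(close_tag), re.DOTALL)
    return pattern.sub("", html)

page = "Intro <!--sphider_noindex--><a href='/menu'>Menu</a><!--/sphider_noindex--> Content"
print(strip_noindex(page))
# Links would still be collected from the unmodified page, because links inside
# the marked region are followed even though their text is not indexed.
```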
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
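How a list entry with the */ prefix might be interpreted can be sketched as follows. This is a Python illustration under assumptions, not the Sphider-plus code: surrounding blanks are assumed to be ignored, an entry without the */ prefix is assumed to be a literal id, and the whole id is assumed to have to match. The helper names compile_entry and id_matches are hypothetical.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern."""
    entry = entry.strip()
    # Entries introduced by */ and ended with a slash are treated as regexps.
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    # Anything else is taken literally (assumption).
    return re.compile(re.escape(entry))

def id_matches(entry, div_id):
    """True if the div id is covered by the entry (case-sensitive, full match)."""
    return compile_entry(entry).fullmatch(div_id) is not None

print(id_matches("*/ menu[0-5] /", "menu3"))
print(id_matches("*/ menu[0-5] /", "menu7"))
```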
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended for use with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
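A way to read the canonical URL from a page is sketched below in Python. The excerpt does not show how the canonical URL is announced; the sketch assumes the standard <link rel="canonical" href="..."> element in the page head. The class CanonicalFinder and the function canonical_url are hypothetical names, not Sphider-plus functions.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            if (attributes.get("rel") or "").lower() == "canonical":
                self.canonical = attributes.get("href")

def canonical_url(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish"></head></html>')
print(canonical_url(page))
```

An indexer could then store results under the returned URL, so that both duplicate URLs from the example end up as a single entry.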
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
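The conversion step itself is simple to illustrate. The Python sketch below uses Python's built-in codecs, not the ConvertCharset function, so it only demonstrates the principle: bytes in a known source charset are decoded and written out again as UTF-8. Some names from the list above (for example mazovia or gsm0338) are not known to Python's standard codecs. The function name to_utf8 is hypothetical.

```python
def to_utf8(raw, charset):
    """Decode bytes from the given charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")

# The Russian word "Привет" as stored in a windows-1251 encoded page.
cyrillic_bytes = b"\xcf\xf0\xe8\xe2\xe5\xf2"
print(to_utf8(cyrillic_bytes, "windows-1251").decode("utf-8"))
```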
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
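A client could consume the JSON output along the following lines. This Python sketch uses a shortened sample: only fields visible in the excerpt above are included, and the long fulltxt, page_size and domain_name values are left out for brevity. The function name summarize is hypothetical.

```python
import json

# Shortened sample built from the fields shown in the excerpt above.
sample = r'''{"query":"colour","total_results":2,"from":1,"to":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        (result["num"], float(result["weight"]), result["url"], result["title"].strip())
        for result in data["text_results"]
    ]

for row in summarize(sample):
    print(row)
```

Note that the weight arrives as a string and the escaped slashes in the URLs are resolved by the JSON parser.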




5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
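The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the function name `parse_log_name` and the regular expression are assumptions inferred only from the two sample file names, and the last component is read as the thread number.

```python
import re
from datetime import datetime

# Pattern inferred from the sample names "db2_100524-21.47.56_1.html";
# the real naming rules may allow more variants (assumption).
LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(\d+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread number."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    return {
        "database": int(match.group(1)),
        # 100524-21.47.56 is read as YYMMDD-HH.MM.SS, i.e. May 24, 2010, 21:47:56
        "timestamp": datetime.strptime(match.group(2), "%y%m%d-%H.%M.%S"),
        "thread": int(match.group(3)),
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

Sorting the parsed entries by `timestamp` and `thread` would give a chronological view of one index run per database.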
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
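The command lines shown above follow one pattern: an option name with the thread ID appended. A small Python sketch of how such command lines could be generated is given below; the helper name `thread_commands` is hypothetical, and only the two option forms visible in the text (`-new` and `-erased`) are used.

```python
def thread_commands(option, ids):
    """Build one 'php spider.php -<option><id>' argument list per thread ID.

    'option' is expected to be "new" or "erased", the two forms shown in
    the documentation; other values are not checked here (assumption).
    """
    return [["php", "spider.php", f"-{option}{thread_id}"] for thread_id in ids]


for cmd in thread_commands("erased", [1, 2, 3]):
    # Only print the commands; starting them is left to the operator.
    print(" ".join(cmd))
```

Each argument list could be handed to a process launcher so that the threads run in parallel; the sketch deliberately stops at building the commands.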
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
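To illustrate the effect of the sphider_noindex markers described in 4.4, here is a minimal Python sketch that removes the marked region from the text to be indexed. It is not the Sphider-plus implementation. The marker spelling is an assumption: the pattern accepts both the form printed in this listing and an HTML-comment form with double hyphens. Link following is not covered.

```python
import re

# Accepts "<! sphider_noindex >" as printed above and, as an assumption,
# the HTML-comment spelling "<!--sphider_noindex-->".
NOINDEX_BLOCK = re.compile(
    r"<!\s*(?:--)?\s*sphider_noindex\s*(?:--)?\s*>"
    r".*?"
    r"<!\s*(?:--)?\s*/sphider_noindex\s*(?:--)?\s*>",
    re.DOTALL,
)


def strip_noindex(html):
    """Return the page text with every marked region replaced by a space."""
    return NOINDEX_BLOCK.sub(" ", html)


page = (
    "<p>Product description</p>"
    "<! sphider_noindex ><ul><li>Home</li><li>Contact</li></ul><! /sphider_noindex >"
    "<p>Technical data</p>"
)
print(strip_noindex(page))
```

The non-greedy `.*?` keeps two separate marked regions on one page from being merged into a single removal.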
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
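The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) can be sketched in Python as follows. The helper names `compile_entry` and `is_listed` are hypothetical. Two details are assumptions, since the text does not state them: whitespace around the pattern is ignored, and the pattern must match the whole name.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    text = entry.strip()
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        # Regexp entry: drop the leading "*/" and the closing "/".
        return re.compile(text[2:-1].strip())
    # Plain entry: match the literal name only.
    return re.compile(re.escape(text))


def is_listed(name, entries):
    """True if 'name' matches any non-empty entry of the list."""
    return any(
        compile_entry(entry).fullmatch(name)
        for entry in entries
        if entry.strip()
    )


entries = ["footer", "*/ menu[0-5] /", "*/nav[0-5]/"]
for candidate in ["footer", "menu3", "menu7", "nav0", "NAV0"]:
    print(candidate, is_listed(candidate, entries))
```

No case-insensitive flag is set, which mirrors the note that element names are processed case-sensitively.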
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
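The conversion step itself, from a legacy charset into UTF-8, can be illustrated with a short Python sketch. This uses Python's built-in codecs, not the ConvertCharset function named above, and the helper name `to_utf8` is hypothetical. Only some of the listed charsets have a Python codec of the same name; the three used here do.

```python
def to_utf8(raw, charset):
    """Decode bytes from the given legacy charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")


samples = {
    "windows-1251": "Привет",   # Cyrillic
    "koi8-r": "Привет",         # Cyrillic
    "iso-8859-2": "Łódź",       # Central European
}

for charset, text in samples.items():
    legacy_bytes = text.encode(charset)      # simulate a page in that charset
    utf8_bytes = to_utf8(legacy_bytes, charset)
    print(charset, len(legacy_bytes), "bytes ->", len(utf8_bytes), "bytes")
```

The same text occupies one byte per character in the legacy charsets and two bytes per character in UTF-8 for these scripts, which is why the charset of a page has to be known before indexing.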
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
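A consumer of the JSON result output could read it as sketched below in Python. The sample string is an abbreviated stand-in that uses only field names visible in the output above, with the long `fulltxt` extracts left out. The helper name `summarize` is hypothetical.

```python
import json

# Abbreviated sample; field names are taken from the output shown above,
# the 'fulltxt' extracts are omitted for brevity.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}
]}'''


def summarize(payload):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data["text_results"]
    ]


for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that `weight` arrives as a string and that the slashes in `url` are escaped as `\/`; the JSON parser restores plain slashes.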


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
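How an entry of such a list-file might be interpreted is sketched below in Python. This is an assumption-laden illustration, not the Sphider-plus code: the helper names compile_entry and id_matches are hypothetical, plain entries are assumed to name one id literally, surrounding blanks in a regexp entry are assumed to be insignificant, and the whole id is assumed to have to match.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a compiled pattern for matching div ids."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the introducing */ and the closing slash.
        return re.compile(entry[2:-1].strip())
    # Assumption: any other entry is a literal id.
    return re.compile(re.escape(entry))


def id_matches(entries, div_id):
    """Return True if the id matches any non-empty entry of the list."""
    return any(compile_entry(e).fullmatch(div_id) for e in entries if e.strip())


entries = ["*/ menu[0-5] /", "footer"]
print(id_matches(entries, "menu3"), id_matches(entries, "menu7"))
```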
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
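A canonical URL is commonly announced by a link element with rel="canonical" in the head of each duplicate page. The following Python sketch shows how such a declaration can be read; it assumes that this is the mechanism meant here, and the names CanonicalFinder and find_canonical are hypothetical, not part of Sphider-plus.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"></head>'
print(find_canonical(page))
```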
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
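The conversion step itself is simple to picture: bytes in a legacy charset are decoded and written out again as UTF-8. The Python sketch below uses Python's built-in codecs as a stand-in and is not the ConvertCharset function; the name to_utf8 is hypothetical, and not every charset name of the list above is known to Python under the same spelling.

```python
def to_utf8(raw, charset):
    """Decode bytes in the declared charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")


# A Cyrillic word stored as windows-1251 bytes, converted to UTF-8.
sample = "Привет".encode("windows-1251")
print(to_utf8(sample, "windows-1251").decode("utf-8"))
```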
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters that are part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
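An XML result of this kind can be read with any standard XML parser. The Python sketch below is an illustration under assumptions: the wrapper element media_results and the type value "image" are hypothetical, chosen only to make the sample well-formed, while the field names inside media_result are those visible in the excerpt above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the field names inside <media_result> come from the excerpt.
SAMPLE = """<media_results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</media_results>"""


def read_media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]


for result in read_media_results(SAMPLE):
    print(result["title"], result["x_size"], result["y_size"])
```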
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
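A client consuming this JSON output only needs a JSON parser. The Python sketch below works on a shortened sample that reuses field names and values from the output above, with the long "fulltxt" extracts and several other fields left out; the function name summarize is hypothetical.

```python
import json

# Shortened sample built from the fields shown above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}'''


def summarize(payload):
    """Return (num, weight, url, title) for every text result in the JSON output."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]


for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, url, title)
```

Note that the weight is delivered as a string and the slashes in URLs are escaped; the JSON parser resolves the escaping, while the weight has to be converted explicitly.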


All required information. [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum)) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
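The naming scheme described above can be taken apart again with a small script. This is only a sketch in Python, not part of Sphider-plus: the function name parse_log_name is hypothetical, the YYMMDD reading of the date is inferred from the single example given (100524 = May 24, 2010), and the last part is treated as a free thread label so that both "_1" and "_ID1" style names are accepted.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html or db2_100524-21.47.56_ID1.html.
# The layout is inferred from the examples in the text, not from the Sphider-plus source.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    # Date is assumed to be YYMMDD, time is HH.MM.SS as shown in the examples.
    started = datetime.strptime(
        match.group("date") + match.group("time"), "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match.group("db")),
        "started": started,
        "thread": match.group("thread"),
    }


if __name__ == "__main__":
    info = parse_log_name("db2_100524-21.47.56_1.html")
    print(info["database"], info["started"].isoformat(), info["thread"])
```

Sorting the returned "started" values would give the order in which the threads were launched, which may help when comparing the logs of several parallel threads.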
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
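The behaviour described for the noindex markers (text between the markers is skipped, links inside it are still followed) can be illustrated with a short Python sketch. It is not the Sphider-plus implementation, which is written in PHP. The marker spelling is copied from the text above and may differ in a real page, and the function name split_page and the simple href pattern are assumptions made for this example.

```python
import re

# Marker spelling copied from the text above; the exact form on a real page is an assumption.
NOINDEX = re.compile(
    r"<!\s*sphider_noindex\s*>.*?<!\s*/sphider_noindex\s*>", re.DOTALL
)
# Deliberately simple link pattern, good enough for an illustration only.
HREF = re.compile(r'href="([^"]+)"')


def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are collected from the whole page
    indexable = NOINDEX.sub(" ", html)  # marked parts are dropped before indexing
    return indexable, links


if __name__ == "__main__":
    page = (
        '<p>Intro</p>'
        '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
        '<p>Body</p>'
    )
    indexable, links = split_page(page)
    print(indexable)
    print(links)
```

Collecting the links before removing the marked parts is what keeps navigation menus crawlable while their keywords stay out of the index.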
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
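A client that receives the JSON output shown above could read it as follows. This is a minimal Python sketch under stated assumptions: the sample is shortened from the snippet (the "fulltxt" extracts and some top-level fields are left out), the function name summarize is hypothetical, and only field names that appear in the snippet are used.

```python
import json

# Shortened sample modelled on the JSON snippet above; "fulltxt" and some fields are omitted.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}
]}'''


def summarize(raw):
    """Return the query, the total hit count and one tuple per text result."""
    data = json.loads(raw)
    rows = []
    for hit in data.get("text_results", []):
        # "weight" arrives as a string in the sample, so it is converted here.
        rows.append((hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"]))
    return data["query"], data["total_results"], rows


if __name__ == "__main__":
    query, total, rows = summarize(SAMPLE)
    print(f"{total} results for '{query}'")
    for num, weight, title, url in rows:
        print(num, weight, title, url)
```

The escaped slashes in the URLs are plain JSON escapes, so a standard JSON parser returns ordinary URLs without extra handling.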


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
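The log file naming scheme described above can be illustrated with a small parser. This is only a sketch in Python (not part of Sphider-plus, which is written in PHP): the helper name parse_log_name and the regular expression are assumptions inferred from the example file names quoted in the documentation, nothing more.

```python
import re
from datetime import datetime

# Hypothetical pattern, inferred only from names such as
# db2_100524-21.47.56_1.html and db2_100524-21.47.56_ID1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    m = LOG_NAME.match(name)
    if m is None:
        return None  # name does not follow the documented scheme
    # 100524 is read as yymmdd (May 24, 2010), 21.47.56 as hh.mm.ss
    started = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"db": int(m["db"]), "started": started, "thread": m["thread"]}
```

Calling parse_log_name("db2_100524-21.47.56_1.html") yields database 2, a start time of May 24, 2010 at 21:47:56, and thread "1"; names that do not match the scheme return None.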
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
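The MD5sum check mentioned above, which decides whether a page has changed since the last index run, can be sketched as follows. This is an illustration in Python under stated assumptions: the function names md5sum and needs_reindex are hypothetical, and how Sphider-plus stores and compares the checksum internally is not shown in this extract.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page is re-indexed only if its checksum differs from the stored one."""
    return stored_md5 != md5sum(content)
```

Because only the digest is compared, unchanged pages can be skipped without re-parsing their content, which is what makes the re-index procedure fast.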
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
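The behaviour described for the sphider_noindex tags (content between the tags is not indexed, but links inside are still followed) can be sketched like this. It is a Python illustration, not the Sphider-plus implementation. It assumes that the markers are written either as printed in this extract or as HTML comments, and that links are plain href="..." attributes; split_page is a hypothetical name.

```python
import re

# Assumption: accepts "<! sphider_noindex >" as printed in this extract
# as well as the HTML comment form "<!--sphider_noindex-->"
NOINDEX = re.compile(
    r"<!(?:--)?\s*sphider_noindex\s*(?:--)?>.*?<!(?:--)?\s*/sphider_noindex\s*(?:--)?>",
    re.DOTALL | re.IGNORECASE,
)
LINK = re.compile(r'href="([^"]+)"', re.IGNORECASE)

def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = LINK.findall(html)         # links are collected from the whole page
    indexable = NOINDEX.sub(" ", html) # excluded regions are dropped before indexing
    return indexable, links
```

Collecting the links before removing the excluded regions is what keeps a menu out of the index while the pages it points to are still reached.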
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
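The list-file convention described above (plain entries, or a regexp introduced by */ and ended with another slash) can be sketched as follows. This is a Python illustration under assumptions: the helper names make_matcher and is_listed are hypothetical, plain entries are compared as exact case-sensitive strings, and regexp entries are applied as a search. How Sphider-plus itself applies the pattern is not stated in this extract.

```python
import re

def make_matcher(entry):
    """Turn one list-file line into a predicate on an id or element name."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry, e.g. "*/ menu[0-5] /": drop the delimiters and padding
        pattern = re.compile(entry[2:-1].strip())
        return lambda name: pattern.search(name) is not None
    # Plain entry: exact, case-sensitive comparison
    return lambda name: name == entry

def is_listed(name, entries):
    """True if the name matches any entry of the list file."""
    return any(make_matcher(e)(name) for e in entries)
```

With the entries "sidebar" and "*/ menu[0-5] /", the names sidebar and menu3 are matched, while menu9 and Sidebar are not.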
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
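The conversion of a legacy charset into UTF-8, as performed by the ConvertCharset function for the charsets listed above, can be illustrated in a few lines. The sketch uses Python's built-in codecs as a stand-in for the PHP ConvertCharset class; to_utf8 is a hypothetical helper, and Python does not ship a codec for every charset named in the list.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from a legacy charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# Example: a Cyrillic word stored as windows-1251 bytes
legacy = "Привет".encode("windows-1251")
converted = to_utf8(legacy, "windows-1251")
```

After the conversion the text takes two bytes per Cyrillic letter instead of one, but can be indexed and searched together with content from any other charset.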
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
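A consumer of the JSON result output quoted above could read it as sketched below. This is a Python illustration only: the field names are taken from the quoted output, the sample string is shortened to a few of those fields, and the helper name summarize is hypothetical.

```python
import json

# Shortened sample using field names and values from the output quoted above
sample = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data["text_results"]
    ]

rows = summarize(sample)
```

Note that the weight is delivered as a string and that titles carry a leading blank, so both are normalized here; the escaped slashes in the URLs are resolved by the JSON parser itself.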


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (latest 100) - Index log offering: . . .
. . .
Country, Host name (latest 100) - Index log offering: File name, index date and delete option - Sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
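The naming scheme of the log files can be taken apart mechanically, which is handy when sorting or cleaning up /admin/log/ from a script. The following is a minimal Python sketch, not part of Sphider-plus; the function name parse_log_name and the returned dictionary keys are hypothetical, and the last field is kept as plain text because the snippet does not say more about it than "first thread" and "second thread".

```python
import re
from datetime import datetime

# Matches names such as "db2_100524-21.47.56_1.html":
# database number, date (yymmdd), time (hh.mm.ss) and a trailing thread field.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>.+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread field."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    # "100524" + "21.47.56" is read as year-month-day and hour.minute.second.
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }
```

Calling parse_log_name("db2_100524-21.47.56_1.html") yields database 2, a start time of May 24, 2010 at 21:47:56, and the thread field "1".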
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
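The MD5 check mentioned above boils down to comparing a stored checksum with the checksum of the freshly fetched page. A minimal Python sketch of that idea follows; it is an illustration only. Which bytes Sphider-plus actually hashes (raw HTML or cleaned text) is not stated here, so the choice of input and the name content_changed are assumptions.

```python
import hashlib

def content_changed(page_bytes, stored_md5):
    """Return True when the page no longer matches the checksum stored at the last index run."""
    # Assumption: the stored value is the hexadecimal MD5 digest of the same bytes.
    return hashlib.md5(page_bytes).hexdigest() != stored_md5
```

A re-index run would skip every page for which this returns False, which is what makes the procedure fast.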
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can, for example, be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added to the page source. A more flexible method, . . .
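To illustrate the effect of the sphider_noindex markers, here is a minimal Python sketch that drops everything between an opening and a closing marker before the text is handed to an indexer. It is not the Sphider-plus implementation. The markers appear in this listing as <! sphider_noindex >, so the pattern accepts that form as well as an HTML comment form with dashes, which is an assumption. The name strip_noindex is hypothetical.

```python
import re

# Accepts "<! sphider_noindex >" as shown in the text and, as an assumption,
# the HTML comment spelling "<!--sphider_noindex-->" too.
NOINDEX_BLOCK = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)

def strip_noindex(html):
    """Return the page with every marked block replaced by a single space."""
    return NOINDEX_BLOCK.sub(" ", html)
```

Because links inside the excluded part are still followed, link extraction would have to run on the original page and only the keyword extraction on the stripped copy.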
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
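The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) can be sketched in Python as follows. This is only an illustration of the convention, not the Sphider-plus parser. The helper names compile_entry and is_listed are hypothetical, and matching the whole name rather than a part of it is an assumption.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    text = entry.strip()
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        # Regexp entry: drop the leading "*/" and the closing "/".
        return re.compile(text[2:-1].strip())
    # Plain entry: match the literal name only.
    return re.compile(re.escape(text))

def is_listed(name, entries):
    """Return True when the element or div name matches any entry of the list."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)
```

With the entry */nav[0-5]/ the names nav0 to nav5 match, while nav7 and NAV3 do not, the latter because matching is case-sensitive.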
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
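What such a conversion does can be shown with a short Python sketch that re-encodes bytes from a named source charset into UTF-8. It uses Python's built-in codecs rather than the ConvertCharset function named above, so it is an analogy only. Several names from the list (for example mazovia or gsm0338) are not available as Python codecs, and the function name to_utf8 is hypothetical.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the given charset and return them encoded as UTF-8."""
    # Raises LookupError for an unknown charset name and UnicodeDecodeError
    # for bytes that are not valid in the source charset.
    return raw.decode(source_charset).encode("utf-8")
```

For instance, the six windows-1251 bytes CF F0 E8 E2 E5 F2 become the twelve UTF-8 bytes of the Russian word "Привет".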
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
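A client consuming the JSON result output only needs a standard JSON parser. The following Python sketch reads a payload of the shape shown above and produces one summary line per text result. The field names (query, total_results, text_results, num, weight, url, title) are taken from the sample; the function name summarize and the output layout are hypothetical.

```python
import json

def summarize(payload):
    """Return a list of human-readable lines for a JSON search response."""
    data = json.loads(payload)
    lines = ["{}: {} result(s)".format(data["query"], data["total_results"])]
    for hit in data.get("text_results", []):
        # Titles in the sample carry a leading blank, so they are trimmed here.
        lines.append(
            "{}. {} ({}) {}".format(hit["num"], hit["title"].strip(), hit["weight"], hit["url"])
        )
    return lines
```

The escaped slashes in the url values (\/) are valid JSON and are turned back into plain slashes by the parser.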


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
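
The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus; the function name parse_log_name and the regular expression are assumptions made for illustration. It reads the database number, the date and time stamp, and the trailing thread suffix from a file name of the form shown.

```python
import re
from datetime import datetime

# Pattern for names like "db2_100524-21.47.56_1.html" (hypothetical helper,
# derived only from the examples shown in the text).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    # The date is written as YYMMDD, the time as HH.MM.SS.
    stamp = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "timestamp": stamp,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

The thread suffix is kept as text so that names carrying a user-defined ID instead of a plain number are also accepted.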
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
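
Starting several indexer threads by hand can be scripted. The sketch below is an illustration in Python, not something shipped with Sphider-plus; the helper names build_commands and launch are assumptions, and only the option forms shown in the text (-new1, -new2, -erased1) are taken from the documentation.

```python
import subprocess

def build_commands(option, ids, script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-erased1']."""
    return [["php", script, f"-{option}{thread_id}"] for thread_id in ids]

def launch(commands):
    """Start all commands in parallel, then wait for each and return the exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]

# Build the command lines for two parallel re-index threads (nothing is started here).
for command in build_commands("erased", [1, 2]):
    print(" ".join(command))
```

Calling launch(build_commands("erased", [1, 2])) would actually start the processes; it requires the php binary and has to be run from the folder that holds spider.php.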
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
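
To illustrate the effect of the noindex tags, here is a minimal Python sketch. It is not the Sphider-plus implementation; the function names are assumptions, and the pattern accepts both the spelling shown above and a comment-style spelling with dashes, because the exact tag syntax may have been altered when this text was extracted.

```python
import re

# Matches everything from the opening to the closing noindex tag (assumed syntax).
NOINDEX = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href\s*=\s*["\'](.*?)["\']', re.IGNORECASE)

def strip_noindex(html):
    """Return the page text that remains indexable after removing noindex regions."""
    return NOINDEX.sub(" ", html)

def links_to_follow(html):
    """Collect link targets from the whole page, including the noindex regions."""
    return HREF.findall(html)

page = 'Intro <! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex > Body'
print(strip_noindex(page))
print(links_to_follow(page))
```

The link inside the excluded region still appears in the list of links, which mirrors the statement that such links are followed although the surrounding text is not indexed.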
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
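
How a page names its canonical URL is not visible in this excerpt. The Python sketch below assumes the common convention of a link element with rel="canonical" in the page head; the class and function names are hypothetical and the code is an illustration, not the Sphider-plus implementation.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("rel") or "").lower() == "canonical" and attributes.get("href"):
            self.canonical = attributes["href"]

def find_canonical(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

duplicate_page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
print(find_canonical(duplicate_page))
```

An indexer working this way would store the page once under the returned URL, regardless of which of the duplicate URLs it was fetched from.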
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
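
The conversion step itself is simple to picture. The following Python sketch uses Python's built-in codecs rather than the ConvertCharset function named above, so the set of supported charset names differs from the list in this section; the helper names are assumptions made for illustration.

```python
import codecs

def is_supported(charset):
    """Report whether Python's codec registry knows the given charset name."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False

def to_utf8(raw, source_charset):
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# A Cyrillic word stored as windows-1251 bytes, converted to UTF-8.
cyrillic = "Привет".encode("windows-1251")
print(is_supported("windows-1251"), to_utf8(cyrillic, "windows-1251").decode("utf-8"))
```

Checking is_supported first avoids a LookupError for names that Python does not recognise.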
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters at the end of a word might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
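
A client could consume such a JSON result as follows. This is a minimal Python sketch; the sample string is an abbreviated copy of the example above that keeps only fields visible in it, and the function name summarize is an assumption made for illustration.

```python
import json

# Abbreviated sample built from the fields shown in the example above.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(document):
    """Return (num, weight, url, title) for every text result in the JSON document."""
    data = json.loads(document)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data.get("text_results", [])
    ]

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

The JSON parser turns the escaped slashes back into plain ones, and the weight, which is delivered as a string, is converted to a number for sorting or filtering.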


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails w 3e80 ith ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
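The conversion step itself (legacy charset in, UTF-8 out) can be sketched in a few lines of Python. This uses Python's built-in codecs rather than the ConvertCharset function named above, so it only illustrates the idea; several names from the list (for example mazovia or gsm0338) have no built-in Python codec and would raise LookupError here. The function name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from a legacy charset and re-encode them as UTF-8.

    Raises LookupError for charset names Python does not know and
    UnicodeDecodeError for bytes that are invalid in the source charset.
    """
    text = raw.decode(source_charset)
    return text.encode("utf-8")
```

For instance, the single byte 0xE4 in iso-8859-1 becomes the two-byte UTF-8 sequence for "ä".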
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, no output was available during index / re-index because: - Several servers, especially on Win32, buffer the output of the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. Because no progress was shown during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters at the end of a word might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail All tables contain active links for opening the media files. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
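A consumer of the XML result output could read the media_result elements roughly as follows. This Python sketch uses only the child elements visible in the fragment above (link, title, x_size, y_size); the enclosing root element is not shown in the fragment, so the results wrapper in the sample is a placeholder, and the function name media_results is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample built from the fragment above; <results> is a placeholder root.
SAMPLE = """<results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""


def media_results(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in node}
        for node in root.iter("media_result")
    ]
```

Because every child element is copied generically, additional fields such as num, type or url appear in the dict without further changes.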
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
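The JSON output shown above can be consumed with any standard JSON parser. The Python sketch below works on a shortened copy of that sample (the long fulltxt extracts and some metadata fields are left out) and lists weight, URL and title per text result. The function name summarize is hypothetical, and it is an assumption that text_results always carries these three keys.

```python
import json

# Shortened copy of the sample output; fulltxt and other fields omitted.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}'''


def summarize(payload: str) -> list[tuple[float, str, str]]:
    """Return (weight, url, title) for every entry in text_results."""
    data = json.loads(payload)
    return [
        (float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data.get("text_results", [])
    ]
```

Note that the weight is delivered as a string and the slashes in URLs are escaped; json.loads unescapes the URLs, while the weight still has to be converted explicitly.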


All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
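The log file naming scheme described above (database number, date as YYMMDD, time, then thread number or ID) lends itself to a small parser. The Python sketch below is an illustration based only on the example names shown; the regular expression, the returned field names and the function name parse_log_name are assumptions.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread number or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<stamp>\d{6}-\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name: str) -> dict:
    """Split a spider log file name into database, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a spider log file name: {name!r}")
    started = datetime.strptime(match["stamp"], "%y%m%d-%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }
```

Sorting the parsed entries by database and start time gives a chronological view of all threads of one indexing run.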
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
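Starting several indexer threads by hand can be scripted. The Python sketch below only builds the command lines in the form shown above (option name with the thread ID appended, for example -erased1 or -new1) and, optionally, launches them in parallel. It assumes that php is on the PATH and that spider.php is in the working directory; the helper names are hypothetical, and no options other than those quoted in the text are used.

```python
import subprocess


def thread_commands(mode: str, ids: list[str]) -> list[list[str]]:
    """Build one 'php spider.php -<mode><id>' command per thread ID."""
    return [["php", "spider.php", f"-{mode}{thread_id}"] for thread_id in ids]


def run_threads(mode: str, ids: list[str], admin_dir: str) -> list[int]:
    """Start all threads in parallel and return their exit codes.

    Assumption: admin_dir is the folder that contains spider.php.
    """
    processes = [
        subprocess.Popen(command, cwd=admin_dir)
        for command in thread_commands(mode, ids)
    ]
    return [process.wait() for process in processes]
```

For example, thread_commands("erased", ["1", "2"]) yields the two commands php spider.php -erased1 and php spider.php -erased2.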
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. Thi 775e s can for example be used to prevent search result flooding when certain keywords appear on certain part in most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
All required information. [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the file-name limitations of the OS should be taken . . .
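The log file naming scheme described above can be unpacked mechanically. The following is a minimal Python sketch and not part of Sphider-plus: the function name parse_log_name and the returned keys are hypothetical, and it assumes the date field is YYMMDD, as the example (100524 = May 24, 2010) suggests.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a spider log file name into its documented parts."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    timestamp = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),   # number of the database
        "timestamp": timestamp,         # date and time taken from the name
        "thread": match["thread"],      # thread number or user-defined ID
    }


print(parse_log_name("db2_100524-21.47.56_1.html"))
print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

The same pattern covers both the plain thread numbers (_1, _2) and the user-defined IDs (_ID1, _ID2) shown above.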
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
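The MD5 check mentioned above can be illustrated in a few lines. This is only a sketch of the general technique in Python, under the assumption that one checksum per page is kept from the previous index run; the names md5sum and needs_reindex are hypothetical, and how Sphider-plus actually stores and compares the value is not shown here.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hex MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(content: bytes, stored_md5) -> bool:
    """A page is re-indexed if it is new or its checksum has changed."""
    return stored_md5 is None or md5sum(content) != stored_md5


old_page = b"<html><body>Hello</body></html>"
new_page = b"<html><body>Hello again</body></html>"
stored = md5sum(old_page)

print(needs_reindex(old_page, stored))  # unchanged page
print(needs_reindex(new_page, stored))  # changed page
```

Comparing checksums avoids re-parsing pages whose content is byte-for-byte identical to the previously indexed version.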
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
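The behaviour described for the sphider_noindex tags (content between the tags is not indexed, but links inside it are still followed) can be sketched as below. This is a hypothetical Python illustration, not the Sphider-plus implementation: the function name split_page is an assumption, and the pattern accepts the tags both as written in this text and with HTML comment dashes, since the exact tag spelling is not certain from the extract.

```python
import re

# Matches an opening noindex tag, everything up to the closing tag, and the closing tag.
NOINDEX = re.compile(
    r"<!-*\s*sphider_noindex\s*-*>.*?<!-*\s*/sphider_noindex\s*-*>",
    re.DOTALL | re.IGNORECASE,
)
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def split_page(html):
    """Return (text to index, links to follow) for one page."""
    links = HREF.findall(html)          # links are collected from the whole page
    indexable = NOINDEX.sub(" ", html)  # excluded blocks are dropped from the index text
    return indexable, links


page = (
    "<p>Body text</p>"
    '<! sphider_noindex ><a href="/menu.html">Menu</a><! /sphider_noindex >'
    "<p>More text</p>"
)
indexable, links = split_page(page)
print(indexable)
print(links)
```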
a regexp pattern. The regexp must start with */ and end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values defined in the list file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp must start with */ and end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must start with */ and end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
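The list-file convention above (plain names, or a regexp introduced by */ and closed by a slash) could be interpreted roughly as follows. This Python sketch is an assumption about the semantics, not the Sphider-plus code: the helper names compile_entry and entry_matches are hypothetical, whitespace around the pattern is ignored, and a full, case-sensitive match is required.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: strip the leading */ and the trailing slash.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def entry_matches(entry, name):
    """True if the id or element name is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None


print(entry_matches("*/nav[0-5]/", "nav3"))
print(entry_matches("*/ menu[0-5] /", "menu7"))
print(entry_matches("footer", "Footer"))
```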
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
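The idea of converting text from one of the listed charsets into UTF-8 can be shown with Python's built-in codecs. This is an analogous sketch only, not the ConvertCharset function itself; the function name to_utf8 is hypothetical, and Python's codec registry does not cover every charset in the list above (mazovia, for example, is not a standard Python codec).

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


cyrillic = "Привет".encode("cp1251")   # bytes as a windows-1251 page would deliver them
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))

latin = b"M\xfcller"                    # "Müller" in iso-8859-1
print(to_utf8(latin, "iso-8859-1").decode("utf-8"))
```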
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases that are correctly configured and assigned and have a set of installed tables, as described in chapter 'Definition and configuration'. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
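A client consuming the XML result output could read the media_result records along these lines. This is a hedged Python sketch: the wrapping media_results root element and the function name read_media_results are assumptions for illustration, and only fields visible in the excerpt above are used in the sample.

```python
import xml.etree.ElementTree as ET

# Sample assembled from the fields visible in the excerpt; the root element is hypothetical.
SAMPLE = """<media_results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</media_results>"""


def read_media_results(xml_text):
    """Return one dict per media_result, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]


results = read_media_results(SAMPLE)
for result in results:
    print(result)
```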
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
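The JSON result output shown above can be consumed with any standard JSON parser. The following Python sketch uses a shortened sample built only from keys that appear in the excerpt (query, total_results, text_results, num, weight, url, title); the function name list_hits is hypothetical, and the elided fulltxt values are left out rather than guessed.

```python
import json

# Shortened sample; the escaped slashes (\/) are kept as in the original output.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''


def list_hits(json_text):
    """Return (num, weight, url, title) tuples for all text results."""
    payload = json.loads(json_text)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in payload["text_results"]
    ]


hits = list_hits(SAMPLE)
for hit in hits:
    print(hit)
```

Note that the weights arrive as strings in the output and are converted to numbers here, and that the parser turns the escaped \/ sequences back into plain slashes.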


5a2 All required information. Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
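The log file naming described in the snippets above can be taken apart with a short script. This is only a sketch in Python, based on the four example names quoted in the snippets; the function name parse_log_name and the returned keys are hypothetical, and the trailing part is treated as an opaque thread number or ID because the snippet is cut off before that item is explained.

```python
import re
from datetime import datetime

# Pattern derived from the examples in the snippets:
#   db2_100524-21.47.56_1.html     (thread number)
#   db2_100524-21.48.12_ID2.html   (user-defined ID)
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread tag.

    Returns None when the name does not follow the pattern shown above.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    # 100524 is read as YYMMDD (May 24, 2010), the time as HH.MM.SS.
    stamp = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "db": int(match["db"]),
        "timestamp": stamp,
        "thread": match["thread"],
    }


if __name__ == "__main__":
    for name in ("db2_100524-21.47.56_1.html", "db2_100524-21.48.12_ID2.html"):
        print(name, "->", parse_log_name(name))
```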
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
4.4 Ignoring parts of a page Sphider-plus includes an option to exclude parts of pages from being indexed. This can for example be used to prevent search result flooding when certain keywords appear in a certain part of most pages (like a header, footer or a menu). Any part of a page between <! sphider_noindex > and <! . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .



8.   Sphider-plus - The PHP Search Engine Visit in a new window

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
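The naming scheme described above can be taken apart again with a regular expression. The following is a minimal sketch, not part of Sphider-plus: the function name parse_log_name and the returned dictionary keys are hypothetical, and the date is assumed to be in yymmdd order, as the example "100524 - Date (May 24, 2010)" suggests.

```python
import re
from datetime import datetime

# Pattern for names like db2_100524-21.47.56_1.html or ..._ID1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[A-Za-z0-9]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognized log file name: " + name)
    # Assumption: date is yymmdd, time is hh.mm.ss
    stamp = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "timestamp": stamp,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

The thread part accepts letters and digits, so names carrying an individual ID such as db2_100524-21.47.56_ID1.html are handled by the same pattern.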
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
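The MD5 check mentioned above boils down to comparing a stored checksum with the checksum of the freshly fetched page. This is a minimal sketch of that idea in Python, not the code used by Sphider-plus; the function names md5sum and needs_reindex are hypothetical.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 checksum of the page content."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page only needs to be re-indexed if its checksum has changed."""
    return stored_md5 != md5sum(content)

old_page = b"<html><body>Hello</body></html>"
stored = md5sum(old_page)

print(needs_reindex(stored, old_page))                             # unchanged page
print(needs_reindex(stored, b"<html><body>Changed</body></html>"))  # modified page
```

Because only the checksum is compared, unchanged pages can be skipped without parsing them again, which is what makes the re-index procedure fast.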
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
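How such list-file entries could be evaluated is sketched below in Python. This is an illustration under assumptions, not the Sphider-plus implementation: the function names compile_entry and is_listed are hypothetical, surrounding blanks inside the slashes are assumed to be insignificant, and a name is assumed to match only if the whole name fits the pattern. Matching is case-sensitive, as stated above for elements_not.txt.

```python
import re

def compile_entry(entry: str):
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    Entries of the form */ ... / are treated as regular expressions,
    everything else as a literal name.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))

def is_listed(name: str, entries) -> bool:
    """Check whether a div id or element name matches any entry."""
    return any(compile_entry(e).fullmatch(name) for e in entries)

entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
for candidate in ["footer", "nav3", "nav7", "Nav3", "menu0"]:
    print(candidate, is_listed(candidate, entries))
```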
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
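The principle of converting text from one of the listed charsets into UTF-8 can be illustrated as follows. This sketch uses Python's built-in codecs and is not the ConvertCharset function itself; the helper name to_utf8 is hypothetical, and not every charset in the list above is necessarily known to Python.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8.

    Raises LookupError for an unknown charset name and
    UnicodeDecodeError for bytes that are invalid in that charset.
    """
    return raw.decode(source_charset).encode("utf-8")

# A Cyrillic word as it would arrive from a windows-1251 encoded page
raw_page = "Привет".encode("windows-1251")
converted = to_utf8(raw_page, "windows-1251")
print(converted.decode("utf-8"))
```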
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
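A client could read this JSON output as sketched below. This is only an illustration: the sample is shortened, it keeps the field names shown above but replaces the "fulltxt" values with placeholders, and the function name summarize is hypothetical.

```python
import json

# Shortened sample; field names follow the output above,
# the "fulltxt" values are placeholders.
sample = r"""
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
  {"num": 1, "weight": "100.0",
   "url": "http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title": " Info_eng", "fulltxt": "...", "page_size": "5,661.6kb",
   "domain_name": "www.english.le-piaggie.info"},
  {"num": 2, "weight": "50.0",
   "url": "http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title": " Description", "fulltxt": "...", "page_size": "26.4 kb",
   "domain_name": "www.english.le-piaggie.info"}
 ]}
"""

def summarize(payload: str):
    """Return (num, weight, url) for every text result in the JSON output."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"])
        for item in data["text_results"]
    ]

for num, weight, url in summarize(sample):
    print(num, weight, url)
```

Note that the weight is delivered as a string and has to be converted before it can be used numerically, and that the escaped slashes in the URLs are resolved by the JSON parser.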

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF RSD RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON XML and RSS result output [ Documentation ] 1. Settings customizing and statistics If you want to change settings behavior and design of Sphider-plus you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add edit delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add edit delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log separate and bulk delete - Clear Thumbnail images separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin' 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal) you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr. total clicks last clicked last query (Top 50) - Most Popular Searches for media links offering: Link addr. total clicks last clicked last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query Results Queried at Time taken User IP Country Host name (Latest100) - Index log offering: . . .
. . .
Country Host name (Latest100) - Index log offering: File-name index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP host query impact involved tags date and time of intrusion. - Flood attempts log offering: IP query date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP query date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software environment MySQL PDF-converter image functions php.ini file PHP integration PHP security info. Each item holding lists of details. All text links media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page because the tags need to be added (edited) to the page. A more flexible method . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section nav aside hgroup article header footer etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus during index / re-index there was no printout available because: - Several servers especially on Win32 buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title filename size of original image link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds present the text extract of 350 characters as part of product attributes. If this option is not activated all products of the XML product feed will be presented in search results and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases which are correctly configured assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
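A minimal sketch of how a client might read the JSON result output, in Python. The field names (query, total_results, text_results, num, weight, url, title, page_size, domain_name) are taken from the sample above; the helper name summarize_results and the shortened sample payload (the fulltxt extracts are left out) are assumptions made for illustration.

```python
import json

# Shortened copy of the sample output; "\/" is a valid JSON escape for "/".
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize_results(raw):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(raw)
    return [
        # Weights arrive as strings and titles carry a leading space.
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data["text_results"]
    ]


if __name__ == "__main__":
    for num, weight, url, title in summarize_results(SAMPLE):
        print(num, weight, title, url)
```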

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
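A minimal sketch, in Python, of how such a log file name could be split into its items. The function name parse_log_name is hypothetical, and two points are assumptions rather than documented facts: that the date is encoded as YYMMDD (consistent with 100524 meaning May 24, 2010) and that the last item before .html is the thread ID.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread id>.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Return (database number, start time, thread id) for a log file name."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    db, date, time, thread = match.groups()
    started = datetime.strptime(f"{date} {time}", "%y%m%d %H.%M.%S")
    return int(db), started, thread


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The thread ID is kept as a string, so names produced by command line threads, such as db2_100524-21.48.12_ID2.html, parse the same way.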
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
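
The JSON output quoted above can be consumed by any JSON parser. The following is a minimal Python sketch, not part of Sphider-plus: the sample string only reuses field names and values visible in the excerpt, the truncated "fulltxt" values are replaced by a "..." placeholder, and the helper name summarize_text_results is hypothetical.

```python
import json

# Sample built from the fields visible in the excerpt above.
# The "fulltxt" values are truncated in the source, so "..." is a placeholder.
SAMPLE = """
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
  {"num": 1, "weight": "100.0",
   "url": "http://www.english.le-piaggie.info/Info_eng.pdf",
   "title": " Info_eng", "fulltxt": "...", "page_size": "5,661.6kb",
   "domain_name": "www.english.le-piaggie.info"},
  {"num": 2, "weight": "50.0",
   "url": "http://www.english.le-piaggie.info/html/description.html",
   "title": " Description", "fulltxt": "...", "page_size": "26.4 kb",
   "domain_name": "www.english.le-piaggie.info"}
 ]}
"""


def summarize_text_results(raw):
    """Return (num, weight, title, url) for each entry in 'text_results'."""
    data = json.loads(raw)
    summary = []
    for hit in data.get("text_results", []):
        # 'weight' arrives as a string, titles carry a leading blank.
        summary.append(
            (hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"])
        )
    return summary


if __name__ == "__main__":
    for num, weight, title, url in summarize_text_results(SAMPLE):
        print(f"{num}. {title} ({weight}%) {url}")
```

Note that "weight" is delivered as a string and the titles begin with a blank, so both are normalised before use.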

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
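
The log file names quoted above follow a regular pattern (database number, date, time, thread number or ID), so they can be split mechanically. The following Python sketch is only an illustration and not part of Sphider-plus: the helper parse_log_name and the LogName tuple are hypothetical, and reading the date as a two-digit year, month and day is an assumption derived from the example 100524 = May 24, 2010.

```python
import re
from datetime import datetime
from typing import NamedTuple


class LogName(NamedTuple):
    database: int        # number after "db"
    timestamp: datetime  # date and time encoded in the file name
    thread_id: str       # thread number ("1") or user-defined ID ("ID1")


# db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
_LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a log file name such as db2_100524-21.47.56_1.html into its parts."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    database, date_part, time_part, thread_id = match.groups()
    # Assumption: the date is YYMMDD, as in 100524 = May 24, 2010.
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%y%m%d %H.%M.%S")
    return LogName(int(database), timestamp, thread_id)


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
    print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

Such a helper could be used, for example, to sort the files in /admin/log/ by database and time before reviewing them.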
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
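The naming scheme of the log files described above can be decomposed mechanically. The following Python sketch is illustrative only: the function name and the regular expression are assumptions derived from the two example names shown, and are not part of Sphider-plus.

```python
import re
from datetime import datetime

# Pattern inferred from the example names, e.g. db2_100524-21.47.56_1.html
# (assumption: the trailing number before ".html" is the thread number).
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\d+)\.html$")

def parse_log_name(name):
    """Split a log file name into database number, start time and thread number."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    db, date, time, thread = match.groups()
    # 100524 is read as YYMMDD (May 24, 2010), as stated in the description.
    started = datetime.strptime(f"{date} {time}", "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": int(thread)}
```

Calling parse_log_name("db2_100524-21.47.56_1.html") yields database 2, thread 1 and the start time May 24, 2010, 21:47:56.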
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
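The list-file convention described above (a regexp is introduced by */ and closed by another slash, and names are handled case-sensitively) could be interpreted roughly as follows. This is a Python sketch under stated assumptions: the function name is hypothetical, plain entries are assumed to be compared literally, and a pattern is assumed to have to match the whole name. How Sphider-plus itself applies the patterns is not shown in the excerpt.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a predicate on element or id names.

    Entries of the form */pattern/ are treated as regular expressions;
    anything else is compared literally and case-sensitively (assumption).
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Drop the leading "*/" and the trailing "/" to get the bare pattern.
        pattern = re.compile(entry[2:-1].strip())
        return lambda name: pattern.fullmatch(name) is not None
    return lambda name: name == entry

is_nav = compile_entry("*/nav[0-5]/")
is_footer = compile_entry("footer")
```

With these definitions, is_nav accepts nav0 to nav5 only, and is_footer accepts exactly the lowercase name footer.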
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
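The principle behind the charset list above, converting text from a declared source charset into UTF-8, can be illustrated in a few lines. This Python sketch uses Python's built-in codecs as a stand-in; it is not the ConvertCharset function itself, and the helper name to_utf8 is an assumption made for the example.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the declared charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# A Cyrillic word stored as windows-1251 bytes, then converted to UTF-8.
cyrillic = "Привет".encode("windows-1251")
utf8 = to_utf8(cyrillic, "windows-1251")
```

The same call works for other charsets that Python knows under the same names, for example iso-8859-1 or koi8-r. Not every name in the list above is necessarily available as a Python codec.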
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
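A client consuming the JSON output shown above might read it as follows. This is a minimal Python sketch: the sample string is a shortened copy of the example (the "fulltxt" excerpts and several fields are left out), and the function name summarize is an assumption made for illustration.

```python
import json

# Shortened sample using field names from the example output above.
sample = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(payload):
    """Return (num, weight, title, url) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # "weight" arrives as a string and titles carry a leading space.
        (r["num"], float(r["weight"]), r["title"].strip(), r["url"])
        for r in data["text_results"]
    ]

rows = summarize(sample)
```

The escaped slashes (\/) in the URLs are valid JSON and are turned back into plain slashes by the parser.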

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
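The naming scheme above can be taken apart programmatically. The following is a minimal sketch in Python, not part of Sphider-plus; the function name parse_log_name and the regular expression are assumptions derived only from the example file names quoted above (database number, date as YYMMDD, time as HH.MM.SS, then the thread number or ID).

```python
import re
from datetime import datetime

# Hypothetical pattern, modelled on names such as db2_100524-21.47.56_1.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    db, date, time, thread = match.groups()
    # 100524 is read as year 10, month 05, day 24, i.e. May 24, 2010
    started = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": thread}

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

The same pattern also accepts names that carry a user-defined thread ID, such as db2_100524-21.47.56_ID1.html, provided the ID consists of letters, digits or underscores.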
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
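The MD5sum check mentioned above can be illustrated with a short sketch. It is written in Python for illustration only; how Sphider-plus actually stores and compares the checksum is not shown in this excerpt, so the helper names md5sum and needs_reindex are assumptions.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page only has to be re-indexed when its checksum has changed."""
    return stored_md5 != md5sum(content)

old_page = b"<html><body>Hello</body></html>"
stored = md5sum(old_page)                      # kept from the previous index run
print(needs_reindex(stored, old_page))         # unchanged page
print(needs_reindex(stored, old_page + b" "))  # modified page
```

Because unchanged pages are detected by comparing two short digests, the expensive parsing and keyword extraction can be skipped for them, which is what makes the fast re-index procedure possible.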
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements such as section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
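The list-file convention described in chapters 4.5 to 4.7 (plain names, or a regexp introduced by */ and ended with a slash) can be sketched as follows. This is an illustration in Python, not the Sphider-plus implementation; the function names are hypothetical, and both the trimming of blanks around the pattern and the whole-name matching are assumptions.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the closing slash
        return re.compile(entry[2:-1].strip())
    # Plain entry: the name has to match literally
    return re.compile(re.escape(entry))

def entry_matches(entry, name):
    """Check whether an element name or div id is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None

print(entry_matches("*/nav[0-5]/", "nav3"))
print(entry_matches("*/nav[0-5]/", "nav9"))
print(entry_matches("nav", "Nav"))
```

With these assumptions the entry */nav[0-5]/ covers nav0 to nav5 but not nav9, and the plain entry nav does not cover Nav, because matching is case-sensitive.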
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
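How a crawler can pick up the canonical URL of a duplicate page is sketched below. The example is written in Python and assumes that the canonical URL is announced by the standard <link rel="canonical" href="..."> element in the page head; the class CanonicalFinder and the function find_canonical are hypothetical names, not Sphider-plus code.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]

def find_canonical(html):
    """Return the canonical URL of a page, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish" />'
        '</head><body>Swedish fish</body></html>')
print(find_canonical(page))
```

An indexer that stores only the returned URL ends up with a single entry for all duplicate addresses of the page.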
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
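The conversion of a legacy charset into UTF-8 can be illustrated in a few lines. The sketch uses Python's built-in codecs rather than the ConvertCharset function, so it only shows the principle; the helper name to_utf8 is an assumption, and not every charset of the list above (for example mazovia) is known to Python.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes given in a legacy charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")

# The Russian word for "hello", encoded in windows-1251 (one byte per letter)
cyrillic = b"\xcf\xf0\xe8\xe2\xe5\xf2"
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))

# The German letter u-umlaut, encoded in iso-8859-1
print(to_utf8(b"\xfc", "iso-8859-1").decode("utf-8"))
```

After the conversion every character is stored in the same encoding, regardless of the charset the source page was delivered in.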
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
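A client could read the media part of the XML result output roughly as follows. The sketch is written in Python; the element names media_result, num, type, link, title, x_size and y_size are taken from the excerpt above, while the enclosing root element <results> and the num and type values of the first record are assumptions, because the excerpt is truncated.

```python
import xml.etree.ElementTree as ET

# Shortened sample; <results> as root element is an assumption
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def read_media_results(xml_text):
    """Return one dictionary per <media_result> element."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]

for hit in read_media_results(SAMPLE):
    print(hit["title"], hit["x_size"] + "x" + hit["y_size"], hit["link"])
```

All values arrive as strings; a client that needs the image dimensions as numbers has to convert x_size and y_size itself.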
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
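The JSON result output shown above can be consumed with any JSON parser. The following Python sketch uses a shortened sample that keeps only some of the fields of the excerpt (query, total_results, text_results with num, weight, url and title); the function name summarize is hypothetical.

```python
import json

# Shortened sample built from the field names of the excerpt above
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",'
    r'"title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)

def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that the escaped slashes (\/) in the URLs are plain JSON escapes and are turned back into ordinary slashes by the parser, and that the weight is delivered as a string and has to be converted before it can be compared numerically.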

[ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .

. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
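The log file naming scheme described above (with either a plain thread number or a user-defined ID as the last item) can be unpacked mechanically. The following Python sketch is an illustration only and not part of Sphider-plus. The function name parse_log_name is hypothetical, and the sketch assumes the date item is encoded as YYMMDD, as the example 100524 (May 24, 2010) suggests.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html or db2_100524-21.48.12_ID2.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread ID.

    Hypothetical helper; assumes the YYMMDD date layout seen in the example.
    """
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    timestamp = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "timestamp": timestamp,
        "thread": match["thread"],
    }
```

Calling parse_log_name("db2_100524-21.48.12_ID2.html") yields database 2 and the thread ID "ID2", which makes it easy to group the log files of one multithreaded run.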
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
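The MD5-based change check mentioned above can be illustrated with a short Python sketch. It shows the general technique only and makes no claim about how Sphider-plus implements it. The function name page_changed and the parameter stored_md5 (standing in for the checksum kept in the database) are assumptions.

```python
import hashlib

def page_changed(content: bytes, stored_md5: str) -> bool:
    """Return True if the page content no longer matches the stored MD5 checksum.

    Hypothetical helper; stored_md5 is the hex digest saved at the last index run.
    """
    return hashlib.md5(content).hexdigest() != stored_md5
```

Pages for which this returns False could be skipped during a re-index, which is what makes the procedure fast.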
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
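The list-file convention above (plain names, or a regexp introduced by */ and ended with another slash) might be evaluated roughly as in the following Python sketch. This is an assumption-laden illustration, not the Sphider-plus code: the function name entry_matches is hypothetical, surrounding blanks inside the slashes are assumed to be insignificant, and the regexp is assumed to have to match the whole name.

```python
import re

def entry_matches(entry: str, name: str) -> bool:
    """Check one list-file entry against an element name or div id.

    Hypothetical helper: entries of the form */pattern/ are treated as
    regular expressions, all other entries are compared literally and
    case-sensitively.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        pattern = entry[2:-1].strip()
        return re.fullmatch(pattern, name) is not None
    return entry == name
```

With the entry */nav[0-5]/ the names nav0 to nav5 match, while nav7 does not. A plain entry such as footer matches only the identical, equally cased name.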
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
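Converting text from one of the listed charsets into UTF-8 follows the same pattern regardless of the tool used. The Python sketch below illustrates the idea with Python's built-in codecs. It is not the ConvertCharset function itself, the helper name to_utf8 is hypothetical, and Python's codec set covers only part of the list above (mazovia, for example, is not included).

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given charset and re-encode them as UTF-8.

    Hypothetical helper; raises LookupError for charsets Python does not know
    and UnicodeDecodeError for bytes that are invalid in the source charset.
    """
    return raw.decode(source_charset).encode("utf-8")
```

For instance, Cyrillic text stored as windows-1251 or koi8-r comes out as the same UTF-8 byte sequence once converted.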
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
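A client consuming the XML result output could read the media entries as sketched below in Python. Only the element names visible in the excerpt above (media_result, link, title, x_size, y_size) are taken from the source. The enclosing results root element, the SAMPLE document and the function name image_sizes are assumptions made for the sake of a self-contained example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <results> wrapper is an assumption, the inner
# elements are copied from the excerpt above.
SAMPLE = """<results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def image_sizes(xml_text):
    """Return (title, width, height) for every media_result that has sizes."""
    root = ET.fromstring(xml_text)
    sizes = []
    for item in root.iter("media_result"):
        width = item.findtext("x_size")
        height = item.findtext("y_size")
        if width is None or height is None:
            continue  # e.g. audio entries without pixel dimensions
        sizes.append((item.findtext("title"), int(width), int(height)))
    return sizes
```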
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
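The JSON result output shown above can be consumed with any standard JSON parser. The Python sketch below uses a shortened, hypothetical SAMPLE that keeps only some of the fields from the excerpt (the fulltxt extracts are omitted because they are truncated in the source). The function name summarise is an assumption.

```python
import json

# Shortened sample built from field names and values in the excerpt above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","page_size":"5,661.6kb"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","page_size":"26.4 kb"}
]}'''

def summarise(json_text):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(json_text)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]
```

Note that weight arrives as a string and titles carry a leading blank, so a client will usually want to convert and trim them as done here. The escaped slashes (\/) are resolved by the parser.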

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
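The JSON result output shown above can be consumed by any client that parses JSON. The following is a minimal sketch in Python, assuming only the field names visible in the sample ("text_results", "num", "weight", "url", "title"). The SAMPLE string is a shortened stand-in for a real response, and the function name summarize is hypothetical.

```python
import json

# Shortened sample in the shape of the JSON output shown above.
# The "fulltxt" values are placeholders, not real text extracts.
SAMPLE = r'''
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
  {"num": 1, "weight": "100.0",
   "url": "http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title": " Info_eng", "fulltxt": " . . . ",
   "page_size": "5,661.6kb", "domain_name": "www.english.le-piaggie.info"},
  {"num": 2, "weight": "50.0",
   "url": "http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title": " Description", "fulltxt": " . . . ",
   "page_size": "26.4 kb", "domain_name": "www.english.le-piaggie.info"}
 ]}
'''


def summarize(payload: str) -> list[tuple[int, float, str, str]]:
    """Return (num, weight, url, title) for each text result in the payload."""
    data = json.loads(payload)  # "\/" in the URLs is decoded to "/"
    rows = []
    for hit in data.get("text_results", []):
        # "weight" arrives as a string in the sample, so convert it explicitly.
        rows.append((hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip()))
    return rows


for num, weight, url, title in summarize(SAMPLE):
    print(f"{num}. {title} ({weight}%) {url}")
```

Note that the weight is delivered as a string and the titles carry a leading blank in the sample, so a client may want to normalize both as done here.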

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
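The naming scheme described above can be taken apart again by a small script, for example to sort or archive log files. The following Python sketch assumes the date is encoded as YYMMDD (as the example "100524 - Date (May 24, 2010)" suggests) and treats the last item as an opaque thread suffix. The function name parse_log_name is hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread suffix>.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name: str) -> dict:
    """Split a log file name into database number, timestamp and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name}")
    db, stamp, thread = match.groups()
    return {
        "db": int(db),
        # Assumption: two-digit year first, then month and day.
        "time": datetime.strptime(stamp, "%y%m%d-%H.%M.%S"),
        "thread": thread,  # "1", "2", ... or a user-defined ID such as "ID1"
    }


print(parse_log_name("db2_100524-21.47.56_1.html"))
```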
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
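The command lines above follow a simple pattern: the option name directly followed by the thread ID. A wrapper script could build and start them as sketched below in Python. This is only an illustration under assumptions: the helper names build_commands and launch are hypothetical, "php" is assumed to be on the PATH, and the working directory is assumed to be the folder containing spider.php.

```python
import subprocess


def build_commands(option: str, ids: list) -> list[list[str]]:
    """Build one command line per thread, e.g. ['php', 'spider.php', '-new1']."""
    return [["php", "spider.php", f"-{option}{thread_id}"] for thread_id in ids]


def launch(commands: list[list[str]]) -> list[subprocess.Popen]:
    """Start all threads in parallel; the caller decides when to wait for them."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Only prints the command lines; call launch(...) to actually start the indexer.
for cmd in build_commands("new", [1, 2, 3]):
    print(" ".join(cmd))
```

Since the IDs end up in the log file names, keeping them short and free of characters that the operating system does not allow in file names avoids surprises.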
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
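The MD5sum check mentioned above boils down to comparing a stored checksum with the checksum of the freshly fetched page. A minimal Python sketch of that idea follows. It is not the Sphider-plus implementation, and the function names md5sum and needs_reindex are hypothetical.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 checksum of the page content."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page needs re-indexing only if its checksum differs from the stored one."""
    return stored_md5 != md5sum(content)


old_page = b"<html><body>Hello</body></html>"
new_page = b"<html><body>Hello again</body></html>"
stored = md5sum(old_page)
print(needs_reindex(stored, old_page))  # unchanged page
print(needs_reindex(stored, new_page))  # changed page
```

Because unchanged pages can be skipped after a cheap comparison, a re-index run that keeps the stored checksums is much faster than a full index from scratch.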
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
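The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) can be illustrated with a short Python sketch. It assumes that a plain entry must match the whole name exactly and case-sensitively, and that a regexp entry must match the whole name as well. How Sphider-plus applies the patterns internally is not stated here, so this is an assumption, and the helper names are hypothetical.

```python
import re


def compile_entry(entry: str) -> re.Pattern:
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the closing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the literal text only.
    return re.compile(re.escape(entry))


def is_listed(name: str, entries: list[str]) -> bool:
    """True if the element name or div id is covered by any entry."""
    return any(compile_entry(e).fullmatch(name) for e in entries if e.strip())


entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
for candidate in ["footer", "Footer", "nav3", "nav7", "menu2"]:
    print(candidate, is_listed(candidate, entries))
```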
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
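The conversion step described above, from a page's declared charset into UTF-8, can be sketched in a few lines of Python. This is not the ConvertCharset function itself. It relies on Python's built-in codecs, which cover many but not all of the names listed (for example "mazovia" is not available there), and the function name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from the given source charset and re-encode them as UTF-8."""
    try:
        text = raw.decode(charset)
    except LookupError as exc:
        # Raised when the charset name is unknown to this Python installation.
        raise ValueError(f"unsupported charset: {charset}") from exc
    return text.encode("utf-8")


cyrillic = "Привет".encode("windows-1251")  # simulated page content
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))
print(to_utf8(b"M\xfcller", "iso-8859-1").decode("utf-8"))
```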
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the limitations for file names with respect to the OS should be taken . . .
. . .
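The log file naming scheme described above (database number, date, start time, thread number or ID) can be taken apart mechanically. The following is only a minimal Python sketch, not part of Sphider-plus: the function name parse_log_name and the LogName tuple are hypothetical, and the pattern assumes that the names always follow the shape of the examples shown above.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed shape: db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^/\\]+)\.html$"
)


class LogName(NamedTuple):
    database: int      # number of the database, e.g. 2 for "db2"
    started: datetime  # date and time taken from the file name
    thread: str        # thread number ("1") or user-defined ID ("ID1")


def parse_log_name(name: str) -> Optional[LogName]:
    """Split a log file name into its parts; return None if it does not fit."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    try:
        started = datetime.strptime(
            match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
        )
    except ValueError:
        # Digits were present but do not form a valid date or time.
        return None
    return LogName(int(match["db"]), started, match["thread"])


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
    print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

Such a helper might be used to sort the files in /admin/log/ by database and start time before reviewing them.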
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
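The list-file convention described above (plain entries, or a regexp introduced by */ and closed by another slash) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation, which is written in PHP. The function name compile_entry is hypothetical, and two points are assumptions: that whitespace around the pattern is ignored, and that a pattern has to match the whole id or element name.

```python
import re
from typing import Callable


def compile_entry(entry: str) -> Callable[[str], bool]:
    """Turn one list-file entry into a predicate for ids or element names.

    Entries written as */ pattern / are treated as regular expressions;
    everything else is compared literally and case-sensitively.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Assumption: blanks between the slashes and the pattern are ignored.
        pattern = re.compile(entry[2:-1].strip())
        # Assumption: the pattern must cover the complete value.
        return lambda value: pattern.fullmatch(value) is not None
    return lambda value: value == entry


if __name__ == "__main__":
    is_menu = compile_entry("*/ menu[0-5] /")
    print(is_menu("menu3"))   # matches the character class 0-5
    print(is_menu("menu7"))   # outside the character class
    print(compile_entry("sidebar")("sidebar"))
```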
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
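To illustrate what converting text from one of the listed charsets into UTF-8 means, here is a minimal Python sketch. It uses Python's built-in codecs and is not the ConvertCharset function mentioned above. The function name to_utf8 is hypothetical, and only charsets that Python itself knows (for example windows-1251, cp866, koi8-r, iso-8859-5) will work with it.

```python
import codecs


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Re-encode bytes from the given source charset into UTF-8.

    Raises LookupError for a charset name Python does not know and
    UnicodeDecodeError if the bytes are not valid in that charset.
    """
    codecs.lookup(charset)        # fail early on unknown charset names
    text = raw.decode(charset)    # source bytes -> Unicode string
    return text.encode("utf-8")   # Unicode string -> UTF-8 bytes


if __name__ == "__main__":
    cyrillic = "Привет".encode("windows-1251")  # 6 bytes, one per letter
    converted = to_utf8(cyrillic, "windows-1251")
    print(len(cyrillic), len(converted))        # UTF-8 needs 2 bytes per letter
    print(converted.decode("utf-8"))
```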
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
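The XML media result shown above can be read with any standard XML parser. The following Python sketch uses only the element names visible in the excerpt (media_result, link, title, x_size, y_size). The wrapping results element and the function name media_sizes are assumptions made for illustration, because the excerpt does not show the root of the document.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: "results" is a hypothetical root element; the excerpt above
# only shows the inner <media_result> entries.
SAMPLE = """<results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""


def media_sizes(xml_text: str) -> List[Tuple[str, str, int, int]]:
    """Return (title, link, width, height) for each entry that has a size."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("media_result"):
        width = item.findtext("x_size")
        height = item.findtext("y_size")
        if width is None or height is None:
            continue  # e.g. audio entries carry no pixel size
        rows.append(
            (item.findtext("title", ""), item.findtext("link", ""),
             int(width), int(height))
        )
    return rows


if __name__ == "__main__":
    for row in media_sizes(SAMPLE):
        print(row)
```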
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
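A client consuming the JSON output shown above might reduce it to rank, weight and URL per hit. The following Python sketch works on an abbreviated sample that keeps only some of the fields visible in the excerpt (the long fulltxt values are left out). The function name summarize is hypothetical.

```python
import json
from typing import List, Tuple

# Abbreviated sample built from fields visible in the excerpt above;
# fulltxt, page_size and the other fields are omitted for brevity.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''


def summarize(payload: str) -> List[Tuple[int, float, str]]:
    """Return (num, weight, url) for every text result in the JSON output."""
    data = json.loads(payload)
    return [
        # The weight arrives as a string, so it is converted to a number here.
        (int(hit["num"]), float(hit["weight"]), hit["url"])
        for hit in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for num, weight, url in summarize(SAMPLE):
        print(num, weight, url)
```

Note that the escaped slashes (\/) in the URLs are valid JSON and are turned into plain slashes by the parser.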

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. The current release is version 4.2024a, published February 12, 2024. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp must be introduced by */ and ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values defined in the list file /include/common/divs_use.txt will be used to index only the content between <div . . .
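The regexp entry format (introduced by */ and closed by a slash) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Sphider-plus parser: the function names are hypothetical, entries without the */ ... / wrapper are assumed to be literal ids, blanks inside the delimiters are assumed to be insignificant, and the whole id is assumed to have to match.

```python
import re

def compile_entry(entry: str) -> "re.Pattern[str]":
    """Turn one list-file line into a compiled pattern.

    Assumption: entries wrapped as */ ... / are regular expressions,
    every other entry is treated as a literal value.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 2:
        # Drop the leading */ and the trailing slash, ignore padding blanks.
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))

def id_matches(entry: str, div_id: str) -> bool:
    """Check whether a div id is covered by a list-file entry."""
    # fullmatch: the complete id has to match (an assumption of this sketch).
    return compile_entry(entry).fullmatch(div_id) is not None

print(id_matches("*/ menu[0-5] /", "menu3"))
print(id_matches("*/ menu[0-5] /", "menu7"))
print(id_matches("sidebar", "sidebar"))
```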
. . .
a regexp pattern. The regexp must be introduced by */ and ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must be introduced by */ and ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
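How a crawler can discover the canonical URL of a duplicate page is sketched below in Python. The sketch assumes that the canonical URL is declared with the standard `<link rel="canonical" href="...">` element in the page head; the class and function names are hypothetical and do not come from Sphider-plus.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")

def find_canonical(html: str):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
print(find_canonical(page))
```

Every duplicate URL whose page declares the same canonical URL can then be stored under that single address.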
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
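The conversion step itself (bytes in a known legacy charset converted into UTF-8) can be shown with a minimal Python sketch. It uses Python's built-in codecs rather than the ConvertCharset function, so it only illustrates the principle; the helper name `to_utf8` is hypothetical, and some charsets from the list above (for example gsm0338 or mazovia) have no built-in Python codec.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)  # raises LookupError for unknown charsets
    return text.encode("utf-8")

# Sample input: a Cyrillic word as it would arrive from a windows-1251 page.
cyrillic = "Привет".encode("windows-1251")
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))
```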
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not shown during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management presents only those databases that are correctly configured and assigned and that have a set of installed tables, as described in chapter 'Definition and configuration'. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
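A consumer of the XML output could read the media results as sketched below in Python. The element names `media_result`, `num`, `type`, `url`, `link`, `title`, `x_size` and `y_size` are taken from the excerpt above; the enclosing `<results>` root element and the `num`, `type` and `url` values of the sample entry are placeholders, because that part of the output is not shown here.

```python
import xml.etree.ElementTree as ET

# Sample document; the <results> root and the first three values are assumptions.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <url>http://www.abc.de/index.php</url>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def read_media_results(xml_text: str):
    """Return one dict per <media_result>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("media_result"):
        results.append({child.tag: (child.text or "").strip() for child in node})
    return results

for media in read_media_results(SAMPLE):
    print(media["title"], media["x_size"], "x", media["y_size"])
```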
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
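The JSON output can be consumed with any standard JSON parser. The Python sketch below uses a shortened copy of the example above: the `fulltxt` field is left out because it is truncated in this excerpt, and the function name `summarise` is hypothetical.

```python
import json

# Shortened sample; keys and values are copied from the example output.
SAMPLE = r'''{"query": "colour", "total_results": 2, "num_of_results": 2,
 "text_results": [
  {"num": 1, "weight": "100.0",
   "url": "http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title": " Info_eng", "page_size": "5,661.6kb",
   "domain_name": "www.english.le-piaggie.info"},
  {"num": 2, "weight": "50.0",
   "url": "http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title": " Description", "page_size": "26.4 kb",
   "domain_name": "www.english.le-piaggie.info"}]}'''

def summarise(json_text: str):
    """Return the query, the total hit count and one tuple per text result."""
    data = json.loads(json_text)  # the escaped \/ sequences decode to plain slashes
    rows = []
    for hit in data["text_results"]:
        # weight is delivered as a string, so it is converted for sorting or maths
        rows.append((hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip()))
    return data["query"], data["total_results"], rows

query, total, rows = summarise(SAMPLE)
print(query, total)
for row in rows:
    print(row)
```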

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: To let customers integrate Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - Sitemap log offering: sitemap.xml output, sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
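The naming scheme of the thread log files can be taken apart again with a small script. The following is only a sketch in Python, not part of Sphider-plus: the function name parse_log_name and the returned keys are hypothetical, and the pattern assumes that every log file follows the two examples quoted above (database number, date as YYMMDD, time as HH.MM.SS, then the thread number or a user-defined ID).

```python
import re
from datetime import datetime

# Assumed layout, derived from the quoted examples only:
#   db<number>_<YYMMDD>-<HH.MM.SS>_<thread number or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread label."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    # 100524 + 21.47.56 is read as May 24, 2010, 21:47:56.
    stamp = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "timestamp": stamp,
        "thread": match["thread"],
    }


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

The same pattern also accepts names that carry a user-defined ID in place of the plain thread number, such as db2_100524-21.47.56_ID1.html.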
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
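How an entry of such a list file could be told apart as a plain name or a regexp can be sketched as follows. This is an illustration in Python, not the Sphider-plus implementation: the helper names parse_entry and entry_matches are hypothetical, Python's re syntax stands in for the PHP regexp engine, and it is an assumption that surrounding blanks are ignored and that a pattern has to match the complete element name.

```python
import re


def parse_entry(entry):
    """Classify one list-file entry as ('regexp', compiled) or ('literal', text)."""
    text = entry.strip()
    # A regexp entry is introduced by */ and ended with another slash.
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        return ("regexp", re.compile(text[2:-1].strip()))
    return ("literal", text)


def entry_matches(entry, name):
    """Case-sensitive test whether an element name is covered by the entry."""
    kind, value = parse_entry(entry)
    if kind == "regexp":
        return value.fullmatch(name) is not None
    return value == name


if __name__ == "__main__":
    for candidate in ("nav3", "nav9", "NAV3"):
        print(candidate, entry_matches("*/nav[0-5]/", candidate))
```

Under these assumptions the entry */nav[0-5]/ covers nav0 to nav5 only, and NAV3 is rejected because the comparison is case-sensitive.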
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
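A client that receives this JSON result output could read it as sketched below. The example is written in Python and is not part of Sphider-plus: the function name summarize is hypothetical, and the sample payload is a shortened stand-in that keeps only some of the keys visible in the output above (query, total_results, text_results with num, weight, url and title). Note that weight arrives as a string and has to be converted before it can be compared numerically.

```python
import json

# Shortened stand-in for the JSON output quoted above (assumption: the long
# 'fulltxt' extracts and the remaining keys are left out for brevity).
SAMPLE = r'''
{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}
'''


def summarize(payload):
    """Return the query string and a list of (num, weight, url, title) rows."""
    data = json.loads(payload)
    rows = []
    for hit in data.get("text_results", []):
        rows.append(
            (int(hit["num"]), float(hit["weight"]), hit["url"], hit["title"].strip())
        )
    return data["query"], rows


if __name__ == "__main__":
    query, rows = summarize(SAMPLE)
    print("Query:", query)
    for num, weight, url, title in rows:
        print(num, weight, title, url)
```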

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: To let customers integrate Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the limitations for file names with respect to the OS should be taken . . .
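The naming scheme described above can be taken apart programmatically. The following is a minimal Python sketch, not part of Sphider-plus; it assumes names of the form db<number>_<yymmdd>-<hh.mm.ss>_<thread or ID>.html, and the names `LOG_NAME`, `LogName` and `parse_log_name` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed layout: db<number>_<yymmdd>-<hh.mm.ss>_<thread or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>.+)\.html$"
)


class LogName(NamedTuple):
    database: int      # number of the database, e.g. 2 for "db2"
    started: datetime  # date and time taken from the file name
    thread: str        # thread number or user-defined ID, e.g. "1" or "ID1"


def parse_log_name(name: str) -> LogName:
    """Split a spider log file name into database, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return LogName(int(match["db"]), started, match["thread"])


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_ID1.html"))
```

Because the ID part is matched as free text, user-defined IDs pass through unchanged; any restriction on them comes from the operating system's file-name rules, as noted above.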
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
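The MD5-based change check mentioned above can be illustrated with a short Python sketch. It only shows the general idea of comparing a stored checksum with a fresh one; how Sphider-plus stores and compares its checksums is not shown in this text, and the names `md5sum` and `needs_reindex` are hypothetical.

```python
import hashlib
from typing import Optional


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page's raw content."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(content: bytes, stored_md5: Optional[str]) -> bool:
    """A page needs re-indexing if it is new or its checksum has changed."""
    return stored_md5 is None or md5sum(content) != stored_md5


if __name__ == "__main__":
    old = md5sum(b"<html>old text</html>")
    print(needs_reindex(b"<html>old text</html>", old))  # unchanged page
    print(needs_reindex(b"<html>new text</html>", old))  # changed page
```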
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
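The list-file entries described above, either a plain name or a regexp introduced by */ and ended with a slash, could be interpreted roughly as in the following Python sketch. This is an illustration under assumptions, not the Sphider-plus implementation: the whitespace trimming, the full-match semantics and the names `compile_entry` and `is_listed` are hypothetical.

```python
import re
from typing import Iterable


def compile_entry(entry: str) -> "re.Pattern[str]":
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    Entries of the form */pattern/ are treated as regular expressions;
    anything else is treated as a literal name.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Assumption: blanks around the pattern body are not significant.
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def is_listed(name: str, entries: Iterable[str]) -> bool:
    """Check whether a div id or element name matches any entry."""
    return any(compile_entry(e).fullmatch(name) for e in entries)


if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
    for candidate in ("nav3", "nav7", "Nav3", "menu0", "footer"):
        print(candidate, is_listed(candidate, entries))
```

Matching is kept case-sensitive here because the text states that names in /include/common/elements_not.txt are processed case-sensitively.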
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
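A client consuming the JSON result output shown above might read it as in the following minimal Python sketch. The sample string is a shortened stand-in built from the field names visible in the excerpt (the elided "fulltxt" values are left out rather than guessed), and the names `SAMPLE` and `summarize` are hypothetical.

```python
import json
from typing import List, Tuple

# Shortened stand-in for the JSON output; only fields visible in the excerpt are used.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}
]}'''


def summarize(payload: str) -> List[Tuple[float, str, str]]:
    """Return (weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # "weight" arrives as a string, so it is converted before use.
        (float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for weight, url, title in summarize(SAMPLE):
        print(f"{weight:6.1f}  {title}  {url}")
```

The escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by the parser, so no manual unescaping is needed.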


Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
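The naming scheme of the log files can be sketched as a small parser. This is a minimal Python illustration and not part of Sphider-plus: the function name `parse_log_name` is hypothetical, and the pattern is inferred only from the sample names quoted in this documentation (database number, date as YYMMDD, start time, then a thread number or a user-defined ID).

```python
import re
from datetime import datetime

# Pattern inferred from the sample names, e.g. db2_100524-21.47.56_1.html
# or db2_100524-21.48.12_ID2.html (assumption: IDs are plain word characters).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    # Date is YYMMDD, time is HH.MM.SS, as in the documented examples.
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }
```

Calling `parse_log_name("db2_100524-21.48.12_ID2.html")` would yield database 2, a start time on May 24, 2010, and the thread ID `ID2`.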
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
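The list-file entry syntax described above (a plain name, or a regexp introduced by */ and closed with another slash) might be interpreted along these lines. This is a minimal Python sketch, not Sphider-plus code: the helper names `compile_entry` and `entry_matches` are hypothetical, and both the stripping of blanks around the pattern and the matching against the whole id or element name are assumptions.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the closing "/".
        # Assumption: blanks around the pattern are not significant.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def entry_matches(entry, name):
    """Return True if `name` (a div id or element name) is covered by `entry`."""
    # Assumption: the whole name has to match, not just a part of it.
    return compile_entry(entry).fullmatch(name) is not None
```

With these assumptions, `entry_matches("*/ menu[0-5] /", "menu3")` is true, while `menu7` is rejected because the digit is outside the range 0-5.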
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
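How a crawler can discover the canonical URL of a duplicate page is sketched below. This is a Python illustration under the assumption that the page declares its canonical URL with the standard `<link rel="canonical" href="...">` element; the class and function names are hypothetical, and the code is not taken from Sphider-plus.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

An indexer could then store only the canonical URL and treat every page that points to it as a duplicate.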
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
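A client could consume the JSON result output roughly as follows. This is a hedged Python sketch: the sample string is shortened from the output quoted above and keeps only some of its fields, and the function name `list_text_results` is hypothetical.

```python
import json

# Shortened sample; field names are taken from the JSON output quoted above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)


def list_text_results(payload):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(payload)
    rows = []
    for result in data.get("text_results", []):
        # The weight arrives as a string such as "100.0"; titles carry a leading blank.
        rows.append(
            (result["num"], float(result["weight"]), result["url"], result["title"].strip())
        )
    return rows


if __name__ == "__main__":
    for num, weight, url, title in list_text_results(SAMPLE):
        print(f"{num}. {title} ({weight:.1f}%) {url}")
```

The escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain slashes by the parser.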

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http//www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http//www.abc.de/index.php</url> . . .
. . .
the following content {"query""colour","ip""1","host_name""guard_007-hoster","query_time""2016-03-18 105623 AM","consumed"0.016,"total_results"2,"num_of_results"2,"from"1,"to"2,"text_results"[{"num"1,"weight""100.0","url""http\/\/www.english.le-piaggie.info\/Info_eng.pdf","title"" Info_eng","fulltxt"" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size""5,661.6kb","domain_name""www.english.le-piaggie.info"},{"num"2,"weight""50.0","url""http\/\/www.english.le-piaggie.info\/html\/description.html","title"" Description","fulltxt"" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size""26.4 kb","domain_name""www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus, separated into submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be supplied with a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
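The naming scheme described above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus itself; the function name `parse_log_name` and the regular expression are assumptions derived only from the two example file names, and the date is read as YYMMDD because 100524 is given as May 24, 2010.

```python
import re
from datetime import datetime

# Pattern inferred from the example names: db<N>_<YYMMDD>-<HH.MM.SS>_<suffix>.html
# (hypothetical helper, not a Sphider-plus function)
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")


def parse_log_name(name):
    """Split a thread log file name into database number, timestamp and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log file name: {name}")
    db, date, time, suffix = match.groups()
    timestamp = datetime.strptime(f"{date}-{time}", "%y%m%d-%H.%M.%S")
    return {"database": int(db), "timestamp": timestamp, "thread": suffix}


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

The suffix is kept as a string so that both plain thread numbers and user-defined IDs are accepted.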
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs may be chosen to suit personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
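As an illustration of how the ID is appended to the option parameter, the Python sketch below only builds the command lines and does not start them. The helper name `thread_commands` is hypothetical, and a second thread using `-erased2` is assumed by analogy with the documented `-new1` / `-new2` pair.

```python
import shlex


def thread_commands(option, ids, php="php", script="spider.php"):
    """Build one command line per thread; the thread ID is appended to the option."""
    return [[php, script, f"-{option}{thread_id}"] for thread_id in ids]


# Print the commands for two re-index threads (assumed IDs 1 and 2).
for command in thread_commands("erased", [1, 2]):
    print(shlex.join(command))
```

Each list could be handed to a process launcher such as `subprocess.Popen` to run the threads in parallel.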
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the reverse . . .
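The `*/ ... /` convention for list-file entries can be pictured with a short Python sketch. It is not the Sphider-plus implementation: the helper names are hypothetical, and treating every other entry as a literal name and requiring the whole name to match are assumptions of this sketch.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern.

    Entries written as */pattern/ are treated as regular expressions;
    any other entry is matched literally (an assumption of this sketch).
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """Return True if the id or element name matches any list entry completely."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)


entries = ["footer", "*/nav[0-5]/"]
print(is_listed("nav3", entries), is_listed("nav7", entries), is_listed("Footer", entries))
```

Because no case-insensitive flag is set, `Footer` does not match the entry `footer`, which mirrors the case-sensitive processing mentioned above.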
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
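How a crawler might learn the canonical URL of a duplicate page can be sketched in Python as follows. The sketch assumes the page declares it with a `<link rel="canonical" href="...">` element in its head, which is the usual mechanism but is not spelled out in the excerpt above; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the declared canonical URL of a page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish"></head></html>')
print(canonical_url(page))
```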
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
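The conversion step itself, from a legacy charset into UTF-8, can be illustrated with Python's built-in codecs. This is only a sketch of the idea and does not use the ConvertCharset function; the helper name `to_utf8` is hypothetical, and Python's codec registry covers many but not all of the charsets listed above.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from a legacy charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# Simulate legacy input by encoding sample words in three of the listed charsets.
samples = {
    "windows-1251": "Привет",
    "cp866": "Привет",
    "iso-8859-1": "müller",
}

for charset, text in samples.items():
    legacy_bytes = text.encode(charset)
    converted = to_utf8(legacy_bytes, charset)
    print(charset, len(legacy_bytes), "bytes ->", len(converted), "bytes UTF-8")
```

Single-byte charsets grow in size for non-ASCII text because UTF-8 needs two bytes for each Cyrillic or accented Latin letter.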
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became tedious. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases that are correctly configured, assigned and have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
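Reading such an XML result output can be sketched in Python with the standard library. The sample below reuses only the element names and values visible in the excerpt; the wrapping `<results>` root element and the function name `media_results` are assumptions made to obtain a well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <results>; the inner fields are taken from the excerpt.
SAMPLE = """<results>
  <media_result>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
  <media_result>
    <num>2</num>
    <type>audio</type>
    <url>http://www.abc.de/index.php</url>
  </media_result>
</results>"""


def media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]


for result in media_results(SAMPLE):
    print(result)
```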
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
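A client consuming the JSON result output could process it along the following lines. This Python sketch uses a shortened sample built from the field names shown above (the `fulltxt` extracts are left out) and a hypothetical helper `summarize`; it assumes, as in the sample, that `weight` is delivered as a string.

```python
import json

# Shortened sample using the field names from the output above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}
]}'''


def summarize(json_text):
    """Return (num, weight, url, title) for every entry in text_results."""
    data = json.loads(json_text)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data["text_results"]
    ]


for row in summarize(SAMPLE):
    print(row)
```

The escaped slashes (`\/`) are valid JSON and are turned back into plain slashes by the parser.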

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus. They are separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: To let customers integrate Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
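The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the helper name `parse_log_name` and the regular expression are assumptions for illustration, based only on the two example names and the item list shown.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html:
# database number, date (YYMMDD), time (HH.MM.SS), thread suffix.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split an index log file name into its parts (hypothetical helper)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    # 100524 + 21.47.56 is read as year 10, month 05, day 24, 21:47:56.
    started = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

Because the thread suffix is matched loosely, the same sketch should also accept names that carry a user-defined ID instead of a plain thread number.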
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the file name limitations of the OS should be taken . . .
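As a rough illustration of how one command line per thread might be assembled, here is a short Python sketch. The function `thread_commands` is hypothetical; only the `php spider.php -new1` form of the command is taken from the text above.

```python
def thread_commands(option, ids, script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-new1'].

    'option' is the option name without the dash (such as 'new');
    each ID is appended directly to it, as in the examples above.
    """
    return [["php", script, f"-{option}{thread_id}"] for thread_id in ids]

for command in thread_commands("new", [1, 2]):
    print(" ".join(command))
```

Each returned list could then be handed to a process launcher of your choice so that the threads run in parallel; how many to start is left to the Admin settings described in the documentation.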
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing can be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
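The idea behind the MD5sum check can be sketched in a few lines of Python. This is an illustration of the general technique only; the function names are assumptions, and how Sphider-plus stores and compares its checksums is not shown in the text above.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()

def has_changed(stored_md5: str, fresh_content: bytes) -> bool:
    """True if the freshly fetched page differs from the indexed one."""
    return md5sum(fresh_content) != stored_md5

stored = md5sum(b"<html>old text</html>")
print(has_changed(stored, b"<html>old text</html>"))  # False
print(has_changed(stored, b"<html>new text</html>"))  # True
```

A page whose digest is unchanged can be skipped, which is what makes a re-index run fast compared to indexing from scratch.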
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp must start with */ and end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp must start with */ and end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must start with */ and end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
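To make the list-file convention more concrete, the following Python sketch treats entries that start with */ and end with a slash as regular expressions and all other entries as literal names. It is an assumption-laden illustration: the helper names are invented, whitespace inside the delimiters is stripped (an assumption), and whole-name, case-sensitive matching is assumed rather than confirmed for every list-file.

```python
import re

def compile_entry(entry):
    """Turn one list-file line into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the closing slash.
        # Stripping the inner whitespace is an assumption of this sketch.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the literal text only.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if 'name' fully matches any entry of the list."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)

entries = ["footer", "*/ menu[0-5] /"]
for candidate in ["menu3", "menu7", "footer", "Footer"]:
    print(candidate, is_listed(candidate, entries))
```

With these two entries, `menu3` and `footer` are reported as listed, while `menu7` (outside the 0-5 range) and `Footer` (different case) are not.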
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
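A minimal Python sketch of how a crawler could resolve duplicates to one canonical URL is given below. It assumes the canonical URL is declared with the standard `<link rel="canonical">` element in the page head; the class and function names are hypothetical and this is not the Sphider-plus implementation.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")

def canonical_url(html, page_url):
    """Return the declared canonical URL, or the page's own URL if none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url

page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
duplicate = (
    "http://www.example.com/product.php"
    "?item=swedish-fish&category=gummy-candy"
)
print(canonical_url(page, duplicate))
```

Indexing under the returned URL instead of the fetched one means every duplicate ends up as a single entry.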
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
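The conversion step itself can be illustrated with a small Python sketch. It uses Python's built-in codecs as a stand-in for the ConvertCharset function named above, so it is only an analogy; not every charset name in the list is guaranteed to be known to Python, and `to_utf8` is a hypothetical helper.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)
    return text.encode("utf-8")

# The same Cyrillic word stored in two legacy charsets from the list.
word = "Привет"
for charset in ("windows-1251", "koi8-r"):
    legacy_bytes = word.encode(charset)
    converted = to_utf8(legacy_bytes, charset)
    print(charset, len(legacy_bytes), len(converted), converted.decode("utf-8"))
```

The legacy encodings use one byte per Cyrillic letter while UTF-8 uses two, which is why the byte lengths differ although the text is identical.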
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, no printout was available during index / re-index because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
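A consumer of the XML output might read the media results as sketched below in Python. The sample document is an assumption: the `<results>` root element and the `num` and `type` values of the first entry are invented for illustration, and only the element names visible in the fragment above are used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the <results> root is an assumption.
SAMPLE = """
<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>
"""

def media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for node in root.iter("media_result"):
        results.append(
            {child.tag: (child.text or "").strip() for child in node}
        )
    return results

for result in media_results(SAMPLE):
    print(result["title"], result["x_size"], result["y_size"])
```

Collecting every child element generically keeps the sketch usable for entries that carry other fields, such as a `<url>` for audio results.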
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
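The JSON output above can be consumed with any standard JSON parser. The following Python sketch uses a shortened sample that keeps only some of the field names visible in that output; the function name `summarize` is an assumption made for illustration.

```python
import json

# Shortened sample using only field names visible in the output above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"from":1,"to":2,"text_results":['
    r'{"num":1,"weight":"100.0","title":" Info_eng",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf"},'
    r'{"num":2,"weight":"50.0","title":" Description",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html"}]}'
)

def summarize(payload):
    """Return (num, weight, title, url) for every text result."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data["text_results"]
    ]

for num, weight, title, url in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that the weight arrives as a string and the titles carry a leading space, so the sketch converts and trims them; the escaped slashes in the URLs are resolved by the parser itself.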

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the limitations for file names with respect to the OS should be taken . . .
. . .
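The log file names shown above can be split back into their parts. The following is a minimal Python sketch, not code from Sphider-plus: the function name parse_log_name and the regular expression are assumptions derived only from the example names (database number, date as YYMMDD, time as HH.MM.SS, thread number or ID).

```python
import re
from datetime import datetime

# Pattern derived from the example names, e.g.
# db2_100524-21.47.56_1.html or db2_100524-21.47.56_ID1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Return database number, start time and thread ID, or None if the name does not fit."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    # 100524 is read as year 10, month 05, day 24 (May 24, 2010)
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }
```

Such a helper might be used to sort the files in /admin/log/ by database and thread start time.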
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
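The MD5sum check mentioned above can be illustrated with a short Python sketch. It only shows the general idea of comparing a stored checksum with the checksum of the freshly fetched page; how Sphider-plus stores and compares the value is not shown here, and the names md5sum and has_changed are hypothetical.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Hex MD5 digest of the raw page content."""
    return hashlib.md5(content).hexdigest()

def has_changed(stored_md5: str, content: bytes) -> bool:
    """True if the page content no longer matches the stored checksum."""
    return stored_md5 != md5sum(content)
```

A page whose checksum is unchanged could then be skipped during re-index, which is what makes the procedure fast.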
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
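The list-file convention described above (plain names, or a regexp introduced by */ and ended with another slash) can be sketched in Python as follows. This is an illustration under assumptions, not the Sphider-plus implementation: the function names are hypothetical, surrounding blanks as in the example */ menu[0-5] / are assumed to be insignificant, and a full match of the name is assumed. Matching is case-sensitive, as stated for /include/common/elements_not.txt.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the closing slash
        pattern = entry[2:-1].strip()
    else:
        # Plain entry: match the text literally
        pattern = re.escape(entry)
    return re.compile(pattern)

def matches(entry, name):
    """True if the div id or element name is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None
```

With these assumptions, the entry */nav[0-5]/ covers nav0 to nav5 but not nav7.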
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
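A client consuming the JSON result output might read it as in the following Python sketch. The sample is an abridged copy of the output shown above (the truncated "fulltxt" fields and several other keys are left out), and the helper name summarize is hypothetical. Note that "weight" arrives as a string and that titles carry a leading blank.

```python
import json

# Abridged copy of the JSON shown above; field names are taken from that output.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(raw):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(raw)  # the escaped slashes \/ are decoded to /
    return [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data["text_results"]
    ]
```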

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
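The naming scheme above can be taken apart again with a small script, for example when sorting log files by thread or start time. The following is only a sketch in Python, not part of Sphider-plus: the function name parse_log_name is hypothetical, and reading the trailing item as the thread number (or thread ID) is inferred from the two example names.

```python
import re
from datetime import datetime

# Pattern for names like db2_100524-21.47.56_1.html:
# database number, date as YYMMDD, time as HH.MM.SS, thread number or ID.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

The same pattern also accepts names that carry an individual ID instead of a plain thread number, such as db2_100524-21.47.56_ID1.html.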
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
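The idea behind the MD5sum check can be illustrated outside of Sphider-plus. The sketch below is written in Python and makes its own assumptions: the helper names md5sum and needs_reindex are hypothetical, and a plain dictionary stands in for the database table that keeps the stored checksums.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the MD5 checksum of the page content as a hex string."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(url: str, content: bytes, stored: dict) -> bool:
    """Compare the fresh checksum with the stored one.

    'stored' maps URL -> checksum of the last indexed version
    (a stand-in for the database, assumed for this sketch).
    """
    digest = md5sum(content)
    if stored.get(url) == digest:
        return False          # content unchanged, page can be skipped
    stored[url] = digest      # remember the new version
    return True

checksums = {}
print(needs_reindex("http://www.abc.de/index.php", b"<html>v1</html>", checksums))
print(needs_reindex("http://www.abc.de/index.php", b"<html>v1</html>", checksums))
```

Because unchanged pages are skipped after a single checksum comparison, a re-index run only spends time on pages whose content really differs.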
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
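How such a list entry might be interpreted can be sketched as follows. This is an illustration in Python under stated assumptions, not the Sphider-plus code: the function names compile_entry and is_listed are hypothetical, and matching the complete element name (instead of a part of it) is an assumption of this sketch.

```python
import re

def compile_entry(entry: str):
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    Entries introduced by '*/' and ended with '/' are read as regexp,
    all other entries are taken literally.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))

def is_listed(name: str, entries) -> bool:
    """True if the element name matches any entry of the list."""
    return any(compile_entry(e).fullmatch(name) for e in entries)

entries = ["footer", "*/nav[0-5]/"]
for name in ("nav3", "nav7", "NAV3", "footer"):
    print(name, is_listed(name, entries))
```

No ignore-case flag is set, so 'NAV3' is not matched by the entry for 'nav[0-5]', which corresponds to the case-sensitive processing mentioned above.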
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
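As the surrounding text is cut off in this excerpt, the following sketch assumes that the canonical URL is announced by a link element with rel="canonical" in the head section of the duplicate pages, which is the common convention. It is written in Python and only shows how such an element could be read. The names CanonicalFinder and find_canonical are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]

def find_canonical(html: str):
    """Return the canonical URL of a page, or None if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
print(find_canonical(page))
```

An indexer could then store the page only once under the returned URL and skip the variants that carry tracking or session parameters.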
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
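What the conversion into UTF-8 does can be shown with a few lines. The sketch below does not use the ConvertCharset function; it relies on the codecs built into Python instead, and the helper name to_utf8 is hypothetical. Some of the charsets listed above may not be available as Python codecs, in which case a LookupError is raised.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# The same Cyrillic word, stored in two different legacy charsets.
word = "Привет"
from_windows = word.encode("windows-1251")
from_koi8 = word.encode("koi8-r")

print(from_windows == from_koi8)
print(to_utf8(from_windows, "windows-1251") == to_utf8(from_koi8, "koi8-r"))
```

The raw bytes differ between the two charsets, while the converted UTF-8 bytes are identical. This is why the indexer needs to know the source charset of a page before the text is stored.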
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
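A client could read such a JSON result as sketched below. The example is written in Python and is not part of Sphider-plus. The sample is shortened to one result, the 'fulltxt' field is left out because it is truncated in the listing above, and the function name summarize is hypothetical. The field names are the ones shown in the output above.

```python
import json

# Shortened sample, built from the field names and values shown above.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,
"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0",
"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
"title":" Info_eng","page_size":"5,661.6kb",
"domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(payload: str):
    """Return the query, the total hit count and one tuple per text result."""
    data = json.loads(payload)
    rows = [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data.get("text_results", [])
    ]
    return data["query"], data["total_results"], rows

query, total, rows = summarize(SAMPLE)
print(query, total)
for num, weight, url, title in rows:
    print(num, weight, url, title)
```

The escaped slashes in the URLs are plain JSON escapes and are resolved by the parser. The weight arrives as a string and is converted to a number before further use.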

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - Sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
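The naming scheme quoted above can be taken apart mechanically. The following is only a minimal Python sketch, not part of Sphider-plus: the function name `parse_log_name` and the dictionary keys are made up for illustration, and since the excerpt is cut off after "Time when this thread", the time is simply treated as a timestamp.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html: database number,
# date (yymmdd), time (hh.mm.ss) and a trailing thread number or ID.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into its parts; return None if it does not match."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    timestamp = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "timestamp": timestamp,
        "thread": match["thread"],  # kept as text, so IDs like "ID1" also fit
    }

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

The thread part is kept as text on purpose, so that names carrying a user-defined ID (such as db2_100524-21.47.56_ID1.html) are accepted as well.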
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
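The command lines quoted above (php spider.php -new1, php spider.php -erased1, ...) follow one pattern: an option name with the thread ID appended. As a rough sketch only, a small Python helper could generate them; the helper name `indexer_commands` is hypothetical, and only the `-new` and `-erased` options that appear in the excerpts are assumed to exist.

```python
def indexer_commands(option, ids, script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-new1']."""
    return [["php", script, f"-{option}{thread_id}"] for thread_id in ids]

# Print the command lines for two parallel threads.
for command in indexer_commands("new", [1, 2]):
    print(" ".join(command))
```

Each argument list could then be handed to `subprocess.Popen` from the directory that contains spider.php, one process per thread.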
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
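The list-file convention described in the excerpts above (an entry is a regexp when it starts with */ and ends with another slash, as in */ menu[0-5] / or */nav[0-5]/) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation: the function names are invented, and treating the blanks around the pattern as insignificant and matching the whole value are assumptions.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Assumption: blanks around the pattern ("*/ menu[0-5] /") are not part of it.
        return re.compile(entry[2:-1].strip())
    # Any other entry is taken literally.
    return re.compile(re.escape(entry))

def is_listed(value, entries):
    """True if the id or element name matches at least one entry completely."""
    return any(compile_entry(entry).fullmatch(value) for entry in entries)

entries = ["footer", "*/ menu[0-5] /", "*/nav[0-5]/"]
print(is_listed("menu3", entries), is_listed("menu7", entries))
```

Matching is left case-sensitive, in line with the remark about /include/common/elements_not.txt.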
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
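How a crawler might pick up the canonical URL of a duplicate page can be sketched in Python. The excerpt is truncated, so it is an assumption here that the canonical URL is announced through a `<link rel="canonical" href="...">` element in the page head; the class and function names are illustrative only.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")

def canonical_url(html):
    """Return the canonical URL announced by the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish"></head></html>'
)
print(canonical_url(page))
```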
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
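The idea behind such a conversion, decoding the bytes with the declared source charset and re-encoding them as UTF-8, can be shown in a few lines of Python. This sketch uses Python's built-in codecs rather than the ConvertCharset function mentioned above, and the helper name `to_utf8` is made up. Python knows most, but not all, of the listed charsets (there is no built-in codec for mazovia, for example).

```python
def to_utf8(raw, charset):
    """Decode bytes from the declared charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")

# A Cyrillic word stored as windows-1251 bytes, converted to UTF-8.
sample = "Привет".encode("windows-1251")
print(to_utf8(sample, "windows-1251").decode("utf-8"))
```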
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
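A client reading the XML result output might collect the fields of each media_result element as follows. This is only a Python sketch under assumptions: the excerpt above is truncated, so the enclosing `<results>` root element and the `num` and `type` values of the first entry are invented to make the sample well-formed; the element names themselves are taken from the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; root element and first num/type are assumed.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in node}
        for node in root.iter("media_result")
    ]

for row in media_results(SAMPLE):
    print(row["title"], row["x_size"], row["y_size"])
```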
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
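The JSON output quoted above can be consumed with any JSON parser. The Python sketch below uses a shortened payload: the field names and values are copied from the excerpt, but the truncated `fulltxt` fields and some metadata are left out, and the function name `text_hits` is hypothetical.

```python
import json

# Shortened payload; field names and values follow the excerpt above.
SAMPLE = r"""{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}
]}"""

def text_hits(payload):
    """Return (num, weight, url, title) for every entry in text_results."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for num, weight, url, title in text_hits(SAMPLE):
    print(num, weight, title, url)
```

Note that the weights arrive as strings and the slashes in the URLs are escaped; `json.loads` undoes the escaping, while the weight has to be converted explicitly.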

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
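The naming scheme above can be taken apart mechanically. The following is only a minimal sketch, assuming names of the form db2_100524-21.47.56_1.html; the function name parse_log_name and the returned dictionary keys are hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html or db2_100524-21.48.12_ID2.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID (hypothetical helper)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    # The date is written as yymmdd, the time as hh.mm.ss
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

print(parse_log_name("db2_100524-21.47.56_1.html"))
```

Sorting the parsed entries by database and start time is one way to group the log files that belong to a single multithreaded run.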
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
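As an illustration of the per-thread invocation, the command lines can be generated rather than typed by hand. This is a sketch only: it builds the argument lists and prints them, and the helper name thread_commands is an assumption. Only the option forms shown in the text (such as -new1 and -new2) are taken from the documentation.

```python
def thread_commands(option, thread_ids, script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-new1']."""
    return [["php", script, f"-{option}{thread_id}"] for thread_id in thread_ids]

# Build the commands for two threads of the 'new' option
commands = thread_commands("new", [1, 2])
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to a process launcher such as Python's subprocess.Popen to start the threads in parallel; that step is left out here because it depends on the local PHP installation.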
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
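The MD5 based change check mentioned above follows a common pattern: store a checksum when the page is indexed and compare it on re-index. The sketch below shows that general pattern under the assumption that the stored value is a hexadecimal MD5 digest of the page content; the function names are hypothetical and the code is not taken from Sphider-plus.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()

def has_changed(content: bytes, stored_md5: str) -> bool:
    """True if the freshly fetched content differs from the indexed version."""
    return md5sum(content) != stored_md5

stored = md5sum(b"<html>old page</html>")
print(has_changed(b"<html>old page</html>", stored))  # unchanged page can be skipped
print(has_changed(b"<html>new page</html>", stored))  # changed page needs a re-index
```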
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
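To make the list-file syntax concrete, here is a minimal sketch of how entries such as */ menu[0-5] / could be interpreted. It rests on several assumptions that the text does not confirm: that blanks around the pattern are ignored, that plain entries must match the id exactly, and that the pattern behaves the same in Python's re module as in PHP. The helper names are hypothetical.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled pattern (hypothetical helper)."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading */ and the trailing slash
        return re.compile(entry[2:-1].strip())
    # Plain entry: treat it as a literal id
    return re.compile(re.escape(entry))

def id_matches(entries, div_id):
    """True if the div id is covered by any entry of the list."""
    return any(compile_entry(e).fullmatch(div_id) for e in entries)

entries = ["*/ menu[0-5] /", "sidebar"]
for candidate in ("menu3", "menu7", "sidebar"):
    print(candidate, id_matches(entries, candidate))
```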
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
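Converting text from one of the listed charsets into UTF-8 amounts to decoding the bytes with the source charset and re-encoding them. The sketch below uses Python's built-in codecs rather than the ConvertCharset function named above, so it only illustrates the idea; the helper name to_utf8 is hypothetical, and not every charset in the list has a Python codec of the same name.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the source charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# The same Cyrillic word stored in two legacy charsets
word = "Привет"
from_windows = to_utf8(word.encode("windows-1251"), "windows-1251")
from_koi8 = to_utf8(word.encode("koi8-r"), "koi8-r")

# Both conversions end in identical UTF-8 bytes
print(from_windows == from_koi8)
```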
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
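A client consuming the JSON output could read it as sketched below. The sample is abbreviated from the output shown above (the truncated fulltxt fields and some metadata are left out), and the function name summarize is hypothetical. Note that the weight is delivered as a string and has to be converted before numeric use.

```python
import json

# Abbreviated copy of the JSON result shown above; "\/" is a valid JSON escape for "/"
sample = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(raw):
    """Return the query, the total hit count and one tuple per text result."""
    data = json.loads(raw)
    rows = [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]
    return data["query"], data["total_results"], rows

query, total, rows = summarize(sample)
print(query, total)
for row in rows:
    print(row)
```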

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
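The `*/ ... /` notation for regexp entries in the list files can be sketched in Python as follows. The helper names are hypothetical, and three points are assumptions of this example rather than documented behaviour: blanks around the pattern are ignored, plain entries are compared literally, and an entry has to match the complete value (case-sensitive).

```python
import re

def compile_entry(entry: str):
    """Turn one list-file entry into a compiled pattern."""
    entry = entry.strip()
    # Entries introduced by */ and ended with / are treated as regexps
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    # Everything else is taken as a literal name
    return re.compile(re.escape(entry))

def matches_any(value: str, entries) -> bool:
    """True if the value is fully matched by at least one entry."""
    return any(compile_entry(e).fullmatch(value) for e in entries)

if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/"]
    for name in ("nav3", "nav7", "Nav3", "footer"):
        print(name, matches_any(name, entries))
```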
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
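To show what converting text from one of these charsets into UTF-8 means in practice, here is a minimal Python sketch. It uses Python's built-in codecs and not the ConvertCharset function named above; the helper `to_utf8` is a hypothetical name, and not every charset in the list is necessarily known to Python under the same label.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given legacy charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)
    return text.encode("utf-8")

if __name__ == "__main__":
    # Six bytes in windows-1251 (Cyrillic)
    legacy = b"\xcf\xf0\xe8\xe2\xe5\xf2"
    converted = to_utf8(legacy, "windows-1251")
    print(converted.decode("utf-8"))  # prints: Привет
    print(len(legacy), len(converted))  # UTF-8 needs two bytes per Cyrillic letter
```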
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
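A consumer of the XML output could read the `<media_result>` elements roughly as in the following Python sketch. The sample document is an assumption built for this example: the wrapping `<results>` element and the `num` and `type` values of the first entry are not shown in the extract above and are filled in only to obtain a well-formed document.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; the root element name is hypothetical
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""

def media_results(xml_text: str):
    """Return one dict per <media_result>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "") for child in result}
        for result in root.iter("media_result")
    ]

if __name__ == "__main__":
    for item in media_results(SAMPLE):
        print(item["title"], item["x_size"], "x", item["y_size"])
```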
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
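The JSON output can be consumed with any standard JSON parser. The Python sketch below assumes a shortened sample that reuses only field names visible in the output above (`query`, `total_results`, `text_results`, `num`, `weight`, `url`, `title`); the helper name `summarize` is hypothetical.

```python
import json

# Shortened sample with field names taken from the output shown above
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)

def summarize(json_text: str):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(json_text)
    return [
        (r["num"], float(r["weight"]), r["url"], r["title"].strip())
        for r in data.get("text_results", [])
    ]

if __name__ == "__main__":
    for num, weight, url, title in summarize(SAMPLE):
        print(num, weight, title, url)
```

Note that the weight is delivered as a string and that the escaped slashes (`\/`) are turned back into plain slashes by the parser.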

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
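The XML fragment above can be read with any standard XML parser. The following is a minimal sketch in Python, assuming the media_result elements sit inside some wrapper element; the wrapper name results and the type value image of the first entry are assumptions, since the sample is truncated.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <results>; only <media_result> and its child
# elements (num, type, link, title, x_size, y_size) come from the sample.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""


def parse_media_results(xml_text):
    """Return one dict per <media_result>, mapping child tag to its text."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("media_result"):
        results.append({child.tag: (child.text or "").strip() for child in node})
    return results


if __name__ == "__main__":
    for item in parse_media_results(SAMPLE):
        print(item["num"], item["title"], item["x_size"], "x", item["y_size"])
```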
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
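A client could consume this JSON output roughly as follows. This is only a sketch in Python: the sample is a shortened copy of the output above (the fulltxt extracts and some header fields are left out), and the helper name summarize is hypothetical.

```python
import json

# Shortened copy of the sample output; field names are taken from it.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
 "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
 "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize(json_text):
    """Return (weight, url, title) for every entry in text_results."""
    data = json.loads(json_text)  # the escaped "\/" is decoded to "/"
    return [
        (float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for weight, url, title in summarize(SAMPLE):
        print(f"{weight:6.1f}  {title}  {url}")
```

Note that weight is delivered as a string and the titles carry a leading blank, so both are normalized before use.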

[ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
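The naming scheme of these log files can be taken apart mechanically. The sketch below, in Python, assumes the layout db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html as suggested by the two example names; the trailing part is treated as an opaque thread label, and the function name parse_log_name is hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the example names, e.g. db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^_]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized log file name: {name!r}")
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


if __name__ == "__main__":
    for name in ("db2_100524-21.47.56_1.html", "db2_100524-21.48.12_2.html"):
        print(name, "->", parse_log_name(name))
```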
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
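Starting several threads by hand means typing one command per thread, each with its own ID appended to the option. As a small sketch in Python, the command lines could be generated like this; the helper name build_commands is hypothetical, numeric IDs are just one possible choice, and the commands are only printed here, not executed.

```python
def build_commands(option, thread_count):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-erased1'].

    option is the command line option without the leading dash
    (for example "erased"); the thread ID is appended to it.
    """
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    return [
        ["php", "spider.php", f"-{option}{thread_id}"]
        for thread_id in range(1, thread_count + 1)
    ]


if __name__ == "__main__":
    # Each list could be passed to subprocess.Popen to start one thread.
    for command in build_commands("erased", 3):
        print(" ".join(command))
```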
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
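The list-file convention described above (a plain name, or a regexp introduced by */ and ended with another slash) can be sketched as follows in Python. This is not the Sphider-plus implementation: the function names are hypothetical, and stripping blanks around the pattern as well as matching the whole name are assumptions made for the sketch. Matching is case-sensitive, as stated for elements_not.txt.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    "*/nav[0-5]/"  -> regexp nav[0-5]
    "aside"        -> literal name aside
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Blanks around the pattern (as in "*/ menu[0-5] /") are dropped.
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """True if name is matched completely by at least one entry."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)


if __name__ == "__main__":
    entries = ["aside", "*/nav[0-5]/"]
    for name in ("aside", "Aside", "nav3", "nav7"):
        print(name, is_listed(name, entries))
```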
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
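How a crawler can pick up the canonical URL of a page may be sketched like this in Python, assuming the page declares it through a link element with rel="canonical" in its HTML. The class and function names are hypothetical, and this is not the code used by Sphider-plus.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish">'
        "</head><body>...</body></html>"
    )
    print(find_canonical(page))
```

Every duplicate URL that serves such a page can then be stored under the one canonical address instead of being indexed separately.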
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
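A minimal sketch of how such a log file name could be split into its items. The function name parse_log_name is hypothetical; the date is read as YYMMDD, which matches the "100524 - Date (May 24, 2010)" example, and the trailing item is assumed to be the thread number or thread ID.

```python
import re
from datetime import datetime

# db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html, as in db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<stamp>\d{6}-\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into (database number, start time, thread label)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    started = datetime.strptime(match.group("stamp"), "%y%m%d-%H.%M.%S")
    return int(match.group("db")), started, match.group("thread")


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
    print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

The thread label is kept as a string so that both plain thread numbers and user-defined IDs are accepted.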
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
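The MD5sum check mentioned above can be sketched as follows. This only illustrates the idea of comparing a stored checksum with the checksum of freshly fetched content; the function names and the way the stored value is passed in are assumptions, not the actual Sphider-plus code.

```python
import hashlib


def md5sum(content):
    """Return the hexadecimal MD5 checksum of the page content (bytes)."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(stored_md5, content):
    """True if the page content differs from what was indexed before.

    stored_md5 is the checksum saved during the previous index run,
    or None if the page has never been indexed.
    """
    return stored_md5 != md5sum(content)


if __name__ == "__main__":
    old = md5sum(b"<html>unchanged page</html>")
    print(needs_reindex(old, b"<html>unchanged page</html>"))  # unchanged: skip
    print(needs_reindex(old, b"<html>edited page</html>"))     # changed: re-index
```

Because only the checksum has to be compared, unchanged pages can be skipped without parsing them again, which is what makes the re-index procedure fast.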
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
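A minimal sketch of how entries of such a list-file could be interpreted: an entry introduced by */ and ended by a slash is treated as a regexp, every other entry as a literal name. The function names, the stripping of blanks around the pattern and the use of a full match against the name are assumptions made for this illustration.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry such as */nav[0-5]/ ; blanks around the pattern are dropped.
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """True if the element name or div id matches any non-empty entry."""
    return any(
        compile_entry(entry).fullmatch(name)
        for entry in entries
        if entry.strip()
    )


if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
    for name in ("nav3", "nav7", "NAV3", "menu2", "footer"):
        print(name, is_listed(name, entries))
```

No re.IGNORECASE flag is set, so NAV3 is not matched by */nav[0-5]/, in line with the case-sensitive processing described above.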
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
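The mapping of duplicates to one canonical URL can be sketched as follows, assuming the canonical URL is announced in the page head by a link element with rel="canonical". The class and function names are hypothetical, and the sketch does not claim to reproduce how Sphider-plus itself evaluates the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html, page_url):
    """Return the canonical URL of a page, or the page URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url


if __name__ == "__main__":
    head = ('<html><head><link rel="canonical" '
            'href="http://www.example.com/product.php?item=swedish-fish" />'
            '</head><body></body></html>')
    duplicates = [
        "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy",
        "http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678",
    ]
    print({canonical_url(head, url) for url in duplicates})
```

Both duplicate URLs resolve to the same canonical URL, so the page content needs to be stored only once.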
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS)) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
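The conversion into UTF-8 can be illustrated with a short sketch. It uses Python's built-in codecs instead of the ConvertCharset function named above, so it only demonstrates the principle; the function name to_utf8 is an assumption, and not every charset of the list has a counterpart among Python's codecs.

```python
def to_utf8(raw, charset):
    """Re-encode bytes given in a legacy charset as UTF-8 bytes.

    Raises LookupError for a charset name unknown to Python and
    UnicodeDecodeError if the bytes are not valid in that charset.
    """
    return raw.decode(charset).encode("utf-8")


if __name__ == "__main__":
    samples = [
        ("windows-1251", "Привет"),   # Cyrillic
        ("koi8-r", "Привет"),         # Cyrillic
        ("iso-8859-7", "Ελληνικά"),   # Greek
        ("windows-1250", "Łódź"),     # Central Europe
    ]
    for charset, text in samples:
        legacy_bytes = text.encode(charset)
        print(charset, to_utf8(legacy_bytes, charset).decode("utf-8"))
```

The same text yields different byte sequences in windows-1251 and koi8-r, which is why the source charset of a page has to be known before the conversion.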
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as the most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
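The naming scheme above can be taken apart with a short script. This is a minimal sketch in Python, not part of Sphider-plus: the function name parse_log_name and the returned keys are hypothetical, and the pattern is inferred only from the two example names shown. The thread suffix is kept as free text, so any label that does not contain a dot is accepted.

```python
import re
from datetime import datetime

# Pattern inferred from the example names in the text (an assumption, not the
# engine's own code): db<number>_<yymmdd>-<hh.mm.ss>_<thread label>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, timestamp and thread label.

    Returns None when the name does not follow the scheme.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    # The date part is yymmdd and the time part is hh.mm.ss
    timestamp = datetime.strptime(
        match.group("date") + match.group("time"), "%y%m%d%H.%M.%S"
    )
    return {
        "db": int(match.group("db")),
        "timestamp": timestamp,
        "thread": match.group("thread"),
    }


if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
```

Sorting a directory listing by the parsed timestamp and thread label is one way to group the log files that belong to the same indexing run.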
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs may be chosen freely, but the file name limitations of the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
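The entry syntax for the list-files (a plain name, or a regexp introduced by */ and closed by a slash) can be illustrated as follows. This is a sketch under stated assumptions, written in Python rather than the engine's PHP: the helper compile_entry is hypothetical, surrounding whitespace is assumed to be ignored, a regexp is assumed to match anywhere in the name, and plain entries are compared exactly, which mirrors the case-sensitive handling mentioned above. The engine itself may behave differently in any of these points.

```python
import re


def compile_entry(entry):
    """Turn one list-file line into a predicate over element names.

    Assumed syntax: '*/<regexp>/' marks a regular expression; anything else
    is a literal, case-sensitive name.
    """
    text = entry.strip()
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        # Strip the '*/' prefix and the closing slash, then compile the rest
        pattern = re.compile(text[2:-1].strip())
        return lambda name: pattern.search(name) is not None
    return lambda name: name == text


if __name__ == "__main__":
    is_ignored = compile_entry("*/nav[0-5]/")
    for candidate in ("nav3", "nav7", "Nav3"):
        print(candidate, is_ignored(candidate))
```

Keeping regexp entries and literal entries behind the same predicate interface means the calling code does not need to know which kind of line it read.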
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
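A common way for a page to declare its canonical URL is a link element with rel="canonical" in the page head. Whether Sphider-plus reads exactly this element is an assumption here, and the names CanonicalFinder and find_canonical are hypothetical. The sketch below only shows, in Python, how a crawler could map a duplicate page back to the canonical URL it declares.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the declared canonical URL of a page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = (
        '<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish">'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
```

An indexer following this idea would store the page under the returned URL and skip any other URL that declares the same canonical target.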
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
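The idea behind the conversion is that text stored in one of the listed charsets is decoded and then re-encoded as UTF-8. The sketch below shows this with Python's built-in codecs and is not the ConvertCharset function itself; the helper name to_utf8 is hypothetical. Python knows many of the listed names (for example windows-1251, cp866, koi8-r and iso-8859-2), but not necessarily all of them.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the given charset and re-encode them as UTF-8.

    Raises LookupError for a charset name Python does not know and
    UnicodeDecodeError for bytes that are invalid in that charset.
    """
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # Cyrillic sample text as it would arrive from a windows-1251 page
    legacy_bytes = "Привет".encode("windows-1251")
    utf8_bytes = to_utf8(legacy_bytes, "windows-1251")
    print(len(legacy_bytes), len(utf8_bytes), utf8_bytes.decode("utf-8"))
```

Single-byte charsets use one byte per character, while UTF-8 needs two bytes for each Cyrillic letter, so the converted text is longer in bytes although it shows the same characters.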
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
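A client that receives this JSON output can read it with any standard JSON parser. The sketch below is a minimal Python example, not part of Sphider-plus: the helper name summarize is hypothetical, and SAMPLE is a shortened copy of the output shown above that keeps only some of the fields and leaves out the text extracts.

```python
import json

# Shortened copy of the sample output (fields such as "fulltxt" are left out)
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''


def summarize(payload):
    """Return (num, weight, title, url) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # The weight arrives as a string and the title carries a leading blank
        (hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data["text_results"]
    ]


if __name__ == "__main__":
    for num, weight, title, url in summarize(SAMPLE):
        print(num, weight, title, url)
```

The escaped slashes (\/) in the URLs are a valid JSON escape, so the parser returns ordinary URLs without any extra handling.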

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. The current release is version 4.2024a, published February 12, 2024. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As no progress was shown during the index / re-index procedure, waiting for results was tedious. Selectable in the Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non-ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - Sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
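The log file naming scheme described above can be taken apart again when log files need to be sorted or filtered by script. The following Python sketch is not part of Sphider-plus; the function name parse_log_name is hypothetical, and the sketch assumes that the trailing field of the name is the thread number or the user-defined ID, as suggested by the two examples.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html and db2_100524-21.47.56_ID1.html.
# Assumption: the last field is the thread number or the user-defined thread ID.
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a spider log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    database, date_part, time_part, thread = match.groups()
    # 100524 is read as YYMMDD (May 24, 2010), 21.47.56 as HH.MM.SS.
    started = datetime.strptime(f"{date_part} {time_part}", "%y%m%d %H.%M.%S")
    return {"database": int(database), "started": started, "thread": thread}


if __name__ == "__main__":
    for log in ("db2_100524-21.47.56_1.html", "db2_100524-21.48.12_ID2.html"):
        print(parse_log_name(log))
```

Because the ID is taken as free text up to the .html suffix, the sketch accepts both the plain thread numbers and the individual IDs added on the command line.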
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must be introduced by */ and terminated with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
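To illustrate the list-file convention described above (plain names, or a regexp wrapped as */ ... /), here is a small Python sketch. It is an illustration only and not the Sphider-plus implementation: the helper names are hypothetical, and the choice to match the complete name and to ignore blanks inside the delimiters is an assumption based on the examples shown.

```python
import re


def parse_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    # Entries such as */nav[0-5]/ or */ menu[0-5] / carry a regexp.
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    # Everything else is treated as a literal name.
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """Return True if the name matches any entry of the list-file completely."""
    return any(parse_entry(entry).fullmatch(name) for entry in entries)


if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
    for candidate in ("nav3", "nav7", "NAV3", "menu2", "footer"):
        print(candidate, is_listed(candidate, entries))
```

No IGNORECASE flag is set, which mirrors the note that element names are processed case-sensitively.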
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As no progress was shown during the index / re-index procedure, waiting for results was tedious. Selectable in the Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non-ISO-8859 charsets. Some special characters that are part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
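A client that consumes the JSON result output could read it as sketched below in Python. The sample string is an abbreviated copy of the fields shown above with the 'fulltxt' extracts left out; the function name summarize is hypothetical. Note that 'weight' arrives as a string and has to be converted before it can be compared numerically.

```python
import json

# Abbreviated sample built from the fields shown above ('fulltxt' omitted).
# A raw string keeps the JSON escape \/ intact.
SAMPLE = r"""
{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
 "text_results":[
  {"num":1,"weight":"100.0",
   "url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
   "title":" Info_eng","page_size":"5,661.6kb",
   "domain_name":"www.english.le-piaggie.info"},
  {"num":2,"weight":"50.0",
   "url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
   "title":" Description","page_size":"26.4 kb",
   "domain_name":"www.english.le-piaggie.info"}]}
"""


def summarize(payload):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data["text_results"]
    ]


if __name__ == "__main__":
    for num, weight, url, title in summarize(SAMPLE):
        print(f"{num}. {title} ({weight:.1f}) {url}")
```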

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
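The JSON output quoted above can be consumed by any JSON parser. The following is a minimal sketch in Python, assuming only the fields visible in the excerpt (`query`, `total_results`, `text_results`, `num`, `weight`, `url`, `title`). The sample string is a shortened copy of the excerpt with the elided `fulltxt` values left out, and the helper name `summarize` is hypothetical.

```python
import json

# Shortened copy of the excerpt; the elided "fulltxt" values are omitted.
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # "weight" arrives as a string, and titles carry a leading blank.
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that the escaped slashes (`\/`) in the URLs are plain JSON escapes and are decoded to `/` by the parser.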

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
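The log file naming scheme described above (database number, date, time, thread number or ID) can be taken apart mechanically. The following is a minimal sketch in Python, assuming the date is written as YYMMDD and the time as HH.MM.SS, as the example "100524 - Date (May 24, 2010)" suggests. The function name `parse_log_name` and the returned dictionary keys are hypothetical and not part of Sphider-plus.

```python
import re
from datetime import datetime

# Matches names such as db2_100524-21.47.56_1.html and db2_100524-21.47.56_ID1.html.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>[^.]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    # Assumption: date is YYMMDD (100524 -> May 24, 2010), time is HH.MM.SS.
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

print(parse_log_name("db2_100524-21.47.56_1.html"))
print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

The thread part is kept as text because, per the excerpt, IDs could be defined by personal requirements and need not be numeric.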
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
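The MD5sum check mentioned above amounts to comparing a stored checksum with the checksum of the freshly fetched page. The following is a minimal sketch of that idea in Python; it is not the Sphider-plus implementation, and the function names `content_md5` and `has_changed` are hypothetical.

```python
import hashlib

def content_md5(page_bytes):
    """Return the hexadecimal MD5 checksum of the raw page content."""
    return hashlib.md5(page_bytes).hexdigest()

def has_changed(page_bytes, stored_md5):
    """True if the page content no longer matches the checksum kept from the last index run."""
    return content_md5(page_bytes) != stored_md5

old_page = b"<html><body>Hello</body></html>"
stored = content_md5(old_page)

print(has_changed(old_page, stored))                             # unchanged page can be skipped
print(has_changed(b"<html><body>Hello!</body></html>", stored))  # changed page needs re-indexing
```

Skipping unchanged pages is what makes the re-index procedure fast: only pages whose checksum differs have to be parsed again.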
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
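The list-file convention described above (plain entries, or a regexp introduced by */ and ended with another slash) can be illustrated as follows. This is a minimal sketch in Python under two assumptions that the excerpts do not confirm: an entry has to match the whole name, and blanks around the pattern, as in "*/ menu[0-5] /", are ignored. The function names `compile_entry` and `is_listed` are hypothetical.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    # Entries such as */nav[0-5]/ are regular expressions.
    if len(entry) > 3 and entry.startswith("*/") and entry.endswith("/"):
        return re.compile(entry[2:-1].strip())
    # Everything else is taken literally.
    return re.compile(re.escape(entry))

def is_listed(name, entries):
    """True if the name matches any non-empty entry of the list file."""
    return any(compile_entry(e).fullmatch(name) for e in entries if e.strip())

entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
print(is_listed("nav3", entries))
print(is_listed("nav7", entries))
print(is_listed("Footer", entries))
```

Because no case-insensitive flag is set, "Footer" is not matched by the entry "footer", which mirrors the case-sensitive processing mentioned above.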
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
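How a crawler might read the canonical URL from a duplicate page can be sketched as follows. This example assumes that the canonical URL is announced by a `<link rel="canonical" href="...">` element in the page head, which the excerpt above does not show. It uses Python's standard `html.parser` module and is not the Sphider-plus code; the names `CanonicalFinder` and `find_canonical` are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")

def find_canonical(html):
    """Return the canonical URL announced by the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish"></head></html>'
)
print(find_canonical(page))
```

An indexer could then store the page under the returned URL instead of the URL it was fetched from, so that all duplicates collapse into one entry.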
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
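The conversion of text from one of the listed charsets into UTF-8 can be illustrated as follows. This is a minimal sketch that uses Python's built-in codecs, not the ConvertCharset function named above; the function name `to_utf8` is hypothetical, and Python knows only some of the listed charset names.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from the given charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# The Russian word for "hello", encoded as windows-1251 (one byte per letter).
cyrillic = b"\xcf\xf0\xe8\xe2\xe5\xf2"
converted = to_utf8(cyrillic, "windows-1251")

print(converted.decode("utf-8"))
print(len(cyrillic), len(converted))  # UTF-8 needs two bytes per Cyrillic letter
```

Decoding with the wrong source charset usually does not raise an error for single-byte charsets but yields wrong characters, which is why the charset of a page has to be determined before conversion.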
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
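The log file naming scheme described above can be taken apart mechanically. The following is only a sketch in Python, not part of Sphider-plus: the helper name parse_log_name and the key names in the returned dictionary are assumptions made for illustration, and the pattern is derived solely from the sample names shown above.

```python
import re
from datetime import datetime

# Pattern derived from the sample names: db<number>_<YYMMDD>-<HH.MM.SS>_<thread>.html
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")

def parse_log_name(name):
    """Split an index log file name into its parts (hypothetical helper)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name}")
    db, date, time, thread = match.groups()
    # The date is written as two-digit year, month, day; the time uses dots.
    timestamp = datetime.strptime(f"{date} {time}", "%y%m%d %H.%M.%S")
    return {"database": int(db), "timestamp": timestamp, "thread": thread}

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["timestamp"].isoformat(), info["thread"])
```

The thread part is matched as a word so that both a plain thread number and a suffix such as ID1 are accepted.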
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
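The command lines for several indexing threads (for example -new1, -new2 or -erased1) follow one pattern: the option name with an individual ID appended. A small Python sketch can generate them. The function build_commands is a hypothetical helper, not part of Sphider-plus, and it only builds the argument lists without starting anything.

```python
def build_commands(option, thread_ids, script="spider.php"):
    """Return one argument list per thread, e.g. ['php', 'spider.php', '-new1']."""
    return [["php", script, f"-{option}{thread_id}"] for thread_id in thread_ids]

# Two threads for indexing new sites, as in the examples above.
for command in build_commands("new", [1, 2]):
    print(" ".join(command))
```

Each list could then be handed to subprocess.Popen so that the threads run in parallel. Whether the chosen IDs are valid in file names depends on the operating system, as noted above.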
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
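How entries of such a list-file might be interpreted can be sketched in Python. This is an illustration only: the helper compile_entry is hypothetical, and matching the complete name (instead of any part of it) is an assumption, as the internal matching of Sphider-plus is not described here. Only the */ ... / notation and the case-sensitive handling are taken from the text above.

```python
import re

def compile_entry(entry):
    """Turn one list-file entry into a case-sensitive matcher.

    Entries written as */pattern/ are treated as regular expressions,
    all other entries are compared literally.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        pattern = re.compile(entry[2:-1].strip())
        return lambda name: pattern.fullmatch(name) is not None
    return lambda name: name == entry

matches_nav = compile_entry("*/nav[0-5]/")
print(matches_nav("nav3"), matches_nav("nav7"), matches_nav("NAV3"))
```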
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
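The mapping of duplicate URLs to one canonical URL can be illustrated with a short Python sketch. It assumes that the canonical URL is announced by the standard <link rel="canonical" href="..."> element in the page head. The class CanonicalFinder and the function canonical_url are hypothetical names, and the sketch does not show how Sphider-plus itself evaluates the element.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")

def canonical_url(html, page_url):
    """Return the announced canonical URL, or the page's own URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish"></head></html>')
print(canonical_url(page, "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy"))
```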
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
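The principle of converting text from one of the listed charsets into UTF-8 can be shown in a few lines of Python. This sketch uses Python's built-in codecs and not the ConvertCharset function; the helper name to_utf8 is an assumption, and Python's codec names may not cover every entry of the list above.

```python
def to_utf8(raw, charset):
    """Decode bytes given in the source charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")

# Sample input: the same Cyrillic word stored in two different charsets.
from_windows = "Привет".encode("windows-1251")
from_koi8 = "Привет".encode("koi8-r")

print(to_utf8(from_windows, "windows-1251").decode("utf-8"))
print(to_utf8(from_koi8, "koi8-r").decode("utf-8"))
```

Both inputs differ byte by byte, but after conversion they yield the identical UTF-8 sequence.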
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
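A client could read the JSON result output as sketched below in Python. The sample is a shortened payload that reuses the field names shown above; the "fulltxt" values are placeholders, as the original text extracts are abbreviated. The function summarize is a hypothetical helper for illustration.

```python
import json

# Shortened sample; a raw string keeps the escaped slashes (\/) untouched.
sample = r'''{"query":"colour","total_results":2,"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","fulltxt":"placeholder","page_size":"5,661.6kb",
  "domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","fulltxt":"placeholder","page_size":"26.4 kb",
  "domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(payload):
    """Return (num, weight, url, title) for each text result."""
    data = json.loads(payload)
    return [(hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
            for hit in data["text_results"]]

for row in summarize(sample):
    print(row)
```

The weight is delivered as a string and is therefore converted to a number before use; the escaped slashes in the URLs are resolved by the JSON parser.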

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen to suit personal requirements, but the OS limitations on file names should be taken . . .
. . .
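The log file names quoted above follow a visible pattern: database number, date, start time and a thread suffix. As a minimal sketch, and assuming the pattern holds exactly as in those four example names, such a name could be split into its parts like this. The function name parse_log_name and the regular expression are illustrative only and are not part of Sphider-plus.

```python
import re
from datetime import datetime

# Pattern inferred from the example names only; the real naming code may differ.
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread suffix."""
    match = LOG_NAME.match(name)
    if match is None:
        return None  # name does not follow the assumed pattern
    # 100524 is read as YYMMDD, i.e. May 24, 2010, as stated in the text.
    started = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H.%M.%S"
    )
    return {
        "db": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }

if __name__ == "__main__":
    print(parse_log_name("db2_100524-21.47.56_1.html"))
    print(parse_log_name("db2_100524-21.48.12_ID2.html"))
```

The thread suffix is kept as text, so both the plain thread numbers and the command line IDs are accepted.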
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
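The MD5sum check mentioned above can be pictured as comparing a stored checksum with the checksum of the freshly fetched page. The following is only a sketch of that idea using Python's hashlib; the function names md5sum and needs_reindex are hypothetical, and how Sphider-plus stores and compares the value internally is not shown in the text.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page is re-indexed only if its content no longer matches the stored checksum."""
    return stored_md5 != md5sum(content)

if __name__ == "__main__":
    old_page = b"<html><body>Hello</body></html>"
    stored = md5sum(old_page)
    print(needs_reindex(stored, old_page))                            # unchanged page
    print(needs_reindex(stored, b"<html><body>Hello!</body></html>"))  # changed page
```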
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp must begin with */ and end with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp must begin with */ and end with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp must begin with */ and end with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
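The list-file entries described above are either plain names or regexp patterns that begin with */ and end with a slash. A minimal sketch of how such entries could be interpreted is given below. It assumes that surrounding blanks are ignored (as the examples with spaces suggest) and that a pattern has to match the whole name; both points are assumptions, and the helper names compile_entry and is_listed are hypothetical.

```python
import re

def compile_entry(entry: str) -> re.Pattern:
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    text = entry.strip()
    # Entries such as '*/nav[0-5]/' are treated as regexp patterns.
    if len(text) > 3 and text.startswith("*/") and text.endswith("/"):
        return re.compile(text[2:-1].strip())
    # Everything else is taken literally.
    return re.compile(re.escape(text))

def is_listed(name: str, entries) -> bool:
    """True if the id or element name matches any non-empty entry of the list."""
    return any(
        compile_entry(entry).fullmatch(name) is not None
        for entry in entries
        if entry.strip()
    )

if __name__ == "__main__":
    entries = ["footer", "*/nav[0-5]/", "*/ menu[0-5] /"]
    for name in ("nav3", "nav7", "Nav3", "menu2", "footer"):
        print(name, is_listed(name, entries))
```

No ignore-case flag is set, which mirrors the note that element names are processed case-sensitively.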
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
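For the duplicate URLs above, a page usually announces its canonical URL with a link element carrying rel="canonical" in the head of the HTML. The sketch below assumes that mechanism and uses Python's standard html.parser to read the value; it does not show how Sphider-plus itself detects canonical URLs, and the names CanonicalFinder and find_canonical are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")

def find_canonical(html: str):
    """Return the canonical URL announced by the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

if __name__ == "__main__":
    page = (
        '<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish">'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
```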
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
. . .
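Converting text from one of the listed charsets into UTF-8 means decoding the bytes with the source charset and encoding the result again as UTF-8. The sketch below illustrates that step with Python's built-in codecs, which stand in for the ConvertCharset function named above; it is not that function, and Python does not provide every charset on the list. The name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes given in the source charset and re-encode them as UTF-8."""
    return raw.decode(charset).encode("utf-8")

if __name__ == "__main__":
    word = "Привет"
    # The same word stored in three of the listed Cyrillic charsets.
    for charset in ("cp1251", "koi8-r", "cp866"):
        legacy_bytes = word.encode(charset)
        converted = to_utf8(legacy_bytes, charset)
        print(charset, legacy_bytes, converted.decode("utf-8"))
```

The byte sequences differ for each source charset, while the UTF-8 result is the same in all three cases.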
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
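A JSON result like the one quoted above can be read by any JSON parser. The following sketch shows how a client might list the text results. The field names (text_results, num, weight, url, title) are taken from the quoted output, while the sample payload, its example.com URLs and the function name summarize are made up for illustration.

```python
import json

# Shortened, invented payload that reuses the field names of the quoted output.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0","url":"http:\/\/www.example.com\/a.pdf","title":" First page"},'
    r'{"num":2,"weight":"50.0","url":"http:\/\/www.example.com\/b.html","title":" Second page"}]}'
)

def summarize(payload: str):
    """Return (num, weight, url, title) for every text result in the payload."""
    data = json.loads(payload)
    return [
        (item["num"], float(item["weight"]), item["url"], item["title"].strip())
        for item in data.get("text_results", [])
    ]

if __name__ == "__main__":
    for num, weight, url, title in summarize(SAMPLE):
        print(num, weight, url, title)
```

The escaped slashes in the URLs are plain slashes after parsing, and the weight arrives as a string, so it is converted to a number here.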

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .

. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
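The naming scheme of the log files can be illustrated with a small parser. This is only a sketch in Python, not part of Sphider-plus: the regular expression is inferred from the two example names, and the function name `parse_log_name` as well as the returned keys are made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the examples: db<N>_<YYMMDD>-<HH.MM.SS>_<thread>.html
# (assumption: the last item is the thread number or a user-defined ID)
LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(\w+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("not a recognised log file name: " + name)
    db, date, time, thread = match.groups()
    started = datetime.strptime(date + " " + time, "%y%m%d %H.%M.%S")
    return {"database": int(db), "started": started, "thread": thread}


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

Because the thread part is matched as a generic word, names carrying an individual ID such as db2_100524-21.47.56_ID1.html are accepted as well.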
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the OS limitations on file names should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
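The MD5 based check for changed page content can be pictured as follows. This is a minimal sketch in Python under the assumption that one checksum per page is kept from the previous index run; the function names are hypothetical and do not mirror the Sphider-plus source.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the MD5 checksum of a page body as a hex string."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(stored_md5: str, content: bytes) -> bool:
    """A page only has to be re-indexed if its checksum differs from the stored one."""
    return stored_md5 != md5sum(content)


old_sum = md5sum(b"<html>old content</html>")
print(needs_reindex(old_sum, b"<html>old content</html>"))  # unchanged page
print(needs_reindex(old_sum, b"<html>new content</html>"))  # modified page
```

Comparing checksums instead of full page bodies keeps the stored data small and makes the decision a single string comparison per page.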
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended for use with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt are processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
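The difference between a plain entry and a regexp entry in such a list-file can be sketched like this. The Python code below is an illustration only: the function `compile_entry` is hypothetical, and it is an assumption that a regexp has to match the complete name; Sphider-plus itself may apply the pattern differently.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a test function for element or div names.

    Entries introduced by */ and ended by another slash are treated as regexp,
    everything else as a literal, case-sensitive name.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        pattern = re.compile(entry[2:-1].strip())
        # Assumption: the pattern must cover the whole name.
        return lambda name: pattern.fullmatch(name) is not None
    return lambda name: name == entry


matches_nav = compile_entry("*/nav[0-5]/")
print(matches_nav("nav3"), matches_nav("nav7"))

matches_aside = compile_entry("aside")
print(matches_aside("aside"), matches_aside("Aside"))
```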
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
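How a crawler can find the canonical URL of a duplicate page is sketched below. The example is written in Python and assumes that the page declares its canonical URL with the standard link element (rel="canonical") in the HTML head; the class and function names are hypothetical and not taken from Sphider-plus.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish">'
        '</head><body>Swedish fish</body></html>')
print(canonical_url(page))
```

An indexer that receives the same canonical URL for several fetched addresses can store the content once, under that URL.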
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
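What the conversion into UTF-8 means for a single-byte charset can be shown with a short example. It is written in Python and uses Python's built-in codecs, not the ConvertCharset function; the helper name `to_utf8` is made up, and not every charset of the list above is known to Python under the same name.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given legacy charset and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


# Six bytes in windows-1251 (Cyrillic) become twelve bytes in UTF-8
cyrillic = b"\xcf\xf0\xe8\xe2\xe5\xf2"
print(to_utf8(cyrillic, "windows-1251").decode("utf-8"))

# One byte for the umlaut in iso-8859-1 becomes two bytes in UTF-8
print(to_utf8(b"M\xfcller", "iso-8859-1").decode("utf-8"))
```

The source charset has to be known before the conversion, because the same byte value stands for different characters in the individual charsets.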
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases that are correctly configured, assigned and have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
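Reading media results from the XML output could look like the following Python sketch. Only the element names media_result, num, type, url, link, title, x_size and y_size are taken from the excerpt above; the surrounding root element, the order of the child elements and the values of the sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <results> root and the num/type values are assumed
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <url>http://www.abc.de/index.php</url>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""


def media_results(xml_text):
    """Collect the child elements of every <media_result> into a dictionary."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "").strip() for child in result}
            for result in root.iter("media_result")]


for item in media_results(SAMPLE):
    print(item["type"], item["link"], item["x_size"] + "x" + item["y_size"])
```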
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
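A client that consumes the JSON result output could be sketched as follows. The Python example uses a shortened copy of the output shown above, reduced to a few of the documented fields and without the text extracts; the function name `text_hits` is hypothetical.

```python
import json

# Shortened sample, built from field names and values of the output shown above
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'''


def text_hits(payload):
    """Return (num, weight, url, title) for every entry in text_results."""
    data = json.loads(payload)
    return [(hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
            for hit in data["text_results"]]


for num, weight, url, title in text_hits(SAMPLE):
    print(num, weight, title, url)
```

The weight is delivered as a string and is converted into a number here, while the escaped slashes of the URLs are resolved by the JSON parser itself.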

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
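The log file naming scheme described above (database number, date, time, thread number or ID) can be taken apart with a small helper. This is only a sketch in Python, not part of Sphider-plus: the function name parse_log_name and the LogName type are hypothetical, and the pattern is inferred solely from the example names shown in this excerpt, assuming the date is written as YYMMDD.

```python
import re
from datetime import datetime
from typing import NamedTuple


class LogName(NamedTuple):
    """Hypothetical container for the parts of a Sphider-plus log file name."""
    database: int      # number following "db"
    started: datetime  # date and time encoded in the name
    thread: str        # thread number ("1") or user-defined ID ("ID1")


# Pattern inferred from names like db2_100524-21.47.56_ID1.html (assumption).
_LOG_NAME = re.compile(r"^db(\d+)_(\d{6})-(\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name: str) -> LogName:
    """Split a log file name into database number, start time and thread ID."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    db, date, time, thread = match.groups()
    # 100524 is read as year 10, month 05, day 24 (May 24, 2010).
    started = datetime.strptime(f"{date} {time}", "%y%m%d %H.%M.%S")
    return LogName(int(db), started, thread)
```

Calling parse_log_name("db2_100524-21.48.12_2.html") would yield database 2, a start time of May 24, 2010 at 21:48:12, and thread "2"; names that do not follow the scheme raise ValueError.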
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
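The list-file convention described above (plain entries matched case-sensitively, regexp entries introduced by */ and closed by another slash) can be illustrated as follows. This is a hedged sketch in Python, not the matcher used by Sphider-plus: the function names are hypothetical, and two details are assumptions made for the example, namely that blanks around the pattern are ignored (as in */ menu[0-5] /) and that an entry must match the whole element name.

```python
import re
from typing import Iterable, Pattern


def compile_entry(entry: str) -> Pattern[str]:
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    # Regexp entries look like */nav[0-5]/ : introduced by */ and ended by /.
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    # Everything else is treated as a literal name (assumption).
    return re.compile(re.escape(entry))


def is_listed(name: str, entries: Iterable[str]) -> bool:
    """Return True if the element name matches any entry of the list."""
    return any(compile_entry(e).fullmatch(name) for e in entries)
```

With the entries ["footer", "*/nav[0-5]/"], is_listed accepts "footer" and "nav3" but rejects "nav7" and "Nav3", which mirrors the case-sensitive processing mentioned for /include/common/elements_not.txt.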
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
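A consumer of the JSON result output shown above might read it as in the following sketch. It is written in Python purely for illustration; the summarize function is hypothetical, and SAMPLE is an abridged copy of the excerpt above that keeps only fields visible there and leaves out the truncated "fulltxt" values.

```python
import json
from typing import List, Tuple

# Abridged sample built from the fields visible in the excerpt (assumption:
# the real output carries more fields, e.g. "fulltxt", omitted here).
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize(raw: str) -> Tuple[str, List[Tuple[int, float, str, str]]]:
    """Return the query and one (num, weight, url, title) tuple per text result."""
    data = json.loads(raw)
    rows = []
    for hit in data.get("text_results", []):
        # "weight" arrives as a string in the sample, so it is converted here.
        rows.append((hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip()))
    return data["query"], rows
```

The escaped slashes (\/) in the URLs are valid JSON and are decoded to plain slashes by json.loads, so no extra unescaping step is needed.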

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
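The naming scheme above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus; the function name `parse_log_name` and the returned dictionary keys are assumptions made for illustration. It relies only on the items listed in the snippet: database number, date as YYMMDD, time as HH.MM.SS, and a trailing thread suffix.

```python
import re
from datetime import datetime

# Pattern derived from the example names, e.g. db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread suffix.

    Hypothetical helper: the key names are chosen for this sketch only.
    """
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    # 100524 + 21.47.56 -> May 24, 2010, 21:47:56
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }
```

Such a helper might be used to sort the files in /admin/log/ by database and start time before reviewing them.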
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
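Starting several threads by hand means typing one command per thread, each with its own ID appended to the option. A small Python sketch of how those command lines could be assembled is shown here; the helper `thread_commands` is hypothetical, and only the `php spider.php -new1` form is taken from the text above.

```python
def thread_commands(option, thread_ids, script="spider.php"):
    """Build one argument list per indexing thread.

    Hypothetical helper: 'option' is the option name without the leading
    dash (for example "new"), and each thread ID is appended directly to it,
    giving -new1, -new2 and so on.
    """
    return [["php", script, f"-{option}{thread_id}"] for thread_id in thread_ids]


# Each returned list could be handed to subprocess.Popen to start the
# threads in parallel; this sketch only builds the commands.
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems if an ID ever contains unusual characters.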
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
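The MD5 check mentioned above boils down to comparing a stored checksum with the checksum of the freshly fetched page. The following Python sketch illustrates the idea only; the function names are assumptions, and it does not reproduce how Sphider-plus stores or normalises page content.

```python
import hashlib


def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 checksum of the page content."""
    return hashlib.md5(content).hexdigest()


def needs_reindex(content: bytes, stored_md5: str) -> bool:
    """A page only needs to be re-indexed if its checksum has changed."""
    return md5sum(content) != stored_md5
```

Because unchanged pages are skipped after a cheap checksum comparison, a re-index run spends its time only on pages whose content actually differs.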
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
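The list-file convention described above (plain names, or a regexp introduced by */ and closed by a slash) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Sphider-plus implementation: the helper names are hypothetical, surrounding blanks are trimmed, and a name is considered listed only if the whole name matches.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    Entries of the form */pattern/ are treated as regular expressions,
    everything else as a literal name (assumption for this sketch).
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def is_listed(name, entries):
    """Check whether a name matches any entry of the list file."""
    return any(compile_entry(entry).fullmatch(name) for entry in entries)
```

With the example entry `*/nav[0-5]/`, the names nav0 to nav5 would match, while nav7 or NAV3 would not, the latter because matching is case-sensitive.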
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
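How a crawler might pick up the canonical URL of a duplicate page can be sketched in Python. The sketch assumes the page announces its canonical URL through the standard `<link rel="canonical" href="...">` element; the snippet above does not show how Sphider-plus itself reads this information, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def canonical_url(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Both duplicate URLs in the example would then resolve to the same canonical address, so only one copy of the content needs to be kept in the index.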
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
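The conversion step itself is simple once the source charset is known: decode the raw bytes with that charset, then encode the text as UTF-8. The Python sketch below illustrates this with the standard library codecs; it is not the ConvertCharset function named above, and `to_utf8` is a hypothetical name. Python does not ship codecs for every charset in the list, in which case `decode` raises `LookupError`.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Convert bytes in the given legacy charset into UTF-8 bytes.

    Raises LookupError for charsets Python has no codec for and
    UnicodeDecodeError for bytes that are invalid in that charset.
    """
    return raw.decode(charset).encode("utf-8")


# Example with a Cyrillic page delivered as windows-1251
cyrillic_page = "Привет".encode("windows-1251")
utf8_page = to_utf8(cyrillic_page, "windows-1251")
```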
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} . . .
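A client consuming the JSON output might read it as sketched below in Python. The sample string is a shortened version of the output shown above, with the `fulltxt` extracts and some header fields left out; the function name `summarize` is an assumption for this sketch.

```python
import json

# Shortened sample built from the fields visible in the output above
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]}'''


def summarize(payload):
    """Return (num, weight, title, url) for every text result."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["title"].strip(), hit["url"])
        for hit in data["text_results"]
    ]
```

Note that the weight arrives as a string and the escaped slashes (`\/`) in the URLs are resolved by the JSON parser, so no manual unescaping is needed.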

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
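The list-file convention quoted above (a plain name, or a regexp introduced by */ and closed by a slash) can be illustrated with a short Python sketch. This is not the Sphider-plus implementation; the helper names compile_entry and matches are hypothetical, and two details are assumptions: blanks around the pattern are stripped, and an entry must match the whole element name.

```python
import re


def compile_entry(entry):
    """Turn one list-file entry into a compiled, case-sensitive pattern."""
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        # Regexp entry: drop the leading "*/" and the trailing "/".
        return re.compile(entry[2:-1].strip())
    # Plain entry: match the name literally.
    return re.compile(re.escape(entry))


def matches(entry, name):
    """Return True if the element or div name is covered by the entry."""
    return compile_entry(entry).fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("nav3", "nav7", "NAV3"):
        print(candidate, matches("*/nav[0-5]/", candidate))
```

No ignore-case flag is set, which mirrors the note that element names are processed case-sensitive.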
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
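The conversion step named in these snippets (legacy charset in, UTF-8 out) can be shown in a few lines. The sketch below uses Python's built-in codecs rather than the ConvertCharset function mentioned above, so it only illustrates the idea; the helper name to_utf8 is an assumption, and Python does not ship every charset in the list (mazovia, for example, is not available).

```python
def to_utf8(raw, charset):
    """Decode bytes from a legacy charset and re-encode them as UTF-8."""
    # bytes.decode raises LookupError for unknown charsets and
    # UnicodeDecodeError for bytes that are invalid in the given charset.
    return raw.decode(charset).encode("utf-8")


if __name__ == "__main__":
    legacy = "Привет".encode("windows-1251")  # simulated input from a Cyrillic page
    print(to_utf8(legacy, "windows-1251").decode("utf-8"))
```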
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
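A consumer of the JSON result output shown in the snippets above could read it roughly as follows. This is a hedged Python sketch: the field names (text_results, num, weight, url, title) are taken from the excerpt, while the SAMPLE payload, its example.com URLs and the function name summarize are invented for illustration, because the real output is truncated in this listing.

```python
import json

# Hypothetical, shortened payload using only field names visible in the excerpt.
SAMPLE = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0","url":"http:\/\/www.example.com\/a.pdf","title":"A"},'
    r'{"num":2,"weight":"50.0","url":"http:\/\/www.example.com\/b.html","title":"B"}]}'
)


def summarize(payload):
    """Return (rank, weight, url) for every text result in the payload."""
    data = json.loads(payload)
    return [
        # Weights arrive as strings such as "100.0", so they are converted here.
        (result["num"], float(result["weight"]), result["url"])
        for result in data.get("text_results", [])
    ]


if __name__ == "__main__":
    for rank, weight, url in summarize(SAMPLE):
        print(rank, weight, url)
```

The escaped slashes (\/) in the URLs are valid JSON and are turned back into plain slashes by the parser.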

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are actively linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
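The log file names described above can be split back into their parts. The following is only a sketch in Python: the pattern is inferred from the example names shown here (database number, date as YYMMDD, start time, thread number or ID), and the function name parse_log_name is hypothetical, not part of Sphider-plus.

```python
import re
from datetime import datetime

# Pattern inferred from the example names (an assumption, not Sphider-plus code):
#   db<number>_<YYMMDD>-<HH.MM.SS>_<thread number or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Return database number, start time and thread ID, or None if the name does not fit."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    # Date and time are joined and parsed in one step, e.g. "10052421.47.56".
    started = datetime.strptime(match["date"] + match["time"], "%y%m%d%H.%M.%S")
    return {"db": int(match["db"]), "started": started, "thread": match["thread"]}

if __name__ == "__main__":
    for name in ("db2_100524-21.47.56_1.html", "db2_100524-21.48.12_ID2.html"):
        print(name, parse_log_name(name))
```

Such a helper might be used to sort the files in /admin/log/ by database or by start time; names that do not follow the scheme are simply reported as None.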
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
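A client could read the JSON result output shown above roughly as follows. This is a sketch in Python under stated assumptions: the sample is a shortened stand-in that reuses only the field names visible in the excerpt, the 'fulltxt' values are placeholders, and the function name summarize is hypothetical.

```python
import json

# Shortened stand-in for the JSON output; field names are taken from the
# excerpt above, the 'fulltxt' values are placeholders (assumption).
SAMPLE = r'''{"query":"colour","total_results":2,"text_results":[
{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
 "title":" Info_eng","fulltxt":"...","page_size":"5,661.6kb",
 "domain_name":"www.english.le-piaggie.info"},
{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
 "title":" Description","fulltxt":"...","page_size":"26.4 kb",
 "domain_name":"www.english.le-piaggie.info"}]}'''

def summarize(raw):
    """Return (num, weight, url, title) for each text result."""
    data = json.loads(raw)
    rows = []
    for result in data["text_results"]:
        # The weight arrives as a string and titles carry a leading blank.
        rows.append((result["num"], float(result["weight"]),
                     result["url"], result["title"].strip()))
    return rows

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

The escaped slashes (\/) in the URLs are valid JSON and are turned back into plain slashes by the parser, so no extra unescaping step is needed.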

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
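The naming scheme shown in these examples can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus: the function name `parse_log_name` and the returned dictionary keys are hypothetical, the date is assumed to be YYMMDD (as suggested by "100524 - Date (May 24, 2010)"), and the trailing field is assumed to be the thread number or thread ID.

```python
import re
from datetime import datetime

# Pattern derived from the example names only, e.g. db2_100524-21.47.56_1.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})_(?P<thread>\w+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError("not a recognised log file name: " + name)
    # Assumption: the date is YYMMDD and the time is HH.MM.SS
    started = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"database": int(m["db"]), "started": started, "thread": m["thread"]}

info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["database"], info["started"].isoformat(), info["thread"])
```

Because the last field is matched loosely, the same sketch also accepts names that carry a user-defined ID in that position.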
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
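The MD5 comparison mentioned in this passage can be illustrated with a small Python sketch. It is not the Sphider-plus implementation: the names `md5sum` and `needs_reindex` and the in-memory dictionary standing in for the database table are assumptions made for the example.

```python
import hashlib

def md5sum(content: bytes) -> str:
    """Return the hexadecimal MD5 digest of a page body."""
    return hashlib.md5(content).hexdigest()

def needs_reindex(url: str, content: bytes, stored: dict) -> bool:
    """Compare the fresh digest with the stored one and update it on change.

    `stored` is a hypothetical url -> digest mapping that stands in for
    whatever table keeps the checksum of the last indexed version.
    """
    digest = md5sum(content)
    if stored.get(url) == digest:
        return False  # unchanged page, can be skipped
    stored[url] = digest
    return True

checksums = {}
print(needs_reindex("http://www.example.com/", b"<html>v1</html>", checksums))
print(needs_reindex("http://www.example.com/", b"<html>v1</html>", checksums))
```

The point of the design is that an unchanged page costs one digest comparison instead of a full parse and keyword update.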
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
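How such list entries could be interpreted is sketched below in Python. This is an illustration under stated assumptions, not the Sphider-plus code: the helper names `compile_entry` and `matches` are hypothetical, blanks around the pattern are assumed to be insignificant, and a regexp is assumed to have to match the whole element name or id.

```python
import re

def compile_entry(entry: str) -> "re.Pattern":
    """Turn one line of a list file into a compiled, case-sensitive pattern.

    Entries introduced by */ and ended by / are treated as regular
    expressions; anything else is treated as a literal name.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        pattern = entry[2:-1].strip()  # text between */ and the closing slash
    else:
        pattern = re.escape(entry)
    return re.compile(pattern)

def matches(entry: str, name: str) -> bool:
    """Assumption: the pattern has to cover the complete name."""
    return compile_entry(entry).fullmatch(name) is not None

print(matches("*/nav[0-5]/", "nav3"))
print(matches("*/nav[0-5]/", "Nav3"))
print(matches("footer", "footer"))
```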
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
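A crawler can map such duplicates to one address by reading the canonical link of each page. The Python sketch below assumes the canonical URL is announced in a `<link rel="canonical" href="...">` element; the class `CanonicalFinder` and the function `canonical_url` are hypothetical names, and this is not the code used by Sphider-plus.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and attributes.get("href"):
            self.canonical = attributes["href"]

def canonical_url(html: str, page_url: str) -> str:
    """Return the canonical URL of a page, or its own URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url

page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/product.php?item=swedish-fish">'
        '</head><body></body></html>')
print(canonical_url(
    page,
    "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy"))
```

Indexing under the returned address means both duplicate URLs from the example end up as a single entry.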
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
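The conversion step itself is simple to picture. The following Python sketch uses Python's built-in codecs rather than the ConvertCharset function named above, so it only illustrates the idea; the function name `to_utf8` is hypothetical, and some charsets from the list (for example mazovia) have no built-in Python codec and would raise `LookupError`.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from a legacy charset and re-encode them as UTF-8.

    Raises LookupError for charset names Python does not know and
    UnicodeDecodeError for bytes that are invalid in the given charset.
    """
    return raw.decode(charset).encode("utf-8")

# A Cyrillic word as it would arrive from a windows-1251 encoded page
legacy = "Привет".encode("windows-1251")
print(to_utf8(legacy, "windows-1251").decode("utf-8"))
```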
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
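A client could consume a JSON document of this shape as sketched below in Python. The field names (`query`, `total_results`, `text_results`, `num`, `weight`, `url`, `title`) are taken from the sample output above; the sample values, the example.com URLs and the helper name `summarize` are made up for illustration.

```python
import json

# Hypothetical document using the field names of the sample output
SAMPLE = r'''
{"query": "colour", "total_results": 2, "num_of_results": 2, "from": 1, "to": 2,
 "text_results": [
  {"num": 1, "weight": "100.0", "url": "http:\/\/www.example.com\/a.pdf",
   "title": " First", "page_size": "1.0 kb", "domain_name": "www.example.com"},
  {"num": 2, "weight": "50.0", "url": "http:\/\/www.example.com\/b.html",
   "title": " Second", "page_size": "2.0 kb", "domain_name": "www.example.com"}
 ]}
'''

def summarize(document: str):
    """Return (num, weight, url, title) tuples for all text results."""
    data = json.loads(document)
    rows = []
    for hit in data.get("text_results", []):
        # The weight arrives as a string and titles carry a leading blank
        rows.append((hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip()))
    return rows

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, url, title)
```

The escaped slashes (`\/`) are valid JSON and are turned back into plain slashes by the parser.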

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 is interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/. The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
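The naming scheme described above can be taken apart mechanically. The following is a minimal Python sketch, not part of Sphider-plus; the function name `parse_log_name` and the regular expression are assumptions inferred only from the example file names shown in the text.

```python
import re
from datetime import datetime

# Pattern inferred from the example names in the text (an assumption):
#   db<number>_<YYMMDD>-<HH.MM.SS>_<thread or ID>.html
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[A-Za-z0-9]+)\.html$"
)

def parse_log_name(name):
    """Split a log file name into database number, start time and thread label.

    Returns None if the name does not follow the assumed pattern.
    """
    m = LOG_NAME.match(name)
    if m is None:
        return None
    # Date and time are concatenated and parsed in one step.
    started = datetime.strptime(m["date"] + m["time"], "%y%m%d%H.%M.%S")
    return {"database": int(m["db"]), "started": started, "thread": m["thread"]}
```

For example, `parse_log_name("db2_100524-21.47.56_1.html")` yields database 2, a start time of May 24, 2010 at 21:47:56, and the thread label "1".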
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the names of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs can be chosen freely, but the operating system's limitations on file names should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
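The list-file convention described above (a literal id, or a regexp introduced by */ and ended with another slash) might be interpreted as in the following Python sketch. This is not the Sphider-plus implementation; the helper names are hypothetical, and both the trimming of blanks inside the slashes and the use of a substring search (rather than a full match) are assumptions.

```python
import re

def entry_to_matcher(entry):
    """Turn one list-file entry into a predicate on a div id.

    Entries of the form '*/ pattern /' are treated as regular expressions
    (assumption: surrounding blanks are ignored); all others are literals.
    """
    entry = entry.strip()
    if entry.startswith("*/") and entry.endswith("/"):
        body = entry[2:-1].strip()
        if body:
            pattern = re.compile(body)
            return lambda div_id: pattern.search(div_id) is not None
    return lambda div_id: div_id == entry

def id_matches(div_id, entries):
    """True if the div id is covered by any entry of the list."""
    return any(entry_to_matcher(e)(div_id) for e in entries)
```

With the example entry from the text, `id_matches("menu3", ["*/ menu[0-5] /"])` is true, while `id_matches("menu7", ["*/ menu[0-5] /"])` is false.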
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitively. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, no output was available during index / re-index because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin settings together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and should not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. 13. Media search for images, audio streams and videos 13.1 Media indexing Indexing of media files is enabled by separate Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixels - Image details like title, filename, size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files, all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db can be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
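A client consuming the JSON output shown above could read it as in the following Python sketch. It is only an illustration: the function name `summarize_results` is hypothetical, only field names visible in the excerpt are used, and the sample payload is an abridged stand-in for the truncated original.

```python
import json

def summarize_results(payload):
    """Return one summary line for the query plus one line per text result."""
    data = json.loads(payload)
    lines = [f'{data["query"]}: {data["total_results"]} result(s)']
    for hit in data.get("text_results", []):
        # Titles in the excerpt carry a leading blank, so they are trimmed.
        lines.append(f'{hit["num"]}. {hit["title"].strip()} [{hit["weight"]}] {hit["url"]}')
    return lines

# Abridged sample built from the fields visible in the excerpt above.
sample = (
    r'{"query":"colour","total_results":2,"text_results":['
    r'{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng"},'
    r'{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description"}]}'
)

for line in summarize_results(sample):
    print(line)
```

The escaped slashes (`\/`) in the URLs are valid JSON and are decoded to plain slashes by the parser, so no extra unescaping step is needed.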

[ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings available for Sphider-plus, separated into different submenus such as: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
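The conversion step itself can be shown with a small Python sketch. It uses Python's built-in codecs as a stand-in and is not the ConvertCharset function; not every charset name in the list above exists as a Python codec, so the charset names used here are only those known to Python.

```python
def to_utf8(raw, source_charset):
    """Decode bytes from `source_charset` and re-encode them as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# Simulate a page delivered in a legacy Cyrillic charset
legacy = "Привет".encode("windows-1251")
utf8 = to_utf8(legacy, "windows-1251")
print(len(legacy), len(utf8))
```

Each Cyrillic letter occupies one byte in windows-1251 and two bytes in UTF-8, which is why the byte length changes while the text stays the same.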
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during the index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accident. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
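A client could consume this JSON output as sketched below in Python. The sample is a shortened copy of the output shown above (the 'fulltxt' and 'page_size' fields and some header fields are left out), and the function name `summarize` is hypothetical.

```python
import json

# Shortened copy of the JSON result output shown above
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","domain_name":"www.english.le-piaggie.info"}
]}'''

def summarize(payload):
    """Return (num, weight, url, title) for every text result."""
    data = json.loads(payload)
    return [
        (hit["num"], float(hit["weight"]), hit["url"], hit["title"].strip())
        for hit in data["text_results"]
    ]

for num, weight, url, title in summarize(SAMPLE):
    print(num, weight, title, url)
```

Note that the weight is delivered as a string and the slashes in URLs are escaped; a standard JSON parser restores the plain URL, and the sketch converts the weight to a number.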

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Never the less to find these x results, the complete database had to been browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. There is a wide range of settings foreseen for Sphider-plus. Separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Larges pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest100) - Index log offering: . . .
. . .
Country, Host name (Latest100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holding lists of details. All text links, media links and thumbnails are active linked. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and is build by the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrylic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrylic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re- index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased by accidental. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. The main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
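The JSON result output quoted above can be consumed by any JSON parser. The following is a minimal sketch in Python, assuming only the field names visible in the sample ("text_results", "num", "weight", "url", "title"). The helper name text_hits is hypothetical, and the sample string is an abbreviated copy of the output above with "fulltxt" left elided.

```python
import json

# Abbreviated copy of the sample output; "fulltxt" stays elided as in the text.
SAMPLE = r'''{"query":"colour","total_results":2,"num_of_results":2,"from":1,"to":2,
"text_results":[
 {"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",
  "title":" Info_eng","fulltxt":" . . . ","page_size":"5,661.6kb",
  "domain_name":"www.english.le-piaggie.info"},
 {"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",
  "title":" Description","fulltxt":" . . . ","page_size":"26.4 kb",
  "domain_name":"www.english.le-piaggie.info"}]}'''


def text_hits(raw):
    """Return a simplified list of the text results in a JSON result string."""
    data = json.loads(raw)
    hits = []
    for item in data.get("text_results", []):
        hits.append({
            "num": item["num"],
            "weight": float(item["weight"]),  # weights arrive as strings
            "url": item["url"],               # "\/" is decoded to "/"
            "title": item["title"].strip(),   # titles carry a leading space
        })
    return hits


for hit in text_hits(SAMPLE):
    print(hit["num"], hit["weight"], hit["title"], hit["url"])
```

Fields such as "page_size" are formatted for display ("5,661.6kb", "26.4 kb"), so the sketch leaves them untouched rather than guessing at a numeric conversion.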

Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present, version 4.2024a, published February 12, 2024, is the current release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and phishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierarchical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer could be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend, could be supplied with a priority level. This level will be used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in the subfolder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
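The log file naming scheme described above can be taken apart again with a small parser. This is a sketch in Python, not part of Sphider-plus. It assumes the date part is YYMMDD (as "100524" = May 24, 2010 suggests) and that the last part is either a thread number or a user-defined ID. The name parse_log_name is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example names; the trailing part is either a
# thread number ("1") or a user-defined ID ("ID1").
LOG_NAME = re.compile(r"^db(\d+)_(\d{6}-\d{2}\.\d{2}\.\d{2})_(.+)\.html$")


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    db, stamp, thread = match.groups()
    # Assumption: date is YYMMDD and time is HH.MM.SS.
    started = datetime.strptime(stamp, "%y%m%d-%H.%M.%S")
    return {"db": int(db), "started": started, "thread": thread}


info = parse_log_name("db2_100524-21.47.56_1.html")
print(info["db"], info["started"].isoformat(), info["thread"])
```

Sorting the parsed entries by "started" and "thread" is one way to group the logs that belong to the same multithreaded index run.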
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
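The list-file convention quoted above (a plain value, or a regexp introduced by */ and closed by another slash) can be illustrated with a short sketch. The Python below is only a model of that convention and not the Sphider-plus implementation. It assumes that whitespace around the pattern is ignored, since the examples appear both as */ menu[0-5] / and */nav[0-5]/, and that plain entries are compared case-sensitively. The function names are hypothetical.

```python
import re


def parse_entry(entry):
    """Return a compiled regexp for '*/ ... /' entries, else the plain string."""
    text = entry.strip()
    if text.startswith("*/") and text.endswith("/") and len(text) > 3:
        # Assumption: blanks around the pattern are not significant.
        return re.compile(text[2:-1].strip())
    return text


def entry_matches(entry, value):
    """Check one list-file entry against an id or element name."""
    parsed = parse_entry(entry)
    if isinstance(parsed, re.Pattern):
        return parsed.search(value) is not None
    return parsed == value  # plain entries: exact, case-sensitive comparison


for candidate in ("menu3", "menu7", "nav0"):
    print(candidate, entry_matches("*/ menu[0-5] /", candidate))
```

The sketch uses an unanchored search, so menu30 would also match menu[0-5]. Whether Sphider-plus anchors its patterns is not stated in the text above.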
. . .
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And especially for old Polish programs: mazovia This list is to be read . . .


Introduction Release and Legal Info Installation Documentation Change Log [ Documentation Summary ] Preamble: The info presented here is valid only for the latest release of Sphider-plus. At present version 4.2024a published February 12, 2024 is the actual release. 1. Settings, customizing and statistics 2. Indexing 2.1 Various options 2.2 . . .
. . .
2. Indexing 2.1 Various options 2.2 Allow other hosts in same domain 2.3 Word stemming 2.4 Periodical Re-indexing 2.5 Preferred indexing 2.6 Multithreaded indexing 2.7 Create thumbnails during index procedure 2.8 Prevent indexing of known malware and pishing pages 2.9 Follow and create sitemap files 2.10 Use private sitemap instead of global . . .
. . .
Sitemap file 3. Using the indexer from command line 3.1 All options 3.2 Multithreaded indexing 4. Keeping pages, words and files from being indexed 4.1 robots.txt 4.2 Must include / must not include string list 4.3 Ignoring links 4.4 Ignoring parts of a page by <! sphider_noindex > 4.5 Ignoring parts of a page by <div id='abc'> . . .
. . .
charset' 6. Search modes 6.1 Search with wildcards * 6.2 Strict search ! 6.3 Tolerant search 6.4 Link search 6.5 Media search 6.6 Search only in one domain 6.7 Search in categories 6.8 Greek language support 6.9 Block queries 7. Chronological order for result listing 7.1 Text result listing 7.2 Media result listing 8. PDF converter 9. Clean . . .
. . .
of logging data 11. Error messages and Debug mode 12. Delete secondary characters 13. Media search for images, audio streams and videos 13.1 Media indexing 13.2 Not supported media content 13.3 Search for media content 13.4 Statistics for media content 14. Feed support 14.1 XML product feeds 14.2 RDF, RSD, RSS and Atom feeds 15. Result . . .
. . .
Overview 16.2 Definition and configuration 16.3 Activate / disable database 16.4 Backup & Restore of databases 16.5 Copy & and Move 16.6 Enhancing functionality of multiple database support 17. Search in categories 17.1 Hierachical structure 17.2 Parallel structure 18. User suggested sites 19. Vulnerability protection 19.1 Prevent queries . . .
. . .
templates 22.2 Embed the search engine into existing HTML code 22.3 The different style sheet files 23. JSON, XML and RSS result output [ Documentation ] 1. Settings, customizing and statistics If you want to change settings, behavior and design of Sphider-plus, you can do so by means of the Admin interface. There is a wide range of . . .
. . .
interface. A wide range of settings is provided for Sphider-plus, separated into different submenus like: Sites: - Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - . . .
. . .
URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with any link - Clean links not associated with any site - Clean Category table not associated with any site - Clean Media links - Clear Temp table . . .
. . .
- Clear Search log - Clear 'Most Popular Page Links' log - Clear 'Most Popular Media Links' log - Clear Spider log, separate and bulk delete - Clear Thumbnail images, separate and bulk delete - Clear Text cache - Clear Media cache - Clear IDS log file - Clear flood attempts log file - Clear all entries in addurl or banned table - Truncate all . . .
. . .
flood attempts log file - Clear all entries in addurl or banned table - Truncate all tables in database Settings: - General Settings - Index Log Settings - Spider Settings - Search Settings - Order of Result listing - Suggest Options - Page Indexing Weights Database: - Configure up to 5 databases with unlimited number of table sets - Activate . . .
. . .
Database: - Configure up to 5 databases with unlimited number of table sets - Activate separately for 'Admin', 'Search' user and 'Suggest URL user' - Backup / Restore - Copy / Move - Optimize Templates: In order to enable customer's integration of Sphider-plus into existing sites, HTML templates are prepared for Search form Text result . . .
. . .
Search form Text result listing Media result listing Most popular queries etc. Three different designs are offered, which may be selected in submenu 'Settings'. If the layout does not fit the design of your site (which is normal), you may create new designs and modify the appropriate file /templates/My_template/adminstyle.css . . .
. . .
the appropriate file /templates/My_template/adminstyle.css /templates/My_template/userstyle.css Statistics output: - Top keywords (Top 50 with hit counter). - All indexed thumbnails with ID3 and EXIF info. - Largest pages offering link URL and file size. - Most Popular Searches for text links offering: Link addr., total clicks, last clicked, . . .
. . .
Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Searches for media links offering: Link addr., total clicks, last clicked, last query (Top 50) - Most Popular Links (click counter). - Search log offering: Query, Results, Queried at, Time taken, User IP, Country, Host name (Latest 100) - Index log offering: . . .
. . .
Country, Host name (Latest 100) - Index log offering: File-name, index date and delete option - sitemap log offering: sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto . . .
. . .
attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file PHP integration, PHP security info. Each item holds lists of details. All text links, media links and thumbnails are active links. As stated in . . .
. . .
individual Re-indexing, the periodical Re-indexer can be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be given a priority level. This level is used by the option 'Re-index only preferred sites'. Level 1 will be interpreted as most important, while . . .
. . .
less all index results will be stored in log files in sub folder /admin/log/ The names of the log files look like: db2_100524-21.47.56_1.html (log file of first thread) db2_100524-21.48.12_2.html (log file of second thread) and are built from the following items: db2 - Number of database. 100524 - Date (May 24, 2010) 21.47.56 - Time when this thread . . .
. . .
spider.php -new1 php spider.php -new2 etc. The IDs will be added to the name of the corresponding log files like: db2_100524-21.47.56_ID1.html (log file of first thread) db2_100524-21.48.12_ID2.html (log file of second thread) IDs could be defined by personal requirements, but the limitations for file names with respect to the OS should be taken . . .
. . .
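The log file naming scheme described above can be taken apart programmatically. The following is a minimal sketch in Python, not part of Sphider-plus; the function name `parse_log_name` and the returned dictionary keys are assumptions made for illustration, and the date is read as YYMMDD because the documentation decodes 100524 as May 24, 2010.

```python
import re
from datetime import datetime

# Pattern for names such as db2_100524-21.47.56_1.html or
# db2_100524-21.47.56_ID1.html (database, date, time, thread or user ID).
LOG_NAME = re.compile(
    r"^db(?P<db>\d+)_(?P<date>\d{6})-(?P<time>\d{2}\.\d{2}\.\d{2})"
    r"_(?P<thread>[^_.]+)\.html$"
)


def parse_log_name(name):
    """Split a log file name into database number, start time and thread ID.

    Returns None when the name does not follow the documented scheme.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    started = datetime.strptime(
        match["date"] + match["time"], "%y%m%d%H.%M.%S"
    )
    return {
        "database": int(match["db"]),
        "started": started,
        "thread": match["thread"],
    }


info = parse_log_name("db2_100524-21.47.56_1.html")
```

Sorting the parsed entries by `started` and `thread` is one way to group the logs that belong to a single multithreaded index run.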
but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5sum) is still available for a fast re-index procedure. Once prepared, multithreaded re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter like: php spider.php -erased1 php . . .
. . .
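The MD5-based change check mentioned above can be illustrated with a short sketch. This is written in Python for illustration only and is not the Sphider-plus implementation; the helper names `page_digest` and `has_changed` are hypothetical, and it is assumed that the digest of the previous index run is stored alongside the page.

```python
import hashlib


def page_digest(content: bytes) -> str:
    """Return the MD5 checksum of a page body as a hex string."""
    return hashlib.md5(content).hexdigest()


def has_changed(stored_digest: str, content: bytes) -> bool:
    """Compare the digest kept from the last index run with the fresh page.

    Only pages whose digest differs need to be parsed and indexed again.
    """
    return stored_digest != page_digest(content)


old = page_digest(b"<html><body>Version 1</body></html>")
unchanged = has_changed(old, b"<html><body>Version 1</body></html>")
changed = has_changed(old, b"<html><body>Version 2</body></html>")
```

Because only a short checksum has to be compared, unchanged pages can be skipped quickly, which is what makes the re-index procedure fast.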
<! sphider_noindex > and <! /sphider_noindex > tags is not indexed, however links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <! sphider_noindex > tags requires direct access to the page, because the tags need to be added (edited) to the page. A more flexible method, . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ menu[0-5] / 4.6 Indexing only parts of a page by <div id='abc'> If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between <div . . .
. . .
a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */ table[0-5] / 4.7 Ignore HTML elements defined by <tagname> . . </tagname> This option is intended to work with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. HTML elements . . .
. . .
contain a regexp pattern. The regexp needs to be introduced by */ and must be ended with another slash. Example: */nav[0-5]/ Please keep in mind that element names placed in /include/common/elements_not.txt will be processed case-sensitive. 4.8 Index only HTML elements defined by <tagname> . . </tagname> This is the vice versa . . .
. . .
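The list-file convention used in sections 4.5 to 4.7 (plain values, or a regexp introduced by */ and ended with another slash) might be interpreted as in the following Python sketch. It is an illustration under assumptions: the function names are hypothetical, surrounding blanks are trimmed, and a value has to match as a whole. The excerpt does not state how Sphider-plus handles these details. Matching is case-sensitive, as stated for elements_not.txt.

```python
import re


def compile_entry(line):
    """Turn one list-file entry into a compiled, case-sensitive pattern.

    Entries like '*/ menu[0-5] /' are treated as regular expressions;
    everything else is taken literally.
    """
    entry = line.strip()
    if entry.startswith("*/") and entry.endswith("/") and len(entry) > 3:
        return re.compile(entry[2:-1].strip())
    return re.compile(re.escape(entry))


def matches_any(value, entries):
    """Return True if the id or element name matches one of the entries."""
    return any(compile_entry(e).fullmatch(value) for e in entries)


entries = ["footer", "*/ menu[0-5] /"]
hit = matches_any("menu3", entries)
miss = matches_any("menu7", entries)
```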
of all the duplicate content URLs: http://www.example.com/product.php?item=swedish-fish&category=gummy-candy http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678 and Sphider-plus will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. The . . .
. . .
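How a crawler can discover the canonical URL of a duplicate page is sketched below. The example uses Python's standard `html.parser` module and is only an illustration of the rel="canonical" mechanism; the class and function names are assumptions, not Sphider-plus code.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    '<html><head><link rel="canonical" '
    'href="http://www.example.com/product.php?item=swedish-fish">'
    "</head><body>Swedish fish</body></html>"
)
canonical = find_canonical(page)
```

A crawler that finds such a link would index the canonical URL and treat the page it came from as a duplicate.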
charset are supported by the ConvertCharset function and will be used to convert text into UTF-8 Unicode: WINDOWS windows-1250 - Central Europe windows-1251 - Cyrillic windows-1252 - Latin I windows-1253 - Greek windows-1254 - Turkish windows-1255 - Hebrew windows-1256 - Arabic windows-1257 - Baltic windows-1258 - Viet Nam cp874 - Thai - this . . .
. . .
- Baltic windows-1258 - Viet Nam cp874 - Thai - this file is also for DOS DOS cp437 - Latin US cp737 - Greek cp775 - BaltRim cp850 - Latin1 cp852 - Latin2 cp855 - Cyrillic cp857 - Turkish cp860 - Portuguese cp861 - Iceland cp862 - Hebrew cp863 - Canada cp864 - Arabic cp865 - Nordic cp866 - Cyrillic Russian (this is the one, used in IE . . .
. . .
IE Cyrillic (DOS) ) cp869 - Greek2 MAC (Apple) x-mac-cyrillic x-mac-greek x-mac-icelandic x-mac-ce x-mac-roman ISO iso-8859-1 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-9 iso-8859-10 iso-8859-11 iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 . . .
. . .
iso-8859-12 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 MISCELLANEOUS gsm0338 (ETSI GSM 03.38) cp037 cp424 cp500 cp856 cp875 cp1006 cp1026 koi8-r (Cyrillic) koi8-u (Cyrillic Ukrainian) nextstep us-ascii us-ascii-quotes DSP implementation for NeXT stdenc symbol zdingbat And specially for old Polish programs: mazovia This list is to be read . . .
. . .
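The conversion of legacy charsets into UTF-8 can be illustrated as follows. This sketch uses Python's built-in codecs rather than the ConvertCharset function referred to above, so it shows the principle only; the helper name `to_utf8` is an assumption, and not every charset in the list (for example mazovia) is known to Python.

```python
def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from the given source charset and re-encode as UTF-8.

    Raises LookupError if the charset name is unknown to Python and
    UnicodeDecodeError if the bytes are not valid in that charset.
    """
    return raw.decode(charset).encode("utf-8")


# A Cyrillic word as it would arrive from a windows-1251 encoded page.
legacy_bytes = "Привет".encode("cp1251")
utf8_bytes = to_utf8(legacy_bytes, "windows-1251")
```

After the conversion all pages share one encoding, so indexing and searching can work on UTF-8 text regardless of the charset of the source page.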
Access denied; you need the RELOAD privilege. . . Top 10. Enable real-time output of logging data Up to version 1.5 of Sphider-plus, during index / re-index there was no printout available because: - Several servers, especially on Win32, buffer the output from the script until it terminates before transmitting the results to the browser. - . . .
. . .
is seen. - Some versions of Microsoft Internet Explorer only start to display the page after they have received 256 bytes of output. As progress was not presented during index / re-index procedure, waiting for results became a pain in the neck. Selectable in Admin setting together with the update interval (1 - 10 seconds), AJAX technology was . . .
. . .
characters in front of words Warning: This option should be used with special care and not be activated for non ISO-8859 charsets. Some special characters as part of the word ending might be erased accidentally. Top 13. Media search for images, audio streams and videos 13.1 Media indexing Index of media files is enabled by separated Admin . . .
. . .
and 'Found at' - Total clicks - Last clicked - Query input 'Indexed Image Thumbnails' presenting: - Thumbnail 150 x 100 pixel - Image details like title, filename size of original image, link- and thumb-id - Option to delete the thumbnail In order to open the media files all tables contain active links. Media results are also stored in . . .
. . .
Settings' menu of the admin backend: First option: For results of XML product feeds, present the text extract of 350 characters as part of product attributes. If this option is not activated, all products of the XML product feed will be presented in search results, and the hits of the query string will be highlighted in all involved products. . . .
. . .
content of the database tables (those with the same table prefix) will be destroyed by the restore procedure. 16.5 Copy & Move This section of the Database Management will present only those databases, which are correctly configured, assigned and do have a set of installed tables as described in chapter Definition and configuration. This . . .
. . .
results. Nevertheless, to find these x results, the complete database had to be browsed. Starting with version 2.5, an additional clean option is offered as part of the Admin backend. Main advantage of this option is a significant reduction of the search time for any query, because the content of the db could be limited to offer only x . . .
. . .
<link>http://www.abc.de/images/warp.gif</link> <title>warp.gif</title> <x_size>635</x_size> <y_size>98</y_size> </media_result> <media_result> <num>2</num> <type>audio</type> <url>http://www.abc.de/index.php</url> . . .
. . .
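A client could read the XML media results shown above along the following lines. The sketch is written in Python with the standard `xml.etree.ElementTree` module. The wrapping element `results` and the `num` and `type` values of the sample entry are assumptions, because the excerpt is truncated; the remaining element names and values are taken from the excerpt.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the excerpt; <results>, <num> and <type> are assumed.
SAMPLE = """<results>
  <media_result>
    <num>1</num>
    <type>image</type>
    <link>http://www.abc.de/images/warp.gif</link>
    <title>warp.gif</title>
    <x_size>635</x_size>
    <y_size>98</y_size>
  </media_result>
</results>"""


def media_results(xml_text):
    """Return every <media_result> as a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("media_result")
    ]


first = media_results(SAMPLE)[0]
width, height = int(first["x_size"]), int(first["y_size"])
```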
the following content: {"query":"colour","ip":"::1","host_name":"guard_007-hoster","query_time":"2016-03-18 10:56:23 AM","consumed":0.016,"total_results":2,"num_of_results":2,"from":1,"to":2,"text_results":[{"num":1,"weight":"100.0","url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf","title":" Info_eng","fulltxt":" . . . . . .
. . .
placed on the ground floor, today's big kitchen with the heating fireplace for open and close mode of . . . ","page_size":"5,661.6kb","domain_name":"www.english.le-piaggie.info"},{"num":2,"weight":"50.0","url":"http:\/\/www.english.le-piaggie.info\/html\/description.html","title":" Description","fulltxt":" . . . cable. Additional active . . .
. . .
D.O.C.G. as far as to the Pratomagno mountains 160 olive-trees Detached house, about 800 m to reach the . . . ","page_size":"26.4 kb","domain_name":"www.english.le-piaggie.info"}]} Top . . .
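
The JSON result output shown above can be consumed with any JSON parser. The following Python sketch works on a shortened sample that keeps only some of the fields from the excerpt (the fulltxt, page_size and domain_name entries are left out for brevity); the function name `ranked_urls` is an assumption made for illustration.

```python
import json

# Shortened sample using field names and values from the excerpt above.
SAMPLE = (
    r'{"query":"colour","total_results":2,"num_of_results":2,'
    r'"from":1,"to":2,"text_results":['
    r'{"num":1,"weight":"100.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/Info_eng.pdf",'
    r'"title":" Info_eng"},'
    r'{"num":2,"weight":"50.0",'
    r'"url":"http:\/\/www.english.le-piaggie.info\/html\/description.html",'
    r'"title":" Description"}]}'
)


def ranked_urls(payload):
    """Return (num, url, weight) tuples sorted by descending weight."""
    data = json.loads(payload)
    hits = sorted(
        data["text_results"],
        key=lambda hit: float(hit["weight"]),
        reverse=True,
    )
    return [(hit["num"], hit["url"], float(hit["weight"])) for hit in hits]


top = ranked_urls(SAMPLE)[0]
```

Note that the weight is delivered as a string and that the escaped slashes (\/) are turned back into plain slashes by the parser.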

9.   Sphider-plus - The PHP Search Engine Visit in a new window

10.   Sphider-plus - The PHP Search Engine Visit in a new window

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Actual release ] Version: 4.2024a Release date: February 12, 2024 - Scripts prepared to work in PHP 8.3 environment. - Scripts prepared to co-operate with MySQL 8.0 databases. - IDS removed because outdated. - Updated scripts for EXIF indexation of media . . .
. . .
removed because outdated. - Updated scripts for EXIF indexation of media content. - New converters to index .ods, .odt, .docx and .xlsx documents. - Improved presentation of results with multiple results per page. - Improved search algorithm to deliver the same results for queries with/without ligatures. - Improved user interface for RFC 3986 . . .
. . .
for RFC 3986 compatibility. - New option in admin Settings for the Dublin Core Metadata Initiative (DCMI): Optionally show meta content of .docx documents and .xlsx spreadsheets during index procedure. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: Nearly all scripts in all . . .
. . .
that have been modified / added for this release: Nearly all scripts in all folders. As the database has been altered, Sphider-plus needs to be installed completely new for full functionality of this release. Please perform a fresh installation as described in chapter 'New Installation'. Top . . .

11.   Sphider-plus - The PHP Search Engine Visit in a new window


4325  Introduction Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.0 Release date: May 27, 2009 Compared with Sphider-plus version 1.9, the following items have been added / modified: Multiple database support for up to 5 independent databases (expandable). Individual activation . . .
. . .
Multiple database support for up to 5 independent databases (expandable). Individual activation of one database for: - Admin - Search user - Suggest URL For more details, please see chapter Multiple database support Independent configuration and activation for each database is integrated into the Admin interface. Additional password protected . . .
. . .
availability check for all databases and their release relevant table structure. Individual for each database: - Backup and restore - Copy / Move from each database to each other database 32 MByte query cache for MySQL database. - To be activated in Admin settings. - Status of cache is observable in Admin / Statistics / Server-Info / MySQL. . . .
. . .
/ Server-Info / MySQL. (Cache might not work for 'Shared Hosting' applications) Obey the tag specification: rel="canonical" If defined in the page header of a website, the crawler will be redirected to the canonical link and Sphider-plus will understand that the duplicates all refer to the canonical URL. For more details, please see . . .
. . .
Operating System environment - is suggested. If the path to the PDF converter is invalid and the converter is not accessible, an error message (in Admin Settings dialog) is created. Additional Admin setting to optionally enable indexing of externally hosted media content. Improved index procedure of media files by avoiding indexing of duplicate media . . .
. . .
- Search form and Result listing - Suggest URL form Improved vulnerability check of User input and Admin log-in: - Prevent buffer overflow errors. - Suppress JavaScript execution and tag inclusions masked as XSS attacks. - Prevent C-function 'format-string' vulnerability. The 'URL Suggestion Form' now includes a character counter for . . .
. . .
counter for remaining input in 'title' and 'description' field. Phrase search is now also enabled for title tags, not only for full text. Improved suggest framework: For search in categories, the suggestions will now be presented with respect to the pre-selected category. For 'Search with wildcards' now the complete word is highlighted in . . .
. . .
result listing, not only the query part of the found keyword. Additional Admin setting in section 'Suggest Options': For 'Media search' get suggestions also from EXIF info and ID3 tags. Files for database setting and script configuration are now protected against direct client access by pre-defining a named constant. Updated Swedish language . . .
. . .
Updated Swedish language file. Thanks to Holger Gremminger. Bug fixed in 'Search for suggestions in query log', which prevented this option from being disabled. Bug fixed that caused multiple listing of the same result when "Define maximum count of result hits per page displayed in search results (if multiple occurrence is available on a page)" was . . .
. . .
occurrence is available on a page)" was activated. Involved files that have been modified / added for this release: Nearly all scripts. Attention: This release requires a fresh installation of all scripts and a blank MySQL database. An update from former Sphider-plus versions or an upgrade from original Sphider is not foreseen. For more details . . .

12.   Sphider-plus - The PHP Search Engine Visit in a new window

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23 2023 - New converter to index PDF documents. Besides the known world languages this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
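The anonymization described above (114.119.164.255 becoming 114.119.0.0) can be sketched as masking the address down to a network prefix. This is a minimal Python illustration, not the Sphider-plus PHP code. The function name `anonymize_ip` and the prefix lengths are assumptions. The /16 for IPv4 matches the example given, while the /32 for IPv6 is only a guess, since the change log does not say how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero out the host part of an IP address.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an
    assumed value, not taken from the Sphider-plus documentation.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set; they are masked away.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Storing only the masked value means the full address never reaches the log, so nothing has to be cleaned up afterwards.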
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results don't present result . . .
. . .
be presented individually for each search result. For details about the new web service please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated only the content of this special sitemap will guide the index procedure. For details see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence send e-mail report to Sphider-plus admin about each harm attempt. For details see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag a PHP solution was implemented. So those links . . .
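The glitch mentioned above is a double encoding: the percent sign of an already encoded slash (%2F) is encoded again as %25, which yields %252F. The following Python sketch shows one way such links could be normalized in application code. It is not the PHP solution used by Sphider-plus, and the function name `undo_double_encoded_slash` is hypothetical.

```python
import re


def undo_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F).

    Only this one sequence is rewritten; other percent escapes are left alone.
    """
    return re.sub(r"%252F", "%2F", url, flags=re.IGNORECASE)


print(undo_double_encoded_slash("http://example.com/search/a%252Fb"))
```

Restricting the replacement to this single sequence avoids accidentally decoding other escapes that are meant to stay encoded.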
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description' . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form result listing and addurl form. Automatically adapting to display size of computer tablet smartphone etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF DOC RTF and XLS files. In result listing no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section nav aside hgroup article header footer etc If enabled in Admin settings the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section nav aside hgroup article header footer etc. If enabled in Admin settings the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes) or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure but only for canonical links. Only the canonical link will be indexed but links found there will be ignored. . . .
. . .
redirections which are invoked by JavaScript when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301 302 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
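A crawler that does not execute JavaScript can still follow a redirect like the one quoted above by extracting the assigned target from the page source. The Python sketch below illustrates the idea with a simple regular expression. It is not the Sphider-plus implementation; the names `find_js_redirect` and `_JS_REDIRECT` are hypothetical, and only the plain `window.location="..."` form (optionally with `.href`) is recognized.

```python
import re
from urllib.parse import urljoin

# Matches window.location = "target" or window.location.href = 'target'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in the HTML, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

Redirects that compute their target at run time cannot be found this way; handling them would require a JavaScript engine.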
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google MSN Amazon etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new old and half width) hiragana katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query) if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
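The anonymization described in this entry can be sketched as follows. This is a Python illustration only, not the PHP code shipped with Sphider-plus. The function name anonymize_ip and both prefix lengths are assumptions: keeping 16 bits for IPv4 reproduces the 114.119.0.0 example from the change log, while the 32 bits kept for IPv6 are a guess, because the change log does not state how much of an IPv6 address is retained.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all host bits beyond the given prefix length.

    v4_prefix=16 matches the change log example; v6_prefix=32 is an
    assumption made for this sketch.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))               # 114.119.0.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8::
```

Masking with a network prefix, rather than cutting the string at a dot, handles IPv4 and IPv6 with the same code path.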
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected an Apache glitch that caused %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
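The 252F effect mentioned in this entry is a double encoding: a slash that is already percent-encoded gets its percent sign encoded once more as %25. The Python sketch below only illustrates the effect and one possible repair. It is not the PHP solution shipped with Sphider-plus, and the function name fix_double_encoded_slash is hypothetical.

```python
import re
from urllib.parse import quote


def fix_double_encoded_slash(url: str) -> str:
    """Undo one level of double encoding, for the slash only."""
    return re.sub(r"%252f", "%2F", url, flags=re.IGNORECASE)


once = quote("a/b", safe="")   # the slash is encoded once
twice = quote(once, safe="")   # the percent sign is encoded again

print(once)                             # a%2Fb
print(twice)                            # a%252Fb
print(fix_double_encoded_slash(twice))  # a%2Fb
```

Restricting the repair to the slash leaves other intentionally encoded percent signs untouched.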
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
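Astral symbols are the code points above U+FFFF. In UTF-8 they occupy four bytes, which MySQL can store only with the utf8mb4 character set introduced in server version 5.5.3. As a rough Python illustration (the helper names are hypothetical, and this is not Sphider-plus code), such characters can be detected or stripped like this:

```python
import re

# Matches every code point outside the Basic Multilingual Plane
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")


def has_astral(text: str) -> bool:
    """True if the text needs 4-byte UTF-8 storage somewhere."""
    return ASTRAL.search(text) is not None


def strip_astral(text: str) -> str:
    """Remove all astral symbols, e.g. for servers without utf8mb4."""
    return ASTRAL.sub("", text)


sample = "ok \U0001F600!"
print(has_astral(sample))                  # True
print(strip_astral(sample))                # ok !
print(len("\U0001F600".encode("utf-8")))   # 4
```

Note that this removes only 4-byte characters. Some emoji live inside the Basic Multilingual Plane and would need an explicit list to be filtered as well.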
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed in storing the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
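How a crawler might recognize the two kinds of redirection named in this entry can be sketched in Python. The regular expression, the function names and the example base URL are assumptions made for illustration. The actual Sphider-plus implementation is written in PHP and is not reproduced here.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Status codes listed in the change log entry
REDIRECT_STATUS = {301, 302, 303, 307}

# Simplified pattern: window.location="target" or window.location.href='target'
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status: int) -> bool:
    """True for the HTTP status codes the crawler is said to follow."""
    return status in REDIRECT_STATUS


def js_redirect_target(base_url: str, html: str) -> Optional[str]:
    """Return the absolute URL of a JavaScript redirect, or None."""
    match = JS_REDIRECT.search(html)
    return urljoin(base_url, match.group(1)) if match else None


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target("http://example.com/dir/index.html", page))
# http://example.com/dir/mp.php?mcv=59
```

Resolving the target against the URL of the page that contained the script keeps relative redirects such as mp.php?mcv=59 inside the correct directory.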
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
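The anonymization described above (114.119.164.255 stored as 114.119.0.0) can be sketched as masking the host part of the address. This is a minimal Python illustration, not the Sphider-plus PHP code: the function name anonymize_ip is hypothetical, the /16 prefix for IPv4 is inferred from the example in the text, and the /32 prefix for IPv6 is purely an assumption, since the change log does not state how many IPv6 bits are kept.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Return the address with all bits beyond the prefix set to zero.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an
    assumed value chosen only for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))               # 114.119.0.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8::
```

Masking before storage means the full address never reaches the database, which is the point of the option; the trade-off is that per-user statistics become per-network statistics.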
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
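Background for the emoji removal: MySQL's 4-byte utf8mb4 character set arrived with version 5.5.3, and the older utf8 character set stores at most 3 bytes per character, so code points above U+FFFF (the astral planes, where most emoji live) cannot be stored. A minimal Python sketch of such a filter follows. The function name strip_astral is hypothetical and this is not the Sphider-plus implementation. It only drops astral code points, so emoji inside the Basic Multilingual Plane (for example U+263A) are left untouched.

```python
def strip_astral(text: str) -> str:
    """Drop every code point above U+FFFF.

    These characters need 4 bytes in UTF-8 and therefore do not fit into
    MySQL's 3-byte 'utf8' character set.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


sample = "Great page \U0001F600 about caf\u00e9s"
print(strip_astral(sample))  # Great page  about cafés
```

Accented Latin letters, Cyrillic, CJK and other BMP text pass through unchanged, so only the characters that the older servers cannot store are lost.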
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
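Detecting a JavaScript redirect of the kind quoted above can be sketched as a pattern search over the fetched page body, followed by resolving the target against the page URL. This is a minimal Python illustration under stated assumptions: the names _JS_REDIRECT and find_js_redirect and the example.com URL are hypothetical, and the change log does not say how Sphider-plus itself recognizes the directive. The pattern only covers simple quoted assignments to location, window.location or location.href.

```python
import re
from urllib.parse import urljoin

# Matches e.g. window.location="target", location.href = 'target'
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in html, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the URL of the fetched page
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
# http://example.com/dir/mp.php?mcv=59
```

A plain regular expression will also match the pattern in visible text or comments, so a crawler would normally restrict the search to the content of script elements before following the result.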
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3. New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
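How content inside hidden div elements might be skipped during indexing can be sketched as follows. This is a Python illustration under stated assumptions, not the Sphider-plus implementation: the class `HiddenDivStripper` and the helper `visible_text` are hypothetical names, only inline `style` attributes are inspected, and nested divs inside a hidden div are treated as hidden too.

```python
from html.parser import HTMLParser


class HiddenDivStripper(HTMLParser):
    """Collect page text, skipping everything inside display:none divs."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Any div nested in a hidden div is hidden as well.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the text of `html` without content of hidden divs."""
    parser = HiddenDivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

A parser with a depth counter is used instead of a single regular expression, because a regular expression cannot reliably match the closing tag of a div that contains further divs.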
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
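Preventing double indexing of www and non-www links can be sketched with a normalized comparison key. This is a Python illustration only; the names `dedup_key` and `unseen` are hypothetical, and the decision to ignore the scheme and to strip only a leading "www." is an assumption, not something stated in the change log.

```python
from urllib.parse import urlsplit


def dedup_key(url: str) -> str:
    """Build a key that is equal for the www and non-www form of a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def unseen(urls):
    """Keep the first URL of every group sharing the same key."""
    seen = set()
    result = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result
```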
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
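Following both kinds of redirection can be sketched as below. This is a Python illustration, not the Sphider-plus code; the names `js_redirect_target` and `is_http_redirect` are hypothetical, and the regular expression only covers simple `window.location = "..."` assignments like the one quoted above.

```python
import re
from urllib.parse import urljoin

# Matches window.location="target" or window.location.href='target'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1""",
    re.IGNORECASE,
)

# Status codes named in the change log entry.
REDIRECT_STATUS = {301, 302, 303, 307}


def is_http_redirect(status: int) -> bool:
    """True if the HTTP status asks the crawler to follow a new URL."""
    return status in REDIRECT_STATUS


def js_redirect_target(html: str, base_url: str):
    """Return the absolute redirect target found in `html`, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the page they were found on.
    return urljoin(base_url, match.group(2))
```

For an HTTP redirect the target would come from the `Location` response header and could be resolved with `urljoin` in the same way.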
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Implemented for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
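The anonymization shown in the example (114.119.164.255 becomes 114.119.0.0) amounts to keeping the first 16 bits of an IPv4 address. A minimal Python sketch follows; it is not the Sphider-plus code, the function name `anonymize_ip` is hypothetical, and the 32-bit prefix kept for IPv6 is an assumption, because the change log does not say how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits of `ip` behind the kept network prefix."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

With the default prefix, `anonymize_ip("114.119.164.255")` returns `"114.119.0.0"`, matching the example in the change log.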
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
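The anonymization described above (114.119.164.255 stored as 114.119.0.0) amounts to keeping only a network prefix of the address. The following is a minimal sketch, not the Sphider-plus code: the function name `anonymize_ip` is hypothetical, the /16 prefix for IPv4 is inferred from the example in the change log, and the /32 prefix for IPv6 is purely an assumption, since the change log does not say how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Return the address with all host bits beyond the prefix set to zero.

    v4_prefix=16 reproduces the change-log example; v6_prefix=32 is an
    assumption made for this sketch only.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:abcd:12::1"))
```

Zeroing the host bits before the value is written to storage means the full address is never persisted, which is the point of the option.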
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
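The %252F glitch mentioned above is a case of double percent-encoding: the "%" of an already encoded "%2F" gets encoded again as "%25". The change log does not show how the PHP fix works, so the following is only an illustrative sketch of one way to undo a single level of double encoding; the function name `collapse_double_encoding` is hypothetical.

```python
import re

# "%25" followed by two hex digits is the signature of a percent sign that
# was itself percent-encoded, e.g. "%252F" for an original "%2F".
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def collapse_double_encoding(url: str) -> str:
    """Turn sequences such as %252F back into %2F (one level only)."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


if __name__ == "__main__":
    print(collapse_double_encoding("http://example.com/a%252Fb"))
```

A caveat of this approach: a URL that legitimately contains a literal "%25" followed by two hex digits would also be rewritten, so it is only appropriate where double encoding is known to occur.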
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
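Skipping the content of div elements that carry style="display:none", as described above, can be illustrated with a small text extractor. This is a sketch under stated assumptions, not the Sphider-plus implementation: the class `VisibleText` and the function `visible_text` are hypothetical names, and only inline style attributes on div elements are considered (no stylesheets, no other tags).

```python
import re
from html.parser import HTMLParser

_HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class VisibleText(HTMLParser):
    """Collect text, ignoring everything inside <div style="display:none">."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # greater than 0 while inside a hidden div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = dict(attrs).get("style") or ""
        # Nested divs inside a hidden div stay hidden, so count them too.
        if self.hidden_depth or _HIDDEN.search(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Normalize whitespace between the collected fragments.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    page = '<p>keep</p><div style="display:none">ignore_this_content</div><p>this</p>'
    print(visible_text(page))
```

The depth counter is what keeps nested div elements inside a hidden block from re-enabling indexing when the inner element closes.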
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
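Following a JavaScript redirect such as the window.location directive quoted above requires extracting the target from the page body and resolving it against the URL of the page. The sketch below shows one way this could be done; it is not taken from Sphider-plus, the name `find_js_redirect` is hypothetical, and the regular expression only covers simple string assignments to `location`, `window.location` or `location.href`.

```python
import re
from urllib.parse import urljoin

# Matches e.g. window.location="mp.php?mcv=59" or location.href = 'x.html'.
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the URL of the current page.
    return urljoin(base_url, match.group(1))


if __name__ == "__main__":
    page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
    print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

Redirects built dynamically in script code (string concatenation, timers, function calls) are outside the scope of such a pattern and would need a real JavaScript engine.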
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
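The two fixes in this entry can be illustrated with a minimal Python sketch. It is not the Sphider-plus PHP code; the function names are hypothetical. It assumes that the characters to drop are those outside the Basic Multilingual Plane, which the 3-byte utf8 charset of MySQL versions before 5.5.3 cannot store, and that the URL glitch is a slash that was percent-encoded twice.

```python
from urllib.parse import quote


def strip_astral(text: str) -> str:
    """Drop code points above U+FFFF (most emoji live there).

    Assumption: these are the characters an old 3-byte utf8 column rejects.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def fix_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F)."""
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


# How the glitch arises: encoding an already encoded slash encodes the '%' again.
once = quote("a/b", safe="")    # slash becomes %2F
twice = quote(once, safe="")    # '%' becomes %25, giving %252F
```

Calling `fix_double_encoded_slash(twice)` yields the same string as `once`.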
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
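As a rough illustration of the two redirect types named in this entry, the Python sketch below extracts the target of a `window.location` assignment from page content and checks the listed HTTP status codes. It is not the Sphider-plus implementation; the regular expression, the function names and the example base URL are assumptions made for this sketch.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# HTTP status codes named in the change log entry.
REDIRECT_CODES = {301, 302, 303, 307}

# Assumed pattern: window.location (optionally .href) assigned a quoted string.
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status: int) -> bool:
    """True if the status code is one of the redirects to be followed."""
    return status in REDIRECT_CODES


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page content, if any."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the URL of the current page.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
target = find_js_redirect(page, "http://example.com/dir/page.html")
```

A real crawler would also need a loop guard, since a chain of redirects can point back to its own start.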
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Change Log [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
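The anonymization described in this entry can be sketched in a few lines of Python. This is only an illustration, not the Sphider-plus PHP code: the function name is hypothetical, the 16-bit IPv4 prefix is taken from the 114.119.0.0 example, and the 32-bit IPv6 prefix is an assumption, since the change log does not state how many bits are kept.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the host-specific part of an IP address before it is stored."""
    ip = ipaddress.ip_address(address)
    # IPv4: keep the first 16 bits, as in the change log example.
    # IPv6: keeping 32 bits is an assumption of this sketch.
    prefix = 16 if ip.version == 4 else 32
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


stored_v4 = anonymize_ip("114.119.164.255")
stored_v6 = anonymize_ip("2001:db8:85a3::8a2e:370:7334")
```

With these prefixes, `stored_v4` is `114.119.0.0` and `stored_v6` is `2001:db8::`.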
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 59 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php 1f40 /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). 5a2 New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <;div id='abc'>; now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <;meta http-equiv='Content-Type' content='text/html; charset=windows-1256 />; of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CVS files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not jet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any user IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
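The IP anonymization mentioned in the snippet above (114.119.164.255 stored as 114.119.0.0) amounts to zeroing the host part of the address. The following is a minimal Python sketch of that idea, not the Sphider-plus implementation, which is written in PHP. The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the example in the change log, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not say how much of an IPv6 address is kept.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Keep only the network prefix of an IP address and zero the rest.

    v4_prefix=16 matches the change-log example; v6_prefix=32 is an assumption.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False accepts an address with host bits set and masks them off.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Masking through a network prefix handles IPv4 and IPv6 with the same code path, which is why the sketch avoids string splitting on dots or colons.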
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf documentation (chapter 14.1 of this online documentation). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
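The double-encoding glitch named in the snippet above arises when an already encoded slash (%2F) has its percent sign encoded again, yielding %252F. A minimal Python sketch of an application-side repair is shown here. It is an illustration only: the change log does not show the actual PHP solution, and the function name `fix_double_encoded_slash` is hypothetical.

```python
import re

# Matches a slash whose percent sign was encoded a second time (%25 + 2F).
_DOUBLE_ENCODED_SLASH = re.compile(r"%252F", re.IGNORECASE)


def fix_double_encoded_slash(url: str) -> str:
    """Replace every double-encoded slash with a singly encoded one."""
    return _DOUBLE_ENCODED_SLASH.sub("%2F", url)


print(fix_double_encoded_slash("http://example.com/search/a%252Fb"))
```

The sketch touches only the slash sequence and leaves other occurrences of %25 alone, because a literal percent sign in a URL is legitimately encoded that way.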
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
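Following a JavaScript redirect of the kind quoted above means finding the `window.location` assignment in the page source and resolving its target against the URL of the page. The Python sketch below illustrates one way this could be done with a regular expression. It is not the Sphider-plus code; the function name `find_js_redirect` and the example base URL are hypothetical, and a real crawler would need to handle more redirect variants than this pattern covers.

```python
import re
from urllib.parse import urljoin

# Captures the quoted target of window.location = "..." or window.location.href = '...'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in the HTML, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/shop/index.php"))
```

Resolving against the page URL matters because redirect targets like the one in the change-log example are relative and cannot be fetched on their own.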
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

[ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any user IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf documentation (chapter 14.1 of this online documentation). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
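The %252F glitch mentioned above is a case of double percent-encoding: the "%" of an already encoded "%2F" gets encoded again as "%25". The following is a minimal sketch in Python of how that happens and how it could be undone. It is not the PHP solution shipped with Sphider-plus; the function name `undo_double_encoding` and the example URL are assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Encoding a path once turns "/" into "%2F".
once = quote("a/b", safe="")
# Encoding the result again turns "%" into "%25", giving "%252F".
twice = quote(once, safe="")


def undo_double_encoding(url: str) -> str:
    """Collapse '%25XX' back to '%XX' (hypothetical helper, one level only)."""
    return re.sub(r"%25([0-9A-Fa-f]{2})", r"%\1", url)


print(once)
print(twice)
print(undo_double_encoding("http://example.com/a%252Fb"))
```

The helper only removes one level of extra encoding and would also rewrite a literal "%25" that happens to be followed by two hex digits, so a real implementation would need to know that the input was in fact encoded twice.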
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3. New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
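A crawler can recognise the kind of JavaScript redirection quoted above by scanning the fetched page for a `window.location` assignment and resolving the target against the page URL. The following is a minimal sketch in Python under that assumption. It does not show how Sphider-plus implements the feature; `find_js_redirect`, `is_http_redirect` and the example URLs are hypothetical names chosen for illustration. The status code set is the one listed in the change log entry.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# HTTP status codes named in the change log entry as being followed.
REDIRECT_STATUS_CODES = {301, 302, 303, 307}

# Matches window.location="..." and window.location.href='...' (assumed patterns).
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status: int) -> bool:
    """Return True if the status code is one of the followed redirects."""
    return status in REDIRECT_STATUS_CODES


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

A regular expression only catches literal string assignments. Redirect targets that are computed at run time would require executing the script, which this sketch does not attempt.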
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option: Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
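The anonymisation in the example above keeps the first two octets of an IPv4 address and zeroes the rest, which is the same as keeping a 16-bit network prefix. The following is a minimal sketch in Python of that idea using the standard `ipaddress` module. It is not the Sphider-plus code; the function name `anonymize_ip` is hypothetical, and the 32-bit prefix used for IPv6 is an assumption, since the change log does not state how many bits are kept there.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host part of an IP address, keeping only the leading prefix bits.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an assumed value.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows host bits to be set; they are masked off in the result.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))               # 114.119.0.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8::
```

Masking by prefix length keeps the code identical for both address families, and the prefix can be shortened if a stronger degree of anonymisation is wanted.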
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php 1f40 /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). 5a2 New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <;div id='abc'>; now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <;meta http-equiv='Content-Type' content='text/html; charset=windows-1256 />; of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
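The anonymization described above (114.119.164.255 stored as 114.119.0.0) can be illustrated with a minimal Python sketch. This is not the Sphider-plus code, which is written in PHP. The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the changelog's example, and the 32-bit IPv6 prefix is purely an assumption, since the changelog does not say how IPv6 addresses are truncated.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero out the host part of an IP address, keeping only the leading prefix bits.

    v4_prefix=16 reproduces the changelog example; v6_prefix=32 is an assumed value.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get the enclosing network back.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Truncating to a network prefix keeps coarse statistics (for example, per-provider counts) usable while no longer identifying a single client.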
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https protocol by HTTP 301 'permanently moved' are now followed. This is usually performed by a .htaccess directive; Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend, the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
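The double-encoding problem mentioned above arises when an already encoded slash (%2F) is escaped a second time, so its percent sign becomes %25 and the sequence turns into %252F. The following Python sketch only illustrates the idea of undoing that one sequence in application code. It is not the PHP solution shipped with Sphider-plus, and the function name `fix_double_encoded_slash` is hypothetical.

```python
import re

# Matches a double-encoded slash; the hex digit may be upper or lower case.
_DOUBLE_ENCODED_SLASH = re.compile(r"%252[Ff]")


def fix_double_encoded_slash(url: str) -> str:
    """Replace every double-encoded slash (%252F) with a single-encoded one (%2F)."""
    return _DOUBLE_ENCODED_SLASH.sub("%2F", url)


if __name__ == "__main__":
    print(fix_double_encoded_slash("http://example.com/search?path=docs%252Fmanual"))
```

Only the slash sequence is touched here. Decoding every %25 in the URL would also alter percent signs that were escaped intentionally.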
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
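How a crawler might obey a JavaScript redirect like the one quoted above can be sketched in Python. This is only an illustration under stated assumptions, not the Sphider-plus implementation (which is PHP). The function name `find_js_redirect` and the regular expression are hypothetical, and the pattern covers only the simple `window.location="..."` form shown in the changelog.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches window.location="target" or window.location.href='target' (assumed forms).
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None if there is none."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


if __name__ == "__main__":
    page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
    print(find_js_redirect(page, "http://example.com/dir/index.html"))
```

A regular expression is enough for this literal pattern. Redirects that are built dynamically in script code would need a real JavaScript engine, which this sketch does not attempt.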
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
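The %252F glitch mentioned above is a double-encoded slash: the percent sign of %2F is itself encoded as %25. The following is a minimal sketch in Python of undoing that one level of encoding. It is not the PHP solution shipped with Sphider-plus, and the function name is hypothetical.

```python
import re

# Hypothetical helper, not part of Sphider-plus.
# Matches a double-encoded slash in either letter case.
_DOUBLE_ENCODED_SLASH = re.compile(r"%252[Ff]")

def fix_double_encoded_slash(url: str) -> str:
    """Replace every %252F in the URL with %2F and leave the rest untouched."""
    return _DOUBLE_ENCODED_SLASH.sub("%2F", url)

print(fix_double_encoded_slash("http://example.com/a%252Fb"))
```

The pattern is deliberately limited to the slash case, so other legitimately encoded percent signs (%25) in a URL are not altered.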
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
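As an illustration of the JavaScript redirection described above, here is a minimal sketch in Python that extracts the target of a window.location assignment and resolves it against the page URL. It is an assumption about how such a directive could be detected, not the Sphider-plus code; the function name and the example base URL are hypothetical.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: matches window.location="..." and window.location.href='...'
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)

def js_redirect_target(html, base_url):
    """Return the absolute URL of the first JavaScript redirect, or None if there is none."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))

page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target(page, "http://example.com/dir/page.html"))
```

A regular expression only covers simple literal assignments; redirects computed at run time would need a real JavaScript engine.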
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
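The anonymization in the example above (114.119.164.255 becomes 114.119.0.0) amounts to zeroing the host part of the address. Below is a minimal sketch in Python using the standard ipaddress module. The /16 prefix for IPv4 follows from the example; the /32 prefix for IPv6 is purely an assumption, as the change log does not say how many bits are kept. The function name is hypothetical.

```python
import ipaddress

def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits behind the given prefix length.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an assumption.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymize_ip("114.119.164.255"))      # 114.119.0.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8::
```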
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend, the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
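The anonymization described above can be illustrated with a minimal Python sketch. It is not the Sphider-plus implementation (which is written in PHP). The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the example 114.119.164.255 to 114.119.0.0, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state how IPv6 addresses are truncated.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address before it is stored.

    Assumption: keep the first 16 bits of IPv4 (matches the change log
    example) and the first 32 bits of IPv6 (a guess, not documented).
    """
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 32
    # strict=False lets us pass a host address and get its enclosing network
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```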
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
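The double-encoding glitch mentioned above arises when the percent sign of an already encoded slash (%2F) is encoded a second time, giving %252F. The following Python sketch only illustrates the idea of undoing that in application code instead of with a rewrite rule. It is not the PHP solution shipped with Sphider-plus, and the function name `fix_double_encoded_slash` is hypothetical.

```python
def fix_double_encoded_slash(url: str) -> str:
    """Turn a doubly encoded slash (%252F) back into a singly encoded one (%2F).

    Only this one sequence is touched; every other part of the URL is
    left exactly as received.
    """
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


print(fix_double_encoded_slash("http://example.com/a%252Fb"))
```

Restricting the replacement to this single sequence avoids decoding other escaped characters that may be intentional.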
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .

. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
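The two fixes named in this entry can be illustrated with a minimal Python sketch. It is not the Sphider-plus code (which is PHP and not shown here); the function names are hypothetical. It rests on two facts: MySQL's 3-byte `utf8` charset cannot store code points above U+FFFF (the 4-byte `utf8mb4` charset arrived with MySQL 5.5.3), and `%252F` is what results when the `%` of an already encoded `%2F` is encoded a second time.

```python
def strip_astral(text: str) -> str:
    """Drop every code point above U+FFFF, i.e. everything that needs
    4 bytes in UTF-8 and therefore does not fit MySQL's 3-byte utf8."""
    # Assumption: removing all astral characters is broader than removing
    # "emoji" only, but most emoji live in that range.
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


def undo_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F)."""
    return url.replace("%252F", "%2F").replace("%252f", "%2f")
```

Emoji that sit inside the Basic Multilingual Plane (for example U+263A) are left untouched by this sketch, since they fit into three bytes.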
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
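How content inside a hidden div could be skipped during text extraction is sketched below. This is only an illustration in Python using the standard `html.parser` module; the class and function names are hypothetical and the actual Sphider-plus routine may work differently.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text, skipping everything inside <div style="display:none">."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden div (nested divs counted)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Any div opened inside a hidden div stays hidden as well.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Called on `<div style="display:none">ignore_this_content</div><div>kept</div>`, the sketch returns only the text of the second div.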
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
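A crawler can recognise both kinds of redirect roughly as follows. This Python sketch is an assumption-laden illustration, not the Sphider-plus implementation: the regular expression, the constant and the function names are hypothetical, and only the simple `window.location="..."` form quoted above is covered.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Status codes named in the change log entry.
REDIRECT_CODES = {301, 302, 303, 307}

# Matches window.location="target" and window.location.href='target'.
JS_REDIRECT = re.compile(r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""")


def is_http_redirect(status: int) -> bool:
    """True if the HTTP status is one of the redirect codes to follow."""
    return status in REDIRECT_CODES


def js_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL a JavaScript redirect points to, or None."""
    match = JS_REDIRECT.search(html)
    return urljoin(base_url, match.group(1)) if match else None
```

Relative targets such as `mp.php?mcv=59` are resolved against the URL of the page that contained the script.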
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
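The anonymisation described here amounts to zeroing the host part of the address. A minimal sketch with Python's standard `ipaddress` module follows; the function name is hypothetical. The /16 prefix for IPv4 is taken from the example above (114.119.164.255 becomes 114.119.0.0), while the /32 prefix for IPv6 is purely an assumption, as the entry does not say how much of an IPv6 address is kept.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits behind the given prefix length.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an
    assumed value, not taken from Sphider-plus.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

With the defaults, `anonymize_ip("114.119.164.255")` yields `114.119.0.0`, matching the example in the change log.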
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php 1f40 /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is now also available as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separate activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
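The IP anonymization described in the snippet above can be sketched as follows. This is an illustrative sketch only, not the Sphider-plus code: the function name is hypothetical, the 16-bit IPv4 prefix is inferred from the example 114.119.164.255 to 114.119.0.0, and the 32-bit IPv6 prefix is purely an assumption, since the snippet does not state how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=16, v6_prefix=32):
    """Zero the host part of an IP address before it is stored.

    v4_prefix=16 reproduces the example from the change log;
    v6_prefix=32 is an assumed value, not taken from the source.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))               # 114.119.0.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8::
```

Masking by prefix length keeps the coarse network information that statistics may still need while dropping the part that identifies an individual host.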
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https protocol by HTTP 301 'permanently moved' are now followed. Usually performed by a .htaccess directive, Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) of Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
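The %252F glitch mentioned in the snippet above arises when an already encoded %2F is encoded a second time, so that its percent sign becomes %25. The following is a minimal sketch of one way to undo such double encoding. It is not the PHP solution shipped with Sphider-plus, whose details are not given in the snippet; the function name and the regular expression are assumptions for illustration.

```python
import re

# "%25" followed by two hex digits is treated as a doubly encoded escape.
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def undo_double_encoding(url):
    """Collapse sequences such as %252F back to %2F."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


print(undo_double_encoding("http://example.com/a%252Fb"))  # http://example.com/a%2Fb
```

Note the trade-off: a URL that legitimately contains an encoded percent sign followed by two hex digits would also be rewritten, so a real implementation would need to restrict where this is applied.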
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Freely adjustable number of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings. New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in the admin backend, the default value for 'Spider can leave domain during index procedure' has been changed to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: 3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: 3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) of Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent of singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details, please see chapter 17.1 of the readme.pdf docu (chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs, verify that not only the host part but also the path and arguments of the URL are new to the database. New option in admin settings: Protect admin backend . . .
. . .
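The option above, which checks host, path and arguments of a new URL, can be pictured as comparing a composite key instead of the host alone. This is a hedged Python sketch under that reading. The helper names `url_key` and `is_new` are hypothetical, and the real Sphider-plus check runs against its database rather than an in-memory list.

```python
from urllib.parse import urlsplit


def url_key(url: str, full: bool = True) -> tuple:
    """Build the comparison key for a URL.

    full=False compares the host only; full=True also compares
    path and query string (the 'arguments' of the URL).
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    if not full:
        return (host,)
    return (host, parts.path or "/", parts.query)


def is_new(url: str, known: list, full: bool = True) -> bool:
    """True if no already known URL has the same key."""
    known_keys = {url_key(k, full) for k in known}
    return url_key(url, full) not in known_keys


known = ["http://example.com/a?x=1"]
print(is_new("http://example.com/b", known))              # True
print(is_new("http://example.com/b", known, full=False))  # False
```

With the host-only comparison, a second page of an already known site would be rejected; the extended comparison accepts it.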
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
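The %252F glitch mentioned above is a case of double percent-encoding: an already encoded slash (%2F) is encoded a second time, so its percent sign becomes %25. The short Python sketch below only demonstrates the effect and one possible repair. The function `undo_double_encoding` is a hypothetical helper and does not reproduce the PHP solution used in Sphider-plus.

```python
from urllib.parse import quote

# Encoding a slash once yields %2F ...
once = quote("/", safe="")
# ... encoding that result again escapes the percent sign itself.
twice = quote(once, safe="")

print(once)   # %2F
print(twice)  # %252F


def undo_double_encoding(url_part: str) -> str:
    """Turn a doubly encoded slash back into the singly encoded form."""
    return url_part.replace("%252F", "%2F").replace("%252f", "%2F")


print(undo_double_encoding("docs%252Fmanual.pdf"))  # docs%2Fmanual.pdf
```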
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3. New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
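Astral symbols are code points above U+FFFF, which need four bytes in UTF-8. As general MySQL background (not stated in the change log itself), the four-byte `utf8mb4` character set became available with server version 5.5.3, which fits the version requirement above. The Python sketch below shows how such characters can be detected or stripped; the function names are hypothetical and only illustrate the idea.

```python
def is_astral(ch: str) -> bool:
    """True for code points outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


def needs_four_byte_utf8(text: str) -> bool:
    """True if the text contains at least one astral symbol."""
    return any(is_astral(ch) for ch in text)


def strip_astral(text: str) -> str:
    """Remove astral symbols, e.g. for databases limited to 3-byte UTF-8."""
    return "".join(ch for ch in text if not is_astral(ch))


smiley = "\U0001F600"  # an emoji, encoded as an escape sequence
print(len(smiley.encode("utf-8")))            # 4
print(needs_four_byte_utf8("ok " + smiley))   # True
print(strip_astral("a" + smiley + "b"))       # ab
```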
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed in storing the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
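The example content 'ignore_this_content' above suggests that text inside div elements styled with display:none is skipped during indexing. Under that assumption, the following Python sketch shows one way to collect only the visible text of a page. The class `VisibleText` and the function `visible_text` are hypothetical names, and the approach is an illustration, not the parser used by Sphider-plus.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text, skipping everything inside hidden div elements."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a display:none div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Nested divs inside a hidden div stay hidden as well
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


page = '<p>keep</p><div style="display:none">ignore_this_content</div><div>also keep</div>'
print(visible_text(page))  # keepalso keep
```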
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
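The two redirect mechanisms above (HTTP status codes 301, 302, 303 and 307, and a JavaScript window.location assignment in the page body) can be handled by one small decision function. This Python sketch is an assumption about how such a check might look. The name `redirect_target`, the plain dictionary used for headers, and the regular expression are all hypothetical, and the pattern only covers simple assignments like the one quoted in the change log.

```python
import re

# Status codes named in the change log entry
REDIRECT_STATUS = {301, 302, 303, 307}

# Matches window.location="..." and window.location.href='...'
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def redirect_target(status: int, headers: dict, body: str):
    """Return the redirect target of a response, or None if there is none."""
    if status in REDIRECT_STATUS:
        # Assumes the header name is stored exactly as 'Location'
        return headers.get("Location")
    match = JS_REDIRECT.search(body)
    return match.group(1) if match else None


body = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(redirect_target(200, {}, body))  # mp.php?mcv=59
print(redirect_target(301, {"Location": "https://example.com/"}, ""))  # https://example.com/
```

The returned target may be relative, as in the first call, so a crawler would still have to resolve it against the URL of the page.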
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is now also available as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten for compatibility with PHP 5.3.x. Error corrected in the de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Change Log [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
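The redirect handling described above can be illustrated with a minimal Python sketch. It is not the Sphider-plus implementation (which is written in PHP); the names `REDIRECT_STATUS`, `is_http_redirect` and `js_redirect_target` are hypothetical, and the regular expression only covers the simple `window.location="..."` form quoted in the change log.

```python
import re
from urllib.parse import urljoin

# Status codes named in the change log entry as being followed.
REDIRECT_STATUS = {301, 302, 303, 307}

# Matches window.location="target" or window.location.href='target'.
_JS_LOCATION = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status_code: int) -> bool:
    """Return True if the HTTP status code is one of the followed redirects."""
    return status_code in REDIRECT_STATUS


def js_redirect_target(html: str, base_url: str):
    """Return the absolute URL of a JavaScript redirect found in html, or None."""
    match = _JS_LOCATION.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the URL of the page being indexed.
    return urljoin(base_url, match.group(1))
```

A crawler would call `is_http_redirect` on the response status first and fall back to `js_redirect_target` on the delivered page body.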
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
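As an illustration of the anonymization described above, here is a minimal Python sketch. It is not the Sphider-plus code (which is PHP); the function name `anonymize_ip` is hypothetical. The IPv4 prefix of 16 bits follows the example 114.119.164.255 to 114.119.0.0, while the IPv6 prefix of 32 bits is purely an assumption, since the change log does not state how IPv6 addresses are truncated.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host part of an address so it no longer identifies one user.

    v4_prefix=16 matches the change log example; v6_prefix=32 is an assumption.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set; they are masked off.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Masking with a network prefix keeps the coarse origin of a request for statistics while discarding the bits that single out an individual host.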
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
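The double-encoding glitch mentioned above (the `%` of `%2F` being escaped again to `%25`) can be sketched in Python as follows. This is an illustration only, not the PHP solution shipped with Sphider-plus; the function name `fix_double_encoding` is hypothetical. Note the assumption that any `%25` followed by two hex digits is a double-encoded sequence, which would also rewrite a deliberately encoded literal percent sign.

```python
import re

# "%25" followed by two hex digits, e.g. "%252F" (a double-encoded "/").
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def fix_double_encoding(url: str) -> str:
    """Collapse one level of double percent-encoding: %252F becomes %2F."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


print(fix_double_encoding("http://example.com/a%252Fb"))
```

Only one encoding level is removed per call, so an already correct `%2F` passes through unchanged.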
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
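The %252F glitch mentioned above is a double encoding: the percent sign of an already encoded slash (%2F) is itself encoded as %25. A minimal sketch of undoing it in application code, written in Python for illustration only, follows. The function name `fix_double_encoded_slash` is hypothetical, and the change log does not say how the PHP solution in Sphider-plus works.

```python
import re


def fix_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F)."""
    # Only the %252F sequence is touched; other escapes in the URL stay as they are.
    return re.sub(r"%252[Ff]", "%2F", url)


print(fix_double_encoded_slash("http://example.com/find%252Fthis"))
```

Restricting the replacement to this one sequence, rather than decoding the whole URL once, leaves every other percent escape untouched.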
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
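Following a redirection that is invoked by JavaScript, as in the `window.location` directive quoted above, requires extracting the target from the page source and resolving it against the page URL. The Python sketch below shows one possible approach. The function name `find_js_redirect` and the regular expression are assumptions made for illustration, not the Sphider-plus implementation, and only simple quoted assignments are recognised.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches window.location="target" and window.location.href='target'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page source, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as mp.php?mcv=59 are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

Redirections assembled at run time (string concatenation, timers, `location.replace()`) are not covered by this pattern.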
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
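
The anonymization described above amounts to zeroing the host part of the address. The following is a minimal Python sketch of that idea, not the PHP code shipped with Sphider-plus. The function name `anonymize_ip` is hypothetical. The 16-bit IPv4 prefix is inferred from the 114.119.164.255 to 114.119.0.0 example, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Return the address with all bits behind the prefix set to zero.

    v4_prefix=16 matches the change log example; v6_prefix=32 is an
    assumed value chosen only for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input address
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Keeping only the network part preserves coarse statistics (provider or region) while no longer identifying a single user.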
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
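
The %252F form arises when an already encoded slash (%2F) is percent-encoded a second time, because the '%' character itself becomes %25. The Python sketch below only illustrates that effect and one possible repair. It is not the PHP solution mentioned in the change log, and the helper name `undo_double_encoded_slash` is hypothetical.

```python
import re
from urllib.parse import quote

# Encoding a slash once gives %2F; encoding that result again gives %252F.
once = quote("/", safe="")
twice = quote(once, safe="")


def undo_double_encoded_slash(url: str) -> str:
    """Replace each double-encoded slash (%252F) with the single-encoded %2F."""
    return re.sub(r"%252F", "%2F", url, flags=re.IGNORECASE)


print(once, twice)
print(undo_double_encoded_slash("http://example.com/a%252Fb"))
```

Only the slash sequence is rewritten here, so other literal "%25" sequences in a URL stay untouched.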
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
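The anonymization described above (114.119.164.255 stored as 114.119.0.0) can be illustrated with a minimal sketch. It is not the Sphider-plus implementation, which is written in PHP. The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is derived from the example in the change log, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not say how many IPv6 bits are kept.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Keep only the leading prefix bits of an IP address and zero the rest.

    v4_prefix=16 matches the change log example; v6_prefix=32 is an assumption.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False accepts a host address and clears the bits behind the prefix.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Storing only the network part keeps coarse statistics (provider, region) usable while the individual visitor can no longer be identified from the stored value.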
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
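The glitch mentioned above is a double encoding: an already encoded slash (%2F) gets its percent sign encoded again as %25, which yields %252F. The following sketch only illustrates the effect and one possible repair. The function name `fix_double_encoded_slash` is hypothetical, and the actual PHP solution used in Sphider-plus is not shown in the change log.

```python
from urllib.parse import quote


def fix_double_encoded_slash(url: str) -> str:
    """Undo one level of double encoding, but only for the encoded slash."""
    # "%25" is the encoding of "%", so "%252F" is "%2F" encoded a second time.
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


if __name__ == "__main__":
    once = quote("/", safe="")     # slash encoded once
    twice = quote(once, safe="")   # the same string encoded again
    print(once, twice)
    print(fix_double_encoded_slash("http://example.com/a%252Fb"))
```

Replacing only the slash sequence, instead of decoding the whole URL, leaves other legitimately encoded characters untouched.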
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
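How a crawler might pick up a JavaScript redirection such as the window.location directive quoted above can be sketched as follows. This is only an illustration under assumptions, not the Sphider-plus code: the function name `find_js_redirect`, the regular expression and the example base URL are all hypothetical, and the sketch handles only the simple quoted assignment form.

```python
import re
from urllib.parse import urljoin

# Matches window.location="..." and window.location.href='...' (assumed forms).
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve relative targets such as "mp.php?mcv=59" against the page URL.
    return urljoin(base_url, match.group(1))


if __name__ == "__main__":
    page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
    print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

A pattern match like this does not execute the script, so redirects that are computed at run time would not be detected by this approach.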
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
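As an illustration of the JavaScript redirect handling described in the snippet above, the following Python sketch pulls a `window.location` target out of page content and resolves it against the page URL. It is not Sphider-plus code (the project's scripts are PHP); the function name `find_js_redirect`, the regular expression and the example.com URL are assumptions made for this example only.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: matches window.location="..." or window.location.href='...'
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html, base_url):
    """Return the absolute redirect target found in html, or None if there is none."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve a relative target such as "mp.php?mcv=59" against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
target = find_js_redirect(page, "http://example.com/dir/page.html")
```

A crawler would then fetch `target` in place of the original page, in the same way it follows an HTTP 301, 302, 303 or 307 response.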
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separate activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option: Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any user IP is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
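The anonymization in the snippet above (114.119.164.255 becomes 114.119.0.0) amounts to zeroing the host part of the address. A minimal Python sketch of that idea follows; it is not the Sphider-plus implementation, which is PHP. The function name `anonymize_ip` is hypothetical, the /16 prefix for IPv4 is inferred from the example in the change log, and the /32 prefix for IPv6 is purely an assumption because the change log does not state it.

```python
import ipaddress


def anonymize_ip(ip, v4_prefix=16, v6_prefix=32):
    """Zero all bits of ip beyond the given prefix length.

    v4_prefix=16 reproduces the change log example; v6_prefix=32 is an
    assumed value, not taken from Sphider-plus.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and returns the enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


masked_v4 = anonymize_ip("114.119.164.255")
masked_v6 = anonymize_ip("2001:db8:abcd:12::1")
```

Storing only the masked value means the log can still show the rough network origin of a query without identifying an individual user.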
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: September 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
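The %252F glitch mentioned above is a double encoding: the percent sign of an already encoded slash (%2F) is encoded again as %25. The Python sketch below shows one way to undo exactly that case. It is an illustration only and not the PHP fix shipped with Sphider-plus; the function name `fix_double_encoded_slash` and the example.com URL are assumptions.

```python
import re

# Matches a double-encoded slash in either letter case (%252F or %252f).
_DOUBLE_ENCODED_SLASH = re.compile(r"%252F", re.IGNORECASE)


def fix_double_encoded_slash(url):
    """Replace every double-encoded slash (%252F) with a single-encoded one (%2F)."""
    return _DOUBLE_ENCODED_SLASH.sub("%2F", url)


repaired = fix_double_encoded_slash("http://example.com/a%252Fb/c%252fd")
```

Only the slash sequence is rewritten, so other legitimately encoded percent signs (%25) in the URL are left alone.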
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3. New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, each IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
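The anonymization in the example above (114.119.164.255 becoming 114.119.0.0) amounts to keeping only the first 16 bits of an IPv4 address. The following is a minimal Python sketch of that idea, not Sphider-plus's own PHP code. The function name `anonymize_ip` is hypothetical, and the 32-bit prefix used for IPv6 is an assumption, since the change log does not state how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero every bit after the chosen prefix length.

    v4_prefix=16 reproduces the change log example.
    v6_prefix=32 is an assumed value, not taken from Sphider-plus.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

A shorter prefix hides more of the address but also makes per-visitor statistics coarser, so the prefix lengths are kept as parameters here.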
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
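The %252F glitch mentioned above is a double encoding: an already encoded slash (%2F) is encoded again, so its percent sign becomes %25. The change log does not show how the PHP solution works. The Python sketch below only illustrates how the symptom arises and one possible repair; the helper `undo_double_encoded_slash` is a hypothetical name.

```python
from urllib.parse import quote

# Encoding a slash once gives %2F; encoding the result again turns "%" into "%25".
once = quote("a/b", safe="")    # 'a%2Fb'
twice = quote(once, safe="")    # 'a%252Fb'


def undo_double_encoded_slash(url: str) -> str:
    """Collapse a doubly encoded slash back to a singly encoded one.

    Only the slash case from the change log is handled; other doubly
    encoded characters are left untouched.
    """
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


print(once, twice, undo_double_encoded_slash(twice))
```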
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
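Skipping the content of div elements styled with display:none, as described above, can be sketched with Python's standard `html.parser` module. This is an illustration only, not the Sphider-plus implementation; the class `VisibleTextExtractor` and the function `visible_text` are hypothetical names, and only inline style attributes are checked (no stylesheets).

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text, skipping everything inside <div style="display:none">."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden div (counts nested divs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Nested divs inside a hidden div are counted so the matching
        # closing tag can be found again.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = '<p>keep</p><div style="display:none">ignore_this_content</div><div>also keep</div>'
print(visible_text(sample))  # keep also keep
```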
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
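A JavaScript redirect like the window.location directive quoted above can be detected with a regular expression and resolved against the URL of the page that contains it. The Python sketch below is an assumption about how such a check might look, not the Sphider-plus code; `find_js_redirect` is a hypothetical name, and only simple string assignments to `window.location` or `window.location.href` are recognized.

```python
import re
from urllib.parse import urljoin

# Matches window.location = "target" and window.location.href = 'target'
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target, or None if no redirect is found."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the page that contains the script.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

Redirects built from variables or function calls are not covered by a pattern like this; handling them would require executing the script.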
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
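As a rough illustration of the redirect handling described in the snippet above, the following Python sketch extracts the target of a `window.location` assignment from page content and records the HTTP status codes named in the change log. Sphider-plus itself is written in PHP; the function name `js_redirect_target`, the regular expression, and the example base URL are assumptions made for this sketch and are not taken from the project's code.

```python
import re
from urllib.parse import urljoin

# Status codes the change log lists as followed redirections.
REDIRECT_STATUS = {301, 302, 303, 307}

# Hypothetical pattern: matches window.location="..." or window.location.href='...'
JS_REDIRECT = re.compile(
    r"window\.location(?:\.href)?\s*=\s*[\"']([^\"']+)[\"']",
    re.IGNORECASE,
)


def js_redirect_target(html, base_url):
    """Return the absolute URL of a JavaScript redirect found in html, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


def is_followed_redirect(status_code):
    """True if the status code is one of the redirect codes named in the change log."""
    return status_code in REDIRECT_STATUS
```

A crawler built along these lines would call `js_redirect_target` only on delivered page content, and `is_followed_redirect` on the response status before reading the `Location` header.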
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is now also available as an XML file. If requested in the search.php script, the results will be presented as an XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version 4.2023f Release date November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version 4.2023e Release date September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order) - Arabic - Bengali - Chinese - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version 4.2023d Release date August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version 4.2023c Release date June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version 4.2023b Release date February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version 4.2023a Release date December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation . . .
. . .
highlighting of search queries in result listing No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ 'Why to store all user IPs GDPR conform?' Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
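The anonymization described in the snippet above (114.119.164.255 becoming 114.119.0.0) amounts to zeroing the host part of the address. A minimal Python sketch of that idea follows. The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the single example in the change log, and the 32-bit IPv6 prefix is purely an assumption, since the source does not say how many IPv6 bits are kept.

```python
import ipaddress


def anonymize_ip(ip, v4_prefix=16, v6_prefix=32):
    """Zero all bits after the given prefix length and return the address as text.

    v4_prefix=16 reproduces the change-log example; v6_prefix=32 is an
    assumption made for this sketch only.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() mask off the host bits for us.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `anonymize_ip("114.119.164.255")` yields `"114.119.0.0"`, matching the value given in the change log.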
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version 4.2021c Release date October 02, 2021 Improved index procedure Now . . .
. . .
instructions. Top [ Outdated version ] Version 3.2020d Release date Sept. 24, 2020 Build up with Sphider v.1.3.5 New option URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version 3.2020c Release date May 19, 2020 Build up with Sphider v.1.3.5 New option Index and make searchable Open Graph images. Currently parsed are og:title . . .
. . .
results.html Top [ Outdated version ] Version 3.2020b Release date March 10, 2020 Build up with Sphider v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version 3.2020a Release date January 01, 2020 Build up with Sphider v.1.3.5 New option Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in Settings = Search Settings New option For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version 3.2019c Release date August 21, 2019 Build up with Sphider v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version v.3.2019b Release date June 29, 2019 Build up with Sphider v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version v.3.2019a Release date 2019.03.15 Build up with Sphider v.1.3.5 New feature Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version 3.2018b Release date October 08, 2018 Build up with Sphider v.1.3.5 New feature Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version 3.2018a Release date January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version 3.2018a Release date January 25, 2018 Build up with Sphider v.1.3.5 New feature New option in admin settings Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version 3.2016c Release date May 30, 2016 Build up with Sphider v.1.3.5 New feature - Index only e-mail accounts like 'my-name@gmail.com' (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
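Both fixes mentioned in the snippet above can be pictured with a short Python sketch. The change log only states that a PHP solution was implemented, so the approach shown here is an assumption: `undo_double_encoding` collapses a doubly percent-encoded sequence such as %252F back to %2F, and `strip_astral` removes code points above U+FFFF, which need four bytes in UTF-8 and cannot be stored by MySQL versions that lack the utf8mb4 character set (introduced in 5.5.3). Both function names are hypothetical.

```python
import re

# "%25" is the encoding of "%", so "%252F" is "%2F" encoded a second time.
DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")

# Code points outside the Basic Multilingual Plane (4-byte UTF-8 sequences).
ASTRAL = re.compile(r"[\U00010000-\U0010FFFF]")


def undo_double_encoding(url):
    """Replace each %25XX sequence with %XX (one level of decoding only)."""
    return DOUBLE_ENCODED.sub(r"%\1", url)


def strip_astral(text):
    """Remove all characters above U+FFFF, which includes most emoji."""
    return ASTRAL.sub("", text)
```

Note that the first function would also rewrite a URL in which a literal "%25" is legitimately followed by two hex digits, and that some emoji-like symbols (for example U+263A) lie inside the Basic Multilingual Plane and are left untouched by the second.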
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version 3.2016b Release date March 22, 2016 Build up with Sphider v.1.3.5 New feature Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version 3.2016a Release date February 10, 2016 Build up with Sphider v.1.3.5 New feature . . .
. . .
backend. New feature Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version 3.2015e Release date September 24, 2015 Build up with Sphider v.1.3.5 New feature Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version 3.2015d Release date July 06, 2015 Build up with Sphider v.1.3.5 New feature for command line operation Enabled to index with respect to preference level. To be invoked by -preferred <level> Improved admin backend . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version 3.2015c Release date May 29, 2015 Build up with Sphider v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version 3.2015b Release date March 09, 2015 Build up with Sphider v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version 3.2015a Release date January 06, 2015 Build up with Sphider v.1.3.5 New feature Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like <div style="display:none">ignore_this_content</div> New feature In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu Clear all entries in 'Banned' table. Improved option Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version 2.6 Release date March 08, 2011 Build up with Sphider v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature Result output is now also available as an XML file. If requested in the search.php script, the results will be presented as an XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting Separated activation of debug mode for Admin backend and User interface. New Admin setting Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version 2.5 Release date November 30, 2010 Build up with Sphider v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset Shift_JIS, EUC-JP and UTF-8. New feature Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are fopen(); file_get_contents(); md5_file(); 3 new features for command line operation - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version 4.2023f Release date November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version 4.2023e Release date September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order) - Arabic - Bengali - Chinese - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version 4.2023d Release date August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version 4.2023c Release date June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version 4.2023b Release date February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version 4.2023a Release date December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation . . .
. . .
highlighting of search queries in result listing No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
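The anonymization described above (114.119.164.255 stored as 114.119.0.0) amounts to zeroing the host part of the address. A minimal Python sketch of the idea follows; it is not the Sphider-plus implementation. The function name `anonymize_ip` is hypothetical, the /16 IPv4 prefix is inferred from the example in the text, and the /32 IPv6 prefix is purely an assumption, since the change log does not say how IPv6 addresses are truncated.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host bits of an IP address (hypothetical helper).

    v4_prefix=16 matches the change-log example; v6_prefix=32 is an
    assumption, as the source does not state the IPv6 prefix length.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() mask off the host bits for us.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Keeping the prefix lengths as parameters makes it easy to adjust how much of the address is retained.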
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version 4.2021c Release date October 02, 2021 Improved index procedure Now . . .
. . .
instructions. Top [ Outdated version ] Version 3.2020d Release date Sept. 24, 2020 Build up with Sphider v.1.3.5 New option URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version 3.2020c Release date May 19, 2020 Build up with Sphider v.1.3.5 New option Index and make searchable Open Graph images. Currently parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version 3.2020b Release date March 10, 2020 Build up with Sphider v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version 3.2020a Release date January 01, 2020 Build up with Sphider v.1.3.5 New option Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in Settings = Search Settings New option For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version 3.2019c Release date August 21, 2019 Build up with Sphider v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version v.3.2019b Release date June 29, 2019 Build up with Sphider v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version v.3.2019a Release date 2019.03.15 Build up with Sphider v.1.3.5 New feature Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version 3.2018b Release date October 08, 2018 Build up with Sphider v.1.3.5 New feature Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version 3.2018a Release date January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version 3.2018a Release date January 25, 2018 Build up with Sphider v.1.3.5 New feature New option in admin settings Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version 3.2016c Release date May 30, 2016 Build up with Sphider v.1.3.5 New feature - Index only e-mail accounts like 'my-name@gmail.com' (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected an Apache glitch that causes %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
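The %252F glitch mentioned above is a double encoding: the percent sign of an already encoded slash (%2F) is encoded again as %25. The change log says only that a PHP solution was implemented, not how, so the following Python sketch is just an illustration of undoing that one case. The function name `fix_double_encoded_slash` and the example URL are hypothetical.

```python
import re

# Matches a double-encoded slash: "%25" followed by "2F" (any letter case).
_DOUBLE_ENCODED_SLASH = re.compile(r"%25(2f)", re.IGNORECASE)


def fix_double_encoded_slash(url: str) -> str:
    """Collapse %252F back to %2F and leave the rest of the URL alone."""
    return _DOUBLE_ENCODED_SLASH.sub(r"%\1", url)


print(fix_double_encoded_slash("http://example.com/a%252Fb"))
# http://example.com/a%2Fb
```

Only the slash case is handled on purpose, because decoding every %25 sequence could change URLs that legitimately contain an encoded percent sign.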
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version 3.2016b Release date March 22, 2016 Build up with Sphider v.1.3.5 New feature Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version 3.2016a Release date February 10, 2016 Build up with Sphider v.1.3.5 New feature . . .
. . .
backend. New feature Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version 3.2015e Release date September 24, 2015 Build up with Sphider v.1.3.5 New feature Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version 3.2015d Release date July 06, 2015 Build up with Sphider v.1.3.5 New feature for command line operation Enabled to index with respect to preference level. To be invoked by -preferred <level> Improved admin backend . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version 3.2015c Release date May 29, 2015 Build up with Sphider v.1.3.5 In front of version 3.2015b the following modifications have been added New option to define the chronological order of text result listing Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version 3.2015b Release date March 09, 2015 Build up with Sphider v.1.3.5 In front of version 3.2015a the following modifications have been added New feature for index procedure - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version 3.2015a Release date January 06, 2015 Build up with Sphider v.1.3.5 New feature Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like <div style="display:none">ignore_this_content</div> New feature In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature Separated PDF converter supplied for 32 and 64 bit . . .
. . .
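The two redirect mechanisms named above (a JavaScript `window.location` assignment in the page body, and HTTP 301, 302, 303 and 307 status codes) can be recognised roughly as follows. This is a minimal Python sketch, not the Sphider-plus code; the helper names `js_redirect_target` and `is_http_redirect` and the base URL in the example are hypothetical, and the regular expression covers only simple quoted assignments like the one quoted in the change log.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Status codes listed in the change log as followed redirections.
REDIRECT_STATUS_CODES = {301, 302, 303, 307}

# Simple quoted assignments only: window.location="..." or window.location.href='...'
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status: int) -> bool:
    """True if the HTTP status is one of the followed redirect codes."""
    return status in REDIRECT_STATUS_CODES


def js_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute target of a JavaScript redirect, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve relative targets such as "mp.php?mcv=59" against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target(page, "http://example.com/dir/page.html"))
# http://example.com/dir/mp.php?mcv=59
```

A real crawler would also need a limit on the number of redirects it follows, so that redirect loops cannot stall the index procedure.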
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu Clear all entries in 'Banned' table. Improved option Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version 2.6 Release date March 08, 2011 Build up with Sphider v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified New feature Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting Separated activation of debug mode for Admin backend and User interface. New Admin setting Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version 2.5 Release date November 30, 2010 Build up with Sphider v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified New feature Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset Shift_JIS, EUC-JP and UTF-8 New feature Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are fopen(); file_get_contents(); md5_file(); 3 new features for command line operation - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected an Apache glitch that causes %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
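A minimal sketch of how a crawler might detect the kind of JavaScript redirection quoted above. It is written in Python for illustration only and does not reproduce the Sphider-plus implementation; the function name find_js_redirect, the regular expression and the example base URL are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Looks for window.location="..." or window.location.href='...' assignments.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
target = find_js_redirect(page, "http://example.com/dir/page.html")
```

Redirections signalled by HTTP status codes 301, 302, 303 and 307 need no such parsing, because the target is delivered in the Location response header.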
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
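The anonymisation described above (114.119.164.255 stored as 114.119.0.0) can be sketched as masking the host part of the address. The example below is an illustration in Python, not the Sphider-plus code. Keeping 16 bits for IPv4 follows from the example in the text; keeping 32 bits for IPv6 is purely an assumption, as the change log does not state the IPv6 rule. The function name anonymize_ip is hypothetical.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits after the given prefix length and return the result."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False permits host bits in the input; they are masked away.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


stored_v4 = anonymize_ip("114.119.164.255")      # "114.119.0.0"
stored_v6 = anonymize_ip("2001:db8:abcd:12::1")  # "2001:db8::" under the assumed /32
```

A shorter prefix removes more identifying detail, but also makes per-address statistics coarser.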
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
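A minimal Python sketch of a wildcard query with a capped hit count, as described above. It is not taken from Sphider-plus (which is written in PHP): the function name `wildcard_hits`, the `max_hits` parameter and the reading of `*` as "any run of word characters" are assumptions made for illustration.

```python
import re
from itertools import islice
from typing import List

def wildcard_hits(text: str, query: str, max_hits: int) -> List[str]:
    """Return at most max_hits words in text that match a '*' wildcard query."""
    # Escape the query first, then turn the escaped '*' into "any word characters".
    pattern = re.escape(query).replace(r"\*", r"\w*")
    regex = re.compile(rf"\b{pattern}\b", re.IGNORECASE)
    # islice stops after max_hits matches, similar to an admin-defined cap.
    return [m.group(0) for m in islice(regex.finditer(text), max_hits)]

sample = "Indexing indexed pages; the index is indexable."
print(wildcard_hits(sample, "index*", 3))  # ['Indexing', 'indexed', 'index']
```

Capping the matches keeps the result listing short when a keyword occurs in many sections of the full text.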
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
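Sorting a result listing by file suffix could look roughly like the following Python sketch. The helper names `suffix_of` and `sort_by_suffix` and the example URLs are hypothetical; how Sphider-plus implements the option is not shown in this change log.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

def suffix_of(url: str) -> str:
    """Return the lower-cased file suffix of a URL path, without the dot."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower().lstrip(".")

def sort_by_suffix(urls: List[str]) -> List[str]:
    """Sort URLs alphabetically by suffix, then by the URL itself."""
    return sorted(urls, key=lambda u: (suffix_of(u), u))

results = [
    "http://example.com/b.pdf",
    "http://example.com/a.html",
    "http://example.com/c.doc",
]
print(sort_by_suffix(results))
```

URLs without a suffix (for example a directory path) sort first in this sketch, because their suffix is the empty string.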
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
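The anonymization in the example above (114.119.164.255 becomes 114.119.0.0) amounts to zeroing everything after the first 16 bits of an IPv4 address. A minimal Python sketch follows. The function name `anonymize_ip` is hypothetical, and the 32-bit prefix kept for IPv6 is an assumption, since the change log does not say how much of an IPv6 address is retained.

```python
import ipaddress

def anonymize_ip(address: str) -> str:
    """Zero the host-identifying part of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    # IPv4: keep 16 bits, matching the change log example.
    # IPv6: keep 32 bits (an assumption, not taken from Sphider-plus).
    prefix = 16 if ip.version == 4 else 32
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymize_ip("114.119.164.255"))  # 114.119.0.0
```

Using `strict=False` lets `ip_network` accept an address with host bits set and mask them off, which is exactly the truncation wanted here.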
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
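Converting accents and diacritics into their base letters is commonly done with Unicode decomposition. The Python sketch below shows that general technique; it is not the Sphider-plus implementation, and the function name `strip_accents` is hypothetical.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Replace accented letters by their base letters."""
    # NFD splits a letter such as 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("crème brûlée"))  # creme brulee
```

Applying the same folding to both the indexed text and the query string makes a search for 'creme' find 'crème' and vice versa.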
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
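The %252F glitch mentioned above is a case of double percent-encoding: an already encoded '%2F' is encoded again, so its '%' turns into '%25'. The Python sketch below reproduces the effect and shows one possible repair. The helper `undo_double_encoding` is hypothetical and is not the PHP solution used by Sphider-plus.

```python
from urllib.parse import quote

path = "a/b"
once = quote(path, safe="")   # '/' becomes '%2F'
twice = quote(once, safe="")  # '%' becomes '%25', giving '%252F'

def undo_double_encoding(url: str) -> str:
    """Turn a double-encoded slash back into a singly encoded one."""
    return url.replace("%252F", "%2F")

print(once)                         # a%2Fb
print(twice)                        # a%252Fb
print(undo_double_encoding(twice))  # a%2Fb
```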
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
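Astral symbols are code points above U+FFFF, which take four bytes in UTF-8. MySQL can store them only in the utf8mb4 character set, introduced with version 5.5.3, which explains the version requirement above. The Python sketch below detects such characters and strips them for older setups; the function names are hypothetical.

```python
def has_astral(text: str) -> bool:
    """True if text contains a code point outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)

def strip_astral(text: str) -> str:
    """Remove all code points above U+FFFF (e.g. most emoji)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)

smiley = "\U0001F600"                # grinning face emoji
print(len(smiley.encode("utf-8")))   # 4
print(has_astral("日本語"))           # False
print(strip_astral("ok " + smiley))  # 'ok ' (with trailing blank)
```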
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
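The two element-based options (index only the content inside listed elements, or remove the content inside listed elements) can be sketched with Python's standard HTML parser. This is an illustration only: the class `ElementTextFilter`, the function `filter_text` and the `keep_inside` switch are hypothetical names, and the element lists are passed in directly rather than read from elements_use.txt or elements_not.txt.

```python
from html.parser import HTMLParser
from typing import Iterable, List

class ElementTextFilter(HTMLParser):
    """Collect page text inside (or outside) a given set of elements."""

    def __init__(self, elements: Iterable[str], keep_inside: bool = True):
        super().__init__()
        self.elements = {e.lower() for e in elements}
        self.keep_inside = keep_inside
        self.depth = 0              # nesting depth inside listed elements
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.elements:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.elements and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text whose position (inside/outside) matches the chosen mode.
        if (self.depth > 0) == self.keep_inside:
            self.parts.append(data)

def filter_text(html: str, elements: Iterable[str], keep_inside: bool = True) -> str:
    parser = ElementTextFilter(elements, keep_inside)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

page = "<nav>Menu</nav><article>Main text</article><footer>Imprint</footer>"
print(filter_text(page, ["article"]))                           # Main text
print(filter_text(page, ["nav", "footer"], keep_inside=False))  # Main text
```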
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
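A crawler that obeys such JavaScript redirects first has to extract the target from the page source and resolve it against the current URL. The Python sketch below handles the simple `window.location="..."` form shown above. The regular expression, the function names and the example base URL are assumptions, not the Sphider-plus code.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# HTTP status codes the change log lists as followed redirections.
REDIRECT_STATUS = {301, 302, 303, 307}

_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)

def js_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of a window.location redirect, if any."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as 'mp.php?mcv=59' are resolved against the page URL.
    return urljoin(base_url, match.group(1))

def is_http_redirect(status: int) -> bool:
    return status in REDIRECT_STATUS

page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target(page, "http://example.com/dir/index.html"))
# http://example.com/dir/mp.php?mcv=59
```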
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared to Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared to Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search50php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
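How a crawler might detect the JavaScript redirection quoted above can be sketched as follows. This is a minimal Python illustration, not the Sphider-plus implementation; the function name `find_js_redirect`, the regular expression, and the example base URL are all assumptions. It only recognizes a quoted string assigned to `window.location` or `window.location.href`.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches: window.location="target" or window.location.href = 'target'
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve relative targets such as "mp.php?mcv=59" against the page URL.
    return urljoin(base_url, match.group(1))


if __name__ == "__main__":
    page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
    print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

A pattern match like this does not execute any script, so redirects computed at run time would be missed.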
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
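The anonymization described above (114.119.164.255 stored as 114.119.0.0) can be sketched with a prefix mask. This is a minimal Python illustration under stated assumptions, not the Sphider-plus code: the function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the example in the text, and the 32-bit IPv6 prefix is purely an assumption because the text gives no IPv6 example.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all host bits beyond the given prefix length.

    v4_prefix=16 reproduces the example from the text;
    v6_prefix=32 is an assumed value, not taken from the source.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    # and returns the enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:abcd:12::1"))
```

Masking to a network prefix keeps coarse information (for example for statistics) while discarding the bits that identify an individual host.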
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
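The GDPR-conform IP storage mentioned above (114.119.164.255 stored as 114.119.0.0) amounts to keeping only a network prefix and zeroing the rest. A minimal sketch in Python, not the Sphider-plus implementation: the function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the example in the change log, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits of an address beyond the given prefix length.

    v4_prefix=16 matches the change-log example; v6_prefix=32 is an
    assumed value chosen only for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and masks those bits off.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:abcd:12::1"))
```

Masking by prefix rather than by string replacement keeps the same code path valid for both address families.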
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https protocol by HTTP 301 'permanently moved' are followed. Usually performed by a .htaccess directive, now Sphider-plus also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
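The double-encoding glitch mentioned above (an encoded slash %2F turning into %252F because the percent sign itself gets encoded again) can be illustrated with a minimal sketch. This is not the Sphider-plus PHP code; the function name `fix_double_encoded_slash` and the example URL are assumptions made for illustration only.

```python
import re


def fix_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F).

    Hypothetical helper: it only touches the %252F sequence and leaves
    every other part of the URL unchanged.
    """
    return re.sub(r"%252F", "%2F", url, flags=re.IGNORECASE)


print(fix_double_encoded_slash("http://example.com/docs%252Fmanual.pdf"))
```

Restricting the replacement to this one sequence avoids decoding other percent-escapes that may be intentional.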
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
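The idea of removing the content between listed elements can be sketched as follows. This is a minimal illustration in Python, not the Sphider-plus implementation; the class `ElementFilter`, the function `text_without` and the sample markup are hypothetical, and the element names stand in for entries that might be listed in a file such as elements_not.txt.

```python
from html.parser import HTMLParser


class ElementFilter(HTMLParser):
    """Collect page text while skipping everything inside the listed elements."""

    def __init__(self, skip):
        super().__init__()
        self.skip = set(skip)   # element names whose content is dropped
        self.depth = 0          # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def text_without(html, skip):
    """Return the text of `html` without the content of the `skip` elements."""
    parser = ElementFilter(skip)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = "<nav>Menu</nav><article>Body text</article><footer>Imprint</footer>"
print(text_without(page, ["nav", "footer"]))
```

The counterpart option (index only the content between listed elements) would invert the test in `handle_data` and keep text only while the depth counter is above zero.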
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
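How a crawler might recognise such a JavaScript redirection and the listed HTTP status codes can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the Sphider-plus code: the regular expression, the names `JS_REDIRECT`, `REDIRECT_STATUS`, `js_redirect_target` and `is_http_redirect`, and the example base URL are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Status codes named in the change log entry above.
REDIRECT_STATUS = {301, 302, 303, 307}

# Matches window.location="target" or window.location.href='target'.
JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def is_http_redirect(status: int) -> bool:
    """True if the HTTP status code is one of the followed redirections."""
    return status in REDIRECT_STATUS


def js_redirect_target(html: str, base_url: str):
    """Return the absolute redirect target found in `html`, or None."""
    match = JS_REDIRECT.search(html)
    return urljoin(base_url, match.group(1)) if match else None


html = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target(html, "http://example.com/dir/page.html"))
```

Resolving the target against the URL of the page that contained the script keeps relative redirections inside the same directory.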
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
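The anonymisation described above (114.119.164.255 stored as 114.119.0.0) amounts to zeroing the host part of the address. A minimal Python sketch, not the Sphider-plus code: the function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is derived from the example in the text, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state how many IPv6 bits are kept.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero all bits after the prefix so the address no longer identifies one user.

    v4_prefix=16 reproduces the example from the text; v6_prefix=32 is an
    assumed value chosen for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Using the standard `ipaddress` module handles both address families with the same code path, so only the prefix length differs between IPv4 and IPv6.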
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: September 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https protocol by HTTP 301 'permanently moved' are now followed. Usually performed by a .htaccess directive, Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: 3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: 3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015headlinehtml /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php 1f40 /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). 5a2 New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <;div id='abc'>; now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <;meta http-equiv='Content-Type' content='text/html; charset=windows-1256 />; of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CVS files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not jet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously by replacing, for example, 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
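The IP anonymization described above can be sketched as follows. This is a minimal Python illustration and not the Sphider-plus code (which is written in PHP). The function name anonymize_ip is hypothetical, the 16-bit IPv4 prefix is inferred from the 114.119.164.255 example, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state it.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the host part of an IP address before it is stored."""
    ip = ipaddress.ip_address(address)
    # Assumption: keep the first 16 bits for IPv4 (as in the change log
    # example) and the first 32 bits for IPv6 (not stated in the source).
    prefix = 16 if ip.version == 4 else 32
    # strict=False lets ip_network() mask off the host bits for us.
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Masking with a network prefix rather than string splitting keeps the same code path valid for both IPv4 and IPv6 input.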
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: Follow URLs that are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, Sphider-plus now also offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend, the default value for ‘Spider can leave domain during index procedure’ has been changed to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent of singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
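The double-encoding problem mentioned above, where %252F appears in a URL that should contain %2F, can be illustrated with a short sketch. This is Python written for illustration only and is not the PHP solution shipped with Sphider-plus. The function name fix_double_encoded_slash is hypothetical, and the sketch assumes that only the encoded slash needs repair.

```python
import re


def fix_double_encoded_slash(url: str) -> str:
    """Turn a double-encoded slash (%252F) back into a single-encoded one (%2F)."""
    # %25 is the percent sign itself, so %252F is "%2F" encoded a second time.
    # Only this one sequence is touched; other %25 sequences are left alone.
    return re.sub(r"%252F", "%2F", url, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(fix_double_encoded_slash("http://example.com/search/a%252Fb"))
```

Restricting the replacement to the slash avoids unintentionally decoding a literal percent sign that is legitimately part of the URL.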
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025search-formhtml_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025search-formhtml_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
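Detecting a JavaScript redirect of the kind quoted above can be sketched with a regular expression. This is a simplified Python illustration and not the Sphider-plus implementation. The function name find_js_redirect is hypothetical, and the pattern assumes a plain quoted assignment to window.location or window.location.href.

```python
import re
from urllib.parse import urljoin

# Matches window.location="target" or window.location.href='target'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str):
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(2))


if __name__ == "__main__":
    page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
    print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

A crawler following such a target would normally also cap the number of consecutive redirects to avoid loops. That limit is left out of this sketch.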
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
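The %252F glitch mentioned above is a double encoding: the percent sign of %2F has itself been encoded as %25. A minimal sketch of how such a correction could look is given here in Python for illustration only; Sphider-plus implements its fix in PHP, and the function name `fix_double_encoded_slash` is an assumption, not part of the Sphider-plus code.

```python
import re

def fix_double_encoded_slash(url: str) -> str:
    """Replace the double-encoded slash %252F with the single-encoded %2F.

    Hypothetical helper for illustration; only the slash is touched,
    other double-encoded characters are left as they are.
    """
    return re.sub(r"%252F", "%2F", url, flags=re.IGNORECASE)

print(fix_double_encoded_slash("http://example.com/dir%252Ffile.html"))
```

URLs that already contain a plain %2F pass through unchanged, so the function can be applied to every link without checking first.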
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025search-formhtml-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025search-formhtml-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
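The two kinds of redirection described above (HTTP status codes 301, 302, 303 and 307, and a JavaScript window.location assignment in the page body) can be sketched as follows. This is an illustrative Python sketch, not the Sphider-plus implementation; the function name `redirect_target`, the plain `headers` dictionary and the regular expression are assumptions made for the example.

```python
import re
from typing import Optional

# Status codes named in the change log entry above.
REDIRECT_CODES = {301, 302, 303, 307}

# Matches window.location="target" or window.location.href='target'.
JS_REDIRECT = re.compile(
    r'window\.location(?:\.href)?\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)

def redirect_target(status: int, headers: dict, body: str) -> Optional[str]:
    """Return the URL a response redirects to, or None if it does not redirect.

    Hypothetical helper: 'headers' is assumed to be a dict whose keys use
    the exact spelling 'Location'.
    """
    if status in REDIRECT_CODES:
        return headers.get("Location")
    match = JS_REDIRECT.search(body)
    return match.group(1) if match else None

page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(redirect_target(200, {}, page))
```

The returned target may be relative, as in the example directive, so a crawler would still have to resolve it against the URL of the page it was found on.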
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
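Sorting a result listing by file or page suffix, as described above, could look roughly like the following. This is a Python sketch for illustration only; the function names `suffix_of` and `sort_by_suffix`, and the choice to break ties by URL, are assumptions rather than the behaviour of Sphider-plus.

```python
import os
from urllib.parse import urlparse

def suffix_of(url: str) -> str:
    """Return the lower-cased file suffix of a URL path, without the dot."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower().lstrip(".")

def sort_by_suffix(urls):
    """Sort URLs alphabetically by suffix; pages without a suffix come first."""
    return sorted(urls, key=lambda u: (suffix_of(u), u))

results = [
    "http://example.com/manual.pdf",
    "http://example.com/index.html",
    "http://example.com/about",
]
for url in sort_by_suffix(results):
    print(suffix_of(url) or "(none)", url)
```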
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any user IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
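The anonymization in the example above (114.119.164.255 becomes 114.119.0.0) amounts to zeroing the last 16 bits of an IPv4 address. A minimal Python sketch of that idea follows; it is not the Sphider-plus code, the function name `anonymize_ip` is hypothetical, and the number of bits kept for IPv6 (32 here) is purely an assumption, since the change log does not state it.

```python
import ipaddress

def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host bits of an address so only the network part is stored.

    v4_prefix=16 reproduces the IPv4 example from the change log;
    v6_prefix=32 is an assumed value chosen for illustration.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False accepts an address that still has its host bits set.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)

print(anonymize_ip("114.119.164.255"))               # 114.119.0.0
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8::
```

Using the standard `ipaddress` module means the same function handles both address families, and malformed input raises a `ValueError` instead of being stored.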
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings. New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025search-formhtml /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025search-formhtml /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. p Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025search-formhtml /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</ 5dc0 New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php 1f40 /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). 5a2 New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <;div id='abc'>; now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <;meta http-equiv='Content-Type' content='text/html; charset=windows-1256 />; of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

[ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
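The anonymization described in the entry above can be illustrated with a minimal sketch. It is written in Python rather than in the PHP of Sphider-plus and is not the project's own code. The function name `anonymize_ip` and both prefix lengths are assumptions: keeping 16 bits reproduces the IPv4 example 114.119.164.255 to 114.119.0.0 given in the change log, while the 32 bits kept for IPv6 are purely illustrative, since the change log does not say how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the trailing bits of an address, keeping only the leading prefix.

    Hypothetical helper; the prefix lengths are assumptions, not values
    taken from Sphider-plus.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False accepts an address whose host bits are set and masks them
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Masking through a network prefix keeps the same code path for IPv4 and IPv6, so only the number of retained bits differs.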
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
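The %252F glitch mentioned above is a double encoding: the "%" of an already encoded slash (%2F) is encoded once more as %25. The change log does not show the PHP solution that was implemented, so the following is only a sketch of the general idea, written in Python; the function name `undo_double_encoded_slash` and the example URL are hypothetical.

```python
import re


def undo_double_encoded_slash(url: str) -> str:
    """Collapse a double-encoded slash (%252F) back to a single-encoded one (%2F).

    Hypothetical helper illustrating the idea; not the Sphider-plus code.
    """
    # %25 is the encoding of "%", so %252F is "%2F" encoded a second time
    return re.sub(r"%252[Ff]", "%2F", url)


print(undo_double_encoded_slash("http://example.com/search?path=a%252Fb"))
```

Only the slash sequence is rewritten here; other occurrences of %25 are left alone, because they may represent a literal percent sign.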
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .

. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
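The %252F glitch mentioned above is a double-encoding effect: an already encoded slash (%2F) is encoded a second time, so its percent sign becomes %25. The following is only a minimal sketch in Python, not the PHP solution shipped with Sphider-plus. The function name fix_double_encoded_slash is hypothetical and chosen for illustration.

```python
from urllib.parse import quote


def fix_double_encoded_slash(url: str) -> str:
    """Undo one extra level of percent-encoding, for slashes only.

    Hypothetical helper: it rewrites %252F back to %2F and leaves
    every other escape sequence untouched.
    """
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


# How the glitch arises: encoding an already encoded path a second time.
once = quote("docs/file", safe="")   # the slash becomes %2F
twice = quote(once, safe="")         # the percent sign becomes %25

# Repairing the doubly encoded form restores the singly encoded one.
repaired = fix_double_encoded_slash(twice)
```

Limiting the repair to the slash keeps deliberately escaped percent signs elsewhere in a URL intact.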
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050result-headerhtmlbr-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
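To illustrate the JavaScript redirection handling described above, here is a minimal Python sketch that extracts the target of a window.location assignment from a page body and resolves it against the page URL. It is not the Sphider-plus implementation (which is written in PHP). The regular expression, the function name find_js_redirect and the example.com URL are assumptions made for illustration.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Assumed pattern: matches window.location="..." and window.location.href='...'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(body: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in body, or None."""
    match = _JS_REDIRECT.search(body)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(1))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
target = find_js_redirect(page, "http://example.com/dir/page.html")
```

A crawler would then fetch the returned URL in the same way it follows an HTTP 301, 302, 303 or 307 Location header.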
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050result-headerhtmlbr-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050result-headerhtmlbr-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
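Sorting a result listing by file suffix, as introduced above, can be sketched in a few lines of Python. This is an illustration only, not the code of Sphider-plus. The helper name suffix_of and the sample URLs are hypothetical, and placing pages without a suffix first is an assumption of this sketch.

```python
import posixpath
from urllib.parse import urlparse


def suffix_of(url: str) -> str:
    """Return the lower-cased file suffix of a URL path, without the dot."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower().lstrip(".")


# Hypothetical result URLs, used only to demonstrate the ordering.
results = [
    "http://example.com/b.pdf",
    "http://example.com/a.html",
    "http://example.com/c.doc",
    "http://example.com/",
]

# Sort alphabetically by suffix; ties are broken by the URL itself.
by_suffix = sorted(results, key=lambda url: (suffix_of(url), url))
```

Because an empty suffix compares lowest, pages without a file suffix land at the top of the listing in this sketch.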
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any user IP is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
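The anonymization shown in the example above (114.119.164.255 stored as 114.119.0.0) amounts to zeroing the host part of the address. Here is a minimal Python sketch of that idea, not the PHP code of Sphider-plus. The function name anonymize_ip is hypothetical. Keeping 16 bits for IPv4 follows the example in the text, while keeping 32 bits for IPv6 is purely an assumption, because the change log does not state the IPv6 rule.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host bits of an address so only the network prefix is kept.

    v4_prefix=16 reproduces the example from the change log;
    v6_prefix=32 is an assumed value, not taken from Sphider-plus.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


anonymous_v4 = anonymize_ip("114.119.164.255")
anonymous_v6 = anonymize_ip("2001:db8:85a3::8a2e:370:7334")
```

Storing only the network prefix keeps coarse statistics possible while removing the part of the address that identifies an individual host.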
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050result-headerhtmlbr . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP 301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050result-headerhtmlbr /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050result-headerhtmlbr . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050result-headerhtmlbr /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <;meta http-equiv='Content-Type' content='text/html; charset=windows-1256 />; of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050result-headerhtmlbr /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CVS files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not jet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
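The anonymization described in this entry can be illustrated with a minimal Python sketch. It is not taken from the Sphider-plus PHP sources. The function name `anonymize_ip` is hypothetical, the 16-bit IPv4 prefix is inferred from the example 114.119.164.255 becoming 114.119.0.0, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not state how many IPv6 bits are kept.

```python
import ipaddress


def anonymize_ip(ip: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero out the host part of an address so only the network prefix remains.

    v4_prefix=16 mirrors the change log example; v6_prefix=32 is an assumption.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    # and mask them away instead of raising ValueError.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))
print(anonymize_ip("2001:db8:85a3::8a2e:370:7334"))
```

Masking to a network prefix keeps coarse geographic or provider information usable for statistics while dropping the part of the address that identifies a single host.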
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend, the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
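The two fixes in this entry can be sketched in a few lines of Python. This is an illustration only, not the PHP code shipped with Sphider-plus. Both function names are hypothetical. Stripping every character above U+FFFF is an approximation of "all emoji": it covers the 4-byte UTF-8 characters that MySQL's legacy `utf8` charset (before `utf8mb4` arrived in 5.5.3) cannot store, but it does not remove the few emoji-like symbols inside the Basic Multilingual Plane.

```python
import re

# Astral code points (U+10000 and above) need 4 bytes in UTF-8.
_ASTRAL = re.compile(r"[\U00010000-\U0010FFFF]")


def strip_astral(text: str) -> str:
    """Remove characters that do not fit into a 3-byte UTF-8 column."""
    return _ASTRAL.sub("", text)


def fix_double_encoded_slash(url: str) -> str:
    """Undo the double encoding of '/' (%2F encoded again as %252F)."""
    return url.replace("%252F", "%2F").replace("%252f", "%2f")


print(strip_astral("search \U0001F600 engine"))
print(fix_double_encoded_slash("http://example.com/a%252Fb"))
```

Only the slash sequence is rewritten here, so other percent-encoded data in the URL stays untouched.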
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
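How a crawler might pick up such a JavaScript redirect can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Sphider-plus implementation: the function name `find_js_redirect`, the regular expression, and the example base URL are hypothetical, and only simple `window.location="..."` assignments with a quoted literal are recognized. The status codes in `REDIRECT_STATUS_CODES` are the ones listed in the entry above.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# HTTP status codes the change log names as followed redirections.
REDIRECT_STATUS_CODES = {301, 302, 303, 307}

# Matches window.location="target" or window.location.href='target'.
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1""",
    re.IGNORECASE,
)


def find_js_redirect(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect target found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against the page URL.
    return urljoin(base_url, match.group(2))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(find_js_redirect(page, "http://example.com/dir/page.html"))
```

A regular expression is enough for this literal form; redirects built from variables or string concatenation would need a real JavaScript interpreter and are outside the scope of this sketch.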
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25search-formphp_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any IP address is stored anonymized, replacing for example 114.119.164.255 with 114.119.0.0. Implemented for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
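The IP anonymization mentioned in the snippet above (114.119.164.255 stored as 114.119.0.0) amounts to zeroing the host part of the address. The following is only an illustrative sketch in Python, not the Sphider-plus PHP code: the function name anonymize_ip is hypothetical, the /16 prefix for IPv4 is inferred from the example in the change log, and the /32 prefix for IPv6 is purely an assumption, since the change log does not state which part of an IPv6 address is kept.

```python
import ipaddress


def anonymize_ip(addr: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Return addr with all bits after the chosen prefix set to zero.

    v4_prefix=16 reproduces the change-log example; v6_prefix=32 is an
    assumed value, not taken from Sphider-plus.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and masks those bits away.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("114.119.164.255"))
    print(anonymize_ip("2001:db8:abcd:12::1"))
```

A shorter prefix stores less of the original address and so anonymizes more strongly, at the cost of coarser geo and statistics information.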
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25search-formphp-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <!--sphider_noindex--> directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is now also available as an XML file. If requested in the search.php script, the results will be presented as an XML file in /xml/. For details see the . . .
. . .
be influenced. If activated, header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the preferred charset. New Admin setting: Separate activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which have not yet been indexed ( -new ) - Re-index all sites erased in the meantime ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to work with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing grave and circumflex accents. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, every IP address is stored anonymously, for example by replacing 114.119.164.255 with 114.119.0.0. Implemented for IPv4 and IPv6. For details see the Sphider-plus FAQ: Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
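The anonymization in the example above (114.119.164.255 becoming 114.119.0.0) amounts to keeping only the leading 16 bits of an IPv4 address. The following is a minimal Python sketch of that idea, not the Sphider-plus code itself. The function name `anonymize_ip` is hypothetical, and the 32-bit prefix kept for IPv6 is purely an assumption, since the change log does not say how IPv6 addresses are shortened.

```python
import ipaddress


def anonymize_ip(ip, v4_prefix=16, v6_prefix=32):
    """Zero all bits of the address behind the given prefix length.

    v4_prefix=16 reproduces the documented example;
    v6_prefix=32 is an assumed value for illustration only.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False allows host bits to be set; they are masked away.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


anonymized_v4 = anonymize_ip("114.119.164.255")
anonymized_v6 = anonymize_ip("2001:db8:85a3::8a2e:370:7334")
```

Because only the network part survives, the stored value can no longer be mapped back to a single user, while coarse statistics (for example by provider or region) may remain possible.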
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https by HTTP 301 'Moved Permanently' are now followed. This is usually done by a .htaccess directive; Sphider-plus now also handles it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been changed to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: March 15, 2019 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please see chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25search-formphp Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
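The glitch mentioned above is a case of double percent-encoding: the "%" of an already encoded "%2F" gets encoded again as "%25", which yields "%252F". A minimal Python sketch of how such sequences might be undone at application level follows. It is an illustration only and not the PHP solution shipped with Sphider-plus; the names `_DOUBLE_ENCODED` and `undo_double_encoding` are hypothetical.

```python
import re

# A percent sign that was itself percent-encoded ("%25"),
# followed by two hex digits, e.g. "%252F".
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def undo_double_encoding(url):
    """Turn sequences such as %252F back into %2F and leave the rest untouched."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


fixed = undo_double_encoding("http://example.com/search?path=a%252Fb")
```

The substitution is applied once only. A URL that legitimately contains an encoded percent sign followed by two hex digits would also be altered, so a fix of this kind is best restricted to links known to have passed through the affected rewrite step.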
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015b, the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared with version 3.2015a, the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS. Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
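Skipping the content of div elements styled with display:none, as in the example above, could be sketched as follows. This is a hedged Python illustration built on the standard library HTML parser, not the method used by Sphider-plus. The class `VisibleTextExtractor` and the function `visible_text` are hypothetical names, and only inline style attributes on div elements are considered.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text, skipping everything inside <div style="display:none">."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden div (counts nested divs)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Nested divs inside a hidden div are counted so the matching
        # closing tags can be paired up correctly.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single blanks.
    return " ".join(" ".join(parser.parts).split())


text = visible_text(
    '<p>keep</p><div style="display:none">ignore_this_content</div><div>also keep</div>'
)
```

Content hidden via external stylesheets or CSS classes is not detected by this sketch; that would require evaluating the page's CSS.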
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs,php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v135 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v135 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
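The emoji removal mentioned in the snippet above relates to MySQL versions older than 5.5.3, which lack the `utf8mb4` character set and so cannot store code points above U+FFFF. A minimal Python sketch of one way to filter such characters follows. It is an assumption that dropping every astral code point is an acceptable stand-in for "removing all emoji characters"; the actual Sphider-plus filter is PHP and may be narrower or broader. The name `strip_astral` is hypothetical.

```python
def strip_astral(text: str) -> str:
    """Drop every character outside the Basic Multilingual Plane.

    Assumption for this sketch: any code point above U+FFFF (which needs
    4 bytes in UTF-8) is removed, emoji included.
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


print(strip_astral("search \U0001F600 engine"))
```

Note that this also removes non-emoji astral characters (for example rare CJK ideographs), which is the price of the simple range check.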
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
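The JavaScript redirection mentioned in the snippet above (a `window.location="mp.php?mcv=59";` assignment inside a script element) can be followed by extracting the target and resolving it against the URL of the current page. The following Python sketch illustrates that step only. The function name, the regular expression and the example base URL are assumptions for illustration and do not reproduce the Sphider-plus PHP implementation.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches window.location="..." or window.location.href='...'
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*(["'])(.+?)\1""",
    re.IGNORECASE,
)


def js_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute redirect URL found in the page, or None."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets such as "mp.php?mcv=59" are resolved against base_url
    return urljoin(base_url, match.group(2))


page = '<SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT>'
print(js_redirect_target(page, "http://example.com/dir/index.html"))
```

A regular expression only catches literal string assignments; redirect targets that are computed at run time would need a JavaScript engine.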
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared to Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared to Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x. Error corrected in de-language file. Thanks to Carl D. Erling. Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Change Log [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs that are redirected from http to https protocol by HTTP 301 'permanently moved' are now followed. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page. To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For newly added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO. Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for the user's IP address. Improved code for responsive design feature. Improved user input protection against SQL injections. Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) for Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, including formatting of the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a % 252F instead of % 2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
. . .
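The double-encoding glitch mentioned in the snippet above (a %252F appearing where %2F was intended) arises when the percent sign of an already encoded sequence is itself encoded as %25. A minimal Python sketch of undoing one such layer is shown here. It is an illustration under assumptions, not the PHP solution the change log refers to, and `undo_double_encoding` is a hypothetical name.

```python
import re

# "%25" followed by two hex digits is a percent-encoded percent-escape
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def undo_double_encoding(url: str) -> str:
    """Collapse one layer of double percent-encoding, e.g. %252F -> %2F."""
    return _DOUBLE_ENCODED.sub(r"%\1", url)


print(undo_double_encoding("http://example.com/a%252Fb"))
```

One caveat of this approach: a URL that legitimately contains the literal text "%2F" (correctly encoded as %252F) would also be rewritten, so it only suits cases where double encoding is known to be an error.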
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 Compared to version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing % 20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .
. . .
all of them are presented in 'Sites' view for the according URL. Length of 'Name of promoted domain' enlarged to 255 characters. Length of 'Promoted catchword in text' enlarged to 255 characters. Modified title extraction for PDF, DOC, RTF and XLS files. In result listing, no longer presenting the file suffix as part of the title. Bug fixed in . . .
. . .
of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div> New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery. For details please . . .
. . .
a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now preventing double indexing. Bug fixed in 'Strip session ids'. Bug fixed in Korean word segmentation. Some small bugs killed. Involved files that have been modified . . .
. . .
of a page, defined by <element > . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc If enabled in Admin settings, the values as defined in the list-file /include/common/elements_use.txt will be used to index only the page content between . . .
. . .
of a page, defined by <element> . . . </element> This feature is foreseen to cooperate with the new HTML5 elements like section, nav, aside, hgroup, article, header, footer, etc. If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to remove the content between . . . . . .
. . .
top level domains and some important country codes (supporting 30 suffixes), or an extended list (supporting 155 suffixes) are selectable. New option to be activated in Admin backend: Crawler can leave domain during index procedure, but only for canonical links. Only the canonical link will be indexed, but links found there will be ignored. . . .
. . .
redirections, which are invoked by JavaScript, when sent as HTTP content. Will obey directives like: <SCRIPT language="javascript">window.location="mp.php?mcv=59"; </SCRIPT> New feature: Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. New feature: Separated PDF converter supplied for 32 and 64 bit . . .
. . .
/include/idna_converter.php /include/media_counter.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/suggest.php /include/common/docs.txt /languages/ all files /templates/html/020_search-form.html /templates/html/090_footer.html . . .
. . .
Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New . . .
. . .
in the advanced option of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl' table. New option in Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now is . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
. . .
files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in search.php script, the results will be presented as XML file in /xml/ For details see the . . .
. . .
be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred charset. New Admin setting: Separated activation of debug mode for Admin backend and User interface. New Admin setting: Do not index the full . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
. . .
all tables' for all databases in 'Database Management / Configure' menu. Top [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.4 the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships, . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
. . .
indexing New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8 New feature: Index CSV files. To be activated in Admin . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Former versions ] [ Former version ] Version: 4.2023f Release date: November 21, 2023 - Improved exception handling for applications on 'Shared Hosting' servers. - Updated file list for IPs to be ignored during search procedure. - Bug fixed in Punycode conversion. - Bug fixed . . .
. . .
search. - Some small bugs fixed. - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/messages.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e . . .
. . .
/include/searchfuncs.php /include/common/black_ips_priv.txt Top [ Former version ] Version: 4.2023e Release date: September 23, 2023 - New converter to index PDF documents. Besides the known world languages, this new converter is proven for (as examples in alphabetical order): - Arabic - Bengali - Chinese: - Chinese (traditional) - Mandarin . . .
. . .
- Mandarin (simplified Chinese) - Cyrillic - Ethiopic (Abyssinica) - Greek - Hebrew - Hindi - Japanese: - JS Hiragana - JS Katakana - JS Kanji - Korean - Syriac/Arabic - Tai - Turkish - Urdu - Improved search algorithm for queries with wildcards, together with optimized highlighting in result listing. - New option in Settings of . . .
. . .
with wildcards, together with optimized highlighting in result listing. - New option in Settings of admin backend: Define maximum count of result hits for queries with wildcards, displayed in results. (if multiple occurrence of keyword is available in different sections of full text) - Some small bugs fixed - Involved folders and files that . . .
. . .
of full text) - Some small bugs fixed - Involved folders and files that have been modified / added for this release: /admin/admin.php /admin/configset.php /admin/sphider.php /admin/spiderfuncs.php /converter/ pdf / . . . as new subfolder together with all its subfolders and scripts /include/searchfuncs.php . . .
. . .
/include/stemming/fr_stem.php Top [ Former version ] Version: 4.2023d Release date: August 05, 2023 - Improved search algorithm for query strings containing accents grave and accents circumflex. - Improved highlighting of query string in result listing. - Bug fixed in 'Search with wildcard'. . . .
. . .
in arrays. - Some more small bugs fixed Involved folders and files that have been modified / added for this release: /admin/auth.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/searchfuncs.php /include/search_40.php /include/suggest.php Top [ Former version ] Version: 4.2023c Release date: June 01, 2023 New . . .
. . .
all indexed thumbnail files. New option Delete all queried thumbnail files. Bug fixed in indexation of last word in: - full text - meta tag 'title' - meta tag 'description' Bug fixed in highlighting of query string in text results. Bug fixed in 'Prevent search form from being flooded by too many queries per unit of time'. Bug fixed in statistics . . .
. . .
release: /admin/admin.php /admin/admin_header.php /admin/configset.php /admin/spiderfuncs.php /include/commonfuncs.php /include/search_10.php /include/search_40.php /languages/sr-language.php Top [ Former version ] Version: 4.2023b Release date: February 21, 2023 Additional language file added for Greek dialog language. With special thanks to . . .
. . .
alphabetically by suffixes of all indexed pages. New option: Sort result listing by file/page suffixes. For details, please have a look at chapter 7.1 : Sorting text results = Sort by file suffix Scripts prepared to work in PHP 8.2.3 environment. Bug fixed in Settings option: Do not index UNICODE symbols and Emoji characters. Involved folders and . . .
. . .
/languages/el_language.php Top [ Former version ] Version: 4.2023a Release date: December 21, 2022 Improved conversion of PDF documents. Now suppressing more invalid and unreadable characters. Improved conversion of DOC documents. Now suppressing more invalid and unreadable characters. Improved database table installation: . . .
. . .
highlighting of search queries in result listing: No longer highlighting complete text behind a search query, which (up to now) sometimes happened. Improved presentation of search form: Now correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated . . .
. . .
is offered: Store all user IPs GDPR conform. If activated, any URL is stored anonymously by replacing for example 114.119.164.255 to 114.119.0.0 Realized for IPv4 and IPv6. For details see the Sphider-plus FAQ : Why to store all user IPs GDPR conform? Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs . . .
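The IP anonymization described above (114.119.164.255 stored as 114.119.0.0) can be illustrated with a minimal Python sketch. This is not the Sphider-plus implementation, which is written in PHP. The function name anonymize_ip is hypothetical, the 16-bit IPv4 prefix is inferred from the example in the text, and the 32-bit IPv6 prefix is purely an assumption, since the change log does not say how much of an IPv6 address is kept.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 16, v6_prefix: int = 32) -> str:
    """Zero the host part of an IP address before it is stored.

    v4_prefix=16 matches the example in the change log;
    v6_prefix=32 is an assumed value, not taken from Sphider-plus.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("114.119.164.255"))
print(anonymize_ip("2001:db8:abcd:12::1"))
```

Keeping only the network part means log entries can still be grouped by provider or region, while a single visitor can no longer be identified from the stored value.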
. . .
/include/commonfuncs.php /include/commons.php /include/search_10.php /include/search_40.php /include/search_50.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/050_result-header.html . . .
. . .
results.html /templates/html/130_image-results header.html /templates/html/140_image-results.html /templates/html/150_end image-results.html /templates/html/160_stream-results header.html /templates/html/170_stream-results.html Top [ Former version ] Version: 4.2021c Release date: October 02, 2021 Improved index procedure: Now . . .
. . .
instructions. Top [ Outdated version ] Version: 3.2020d Release date: Sept. 24, 2020 Build up with Sphider: v.1.3.5 New option: URLs are followed, which are redirected from http to https protocol by HTTP301 'permanently moved'. Usually performed by a .htaccess directive, now also Sphider-plus offers it independently. During index procedure . . .
. . .
/include/common/black_ips.txt /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html Top [ Outdated version ] Version: 3.2020c Release date: May 19, 2020 Build up with Sphider: v.1.3.5 New option: Index and make searchable Open Graph images. Currently are parsed: og:title . . .
. . .
results.html Top [ Outdated version ] Version: 3.2020b Release date: March 10, 2020 Build up with Sphider: v.1.3.5 Bug fixed in option 'Convert all kind of accents and diacritics into their basic vowels.' Bug fixed in option 'Index media.' Bug fixed in option 'Use word stemming.' Bug fixed in 'Tolerant search.' Some small bugs fixed. . . .
. . .
files Top [ Outdated version ] Version: 3.2020a Release date: January 01, 2020 Build up with Sphider: v.1.3.5 New option: Continuous amount of search results presented per page. Range selectable between 1 and 100 results per page To be defined in: Settings = Search Settings New option: For single results, don't present result . . .
. . .
be presented individually for each search result. For details about the new web service, please notice chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared . . .
. . .
Top [ Outdated version ] Version: 3.2019c Release date: August 21, 2019 Build up with Sphider: v.1.3.5 For new added sites in admin backend the default value for ‘Spider can leave domain during index procedure’ has been altered to NO Bug fixed in database configuration for support of multiple databases. Bug fixed in result . . .
. . .
Top [ Outdated version ] Version: v.3.2019b Release date: June 29, 2019 Build up with Sphider: v.1.3.5 Improved domain WHOIS algorithm. Now detecting 238 TLDs. Improved IP detection and geo info for users IP address. Improved code for responsive design feature. Improved user input protection against SQL injections Bug fixed in . . .
. . .
/templates/html/0101_html_header.html Top [ Outdated version ] Version: v.3.2019a Release date: 2019.03.15 Build up with Sphider: v.1.3.5 New feature: Present all results (for singular and plural) at Russian nouns. This will deliver all search results for e.g. автокреслО and/or автокреслA. Independent from singular or plural . . .
. . .
Top [ Outdated version ] Version: 3.2018b Release date: October 08, 2018 Build up with Sphider: v.1.3.5 New feature: Support of XML product feeds. Index and search of feed content, inclusive formatting the search results. For details please notice chapter 17.1 of the readme.pdf docu (Chapter 14.1 of this online docu). New . . .
. . .
If activated, only the content of this special sitemap will guide the index procedure. For details, see chapter 5.9 'Use private sitemap' of the readme.pdf docu. New option in admin settings: For new URLs verify not only host part, but also path and argument of the URL to be new for database. New option in admin settings: Protect admin backend . . .
. . .
/languages/all files /templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/050_result-header.html /templates/html/090 footer.html /templates/html/091 footer.html /templates/120_media-only results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with . . .
. . .
results.html Top [ Outdated version ] Version: 3.2018a Release date: January 25, 2018 Build up with Sphider: v.1.3.5 New feature: New option in admin settings: Create a log file containing all attempts to harm the user interface of Sphider-plus. Additional option: On occurrence, send e-mail report to Sphider-plus admin about each harm . . .
. . .
option: On occurrence, send e-mail report to Sphider-plus admin about each harm attempt. For details, see chapter 22.5 of the readme.pdf docu. Improved search result listing for phpBB forum. Improved option 'Follow sitemap.xml files during index procedure'. Updated URL for web shot thumbnail creation in result listing. Updated 'black_ips' file . . .
. . .
/include/searchfuncs.php /include/xml.php /include/common/black_ips_priv.txt /templates/html/20_search-form.php /templates/html/25_search-form.php Top [ Outdated version ] Version: 3.2016c Release date: May 30, 2016 Build up with Sphider: v.1.3.5 New feature: - Index only e-mail accounts like 'my-name@gmail.com' : (Will extract all e-mail . . .
. . .
Now removing all emoji characters (smileys) from full text, so that systems still using MySQL versions older than 5.5.3 will be able to highlight search results correctly. Corrected Apache glitch which causes a %252F instead of %2F in URLs. Instead of using the Apache rewrite module and NE flag, a PHP solution was implemented. So, those links . . .
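The %252F glitch mentioned above is a case of double percent-encoding: the percent sign of an already encoded %2F is encoded again as %25. A minimal Python sketch of one way to undo it follows. The function name undo_double_encoding is hypothetical, and the approach is an assumption for illustration only; the change log states that Sphider-plus uses a PHP solution but does not describe it.

```python
import re

# "%25" followed by two hex digits is an escape whose percent sign
# has itself been percent-encoded, e.g. "%252F" for "%2F".
DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def undo_double_encoding(url: str) -> str:
    """Collapse one level of double percent-encoding in a URL."""
    return DOUBLE_ENCODED.sub(r"%\1", url)


print(undo_double_encoding("http://example.com/a%252Fb"))
```

The sketch cannot tell a glitch from a URL that intentionally contains a literal percent sign followed by two hex digits, so it should only be applied where double encoding is known to occur.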
. . .
/include/suggest.php /include/common/black_ips_priv.txt /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/080_most_pop.html Top [ Outdated version ] Version: 3.2016b Release date: March 22, 2016 Build up with Sphider: v.1.3.5 New feature: Besides XML result output file, now also a JSON . . .
. . .
/templates/html/010_html_header.html /templates/html/011_html_header.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/070_more-results.html /templates/html/200_no media-found.html Top [ Outdated version ] Version: 3.2016a Release date: February 10, 2016 Build up with Sphider: v.1.3.5 New feature: . . .
. . .
backend. New feature: Database support for full UNICODE, including astral symbols. Requires MySQL server version 5.5.3 New feature: Compressed transfer on the Internet enabled for page content and PHP scripts. Depending on server environment this feature may not work on all servers. Improved MySQL database support: - Now creating tables in . . .
. . .
in admin 'Settings' menu, and also in result listing. Wrapper added to bypass the PHP bug (error known since PHP v.5.3) gzopen() = gzopen64() and all other gz functions. Bug fixed to store the admin and dispatcher e-mail account in admin backend. Bug fixed in <! sphider_noindex > directive. Bug fixed for search terms with a length < . . .
. . .
connector had been modified for this version, a fresh installation is required. Top [ Outdated version ] Version: 3.2015e Release date: September 24, 2015 Build up with Sphider: v.1.3.5 New feature: Block all queries for e-mail accounts like 'my-name@gmail.com' To be activated in admin backend. New feature in admin backend: Create a default . . .
. . .
/templates/Slade/adminstyle.css /templates/Sphider-plus/adminstyle.css Top [ Outdated version ] Version: 3.2015d Release date: July 06, 2015 Build up with Sphider: v.1.3.5 New feature for command line operation: Enabled to index with respect to preference level. To be invoked by: -preferred <level> Improved admin backend: . . .
. . .
/templates/Pure/adminstyle.css /templates/Pure/userstyle.css Top [ Outdated version ] Version: 3.2015c Release date: May 29, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015b the following modifications have been added: New option to define the chronological order of text result listing: Single result per page . . .
. . .
These files remained unchanged since last version of Sphider-plus. Top [ Outdated version ] Version: 3.2015b Release date: March 09, 2015 Build up with Sphider: v.1.3.5 In front of version 3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', . . .
. . .
/include/search_media.php /include/show_id3.php /templates/html/all files Top [ Outdated version ] Version: 3.2015a Release date: January 06, 2015 Build up with Sphider: v.1.3.5 New feature: Responsive design for search form, result listing and addurl form. Automatically adapting to display size of computer, tablet, smartphone, etc. New . . .
. . .
/include/show_id3.php /include/common/black_ips /include/IDS/all scripts /languages/all scripts /templates/html/015_headline.html /templates/html/020_search-form.html /templates/html/025_search-form.html /templates/html/030_category-selection.html /templates/html/040_category-tree.html /templates/html/050_result-header.html . . .
. . .
- Improved protection against SQL injection, even without activated IDS Updated link and charset detection for HTML5 coded URLs. Updated Danish language file. Thanks to 'incognito'. Bug fixed in result listing for title presentation, containing %20 blanks. Some small bugs fixed. Involved files that have been modified / added for this . . .

13.   Sphider-plus - The PHP Search Engine Visit in a new window

Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.1 Release date: September 03 2009 In front of Sphider-plus version 2.0 the following items have been added / modified: New item in Admin settings: Perform a segmentation of Chinese and Korean text during index / re-index procedure. Will divide . . .
. . .
Korean text during index / re-index procedure. Will divide phrases like 帽子和服装 into the base words 帽子 and 和 and 服装 so that all will become searchable. Valid for Chinese sites with charset: GB2312 GBK and GB18030 Valid for Korean sites with charset: EUC-KR and ISO10646-1933 New item in Admin setting: Index password protected sites. If enabled . . .
. . .
Up to 3 different zones could be registered in Admin settings and will be indexed. New options in Admin settings: - Index framesets - Index iframes If enabled both options will index html and image frames. Not available for dynamically reloaded frames (e.g. by JavaScript). New item in Admin setting: Enable to decode BBCode during index / . . .
. . .
Admin setting: Enable to decode BBCode during index / re-index into standard HTML. If selected, code like [url=http://abc.de/][b]abc.de[/b][/url] will be converted to <a href="http://abc.de"><strong>abc.de</strong></a> New item in Admin settings: Enable to decode entity coded sites into standard HTML characters. If . . .
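The BBCode conversion described above can be sketched in a few lines of Python. This is only an illustration for the two tags shown in the example, [b] and [url=...]. The function name bbcode_to_html is hypothetical and the regular expressions are assumptions; the change log does not document how Sphider-plus itself performs the conversion.

```python
import re


def bbcode_to_html(text: str) -> str:
    """Convert [b]...[/b] and [url=...]...[/url] into HTML tags."""
    flags = re.IGNORECASE | re.DOTALL
    # Bold first, so the link text already contains <strong> when wrapped
    text = re.sub(r"\[b\](.*?)\[/b\]", r"<strong>\1</strong>", text, flags=flags)
    text = re.sub(r"\[url=([^\]]+)\](.*?)\[/url\]", r'<a href="\1">\2</a>', text, flags=flags)
    return text


print(bbcode_to_html("[url=http://abc.de/][b]abc.de[/b][/url]"))
```

The sketch performs no HTML escaping of the tag contents, which is acceptable when the result is only fed to an indexer but would not be safe for direct output in a browser.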
. . .
New item in Admin settings: Enable to decode entity coded sites into standard HTML characters. If selected, entity coded text like &#268;apek and D&#246;hl will be converted to Čapek and Döhl New options in Admin settings: - Use whitelist in order to enable index / re-index only those pages that include any of the words in whitelist - Use . . .
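Decoding entity coded text, as described above, maps numeric character references such as &#246; back to the characters they stand for. A minimal Python sketch using the standard library follows. The wrapper name decode_entities is hypothetical, and this is not the routine used by Sphider-plus.

```python
import html


def decode_entities(text: str) -> str:
    """Replace numeric and named HTML character references with their characters."""
    return html.unescape(text)


print(decode_entities("&#268;apek and D&#246;hl"))
```

Decoding before indexing means that a query typed with the real characters matches pages that were authored with character references.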
. . .
index / re-index only those pages that include all the words in whitelist Improved 'Follow sitemap.xml' procedure: If <;sitemapindex . . >; is detected in a sitemap.xml file and if multiple Sitemap files are available Sphider-plus will process the secondary Sitemaps and extract all links for index / re-index. Also gzip-compressed files . . .
. . .
files (Index Sitemap files as well as the Sitemap files) will be processed. Improved index / re-index procedure: If charset of a site to be indexed is undetectable because it is not HTML standard conform or missing HTML tag the index procedure will no longer be interrupted. Preferred charset as defined in Admin settings will be used for . . .
. . .
charset as defined in Admin settings will be used for the involved link. Improved index / re-index procedure: If Sphider-plus is relocated by http 301 or 302 links found at the relocated site will also be followed. For new sites as per default the spider-depth is now set to 'full'. Improved UTF-8 support: Conversion into UTF-8 charset . . .
. . .
& Re-index'. Improved search functions for search with wildcards and for strict search. Improved category search: - Selected category name is highlighted in headline of result listing. - If activated in Admin setting categories which would also deliver results are presented individual for each result link in the result listing. - If search in . . .
. . .
are presented individual for each result link in the result listing. If media search is enabled in Admin settings text search with wildcards will also present media results. Improved search utility: Queries with and without hyphen will deliver the same results so that queries like 'make-up' and 'make up' do have equal rights. The same . . .
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all because of PHP 5.3 compatibility. In . . .

Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.1 Release date: September 03, 2009 In front of Sphider-plus version 2.0 the following items have been added / modified: New item in Admin settings: Perform a segmentation of Chinese and Korean text during index / re-index procedure. Will divide . . .
. . .
Korean text during index / re-index procedure. Will divide phrases like 帽子和服装 into the base words 帽子 and 和 and 服装 , so that all will become searchable. Valid for Chinese sites with charset: GB2312, GBK and GB18030 Valid for Korean sites with charset: EUC-KR and ISO10646-1933 New item in Admin setting: Index password protected sites. If enabled, . . .
. . .
Up to 3 different zones could be registered in Admin settings and will be indexed. New options in Admin settings: - Index framesets - Index iframes If enabled, both options will index html and image frames. Not available for dynamically reloaded frames (e.g. by JavaScript). New item in Admin setting: Enable to decode BBCode during index / . . .
. . .
Admin setting: Enable to decode BBCode during index / re-index into standard HTML. If selected, code like [url=http://abc.de/][b]abc.de[/b][/url] will be converted to <a href="http://abc.de"><strong>abc.de</strong></a> New item in Admin settings: Enable to decode entity coded sites into standard HTML characters. If . . .
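The BBCode conversion described in this snippet can be sketched in a few lines. This is a minimal illustration in Python, not Sphider-plus code: the function name `bbcode_to_html` is hypothetical, only the `[b]` and `[url=...]` tags from the example are handled, and no HTML escaping is performed.

```python
import re

def bbcode_to_html(text: str) -> str:
    """Convert [b]...[/b] and [url=...]...[/url] into HTML (sketch only)."""
    # Bold first, so that a [b] nested inside [url] is already HTML
    # when the link is rewritten.
    text = re.sub(r"\[b\](.*?)\[/b\]", r"<strong>\1</strong>",
                  text, flags=re.S | re.I)
    text = re.sub(r"\[url=([^\]]+)\](.*?)\[/url\]", r'<a href="\1">\2</a>',
                  text, flags=re.S | re.I)
    return text

print(bbcode_to_html("[url=http://abc.de/][b]abc.de[/b][/url]"))
```

The sketch copies the URL verbatim into the `href` attribute, so a trailing slash in the BBCode is preserved.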
. . .
New item in Admin settings: Enable to decode entity coded sites into standard HTML characters. If selected, entity coded text like &#268;apek and D&#246;hl will be converted to Čapek and Döhl. New options in Admin settings: - Use whitelist in order to enable index / re-index only those pages that include any of the words in whitelist - Use . . .
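Decoding numeric character references such as `&#268;` (Č) and `&#246;` (ö) is a standard operation. As an illustration only (Sphider-plus itself is written in PHP, and its own routine is not shown in this listing), Python's standard library does it as follows:

```python
import html

def decode_entities(text: str) -> str:
    """Turn HTML character references into the characters they stand for."""
    return html.unescape(text)

print(decode_entities("&#268;apek and D&#246;hl"))
```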
. . .
index / re-index only those pages that include all the words in whitelist. Improved 'Follow sitemap.xml' procedure: If <sitemapindex . . . > is detected in a sitemap.xml file, and if multiple Sitemap files are available, Sphider-plus will process the secondary Sitemaps and extract all links for index / re-index. Also gzip-compressed files . . .
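The two whitelist modes (any word versus all words) differ only in how the per-word matches are combined. A minimal sketch, assuming whole-word, case-insensitive matching; the function name `passes_whitelist` and the `require_all` flag are hypothetical and not taken from Sphider-plus:

```python
import re
from typing import Iterable

def passes_whitelist(page_text: str, whitelist: Iterable[str],
                     require_all: bool = False) -> bool:
    """Decide whether a page qualifies for indexing under a whitelist."""
    # Split the page into lower-cased words once, then test each term.
    words = set(re.findall(r"\w+", page_text.lower()))
    hits = [term.lower() in words for term in whitelist]
    return all(hits) if require_all else any(hits)

page = "Sphider-plus is a PHP search engine"
print(passes_whitelist(page, ["php", "python"]))                    # any mode
print(passes_whitelist(page, ["php", "python"], require_all=True))  # all mode
```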
. . .
files (Index Sitemap files as well as the Sitemap files) will be processed. Improved index / re-index procedure: If charset of a site to be indexed is undetectable, because the page does not conform to the HTML standard or the HTML tag is missing, the index procedure will no longer be interrupted. Preferred charset as defined in Admin settings will be used for . . .
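The sitemap handling described above (follow a sitemap index into its secondary sitemaps, accept gzip-compressed files) can be sketched as below. This is an illustration under stated assumptions, not the Sphider-plus implementation: `collect_links` and `maybe_gunzip` are hypothetical names, and `fetch` stands in for whatever downloads a URL and returns its bytes.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Callable, List

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def maybe_gunzip(data: bytes) -> bytes:
    """Decompress the payload if it starts with the gzip magic bytes."""
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data

def collect_links(url: str, fetch: Callable[[str], bytes]) -> List[str]:
    """Return all page links, following <sitemapindex> into child sitemaps."""
    root = ET.fromstring(maybe_gunzip(fetch(url)))
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    if root.tag == NS + "sitemapindex":
        links: List[str] = []
        for child_url in locs:  # each <loc> names a secondary sitemap
            links.extend(collect_links(child_url, fetch))
        return links
    return locs  # a plain <urlset>: the <loc> values are the page links
```

Passing `fetch` as a parameter keeps the sketch testable without network access; a real crawler would also need error handling and a recursion limit.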
. . .
charset as defined in Admin settings will be used for the involved link. Improved index / re-index procedure: If Sphider-plus is redirected by HTTP 301 or 302, links found at the relocated site will also be followed. For new sites, the spider-depth is now set to 'full' by default. Improved UTF-8 support: Conversion into UTF-8 charset . . .
. . .
& Re-index'. Improved search functions for search with wildcards and for strict search. Improved category search: - Selected category name is highlighted in headline of result listing. - If activated in Admin setting, categories which would also deliver results are presented individual for each result link in the result listing. - If search in . . .
. . .
are presented individual for each result link in the result listing. If media search is enabled in Admin settings, text search with wildcards will also present media results. Improved search utility: Queries with and without hyphen will deliver the same results, so that queries like 'make-up' and 'make up' do have equal rights. The same . . .
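One simple way to give 'make-up' and 'make up' equal rights is to normalize both forms before the lookup. The sketch below assumes that hyphens are simply treated as word separators; the name `normalize_query` is hypothetical, and the listing does not show how Sphider-plus actually implements this.

```python
import re

def normalize_query(query: str) -> str:
    """Treat runs of hyphens and whitespace as a single space."""
    return re.sub(r"[-\s]+", " ", query).strip()

print(normalize_query("make-up") == normalize_query("make up"))
```

A search syntax that uses a leading hyphen to exclude words would need to handle that case before applying this normalization.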
. . .
site and link URLs to be indexed is now increased to 1024 characters. Maximum length for link 'title' increased to 255 characters. Code rewritten to cooperate with PHP 5.3.x Error corrected in de-language file. Thanks to Carl D. Erling Involved files that have been modified / added for this release: Nearly all, because of PHP 5.3 compatibility. In . . .

14.   Sphider-plus - The PHP Search Engine Visit in a new window


5455  Introduction Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.2 Release date: December 22, 2009 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.1 the following items have been added / modified: Improved multiple database support: Results may now be . . .
. . .
1 - 5 databases could be configured to fetch results for the common result listing. Valid for text and media search, all search modes, taking into account category selection. More details in documentation chapter Activate / Disable databases Improved RSS and Atom feed index procedure. Including now also a validation for the well-formed XML. . . .
. . .
now also a validation for the well-formed XML. Support added for RDF feeds. For a complete list of indexed items, please notice the documentation chapter: RDF, RSD, RSS and Atom feeds Additional item in Admin settings: Follow CDATA directives for feed content. Additional item in Admin settings: Index 'Dublin Core' and other individually marked . . .
. . .
settings: Index 'Dublin Core' and other individually marked tags in RDF feeds. Additional item in Admin settings: Follow the 'preferred (true/false)' directive in RSD feeds. Detection of encoding (charset) added for XML and XHTML files. New item in Admin settings: During index procedure, convert all kind of single quotes like ` ´ ’ ‘ into . . .
. . .
index procedure, convert all kind of single quotes like ` ´ ’ ‘ into standard quotes ' New item in Admin settings: Reduce queries which contain quotes to the basic word. This will deliver the same results for queries like: d'information = information or dei'largi = largi Results will be highlighted for the base word. Exclusive noun, pronoun, . . .
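Both quote-related settings can be illustrated together: first map the quote variants to the standard quote, then keep only the part after the last quote. This is a sketch under the assumption that the base word is always the part following the quote, as in the two examples above; `normalize_quotes` and `base_word` are hypothetical names.

```python
# The four quote variants named in the change log, mapped to a plain '.
QUOTE_MAP = str.maketrans({"`": "'", "´": "'", "’": "'", "‘": "'"})

def normalize_quotes(text: str) -> str:
    """Replace all single-quote variants with the standard quote."""
    return text.translate(QUOTE_MAP)

def base_word(query: str) -> str:
    """Reduce a query such as d'information to its base word."""
    return normalize_quotes(query).rsplit("'", 1)[-1]

print(base_word("d'information"))
print(base_word("dei’largi"))
```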
. . .
for the base word. Exclusive noun, pronoun, etc. Works for all kinds of single quotes. New Admin setting: For queries containing numbers, search with wildcards. Useful to search for complex article numbers, if the user only knows a part of the complete item description New Admin setting: Index ZIP compressed files and archives. . . .
. . .
part of the complete item description New Admin setting: Index ZIP compressed files and archives. Supports (X)HTML, XML and also compressed PDFs and other document files, as well as all kind of feeds, frames and iframes. Links found in the compressed files will be followed. New option to sort the result listing: Sort by last indexed (date and . . .
. . .
to limit result listing: Define max. amount of results presented in result listing. To be defined in Admin settings, the count of results will be limited for text and media results New item in Admin settings: Use list of div id's to ignore the corresponding div content during index/re-index A common list of div id values is used to ignore parts . . .
. . .
is used to ignore parts of a page. Content between <div id='this_value'> and </div> will be ignored, however links in it are followed. Multiple and nested divs will be handled. Values in common list may end with a wildcard, so that 'menu*' will work for menu1, menu2, menu_left, etc. Usable also for external pages, if it is . . .
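Matching div ids against a list whose entries may end in a wildcard can be sketched with shell-style pattern matching. This is an illustration only: `is_ignored_div` is a hypothetical name, and Python's `fnmatch` accepts more pattern syntax (`?`, `[...]`) than the trailing `*` described in the change log.

```python
from fnmatch import fnmatchcase
from typing import Iterable

def is_ignored_div(div_id: str, patterns: Iterable[str]) -> bool:
    """True if the div id matches any entry of the common ignore list."""
    return any(fnmatchcase(div_id, pattern) for pattern in patterns)

ignore_list = ["menu*", "footer"]
for div_id in ["menu1", "menu2", "menu_left", "content"]:
    print(div_id, is_ignored_div(div_id, ignore_list))
```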
. . .
impossible to add the <!--sphider_noindex--> tags. Common 'URL Must include' and 'URL must Not include' rules, which are valid for all new sites, may be placed now in 2 files. The contents will be transferred to the corresponding option fields when calling 'Add site' in Admin menu. Individually de-selectable by checkbox. Details in . . .
. . .
when calling 'Add site' in Admin menu. Individually de-selectable by checkbox. Details in documentation chapter: Must include / must not include string list Log output suppressed, if the indexer is only redirected from http://www.abc.de to http://www.abc.de/index.html Improved response for 'canonical' links. Back references to the calling . . .
. . .
links. Back references to the calling page are ignored now. New option for iframe indexing in Admin settings: Instead of calling page, remember the link to iframe directly. New Admin setting: If found on different pages, index also duplicate media content. If activated, all images, audio and video stream will be presented in result . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .


5455  Introduction Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.2 Release date: December 22, 2009 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.1 the following items have been added / modified: Improved multiple database support: Results may now be . . .
. . .
1 - 5 databases could be configured to fetch results for the common result listing. Valid for text and media search, all search modes, taking into account category selection. More details in documentation chapter Activate / Disable databases Improved RSS and Atom feed index procedure. Including now also a validation for the well-formed XML. . . .
. . .
now also a validation for the well-formed XML. Support added for RDF feeds. For a complete list of indexed items, please see the documentation chapter: RDF, RSD, RSS and Atom feeds Additional item in Admin settings: Follow CDATA directives for feed content. Additional item in Admin settings: Index 'Dublin Core' and other individually marked . . .
. . .
settings: Index 'Dublin Core' and other individually marked tags in RDF feeds. Additional item in Admin settings: Follow the 'preferred (true/false)' directive in RSD feeds. Detection of encoding (charset) added for XML and XHTML files. New item in Admin settings: During index procedure, convert all kinds of single quotes like ` ´ ’ ‘ into . . .
. . .
index procedure, convert all kinds of single quotes like ` ´ ’ ‘ into standard quotes ' New item in Admin settings: Reduce queries which contain quotes to the basic word. This will deliver the same results for queries like: d'information = information or dei'largi = largi Results will be highlighted for the base word. Exclusive noun, pronoun, . . .
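
The two quote-related settings above can be sketched as follows. This is only an illustration in Python, not the Sphider-plus PHP code; the function names and the rule "keep what follows the last quote in each word" are assumptions derived from the examples d'information and dei'largi.

```python
# Quote variants named in the change log: ` ´ ’ ‘
QUOTE_VARIANTS = "`\u00b4\u2019\u2018"


def normalize_quotes(text):
    """Replace every quote variant with the standard quote '."""
    return text.translate({ord(ch): "'" for ch in QUOTE_VARIANTS})


def reduce_to_base_word(query):
    """Reduce each quoted word to its base word (assumed rule, see lead-in)."""
    words = []
    for token in normalize_quotes(query).split():
        base = token.rsplit("'", 1)[-1]
        # A token that ends with a quote has no base part; keep it unchanged.
        words.append(base or token)
    return " ".join(words)
```

Because the quotes are normalized first, the reduction behaves the same for every quote variant, which matches the note that the feature works for all kinds of single quotes.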
. . .
for the base word. Exclusive noun, pronoun, etc. Works for all kinds of single quotes. New Admin setting: For queries containing numbers, search with wildcards. Useful to search for complex article numbers, if the user only knows a part of the complete item description New Admin setting: Index ZIP compressed files and archives. . . .
. . .
part of the complete item description New Admin setting: Index ZIP compressed files and archives. Supports (X)HTML, XML and also compressed PDFs and other document files, as well as all kinds of feeds, frames and iframes. Links found in the compressed files will be followed. New option to sort the result listing: Sort by last indexed (date and . . .
. . .
to limit result listing: Define max. amount of results presented in result listing. To be defined in Admin settings, the count of results will be limited for text and media results New item in Admin settings: Use list of div id's to ignore the corresponding div content during index/re-index A common list of div id values is used to ignore parts . . .
. . .
is used to ignore parts of a page. Content between <div id='this_value'> and </div> will be ignored; however, links in it are followed. Multiple and nested divs are handled. Values in the common list may end with a wildcard, so that 'menu*' will work for menu1, menu2, menu_left, etc. Usable also for external pages, if it is . . .
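
The trailing-wildcard rule for div ids can be sketched like this. It is an illustrative Python sketch, not the Sphider-plus implementation; the list contents and the function name are hypothetical.

```python
# Hypothetical contents of the common list of div ids to ignore.
IGNORED_DIV_IDS = ["menu*", "footer"]


def is_ignored_div(div_id, patterns=IGNORED_DIV_IDS):
    """Return True if div_id matches an entry of the ignore list.

    An entry ending in '*' matches every id that starts with the text
    before the wildcard; any other entry must match exactly.
    """
    for pattern in patterns:
        if pattern.endswith("*"):
            if div_id.startswith(pattern[:-1]):
                return True
        elif div_id == pattern:
            return True
    return False
```

With this list, menu1, menu2 and menu_left are ignored, while an id such as content is still indexed.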
. . .
impossible to add the <!--sphider_noindex--> tags. Common 'URL Must include' and 'URL must Not include' rules, which are valid for all new sites, may now be placed in 2 files. The contents will be transferred to the corresponding option fields when calling 'Add site' in Admin menu. Individually de-selectable by checkbox. Details in . . .
. . .
when calling 'Add site' in Admin menu. Individually de-selectable by checkbox. Details in documentation chapter: Must include / must not include string list Log output suppressed, if the indexer is only redirected from http://www.abc.de to http://www.abc.de/index.html Improved response for 'canonical' links. Back references to the calling . . .
. . .
links. Back references to the calling page are ignored now. New option for iframe indexing in Admin settings: Instead of calling page, remember the link to iframe directly. New Admin setting: If found on different pages, index also duplicate media content. If activated, all images, audio and video stream will be presented in result . . .
. . .
Bug fixed that prevented highlighting of keywords in result listing, if full text was shorter than 250 characters (as to be defined in Admin settings: 'Maximum length of page summary displayed in search results'). Bug fixed that prevented highlighting of keywords in result listing for Strict search (!query), if keyword was found . . .



15.   Sphider-plus - The PHP Search Engine Visit in a new window

Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.3 Release date: April 23, 2010 Build up with Sphider: v.1.3.5 In order to ease customer's integration of Sphider-plus into existing sites, HTML templates are prepared for - Search form - Text results - Media results - Most popular queries - etc. . . .
. . .
templates are prepared for - Search form - Text results - Media results - Most popular queries - etc. New feature: Allow indexing of other hosts with same domain name for links found during indexing. Also ignore TLD, SLD and www. More details in documentation chapter: Allow other hosts in same domain New feature: Allow indexing of other hosts . . .
. . .
but only if the found links are redirected. Also ignore TLD, SLD and www. More details in documentation chapter: Allow other hosts in same domain New feature: Index sites and follow links containing non-'Basic Latin' and non-ASCII characters as part of their URL. 2 new features of sorting the result listing: - Results of a promoted / . . .
. . .
a promoted / featured domain will be displayed on top of the search result listing. As part of the Admin settings, a domain name or part of the name could be entered. All search results belonging to this domain will be placed on top of result listing. - Pages containing a catchword will be displayed on top of the search result listing. As part . . .
. . .
- Pages containing a catchword will be displayed on top of the search result listing. As part of the Admin settings, the catchword could be entered. More details in documentation chapter: Chronological order for result listing New feature: Split words into their basic parts, separated at each hyphen, dot or comma inside the words. For example . . .
. . .
will be divided into the 3 keywords: sphider, plus, eu. Because the original word is also stored as a keyword, all 4 words become searchable. Alternatively, separation only at hyphens is selectable in Admin settings. New feature: Index the "Description" Meta tag in HTML header. To be activated in Admin settings. New feature: Index of . . .
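
A minimal sketch of the word-splitting feature, in Python for illustration only (not the Sphider-plus PHP code). The input word sphider-plus.eu is an assumption, since the snippet elides the example and only lists the resulting keywords; the function name and the hyphen_only flag are hypothetical.

```python
import re


def split_keyword(word, hyphen_only=False):
    """Return the original word followed by its basic parts.

    Words are split at each hyphen, dot or comma, or only at hyphens
    when hyphen_only is True. A word without separators is returned alone.
    """
    separators = r"-" if hyphen_only else r"[-.,]"
    parts = [part for part in re.split(separators, word) if part]
    if len(parts) < 2:
        return [word]
    return [word] + parts
```

For the assumed input sphider-plus.eu this yields four searchable keywords: the original word plus sphider, plus and eu. With hyphen_only=True the same input yields the original word, sphider and plus.eu.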
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(); file_get_contents(); md5_file(); 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which had not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In order to index XLS files, a converter for Excel files was developed. Implemented as PHP script, the converter needs no adaptation to the Operating System. New Admin setting: Index RAR compressed files and archives. Supports (X)HTML, XML and also compressed . . .
. . .
Index RAR compressed files and archives. Supports (X)HTML, XML and also compressed PDFs and other document files, as well as all kinds of feeds, frames and iframes. Links found in the compressed files will be followed. 15 language specific stemming algorithms implemented. Individually selectable for: Bulgarian, Chinese, Czech, Dutch, English, . . .
. . .
stemming algorithms implemented. Individually selectable for: Bulgarian, Chinese, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Russian, Spanish and Swedish. For details see chapter Word stemming More details in documentation chapter: Word stemming New Admin setting: Activate/disable: Create . . .
. . .
Re-index all meanwhile erased sites. New Admin setting: Show complete list during import and export of URLs, or hide output. 24 language specific common files holding a list of words to be ignored during index (stop words). Added or updated for: Arabic, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, Farsi, Finnish, French, . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for index procedure. In case of failure, only warning messages will be created and indexing will not be aborted. The feature 'Clean resources' is added now also for search procedure. Common . . .


Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.3 Release date April 23, 2010 Build up with Sphider v.1.3.5 In order to ease customer's integration of Sphider-plus into existing sites, HTML templates are prepared for - Search form - Text results - Media results - Most popular queries - etc. . . .
. . .
templates are prepared for - Search form - Text results - Media results - Most popular queries - etc. New feature Allow indexing of other hosts with same domain name for links found during indexing. Also ignore TLD, SLD and www. More details in documentation chapter Allow other hosts in same domain New feature Allow indexing of other hosts . . .
. . .
but only if the found links are redirected. Also ignore TLD, SLD and www. More details in documentation chapter Allow other hosts in same domain New feature Index sites and follow links containing none ‘Basic Latin’ and none ASCII characters as part of their URL. 2 new features of sorting the result listing - Results of a promoted / . . .
. . .
a promoted / featured domain will be displayed on top of the search result listing. As part of the Admin settings, a domain name or part of the name can be entered. All search results belonging to this domain will be placed on top of the result listing. - Pages containing a catchword will be displayed on top of the search result listing. As part . . .
. . .
- Pages containing a catchword will be displayed on top of the search result listing. As part of the Admin settings, the catchword can be entered. More details in documentation chapter: Chronological order for result listing. New feature: Split words into their basic parts, separated at each hyphen, dot or comma inside the words. For example . . .
. . .
will be divided into the 3 keywords: sphider, plus, eu. As the original word is also stored as a keyword, all 4 words become searchable. Alternatively, separation only at hyphens is selectable in Admin settings. New feature: Index the "Description" meta tag in the HTML header. To be activated in Admin settings. New feature: Index of . . .
. . .
that do not offer all PHP functions for remote files. Bypassed PHP functions are: fopen(), file_get_contents(), md5_file(). 3 new features for command line operation: - Erase & Re-index all sites ( -eall ) - Index all new URLs in database which have not yet been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In . . .
. . .
been indexed ( -new ) - Re-index all meanwhile erased sites ( -erased ) New feature: In order to index XLS files, a converter for Excel files was developed. Implemented as a PHP script, the converter needs no adaptation to the operating system. New Admin setting: Index RAR compressed files and archives. Supports (X)HTML, XML and also compressed . . .
. . .
Index RAR compressed files and archives. Supports (X)HTML, XML and also compressed PDFs and other document files, as well as all kinds of feeds, frames and iframes. Links found in the compressed files will be followed. 15 language-specific stemming algorithms implemented. Individually selectable for: Bulgarian, Chinese, Czech, Dutch, English, . . .
. . .
stemming algorithms implemented. Individually selectable for: Bulgarian, Chinese, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Russian, Spanish and Swedish. More details in documentation chapter: Word stemming. New Admin setting: Activate/disable: Create . . .
. . .
Re-index all meanwhile erased sites. New Admin setting: Show complete list during import and export of URLs, or hide output. 24 language-specific common files, each holding a list of words to be ignored during indexing (stop words). Added or updated for: Arabic, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, Farsi, Finnish, French, . . .
. . .
links with up to 1024 characters. Input settings for database configuration menus are now enabled for values up to 255 characters. 'Clean resources' improved for the index procedure: in case of failure, only warning messages are created and indexing is not aborted. The feature 'Clean resources' is now also available for the search procedure. Common . . .
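
The v.2.3 word-splitting feature (split at each hyphen, dot or comma, while keeping the original word as a keyword) can be illustrated with a short sketch. This is not Sphider-plus code, which is written in PHP; it is a minimal Python illustration, and the function name `split_keyword`, the `hyphens_only` flag and the sample input `sphider-plus.eu` are assumptions chosen so that the output matches the three keywords named in the snippet.

```python
import re


def split_keyword(word, hyphens_only=False):
    """Return the original word followed by its parts.

    Hypothetical helper: splits at hyphen, dot or comma by default,
    or only at hyphens when hyphens_only is True.
    """
    pattern = r"-" if hyphens_only else r"[-.,]"
    # Drop empty fragments caused by leading, trailing or doubled separators.
    parts = [p for p in re.split(pattern, word) if p]
    keywords = [word]  # the original word stays searchable
    for part in parts:
        if part not in keywords:
            keywords.append(part)
    return keywords


# Assumed sample input; the snippet only names the resulting parts.
print(split_keyword("sphider-plus.eu"))
print(split_keyword("sphider-plus.eu", hyphens_only=True))
```

With the assumed input, the first call yields the original word plus three parts, which corresponds to the "all 4 words become searchable" statement. A word without any separator is returned unchanged as a single keyword.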

16.   Sphider-plus - The PHP Search Engine

Version v.2.4 Release date: July 03, 2010 Built on Sphider: v.1.3.5 New feature: In order to reduce the time for indexing, multithreaded indexing was implemented. As part of the Admin settings, 1-10 threads can be activated. Available also for command . . .
. . .
also for command line operation without limitation of the thread count. For details see the documentation chapter: Multithreaded indexing. New feature: Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. . . .
. . .
hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset: Shift_JIS, EUC-JP and UTF-8. New feature: Index CSV files. To be activated in Admin settings. Implemented as a PHP script, the converter needs no adaptation to the operating system. New feature: In order to index ‘OpenDocument’ files, . . .
. . .
activated separately for spreadsheet files (.ods) and text files (.odt) in Admin settings. New command line options: - Erase content of database <-erase> - Set ‘Last indexed’ date and time to 0000 <-preall> Improved support for Japanese coded sites (charset: SHIFT_JIS, EUC-JP and UTF-8). Template design 'Pure' reduced to 'like . . .
. . .
(charset: SHIFT_JIS, EUC-JP and UTF-8). Template design 'Pure' reduced to 'like Google'. Improved search algorithm: significantly reduced search time. Improved support for Greek language: 1. Transliterate queries with Latin characters into their Greek equivalents. This will, for example, transform query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
2. Accept Greek queries containing vowels without accents. Query input of letter α will be valid also for ἀ, ἁ, ἂ, ἃ, ἄ, ἅ, ἆ, ἇ, ὰ, ά, ά, ᾀ, ᾁ, ᾂ, ᾃ, ᾄ, ᾅ, ᾆ, ᾇ, ᾰ, ᾱ, ᾲ, ᾳ, ᾴ, ᾶ and ᾷ. The same behavior applies to all other Greek vowels, as well as to the upper case vowels. Both options will create a tolerant result listing. New options in . . .
. . .
case vowels. Both options will create a tolerant result listing. New options in 'Add site' and 'Edit site' menus: - Enter URL of individual Sitemap. If the Sitemap is not in the root folder, the URL of the individual Sitemap can be entered. New option to manipulate the result listing: For result sorting 'By URL names' the number of results shown per . . .
. . .
names' the number of results shown per domain is selectable. Offers result presentation similar to 'Like Google', but additionally offers a selectable count of links. Search option 'More results from this domain' is enabled not only for result sorting 'Like Google', but also for 'By URL names'. Bug fixed that prevented correct interpretation of . . .
. . .
correct interpretation of HTTP 301 redirects. Bug fixed that caused invalid results for multiple-word queries that contain numbers, like 'price 25 euro'. Bug fixed that prevented highlighting of keywords found in position 0 of title or full text. Bug fixed that prevented suppressing of 'Show result scores' in Admin settings. Bug fixed . . .
. . .
Italian language file. Thanks to Giorgio Nanni. Involved files that have been modified / added for this release: /search.php /admin/ all files /converter/ods_reader.php /converter/odt_reader.php /converter/dictionaries/jp_shiftJIS.dic /converter/OpenDocumentSheet/ all files /include/commonfuncs.php /include/searchfuncs.php . . .
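
The accent-tolerant Greek matching described in the v.2.4 change log (a query with plain α also finds ἀ, ᾶ, ᾷ and so on) can be approximated with Unicode normalization. The sketch below is an assumption about one possible approach and not the Sphider-plus implementation, which is PHP; the function names `fold_greek_accents` and `tolerant_match` are hypothetical. It covers only the accent folding, not the Latin-to-Greek transliteration.

```python
import unicodedata


def fold_greek_accents(text):
    """Strip accents, breathings and iota subscripts from the text.

    Decomposes each character (NFD), drops all combining marks
    (category Mn), then recomposes (NFC). Note that this also strips
    diacritics from non-Greek letters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def tolerant_match(query, indexed_word):
    """Compare two words while ignoring accents and letter case."""
    return (
        fold_greek_accents(query).casefold()
        == fold_greek_accents(indexed_word).casefold()
    )


# An unaccented query matches the accented word from the change log example.
print(tolerant_match("αλλα", "ἀλλὰ"))
```

Folding both the query and the indexed word to the same base form is what makes the comparison tolerant in both directions; a real index would more likely store the folded form once at indexing time rather than fold on every comparison.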

Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version v.2.4 Release date July 03, 2010 Build up with Sphider v.1.3.5 New feature In order to reduce the time for indexing, multithreaded indexing was implemented. As part of the Admin settings, 1-10 threads are to be activated. Available also for command . . .
. . .
also for command line operation without limitation of the thread counts. For details see the documentation chapter Multithreaded indexing New feature Segmentation of Japanese phrases. To be activated in Admin settings. Segmenting 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems. . . .
. . .
hiragana, katakana and jinmeiyo Japanese character writing systems. Available for Japanese sites with charset Shift_JIS, EUC-JP and UTF-8 New feature Index CVS files. To be activated in Admin settings. Implemented as PHP script, the converter needs no adoption to the Operating System. New feature In order to index ‘OpenDocument files, . . .
. . .
activated separately for spreadsheet files (.ods) and text files (.odt) in Admin settings. New command line options - Erase content of database <;-erase>; - Set ‘Last indexed’ date and time to 0000 <;-preall>; Improved support for Japanese coded sites (charset SHIFT_JIS, EUC-JP and UTF-8). Template design 'Pure' reduced to 'like . . .
. . .
(charset SHIFT_JIS, EUC-JP and UTF-8). Template design 'Pure' reduced to 'like Google'. Improved search algorithm significantly reduced search time. Improved support for Greek language 1. Transliterate queries with Latin characters into their Greek equivalents. Will for example transform query input alla to find ἀλλὰ and baptismatos to find . . .
. . .
2. Accept Greek queries containing vowels without accents. Query input of letter α will be valid also for ἀ, ἁ, ἂ, ἃ, ἄ, ἅ, ἆ, ἇ, ὰ, ά, ά, ᾀ, ᾁ, ᾂ, ᾃ, ᾄ, ᾅ, ᾆ, ᾇ, ᾰ, ᾱ, ᾲ, ᾳ, ᾴ, ᾶ and ᾷ The same behavior for all other Greek vowels, as well as for the upper case vowels. Both options will create a tolerant result listing. New options in . . .
. . .
case vowels. Both options will create a tolerant result listing. New options in 'Add site' and 'Edit site' menus - Enter URL of individual Sitemap If Sitemap is not in root folder, the URL of the individual Sitemap could be entered. New option to manipulate the result listing For result sorting 'By URL names' the number of results shown per . . .
. . .
names' the number of results shown per domain is selectable. Offers result presentation similar to 'Like Google', but additionally offers a selectable count of links. Search option 'More results from this domain' not only enabled for result sorting 'Like Google', but also for 'By URL names'. Bug fixed that prevented correct interpretation of . . .
. . .
correct interpretation of http 301 redirects. Bug fixed that causes invalid results for multiple word queries, which contain numbers, like 'price 25 euro'. Bug fixed that prevented highlighting of keywords, if found in position 0 of title or full text. Bug fixed that prevented suppressing of 'Show result scores' in Admin settings. Bug fixed . . .
. . .
Italian language file. Thanks to Giorgio Nanni. Involved files that have been modified / added for this release /search.php /admin/ all files /converter/ods_reader.php /converter/odt_reader.php /converter/dictionaries/jp_shiftJIS.dic /converter/OpenDocumentSheet/ all files /include/commonfuncs.php /include/searchfuncs.php . . .
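
The accent-tolerant Greek matching mentioned in the snippets above (a query of α also finding ἀ, ᾷ and so on) can be illustrated with Unicode normalization. This is only a minimal sketch in Python, not the Sphider-plus code, which is written in PHP and may work differently; the function names fold_greek and tolerant_match are hypothetical.

```python
import unicodedata

def fold_greek(text):
    """Return text lowercased with all combining marks removed.

    Hypothetical helper: decomposes each character (NFD), drops accents,
    breathings, iota subscripts and length marks, then recomposes (NFC).
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).lower()

def tolerant_match(query, word):
    """True if query and word are equal once accents and case are ignored."""
    return fold_greek(query) == fold_greek(word)

print(tolerant_match("αλλα", "ἀλλὰ"))
```

Folding both the query and the indexed word to the same bare form is one common way to get such a tolerant result listing; the same idea covers the other vowels and the upper case letters.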

17.   Sphider-plus - The PHP Search Engine Visit in a new window

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.5 Release date: November 30, 2010 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.4, the following items have been added / modified: New feature: Bound database. This option will delete all keyword relationships . . .
. . .
items have been added / modified: New feature: Bound database. This option will delete all keyword relationships exceeding a definable amount of query results. Besides the result cache, this option will significantly speed up the search procedure for huge databases. For details see chapter: Bound database New feature: In order to get indexed, . . .
. . .
user suggested sites optionally need a meta tag in the header. Defined by the Sphider-plus admin during approval, the tag values could be used to verify the ownership of the suggested URLs, offer commercial dependencies, or perform a membership verification. For details see chapter: User suggested sites New feature: Intrusion Detection System . . .
. . .
(IDS) included to protect Sphider-plus against hacking attempts. It includes extensive regex rules for tags like XSS, SQLI, RCE, LFI, DT, CSRF, LDAP Injections, and DoS. Admin selectable, the IDS will block further user input, create a log-file, present a warning message, or even block any traffic of IPs known to be evil. For details see chapter: . . .
. . .
Intrusion Detection System (IDS) New feature: Index only links and their link text. If activated in Admin settings, full text and media content will not be indexed, but only the link text (titles) of all links. Will also work for image links and their 'title' and 'alt' attributes: title="this text", alternatively alt="this text". Result listing . . .
. . .
listing presents the (active) links with respect to the page at which they were found. If searching for a link text, the different search modes are available. New feature in Admin settings: Add new domains found during index procedure to 'Approve Sites' table. To be activated in section ‘General Settings’, this option is available for those . . .
. . .
to 'Approve Sites' table. To be activated in section ‘General Settings’, this option is available for those sites having activated ‘Spider can leave domain’ in their options. New feature in Admin sites menu: Index all the suspended. Will continue the index procedure for all the sites that are marked as 'Unfinished'. New feature: Index media . . .
. . .
New feature: Index media content with respect to frame/iframe position. To be activated in Admin settings, this option allows indexing of media links which are addressed as links relative to the frame/iframe position (folder). Improved URL import/export function: Now all options of each site will be stored in backup file and re-imported. . . .
. . .
log during index / re-index and also for all erase functions. Will reset all 'Search' statistics in Admin backend, as well as the 'Most popular search' table at the bottom of result listing. New feature in Admin backend: When opening the Admin interface, a warning message will be presented about new suggested sites waiting for approval. Working . . .
. . .
/templates/html/021_html_search-form.html /templates/html/022_html_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/080_most_pop.html /templates/html/081_3D_tag_cloud.html /templates/html/100_all-media . . .
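
The idea behind the regex-based Intrusion Detection System mentioned in the snippets above can be sketched as follows. This is only an illustration in Python under stated assumptions: the three patterns and the names RULES and screen_input are hypothetical and far simpler than any real rule set; they are not the rules shipped with Sphider-plus.

```python
import re

# Hypothetical, heavily simplified rules keyed by attack tag.
RULES = {
    "XSS": re.compile(r"<\s*script\b|javascript\s*:", re.IGNORECASE),
    "SQLI": re.compile(r"\bunion\b\s+\bselect\b|'\s*or\s+'?\d+'?\s*=\s*'?\d+",
                       re.IGNORECASE),
    "DT": re.compile(r"\.\./|\.\.\\"),  # directory traversal
}

def screen_input(user_input):
    """Return the tags of all rules that match the given user input."""
    return [tag for tag, rule in RULES.items() if rule.search(user_input)]

for query in ("price 25 euro", "<script>alert(1)</script>", "../../etc/passwd"):
    hits = screen_input(query)
    # An admin-selected reaction (block, log, warn) would be chosen here.
    print(query, "->", hits if hits else "clean")
```

An empty list means the input passed all rules; a non-empty list is the point at which an application could block the input, write a log entry, or show a warning.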

18.   Sphider-plus - The PHP Search Engine Visit in a new window

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.6b Release date: March 25, 2011 Build up with Sphider: v.1.3.5 Debugged version of v.2.6 Build up with Sphider: v.1.3.5 In front of version 2.6 the following modifications have been added: New Admin setting: Protect the /admin/ . . .
. . .
have been added: New Admin setting: Protect the /admin/ folder by means of a .htaccess file. If activated, and if the .htaccess file is not yet available, the script will automatically detect the IP of the admin and create your .htaccess file in the ../admin/ folder. If the setting is deactivated (checkbox), the .htaccess file will be . . .
. . .
in the ../admin/ folder. If the setting is deactivated (checkbox), the .htaccess file will be deleted by the script, so that afterwards the admin folder is freed again for IP independent access. New feature: Result listings 'By URL names' and 'Like Google' are sorted in alphabetic order now. New feature: The words specified in common list (to be . . .
. . .
Media search enabled for multiple database support. User debug mode enabled for link search. Indexing of https:// sites enabled. Bug fixed for applications not using the advanced search options. Bug fixed for embedded application. Bug fixed for result sorting (By URL names). Bug fixed in 'More results from URL'. Bug fixed for 'Usese . . .
. . .
in 'Use blacklist to prevent index of pages'. Involved files that have been modified / added for this debug release: /search.php /search_ini.php /admin/admin.php /admin/configset.php /admin/spider.php /admin/spiderfuncs.php /admin/url_backup.php /admin/settings/backup/Sphider-plus_default-configuration.php /admin/thumbs/.htaccess . . .
. . .
/include/searchfuncs.php /include/search_links.php /include/search_media.php /templates/html/ all files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 In front of Sphider-plus version 2.5 the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in . . .
. . .
in search.php script, the results will be presented as XML file in /xml/. For details see the documentation chapter: XML result output New feature: Index only parts of a page by <div id='abc'>. If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between . . .
. . .
be used to index only the content between <div id='abc'> and </div>. This is the contrary function to: Ignoring parts of a page by <div id='abc'>, which is controlled by the list file /include/common/divs_not.txt. For details see the documentation chapter: Indexing only parts of a page by <div id='abc'> New feature: . . .
. . .
activated by selecting any of the 5 databases and any set of tables in the db’s. New feature in Admin backend: 'Search' functions are available now in order to query for: - sites - links - keywords - categories New Admin setting: Define number of sites shown per page in Admin backend (pagination 10, 20, 30, 50, 100). Used for: - Sites view . . .
. . .
30, 50, 100). Used for: - Sites view - Approve URLs - Banned domains - Statistic results Improved Admin settings: The table in Admin backend 'Sites' view could be sorted: - by index-date, latest on top - by index-date, oldest on top - by title as personally defined when adding the site - in alphabetic order (URL) New feature: Additional option . . .
. . .
Additional option to Re-index only the sites that are currently shown in 'Sites' view. By selecting (pagination) 10, 20, 30, 50 or 100 sites per page, it is possible to re-index only the URLs presented on page 1, and later on those of page 2 etc. New Admin setting: Obligatory use the preferred charset as defined in 'General Settings' for . . .
. . .
as defined in 'General Settings' for indexing. The corresponding option is to be found in sites 'Edit' option, so that individual sites could be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed will be overwritten by the . . .
. . .
of debug mode for Admin backend and User interface. New Admin setting: Do not index the full text. If activated, only the page 'Title', the 'Keywords' Meta tag, as well as the 'Description' Meta tag will be indexed. Nevertheless, links found in full text will be followed. New feature: Queries containing ' && ' will overwrite the advanced . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.6b Release date: March 25, 2011 Build up with Sphider: v.1.3.5 Debugged version of v.2.6 Build up with Sphider: v.1.3.5 In front of version 2.6 the following modifications have been added: New Admin setting: Protect the /admin/ . . .
. . .
have been added: New Admin setting: Protect the /admin/ folder by means of a .htaccess file. If activated, and if the .htaccess file is not yet available, the script will automatically detect the IP of the admin and create your .htaccess file in the ../admin/ folder. If the setting is deactivated (checkbox), the .htaccess file will be . . .
. . .
in the ../admin/ folder. If the setting is deactivated (checkbox), the .htaccess file will be deleted by the script, so that afterwards the admin folder is freed again for IP independent access. New feature: Result listings 'By URL names' and 'Like Google' are sorted in alphabetic order now. New feature: The words specified in common list (to be . . .
. . .
Media search enabled for multiple database support. User debug mode enabled for link search. Indexing of https:// sites enabled. Bug fixed for applications not using the advanced search options. Bug fixed for embedded application. Bug fixed for result sorting (By URL names). Bug fixed in 'More results from URL'. Bug fixed for 'Usese . . .
. . .
in 'Use blacklist to prevent index of pages'. Involved files that have been modified / added for this debug release: /search.php /search_ini.php /admin/admin.php /admin/configset.php /admin/spider.php /admin/spiderfuncs.php /admin/url_backup.php /admin/settings/backup/Sphider-plus_default-configuration.php /admin/thumbs/.htaccess . . .
. . .
/include/searchfuncs.php /include/search_links.php /include/search_media.php /templates/html/ all files Version: 2.6 Release date: March 08, 2011 Build up with Sphider: v.1.3.5 Compared with Sphider-plus version 2.5, the following items have been added / modified: New feature: Result output is available now also as an XML file. If requested in . . .
. . .
in search.php script, the results will be presented as an XML file in /xml/. For details see the documentation chapter: XML result output. New feature: Index only parts of a page by <div id='abc'>. If enabled in Admin settings, the values as defined in the list-file /include/common/divs_use.txt will be used to index only the content between . . .
. . .
be used to index only the content between <div id='abc'> and </div>. This is the opposite function to: Ignoring parts of a page by <div id='abc'>, which is controlled by the list file /include/common/divs_not.txt. For details see the documentation chapter: Indexing only parts of a page by <div id='abc'>. New feature: . . .
. . .
activated by selecting any of the 5 databases and any set of tables in the db’s. New feature in Admin backend: 'Search' functions are available now in order to query for: - sites - links - keywords - categories New Admin setting: Define number of sites shown per page in Admin backend (pagination 10, 20, 30, 50, 100). Used for: - Sites view . . .
. . .
30, 50, 100). Used for: - Sites view - Approve URLs - Banned domains - Statistic results Improved Admin settings: The table in Admin backend 'Sites' view could be sorted: - by index-date, latest on top - by index-date, oldest on top - by title as personally defined when adding the site - in alphabetic order (URL) New feature: Additional option . . .
. . .
Additional option to Re-index only the sites that are currently shown in 'Sites' view. By selecting (pagination) 10, 20, 30, 50 or 100 sites per page, it is possible to re-index only the URLs presented on page 1, and later on those of page 2 etc. New Admin setting: Obligatory use the preferred charset as defined in 'General Settings' for . . .
. . .
as defined in 'General Settings' for indexing. The corresponding option is to be found in sites 'Edit' option, so that individual sites could be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the . . .
. . .
of debug mode for Admin backend and User interface. New Admin setting: Do not index the full text. If activated, only the page 'Title', the 'Keywords' Meta tag, as well as the 'Description' Meta tag will be indexed. Nevertheless, links found in full text will be followed. New feature: Queries containing ' && ' will overwrite the advanced . . .
. . .
/include/search_10.php /include/search_20.php /include/search_30.php /include/search_40.php /include/search_50.php /include/search_links.php /include/search_media.php /include/searchfuncs.php /include/show_id3.php /include/suggest.php /include/common/audio.txt /include/common/divs.txt /include/IDS/Config/Config.ini.php . . .
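
The version 2.6 feature "Index only parts of a page by <div id='abc'>" can be illustrated with a minimal Python sketch. This is not Sphider-plus code (which is written in PHP); the class DivTextExtractor, the function extract_div_text and the way nested divs are counted are assumptions made for illustration only. The list of wanted ids stands in for the entries of /include/common/divs_use.txt.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text found inside <div> elements whose id is in wanted_ids."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.depth = 0    # > 0 while inside a wanted div; counts nested divs
        self.chunks = []  # text fragments that would be passed to the indexer

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside a wanted div
        elif dict(attrs).get("id") in self.wanted_ids:
            self.depth = 1   # entering a wanted div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_div_text(html, wanted_ids):
    """Return only the text that sits inside the wanted divs."""
    parser = DivTextExtractor(wanted_ids)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Counting nesting depth keeps the extractor inside the wanted div until its own closing tag is reached, even when further divs are nested inside it.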

19.   Sphider-plus - The PHP Search Engine Visit in a new window


5077  Introduction Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.7 Release date: October 18, 2011 Build up with Sphider: v.1.3.5 Compared with version 2.6, the following modifications have been added: New indexing feature: Re-indexing could be performed periodically. Once . . .
. . .
modifications have been added: New indexing feature: Re-indexing could be performed periodically. Once started, this mode will automatically re-index all sites periodically. The time interval is Admin selectable for 3 hours, 12 hours, 1 day, 1 week or 1 month. Also the count of periodically performed re-indexing procedures is Admin . . .
. . .
month. Also the count of periodically performed re-indexing procedures is Admin selectable. For details see chapter: Periodical Re-indexing. New feature for media search: Find media results not only by media 'title', but also by EXIF and ID3 info. To be activated in Admin backend. New option in Admin settings called: "Use string list of 'URL Must . . .
. . .
of the involved sites and pages (links) will be prevented. In order to erase all sites and all pages completely, it might become necessary to uncheck this option. Improved search form. Now offering separated search buttons for 'text' and 'media' queries, as well as a button for combined search. Improved search procedure for combined search of . . .
. . .
queries, as well as a button for combined search. Improved search procedure for combined search of text and media, in order to speed up the search procedure. Improved delete function in Admin backend: If a site is deleted from the admin backend, now also all keyword relationships to that site are withdrawn from the database. Site-specific . . .
. . .
admin backend now also all keyword relationships to that site are withdrawn from the database. Site-specific links category relationships and other dependencies like registrations in temporary and pending tables had been already observed before. Improved Admin search function: Searching for 'Sites' the result listing now will present also . . .
. . .
function: Searching for 'Sites' the result listing now will present also the 'Options' button to select Edit Re-Index Erase & Re-index Erase Delete Pages Browse and Statistics Improved index procedure for media indexing: No longer accepting dead links. In order to become indexed the media file must be present. Improved index . . .
. . .
procedure to cooperate with those servers that do not accept basic authentication strings. Improved index procedure: If the 'User Agent String' as defined in Sphider-plus Admin backend is not accepted by the site to be indexed Sphider-plus will use a standard browser HTTP_USER_AGENT to connect to the site. New algorithm to delete the content of . . .
. . .
PHP function strip_tags(); now also unclosed and invalid tags will be observed during index procedure. As result also the text following an unclosed or invalid tag will become indexed. This part of the full text was cut off by the PHP function strip_tags(). Modified index procedure: The instructions 'RESET QUERY CACHE' and 'FLUSH TABLE' will . . .
. . .
If categories are available as per default the new sites are placed in category 'none'. Improved search function: If in admin backend the option 'Delete special characters like dots commas exclamation and question marks etc. as part of words' is activated also the search query will be cleaned from secondary characters. Consequently queries . . .
. . .
is activated also the search query will be cleaned from secondary characters. Consequently queries like 'book: kellner' and 'kellner rolf' will no longer fail. This modification will not be active for 'Phrase' search. Improved search function for queries containing hyphens. Improved HTML files. Now loading faster the search form. Improved . . .
. . .
<div id='abc'>" for multiple nested divs. Involved files that have been modified / added for this release: /addurl.php /search_ini.php /admin/admin.php /admin/admin_header.php /admin/admin_search.php /admin/auto_index.php /admin/db_common.php /admin/configset.php /admin/index_media.php /admin/messages.php /admin/spider.php . . .


20.   Sphider-plus - The PHP Search Engine Visit in a new window

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.8 Release date: March 31, 2012 Build up with Sphider: v.1.3.5 In front of version 2.7 the following modifications have been added: New feature: Same results for queries typed with pure vowels or with accents. Will deliver the . . .
. . .
Same results for queries typed with pure vowels or with accents. Will deliver the same results for queries like: cafe and café . To be activated in Admin backend. New feature for AND and OR search: If the length of the text extract in result listing is too short to highlight all search words, additional text extract are build up to highlight . . .
. . .
search words, additional text extract are build up to highlight all search words of the total query. New feature: Besides bulk Re-indexing of all sites, the periodical Re-indexer is now available also site specific. To be activated individual in "Options" menu of each site. New feature: Bound the length of full text indexed at each page. Will . . .
. . .
indexed at each page. Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler . . .
. . .
New option to be set in Admin backend: Block all queries sent by crawler known to be evil. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New option to be set in Admin backend: Delete special characters inside of words. Underscores, hyphens and symbols like ‘ ・ “ etc. as part of words are . . .
. . .
and symbols like ‘ ・ “ etc. as part of words are deleted. So only the pure words will be indexed. New feature: The indexer could be interrupted periodically after indexing a predefined count of pages (links). Configurable in Admin settings. New option to be activated in Admin backend: Convert all kind of double quotes like “ and ” into . . .
. . .
all kind of double quotes like “ and ” into standard quotes " New option to be activated/disabled in Admin backend: Show time elapsed (to fetch the results) in result header. New option to be activated/disabled in Admin backend: In result listing show the actual result number of each result. New option to be activated/disabled in Admin backend: . . .
. . .
In result listing show the URL of each result in a separate row. New option to be selected in Admin backend : Define the default chronological order for media result listing - By title (alphabetic) - By image size - By 'Last queried' - By 'Most popular' - By file suffix New option to be activated in Admin backend : Limit the amount of . . .
. . .
per page. The image results are counted separately from audio video streams. New method of thumbnail storage: The thumbnails are no longer stored in a sub folder of the Sphider-plus installation, but now are stored in database table "media" in field "thumbnail". Improved media search: AND, OR and TOLERANT modes are now selectable for media . . .
. . .
now is working alternately also for <div class='abc'>. Besides the string list in divs_not.txt file, the file now alternatively may contain regexp patterns. Improved option: Indexing only parts of a page defined by <div id='abc'> now is working alternately also for <div class='abc'>. Besides the string list in . . .
. . .
now is working alternately also for <div class='abc'>. Besides the string list in divs_use.txt file, the file now alternatively may contain regexp patterns. Presenting of multiple hits in result listing enabled now also for strict search. Language files added for Norwegian (nynorsk and bokmål). Thanks to Geir Kleiveland. White- . . .
. . .
search. Language files added for Norwegian (nynorsk and bokmål). Thanks to Geir Kleiveland. White- and blacklist, as well as the other lists in /include/common/ folder now are tolerating (ignoring) blank rows. Improved index procedure, now also accepting links containing "blank" characters. Improved "Erase & Re-index all" function. Now . . .
. . .
For details see chapter: "Greek language support" Improved parser for RSS v.2.0 feeds. Bug fixed in index procedure, which prevented correct indexing of text placed behind multiple tabs. Bug fixed in search function for searching in multiple databases. Bug fixed in result listing when presenting multiple hits per page. Some more small bugs . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .

Release and Legal Info Installation Documentation Change Log [ Change Log Summary ] [ Outdated version ] Version: 2.8 Release date: March 31, 2012 Built with Sphider: v.1.3.5 Compared with version 2.7, the following modifications have been added: New feature: Same results for queries typed with plain vowels or with accents. Will deliver the . . .
. . .
Same results for queries typed with plain vowels or with accents. Will deliver the same results for queries like: cafe and café. To be activated in Admin backend. New feature for AND and OR search: If the text extract in result listing is too short to highlight all search words, additional text extracts are built up to highlight . . .
. . .
search words, additional text extracts are built up to highlight all search words of the total query. New feature: Besides bulk Re-indexing of all sites, the periodical Re-indexer is now also available per site. To be activated individually in the "Options" menu of each site. New feature: Bound the length of full text indexed at each page. Will . . .
. . .
indexed at each page. Will limit the indexed keywords to be extracted only from the first part of the full text, if set to values like 500 or 1000. New option to be set in Admin backend: Block all queries sent by Meta search engines like Google, MSN, Amazon, etc. For details see chapter: "Prevent queries from Meta search engines and crawler . . .
. . .
New option to be set in Admin backend: Block all queries sent by crawlers known to be evil. For details see chapter: "Prevent queries from Meta search engines and crawler known to be evil." New option to be set in Admin backend: Delete special characters inside of words. Underscores, hyphens and symbols like ‘ ・ “ etc. as part of words are . . .
. . .
and symbols like ‘ ・ “ etc. as part of words are deleted, so only the pure words will be indexed. New feature: The indexer can be interrupted periodically after indexing a predefined count of pages (links). Configurable in Admin settings. New option to be activated in Admin backend: Convert all kinds of double quotes like “ and ” into . . .
. . .
all kinds of double quotes like “ and ” into standard quotes ". New option to be activated/disabled in Admin backend: Show time elapsed (to fetch the results) in result header. New option to be activated/disabled in Admin backend: In result listing show the actual result number of each result. New option to be activated/disabled in Admin backend: . . .
. . .
In result listing show the URL of each result in a separate row. New option to be selected in Admin backend: Define the default chronological order for media result listing - By title (alphabetic) - By image size - By 'Last queried' - By 'Most popular' - By file suffix New option to be activated in Admin backend: Limit the amount of . . .
. . .
per page. The image results are counted separately from audio video streams. New method of thumbnail storage: The thumbnails are no longer stored in a sub folder of the Sphider-plus installation, but now are stored in database table "media" in field "thumbnail". Improved media search: AND, OR and TOLERANT modes are now selectable for media . . .
. . .
now is working alternately also for <div class='abc'>. Besides the string list in divs_not.txt file, the file now alternatively may contain regexp patterns. Improved option: Indexing only parts of a page defined by <div id='abc'> now is working alternately also for <div class='abc'>. Besides the string list in . . .
. . .
now is working alternately also for <div class='abc'>. Besides the string list in divs_use.txt file, the file now alternatively may contain regexp patterns. Presenting of multiple hits in result listing enabled now also for strict search. Language files added for Norwegian (nynorsk and bokmål). Thanks to Geir Kleiveland. White- . . .
. . .
search. Language files added for Norwegian (nynorsk and bokmål). Thanks to Geir Kleiveland. White- and blacklist, as well as the other lists in /include/common/ folder now are tolerating (ignoring) blank rows. Improved index procedure, now also accepting links containing "blank" characters. Improved "Erase & Re-index all" function. Now . . .
. . .
For details see chapter: "Greek language support" Improved parser for RSS v.2.0 feeds. Bug fixed in index procedure, which prevented correct indexing of text placed behind multiple tabs. Bug fixed in search function for searching in multiple databases. Bug fixed in result listing when presenting multiple hits per page. Some more small bugs . . .
. . .
/templates/html/020_search-form.html /templates/html/021_search-form.html /templates/html/022_search-form.html /templates/html/050_result-header.html /templates/html/060_text-results.html /templates/html/070_more-results.html /templates/html/090_footer.html /templates/html/120_media-only results.html /templates/html/140_image-results.html . . .
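
The change log entries above describe two query normalizations: delivering the same results for cafe and café, and converting typographic double quotes into the standard quote. A minimal sketch of both is given here in Python, not the project's PHP. It uses Unicode decomposition to strip accents, which is one common technique and not necessarily the one Sphider-plus uses. The function names and the set of quote characters are assumptions.

```python
import unicodedata

# Assumed set of typographic double quotes; the source names only “ and ”.
QUOTE_MAP = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})

def fold_accents(text):
    """Strip combining marks so that 'café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_quotes(text):
    """Replace typographic double quotes with the standard ASCII quote."""
    return text.translate(QUOTE_MAP)

def normalize_query(query):
    """Apply both steps and lower-case the result before matching."""
    return fold_accents(normalize_quotes(query)).lower()
```

Applying `normalize_query` to both the indexed keywords and the incoming query would make "Café" and "cafe" match, at the cost of no longer distinguishing words that differ only by accent.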


Most popular queries

Query Count Results Last queried
sphider 5 63 2024-04-19 11:04:35
cookies 3 2 2024-04-19 11:04:23
debug 2 14 2024-04-18 21:31:57
germany 2 1 2024-04-17 16:46:21
suggest 2 20 2024-04-19 02:20:12
