. . . in not encoded PDFs. Implemented as a pure PHP script, the new converter no longer requires its individual path to be defined. New spreadsheet converter scripts for .xls and .xlsx files. New OpenDocument text converter script for .odt files. New PowerPoint/Impress converter script for .pptx presentations. New converter to extract .ZIP . . .
. . . Settings = Search Settings New web service to create web shot thumbnails of each indexed page. The thumbnails are presented individually with each search result. For details about the new web service, please see chapter 5.7 of the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed. . .
. . . links found during indexing. Also ignore TLD, SLD and www. Bug fixed in 'Ignoring parts of a page defined by <div id=> or <div class=>' in conjunction with nested div tags. Bug fixed in 'Activate/disable database' menu for multiple databases containing the same table prefix. Bug fixed in 'Import / Export URL list' for multiple. . .
. . . unwanted queries. Enhanced .htaccess file. For details see item 11 in /.htaccess. Bug fixed in option 'Use list of div ids and classes to ignore the div content during index/re-index'. Some small bugs fixed. Files modified or added for this release: /.htaccess /addurl.php /search.php /search_ini.php /admin/admin.php. . .
. . . and ignore any text. To be activated in Admin settings. New option: Do not index content placed in a div with class="s-hidden", like: <div id=" . . ." class="s-hidden">this content</div>. New option: Treat localhost URLs in conformance with RFC 1808. If activated, http:/localhost/ will always be used as root directory. Otherwise URLs like. . .
. . . 4.3 Ignoring links 4.4 Ignoring parts of a page by <!--sphider_noindex--> 4.5 Ignoring parts of a page by <div id='abc'> 4.6 Indexing only parts of a page by <div id='abc'> 4.7 Ignore HTML elements defined by <tagname> . . . . </tagname> 4.8 Index only HTML elements defined by <tagname> . . . . </tagname . . .
. . . Add Site - Index only the new - Re-index all - Re-index only preferred URLs - Erase Re-index (available also for individual URLs) - Import/export URL list - Approve sites - Banned domains Categories: - Add, edit, delete - Create new subcategory under Index: - Basic indexing options - Advanced options Clean: - Clean keywords not associated with. . .
. . . Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Russian, Spanish and Swedish. To be activated individually for the language that needs to be indexed. The corresponding common word list (holding the stop words that are not stored in the database) is activated automatically together with the stemming language. For Chinese, Greek and. . .
. . . and aborted in Admin backend, by selecting the 'Periodical Re-index' submenu in 'Sites' view. For site-specific re-indexing, the periodical Re-indexer can instead be started and aborted in the "Options" menu of each site. 2.5 Preferred Re-indexing Each new URL added to the Admin backend can be assigned a priority level. This level. . .
. . . will be incremented by each thread. If multithreaded indexing is not activated, the ID will be set to '0'. The individual threads are activated by means of the Admin dialog. For example, if 'Erase & Re-index' is selected, the threads can be started in sequential order after the 'Erasing' dialog. It is not necessary to invoke all. . .
. . . for all indexed media. Open the indexed media with the corresponding player software. Multiple database support: Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined . . .
. . . in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of tables. Result cache: Greatly reduced response time for queries already cached. Controller to keep the 'Most Popular Queries' always in cache. Separate caches for text and media. . .
. . . containing the corresponding level, will be re-indexed. Erase, Re-index and Continue suspended index procedures: Individual (site specific) or bulk update of database. Support of XML product feeds: Index and search of feed content, including formatting of the search results. RDF, RSD, RSS and Atom feed support: Index and search of feed content,. . .
. . . User IP, Country code, Host name, Last queried, Top keywords, etc. Segmentation of Chinese and Korean words: Divides phrases like 帽子和服装 into the base words 帽子, 和 and 服装, so that all of them become searchable. Dictionaries with 106,800 radicals. Segmentation of Japanese words: Segmentation of 5,724 kanji (new, old and half width), hiragana,. . .
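The dictionary-based splitting mentioned above can be illustrated with forward maximum matching. This is only a minimal sketch: the excerpt does not say which segmentation algorithm Sphider-plus uses, and the function name `segment`, the `max_len` parameter and the three-entry `DEMO_DICTIONARY` are assumptions made for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Split text into dictionary words by forward maximum matching.

    Assumption: this is a generic textbook technique, not necessarily
    the algorithm used by Sphider-plus.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical demo dictionary; a real one would hold many thousands of entries.
DEMO_DICTIONARY = {"帽子", "和", "服装"}
print(segment("帽子和服装", DEMO_DICTIONARY))
```

With a larger dictionary the greedy choice can pick a different split (an entry such as 和服 would win over 和), which is why real segmenters usually add frequency or context rules on top.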
. . . site. <div> id/class value driven, <ul> class value driven, <pre> class value driven. A common list of div id values is used to ignore parts of a page. Content between <div id='this_value'> and </div> as well as <div class='this_value'> and </div> will be ignored. However, links inside the tags are followed. Multiple and. . .
. . . Follow CDATA directives for feed content. Additional item in Admin settings: Index 'Dublin Core' and other individually marked tags in RDF feeds. Additional item in Admin settings: Follow the 'preferred (true/false)' directive in RSD feeds. Detection of encoding (charset) added for XML and XHTML files. New item in Admin settings: During . . .
. . . settings, the count of results will be limited for text and media results. New item in Admin settings: Use list of div ids to ignore the corresponding div content during index/re-index. A common list of div id values is used to ignore parts of a page. Content between <div id='this_value'> and </div> will be ignored, however links in it. . .
. . . <div id='this_value'> and </div> will be ignored, however links in it are followed. Multiple and nested divs are handled. Values in the common list may end with a wildcard, so that 'menu*' will work for menu1, menu2, menu_left, etc. Usable also for external pages, if it is impossible to add the <!--sphider_noindex--> tags. . . .
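The trailing-wildcard rule for list values can be sketched as follows. This is an illustration under assumptions, not the Sphider-plus implementation: the function name `is_ignored` and the case-sensitive comparison are hypothetical choices.

```python
def is_ignored(div_id, list_values):
    """Return True if div_id matches an entry of the ignore list.

    An entry ending in '*' matches every id that starts with the text
    before the '*'; any other entry must match exactly.
    Assumption: matching is case-sensitive.
    """
    for value in list_values:
        if value.endswith("*"):
            if div_id.startswith(value[:-1]):
                return True
        elif div_id == value:
            return True
    return False


# Hypothetical list content, as it might appear in the common list file.
ignore_list = ["menu*", "footer"]
for candidate in ("menu1", "menu2", "menu_left", "footer", "content"):
    print(candidate, is_ignored(candidate, ignore_list))
```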
. . . The contents will be transferred to the corresponding option fields when calling 'Add site' in Admin menu. Individually de-selectable by checkbox. Details in documentation chapter: Must include / must not include string list. Log output suppressed, if the indexer is only redirected from http://www.abc.de to http://www.abc.de/index.html. . .
. . . /include/common/must_include.txt /include/common/must_not_include.txt /include/common/not_div.txt /include/images/no_fonts.jpg /languages/ all files /templates/all folders/hdline.jpg /templates/all folders/thisstyle.css /converter/rss2html.php rss.html rss_parser.php = no longer required Attention: This version requires. . .
. . . /xml/ For details see the documentation chapter: XML result output. New feature: Index only parts of a page by <div id='abc'>. If enabled in Admin settings, the values as defined in the list file /include/common/divs_use.txt will be used to index only the content between <div id='abc'> and </div>. This is the contrary function to: . . .
. . . between <div id='abc'> and </div>. This is the contrary function to: Ignoring parts of a page by <div id='abc'>, which is controlled by the list file /include/common/divs_not.txt. For details see the documentation chapter: Indexing only parts of a page by <div id='abc'>. New feature: Individual (Admin) settings for each. . .
. . . in 'General Settings' for indexing. The corresponding option is to be found in the site's 'Edit' option, so that individual sites can be influenced. If activated, the header information like <meta http-equiv='Content-Type' content='text/html; charset=windows-1256' /> of the site to be indexed, will be overwritten by the preferred. . .
. . . Separate style sheet files are now included for the Admin backend and for the User interface. This makes it possible to customize the User style sheet without breaking the Admin design. For details see the documentation chapter: Integration of Sphider-plus into existing sites. Improved 'Did you mean' algorithm. Now searching for a wider range of. . .
. . . bulk Re-indexing of all sites, the periodical Re-indexer is now also available site specific. To be activated individually in the "Options" menu of each site. New feature: Limit the length of full text indexed on each page. If set to values like 500 or 1000, keywords are extracted only from the first part of the full text. . . .
. . . Full Index or folder depth definition - Spider can leave domain - Use preferred charset for indexing. Afterwards, site-specific settings can be made in the advanced options of each site URL. The global settings will also be used for suggested sites (addurl form). New option in Admin 'Clear' menu: Clear all entries in 'Addurl'. . .
. . . Admin 'Clear' menu: Clear all entries in 'Banned' table. Improved option: Ignoring parts of a page defined by <div id='abc'> now also works for <div class='abc'>. Besides the string list, the divs_not.txt file may now alternatively contain regexp patterns. Improved option: Indexing only parts of a page. . .
. . . file may now alternatively contain regexp patterns. Improved option: Indexing only parts of a page defined by <div id='abc'> now also works for <div class='abc'>. Besides the string list, the divs_use.txt file may now alternatively contain regexp patterns. Presenting of multiple hits in result listing. . .
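A list file that holds either plain strings or regexp patterns could be read along the following lines. How Sphider-plus tells the two kinds of entries apart is not stated in this excerpt, so the sketch assumes a simple rule: every non-empty line is tried as a regular expression, and a line that does not compile is treated as a literal string. The names `load_patterns` and `matches_any` are hypothetical.

```python
import re


def load_patterns(lines):
    """Compile list-file lines into patterns.

    Assumption: a line that is not a valid regexp is taken literally.
    """
    patterns = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        try:
            patterns.append(re.compile(entry))
        except re.error:
            patterns.append(re.compile(re.escape(entry)))
    return patterns


def matches_any(value, patterns):
    """True if the whole id or class value matches one of the patterns."""
    return any(p.fullmatch(value) for p in patterns)


# Hypothetical file content: two literals, one regexp, one blank line.
demo_patterns = load_patterns(["header", r"menu_\d+", "", "bad["])
print(matches_any("menu_12", demo_patterns))
```

Matching the whole value (rather than searching inside it) keeps a plain entry such as `header` from also catching `headers`; whether the real option behaves this way is an open assumption.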
. . . a file name and/or query. Bug fixed in option 'Crawler can leave domain'. Bug fixed in option 'Use list of div ids to ignore the div content during index/re-index'. Bug fixed in option 'Enable to decode entity coded sites into standard HTML characters'. Bug fixed in 'addurl' form, which prevented input of words containing accents in. . .
. . . any sub folder of the suggested URL will be ignored. New feature: Ignore the content of style="display:none" in div elements. Something like: <div style="display:none">ignore_this_content</div>. New feature: In order to enable immediate query input, auto focus is set to the search form. New suggest framework. The auto-complete feature of . . .
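Skipping the content of hidden div elements, including divs nested inside them, might look like the sketch below. It uses Python's standard `html.parser` and is not taken from Sphider-plus; the class name `VisibleTextExtractor`, the helper `visible_text` and the whitespace-insensitive check of the style attribute are assumptions.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text that is not inside a div styled display:none."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Count nested divs too, so the matching end tag is found correctly.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(visible_text('<div style="display:none">ignore_this_content</div><p>keep</p>'))
```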
. . . URLs containing quotes. Improved black list comparison. Improved xlsx converter. Bug fixed in option 'Use list of div ids to ignore the complete div content during index/re-index'. Bug fixed in 'Most Popular search' table at the bottom of result listing. Bug fixed while indexing <!--sphider_noindex-->. Bug fixed in result listing, while . . .