Sphider-plus



Displaying results 1 - 20 of 21 matches

1.   Sphider-plus - The PHP Search Engine

sitemap.xml output sitemap list offering file/page suffixes - IDS log offering: IP, host, query, impact, involved tags, date and time of intrusion. - Flood attempts log offering: IP, query, date and time of flood attempt. - Auto Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini . . .
. . .
standard, by either putting a robots.txt file into the root directory of the server, or adding the necessary meta tags to the page headers. This directive can be temporarily overridden, per site, for the next index procedure by the advanced option: Temporary ignore 'robots.txt' 4.2 Must include / must not include string list A powerful . . .
. . .
for each URL individually. 4.3 Ignoring links Sphider-plus respects the rel="nofollow" attribute in <a href..> tags, so for example the link foo.html in <a href="foo.html" rel="nofollow"> is ignored. Also, if the nofollow flag is set in the header of a site, this link will not be followed. This directive can be temporarily overridden . . .
. . .
a header, footer or a menu). Any part of a page between <!--sphider_noindex--> and <!--/sphider_noindex--> tags is not indexed; however, links in it are followed. 4.5 Ignoring parts of a page by <div id='abc'> Ignoring parts of a page by the <!--sphider_noindex--> tags requires direct access to the page, because the tags need to . . .
. . .
will be used to delete the content between <div id='abc'> and </div>. Nevertheless, links inside of the div tags will be followed. The values in the list-file are not case-sensitive. Values in this common list may end with a wildcard, so that 'menu*' will work for ids like menu1, menu2, menu_left, etc. Multiple and nested . . .
. . .
be used to index only the content between <div id='abc'> and </div>. Nevertheless, links outside of the div tags will be followed. The values in the list-file are not case-sensitive. Values in this common list may end with a wildcard, so that 'menu*' will work for ids like menu1, menu2, menu_left, etc. Multiple and nested . . .
. . .
the HTML element and tag references at http://www.w3schools.com/html/html_elements.asp http://www.w3schools.com/tags/ If enabled in Admin settings, the values as defined in the list-file /include/common/elements_not.txt will be used to delete the content between <tagname> and </tagname>. Nevertheless, links inside of the tags . . .
. . .
a relative path, but is not allowed to refer to a different domain. Unfortunately, the creation of canonical link tags needs to be done manually. So special care has to be taken that other directives like robots.txt or rel="nofollow" will not prevent the crawling of the canonical origin. 5. UTF-8 Support and 'Preferred charset' Starting with . . .
. . .
be powerful. First of all: the complete full text, and all header information like title, keywords and description tags, need to be converted into Unicode. The consequence is an increase in the time required for indexing. As also suggested by Yiannes [pikos], three steps are integrated to realize this procedure: 1. Detect the charset of site, page or . . .
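
The id list matching described in the snippets above (values are not case-sensitive and may end with a wildcard such as 'menu*') could be sketched as follows. This is a minimal Python illustration under stated assumptions, not Sphider-plus code: the function name id_matches and the sample list entries are hypothetical, and only a trailing '*' is treated as a wildcard.

```python
def id_matches(div_id, patterns):
    """Return True if div_id matches any list entry, ignoring case.

    An entry ending in '*' matches every id that starts with the part
    before the '*'; any other entry must match the whole id.
    """
    candidate = div_id.lower()
    for entry in patterns:
        entry = entry.strip().lower()
        if not entry:
            continue  # skip blank lines in the (hypothetical) list-file
        if entry.endswith("*"):
            if candidate.startswith(entry[:-1]):
                return True
        elif candidate == entry:
            return True
    return False


# Hypothetical list-file content, one value per entry.
ignore_list = ["menu*", "Footer"]

for div_id in ["menu1", "Menu_Left", "footer", "content"]:
    print(div_id, id_matches(div_id, ignore_list))
```

With this sample list, menu1, Menu_Left and footer match, while content does not.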

3.   Sphider-plus - The PHP Search Engine

correctly presenting queries containing quotes. Improved search option 'Tolerant Search'. Updated detection of ID3 tags during index procedure. Updated file lists for IPs and suffixes to be ignored during index procedure. Some small bugs fixed. Improved PHP error handling for warnings, deprecated, notices etc. Involved folders and files that have . . .
. . .
the readme.pdf documentation. Improved algorithm for 'wildcard' search function. Updated algorithm to extract ID3 tags. Bug fixed in option 'Use private sitemap instead of global sitemap.xml'. Some small bugs fixed. Prepared to work in PHP7.4 environment. Involved folders and files that have been modified / added for this release: /search.php . . .
. . .
control structure include(). Improved input verification for admin. Improved index procedure to detect frameset tags. Improved admin backend to cooperate with 'Windows' based server. Cache control implemented for Internet browser, to prevent their caching. Updated scripts for ID3 extraction. Now parsing ID3v1(v1.0 & v1.1) as well as ID3v2(v2.2 . . .
. . .
3.2016d October 11, 2016 Build up with Sphider: v.1.3.5 New feature: Ignore the content inside of 'option' tags like , which are placed in the body part of the HTML. To be activated in admin backend. New feature: Besides JSON and XML result output file, now also an RSS feed is created. Separately for text and media results. New feature: In . . .
. . .
settings, domains could be searched with and also without the www prefix. New feature: Ignore the content of meta tags like , which are placed in the body part of the HTML. Nevertheless, all links will be followed. To be activated in admin backend. New feature: Ignore the content inside of noscript tags like <noscript> THIS CONTENT . . .
. . .
3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', use admin edited title and description. Will be overwritten, if one of the following new features is selected: - Use admin edited title, if count of words in HTML title tag is less than: xxx - Use . . .
. . .
in 'Ignoring parts of a page defined by <div id=> or <div class=>' in conjunction with nested div tags. Bug fixed in 'Activate/disable database' menu for multiple databases containing the same table prefix. Bug fixed in 'Import / Export URL list' for multiple categories per site. Some small bugs fixed. Involved files that have . . .
. . .
class='this_value'> and </ul> will be ignored; however, links in it are followed. Multiple and nested ul tags are handled. Values in common list may end with a wildcard, so that 'menu*' will work for menu1, menu2, menu_left, etc. Usable also for external pages, if it is impossible to add the <!--sphider_noindex--> tags. Details . . .
. . .
is available (page contains less than x words). Improved index procedure: - Now extracting the 'option' values in tags. - Now splitting words also at multiple special characters. - Remove all tag content from full text. - Improved charset detection. Improved option 'Convert all kind of accents like á, ç, ê, ì, ü, into their basic vowels' Now . . .
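
The option 'Convert all kind of accents like á, ç, ê, ì, ü, into their basic vowels' mentioned above can be illustrated with Unicode decomposition. The following is a minimal Python sketch of one common technique, not the Sphider-plus implementation; the function name strip_accents is an assumption.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, keeping the base letters."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("á, ç, ê, ì, ü"))  # prints "a, c, e, i, u"
```

Letters that have no decomposition into base letter plus accent (for example ø or ß) pass through unchanged with this approach.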

4.   Sphider-plus - The PHP Search Engine

in result listing, if the found keywords are not part of the full text, but were found only in URL or meta tags. You may disable this warning message in Admin / Settings / Search Settings by unchecking the checkbox: Show warning message if query was not found in full text; but only in 'Title' of page, 'Keywords' 'Meta tags' or 'URL' . . .
. . .
size of page content, which could be indexed, but the max. size for the extracted full text. Without images, HTML tags, JavaScript etc. 16 MiB of pure text, extracted by the Sphider-plus index procedure. Error message: MySQL failure: MySQL server has gone away (option 2) Depending on the MySQL server environment, also the following solution . . .
. . .
part of a page gets indexed. The rest of the text got lost. Why? This might be caused by incorrectly defined HTML tags. If a tag is not closed correctly, indexing for that page ends at the incorrect tag. Words inside of tags are not part of the full text. But only the text of a page should be indexed. The indexer is using . . .
. . .
part of the full text. But only the text of a page should be indexed. The indexer is using the PHP function strip_tags() to delete the tags from the page content. Quoted from the PHP manual: "Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected." In order to . . .
. . .
http://validator.w3.org This problem is solved since Sphider-plus version 2.7, because the PHP function strip_tags() is no longer used. A new function was created, which also accepts unclosed and invalid HTML and PHP tags. Indexing from command line shows "Fatal error: Call to undefined function getHttpVars()" The indexation script is . . .
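
To illustrate the idea of removing tags without losing the text that follows an unclosed tag, here is a minimal Python sketch. It is not the function used by Sphider-plus (whose source is not shown here); the name strip_tags_tolerant and both regular expressions are assumptions chosen for the example.

```python
import re

# A complete tag: '<' ... '>' with no other angle bracket in between.
COMPLETE_TAG = re.compile(r"<[^<>]*>")
# The start of a tag that was never closed, e.g. '<b' or '</div'.
DANGLING_TAG_START = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")


def strip_tags_tolerant(html):
    """Remove tags but keep the text that follows an unclosed tag."""
    text = COMPLETE_TAG.sub(" ", html)        # pass 1: well-formed tags
    text = DANGLING_TAG_START.sub(" ", text)  # pass 2: unclosed tag openers
    return " ".join(text.split())             # collapse whitespace


print(strip_tags_tolerant("start <b unclosed text <i>end</i>"))
```

The sketch only drops the opener of an unclosed tag, so any attributes written after it remain in the output as text. It also does not treat comments or the content of script and style elements specially; a real indexer would need to handle those cases.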

5.   Sphider-plus - The PHP Search Engine

of 20. New option: Check correct converting of content into UTF-8 Will detect invalid charset definitions in Meta tags of HTML header, or invalid charset definition supplied via HTTP by the client server. If an invalid charset is detected, the index procedure will be aborted for the affected link. New feature: The addurl form now will only . . .
. . .
other script directives. New options in Admin 'Settings' menu: Follow URL redirections, which are invoked by body tags like <BODY onLoad = "parent.location = 'home.asp'"> 'HTTP-EQUIV= . . refresh . . content= . . .' and several other tags New option in Admin 'Settings' menu: Obey refresh delay directives, placed in meta tags like <meta . . .
. . .
New option in Admin 'Settings' menu: Do not index comment parts and scripts outside the HTML tags. New option in Admin 'Settings' menu: If not already exist, add a final slash to the path for all detected links. If a file name exists as part of the path, this option will be bypassed. Also, if the http request for the main . . .
. . .
characters are indicated by the high bit set to 1) UTF-8 support implemented for media titles, file names and ID-3 tags. SQLi connector implemented between PHP and a MySQL database. Performed by OOP. Bug fixed in option: Do not index the full text. Bug fixed for URLs containing CP1252 coded paths. Bug fixed in detection of www/non www links. Now . . .
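
The check described above (abort indexing of a link when the declared charset is invalid) might look like the following Python sketch. It is an illustration under assumptions, not the Sphider-plus code: the function name to_utf8 and the convention of returning None to signal "skip this link" are hypothetical.

```python
def to_utf8(raw, declared_charset):
    """Convert raw page bytes to UTF-8, or return None if that fails.

    None signals that the declared charset is unknown or does not fit
    the content, so the caller can skip the page.
    """
    try:
        text = raw.decode(declared_charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: charset name is not known to Python.
        # UnicodeDecodeError: the bytes are not valid in that charset.
        return None
    return text.encode("utf-8")


page = "Grüße".encode("iso-8859-1")
print(to_utf8(page, "iso-8859-1"))       # converted bytes
print(to_utf8(page, "utf-8"))            # None: bytes are not valid UTF-8
print(to_utf8(page, "no-such-charset"))  # None: unknown charset name
```

Single-byte charsets such as ISO-8859-1 accept any byte sequence, so a wrong declaration of that kind cannot be detected by decoding alone.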

6.   Sphider-plus - The PHP Search Engine

search results. RDF, RSD, RSS and Atom feed support Index and search of feed content, including RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives. Various search modes Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media . . .
. . .
<SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT> Follow header redirections, refresh tags and canonical links Automatic forwarding for the indexer. Follow links found in JavaScript and index also the content of document.write Will index JavaScript commands. Detect and follow links like: document.write(' <a . . .
. . .
and </div> as well as <div class='this_value'> and </div> will be ignored. However, links inside the tags are followed. Multiple and nested divs are handled. The same feature is available for classes in ul and pre tags. Index only parts of a site. < id/class value driven A common list of div id values is used to select parts of . . .

URL: http://sphider-plus.eu/ - 25.6 kb

8.   Sphider-plus - The PHP Search Engine

counter for remaining input in 'title' and 'description' field. Phrase search is enabled now also for title tags, not only for full text. Improved suggest framework: For search in categories, the suggestions now will be presented with respect to the pre-selected category. For 'Search with wildcards' now the complete word is highlighted in . . .
. . .
Admin setting in section 'Suggest Options': For 'Media search' get suggestions also from EXIF info and ID3 tags. Files for database setting and script configuration are protected now against direct client access by pre-defining a named constant. Updated Swedish language file. Thanks to Holger Gremminger. Bug fixed in 'Search for . . .

9.   Sphider-plus - The PHP Search Engine

directives for feed content. Additional item in Admin settings: Index 'Dublin Core' and other individually marked tags in RDF feeds. Additional item in Admin settings: Follow the 'preferred (true/false)' directive in RSD feeds. Detection of encoding (charset) added for XML and XHTML files. New item in Admin settings: During index procedure, . . .
. . .
menu2, menu_left, etc. Usable also for external pages, if it is impossible to add the <!--sphider_noindex--> tags. Common 'URL Must include' and 'URL must Not include' rules, which are valid for all new sites, may be placed now in 2 files. The contents will be transferred to the corresponding option fields when calling 'Add site' in Admin . . .

10.   Sphider-plus - The PHP Search Engine

System (IDS) included to protect Sphider-plus against hacking attempts. It includes extensive regex rules to tags like XSS, SQLI, RCE, LFI, DT, CSRF, LDAP Injections, and DoS. Admin selectable, the IDS will block further user input, create a log-file, present a warning message, or even block any traffic from IPs known to be evil. For details . . .
. . .
indexed, but only the link text (titles) of all links. Will also work for image links and their 'title' and 'alt' tags: title="this text", alternatively alt="this text". Result listing presents the (active) links with respect to the page at which they were found. If searching for a link text, the different search modes are available. New feature . . .

11.   Sphider-plus - The PHP Search Engine

use a standard browser HTTP_USER_AGENT to connect to the site. New algorithm to delete the content of HTML and PHP tags. No longer using the PHP function strip_tags(); now unclosed and invalid tags are also handled during the index procedure. As a result, the text following an unclosed or invalid tag is also indexed. This part of the full . . .
. . .
an unclosed or invalid tag will become indexed. This part of the full text was cut off by the PHP function strip_tags(). Modified index procedure: The instructions 'RESET QUERY CACHE' and 'FLUSH TABLE' will only be used, if the following Admin setting is activated: 'Clean resources during index / re-index and also for search function' Improved . . .

12.   Sphider-plus - The PHP Search Engine

class='this_value'> and </ul> will be ignored; however, links in it are followed. Multiple and nested ul tags are handled. Values in common list may end with a wildcard, so that 'menu*' will work for menu1, menu2, menu_left, etc. Usable also for external pages, if it is impossible to add the <!--sphider_noindex--> tags. Details . . .

13.   Sphider-plus - The PHP Search Engine

3.2015a the following modifications have been added: New feature for index procedure: - Instead of the HTML tags 'title' and 'description', use admin edited title and description. Will be overwritten, if one of the following new features is selected: - Use admin edited title, if count of words in HTML title tag is less than: xxx - Use . . .
. . .
in 'Ignoring parts of a page defined by <div id=> or <div class=>' in conjunction with nested div tags. Bug fixed in 'Activate/disable database' menu for multiple databases containing the same table prefix. Bug fixed in 'Import / Export URL list' for multiple categories per site. Some small bugs fixed. Involved files that have . . .

14.   Sphider-plus - The PHP Search Engine

settings, domains could be searched with and also without the www prefix. New feature: Ignore the content of meta tags like , which are placed in the body part of the HTML. Nevertheless, all links will be followed. To be activated in admin backend. New feature: Ignore the content inside of noscript tags like <noscript> THIS CONTENT . . .

15.   Sphider-plus - The PHP Search Engine

or <title> tag. If the search string is found only in the title or URL, but not in the HTML body or meta tags, there is no short description for that URL and no possibility to highlight the search string. A warning message will be displayed instead: Search string was found only in page title or Url. This mod is Admin selectable. Index . . .
. . .
in Admin Search Settings is presented for Admin determination. Dynamic adaptation of <title> and <h1> tags. In order to create an individual title for the result pages, a new input field in Admin settings 'Search Settings' is presented. Additionally the result page <title> in HTML-header is provided with - User defined title - . . .

17.   Sphider-plus - The PHP Search Engine

the canonical link will be indexed, but links found there will be ignored. New feature: Obey the 'refresh' meta tags as part of HTML headers. Now following the redirection and delayed indexing. New option: Support UTF-16 coded sites. Will convert UTF-16 coded sites into UTF-8. To be activated in Admin settings New option: For index procedure . . .

18.   Sphider-plus - The PHP Search Engine

is available (page contains less than x words). Improved index procedure: - Now extracting the 'option' values in tags. - Now splitting words also at multiple special characters. - Remove all tag content from full text. - Improved charset detection. Improved option 'Convert all kind of accents like á, ç, ê, ì, ü, into their basic vowels' Now . . .

19.   Sphider-plus - The PHP Search Engine

3.2016d October 11, 2016 Build up with Sphider: v.1.3.5 New feature: Ignore the content inside of 'option' tags like , which are placed in the body part of the HTML. To be activated in admin backend. New feature: Besides JSON and XML result output file, now also an RSS feed is created. Separately for text and media results. New feature: In . . .

20.   Sphider-plus - The PHP Search Engine

include(). Improved input verification for admin. Improved index procedure to detect frameset tags. Improved admin backend to cooperate with 'Windows' based server. Cache control implemented for Internet browser, to prevent their caching. Updated scripts for ID3 extraction. Now parsing ID3v1(v1.0 & v1.1) as well as ID3v2(v2.2 . . .

Most popular queries

Query      Count   Results   Last queried
sphider    3       63        2024-04-19 05:08:07
cookies    2       2         2024-04-18 17:12:37
debug      2       14        2024-04-18 21:31:57
germany    2       1         2024-04-17 16:46:21
suggest    2       20        2024-04-19 02:20:12
