Changeset 1623
- Timestamp: 02/03/05 14:56:38
- Files:
  - trunk/swish-e/filters/SWISH/Filter.pm.in (modified) (27 diffs)
  - trunk/swish-e/perl/API.pm (modified) (28 diffs)
  - trunk/swish-e/pod/CHANGES.pod (modified) (4 diffs)
  - trunk/swish-e/pod/SWISH-CONFIG.pod (modified) (8 diffs)
  - trunk/swish-e/pod/SWISH-FAQ.pod (modified) (1 diff)
  - trunk/swish-e/pod/SWISH-RUN.pod (modified) (2 diffs)
  - trunk/swish-e/prog-bin/spider.pl.in (modified) (4 diffs)
  - trunk/swish-e/src/metanames.c (modified) (1 diff)
trunk/swish-e/filters/SWISH/Filter.pm.in
```diff
--- trunk/swish-e/filters/SWISH/Filter.pm.in (r1586)
+++ trunk/swish-e/filters/SWISH/Filter.pm.in (r1623)
@@ -125,5 +125,5 @@
 =over 4
 
-=item $filter = SWISH::Filter->new()
+=item $filter = SWISH::Filter-E<gt>new()
 
 This creates a SWISH::Filter object. You may pass in options as a list or a hash reference.
@@ -131,5 +131,5 @@
 =back
 
-=head2 SWISH::Filter->new Options
+=head2 SWISH::Filter-E<gt>new Options
 
 There is currently only one option that can be passed in to new():
@@ -218,5 +218,5 @@
 
 
-=item $doc_object = $filter->convert();
+=item $doc_object = $filter-E<gt>convert();
 
 This method filters a document. Returns an object of the class SWISH::Filter::document
@@ -411,5 +411,5 @@
 
 
-=item $filter->mywarn()
+=item $filter-E<gt>mywarn()
 
 Internal function used for writing warning messages to STDERR if
@@ -425,5 +425,5 @@
 }
 
-=item @filters = $filter->filter_list;
+=item @filters = $filter-E<gt>filter_list;
 
 Returns a list of filter objects installed.
@@ -514,5 +514,5 @@
 }
 
-=item @filter = $filter->can_filter( $content_type );
+=item @filter = $filter-E<gt>can_filter( $content_type );
 
 This is useful for testing to see if a mimetype might be handled by SWISH::Filter
@@ -585,7 +585,7 @@
 
 Once a filter returns something other than undef no more filters will be
-called. If the filter calls $filter->set_continue then processing will
+called. If the filter calls $filter-E<gt>set_continue then processing will
 continue as if the file was not filtered. For example, a filter can uncompress
-data and then set $filter->set_continue and let other filters process the
+data and then set $filter-E<gt>set_continue and let other filters process the
 document.
 
@@ -834,5 +834,5 @@
 
 
-=item $doc_ref = $doc_object->fetch_doc_reference;
+=item $doc_ref = $doc_object-E<gt>fetch_doc_reference;
 
 Returns a scalar reference to the document. This can be used when the filter
@@ -842,7 +842,7 @@
 If the file is currently on disk then it will be read into memory. If the file was stored
 in a temporary file on disk the file will be deleted once read into memory.
-The file will be read in binmode if $doc->is_binary is true.
+The file will be read in binmode if $doc-E<gt>is_binary is true.
 
-Note that $doc_object->fetch_doc is an alias.
+Note that $doc_object-E<gt>fetch_doc is an alias.
 
 =cut
@@ -861,5 +861,5 @@
 
 
-=item $was_filtered = $doc_object->was_filtered
+=item $was_filtered = $doc_object-E<gt>was_filtered
 
 Returns true if some filter processed the document
@@ -872,5 +872,5 @@
 }
 
-=item $content_type = $doc_object->content_type;
+=item $content_type = $doc_object-E<gt>content_type;
 
 Fetches the current content type for the document.
@@ -887,5 +887,5 @@
 }
 
-=item $type = $doc_object->swish_parser_type
+=item $type = $doc_object-E<gt>swish_parser_type
 
 Returns a parser type based on the content type
@@ -915,5 +915,5 @@
 }
 
-=item $doc_object->is_binary
+=item $doc_object-E<gt>is_binary
 
 Returns true if the document's content-type does not match "text/".
@@ -932,5 +932,5 @@
 =over 4
 
-=item $file_name = $doc_object->fetch_filename;
+=item $file_name = $doc_object-E<gt>fetch_filename;
 
 Returns a path to the document as stored on disk.
@@ -941,5 +941,5 @@
 the file name passed to be the real path of the document.
 
-The file will be written in binmode if $doc->is_binary is true.
+The file will be written in binmode if $doc-E<gt>is_binary is true.
 
 This method is not normally used by end-users of SWISH::Filter.
@@ -958,5 +958,5 @@
 }
 
-=item $doc_object->set_continue;
+=item $doc_object-E<gt>set_continue;
 
 Processing will continue to the next filter if this is set to a true value.
@@ -979,5 +979,5 @@
 
 
-=item $doc_object->set_content_type( $type );
+=item $doc_object-E<gt>set_content_type( $type );
 
 Sets the content type for a document.
@@ -1036,9 +1036,9 @@
 
 
-=item $doc_object->name
+=item $doc_object-E<gt>name
 
 Fetches the name of the current file. This is useful for printing out the
 name of the file in an error message.
-This is the name passed in to the SWISH::Filter->convert method.
+This is the name passed in to the SWISH::Filter-E<gt>convert method.
 It is optional and thus may not always be set.
 
@@ -1047,8 +1047,8 @@
 
 
-=item $doc_object->user_data
+=item $doc_object-E<gt>user_data
 
 Fetches the the user_data passed in to the filter.
-This can be any data or data structure passed into SWISH::Filter->new.
+This can be any data or data structure passed into SWISH::Filter-E<gt>new.
 
 This is an easy way to pass special parameters into your filters.
@@ -1105,5 +1105,5 @@
 =over 4
 
-=item $self->type
+=item $self-E<gt>type
 
 This method fetches the type of the filter. The value returned sets the
@@ -1119,5 +1119,5 @@
 sub type { 2 };
 
-=item $self->priority
+=item $self-E<gt>priority
 
 This method fetches the priority of the filter. The value returned sets the
@@ -1136,5 +1136,5 @@
 sub priority { 50 }; # default priority
 
-=item @types = $self->mimetypes
+=item @types = $self-E<gt>mimetypes
 
 Returns the list of mimetypes (as regular expressions) set for the filter.
@@ -1154,5 +1154,5 @@
 }
 
-=item $pattern = $self->can_filter_mimetype( $content_type )
+=item $pattern = $self-E<gt>can_filter_mimetype( $content_type )
 
 Returns true if passed in content type matches one of the filter's mimetypes
@@ -1183,5 +1183,5 @@
 }
 
-=item $boolean = $self->set_programs( @program_list );
+=item $boolean = $self-E<gt>set_programs( @program_list );
 
 Returns true if all the programs listed in @program_list are found
@@ -1224,5 +1224,5 @@
 
 
-=item $path = $self->find_binary( $prog );
+=item $path = $self-E<gt>find_binary( $prog );
 
 Use in a filter's new() method to test for a necesary program located in $PATH.
@@ -1306,5 +1306,5 @@
 }
 
-=item $bool = $self->use_modules( @module_list );
+=item $bool = $self-E<gt>use_modules( @module_list );
 
 Attempts to load each of the module listed and calls its import() method.
@@ -1342,5 +1342,5 @@
 }
 
-=item $doc_ref = $self->run_program( $program, @args );
+=item $doc_ref = $self-E<gt>run_program( $program, @args );
 
 Runs $program with @args. Must pass in @args.
```

trunk/swish-e/perl/API.pm
```diff
--- trunk/swish-e/perl/API.pm (r1586)
+++ trunk/swish-e/perl/API.pm (r1623)
@@ -136,5 +136,5 @@
 =over 4
 
-=item $swish = SWISH::API->new( $index_files );
+=item $swish = SWISH::API-E<gt>new( $index_files );
 
 This method returns a swish handle object blessed into the SWISH::API class.
@@ -143,16 +143,16 @@
 Caller must check for errors (see below).
 
-=item @indexes = $swish->IndexNames;
+=item @indexes = $swish-E<gt>IndexNames;
 
 Returns a list of index names associated with the swish handle.
-These were the indexes specified as a parameter on the SWISH::API->new call.
+These were the indexes specified as a parameter on the SWISH::API-E<gt>new call.
 This can be used in calls below that require specifying the index file name.
 
-=item @header_names = $swish->HeaderNames;
+=item @header_names = $swish-E<gt>HeaderNames;
 
 Returns a list of possible header names. These can be used to lookup
 header values. See C<SwishHeaderValue> method below.
 
-=item @values = $swish->HeaderValue( $index_file, $header_name );
+=item @values = $swish-E<gt>HeaderValue( $index_file, $header_name );
 
 A swish-e index has data associated with it stored in the index header. This method
@@ -164,5 +164,5 @@
 The list of possible header names can be obtained from the SwishHeaderNames method.
 
-=item $swish->RankScheme( 0|1 );
+=item $swish-E<gt>RankScheme( 0|1 );
 
 Similar to the -R option with the swish-e command line tool. The default
@@ -196,23 +196,23 @@
 =over 4
 
-=item $swish->Error
+=item $swish-E<gt>Error
 
 Returns true if an error occurred on the last operation. On errors the value returned
 is the internal Swish-e error number (which is less than zero).
 
-=item $swish->CriticalError
+=item $swish-E<gt>CriticalError
 
 Returns true if the last error was a critical error
 
-=item $swish->AbortLastError
+=item $swish-E<gt>AbortLastError
 
 Aborts the running program and prints an error message to STDERR.
 
-=item $str = $swish->ErrorString
+=item $str = $swish-E<gt>ErrorString
 
 Returns the string description of the current error (based on the value
-returned by $swish->Error). This is a generic error string.
+returned by $swish-E<gt>Error). This is a generic error string.
 
-=item $msg = $swish->LastErrorMsg
+=item $msg = $swish-E<gt>LastErrorMsg
 
 Returns a string with specific information about the last error, if any.
@@ -221,6 +221,6 @@
 badmeta=foo
 
-and "badmeta" is an invalid metaname $swish->ErrorString
-might return "Unknown metaname", but $swish->LastErrorMsg might return "badmeta".
+and "badmeta" is an invalid metaname $swish-E<gt>ErrorString
+might return "Unknown metaname", but $swish-E<gt>LastErrorMsg might return "badmeta".
 
 
@@ -231,5 +231,5 @@
 =over 4
 
-=item $search = $swish->New_Search_Object( $query );
+=item $search = $swish-E<gt>New_Search_Object( $query );
 
 This creates a new search object blessed into the SWISH::API::Search class. The optional
@@ -246,5 +246,5 @@
 }
 
-=item $results = $swish->Query( $query );
+=item $results = $swish-E<gt>Query( $query );
 
 This is a short-cut which avoids the step of creating a separate search object.
@@ -266,5 +266,5 @@
 =over 4
 
-=item $search->SetQuery( $query );
+=item $search-E<gt>SetQuery( $query );
 
 This will set (or replace) the query string associated with a search object.
@@ -272,5 +272,5 @@
 actual query or when creating a search object.
 
-=item $search->SetStructure( $structure_bits );
+=item $search-E<gt>SetStructure( $structure_bits );
 
 This method may change in the future.
@@ -294,10 +294,10 @@
 
 
-=item $search->PhraseDelimiter( $char );
+=item $search-E<gt>PhraseDelimiter( $char );
 
 Sets the character used as the phrase delimiter in searches. The default
 is double-quotes (").
 
-=item $search->SetSearchLimit( $property, $low, $high );
+=item $search-E<gt>SetSearchLimit( $property, $low, $high );
 
 Sets a range from $low to $high inclusive that the give $property must be in
@@ -323,10 +323,10 @@
 method first.
 
-=item $search->ResetSearchLimit;
+=item $search-E<gt>ResetSearchLimit;
 
 Clears the limit parameters for the given object. This must be called if
 the limit parameters need to be changed.
 
-=item $search->SetSort( $sort_string );
+=item $search-E<gt>SetSort( $sort_string );
 
 Sets the sort order of search results. The string is a space separated
@@ -352,5 +352,5 @@
 =over 4
 
-=item $results = $search->Execute( $query );
+=item $results = $search-E<gt>Execute( $query );
 
 Executes a query based on the parameters in the search object.
@@ -373,10 +373,10 @@
 =over 4
 
-=item $hits = $results->Hits;
+=item $hits = $results-E<gt>Hits;
 
 Returns the number of results for the query. If zero and no errors were reported
-after calling $search->Execute then the query returned zero results.
+after calling $search-E<gt>Execute then the query returned zero results.
 
-=item @parsed_words = $results->ParsedWords( $index_name );
+=item @parsed_words = $results-E<gt>ParsedWords( $index_name );
 
 Returns an array of tokenized words and operators with stopwords removed.
@@ -384,9 +384,9 @@
 
 $index_name must match one of the index files specified on the creation of
-the swish object (via the SWISH::API->new call).
+the swish object (via the SWISH::API-E<gt>new call).
 
 The parsed words are useful for highlighting search terms in associated documents.
 
-=item @removed_stopwords = $results->RemovedStopwords( $index_name) ;
+=item @removed_stopwords = $results-E<gt>RemovedStopwords( $index_name) ;
 
 Returns an array of stopwords removed from a query, if any, for the index
@@ -394,15 +394,15 @@
 
 $index_name must match one of the index files specified on the creation of
-the swish object (via the SWISH::API->new call).
+the swish object (via the SWISH::API-E<gt>new call).
 
-=item $results->SeekResult( $position );
+=item $results-E<gt>SeekResult( $position );
 
 Seeks to the position specified in the result list. Zero is the first position
-and $results->Hits-1 is the last position. Seeking past the end of results
+and $results-E<gt>Hits-1 is the last position. Seeking past the end of results
 sets a non-critical error condition.
 
 Useful for seeking to a specific "page" of results.
 
-=item $result = $results->NextResult;
+=item $result = $results-E<gt>NextResult;
 
 Fetches the next result from the list of results. Returns undef if no
@@ -418,5 +418,5 @@
 =over 4
 
-=item $prop = $result->Property( $prop_name );
+=item $prop = $result-E<gt>Property( $prop_name );
 
 Fetches the property specified for the current result.
@@ -429,5 +429,5 @@
 format the strings (or just call scalar localtime( $prop ) ).
 
-=item $prop = $result->ResultPropertyStr( $prop_name );
+=item $prop = $result-E<gt>ResultPropertyStr( $prop_name );
 
 Fetches and formats the property. Unlike above, invalid property names return the
@@ -437,8 +437,8 @@
 
 
-=item $value = $result->ResultIndexValue( $header_name );
+=item $value = $result-E<gt>ResultIndexValue( $header_name );
 
 Returns the header value specified. This is similar to
-$swish->HeaderValue(), but the index file is not specified
+$swish-E<gt>HeaderValue(), but the index file is not specified
 (it is determined by the result).
 
@@ -449,5 +449,5 @@
 =over 4
 
-=item @metas = $swish->MetaList( $index_name );
+=item @metas = $swish-E<gt>MetaList( $index_name );
 
 Swish-e has "MetaNames" which allow searching by fields in the index.
@@ -470,5 +470,5 @@
 value is zero.
 
-=item @props = $swish->PropertyList( $index_name );
+=item @props = $swish-E<gt>PropertyList( $index_name );
 
 Swish-e can store content or "properties" in the index and return this data
@@ -494,7 +494,7 @@
 value is zero.
 
-=item @propes = $result->PropertyList;
+=item @propes = $result-E<gt>PropertyList;
 
-=item @meta = $result->MetaList;
+=item @meta = $result-E<gt>MetaList;
 
 These also return a list of Property or Metaname description objects, but are
@@ -504,5 +504,5 @@
 
 
-=item $stemmed_word = $swish->StemWord( $word );
+=item $stemmed_word = $swish-E<gt>StemWord( $word );
 
 *Deprecated*
@@ -515,15 +515,15 @@
 
 
-=item $fuzzy_word = $swish->Fuzzy( $indexname, $word );
+=item $fuzzy_word = $swish-E<gt>Fuzzy( $indexname, $word );
 
 Like StemWord used to work, only it uses whatever stemmer is named in $indexname.
 Returns the same kind of fuzzy_word object as the FuzzyWord() method.
 
-=item $mode_string = $result->FuzzyMode;
+=item $mode_string = $result-E<gt>FuzzyMode;
 
 Returns the string (e.g. "Stemming_en", "Soundex", "None" ) indicating the stemming
 method used while indexing the given document.
 
-=item $fuzzy_word = $result->FuzzyWord( $word );
+=item $fuzzy_word = $result-E<gt>FuzzyWord( $word );
 
 Converts $word using the same fuzzy mode used to index the $result.
@@ -531,5 +531,5 @@
 to access the converted words and other data as shown below.
 
-=item $count = $fuzzy_word->WordCount;
+=item $count = $fuzzy_word-E<gt>WordCount;
 
 Returns the number of output words. Normally this is the value one, but may
@@ -537,5 +537,5 @@
 for a single input string.
 
-=item $status = $fuzzy_word->WordError;
+=item $status = $fuzzy_word-E<gt>WordError;
 
 Returns any error code that the stemmer might set. Normally, this return value
@@ -543,5 +543,5 @@
 are defined in the swish-e source file /src/stemmer.h.
 
-=item @words = $fuzzy_word->WordList;
+=item @words = $fuzzy_word-E<gt>WordList;
 
 Returns the converted words from the stemming/fuzzy operation. Normally, the array will
@@ -550,9 +550,9 @@
 
 In the event that a word does not stem (e.g. trying to stem a number), this method
-will return the original input word specified when $result->FuzzyWord( $word )
+will return the original input word specified when $result-E<gt>FuzzyWord( $word )
 was called.
 
 
-=item @parsed_words = $swish->SwishWords( $string, $index_file );
+=item @parsed_words = $swish-E<gt>SwishWords( $string, $index_file );
 
 * Not implemented *
@@ -594,5 +594,5 @@
 
 But as long as a SWISH::API::Result object is around, so is the entire list
-of results generated by the $handle->Query() call, and the index file is
+of results generated by the $handle-E<gt>Query() call, and the index file is
 still open (because a SWISH::API::Result depends on a SWISH::API::Results object, which
 depends on a SWISH::API object).
```

trunk/swish-e/pod/CHANGES.pod
```diff
--- trunk/swish-e/pod/CHANGES.pod (r1613)
+++ trunk/swish-e/pod/CHANGES.pod (r1623)
@@ -616,5 +616,5 @@
 
 
-You should have swish-e packages in your RPMS/<arch> directory. [augur]
+You should have swish-e packages in your RPMS/$arch directory. [augur]
 
 =item * Changed default perl binary location
@@ -748,5 +748,5 @@
 Filters (FileFilter directive) did not work correctly when spidering
 with the -S http method. A new filter system was developed and now
-filtering of documents (e.g. pdf->html or MSWord->text) is handled
+filtering of documents (e.g. pdf-E<gt>html or MSWord-E<gt>text) is handled
 by the src/SwishSpider program.
 
@@ -840,5 +840,5 @@
 
 If you are parsing output headers in a program then you may need to
-adjust your code. There's a new switch <-H> to control the level of
+adjust your code. There's a new switch '-H' to control the level of
 header output when searching.
 
@@ -1138,5 +1138,5 @@
 =item * New directive: ImageLinksMetaName
 
-Defines a metaname to use for indexing src links in <img> tags.
+Defines a metaname to use for indexing src links in E<lt>imgE<gt> tags.
 Allow you to search image pathnames within HTML pages. Available only
 with libxml2 parser.
```

trunk/swish-e/pod/SWISH-CONFIG.pod
```diff
--- trunk/swish-e/pod/SWISH-CONFIG.pod (r1613)
+++ trunk/swish-e/pod/SWISH-CONFIG.pod (r1623)
@@ -395,5 +395,5 @@
 =item *
 
-L<StoreDescription|/StoreDescription> [XML <tag>|HTML <meta>|TXT size]
+L<StoreDescription|/StoreDescription> [XML E<lt>tagE<gt>|HTML E<lt>metaE<gt>|TXT size]
 
 =item *
@@ -403,9 +403,9 @@
 =item *
 
-L<SwishSearchDefaultRule|/SwishSearchDefaultRule> [<AND-WORD>|<or-word>]
+L<SwishSearchDefaultRule|/SwishSearchDefaultRule> [E<lt>AND-WORDE<gt>|E<lt>or-wordE<gt>]
 
 =item *
 
-L<SwishSearchOperators|/SwishSearchOperators> <and-word> <or-word> <not-word>
+L<SwishSearchOperators|/SwishSearchOperators> E<lt>and-wordE<gt> E<lt>or-wordE<gt> E<lt>not-wordE<gt>
 
 =item *
@@ -545,5 +545,5 @@
 " " = following word will be searched in documents
 
-=item SwishSearchOperators <and-word> <or-word> <not-word>
+=item SwishSearhOperators E<lt>and-wordE<gt> E<lt>or-wordE<gt> E<lt>not-wordE<gt>
 
 B<NOTE>: This following item is currently not available.
@@ -557,5 +557,5 @@
 SwishSearchOperators UND ODER NICHT
 
-=item SwishSearchDefaultRule [<AND-WORD>|<or-word>]
+=item SwishSearchDefaultRule [E<lt>AND-WORDE<gt>|E<lt>or-wordE<gt>]
 
 B<NOTE>: This following item is currently not available.
@@ -1094,5 +1094,5 @@
 =item IndexAltTagMetaName *tagname*|as-text
 
-Allows indexing of images <IMG> ALT tag text. Specify either a tag name which will be
+Allows indexing of images E<lt>IMGE<gt> ALT tag text. Specify either a tag name which will be
 used as a metaname, or the special text "as-text" which says to index the ALT text as
 if it were plain text at the current location.
@@ -1136,5 +1136,5 @@
 If this is set true then Swish-e will attempt to convert relative URIs
 extracted from HTML documents for use with C<HTMLLinksMetaName> and
-C<ImageLinksMetaName> into absolute URIs. Swish-e will use any <BASE> tag
+C<ImageLinksMetaName> into absolute URIs. Swish-e will use any E<lt>BASEE<gt> tag
 found in the document, otherwise it will use the file's pathname. The pathname
 used will be the pathname *after* C<ReplaceRules> has been applied to the
@@ -1323,7 +1323,7 @@
 Indexing done!
 
-One thing to note is that the first <person> block finds a class name
+One thing to note is that the first E<lt>personE<gt> block finds a class name
 "student" so all metanames that are created from attributes use the
-combined name "person.student". The second <person> block doesn't contain
+combined name "person.student". The second E<lt>personE<gt> block doesn't contain
 a "class" so, the attribute name is combined directly with the element
 name (e.g. "person.greeting").
@@ -1577,5 +1577,5 @@
 
 
-=item StoreDescription [XML <tag> size|HTML <meta> size|TXT size]
+=item StoreDescription [XML E<lt>tagE<gt> size|HTML E<lt>metaE<gt> size|TXT size]
 
 B<StoreDescription> allows you to store a document description in the index
```

trunk/swish-e/pod/SWISH-FAQ.pod
```diff
--- trunk/swish-e/pod/SWISH-FAQ.pod (r1613)
+++ trunk/swish-e/pod/SWISH-FAQ.pod (r1623)
@@ -1120,6 +1120,6 @@
 
 That means there was one instance of our word in the title of the file.
-It's context was in the <head> tagset, inside the <title>.
-The <title> is the most specific structure, so it gets the
+It's context was in the E<lt>headE<gt> tagset, inside the E<lt>titleE<gt>.
+The E<lt>titleE<gt> is the most specific structure, so it gets the
 RANK_TITLE score: 7. The base rank of 1 plus the structure score of 7 equals 8. If there
 had been two instances of this word in the title, then the score would have been C<8 + 8 = 16>.
```

trunk/swish-e/pod/SWISH-RUN.pod
```diff
--- trunk/swish-e/pod/SWISH-RUN.pod (r1492)
+++ trunk/swish-e/pod/SWISH-RUN.pod (r1623)
@@ -594,5 +594,5 @@
 EM, or STRONG), and c is HTML comment tags
 
-search only in header (<H*>) tags
+search only in header (E<lt>H*E<gt>) tags
 
 swish-e -w word -t h
@@ -914,5 +914,5 @@
 -x "xml_out: <swishtitle fmt='<title>%s</title>'>\n"
 
-=item -H [0|1|2|3|<n>] (header output verbosity)
+=item -H [0|1|2|3|E<lt>nE<gt>] (header output verbosity)
 
 The C<-H n> switch generates extened I<header> output. This is most useful when searching more than one
```

trunk/swish-e/prog-bin/spider.pl.in
```diff
--- trunk/swish-e/prog-bin/spider.pl.in (r1598)
+++ trunk/swish-e/prog-bin/spider.pl.in (r1623)
@@ -1710,5 +1710,5 @@
 The spider does require Perl's LWP library and a few other reasonably common
 modules. Most well maintained systems should have these modules installed.
-See L<REQUIREMENTS> below for more information. It's a good idea to check
+See L</"REQUIREMENTS"> below for more information. It's a good idea to check
 that you are running a current version of these modules.
 
@@ -1729,9 +1729,9 @@
 
 By default, this script will not spider files blocked by F<robots.txt>. In addition,
-The script will check for <meta name="robots"..> tags, which allows finer
+The script will check for E<lt>meta name="robots"..E<gt> tags, which allows finer
 control over what files are indexed and/or spidered.
 See http://www.robotstxt.org/wc/exclusion.html for details.
 
-This spider provides an extension to the <meta> tag exclusion, by adding a
+This spider provides an extension to the E<lt>metaE<gt> tag exclusion, by adding a
 B<NOCONTENTS> attribute. This attribute turns on the C<no_contents> setting, which
 asks swish-e to only index the document's title (or file name if not title is found).
@@ -2346,5 +2346,5 @@
 
 The function calls are wrapped in an eval, so calling die (or doing something that dies) will just cause
-that URL to be skipped. If you really want to stop processing you need to set $server->{abort} in your
+that URL to be skipped. If you really want to stop processing you need to set $server-E<gt>{abort} in your
 subroutine (or send a kill -HUP to the spider).
 
@@ -2597,5 +2597,5 @@
 
 Note that you can create your own counters to display in the summary list when spidering
-is finished by adding a value to the hash pointed to by C<$server->{counts}>.
+is finished by adding a value to the hash pointed to by C<$server-E<gt>{counts}>.
 
 test_url => sub {
```

trunk/swish-e/src/metanames.c
```diff
--- trunk/swish-e/src/metanames.c (r1544)
+++ trunk/swish-e/src/metanames.c (r1623)
@@ -452,5 +452,5 @@
 efree( meta->metaName );
 
-#ifndef USE_BTREE
+#ifndef USE_PRESORT_ARRAY
 if ( meta->sorted_data)
 efree( meta->sorted_data );
```
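Most hunks in this changeset rewrite a literal `>` or `<` in POD text as `E<gt>` or `E<lt>`. The escape clearly changes the parse inside a formatting code. In `C<$server->{counts}>`, the `>` of Perl's arrow operator closes the `C<...>` early. The C sketch below is a simplified model of the single-angle-bracket rule described in perlpodspec. Under that rule, a code opens with an ASCII capital letter followed by `<` and ends at the first `>` that does not close a nested code. `pod_code_end` is a hypothetical helper written for illustration; it is not part of swish-e. Double-bracket forms such as `C<< ... >>` are deliberately not modelled.

```c
/* Simplified model of how a POD parser finds the end of a single-bracket
 * formatting code such as C<...> (after the rule described in perlpodspec).
 * Hypothetical helper for illustration only; not part of swish-e.
 *
 * 'pod' must point at the code's letter, e.g. the 'C' in "C<...>".
 * Returns the index of the '>' that closes the code, or -1 if 'pod' does
 * not start a formatting code or the code is never closed.
 * Double-bracket forms such as C<< ... >> are NOT handled. */
long pod_code_end(const char *pod)
{
    /* A formatting code opens with an ASCII capital letter followed by '<'. */
    if (pod[0] < 'A' || pod[0] > 'Z' || pod[1] != '<')
        return -1;

    int depth = 1;                       /* number of codes still open */
    for (long i = 2; pod[i] != '\0'; i++) {
        if (pod[i] >= 'A' && pod[i] <= 'Z' && pod[i + 1] == '<') {
            depth++;                     /* nested code, e.g. E<gt> */
            i++;                         /* skip its '<' */
        } else if (pod[i] == '>') {
            if (--depth == 0)
                return i;                /* this '>' closes the outer code */
        }
        /* Any other character, including a bare '<', is plain content. */
    }
    return -1;                           /* ran off the end: unterminated */
}
```

For the unescaped `C<$server->{counts}>`, the function returns 10. The code therefore covers only `$server-`, and `{counts}>` spills out as plain text. For the escaped `C<$server-E<gt>{counts}>`, which is the form this changeset introduces in spider.pl.in, it returns 23, the index of the final character. POD also accepts the double-bracket form `C<< $server->{counts} >>` as an alternative, but this sketch does not model it.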
