Changeset 1937

Timestamp:
05/24/07 10:19:43 (1 year ago)
Author:
moseley
Message:

Peter can you fix my bad writing if needed? I got about 3 hours of sleep last night.
Add that on top of my normal poor writing....

Files:

  • swish-e/trunk/pod/SWISH-FAQ.pod

    r1888  r1937
    434    434    C<-S prog> or C<-S http>.
    435    435
    436           If you are spidering, use a F<robots.text> file in your document root.
    437           This is a standard way to excluded files from search engines, and is
    438           fully supported by Swish-e.  See http://www.robotstxt.org/
    439
    440           You can also modify the F<spider.pl> spider perl program to skip, index
    441           content only, or spider only listed web pages.  Type C<perldoc spider.pl>
    442           in the C<prog-bin> directory for details.
    443
    444           If using the libxml2 library for parsing HTML, you may also use the Meta
    445           Robots Exclusion in your documents:
           436    If you are spidering a site you have control over, use a F<robots.txt> file in
           437    your document root.  This is a standard way to excluded files from search
           438    engines, and is fully supported by Swish-e.  See http://www.robotstxt.org/
           439
           440    If spidering a website with the included F<spider.pl> program then add any
           441    necessary tests to the spider's configuration file.
           442    Type <perldoc spider.pl> in the C<prog-bin> directory for details or
           443    see the spider documentation on the Swish-e website.  Look for the section
           444    on L<callback functions|SWISH-FAQ/"callback_functions">.
           445
           446    If using the libxml2 library for parsing HTML (which you probably are), you may
           447    also use the Meta Robots Exclusion in your documents:
    446    448
    447    449        <meta name="robots" content="noindex">