| 436 | | If you are spidering, use a F<robots.text> file in your document root. |
| 437 | | This is a standard way to excluded files from search engines, and is |
| 438 | | fully supported by Swish-e. See http://www.robotstxt.org/ |
| 439 | | |
| 440 | | You can also modify the F<spider.pl> spider perl program to skip, index |
| 441 | | content only, or spider only listed web pages. Type C<perldoc spider.pl> |
| 442 | | in the C<prog-bin> directory for details. |
| 443 | | |
| 444 | | If using the libxml2 library for parsing HTML, you may also use the Meta |
| 445 | | Robots Exclusion in your documents: |
| | 436 | If you are spidering a site you have control over, use a F<robots.txt> file in |
| | 437 | your document root. This is a standard way to exclude files from search |
| | 438 | engines, and is fully supported by Swish-e. See http://www.robotstxt.org/ |
| | 439 | |
| | 440 | If spidering a website with the included F<spider.pl> program then add any |
| | 441 | necessary tests to the spider's configuration file. |
| | 442 | Type C<perldoc spider.pl> in the C<prog-bin> directory for details or |
| | 443 | see the spider documentation on the Swish-e website. Look for the section |
| | 444 | on L<callback functions|SWISH-FAQ/"callback_functions">. |
| | 445 | |
| | 446 | If using the libxml2 library for parsing HTML (which you probably are), you may |
| | 447 | also use the Meta Robots Exclusion in your documents: |