root/swish-e/trunk/conf/example6.config

Revision 861, 2.4 kB (checked in by whmoseley, 7 years ago)

Updated the swish URL

  • Property svn:eol-style set to native
  • Property svn:executable set to *
  • Property svn:keywords set to Author Date Id Revision
# ----- Example 6 - Spider using "prog" feature -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use the
#  new (as of 2.2) "prog" document source feature
#  to spider a webserver.
#
#  The "prog" document source feature allows
#  an external program to feed documents to
#  swish, one after another.  This allows you
#  to index documents from any source (e.g. web, DBMS)
#  and to filter and adjust the content before swish
#  indexes the content.
#
#  This example uses the provided spider.pl program
#  to spider a remote web server.  This spider offers
#  more features than the "http" spider method shown
#  in example7.config.
#
# ** Please don't test with this exact config **
#         spider your own web server
#
#  Indexing (spidering) is started with the following
#  command issued from the "conf" directory:
#
#     swish-e -S prog -c example6.config
#
#  Note: You should have the current Bundle::LWP bundle
#  of Perl modules installed.  This was tested with:
#     libwww-perl-5.53
#  Run "perldoc spider.pl" in the prog-bin directory for
#  more information.
#
#  ** Do not spider a web server without permission **
#
#---------------------------------------------------

# Include our site-wide configuration settings:

IncludeConfigFile example4.config

# Specify the program to run
IndexDir ../prog-bin/spider.pl


# When running under the "prog" document source method you can
# pass a list of parameters to the program (specified with -i or IndexDir).

# If a parameter is passed to spider.pl, it will use that as the configuration
# file.

# As a special case, the first parameter can be the word "default",
# followed by one or more URLs.
# In this case the spider will use default settings to spider the provided URLs.

SwishProgParameters default http://swish-e.org

# Note: the default configuration file used by spider.pl is SwishSpiderConfig.pl.
# See prog-bin/SwishSpiderConfig.pl for examples
# that include filtering PDF and MS Word documents.
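
# A sketch of the other ways to pass parameters described above.
# These are intended as alternatives to the SwishProgParameters line
# above, not additions to it.  The path and URLs are placeholders
# (assumptions), not tested settings.
#
# Use a spider configuration file instead of the "default" keyword
# (path assumed relative to the "conf" directory):
# SwishProgParameters ../prog-bin/SwishSpiderConfig.pl
#
# Spider more than one URL with the default settings:
# SwishProgParameters default http://localhost/ http://localhost/docs/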

# Tell swish how to parse the content
DefaultContents HTML
IndexContents HTML .htm .html
IndexContents TXT .txt .conf
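
# A possible extension, assuming the spidered site also serves XML files:
# swish-e's XML parser can be mapped to an extension in the same way.
# The .xml extension here is only an illustration.
# IndexContents XML .xml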



# Just to make it interesting, let's modify the URL that gets indexed:
# replace http://swish-e.org/ => http://localhost/

ReplaceRules replace swish-e.org localhost
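
# Another ReplaceRules form, sketched as a commented-out alternative.
# The "staging/" string is hypothetical; see the ReplaceRules entry in
# the swish-e configuration documentation before relying on it.
#
# Remove a string from the indexed URL:
# ReplaceRules remove staging/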


# end of example
