
Revision 1334, 3.4 kB (checked in by whmoseley, 5 years ago)


Filtering documents with SWISH::Filter
--------------------------------------

Swish-e can parse only HTML, XML, and plain-text files.  Other file
types can be indexed with the help of filters.

SWISH::Filter is a Perl module designed to make it easy to convert
documents from one content type to another.  It uses a plug-in
system, so new filters can be added with little effort.

SWISH::Filter and its plug-in filter modules do not normally do the
actual filtering.  They provide only an interface to the external
programs that do the conversion.

For example, the Swish-e distribution includes a filter plug-in
called SWISH::Filters::Pdf2HTML.  For this filter to work you must
install the xpdf package, which provides the pdftotext and pdfinfo
programs.  SWISH::Filters::Pdf2HTML only provides a unified interface
to these programs.
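Because a filter only works when its helper programs are installed, it can help to confirm they are on the PATH before indexing. The sketch below is a hypothetical shell function (check_helpers is not part of Swish-e) that reports any missing programs, using the pdftotext and pdfinfo programs named above as the example.

```shell
# check_helpers: hypothetical helper, not shipped with Swish-e.
# Prints "missing: NAME" for each program not found on PATH and
# returns non-zero if any program is missing.
check_helpers() {
    missing=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}

# Example: the helpers used by SWISH::Filters::Pdf2HTML.
#   check_helpers pdftotext pdfinfo || echo "install xpdf to filter PDF files"
```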

The included program spider.pl uses SWISH::Filter by default.  This
means that installing the programs that do the filtering is all that
is needed to start filtering documents.  For example, installing the
xpdf package enables indexing of PDF files when spidering.

The filter modules are in the $libexecdir/perl directory.  Running
"swish-e -h" lists the setting for $libexecdir.  It is typically
/usr/local/lib/swish-e if swish-e was built from source, or
/usr/lib/swish-e if it was installed as a package.  On Windows,
$libexecdir is set at installation time.

Note that $libexecdir/perl is not normally part of Perl's @INC array,
so to read the documentation for a specific filter you need to either
specify the full path to the filter or set PERL5LIB.  For example:

    export PERL5LIB=/usr/local/lib/swish-e/perl
    perldoc SWISH::Filter
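Since $libexecdir differs between source builds and packaged installs, a small shell function can pick the first candidate directory that actually contains SWISH/Filter.pm and prepend it to PERL5LIB. This is a sketch: find_filter_dir is a hypothetical name, and the two candidate paths are only the typical defaults mentioned above, so check "swish-e -h" for the real setting on your system.

```shell
# find_filter_dir: hypothetical helper, not part of Swish-e.
# Prints the first argument directory that contains SWISH/Filter.pm
# and returns non-zero if none of them does.
find_filter_dir() {
    for dir in "$@"; do
        if [ -f "$dir/SWISH/Filter.pm" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Typical locations for source builds and packaged installs.
if dir=$(find_filter_dir /usr/local/lib/swish-e/perl /usr/lib/swish-e/perl); then
    # Prepend the directory, keeping any existing PERL5LIB entries.
    PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
fi
```

Alternatively, perldoc accepts a file path directly, for example "perldoc /usr/local/lib/swish-e/perl/SWISH/Filters/Pdf2HTML.pm", which avoids changing PERL5LIB at all.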

Documentation for SWISH::Filter can also be found in the html
directory and at http://swish-e.org.

Swish-e also has a second filter system: the FileFilter directive,
which filters documents through an external program while indexing.
That system requires a separate filter setup for each type of
document.  See the SWISH-CONFIG page for information on that type of
filtering.
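For comparison, a FileFilter setup for PDF might look like the configuration below, written from the shell with a heredoc. This is a sketch under assumptions: the directive syntax follows the SWISH-CONFIG documentation as we understand it ('%p' standing for the path of the file being filtered, TXT* being the text parser), and the file name pdf.conf and the ./docs directory are placeholders.

```shell
# Write a minimal, hypothetical config that pipes .pdf files through
# pdftotext (from xpdf) during indexing. Paths are placeholders.
cat > pdf.conf <<'EOF'
# Index documents under ./docs (placeholder path)
IndexDir ./docs
# Run pdftotext on each .pdf file; '%p' is replaced by the file's path
# and '-' sends the extracted text to standard output
FileFilter .pdf pdftotext "'%p' -"
# pdftotext produces plain text, so parse .pdf files as text
IndexContents TXT* .pdf
EOF
```

Indexing would then be run with "swish-e -c pdf.conf".  Every other document type would need its own FileFilter line, which is the per-type setup that SWISH::Filter avoids.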


Testing SWISH::Filter
---------------------

The program swish-filter-test is installed by default (in the same
location as the swish-e binary) and can be used to test SWISH::Filter.
For example, run the command:

    $ swish-filter-test foo.pdf foo.txt

    Document foo.pdf was  filtered.
       Document:     foo.pdf
       Content-Type: text/html  (initial was application/pdf)
       Parser type:  HTML*

    Document foo.txt was not filtered.
       Document:     foo.txt
       Content-Type: text/plain  (initial was text/plain)
       Parser type:  TXT*

Run the command

    $ swish-filter-test -man

for documentation.
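When testing many files at once, a short awk filter can pull the names of the filtered documents out of the swish-filter-test report. The function name filtered_docs is hypothetical, and it assumes the summary-line format shown in the example output above ("Document NAME was filtered." versus "Document NAME was not filtered."), which may differ between versions.

```shell
# filtered_docs: hypothetical helper, not part of Swish-e.
# Reads swish-filter-test output on stdin and prints the name of each
# document whose summary line says it was filtered. awk splits on runs
# of whitespace, so the double space in "was  filtered." is harmless.
filtered_docs() {
    awk '$1 == "Document" && $3 == "was" && $4 == "filtered." { print $2 }'
}

# Example (requires swish-filter-test on PATH):
#   swish-filter-test docs/* | filtered_docs
```

Because fields are split on whitespace, this sketch does not handle document names that contain spaces.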


Current filters distributed with Swish-e
----------------------------------------

All of these filters require installation of helper programs and/or
Perl modules.  See each module's documentation for its dependencies.

    SWISH::Filters::Doc2txt     - converts MS Word documents to text
    SWISH::Filters::Pdf2HTML    - converts PDF files to HTML, with PDF info tags as metanames
    SWISH::Filters::ID3toHTML   - extracts ID3 (v1 and v2) tags from MP3 files
    SWISH::Filters::XLtoHTML    - converts MS Excel files to HTML

Filters that depend on Perl modules that are not installed will not
load.  Setting the environment variable FILTER_DEBUG may report
helpful errors when using filters.
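Because a filter whose Perl modules are missing simply does not load, it can be useful to check a module directly. The sketch below uses "perl -M" to try loading a module by name; perl_module_available is a hypothetical function, and the module to check for a given filter should be taken from that filter's documentation.

```shell
# perl_module_available: hypothetical helper, not part of Swish-e.
# Succeeds if perl can load the named module; fails otherwise,
# including when perl itself is not installed.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example: confirm SWISH::Filter itself loads (PERL5LIB may need to
# include $libexecdir/perl), then rerun a test with debugging enabled.
#   perl_module_available SWISH::Filter || echo "SWISH::Filter not found"
#   FILTER_DEBUG=1 swish-filter-test foo.pdf
```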

See "perldoc SWISH::Filter" for instructions on creating filters.