=head1 NAME

CHANGES - List of revisions

=head1 OVERVIEW

This document contains a list of bug fixes and feature additions to Swish-e.

=head2 Version 2.4.6 - 10 March 2008

=over 4

=item MinWordLength respected in query parser

Clark Vent reported that the query parser was not respecting MinWordLength
settings. See http://dev.swish-e.org/changeset/2145

=item Patch to file.c.

The file.c patch was in response to
http://swish-e.org/archive/2007-03/11321.html
although that user never responded about that patch.

=item SWISH_DEBUG_RANK env var now enables rank debugging

Set SWISH_DEBUG_RANK to a true value to enable lots of rank debugging
on stderr.

=item Perl Makefile.PL patched to fix MakeMaker issue

Recent versions of ExtUtils::MakeMaker revealed a bug in Makefile.PL.
Patch from mschwern via RT, report by mpeters.

=item LARGEFILE support detected automatically in configure

jrobinson852@yahoo.com suggested that LARGEFILE support be auto-detected since
it is needed so often on Linux systems.

=item New Snowball stemmers

Trygve Falch contributed patches to update
the Snowball stemmers, including new Hungarian and Romanian stemmers.

=item Patched leaks

Antony Dovgal patched two leaks. First, when a file failed to open,
the file name was not freed.

Second, SwishSetSearchLimit() was nulling the search limits when an error was
found in the parameters, but not freeing the existing limits.

=item Leak in SwishResetSearchLimit

Fixed a leak if a limit was set and then reset but not prepared.
Patch provided by Antony Dovgal.

=item New API functions added

Added SwishGetStructure() and SwishGetPhraseDelimiter() functions which return
relevant properties of the search object.
Patch provided by Antony Dovgal.

=back

=head2 Version 2.4.5 - 22 Jan 2007

=over 4

=item Fixed 'deflate' handling in spider.pl

spider.pl was using the wrong method to uncompress HTTP responses that were
'deflate' encoded. It now also decodes content based on the document's charset and
encodes it back to that charset before outputting.

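The entry above does not say which decompression call was replaced. As general background, the HTTP 'deflate' label is used in practice for both zlib-wrapped streams (RFC 1950) and raw deflate streams (RFC 1951), so a tolerant client usually tries both. The following is a minimal Python sketch of that idea, not spider.pl's actual code; the helper name decode_deflate is hypothetical.

```python
import zlib


def decode_deflate(body: bytes, charset: str = "utf-8") -> str:
    """Inflate a 'deflate'-encoded HTTP body and decode it to text.

    Hypothetical helper for illustration; not taken from spider.pl.
    """
    try:
        # First assume a zlib-wrapped stream (RFC 1950).
        raw = zlib.decompress(body)
    except zlib.error:
        # Fall back to a raw deflate stream (RFC 1951): negative wbits
        # tells zlib not to expect a header or checksum.
        raw = zlib.decompress(body, -zlib.MAX_WBITS)
    # Decode using the charset announced for the document.
    return raw.decode(charset)
```

A caller would pass the charset taken from the response headers or the document itself, and re-encode to that charset only when writing the output.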

=item re-indexing required

The magic numbers in src/swish.h were changed to require re-indexing from
version 2.4.4 indexes. This should have been done in 2.4.4 as well, and anytime
the index format changes. -- karman

=item fixed stemmer bug introduced in 2.4.4

stemmer.c had a mix-up in the deprecated stemmer assignments for "Stemmer_en"
and "Stem". Also fixed stemmer.h so that 2.4.3 indexes can be read correctly.
-- karman

=item Now fork/exec to run filters

FileFilter* was using popen to run the filter, which could pass user
data through the shell. It now uses fork/exec if fork is available, which
should be everywhere except Windows. On Windows popen is used but all
parameters are double-quoted. -- moseley

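The difference between the two approaches can be sketched in Python (an illustration of the general technique, not Swish-e's C code; run_filter is a hypothetical name). Passing the program and its arguments as a list starts the process directly, so a file name containing shell metacharacters is delivered to the filter as one literal argument.

```python
import subprocess
import sys


def run_filter(program: str, *args: str) -> str:
    """Run a filter program directly (no shell) and return its stdout."""
    result = subprocess.run(
        [program, *args],      # argv list: arguments are never shell-parsed
        capture_output=True,
        text=True,
        check=True,            # raise if the filter exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # A file name that would be dangerous if pasted into a shell command.
    name = "report; echo injected.txt"
    echoed = run_filter(sys.executable, "-c",
                        "import sys; print(sys.argv[1])", name)
    print(echoed.strip() == name)
```

With a shell-based call such as popen, the same file name would have to be quoted correctly for every shell the program might run under, which is the class of problem the change avoids.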

=item fixed signed/unsigned warnings from gcc 4.x

Cleaned up search.c to catch mismatched signedness warnings from newer GCC versions.
This issue pre-existed 2.4.4 but the new wildcard features in search.c made for a lot
more warnings. -- karman

=item Makefile.mingw included in distrib

Modified root Makefile to include the perl/Makefile.mingw file. -- karman

=back

=head2 Version 2.4.4 - 11 Oct 2006

=over 4

=item Version 2.4.4 RC1

Release Candidate 1 for 2.4.4, 2 Oct 2006.

=item quote fix for FileFilter config param

Ludovic Drolez contributed a patch to fix a quoting issue with filenames. This affects
non-Windows builds only.

=item SWISH::Filter now on CPAN

SWISH::Filter is now available on http://cpan.org/. The version in the distribution is
B<not> kept in sync with the CPAN version. Install the CPAN version if you want
the latest and greatest version.

=item SWISH::API updated to 0.04

Added several fixes, including:

=over

=item Perlish method names from mpeters@plusthree.com

=item switched to XSLoader with DynaLoader as fallback

=item added VERSION method to satisfy some versions of MakeMaker

=item Fuzzify() method now actually works as advertised

=back

=item added proximity feature and single character wildcard with '?' instead of '*'

Herman Knoops contributed these patches.
See http://swish-e.org/archive/2006-05/10543.html

Error messages were also changed to better reflect correct use of wildcards.

=item fixed bug when using DoubleMetaphone

Fixed problem reported by Andreas Völter where a query that generated a
two-word query with DoubleMetaphone fuzzy mode was not working.

=item fix sparc64 property issue

Sorithy Seng (pourlassi@gmail.com) submitted a patch against docprop.c to fix
an issue on sparc64 platforms. It is unknown whether this bug affected other 64-bit
architectures.

=item fixed bug when StopWords resulted in no unique words

Added a check in db_native.c that some words exist before writing the index.

=item updates to SWISH-RUN.1

Added doc for -u and -r options.

=item filename only in SWISH::Filters

Added a fix to SWISH::Filters::pp2html and SWISH::Filters::XLtoHTML to
save only the filename as the title, without the full path.

=item Removed Stem and Stemmer_en

The legacy Porter stemmer was removed. This had been deprecated some time ago.
A warning is issued if the old stemmer is specified in the config file, and Stemmer_en1
will be used instead.

=item GPL'd all the source files with the new Swish-e License

After a source code review, the developers decided to put Swish-e under the GPL
with a special exception for linking against libswish-e. See http://swish-e.org/license.html
for the details.

=item Fixed Segfault with updating incremental index

Dobrica Pavlinusic reported a segfault after updating an index multiple times.
José provided an updated worddata.c. - April 27, 2005

=item Fixed NOT check with incremental indexes

Swish was returning results for deleted files when the NOT operator was used.

=item Fixed bug when using old parsers with zero length input

Thomas Angst reported swish consuming memory when using -S prog
to process a large number of empty documents.

When -S prog generated a zero length file the old parsers (e.g. TXT) would
attempt to read in *all* content from the -S prog program into a buffer.
The old parser incorrectly assumed it was reading from a filter and tried to
read to eof().

=item Changes to ParserWarnLevel

The default value for ParserWarnLevel was changed from zero to two.

The ParserWarnLevel controls the error handling of the libxml2 parser. The higher
the setting, the more verbose the output. The change to the default is to report
when libxml2 has problems parsing a document (which often results in processing
only part of a document).

To get the old behavior, either set ParserWarnLevel to zero in your config file,
or use the new -W command line option to set the ParserWarnLevel at run time.
If ParserWarnLevel is set in the config file, it will override the -W option.

Also, to see UTF-8 to 8859-1 conversion errors set ParserWarnLevel to 3 or more. Previously,
these warnings were issued at a ParserWarnLevel of one.

=item Documentation changes

Removed all the target documentation (html, pdf, ps) from cvs. There's now a separate
cvs module "swish_website" that is used to generate both the website and the html
docs. If building swish-e from cvs please see the README.cvs file for instructions.

=item Fixed bug in pre-sorted indexes with USE_BTREE

Gunnar Mätzler reported a problem with reading the pre-sorted property index
tables when running with USE_BTREE (--enable-incremental). Not all entries were
being written to disk. There was/is a question of whether the "array" code used for
pre-sorted indexes with USE_BTREE would be slower. So, added a separate
define USE_PRESORT_ARRAY to enable that code when USE_BTREE is set. This allows
using the old integer arrays with USE_BTREE. Gunnar reported that this is working,
but more testing is needed. Need to compare speed of the array code vs. the non-array
code, and to verify the workings of USE_PRESORT_ARRAY code.

=item Add strcoll() usage for sorting properties

Andreas Seltenreich provided a patch to use strcoll when sorting properties.
strcoll is locale dependent.

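To show what locale-dependent sorting means in practice, here is a small Python sketch that sorts strings through strcoll(). It illustrates the C library call the patch relies on, not the patch itself; sort_properties is a hypothetical name, and the default "C" locale is chosen only so the result is reproducible on any system.

```python
import functools
import locale


def sort_properties(values, locale_name="C"):
    """Sort strings with strcoll() under the given collation locale."""
    saved = locale.setlocale(locale.LC_COLLATE)   # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # cmp_to_key adapts the two-argument strcoll() to sorted().
        return sorted(values, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


if __name__ == "__main__":
    # In the "C" locale, upper case letters sort before lower case ones.
    print(sort_properties(["banana", "Apple", "cherry"]))
```

Passing another locale name that is installed on the system changes the collation rules, which is why the same index can sort differently on differently configured machines.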

=item Fix incremental indexing when adding back a file

José fixed a problem with incremental indexing where a file could not be
added back to the index once removed.

Patch initially provided by Dobrica Pavlinusic:

http://swish-e.org/Discussion/archive/2004-12/8694.html

=item Documentation correction

A change in the default way the index is compressed was not documented
in 2.4.3. The change resulted in larger indexes. See CompressPositions
below and in SWISH-CONFIG.

=item libxml2 UTF-8 conversion failures

Fixed issue where a UTF-8 to Latin1 encoding failure would skip
more input than just the failed character. Libxml2 passes swish text
that is not null terminated, but the libxml2 functions to skip UTF-8
chars expected a null-terminated string. Replaced the libxml2 call with
a fixed version.

=back

=head2 Version 2.4.3 - December 9, 2004

=over 4

=item New config directive: CompressPositions

This option enables zlib compression for word data in the index.
Previously word data was always compressed, which resulted in slower
wildcard searches. The default now is to not compress the word data,
which results in larger index files. Set to "YES" to get pre-2.4.3 index
sizes.

[This CHANGES entry was added after 2.4.3 was released]

=item Improved error messages when using incremental indexing

There was a bit of confusion on how to use incremental indexing (still
experimental) so added better logic for error messages.

Also fixed a logic error when setting the incremental update mode. Caught by
Paul Loner.

=back

=head2 Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004

=over 4

=item "Fixed" libxml2's change in UTF8Toisolat1() return value

Bernhard Weisshuhn supplied a patch to parser.c for checking the return value of
UTF8Toisolat1(). Seems that libxml2 now returns the number of characters converted
instead of zero for success.

http://bugzilla.gnome.org/show_bug.cgi?id=153937

=item Added swish-config and pkg-config

Swish now provides a swish-config script and config file for the pkg-config
utility. These tools help when building programs that link with the swish-e
library.

The SWISH::API Makefile.PL program uses swish-config to locate the installation
directory of swish-e. This should make building SWISH::API easier when swish-e
is installed in a non-standard location.

=item Fixed rank bias in merge

Peter van Dijk noticed that MetaNamesRank settings were not being copied to the output
index when merging.

=item Added SwishFuzzy function

SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without first searching.
This might be helpful for playing with queries prior to the search.

=item Fixed translate character table

Michael Levy found an error in the table used to translate 8859-1 to
ascii7. Luckily, it was an upper case translation and the table is only used on lower
case characters.

=item MetaNamesRank documentation

Changed the 'not yet implemented' caveat to 'implemented but experimental'.

=item Added Continuation option to config processing

You can now use continuation lines in the config file:

    IgnoreWords \
        the \
        am \
        is \
        are \
        was

No characters may follow the backslash.

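The continuation rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the Swish-e config reader; join_continuations is a hypothetical name. A line is continued only when the backslash is its very last character, which matches the restriction that nothing may follow it.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows."""
    joined, pending = [], ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]     # drop the backslash, keep reading
        else:
            joined.append(pending + line)
            pending = ""
    if pending:                      # input ended on a continuation
        joined.append(pending)
    return joined


if __name__ == "__main__":
    config = ["IgnoreWords \\\n", "    the \\\n", "    am \\\n", "    was\n"]
    print(join_continuations(config)[0].split())
```

Note that a backslash followed by a trailing space is treated as ordinary text here, so the line is not joined with the next one.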

=item Fixed Buzzwords (and other word lists entered in the config)

Words entered in config were not converted to lower case before storing in the index.

=item Fixed metaname mapping problem in Merge

Peter Karman found an error when merging indexes where the source indexes had the
same metanames, but listed in a different order in their config files. Words
would then be indexed under the wrong metaID number in the output index.

=item SWISH::Filters and spider.pl updates

The web spider F<spider.pl> was updated to work better with SWISH::Filter
by default and also to make it easier to use the spider defaults along with
a spider config file. See spider.pl for details.

SWISH::Filter was updated. The way filters are created has changed.
If you created your own filters you will need to update them. Take a look
at SWISH::Filter and the filters included in the distribution.

=item Updates to Documentation

Richard Morin submitted formatting and punctuation updates to the README and
INSTALL docs.

=item Added -R option to support IDF word weighting in ranking. (karman)

Added Inverse Document Frequency calculation to the getrank() routine.
This will allow the relative frequency of a word in relationship to other
words in the query to impact the ranking of documents.

Example: if 'foo' is present twice as often as 'bar' in the collection as a whole,
a search for 'foo bar' will weight documents with 'bar' more heavily (i.e., higher
rank) than those with 'foo'.

The impact is greatest when OR'ing words in a query rather than
AND'ing them (which is the default).

Also added Rank discussion to the FAQ.

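The 'foo' and 'bar' example can be worked through with the textbook IDF formula, log(N / df). This Python sketch is only an illustration: the entry does not give the formula used inside getrank(), and the function names, the document counts and the simple "term frequency times IDF" score are all assumptions.

```python
import math


def idf(total_docs: int, docs_with_word: int) -> float:
    """Textbook inverse document frequency: rarer words score higher."""
    return math.log(total_docs / docs_with_word)


def rank(doc_term_freq: dict, doc_freq: dict, total_docs: int) -> float:
    """Sum term frequency * IDF over the query words found in one document."""
    return sum(tf * idf(total_docs, doc_freq[word])
               for word, tf in doc_term_freq.items())


if __name__ == "__main__":
    # Assumed collection: 100 documents, 'foo' in 40 of them, 'bar' in 20.
    doc_freq = {"foo": 40, "bar": 20}
    only_foo = rank({"foo": 1}, doc_freq, 100)
    only_bar = rank({"bar": 1}, doc_freq, 100)
    print(only_bar > only_foo)
```

For an OR query, a document matching only the rarer word 'bar' outranks one matching only 'foo'. With AND, every result contains both words, so the IDF contribution is the same for each document and matters less.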

=item Updates to the example scripts

Updated PhraseHighlight.pm as suggested by Bill Schell for an optimization
when all words in a document are highlighted.

Updated search.cgi and PhraseHighlight.pm to use the internal stemmers via
the SWISH::API module as suggested by Jonas Wolf.

=item Leak when using C library

David Windmueller found a memory leak when calling multiple searches
on a swish handle. The problem was swish loading the pre-sorted
property index on every search, even after the table had been loaded
into memory.

=item Swish.cgi now kills swish-e on time out

The example script F<swish.cgi> uses an alarm (on platforms that support
alarm) to abort processing after some number of seconds, but it was not
killing the child process, swish-e. Bill Schell submitted a patch to kill
the child when the alarm triggers.

=item The template search.tt was renamed to swish.tt

The template was renamed because it's used by F<swish.cgi>, not by
F<search.cgi>, which was confusing.

=item Updates to the search.cgi

The example script F<search.cgi> was updated to work better with mod_perl
and to use external template files and style sheets.

=item New MS Word Filter

James Job provided the SWISH::Filter::Doc2html filter that uses
the wvWare (http://wvware.sourceforge.net/) program for filtering
MS Word documents. If both catdoc and wvWare are installed then wvWare
will be used.

wvWare is reported to do a good job at converting MS Word docs
to HTML. In a few tests it did work well, but in other cases it
failed to generate correct output. It was also much, much slower
than catdoc. I tested with wvWare 0.7.3 on Debian Linux. Testing with
both is recommended.

=item Change in way symbolic links are followed

John-Marc Chandonia pointed out that if a symlink is skipped
by FileRules, then the actual file/directory is marked as
"already seen" and cannot be indexed by other links or directly.

Now, files and directories are not marked "already seen" until
after passing FileRules (i.e., after a file is actually indexed
or a directory is processed).

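The ordering described above, rules first and "already seen" second, can be sketched as follows. This is a Python illustration rather than the Swish-e file system code; collect_files, passes_rules and resolve are hypothetical names, and os.path.realpath stands in for whatever identity check the indexer uses.

```python
import os


def collect_files(paths, passes_rules, resolve=os.path.realpath):
    """Return paths to index, recording a target as seen only once accepted."""
    seen, accepted = set(), []
    for path in paths:
        if not passes_rules(path):   # a rejected name leaves no trace
            continue
        target = resolve(path)       # the real file behind any symlink
        if target in seen:
            continue
        seen.add(target)             # mark only after passing the rules
        accepted.append(path)
    return accepted


if __name__ == "__main__":
    # Simulated link table so the example does not touch the file system.
    targets = {"/docs/link": "/data/real.txt",
               "/data/real.txt": "/data/real.txt"}
    skip_docs = lambda p: not p.startswith("/docs/")
    print(collect_files(["/docs/link", "/data/real.txt"],
                        skip_docs, targets.get))
```

Because the rejected link never reaches the seen set, the real file is still indexed when it is reached directly. Marking before the rule check would have hidden it.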

=item Could not set SwishSetSort() more than once

David Windmueller found a problem when trying to set the sort
order more than once on an existing search object. Memory was not
correctly reset after clearing the previous sort values.

=item Access MetaNames and PropertyNames from API

Patch provided by Jamie Herre to access the MetaNames and PropertyNames
via the C API and to test via the testlib program. SWISH::API also updated
to access this data.

=item SwishResultPropertyULong() bug fixed

David Windmueller reported that SwishResultPropertyULong() was
returning ULONG_MAX on all calls. This was fixed.

=item Null written to wrong location in file.c

Bill Schell with the help of valgrind found a null written past the end of a
buffer in file.c in the code that supports the old parsers. This resulted in a
segfault while indexing a large set of XML documents.

=item Fixed problem when indexing very large files

Steve Harris reported a problem when indexing a very large document that
caused an integer overflow. José Ruiz updated the code to use unsigned integers.

=item Bump word position on block tags with HTML2 parser

Peter Karman pointed out that the libxml2 HTML parser was allowing phrase
matches across block level html elements. Swish now bumps the word
position on these elements.

=back

=head2 Version 2.4.2 - March 09, 2004

=over 4

=item * UseStemming didn't take no for an answer

UseStemming was coded as an alias for FuzzyIndexingMode when Snowball was
compiled in (the default), but "no" doesn't always mean no when the Norwegian
stemmer is available.

=item * Fixed problem building incremental version

Fixed compile problem with building incremental indexing mode. This is an
experimental option with swish-e to allow adding files to an index.
See configure --help for build option. Incremental indexes are not
compatible with standard indexes.

=item * Updated build instructions in INSTALL

Added a few comments about use of CPPFLAGS and LDFLAGS.

=item * Updated the index_hypermail.pl

Updated to work with latest version of hypermail (pre-2.1.9).

=item * Time zone in ResultPropertyStr()

Format string for generating date did not include the time zone in location.
Added the strftime format string to config.h.

=item * Undefined and Blank Properties and (NULL)

Fixed a few problems with printing properties:

1) Using -p and -x showed different results if a bad property value was given:

    $ swish-e -w not dkdk -p badname -H0
    err: Unknown Display property name "badname"
    .
    $ swish-e -w not dkdk -x '<badname>\n' -H0
    (NULL)

Now both return an error.

2) Fixed bug where using a "fmt" string with -x output generated (bad) output
if the result did not have the specified property.

    $ swish-e -w not dkdk -x '<somedate>\n' -H0  # undefined value

    $ swish-e -w not dkdk -x '<somedate fmt="%Y %B %d">\n' -H0
    %Y %B 1075353525

Now nothing is printed if the property does not exist.

3) Updated SWISH::API to croak() on invalid property names, and to return
undefined values for missing properties.

4) Updated swish.cgi and search.cgi to not generate warnings on undefined values
returned as properties. Note that swish.cgi will now die on undefined properties.
Previously it would just display (NULL).

=item * Fixed segfault when generating warnings while parsing

Parser.c was calling warning() incorrectly.
And -Wall was not catching this!

=item * Added check for internal property names.

Parser was not checking for use of Swish-e reserved property
names.

    <swishrank>foo</swishrank>

This will now generate a warning.

=back

=head2 Version 2.4.1 - December 17, 2003

=over 4

=item * Added new example CGI script

search.cgi is a new skeleton CGI script that uses SWISH::API for searching.
It is installed in the same location as swish.cgi.

=item * Add Fuzzy access to C and Perl interfaces

Added a number of functions to the C API (and SWISH::API)
to access the stemmer used when indexing a given index.

=item * Commas in numbers

Added commas to summary display at end of indexing.

=item * Insert whitespace between tags

Parser.c was updated to flush the text buffer before and after
every (non-inline HTML) tag.

The problem was that:

    foo<tag>bar</tag>baz

would index as a single word "foobarbaz".

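The effect of flushing the text buffer at tag boundaries can be sketched with Python's standard HTML parser. This is an illustration of the idea only, not the libxml2-based code in Parser.c; the class name, the helper extract_words and the short INLINE_TAGS set are assumptions.

```python
from html.parser import HTMLParser

# Illustrative subset of inline tags; text continues across these.
INLINE_TAGS = {"a", "b", "i", "em", "strong", "span"}


class WordExtractor(HTMLParser):
    """Collect words, breaking the current word at every non-inline tag."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.buffer = ""

    def flush(self):
        # Emit whatever text has accumulated, then start a fresh buffer.
        self.words.extend(self.buffer.split())
        self.buffer = ""

    def handle_starttag(self, tag, attrs):
        if tag not in INLINE_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag not in INLINE_TAGS:
            self.flush()

    def handle_data(self, data):
        self.buffer += data


def extract_words(html: str):
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()          # emit any text left after the last tag
    return parser.words


if __name__ == "__main__":
    print(extract_words("foo<tag>bar</tag>baz"))
    print(extract_words("foo<b>bar</b>baz"))
```

The first input yields three separate words because the unknown tag forces a flush on both sides, while the inline tag in the second input leaves "foobarbaz" as one word.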

=item * DirTree.pl

DirTree.pl was updated to work with SWISH::Filter and to work on Windows.
DirTree.pl is a program to fetch files from the file system and works with
the -S prog input method.

=item * Problem with --enable-incremental option

Fixed configure script to build incremental option. Note that this is still
experimental. But testers are welcome.

=item * headers.c bug

Mark Fletcher with the help of valgrind found a bug in headers.c
function SwishIndexHeaderNames used by the C API.

=item * Clarify documentation regarding search order

At the prompting of Doralyn Rossmann updated SEARCH.pod to
try and make the explanation of searching clearer, and to fix an error
in the description of nested searches.

=back

| 618 |
=head2 Version 2.4.0 - October 27, 2003 |
|---|
| 619 |
|
|---|
| 620 |
=over 4 |
|---|
| 621 |
|
|---|
| 622 |
=item * Note: Different Index Format |
|---|
| 623 |
|
|---|
| 624 |
Swish-e version 2.4.0 has a different index file format from previous |
|---|
| 625 |
versions of Swish-e. Upgrading will B<require> reindexing -- version 2.4.0 |
|---|
| 626 |
cannot read indexes created with previous versions. |
|---|
| 627 |
|
|---|
| 628 |
=back |
|---|
| 629 |
|
|---|
| 630 |
=head2 Version 2.4.0 (Release Candidate 4) September 26, 2003 |
|---|
| 631 |
|
|---|
| 632 |
=over 4 |
|---|
| 633 |
|
|---|
| 634 |
=item * robots.txt not closed correctly |
|---|
| 635 |
|
|---|
| 636 |
When using -S http method robots.txt was not closed and that caused |
|---|
| 637 |
the (last) .contents file to not be unlinked under Windows. Windows |
|---|
| 638 |
seems to think filenames are related to files. |
|---|
| 639 |
|
|---|
| 640 |
=item * SWISH::Filter and locating programs on Windows |
|---|
| 641 |
|
|---|
| 642 |
SWISH::Filter now scans $libexecdir in addition to the PATH for programs (such at catdoc and |
|---|
| 643 |
pdftotext), and also checks for programs by adding the extensions ".exe" and ".bat" to the |
|---|
| 644 |
program name. |
|---|
| 645 |
|
|---|
| 646 |
=item * Install sample templates |
|---|
| 647 |
|
|---|
| 648 |
The sample templates included with swish.cgi are now installed |
|---|
| 649 |
in $pkgdatadir (typically /usr/local/share/swish-e). |
|---|
| 650 |
|
|---|
| 651 |
=back |
|---|
| 652 |
|
|---|
| 653 |
=head2 Version 2.4.0 (Release Candidate 3) September 11, 2003 |
|---|
| 654 |
|
|---|
| 655 |
=over 4 |
|---|
| 656 |
|
|---|
| 657 |
=item * Fix parser bug meta=(foo*) |
|---|
| 658 |
|
|---|
| 659 |
Fixed bug in query parser caused in rc2's (pr2) attempt to catch wildcards |
|---|
| 660 |
errors. |
|---|
| 661 |
|
|---|
| 662 |
=back |
|---|
| 663 |
|
|---|
| 664 |
=head2 Version 2.4.0 (Release Candidate 2) September 10, 2003 |
|---|
| 665 |
|
|---|
| 666 |
=over 4 |
|---|
| 667 |
|
|---|
| 668 |
=item * Indexing HTML title |
|---|
| 669 |
|
|---|
| 670 |
Fixed a problem when these were used in combination: |
|---|
| 671 |
|
|---|
| 672 |
MetaNames swishtitle |
|---|
| 673 |
MetaNameAlias swishtitle title |
|---|
| 674 |
|
|---|
| 675 |
That failed to correctly reset the metaname stack and indexed text under |
|---|
| 676 |
the wrong metaID. |
|---|
| 677 |
|
|---|
| 678 |
=item * Single Wildcards |
|---|
| 679 |
|
|---|
| 680 |
Due to the way the query parser "works" a search of |
|---|
| 681 |
|
|---|
| 682 |
"foo *" |
|---|
| 683 |
|
|---|
| 684 |
would result in a search of "foo*". Now that results in: |
|---|
| 685 |
|
|---|
| 686 |
err: Single wildcard not allowed as word |
|---|
| 687 |
|
|---|
| 688 |
=item * Fixed search parsing bug |
|---|
| 689 |
|
|---|
| 690 |
Brad Miele reported that the word "andes" was not being found. It was being |
|---|
| 691 |
stemmed to "and", which was then considered an operator. [moseley] |
|---|
| 692 |
|
|---|
| 693 |
=item * Add new directive PropertyNamesSortKeyLength |
|---|
| 694 |
|
|---|
| 695 |
PropertyNamesSortKeyLength sets the sort key length to use when sorting |
|---|
| 696 |
string properties. The default is 100 characters. There was a hard-coded |
|---|
| 697 |
100 char limit before, but that was a problem where people were not building |
|---|
| 698 |
from source (Windows). The value of this is questionable -- it's intended to |
|---|
| 699 |
limit how much memory is used when sorting while indexing and searching. [moseley] |
|---|
| 700 |
|
|---|
| 701 |
=item * Fixed sorting issues with multiple indexes and reverse sorting |
|---|
| 702 |
|
|---|
| 703 |
Reworked much of the sorting code. Still to do is setting the character sort order. |
|---|
| 704 |
[moseley] |
|---|
| 705 |
|
|---|
| 706 |
=item * Fixed minor memory leak |
|---|
| 707 |
|
|---|
| 708 |
Fixed leak of not releasing memory of index file name and swish_handle |
|---|
| 709 |
destroy, and fixed SwishStemWord to default to the Stemmer_en. [moseley] |
|---|
| 710 |
|
|---|
| 711 |
Fixed libtest.c example program that was not cleaning up memory after an |
|---|
| 712 |
error condition. |
|---|
| 713 |
|
|---|
| 714 |
=item * Replaced Swish-e's Porter Stemmer with Snowball |
|---|
| 715 |
|
|---|
| 716 |
Swish-e now has support for Snowball stemmers (http://snowball.tartarus.org/). |
|---|
| 717 |
The stemmers are enabled for an index with FuzzyIndexingMode Stemming_* where "*" can be: |
|---|
| 718 |
|
|---|
| 719 |
de, dk, en1, en2, es, fi, fr, it, nl, no, pt, ru, se |
|---|
| 720 |
|
|---|
| 721 |
In addition, UseStemming yes or FuzzyIndexingMode Stemming_en will use the old stemmer. |
|---|
| 722 |
|
|---|
| 723 |
=back |
|---|
| 724 |
|
|---|
| 725 |
=head2 Version 2.4.0 (Release Candidate 1) May 21, 2003 |
|---|
| 726 |
|
|---|
| 727 |
=over 4 |
|---|
| 728 |
|
|---|
| 729 |
=item * Security Fix: swish.cgi |
|---|
| 730 |
|
|---|
| 731 |
The swish.cgi script was not correctly escaping HTML when searching by |
|---|
| 732 |
the right combination of metanames and highlighting module. This could |
|---|
| 733 |
lead to cross-site scripting if indexing un-trusted documents. [moseley] |
|---|
| 734 |
|
|---|
| 735 |
=item * Added Support for building a Debian Package |
|---|
| 736 |
|
|---|
| 737 |
To build a .deb, unpack the distribution, chdir into it, and then run |
|---|
| 738 |
|
|---|
| 739 |
$ fakeroot debian/build binary |
|---|
| 740 |
|
|---|
| 741 |
Then install the generated .deb file with dpkg -i |
|---|
| 742 |
|
|---|
| 743 |
=item * Use SWISH::Filter by default with spider.pl |
|---|
| 744 |
|
|---|
| 745 |
spider.pl is installed in the libexecdir directory as well as the SWISH::Filter modules. |
|---|
| 746 |
PDF, MS Word, MP3, and XML documents will be indexed automatically if the required helper |
|---|
| 747 |
applications (e.g. catdoc, pdftotext) or scripts (e.g. MP3::Tag) are installed. |
|---|
| 748 |
|
|---|
| 749 |
Swish also knows about libexecdir, so if you specify a relative path with -S prog |
|---|
| 750 |
swish-e will look for the program in libexecdir. This is mostly for spider.pl so |
|---|
| 751 |
indexing only requires: |
|---|
| 752 |
|
|---|
| 753 |
IndexDir spider.pl |
|---|
| 754 |
SwishProgParameters default http://localhost/index.html |
|---|
| 755 |
|
|---|
| 756 |
And swish-e will find spider.pl and SWISH::Filter will be used to convert docs. |
|---|
| 757 |
|
|---|
| 758 |
=item * Fixed Document-Type bug |
|---|
| 759 |
|
|---|
| 760 |
Document-Type was not being reset after being set by input from a -S prog program, causing |
|---|
| 761 |
the wrong parser to be used. [moseley] |
|---|
| 762 |
|
|---|
| 763 |
=item * New Directive: PropertyNamesNoStripChars |
|---|
| 764 |
|
|---|
| 765 |
Swish replaces all series of low ASCII chars with a single space |
|---|
| 766 |
character. This option instructs swish to store all chars in the property. [moseley] |
|---|
| 767 |
|
|---|
| 768 |
=item * Change HTTP access defaults |
|---|
| 769 |
|
|---|
| 770 |
Defaults used with -S http access method were changed. |
|---|
| 771 |
|
|---|
| 772 |
|
|---|
| 773 |
Delay was reduced from one minute between start of each request to five seconds |
|---|
| 774 |
between requests. |
|---|
| 775 |
|
|---|
| 776 |
MaxDepth was changed from five to zero, meaning there is no limit to depth indexed by |
|---|
| 777 |
default. [moseley] |
|---|
| 778 |
|
|---|
| 779 |
=item * swishspider location and SpiderDirectory |
|---|
| 780 |
|
|---|
| 781 |
The swishspider program is now installed in $prefix/lib/swish-e by default. This can |
|---|
| 782 |
be changed by the --libexecdir option to configure. |
|---|
| 783 |
|
|---|
| 784 |
The SpiderDirectory option now defaults to the value of libexecdir instead of the current |
|---|
| 785 |
directory. [moseley] |
|---|
| 786 |
|
|---|
| 787 |
|
|---|
| 788 |
=item * Added libtool and automake support |
|---|
| 789 |
|
|---|
| 790 |
Replaces the build system with Autotools. Now builds libswish-e as |
|---|
| 791 |
a shared library on systems that support shared libraries. |
|---|
| 792 |
The swish-e binary links against this shared library. |
|---|
| 793 |
Can also build outside the source tree on platforms with GNU make. [moseley] |
|---|
| 794 |
|
|---|
| 795 |
=item * Updates to installation |
|---|
| 796 |
|
|---|
| 797 |
Running "make install" now installs additional files. |
|---|
| 798 |
Files include the swish-e binary, the libswish-e search library, swish-e.h |
|---|
| 799 |
header, documentation files, the swishspider program, and Perl modules used for the example |
|---|
| 800 |
swish.cgi search script. Directories will be created if they do not already exist. |
|---|
| 801 |
Installation directories can be specified at build time. |
|---|
| 802 |
|
|---|
| 803 |
=item * Fixed bug when searching at end of inverted index |
|---|
| 804 |
|
|---|
| 805 |
Swish was not correctly detecting the end of the inverted index |
|---|
| 806 |
when searching a wildcard word that was past the last word in the index. |
|---|
| 807 |
Caught by Frank Heasley. [moseley] |
|---|
| 808 |
|
|---|
| 809 |
|
|---|
| 810 |
=item * Increase sort key length from 50 to 100 characters |
|---|
| 811 |
|
|---|
| 812 |
The setting MAX_SORT_STRING_LEN in F<src/config.h> sets the max length used |
|---|
| 813 |
when sorting in swish-e. You may reduce this number to save memory while |
|---|
| 814 |
sorting, or increase it if you have very long properties to sort. |
|---|
| 815 |
|
|---|
| 816 |
=item * Remove &quot; entity from -p output |
|---|
| 817 |
|
|---|
| 818 |
The -p option to print properties was escaping double quotes in properties |
|---|
| 819 |
with the &quot; entity. -x does not do that, so the two were inconsistent. -p no longer |
|---|
| 820 |
converts double quotes. The user should pick a good delimiter with -d or preferably use |
|---|
| 821 |
the -x method for generating output. |
|---|
| 822 |
|
|---|
| 823 |
=item * XML parser and Windows |
|---|
| 824 |
|
|---|
| 825 |
The XML parser was being passed the incorrect buffer length when used on Windows |
|---|
| 826 |
platforms, causing the parser to abort with an error. |
|---|
| 827 |
|
|---|
| 828 |
=item * Version Numbering |
|---|
| 829 |
|
|---|
| 830 |
SWISH-E versions starting with 2.3.4 use kernel version numbering. Versions are |
|---|
| 831 |
in the form: Major.Minor.Build. Odd minor versions are development. Even minor |
|---|
| 832 |
versions are releases. 2.3.4 would be a development version. |
|---|
| 833 |
2.4.0 would be a release version. 2.3.20 would be the 20th build of 2.3. |
|---|
| 834 |
|
|---|
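The numbering rule above is easy to check mechanically. The following Python sketch uses a hypothetical helper, `classify_version`, and assumes a plain three-part numeric version string.

```python
def classify_version(version):
    """Split 'Major.Minor.Build' and label it by the odd/even minor rule.

    Hypothetical helper; assumes exactly three numeric parts.
    """
    major, minor, build = (int(part) for part in version.split("."))
    kind = "development" if minor % 2 else "release"
    return major, minor, build, kind
```

For the versions named above, "2.3.4" is classified as development and "2.4.0" as release.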
| 835 |
=item * Added RPM support |
|---|
| 836 |
|
|---|
| 837 |
RPMs can be built with: |
|---|
| 838 |
|
|---|
| 839 |
./configure |
|---|
| 840 |
make dist |
|---|
| 841 |
|
|---|
| 842 |
Copy the resulting tarball to RPM's SOURCES directory and then run as a superuser: |
|---|
| 843 |
|
|---|
| 844 |
rpmbuild -ba rpm/swish-e.spec |
|---|
| 845 |
|
|---|
| 846 |
|
|---|
| 847 |
You should have swish-e packages in your RPMS/$arch directory. [augur] |
|---|
| 848 |
|
|---|
| 849 |
=item * Changed default perl binary location |
|---|
| 850 |
|
|---|
| 851 |
Most perl scripts provided with SWISH-E now use /usr/bin/perl by default. |
|---|
| 852 |
Note that some scripts are generated at build time, so those will look in the |
|---|
| 853 |
path for the location of the perl binary. |
|---|
| 854 |
|
|---|
| 855 |
=item * New Feature: MetaNamesRank |
|---|
| 856 |
|
|---|
| 857 |
MetaNamesRank can be used to adjust the ranking for words based on |
|---|
| 858 |
the word's MetaName. |
|---|
| 859 |
|
|---|
| 860 |
=item * New Swish Library API and Perl Module |
|---|
| 861 |
|
|---|
| 862 |
The Swish-e C library interface was rewritten to provide |
|---|
| 863 |
better memory management and better separation of data. |
|---|
| 864 |
Most indexing related code has been removed from the library. |
|---|
| 865 |
A new header file is provided for the API: swish-e.h. |
|---|
| 866 |
|
|---|
| 867 |
The Perl module SWISHE was replaced with the SWISH::API module |
|---|
| 868 |
in the Swish-e distribution. |
|---|
| 869 |
|
|---|
| 870 |
B<Previous versions of the SWISHE module will not work with this version of Swish-e.> |
|---|
| 871 |
|
|---|
| 872 |
If you are using the SWISHE module from a previous version of Swish then you must |
|---|
| 873 |
either rewrite your code to use the new SWISH::API module (highly recommended) |
|---|
| 874 |
or use the replacement SWISHE module. The replacement SWISHE module is a thin |
|---|
| 875 |
interface to the SWISH::API module. It can be downloaded from |
|---|
| 876 |
|
|---|
| 877 |
http://swish-e.org/Download/old/SWISHE-0.03.tar.gz |
|---|
| 878 |
|
|---|
| 879 |
=item * NoContents not working with libxml2 parser |
|---|
| 880 |
|
|---|
| 881 |
Corrected problem when using NoContents with binary files and the HTML2 parser. |
|---|
| 882 |
|
|---|
| 883 |
Trying to index image file names with: |
|---|
| 884 |
|
|---|
| 885 |
IndexOnly .gif .jpeg |
|---|
| 886 |
NoContents .gif .jpeg |
|---|
| 887 |
|
|---|
| 888 |
failed to index the path names because the default parser |
|---|
| 889 |
(HTML2 when libxml2 is linked with swish-e) |
|---|
| 890 |
was not finding any text in the binary files. [moseley] |
|---|
| 891 |
|
|---|
| 892 |
=item * Updates to swish.cgi |
|---|
| 893 |
|
|---|
| 894 |
The example/swish.cgi script can now use the SWISH::API module |
|---|
| 895 |
for searching an index. Combined with mod_perl this module |
|---|
| 896 |
can improve search performance considerably. |
|---|
| 897 |
|
|---|
| 898 |
The Perl modules used with the swish.cgi script have all been moved into |
|---|
| 899 |
the SWISH::* namespace. Hence, files in the F<modules> directory were moved |
|---|
| 900 |
into the F<modules/SWISH> directory. |
|---|
| 901 |
|
|---|
| 902 |
=back |
|---|
| 903 |
|
|---|
| 904 |
=head2 Version 2.2.3 - December 11, 2002 |
|---|
| 905 |
|
|---|
| 906 |
Multiple -L options were ORing instead of ANDing. |
|---|
| 907 |
Caught by Patrick Mouret. [moseley] |
|---|
| 908 |
|
|---|
| 909 |
=head2 Version 2.2.2 - November 14, 2002 |
|---|
| 910 |
|
|---|
| 911 |
Pass non-text/* files on to the indexing code IF there is a FileFilter |
|---|
| 912 |
associated with the *extension* of the URL. Fixes the problem of not |
|---|
| 913 |
being able to index, say, pdf files by using the FileFilter configuration |
|---|
| 914 |
option. |
|---|
| 915 |
|
|---|
| 916 |
Fixed bug where nulls were stripped when using FileFilter with -S prog. |
|---|
| 917 |
Caught by Greg Fenton. [moseley] |
|---|
| 918 |
|
|---|
| 919 |
=head2 Version 2.2.1 - September 26, 2002 |
|---|
| 920 |
|
|---|
| 921 |
=over 4 |
|---|
| 922 |
|
|---|
| 923 |
=item * NoContents with -S prog |
|---|
| 924 |
|
|---|
| 925 |
Failed to use the correct default parser when using the No-Contents header |
|---|
| 926 |
and libxml2 linked in. [moseley] |
|---|
| 927 |
|
|---|
| 928 |
=item * Add tests for IRIX and sparc machines |
|---|
| 929 |
|
|---|
| 930 |
8-byte alignment in mem_zones is required for these machines [moseley] |
|---|
| 931 |
|
|---|
| 932 |
|
|---|
| 933 |
=item * Fixed code when removing files |
|---|
| 934 |
|
|---|
| 935 |
Was not correctly removing words from index when parser aborted [jmruiz] |
|---|
| 936 |
|
|---|
| 937 |
=item * Merge segfault |
|---|
| 938 |
|
|---|
| 939 |
Fixed segfault caused by trying to print null dates while merging |
|---|
| 940 |
duplicate files. [moseley] |
|---|
| 941 |
|
|---|
| 942 |
=item * Documentation patches |
|---|
| 943 |
|
|---|
| 944 |
Spelling corrections to the SWISH-CONFIG pod page [Steve Eckert] |
|---|
| 945 |
|
|---|
| 946 |
=item * Configure corrections |
|---|
| 947 |
|
|---|
| 948 |
Fixed a zlib test error that used "==" in a test [Steve Eckert] |
|---|
| 949 |
|
|---|
| 950 |
=item * Updates to VMS build |
|---|
| 951 |
|
|---|
| 952 |
The VMS build was updated [Jean-François PIÉRONNE] |
|---|
| 953 |
|
|---|
| 954 |
=item * MANIFEST corrections |
|---|
| 955 |
|
|---|
| 956 |
Added missing filters and vms build file into MANIFEST [moseley] |
|---|
| 957 |
|
|---|
| 958 |
=back |
|---|
| 959 |
|
|---|
| 960 |
=head2 Version 2.2 - September 18, 2002 |
|---|
| 961 |
|
|---|
| 962 |
|
|---|
| 963 |
=over 4 |
|---|
| 964 |
|
|---|
| 965 |
=item * Default parser |
|---|
| 966 |
|
|---|
| 967 |
Swish-e will now use the HTML2 (libxml2) parser by default if libxml2 is |
|---|
| 968 |
installed and DefaultContents or IndexContents is not used. |
|---|
| 969 |
|
|---|
| 970 |
=item * Selecting parsers |
|---|
| 971 |
|
|---|
| 972 |
Allow HTML*, XML*, and TXT* to automatically select the libxml2-based parsers |
|---|
| 973 |
if libxml2 is linked with Swish-e, otherwise fall back to the built-in parsers. |
|---|
| 974 |
|
|---|
| 975 |
=item * SwishSpider and Filters |
|---|
| 976 |
|
|---|
| 977 |
Filters (FileFilter directive) did not work correctly when spidering |
|---|
| 978 |
with the -S http method. A new filter system was developed and now |
|---|
| 979 |
filtering of documents (e.g. pdf-E<gt>html or MSWord-E<gt>text) is handled |
|---|
| 980 |
by the src/SwishSpider program. |
|---|
| 981 |
|
|---|
| 982 |
When indexing with the -S http method only documents of content-type "text/*" |
|---|
| 983 |
are indexed. Other documents must be converted to text by using the filter system. |
|---|
| 984 |
|
|---|
| 985 |
=item * Buffer overflow in xml.c |
|---|
| 986 |
|
|---|
| 987 |
Fixed bug in xml.c reported by Rodney Barnett when very long words |
|---|
| 988 |
were indexed. [moseley] |
|---|
| 989 |
|
|---|
| 990 |
=item * configure script updates |
|---|
| 991 |
|
|---|
| 992 |
Updated from _WIN32 checks to feature checks using autoconf [moseley, norris] |
|---|
| 993 |
|
|---|
| 994 |
=item * updates to run on Alpha (Linux 2.4 (Debian 3.0)) |
|---|
| 995 |
|
|---|
| 996 |
Fixed a cast error when calling zlib, and the calls to read/write packed longs |
|---|
| 997 |
to disk. [jmruiz, moseley] |
|---|
| 998 |
|
|---|
| 999 |
=item * COALESCE_BUFFER_MAX_SIZE |
|---|
| 1000 |
|
|---|
| 1001 |
Some people were seeing the following error: |
|---|
| 1002 |
|
|---|
| 1003 |
err: Buffer too short in coalesce_word_locations. |
|---|
| 1004 |
Increase COALESCE_BUFFER_MAX_SIZE in config.h and rebuild. |
|---|
| 1005 |
|
|---|
| 1006 |
This was due to indexing binary data or files with a very large number of words. |
|---|
| 1007 |
The best solution is to not index binary data or files with a very large number |
|---|
| 1008 |
of words. |
|---|
| 1009 |
|
|---|
| 1010 |
Swish-e will now automatically reallocate the buffer as needed. [jmruiz] |
|---|
| 1011 |
|
|---|
| 1012 |
|
|---|
| 1013 |
=back |
|---|
| 1014 |
|
|---|
| 1015 |
=head2 Version 2.2rc1 - August 29, 2002 |
|---|
| 1016 |
|
|---|
| 1017 |
Many large changes were made internally in the code, some for performance |
|---|
| 1018 |
reasons, some for feature changes and additions, and some to prepare |
|---|
| 1019 |
for new features in later versions of Swish-e. |
|---|
| 1020 |
|
|---|
| 1021 |
=over 4 |
|---|
| 1022 |
|
|---|
| 1023 |
=item * Documentation! |
|---|
| 1024 |
|
|---|
| 1025 |
Documentation is now included in the source distribution as .pod |
|---|
| 1026 |
(perldoc) files, and as HTML files. In addition, the distribution can now |
|---|
| 1027 |
generate PDF, postscript, and unix man pages from the source .pod files. |
|---|
| 1028 |
See L<README|README> for more information. |
|---|
| 1029 |
|
|---|
| 1030 |
=item * Indexing and searching speed |
|---|
| 1031 |
|
|---|
| 1032 |
The indexing process has been improved. Depending on a number of |
|---|
| 1033 |
factors, you may see a significant improvement in indexing speed, |
|---|
| 1034 |
especially if upgrading from version 1.x. |
|---|
| 1035 |
|
|---|
| 1036 |
Searching speed has also been improved. Properties are not loaded until |
|---|
| 1037 |
results are displayed, and properties are pre-sorted during indexing to |
|---|
| 1038 |
speed up sorting results by properties while searching. |
|---|
| 1039 |
|
|---|
| 1040 |
=item * Properties are written to a separate file |
|---|
| 1041 |
|
|---|
| 1042 |
Swish-e now stores document properties in a separate file. This means |
|---|
| 1043 |
there are now two files that make up a Swish-e index. The default files |
|---|
| 1044 |
are C<index.swish-e> and C<index.swish-e.prop>. |
|---|
| 1045 |
|
|---|
| 1046 |
This change frees memory while indexing, allowing larger collections to |
|---|
| 1047 |
be indexed in memory. |
|---|
| 1048 |
|
|---|
| 1049 |
=item * Internal data stored as Properties |
|---|
| 1050 |
|
|---|
| 1051 |
Before 2.2, some internal data was stored in fixed locations within the |
|---|
| 1052 |
index, namely the file name, file size, and title. 2.2 introduced new |
|---|
| 1053 |
internal data such as the last modified date, and document summaries. |
|---|
| 1054 |
This data is considered I<meta data> since it is data about a document. |
|---|
| 1055 |
|
|---|
| 1056 |
Instead of adding new data to the internal structure of the index file, |
|---|
| 1057 |
it was decided to use the MetaNames and PropertyNames feature of Swish-e |
|---|
| 1058 |
to store this meta information. This allows for new meta data to be added |
|---|
| 1059 |
at a later time (e.g. Content-type), and provides an easy and customizable |
|---|
| 1060 |
way to print results with the C<-p> switch and the new C<-x> switch. |
|---|
| 1061 |
In addition, search results can now be sorted and limited by properties. |
|---|
| 1062 |
|
|---|
| 1063 |
For example, to sort by the rank and title: |
|---|
| 1064 |
|
|---|
| 1065 |
swish-e -w foo -s swishrank desc swishtitle asc |
|---|
| 1066 |
|
|---|
| 1067 |
|
|---|
| 1068 |
=item * The header display has been slightly reorganized. |
|---|
| 1069 |
|
|---|
| 1070 |
If you are parsing output headers in a program then you may need to |
|---|
| 1071 |
adjust your code. There's a new switch '-H' to control the level of |
|---|
| 1072 |
header output when searching. |
|---|
| 1073 |
|
|---|
| 1074 |
=item * Results are now combined when searching more than one index. |
|---|
| 1075 |
|
|---|
| 1076 |
Swish-e now merges (and sorts) the results from multiple indexes when |
|---|
| 1077 |
using C<-f> to specify more than one index. This change affects the way |
|---|
| 1078 |
maxhits (C<-m>) works. Here's a summary of the way it works for the |
|---|
| 1079 |
different versions. |
|---|
| 1080 |
|
|---|
| 1081 |
|
|---|
| 1082 |
1.3.2 - MaxHits returns first N results starting from the first index. |
|---|
| 1083 |
e.g. maxhits=20; 15 hits Index1, 40 hits Index2 |
|---|
| 1084 |
All 15 from Index1 plus first five from Index2 = 20 hits. |
|---|
| 1085 |
|
|---|
| 1086 |
2.0.0 - MaxHits returns first N results from each index. |
|---|
| 1087 |
e.g. Maxhits=20; 15 hits Index1, 40 hits Index2 |
|---|
| 1088 |
All 15 from Index1 plus 15 from Index2. |
|---|
| 1089 |
|
|---|
| 1090 |
2.2.0 - Results are merged and first N results are returned. |
|---|
| 1091 |
e.g. Maxhits=20; 15 hits Index1, 40 hits Index2 |
|---|
| 1092 |
Results are merged from each index and sorted |
|---|
| 1093 |
(rank is the default sort) and only the first |
|---|
| 1094 |
20 are returned. |
|---|
| 1095 |
|
|---|
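The 2.2.0 behaviour can be illustrated with a small Python sketch. It is not Swish-e's C implementation. The `(rank, path)` tuples, the rank values, and the function name `merged_max_hits` are made up for illustration, and only the default sort (by rank, highest first) is modelled.

```python
def merged_max_hits(results_per_index, max_hits):
    """Merge hits from every index, sort by rank (descending), keep max_hits.

    Each hit is a hypothetical (rank, path) tuple.
    """
    merged = [hit for hits in results_per_index for hit in hits]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:max_hits]


# 15 hits in Index1 and 40 hits in Index2, with invented, distinct ranks.
index1 = [(1000 - 2 * i, f"index1/doc{i}") for i in range(15)]
index2 = [(999 - 2 * i, f"index2/doc{i}") for i in range(40)]
top = merged_max_hits([index1, index2], 20)
```

With these invented ranks the 20 returned hits are interleaved from both indexes, rather than being taken index by index as in the earlier versions.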
| 1096 |
|
|---|
| 1097 |
=item * New B<prog> document source indexing method |
|---|
| 1098 |
|
|---|
| 1099 |
You can now use -S prog to use an external program to supply documents |
|---|
| 1100 |
to Swish-e. This external program can be used to spider web servers, |
|---|
| 1101 |
index databases, or to convert any type of document into html, xml, |
|---|
| 1102 |
or text, so it can be indexed by Swish-e. Examples are given in the |
|---|
| 1103 |
C<prog-bin> directory. |
|---|
| 1104 |
|
|---|
| 1105 |
=item * The indexing parser was rewritten to be more logical. |
|---|
| 1106 |
|
|---|
| 1107 |
TranslateCharacters now is done before WordCharacters is checked. For example, |
|---|
| 1108 |
|
|---|
| 1109 |
WordCharacters abcdefghijklmnopqrstuvwxyz |
|---|
| 1110 |
TranslateCharacters ñ n |
|---|
| 1111 |
|
|---|
| 1112 |
Now C<El Niño> will be indexed as El Nino (el and nino), even though C<ñ> |
|---|
| 1113 |
is not listed in WordCharacters. |
|---|
| 1114 |
|
|---|
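The effect of the new ordering can be sketched in Python. This is an illustration under assumptions, not the Swish-e parser: the function names are hypothetical, words are lowercased up front, and only the two settings from the example are modelled.

```python
import re

WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz"
TRANSLATE = str.maketrans("ñ", "n")  # TranslateCharacters ñ n
WORD_PATTERN = "[" + re.escape(WORD_CHARACTERS) + "]+"


def index_words(text):
    """New order: translate characters first, then split on WordCharacters."""
    translated = text.lower().translate(TRANSLATE)
    return re.findall(WORD_PATTERN, translated)


def index_words_without_translation(text):
    """For contrast: splitting on WordCharacters with no translation applied."""
    return re.findall(WORD_PATTERN, text.lower())
```

Here `index_words("El Niño")` yields the words el and nino, while the untranslated split breaks the second word at the C<ñ>.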
| 1115 |
Previously, stopwords were checked after stemming and soundex conversions, |
|---|
| 1116 |
as well as most of the other word checks (WordCharacters, min/max length |
|---|
| 1117 |
and so on). This meant that the stopword list probably didn't work as |
|---|
| 1118 |
expected when using stemming. |
|---|
| 1119 |
|
|---|
| 1120 |
=item * The search parser was rewritten to be more logical |
|---|
| 1121 |
|
|---|
| 1122 |
The search parser was rewritten to correct a number of logic errors. |
|---|
| 1123 |
Swish-e did not differentiate between meta names, Swish-e operators |
|---|
| 1124 |
and search words when parsing the query. This meant, for example, |
|---|
| 1125 |
that metanames might be broken up by the WordCharacters setting, and |
|---|
| 1126 |
that they could be stemmed. |
|---|
| 1127 |
|
|---|
| 1128 |
Swish-e operator characters C<"*()=> can now be searched by escaping |
|---|
| 1129 |
with a backslash. For example: |
|---|
| 1130 |
|
|---|
| 1131 |
./swish-e -w 'this\=odd\)word' |
|---|
| 1132 |
|
|---|
| 1133 |
will end up searching for the word C<this=odd)word>. To search for a |
|---|
| 1134 |
backslash character, precede it with a backslash. |
|---|
| 1135 |
|
|---|
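The backslash handling can be pictured with a minimal Python sketch. It is not the Swish-e query parser, and `unescape_word` is a hypothetical name. The sketch only removes the escaping backslashes from a single word and does not model operators or wildcards.

```python
def unescape_word(token):
    """Drop each escaping backslash and keep the character that follows it.

    Assumption: a trailing lone backslash is kept as a literal backslash.
    """
    out = []
    chars = iter(token)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, "\\"))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `unescape_word(r"this\=odd\)word")` gives the word this=odd)word, and a doubled backslash gives a single one.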
| 1136 |
Currently, searching for: |
|---|
| 1137 |
|
|---|
| 1138 |
./swish-e -w 'this\*' |
|---|
| 1139 |
|
|---|
| 1140 |
is the same as a wildcard search. This may be fixed in the future. |
|---|
| 1141 |
|
|---|
| 1142 |
Searching for buzzwords with those characters will still require |
|---|
| 1143 |
backslashing. This also may change to allow some un-escaped operator |
|---|
| 1144 |
characters, but some will always need to be escaped (e.g. the double-quote |
|---|
| 1145 |
phrase character). |
|---|
| 1146 |
|
|---|
| 1147 |
=item * Quotes and Backslash escapes in strings |
|---|
| 1148 |
|
|---|
| 1149 |
A bug was fixed in the C<parse_line()> function (in F<string.c>) where |
|---|
| 1150 |
backslashes were not escaping the next character. C<parse_line()> is used |
|---|
| 1151 |
to parse a string of text into tokens (words). Normally splitting is done |
|---|
| 1152 |
at whitespace. You may use quotes (single or double) to define a string |
|---|
| 1153 |
(that might include whitespace) as a single parameter. The backslash |
|---|
| 1154 |
can also be used to escape the following character when *within* quotes |
|---|
| 1155 |
(e.g. to escape an embedded quote character). |
|---|
| 1156 |
|
|---|
| 1157 |
ReplaceRules append "foo bar" <- define "foo bar" as a single word |
|---|
| 1158 |
ReplaceRules append "foo\"bar" <- escape the quotes |
|---|
| 1159 |
ReplaceRules append 'foo"bar' <- same thing |
|---|
| 1160 |
|
|---|
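As a loose analogy only, Python's standard `shlex.split` (POSIX mode) tokenizes the three lines above the same way. It is not `parse_line()` and differs in other cases. For instance, `shlex` does not treat a backslash as an escape inside single quotes.

```python
import shlex

# The three ReplaceRules examples, tokenized with shlex as a stand-in.
# Assumption: shlex only approximates parse_line() for these inputs.
lines = [
    'ReplaceRules append "foo bar"',
    r'ReplaceRules append "foo\"bar"',
    "ReplaceRules append 'foo\"bar'",
]
tokens = [shlex.split(line) for line in lines]
```

Each line yields three tokens, and the last two lines produce the same third token, foo"bar.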
| 1161 |
|
|---|
| 1162 |
=item * Example C<user.config> file removed. |
|---|
| 1163 |
|
|---|
| 1164 |
Previous versions of Swish-e included a configuration file called |
|---|
| 1165 |
C<user.config> which contained examples of all directives. This has |
|---|
| 1166 |
been replaced by a series of example configuration files located in the |
|---|
| 1167 |
C<conf> directory. The configuration directives are now described in |
|---|
| 1168 |
L<SWISH-CONFIG|SWISH-CONFIG>. |
|---|
| 1169 |
|
|---|
| 1170 |
=item * Ports to Win32 and VMS |
|---|
| 1171 |
|
|---|
| 1172 |
David Norris has included the files required to build Swish-e under |
|---|
| 1173 |
Windows. See C<src/win32>. A self-extracting Windows version is |
|---|
| 1174 |
available from the Download page of the swish-e.org web site. |
|---|
| 1175 |
|
|---|
| 1176 |
Jean-François Piéronne has provided the files required to build Swish-e |
|---|
| 1177 |
under OpenVMS. See C<src/vms> for more information. |
|---|
| 1178 |
|
|---|
| 1179 |
=item * String properties are concatenated |
|---|
| 1180 |
|
|---|
| 1181 |
Multiple I<string> properties of the same name in a document are now |
|---|
| 1182 |
concatenated into one property. A space character is added between |
|---|
| 1183 |
the strings if needed. A warning will be generated if multiple numeric |
|---|
| 1184 |
or date properties are found in the same document, and the additional |
|---|
| 1185 |
properties will be ignored. |
|---|
| 1186 |
|
|---|
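The concatenation rule for string properties might look like the following Python sketch. The helper name is hypothetical, and "if needed" is interpreted here as "neither side already supplies whitespace at the join", which is an assumption rather than documented behaviour.

```python
def add_string_property(existing, new):
    """Append `new` to `existing`, inserting one space only if needed.

    Assumption: a space is needed when neither the end of `existing`
    nor the start of `new` is already whitespace.
    """
    if not existing:
        return new
    if not new:
        return existing
    if existing[-1].isspace() or new[0].isspace():
        return existing + new
    return existing + " " + new
```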
| 1187 |
Previously, properties of the same name were added to the index, but |
|---|
| 1188 |
could not be retrieved. |
|---|
| 1189 |
|
|---|
| 1190 |
To do: remove the C<next> pointer, and allow a user-defined character to |
|---|
| 1191 |
place between properties. |
|---|
| 1192 |
|
|---|
| 1193 |
=item * regex type added to ReplaceRules |
|---|
| 1194 |
|
|---|
| 1195 |
A more general purpose pattern replacement syntax. |
|---|
| 1196 |
|
|---|
| 1197 |
|
|---|
| 1198 |
=item * New Parsers |
|---|
| 1199 |
|
|---|
| 1200 |
Swish-e's XML parser was replaced with James Clark's expat XML parser |
|---|
| 1201 |
library. |
|---|
| 1202 |
|
|---|
| 1203 |
Swish-e can now use Daniel Veillard's libxml2 library for parsing HTML and |
|---|
| 1204 |
XML. This requires installation of the library before building Swish-e. |
|---|
| 1205 |
See the L<INSTALL|INSTALL> document for information. libxml2 is not |
|---|
| 1206 |
required, but is strongly recommended for parsing HTML over Swish-e's |
|---|
| 1207 |
internal HTML parser, and provides more features for both HTML and |
|---|
| 1208 |
XML parsing. |
|---|
| 1209 |
|
|---|
| 1210 |
=item * Support for zlib |
|---|
| 1211 |
|
|---|
| 1212 |
Swish-e can be compiled with zlib. This is useful for compressing large |
|---|
| 1213 |
properties. Building Swish-e with zlib is strongly recommended if you |
|---|
| 1214 |
use its C<StoreDescription> feature. |
|---|
| 1215 |
|
|---|
| 1216 |
=item * LST type of document no longer supported |
|---|
| 1217 |
|
|---|
| 1218 |
LST allowed indexing of files that contained multiple documents. |
|---|
| 1219 |
|
|---|
| 1220 |
=item * Temporary files |
|---|
| 1221 |
|
|---|
| 1222 |
To improve security Swish-e now uses the C<mkstemp(3)> function to |
|---|
| 1223 |
create temporary files. Temporary files are used while indexing only. |
|---|
| 1224 |
This may result in some portability issues, but the security issues |
|---|
| 1225 |
were overriding. |
|---|
| 1226 |
|
|---|
| 1227 |
(Currently this does not apply to the -S http indexing method.) |
|---|
| 1228 |
|
|---|
| 1229 |
C<mkstemp> opens the temporary file with the O_EXCL|O_CREAT flags. This prevents |
|---|
| 1230 |
overwriting existing files. In addition, the name of the file created |
|---|
| 1231 |
is a lot harder to guess by attackers. The temporary file is created |
|---|
| 1232 |
with only owner permissions. |
|---|
| 1233 |
|
|---|
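The properties described above can be observed with Python's `tempfile.mkstemp`, which offers the same guarantees as the C function (exclusive creation, an unpredictable name, owner-only permissions). This is an illustration, not Swish-e code. The helper name and the two-letter `kind` suffix are assumptions.

```python
import os
import tempfile


def make_temp(directory, kind="xx"):
    """Create a temporary file named swtmp<kind><random> in `directory`.

    Returns (fd, path). The caller must close fd and remove the file.
    The 'kind' letters are a made-up placeholder.
    """
    return tempfile.mkstemp(prefix="swtmp" + kind, dir=directory)


def create_exclusively(path):
    """Open with O_CREAT|O_EXCL; raises FileExistsError if path exists."""
    return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
```

Calling `create_exclusively` on a path that `make_temp` has just created fails, which is the behaviour that prevents overwriting an existing file.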
| 1234 |
Please report any portability issues on the Swish-e discussion list. |
|---|
| 1235 |
|
|---|
| 1236 |
=item * Temporary file locations |
|---|
| 1237 |
|
|---|
| 1238 |
Swish-e now uses the environment variables C<TMPDIR>, C<TMP>, and |
|---|
| 1239 |
C<TEMP> (in that order) to decide where to write temporary files. |
|---|
| 1240 |
The configuration setting of L<TmpDir|SWISH-CONFIG/"item_TmpDir"> will |
|---|
| 1241 |
be used if none of the environment variables are set. Swish-e uses the |
|---|
| 1242 |
current directory otherwise; there is no default temporary directory. |
|---|
| 1243 |
|
|---|
| 1244 |
Since the environment variables override the configuration settings, |
|---|
| 1245 |
a warning will be issued if you set L<TmpDir|SWISH-CONFIG/"item_TmpDir"> |
|---|
| 1246 |
in the configuration file and there's also an environment variable set. |
|---|
| 1247 |
|
|---|
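The lookup order and the warning condition can be summarised in a short Python sketch. The function name and return shape are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
def choose_tmpdir(environ, config_tmpdir=None):
    """Return (directory, warn) following the order described above.

    Order: TMPDIR, TMP, TEMP, then the TmpDir setting, then the
    current directory. `warn` is True when an environment variable
    overrides a TmpDir setting.
    """
    for name in ("TMPDIR", "TMP", "TEMP"):
        value = environ.get(name)
        if value:  # assumption: empty values count as unset
            return value, config_tmpdir is not None
    if config_tmpdir:
        return config_tmpdir, False
    return ".", False
```

In practice this would be called as `choose_tmpdir(os.environ, tmpdir_from_config)`.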
| 1248 |
Temporary files begin with the letters "swtmp" (which can be changed in |
|---|
| 1249 |
F<config.h>), followed by two or more letters that indicate the type of |
|---|
| 1250 |
temporary file, and some random characters to complete the file name. |
|---|
| 1251 |
If indexing is aborted for some reason you may find these temporary |
|---|
| 1252 |
files left behind. |
|---|
| 1253 |
|
|---|
| 1254 |
=item * New Fuzzy indexing method Double Metaphone |
|---|
| 1255 |
|
|---|
| 1256 |
Based on Lawrence Philips' Metaphone algorithm, adds two |
|---|
| 1257 |
new methods of creating a fuzzy index (in addition to Stemming and Soundex). |
|---|
| 1258 |
|
|---|
| 1259 |
|
|---|
| 1260 |
=back |
|---|
| 1261 |
|
|---|
| 1262 |
Changes to Configuration File Directives. Please see |
|---|
| 1263 |
L<SWISH-CONFIG|SWISH-CONFIG> for more info. |
|---|
| 1264 |
|
|---|
| 1265 |
=over 4 |
|---|
| 1266 |
|
|---|
| 1267 |
=item * New directives: IndexContents and DefaultContents |
|---|
| 1268 |
|
|---|
| 1269 |
The IndexContents directive assigns internal Swish-e document parsers |
|---|
| 1270 |
to files based on their file type. The DefaultContents directive |
|---|
| 1271 |
assigns a parser to be used on files that are not assigned a parser with |
|---|
| 1272 |
IndexContents. |
|---|
| 1273 |
|
|---|
| 1274 |
=item * New directive: UndefinedMetaTags [error|ignore|index|auto] |
|---|
| 1275 |
|
|---|
| 1276 |
This describes what to do when a meta tag is found in a document that |
|---|
| 1277 |
is not listed in the MetaNames directive. |
|---|
| 1278 |
|
|---|
| 1279 |
=item * New directive: IgnoreTags |
|---|
| 1280 |
|
|---|
| 1281 |
Will ignore text with the listed tags. |
|---|
| 1282 |
|
|---|
| 1283 |
=item * New directive: SwishProgParameters *list of words* |
|---|
| 1284 |
|
|---|
| 1285 |
Passes words listed to the external Swish-e program when running with |
|---|
| 1286 |
C<-S prog> document source method. |
|---|
| 1287 |
|
|---|
| 1288 |
=item * New directive: ConvertHTMLEntities [yes|no] |
|---|
| 1289 |
|
|---|
| 1290 |
Controls parsing and conversion of HTML entities. |
|---|
| 1291 |
|
|---|
| 1292 |
=item * New directive: DontBumpPositionOnMetaTags |
|---|
| 1293 |
|
|---|
| 1294 |
The word position is now bumped when a new metatag is found -- this is |
|---|
| 1295 |
to prevent phrases from matching across meta tags. This directive will |
|---|
| 1296 |
disable this behavior for the listed tags. |
|---|
| 1297 |
|
|---|
| 1298 |
This directive works for HTML and XML documents. |
|---|
| 1299 |
|
|---|
| 1300 |
=item * Changed directive: IndexComments |
|---|
| 1301 |
|
|---|
| 1302 |
This has been changed such that comments are not indexed by default. |
|---|
| 1303 |
|
|---|
| 1304 |
=item * Changed directive: IgnoreWords |
|---|
| 1305 |
|
|---|
| 1306 |
The builtin list of stopwords has been removed. Use of the SwishDefault |
|---|
| 1307 |
word will generate a warning, and no stop words will be used. You must |
|---|
| 1308 |
now specify a list of stopwords, or specify a file of stopwords. |
|---|
| 1309 |
|
|---|
| 1310 |
A sample file C<stopwords.txt> has been inclu |
|---|