[% META
    title = "Development Information"
    id = "development"
    author = '$Author$'
%]


<h1>Development</h1>

<p>
The current stable release is [% link_to_page('download', swish.current_version ) %].
</p>

<p>
Swish-e is continually under development.
This page contains a laundry list of requested features planned for a future
Swish-e release. To request new features, bug fixes, or (best of all)
to submit code patches, send e-mail to the
[% link_to_page('discuss' , 'Swish-e mailing list' ) %].
</p>

<h3>Daily Builds</h3>

<p>
Swish-e source is available for anonymous public download from
the [% link_to_page('cvs', 'Swish-e Subversion server' ) %].
</p>

<p>
The daring and adventurous can download the
<a href="[% site.url.latest_snapshot %]">daily build snapshot</a> from
the [% link_to_page('daily') %] page.
This is <strong>not an official release</strong> of Swish-e,
but rather the current development version.
There is no guarantee that these packages will run.
Please do not use this code in production.
</p>

<p>
For pre-compiled Windows binary snapshots of the development version, please
see <a href="http://www.webaugur.com/wares/files/swish-e/daily/">http://www.webaugur.com/wares/files/swish-e/daily/</a>.
<br />
The most current Windows <strong>development</strong> version is
<a href="http://www.webaugur.com/wares/files/swish-e/daily/swish-latest.exe">swish-latest.exe</a>.
</p>

<p>
Questions regarding daily development builds,
or about using Swish-e in general, should be directed to
the [% link_to_page('discuss', 'Swish-e mailing list' ) %].
</p>


<h3>Features planned for 2.6</h3>

<p>
<ul>
<li>Remove Expat and other older parsers. Libxml2 will be the default (and only) parser.</li>
<li>Remove the <tt>-S http</tt> indexing method.</li>
<li>Overhaul the documentation.</li>
</ul>
</p>

<h3>Features planned for 3.0</h3>

<p>
Swish-e 3.0 (abbreviated Swish3) will be a complete overhaul of the code.
You can <a href="http://dev.swish-e.org/wiki/swish3">track development progress on the Swish3 wiki page</a>.
Major feature improvements will include:

<dl>
<dt>Unicode support</dt>
<dd>Unicode is the <a href='http://www.unicode.org/unicode/faq/'>international standard
for character encodings</a>. Swish3 will implement
support for the <a href='http://www.cl.cam.ac.uk/~mgk25/unicode.html'>UTF-8</a>
<a href='http://czyborra.com/utf/'>character encoding</a>,
which should handle all major languages in the world (UTF-8 as originally specified could
encode up to 2,147,483,648 code points; the current definition, RFC 3629, restricts it to
the Unicode range U+0000 to U+10FFFF).
The Swish-e developers need input from non-English language experts.
Please contribute to the discussion on the
[% link_to_page('discuss' , 'Swish-e mailing list' ) %].
Some significant known issues include:
<p />
<dl>
<dt>lowercase vs. UPPERCASE</dt>
<dd>Version 2.x uses <tt>tolower()</tt> to lowercase all characters
before searching and indexing. Should the same approach be used for UTF-8? Will this have
a significant impact on usability for non-English languages?
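<p>
To make the concern concrete, here is a minimal Python sketch (not Swish-e code).
The helper <tt>ascii_lower_bytes</tt> is a hypothetical stand-in for calling
<tt>tolower()</tt> on each byte in the "C" locale; it is compared with a
Unicode-aware lowercasing of the same word.
</p>

```python
def ascii_lower_bytes(data: bytes) -> bytes:
    """Lowercase only the ASCII letters A-Z, one byte at a time.

    Hypothetical stand-in for C's tolower() applied per byte in the "C"
    locale; this is an illustration, not Swish-e's actual code.
    """
    return bytes(b + 32 if b in range(65, 91) else b for b in data)


word = "ÉCOLE"
utf8 = word.encode("utf-8")

# Byte-wise lowering never touches the two bytes that encode the accented capital.
bytewise = ascii_lower_bytes(utf8).decode("utf-8")

# Unicode-aware lowering handles every letter in the word.
unicode_aware = word.lower()

print(bytewise)
print(unicode_aware)
```

<p>
Under these assumptions the two results differ only in the first letter, so a
lowercase query would not match the byte-wise form of the indexed word.
</p>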
</dd>
<dt>Wildcards</dt>
<dd>Version 2.x uses an internal table to support wildcard searching with <tt>*</tt>.
The table assumes an 8-bit (non-Unicode) character encoding. That approach will likely need
to be rethought for multibyte encodings like UTF-8.
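<p>
A small Python sketch of why a table with one slot per byte value stops
working: in UTF-8 a single character occupies between one and four bytes.
The function name <tt>utf8_byte_lengths</tt> is illustrative only.
</p>

```python
def utf8_byte_lengths(text: str) -> list:
    """Return the number of UTF-8 bytes used by each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


# One character from each UTF-8 length class: ASCII, the Latin-1 range,
# the rest of the Basic Multilingual Plane, and a supplementary plane.
sample = "a\u00e9\u20ac\U0001d11e"
print(utf8_byte_lengths(sample))  # prints [1, 2, 3, 4]
```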
</dd>
<dt>Tokenizing</dt>
<dd>Version 2.x uses five different configuration options to control how a
'word' (token) is defined. The basic assumption is that a word is defined by which characters it
<i>includes</i>. That assumption is based on a manageable character set of 256 characters.
However, the sheer size of the Unicode character set makes that system unworkable. Instead, some kind of
regular expression library will likely be used.
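<p>
As a rough illustration of the regular expression approach, the following
Python sketch defines a word by a pattern rather than by an explicit list of
characters. The pattern and the <tt>tokenize</tt> helper are assumptions made
for this example; they do not describe the Swish3 design.
</p>

```python
import re

# For str patterns in Python 3, \w is Unicode-aware, so a single pattern
# covers letters and digits from any script without enumerating them.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list:
    """Return the lowercased word tokens found in text."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


print(tokenize("Größe, naïve café!"))
```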
</dd>

<dt>Stemming</dt>
<dd>The stemmers used will need full international support.
</dd>
<dt>Configuration format</dt>
<dd>Since Swish-e depends on a configuration file for StopWords, character
definitions, etc., the parsing of the configuration file must support UTF-8 as well.
The current idea is to switch to XML-style configuration files and use Libxml2 to parse
them.
</dd>
</dl>

</dd>

<dt>Incremental indexing</dt>
<dd>Swish3 will support true incremental indexing. This will allow document records
to be added, modified, and deleted in an existing index. This feature may or may not build
on the version 2.x experimental btree/incremental feature.
</dd>

<dt>Scaling</dt>
<dd>Swish3 will reliably scale to larger (multimillion-document) collections.
</dd>

<dt>Indexing API</dt>
<dd>Swish3 will include an indexing API in addition to the current searching API.</dd>

<dt>Streamlined feature set</dt>
<dd>Swish3 will drop several features found in the current version:
<ul>
<li>Expat parsers</li>
<li><tt>-S http</tt> indexing method and related configuration options</li>
<li>Older stemmers</li>
<li>Current native index format</li>
</ul>
</dd>

<dt>Alternate index backends</dt>
<dd>Swish3 will offer alternate index backends using available open source libraries,
such as <a href='http://xapian.org/'>Xapian</a>,
<a href='http://hyperestraier.sourceforge.net/'>HyperEstraier</a>,
<a href='http://incubator.apache.org/lucene4c/'>Lucene</a>, or
<a href='http://www.lemurproject.org/'>Lemur</a>.
</dd>

</dl>
</p>


<hr />

<h3>The Players</h3>

<p>
You can't tell the players without a program. And we wouldn't have a program without
all these players! All these folks have made key contributions to Swish-e.
If you are not listed here, and you should be, <a href="mailto:roy.tennant@ucop.edu">drop us a line</a>.


<h4><i>On the Field</i></h4>

<dl>

<dt><b>Bill Moseley</b></dt>
<dd>
The person leading the charge. Rewrote much of the documentation and bundled it with the distribution
(you now know who to complain to), added the "prog" document source feature,
added Expat and libxml2 parsers, redesigned properties, and added many new and exciting features.
</dd>

<dt><b>Jose Manuel Ruiz</b></dt>

<dd>
Jose added phrase searching and has made huge contributions toward speed and memory
usage improvements. He added result sorting and improved metanames, properties, merging, and searching.
Swish-e is the powerful program it is today because of Jose. And there's more coming!
</dd>

<dt><b>David Norris</b></dt>

<dd>
David has provided ports to all flavors of Windows, as well as a Swish-e interface script written in PHP3.
The Windows version is now bundled with a self-installer, making installation just a click away.
</dd>


<dt><b>Peter Karman</b></dt>
<dd>Peter added improvements to the ranking code and a new website design. His main role is creating more
work for Bill.</dd>

<dt><b>Roy Tennant</b></dt>

<dd>Roy was the one who originally rescued SWISH when Kevin Hughes, the
original author, was no longer supporting it. He has remained active in
the effort since the beginning, but can't code in C to save his life,
and therefore must remain content with web site support and other such
minor tasks.</dd>

</dl>

<h4><i>Hall of Fame</i></h4>


<dl>


<dt><b>Bill Meier</b></dt>
<dd>
Bill improved the ranking code and provided much help with memory optimizations and indexing speed.
</dd>

<dt><b>Rainer Scherg</b></dt>
<dd>
Rainer has worked on Swish-e for many years. He added Swish-e's filters, which provide ways to index many
document types. Rainer also added the powerful "-x" feature to easily control Swish-e's output.
</dd>


<dt><b>Giulia Hill</b></dt>

<dd>Giulia was the first programmer to tackle upgrading SWISH to
Swish-e, back when it was a project of the UC Berkeley Library.
Without her, we would not have gotten out of the starting gate.</dd>

<dt><b>Ron Klatchko</b></dt> <dd>Ron added the crawling capability to
Swish-e, subsequently enhanced by others.</dd>

<dt><b>Kirk Hastings</b></dt>

<dd>Kirk programmed a neat Perl-based tool called "AutoSwish" that
allowed anyone to easily set up and maintain indexes from a web page.
Unfortunately, this program is no longer a part of the release due to
security issues.</dd>

<dt><b>Bas Meijer</b></dt>

<dd>
Bas has been an active member of the Swish-e team since 1999, providing code enhancements and user support.
He converted Swish-e's build process to a GNU Autoconf configure script and ported Swish-e to a number of
platforms. Bas has also provided add-on scripts to the Swish-e user community.
</dd>


<dt><b>Marc Gaulin</b></dt>
<dd>
Marc added code to support the document properties and stemming features, among other things.
</dd>

<dt><b>Warren Jones</b></dt>

<dt><b><a href="http://is.rice.edu/~riddle/">Prentiss Riddle</a></b>, <a
href="http://www.rice.edu/">Rice University</a></dt>
<dd>
The source of a number of
SWISH bug fixes that were implemented in the first Swish-e release.
</dd>

<dt><b>Mark Seiden</b></dt>

</dl>


</p>

<p>
We owe a debt of gratitude to <a href="http://kevcom.com/"><b>Kevin
Hughes</b></a>, without whom there would be no SWISH, and definitely no
Swish-e. His dedication to building useful tools and making them widely
available should be an inspiration to us all.
</p>