root/swish_website/src/devel/index.html

Revision 1918, 8.8 kB (checked in by karpet, 2 years ago)

sorry for the spam. just put swish3 back on devel page with link to wiki

  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
[% META
    title = "Development Information"
    id    = "development"
    author = '$Author$'
%]


<h1>Development</h1>

<p>
The current stable release is [% link_to_page('download', swish.current_version ) %].
</p>

<p>
Swish-e is continually under development.
This page contains a laundry list of requested features planned for future
Swish-e releases. To request new features or bug fixes, or (best of all)
to submit code patches, send e-mail to the
[% link_to_page('discuss' , 'Swish-e mailing list' ) %].
</p>

<h3>Daily Builds</h3>

<p>
Swish-e source is available for anonymous public download from
the [% link_to_page('cvs', 'Swish-e Subversion server' ) %].
</p>

<p>
The daring and adventurous can download the
<a href="[% site.url.latest_snapshot %]">daily build snapshot</a> from
the [% link_to_page('daily') %] page.
This is <strong>not an official release</strong> of Swish-e,
but rather the current development version.
There is no guarantee that these packages will run.
Please do not use this code in production.
</p>

<p>
For pre-compiled Windows development snapshots, please
see <a href="http://www.webaugur.com/wares/files/swish-e/daily/">http://www.webaugur.com/wares/files/swish-e/daily/</a>.
<br />
The most current Windows <strong>development</strong> version is
<a href="http://www.webaugur.com/wares/files/swish-e/daily/swish-latest.exe">here</a>.
</p>

<p>
Questions regarding daily development builds,
or about using Swish-e in general, should be directed to
the [% link_to_page('discuss', 'Swish-e mailing list' ) %].
</p>


<h3>Features planned for 2.6</h3>

<ul>
<li>Remove expat and other older parsers. Libxml2 will be the default (and only) parser.</li>
<li>Remove the <tt>-S http</tt> indexing method.</li>
<li>Documentation overhaul.</li>
</ul>

<h3>Features planned for 3.0</h3>

<p>
Swish-e 3.0 (abbreviated Swish3) will be a complete overhaul of the code.
You can <a href="http://dev.swish-e.org/wiki/swish3">track development progress here</a>.
Major feature improvements will include:
</p>

<dl>
 <dt>Unicode support</dt>
 <dd>Unicode is the <a href='http://www.unicode.org/unicode/faq/'>international standard
 for character encodings</a>. Swish3 will implement
 support for the <a href='http://www.cl.cam.ac.uk/~mgk25/unicode.html'>UTF-8</a>
 <a href='http://czyborra.com/utf/'>character encoding</a>,
 which should handle all major languages in the world (as originally designed, UTF-8 can
 encode up to 2,147,483,648 code points; RFC 3629 restricts it to the Unicode range
 U+0000 through U+10FFFF).
 The Swish-e developers need input from non-English language experts.
 Please contribute to the discussion at the
 [% link_to_page('discuss' , 'Swish-e mailing list' ) %].

 Some significant known issues include:
 <p />
 <dl>
  <dt>lowercase vs. UPPERCASE</dt>
  <dd>Version 2.x uses <tt>tolower()</tt> to lowercase all characters
  before searching and indexing. Should the same approach be used for UTF-8? Will this have
  a significant impact on usability for non-English languages?
  </dd>
  <dt>Wildcards</dt>
  <dd>Version 2.x uses an internal table to support wildcard searching with <tt>*</tt>.
  The table assumes an 8-bit (non-Unicode) character encoding. That approach will likely need
  to be rethought for multibyte encodings like UTF-8.
  </dd>
  <dt>Tokenizing</dt>
  <dd>Version 2.x uses five different configuration options to control how a
  'word' (token) is defined. The basic assumption is that a word is defined by which characters it
  <i>includes</i>. That assumption is based on a manageable character set of 256 characters.
  However, the sheer size of the Unicode character set makes that system unworkable. Instead, some kind of
  regular expression library will likely be used.
  </dd>

  <dt>Stemming</dt><dd>The stemmers used will need full international support.
  </dd>
  <dt>Configuration format</dt>
  <dd>Since Swish-e depends on a configuration file for StopWords, character
  definitions, etc., the parsing of the configuration file must support UTF-8 as well.
  The current idea is to switch to XML-style configuration files and use Libxml2 to parse
  them.
  </dd>
 </dl>

 </dd>

 <dt>Incremental indexing</dt>
 <dd>Swish3 will support true incremental indexing. This will allow document records
 to be added, modified, and deleted in an existing index. This feature may or may not build
 on the version 2.x experimental btree/incremental feature.
 </dd>

 <dt>Scaling</dt>
 <dd>Swish3 will reliably scale to larger (multimillion) document collections.
 </dd>

 <dt>Indexing API</dt>
 <dd>Swish3 will include an indexing API in addition to the current searching API.</dd>

 <dt>Streamlined feature set</dt>
 <dd>Swish3 will drop several features found in the current version:
 <ul>
  <li>Expat parsers</li>
  <li><tt>-S http</tt> indexing method and related configuration options</li>
  <li>Older stemmers</li>
  <li>Current native index format</li>
 </ul>
 </dd>

 <dt>Alternate index backends</dt>
 <dd>Swish3 will offer alternate index backends using available open source libraries,
 such as <a href='http://xapian.org/'>Xapian</a>,
 <a href='http://hyperestraier.sourceforge.net/'>HyperEstraier</a>,
 <a href='http://incubator.apache.org/lucene4c/'>Lucene</a>, or
 <a href='http://www.lemurproject.org/'>Lemur</a>.
 </dd>

</dl>


<hr />

<h3>The Players</h3>

<p>
You can't tell the players without a program. And we wouldn't have a program without
all these players! All these folks have made key contributions to Swish-e.
If you are not listed here, and you should be, <a href="mailto:roy.tennant@ucop.edu">drop us a line</a>.
</p>


<h4><i>On the Field</i></h4>

<dl>

<dt><b>Bill Moseley</b></dt>
<dd>
The person leading the charge.  Rewrote much of the documentation and bundled it with the distribution
(you now know who to complain to), added the "prog" document source feature,
added Expat and libxml2 parsers, redesigned properties, and added many new and exciting features.
</dd>

<dt><b>Jose Manuel Ruiz</b></dt>

<dd>
Jose added phrase searching and has made huge contributions toward speed and memory
usage improvements.  He added result sorting, and improved metanames and properties, merging, and searching.
Swish is the powerful program it is today because of Jose.  And there's more coming!
</dd>

<dt><b>David Norris</b></dt>

<dd>
David has provided ports to all flavors of Windows, as well as a Swish-e interface script written in PHP3.
The Windows version is now bundled with a self-installer, making installation just a click away.
</dd>


<dt><b>Peter Karman</b></dt>
<dd>Peter added improvements to the ranking code and a new website design. His main role is creating more
work for Bill.</dd>

<dt><b>Roy Tennant</b></dt>

<dd>Roy was the one who originally rescued SWISH when Kevin Hughes, the
original author, was no longer supporting it. He has remained active in
the effort since the beginning, but can't code in C to save his life,
and therefore must remain content with web site support and other such
minor tasks.</dd>

</dl>

<h4><i>Hall of Fame</i></h4>


<dl>


<dt><b>Bill Meier</b></dt>
<dd>
Bill improved the ranking code, and provided much help in memory optimizations and indexing speed.
</dd>

<dt><b>Rainer Scherg</b></dt>
<dd>
Rainer has worked on Swish-e for many years.  Rainer added Swish-e's filters, which provide ways to index many
document types.  Rainer also added the powerful "-x" feature to easily control Swish-e's output.
</dd>


<dt><b>Giulia Hill</b></dt>

<dd>Giulia was the first programmer to tackle upgrading SWISH to
Swish-e, back when it was a project of the UC Berkeley Library.
Without her, we would not have gotten out of the starting gate.</dd>

<dt><b>Ron Klatchko</b></dt> <dd>Ron added the crawling capability to
Swish-e, subsequently enhanced by others.</dd>

<dt><b>Kirk Hastings</b></dt>

<dd>Kirk programmed a neat Perl-based tool called "AutoSwish" that
allowed anyone to easily set up and maintain indexes from a web page.
Unfortunately, this program is no longer a part of the release due to
security issues.</dd>

<dt><b>Bas Meijer</b></dt>

<dd>
Bas has been an active member of the Swish-e team since 1999, providing code enhancements and user support.
He converted Swish-e's build process to the GNU Auto Configure script and ported Swish-e to a number of
platforms.  Bas has also provided add-on scripts to the Swish-e user community.
</dd>


<dt><b>Marc Gaulin</b></dt>
<dd>
Marc added code to support the document properties and stemming features, among other things.
</dd>

<dt><b>Warren Jones</b></dt>

<dt><b><a href="http://is.rice.edu/~riddle/">Prentiss Riddle</a></b>, <a
href="http://www.rice.edu/">Rice University</a></dt>
<dd>
The source of a number of
SWISH bug fixes that were implemented in the first Swish-e release.
</dd>

<dt><b>Mark Seiden</b></dt>

</dl>


<p>
We owe a debt of gratitude to <a href="http://kevcom.com/"><b>Kevin
Hughes</b></a>, without whom there would be no SWISH, and definitely no
Swish-e. His dedication to building useful tools and making them widely
available should be an inspiration to us all.

</p>

