Timestamp:
02/28/07 11:57:01 (2 years ago)
Author:
karpet
Message:

sorry for the spam. just put swish3 back on devel page with link to wiki

Files:

  • swish_website/src/devel/index.html

    r1917 r1918  
    66 66 
    67 67 <h3>Features planned for 3.0</h3> 
    68 See the [% link_to_page('swish3' , 'Swish3 development page' ) %]. 
     68 
     69<p> 
     70Swish-e 3.0 (abbreviated Swish3) will be a complete overhaul of the code. 
     71You can <a href="http://dev.swish-e.org/wiki/swish3">track development progress here</a>. 
     72Major feature improvements will include: 
     73 
     74<dl> 
     75 <dt>Unicode support</dt> 
     76 <dd>Unicode is the <a href='http://www.unicode.org/unicode/faq/'>international standard  
     77 for character encodings</a>. Swish3 will implement 
     78 support for the <a href='http://www.cl.cam.ac.uk/~mgk25/unicode.html'>UTF-8</a> 
     79 <a href='http://czyborra.com/utf/'>character encoding</a>, 
     80 which should handle all major languages in the world (UTF-8 handles up to  
     81 2,147,483,648 unique characters). 
     82 The Swish-e developers need input from non-English language experts.  
     83 Please contribute to the discussion at the 
     84   
     85  [% link_to_page('discuss' , 'Swish-e mailing list' ) %]. 
     86   
     87 Some significant known issues include: 
     88 <p /> 
     89 <dl> 
     90  <dt>lowercase vs. UPPERCASE</dt> 
     91  <dd>Version 2.x uses <tt>tolower()</tt> to lowercase all characters 
     92  before searching and indexing. Should the same approach be used for UTF-8? Will this have 
     93  significant impact on usability for non-English languages?  
     94  </dd> 
     95  <dt>Wildcards</dt> 
     96  <dd>Version 2.x uses an internal table to support wildcard searching with <tt>*</tt>. 
     97  The table assumes 8-bit (non-Unicode) character encoding. That approach will likely need 
     98  to be re-thought for multibyte encodings like UTF-8. 
     99  </dd> 
     100  <dt>Tokenizing</dt> 
     101  <dd>Version 2.x uses 5 different configuration options to control how a  
     102  'word' (token) is defined. The basic assumption is that a word is defined by which characters it 
     103  <i>includes</i>. That assumption is based on a manageable character set of 256 characters. 
     104  However, the sheer size of UTF-8 makes that system unworkable. Instead, some kind of 
     105  regular expression library will likely be used. 
     106  </dd> 
     107   
     108  <dt>Stemming</dt><dd>The stemmers used will need full international support. 
     109  </dd> 
     110  <dt>Configuration format</dt> 
     111  <dd>Since Swish-e depends on a configuration file for StopWords, Character 
     112  definitions, etc., the parsing of the configuration file must support UTF-8 as well. 
     113  The current idea is to switch to XML-style configuration files and use Libxml2 to parse 
     114  them. 
     115  </dd> 
     116 </dl> 
     117  
     118 </dd> 
     119 
     120 <dt>Incremental indexing</dt> 
     121 <dd>Swish3 will support true incremental indexing. This will allow for document records 
     122 to be modified, added and deleted in an existing index. This feature may or may not build 
     123 on the version 2.x experimental btree/incremental feature. 
     124 </dd> 
     125  
     126 <dt>Scaling</dt> 
     127 <dd>Swish3 will reliably scale to larger (multimillion) document collections. 
     128 </dd> 
     129  
     130 <dt>Indexing API</dt> 
     131 <dd>Swish3 will include an indexing API in addition to the current searching API.</dd> 
     132  
     133 <dt>Streamlined feature set</dt> 
     134 <dd>Swish3 will not contain several features in the current version: 
     135 <ul> 
     136  <li>Expat parsers</li> 
     137  <li><tt>-S http</tt> indexing method and related configuration options</li> 
     138  <li>Older stemmers</li> 
     139  <li>Current native index format</li> 
     140 </ul> 
     141 </dd> 
     142  
     143 <dt>Alternate index backends</dt> 
     144 <dd>Swish3 will offer alternate index backends using available open source libraries, 
     145 such as <a href='http://xapian.org/'>Xapian</a>,  
     146 <a href='http://hyperestraier.sourceforge.net/'>HyperEstraier</a>, 
     147 <a href='http://incubator.apache.org/lucene4c/'>Lucene</a>, or  
     148 <a href='http://www.lemurproject.org/'>Lemur</a>. 
     149 </dd> 
     150  
     151</dl> 
     152</p> 
     153 
    69 154 
    70 155 <hr />
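
The "lowercase vs. UPPERCASE" and "Tokenizing" items in the added text can be illustrated with a small sketch. It is written in Python rather than Swish-e's C, and it is not Swish3 code: the function names (`bytewise_lower`, `unicode_fold`, `tokenize`) are hypothetical, and `bytewise_lower` only approximates what a C-locale `tolower()` would do when applied to each byte of a UTF-8 string. The regular expression `\w+` is one possible definition of a word, assumed here for illustration.

```python
import re


def bytewise_lower(utf8_bytes: bytes) -> bytes:
    """Approximate a C-locale tolower() applied byte by byte.

    Only the ASCII range A-Z is changed; every byte of a multibyte
    UTF-8 sequence is left untouched.
    """
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in utf8_bytes)


def unicode_fold(text: str) -> str:
    """Case-fold decoded text using Python's Unicode-aware casefold()."""
    return text.casefold()


# Assumption: a "word" is a run of Unicode word characters, instead of
# a hand-maintained table of 256 allowed byte values.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list:
    """Case-fold the text, then split it into tokens with a regex."""
    return WORD_RE.findall(unicode_fold(text))


if __name__ == "__main__":
    word = "\u00c9COLE"  # "ECOLE" with an acute accent on the first letter
    print(bytewise_lower(word.encode("utf-8")).decode("utf-8"))
    print(unicode_fold(word))
    print(tokenize("Caf\u00e9 M\u00dcNCHEN, \u6771\u4eac!"))
```

The byte-wise version leaves the accented capital unchanged while lowercasing the ASCII letters after it, which is the kind of mismatch between index and query terms that the question about `tolower()` points at. The regex tokenizer needs no per-character table, at the cost of depending on how the regex library defines a word character.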