Changeset 2047

Timestamp:
03/07/08 22:33:37 (2 months ago)
Author:
karpet
Message:

document new config/header format

Files:

  • libswish3/trunk/doc/libswish3.3.pod.in

    r1955 r2047  
    225225For more details on any of these structures, see the SYNOPSIS. 
    226226 
     227=head2 swish_3 
     228 
     229The main data structure. A swish_3 object has a swish_Config, swish_Analyzer and swish_Parser 
     230object and delegates to each as appropriate. 
     231 
     232This is typically the only object you need to create and use. 
     233 
    227234=head2 swish_Config 
    228235 
     
    234241A parser object. Required for executing any of the three C<swish_parse_*> functions. 
    235242 
    236 =head2 swish_ParseData 
     243=head2 swish_ParserData 
    237244 
    238245A parser data object. This object is passed around internally by the libxml2 
     
    245252You can iterate over the contents of the WordList like this: 
    246253 
    247  swish_debug_msg("%d words in list", list->nwords); 
     254 SWISH_DEBUG_MSG("%d words in list", list->nwords); 
    248255 list->current = list->head; 
    249256 while (list->current != NULL) 
    250257 { 
    251258        swish_debug_msg("   ---------- WORD ---------  "); 
    252         swish_debug_msg("word  : %s", list->current->word); 
    253         swish_debug_msg(" meta : %s", list->current->metaname); 
     259        swish_debug_msg("word     : %s", list->current->word); 
     260        swish_debug_msg(" meta    : %s", list->current->metaname); 
    254261        swish_debug_msg(" context : %s", list->current->context); 
    255         swish_debug_msg("  pos : %d", list->current->position); 
    256         swish_debug_msg("soffset: %d", list->current->start_offset); 
    257         swish_debug_msg("eoffset: %d", list->current->end_offset); 
     262        swish_debug_msg("  pos    : %d", list->current->position); 
     263        swish_debug_msg("soffset  : %d", list->current->start_offset); 
     264        swish_debug_msg("eoffset  : %d", list->current->end_offset); 
    258265             
    259266        list->current = list->current->next; 
     
    262269=head2 swish_Word 
    263270 
    264 An object representing one word or token in an object. The word's start and end offset, 
     271An object representing one word or token. The word's start and end offset, 
    265272position relative to other words, tag context and MetaName are all available in the object. 
    266273 
     
    278285 
    279286The I<handler> function pointer is the final link in the parsing chain. The function 
    280 pointer is set in the Parser object constructor, and is called by each of the  
     287pointer is set in the swish_Parser object constructor, and is called by each of the  
    281288swish_parse_* functions after the entire document has been parsed and (optionally) 
    282289tokenized. 
    283290 
    284 The I<handler> receives one argument: a swish_ParseData object containing all the metadata 
     291The I<handler> receives one argument: a swish_ParserData object containing all the metadata 
    285292and words in the document. 
    286293 
     
    289296 
    290297 void 
    291  my_handler( swish_ParseData * parse_data ) 
     298 my_handler( swish_ParserData * parse_data ) 
    292299 { 
    293300    swish_debug_docinfo( parse_data->docinfo ); 
     
    298305  
    299306B<IMPORTANT:> After the I<handler> function is called, all the structures referenced 
    300 by the swish_ParseData object are automatically freed, so if you intend to keep any of the 
     307by the swish_ParserData object are automatically freed, so if you intend to keep any of the 
    301308data for storing in an index, you will need to strdup() words, properties, docinfo, etc. 
    302309as part of your indexing code. 
    303310 
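The warning above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the real libswish3 types: C<demo_word>, C<demo_wordlist>, C<demo_copy_words()> and C<demo_free_words()> are hypothetical stand-ins that mirror only the I<word>, I<next>, I<head> and I<nwords> fields shown in the WordList example. The idea is simply to strdup() each word inside the handler so that the copies survive once the library frees its own structures.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the library's word list. Only the field
 * names used in the documentation (word, next, head, nwords) are mirrored. */
typedef struct demo_word {
    char             *word;
    struct demo_word *next;
} demo_word;

typedef struct demo_wordlist {
    demo_word   *head;
    unsigned int nwords;
} demo_wordlist;

/* Free a NULL-terminated array produced by demo_copy_words(). */
void
demo_free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/* Copy every word into a freshly allocated, NULL-terminated array so the
 * strings outlive the list they came from. Returns NULL on allocation
 * failure. A local cursor is used, so the list itself is not modified. */
char **
demo_copy_words(const demo_wordlist *list)
{
    assert(list != NULL);

    /* calloc zero-fills, so the array is always NULL-terminated. */
    char **copy = calloc((size_t)list->nwords + 1, sizeof *copy);
    if (copy == NULL)
        return NULL;

    size_t i = 0;
    for (const demo_word *w = list->head;
         w != NULL && i < list->nwords;
         w = w->next) {
        copy[i] = strdup(w->word);
        if (copy[i] == NULL) {
            demo_free_words(copy);
            return NULL;
        }
        i++;
    }
    return copy;
}
```

A real I<handler> would call something like C<demo_copy_words()> on the word list it receives and hand the result to the indexing code, which then owns the copies and must release them itself.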
    304311See the example C<swish_lint.c> file for how to create and pass in a I<handler> 
    305 function pointer to the swish_Parser constructor. 
     312function pointer to the swish_init_swish3() constructor. 
    306313 
    307314=head1 Configuration API 
     
    316323 
    317324Since B<libswish3> already has a powerful XML parser built-in, it's much easier to  
    318 parse a configuration file written in XML than to port the Swish-e config-style parser 
     325parse a configuration file written in XML than to port the Swish-e config parser 
    319326to B<libswish3>. 
    320327 
     
    351358  <FollowSymLinks>yes</FollowSymLinks> 
    352359   
    353   <Meta name="foo" bias="+10" /> 
    354   <Meta name="bar" bias="-5" /> 
    355   <Meta name="swishtitle" bias="+50" alias="title" /> 
    356   <Meta name="other">color size weight</Meta> 
     360  <MetaNames> 
     361   <foo bias="+10" /> 
     362   <bar bias="-5" /> 
     363   <swishtitle bias="+50" alias="title" /> 
     364   <other>color size weight</other> 
     365  </MetaNames> 
    357366   
    358   <Prop name="foo" type="text" ignorecase="1" /> 
    359   <Prop name="bar" type="int" /> 
    360   <Prop name="lastmod" type="date" /> 
    361   <Prop name="bing" comparecase="1" /> 
    362   <Prop name="description" verbatim="1" max="10000" alias="body" length="20" /> 
    363   <Prop name="notsorted" sort="0" /> 
     367  <PropertyNames> 
     368   <foo type="text" ignorecase="1" /> 
     369   <bar type="int" /> 
     370   <lastmod type="date" /> 
     371   <bing comparecase="1" /> 
     372   <description verbatim="1" max="10000" alias="body" length="20" /> 
     373   <notsorted sort="0" /> 
     374  </PropertyNames> 
    364375   
    365376  <Tokenize>1</Tokenize> 
     
    389400attributes. 
    390401 
    391  <Meta name="foo" bias="10" /> 
     402 <foo bias="10" /> 
    392403 
    393404is the same thing as (in Swish-e style): 
     
    398409while: 
    399410 
    400  <Meta name="swishtitle" bias="50" alias="title" /> 
     411 <swishtitle bias="50" alias="title" /> 
    401412 
    402413is equivalent to: 
     
    409420You can still assign multiple aliases to a single MetaName: 
    410421 
    411  <Meta name="other">color size weight</meta>
     422 <other>color size weight</other>
    412423 
    413424is equivalent to: 
     
    418429In addition, there are some special features intended for use with HTML documents. 
    419430 
    420  <Meta name="links" html="1" alias="href" />      # same as HTMLLinksMetaName 
    421  <Meta name="images" html="1" alias="src" />      # same as ImageLinksMetaName 
    422  <Meta name="alttext" html="1" alias="alt" />     # same as IndexAltTagMetaName 
    423  <Meta name="as-text" html="1" alias="alt" />     # same as IndexAltTagMetaName 
     431 <links html="1" alias="href" />      # same as HTMLLinksMetaName 
     432 <images html="1" alias="src" />      # same as ImageLinksMetaName 
     433 <alttext html="1" alias="alt" />     # same as IndexAltTagMetaName 
     434 <as-text html="1" alias="alt" />     # same as IndexAltTagMetaName 
    424435 
    425436=head3 PropertyNames 
     
    432443Here's the example from above with equivalent Swish-e directives annotated: 
    433444 
    434  <Prop name="foo" ignorecase="1" /> 
     445 <foo ignorecase="1" /> 
    435446 # PropertyNamesIgnoreCase foo 
    436447 
    437  <Prop name="bar" type="int" /> 
     448 <bar type="int" /> 
    438449 # PropertyNamesNumeric bar 
    439450  
    440  <Prop name="lastmod" type="date" /> 
     451 <lastmod type="date" /> 
    441452 # PropertyNamesDate lastmod 
    442453  
    443  <Prop name="bing" comparecase="1" /> 
     454 <bing comparecase="1" /> 
    444455 # PropertyNamesCompareCase bing 
    445456  
    446  <Prop name="description" verbatim="1" max="10000" alias="body" length="20" /> 
     457 <description verbatim="1" max="10000" alias="body" length="20" /> 
    447458 # PropertyNamesNoStripChars description 
    448459 # PropertyNamesMaxLength 10000 description 
     
    450461 # PropertyNamesSortKeyLength 20 description 
    451462 
    452  <Prop name="notsorted" sort="0" /> 
     463 <notsorted sort="0" /> 
    453464 # PreSortedIndex foo bar lastmod bing description 
    454465 
     
    457468and might generate an error or unexpected behaviour: 
    458469 
    459  <Prop name="foo" ignorecase="1" type="int" />      # wrong 
    460  <Prop name="foo" comparecase="1" type="date" />    # wrong 
    461  <Prop name="foo" verbatim="1" type="int" />        # wrong 
    462  <Prop name="foo" sort="0" length="20" />           # wrong 
     470 <foo ignorecase="1" type="int" />      # wrong 
     471 <foo comparecase="1" type="date" />    # wrong 
     472 <foo verbatim="1" type="int" />        # wrong 
     473 <foo sort="0" length="20" />           # wrong 
    463474 
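The four "wrong" combinations above suggest a simple consistency check. The following is only a sketch of that reading, not the library's own validation: the C<demo_prop> struct and C<demo_prop_is_consistent()> are hypothetical, and the rules encoded here are inferred from the examples alone (case and verbatim options apply to text properties only, a missing I<type> is assumed to mean text, and a sort key I<length> is meaningless when I<sort> is "0").

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of the attributes on one <PropertyNames> child. */
typedef struct demo_prop {
    const char *type;        /* "text", "int", "date", or NULL if absent */
    bool        ignorecase;
    bool        comparecase;
    bool        verbatim;
    bool        sort;        /* false corresponds to sort="0" */
    int         length;      /* sort key length; 0 if absent */
} demo_prop;

/* Return true if the attribute combination is consistent under the
 * assumptions stated above, false for the combinations marked "wrong". */
bool
demo_prop_is_consistent(const demo_prop *p)
{
    assert(p != NULL);

    /* Assumption: an absent type attribute means a text property. */
    bool is_text = (p->type == NULL) || (strcmp(p->type, "text") == 0);

    /* Case handling and verbatim storage only make sense for text. */
    if (!is_text && (p->ignorecase || p->comparecase || p->verbatim))
        return false;

    /* A sort key length is pointless if the property is not sorted. */
    if (!p->sort && p->length > 0)
        return false;

    return true;
}
```

Whether B<libswish3> rejects such combinations outright or silently ignores one attribute is not specified here, so a check like this belongs in calling code that wants to fail early.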
    464475=head2 Directives