Changeset 2047 for libswish3/trunk/doc/libswish3.3.pod.in
- Timestamp:
- 03/07/08 22:33:37 (6 months ago)
- Files:
- libswish3/trunk/doc/libswish3.3.pod.in (modified) (16 diffs)
libswish3/trunk/doc/libswish3.3.pod.in
--- libswish3/trunk/doc/libswish3.3.pod.in (r1955)
+++ libswish3/trunk/doc/libswish3.3.pod.in (r2047)
@@ -225,4 +225,11 @@
 For more details on any of these structures, see the SYNOPSIS.
 
+=head2 swish_3
+
+The main data structure. A swish_3 object has a swish_Config, swish_Analyzer and swish_Parser
+object and delegates to eash as appropriate.
+
+This is typically the only object you need to create and use.
+
 =head2 swish_Config
 
@@ -234,5 +241,5 @@
 A parser object. Required for executing any of the three C<swish_parse_*> functions.
 
-=head2 swish_ParseData
+=head2 swish_ParserData
 
 A parser data object. This object is passed around internally by the libxml2
@@ -245,15 +252,15 @@
 You can iterate over the contents of the WordList like this:
 
- swish_debug_msg("%d words in list", list->nwords);
+ SWISH_DEBUG_MSG("%d words in list", list->nwords);
  list->current = list->head;
  while (list->current != NULL)
  {
    swish_debug_msg(" ---------- WORD --------- ");
-   swish_debug_msg("word : %s", list->current->word);
-   swish_debug_msg(" meta : %s", list->current->metaname);
+   swish_debug_msg("word : %s", list->current->word);
+   swish_debug_msg(" meta : %s", list->current->metaname);
    swish_debug_msg(" context : %s", list->current->context);
-   swish_debug_msg(" pos : %d", list->current->position);
-   swish_debug_msg("soffset : %d", list->current->start_offset);
-   swish_debug_msg("eoffset : %d", list->current->end_offset);
+   swish_debug_msg(" pos : %d", list->current->position);
+   swish_debug_msg("soffset : %d", list->current->start_offset);
+   swish_debug_msg("eoffset : %d", list->current->end_offset);
 
    list->current = list->current->next;
@@ -262,5 +269,5 @@
 =head2 swish_Word
 
-An object representing one word or token in an object. The word's start and end offset,
+An object representing one word or token. The word's start and end offset,
 position relative to other words, tag context and MetaName are all available in the object.
 
@@ -278,9 +285,9 @@
 
 The I<handler> function pointer is the final link in the parsing chain. The function
-pointer is set in the Parser object constructor, and is called by each of the
+pointer is set in the swish_Parser object constructor, and is called by each of the
 swish_parse_* functions after the entire document has been parsed and (optionally)
 tokenized.
 
-The I<handler> receives one argument: a swish_ParseData object containing all the metadata
+The I<handler> receives one argument: a swish_ParserData object containing all the metadata
 and words in the document.
 
@@ -289,5 +296,5 @@
 
  void
- my_handler( swish_ParseData * parse_data )
+ my_handler( swish_ParserData * parse_data )
  {
    swish_debug_docinfo( parse_data->docinfo );
@@ -298,10 +305,10 @@
 
 B<IMPORTANT:> After the I<handler> function is called, all the structures referenced
-by the swish_ParseData object are automatically freed, so if you intend to keep any of the
+by the swish_ParserData object are automatically freed, so if you intend to keep any of the
 data for storing in an index, you will need to strdup() words, properties, docinfo, etc.
 as part of your indexing code.
 
 See the example C<swish_lint.c> file for how to create and pass in a I<handler>
-function pointer to the swish_Parser constructor.
+function pointer to the swish_init_swish3() constructor.
 
 =head1 Configuration API
@@ -316,5 +323,5 @@
 
 Since B<libswish3> already has a powerful XML parser built-in, it's much easier to
-parse a configuration file written in XML than to port the Swish-e config-style parser
+parse a configuration file written in XML than to port the Swish-e config parser
 to B<libswish3>.
 
@@ -351,15 +358,19 @@
  <FollowSymLinks>yes</FollowSymLinks>
 
- <Meta name="foo" bias="+10" />
- <Meta name="bar" bias="-5" />
- <Meta name="swishtitle" bias="+50" alias="title" />
- <Meta name="other">color size weight</Meta>
+ <MetaNames>
+   <foo bias="+10" />
+   <bar bias="-5" />
+   <swishtitle bias="+50" alias="title" />
+   <other>color size weight</other>
+ </MetaNames>
 
- <Prop name="foo" type="text" ignorecase="1" />
- <Prop name="bar" type="int" />
- <Prop name="lastmod" type="date" />
- <Prop name="bing" comparecase="1" />
- <Prop name="description" verbatim="1" max="10000" alias="body" length="20" />
- <Prop name="notsorted" sort="0" />
+ <PropertyNames>
+   <foo type="text" ignorecase="1" />
+   <bar type="int" />
+   <lastmod type="date" />
+   <bing comparecase="1" />
+   <description verbatim="1" max="10000" alias="body" length="20" />
+   <notsorted sort="0" />
+ </PropertyNames>
 
  <Tokenize>1</Tokenize>
@@ -389,5 +400,5 @@
 attributes.
 
- <Meta name="foo" bias="10" />
+ <foo bias="10" />
 
 is the same thing as (in Swish-e style):
@@ -398,5 +409,5 @@
 while:
 
- <Meta name="swishtitle" bias="50" alias="title" />
+ <swishtitle bias="50" alias="title" />
 
 is equivalent to:
@@ -409,5 +420,5 @@
 You can still assign multiple aliases to a single MetaName:
 
- <Meta name="other">color size weight</meta>
+ <other>color size weight</other>
 
 is equivalent to:
@@ -418,8 +429,8 @@
 In addition, there are some special features intended for use with HTML documents.
 
- <Meta name="links" html="1" alias="href" /> # same as HTMLLinksMetaName
- <Meta name="images" html="1" alias="src" /> # same as ImageLinksMetaName
- <Meta name="alttext" html="1" alias="alt" /> # same as IndexAltTagMetaName
- <Meta name="as-text" html="1" alias="alt" /> # same as IndexAltTagMetaName
+ <links html="1" alias="href" /> # same as HTMLLinksMetaName
+ <images html="1" alias="src" /> # same as ImageLinksMetaName
+ <alttext html="1" alias="alt" /> # same as IndexAltTagMetaName
+ <as-text html="1" alias="alt" /> # same as IndexAltTagMetaName
 
 =head3 PropertyNames
@@ -432,17 +443,17 @@
 Here's the example from above with equivalent Swish-e directives annotated:
 
- <Prop name="foo" ignorecase="1" />
+ <foo ignorecase="1" />
  # PropertyNamesIgnoreCase foo
 
- <Prop name="bar" type="int" />
+ <bar type="int" />
  # PropertyNamesNumeric bar
 
- <Prop name="lastmod" type="date" />
+ <lastmod type="date" />
  # PropertyNamesDate lastmod
 
- <Prop name="bing" comparecase="1" />
+ <bing comparecase="1" />
  # PropertyNamesCompareCase bing
 
- <Prop name="description" verbatim="1" max="10000" alias="body" length="20" />
+ <description verbatim="1" max="10000" alias="body" length="20" />
  # PropertyNamesNoStripChars description
  # PropertyNamesMaxLength 10000 description
@@ -450,5 +461,5 @@
  # PropertyNamesSortKeyLength 20 description
 
- <Prop name="notsorted" sort="0" />
+ <notsorted sort="0" />
  # PreSortedIndex foo bar lastmod bind description
 
@@ -457,8 +468,8 @@
 and might generate an error or unexpected behaviour:
 
- <Prop name="foo" ignorecase="1" type="int" /> # wrong
- <Prop name="foo" comparecase="1" type="date" /> # wrong
- <Prop name="foo" verbatim="1" type="int" /> # wrong
- <Prop name="foo" sort="0" length="20" /> # wrong
+ <foo ignorecase="1" type="int" /> # wrong
+ <foo comparecase="1" type="date" /> # wrong
+ <foo verbatim="1" type="int" /> # wrong
+ <foo sort="0" length="20" /> # wrong
 
 =head2 Directives
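
Note on the B<IMPORTANT:> paragraph touched by this changeset: it says that everything referenced by the swish_ParserData object is freed once the I<handler> returns, so an indexer has to copy whatever it wants to keep. The sketch below illustrates that copy-before-free pattern only. It does not include the libswish3 headers, and every name in it (demo_Word, demo_WordList, demo_IndexEntry, demo_strdup, demo_copy_words, demo_free_entries) is a hypothetical stand-in modelled loosely on the fields shown in the WordList example, not the library's real declarations. demo_strdup does what strdup() does, written out so the sketch stays within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in types for illustration; NOT the real libswish3 structs. */
typedef struct demo_Word {
    char *word;
    char *metaname;
    int position;
    struct demo_Word *next;
} demo_Word;

typedef struct {
    demo_Word *head;
    int nwords;
} demo_WordList;

/* Hypothetical index entry that owns private copies of its strings. */
typedef struct {
    char *word;
    char *metaname;
    int position;
} demo_IndexEntry;

/* Equivalent of strdup(): returns a heap copy of s, or NULL on failure. */
static char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Release an array produced by demo_copy_words(). */
void demo_free_entries(demo_IndexEntry *entries, int count)
{
    if (entries == NULL)
        return;
    for (int i = 0; i < count; i++) {
        free(entries[i].word);
        free(entries[i].metaname);
    }
    free(entries);
}

/*
 * Copy every word out of the list so the data survives after the
 * parser frees its own structures. Stores the number of entries in
 * *count and returns NULL for an empty list or on allocation failure.
 */
demo_IndexEntry *demo_copy_words(const demo_WordList *list, int *count)
{
    assert(count != NULL);
    *count = 0;
    if (list == NULL || list->nwords <= 0)
        return NULL;

    demo_IndexEntry *entries = calloc((size_t)list->nwords, sizeof *entries);
    if (entries == NULL)
        return NULL;

    int i = 0;
    for (const demo_Word *w = list->head; w != NULL && i < list->nwords; w = w->next) {
        entries[i].word = demo_strdup(w->word);
        entries[i].metaname = demo_strdup(w->metaname);
        if (entries[i].word == NULL || entries[i].metaname == NULL) {
            demo_free_entries(entries, i + 1);
            return NULL;
        }
        entries[i].position = w->position;
        i++;
    }
    *count = i;
    return entries;
}
```

In a real handler the equivalent of demo_copy_words() would run before the handler returns, and the indexer would later release its copies with its own cleanup routine; how the actual word list is reached from a swish_ParserData object is not shown in this changeset and is left out here.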
