alternate utf8-savvy tokenizer with iterator. initial naive benchmark shows it is about as fast as the existing one, with far fewer malloc/free calls. could likely speed it up some by refactoring how "context" is stored internally