Standard tokenizer
# isa Plucene::Analysis::CharTokenizer
This is the standard tokenizer.
This should be a good tokenizer for most European-language documents.
The regular expression that defines what counts as a token.
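As a rough illustration of regex-driven tokenising, the sketch below pulls every match of a pattern out of a string. The pattern and the names `$token_re` and `tokenize` are hypothetical and chosen only for this example. The module's actual expression is not reproduced in this document and is very likely more elaborate.

```perl
use strict;
use warnings;

# Hypothetical pattern, for illustration only: a run of word characters,
# optionally followed by apostrophe-joined parts (as in "dog's" or "isn't").
my $token_re = qr/\w+(?:'\w+)*/;

# Return every non-overlapping match of the pattern, in order.
# Intended to be called in list context.
sub tokenize {
    my ($text) = @_;
    return $text =~ /($token_re)/g;
}

print join('|', tokenize("The dog's bone, isn't it?")), "\n";
# prints: The|dog's|bone|isn't|it
```

Punctuation and whitespace never match the pattern, so they act as separators without having to be listed explicitly.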
Removes 's and periods (.) from a token.
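The clean-up described above might look something like the following sketch. The helper name `strip_possessive_and_dots` is hypothetical, and the choice to strip 's only at the end of a token is an assumption. This is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Plucene.
sub strip_possessive_and_dots {
    my ($token) = @_;
    $token =~ s/'s\z//;   # assumption: only a trailing 's is dropped
    $token =~ s/\.//g;    # drop every period, e.g. in acronyms
    return $token;
}

print strip_possessive_and_dots($_), "\n" for ("dog's", "U.S.A.", "plain");
# prints "dog", "USA" and "plain", one per line
```

Normalising this way lets "dog's" and "dog", or "U.S.A." and "USA", index to the same term.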