Class AbstractWordTokenizer

  • All Implemented Interfaces:
    WordTokenizer
    Direct Known Subclasses:
    FileWordTokenizer, StringWordTokenizer

    public abstract class AbstractWordTokenizer
    extends java.lang.Object
    implements WordTokenizer
    This class tokenizes an input string.

    It also allows the string to be mutated. Once spell checking is complete, the resulting text is available through a call to getFinalText.
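    The following is a minimal, self-contained sketch of that tokenize-and-mutate cycle. It does not use Jazzy's classes. SketchTokenizer and SketchDemo are hypothetical names, word boundaries come from java.text.BreakIterator instead of a WordFinder, and IllegalStateException stands in for WordNotFoundException. The point it illustrates is that replacing a word with one of a different length must shift the offsets of every later word.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented WordTokenizer methods; not Jazzy's code.
class SketchTokenizer {
    private final StringBuilder text;                    // mutable copy of the input
    private final List<int[]> words = new ArrayList<>(); // [start, end) of each word
    private int index = -1;                              // current word; -1 before nextWord()

    SketchTokenizer(String input) {
        text = new StringBuilder(input);
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(input);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Keep segments that begin with a letter or digit; skip spaces and punctuation.
            if (Character.isLetterOrDigit(input.charAt(start))) {
                words.add(new int[] {start, end});
            }
        }
    }

    boolean hasMoreWords() { return index + 1 < words.size(); }

    String nextWord() {
        if (!hasMoreWords()) throw new IllegalStateException("no more words");
        index++;
        int[] w = words.get(index);
        return text.substring(w[0], w[1]);
    }

    // Swap the current word in place and shift later offsets by the length change.
    void replaceWord(String newWord) {
        if (index < 0) throw new IllegalStateException("current word has not yet been set");
        int[] cur = words.get(index);
        int delta = newWord.length() - (cur[1] - cur[0]);
        text.replace(cur[0], cur[1], newWord);
        cur[1] += delta;
        for (int i = index + 1; i < words.size(); i++) {
            words.get(i)[0] += delta;
            words.get(i)[1] += delta;
        }
    }

    // Hypothetical counterpart of getFinalText: the text including all replacements.
    String getFinalText() { return text.toString(); }
}

class SketchDemo {
    // Toy "spell checker" that fixes two hard-coded misspellings.
    static String correct(String input) {
        SketchTokenizer tok = new SketchTokenizer(input);
        while (tok.hasMoreWords()) {
            String word = tok.nextWord();
            if (word.equals("teh")) tok.replaceWord("the");
            else if (word.equals("alot")) tok.replaceWord("a lot");
        }
        return tok.getFinalText();
    }
}
```

    For example, SketchDemo.correct("We alot like teh cat.") should return "We a lot like the cat."; the later word "teh" is still located correctly because its offsets moved by one after "alot" grew to "a lot".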

    Author:
    Jason Height (jheight@chariot.net.au), Anthony Roy (ajr@antroy.co.uk)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected Word currentWord
      The word being analyzed
      protected WordFinder finder
      The word finder used to filter out words that are not pertinent to spell checking
      protected java.text.BreakIterator sentenceIterator
      An iterator used to step through the sentences of the text
      protected int wordCount
      The cumulative count of words that have been processed
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getContext()
      Returns the current text that is being tokenized (includes any changes that have been made)
      int getCurrentWordCount()
      Returns the current number of words that have been processed
      int getCurrentWordEnd()
      Returns the end of the current word in the text
      int getCurrentWordPosition()
      Returns the index of the start of the current word in the text
      boolean hasMoreWords()
      Returns true if there are more words that can be processed in the string
      boolean isNewSentence()
      Returns true if the current word is at the start of a sentence
      java.lang.String nextWord()
      Searches for the next word in the text and returns that word.
      abstract void replaceWord(java.lang.String newWord)
      Replaces the current word token
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • currentWord

        protected Word currentWord
        The word being analyzed
      • finder

        protected WordFinder finder
        The word finder used to filter out words that are not pertinent to spell checking
      • sentenceIterator

        protected java.text.BreakIterator sentenceIterator
        An iterator used to step through the sentences of the text
      • wordCount

        protected int wordCount
        The cumulative count of words that have been processed
    • Constructor Detail

      • AbstractWordTokenizer

        public AbstractWordTokenizer(java.lang.String text)
        Creates a new AbstractWordTokenizer object.
        Parameters:
        text - the text to process.
      • AbstractWordTokenizer

        public AbstractWordTokenizer(WordFinder wf)
        Creates a new AbstractWordTokenizer object.
        Parameters:
        wf - the custom WordFinder to use in searching for words.
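        The two constructors suggest a common pattern in which the String form wraps the text in a default finder and delegates to the WordFinder form. The sketch below assumes that delegation. SimpleFinder, WhitespaceFinder, and the Sketch* classes are hypothetical simplifications, not Jazzy's WordFinder API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal finder contract; Jazzy's real WordFinder interface is richer.
interface SimpleFinder {
    List<String> words();
}

// Hypothetical default finder that splits on whitespace only.
class WhitespaceFinder implements SimpleFinder {
    private final String text;

    WhitespaceFinder(String text) { this.text = text; }

    public List<String> words() {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(w); // "".split(...) yields one empty string
        }
        return out;
    }
}

abstract class SketchAbstractTokenizer {
    protected final SimpleFinder finder;

    // Convenience constructor (assumed design): wrap raw text in a default finder.
    SketchAbstractTokenizer(String text) { this(new WhitespaceFinder(text)); }

    // Full constructor: the caller supplies the finder that decides what counts as a word.
    SketchAbstractTokenizer(SimpleFinder wf) { this.finder = wf; }
}

class SketchConcreteTokenizer extends SketchAbstractTokenizer {
    SketchConcreteTokenizer(String text) { super(text); }
    SketchConcreteTokenizer(SimpleFinder wf) { super(wf); }

    int count() { return finder.words().size(); }
}
```

        Keeping a single constructor that owns the finder means subclasses only need to decide how the text is supplied, not how words are found.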
    • Method Detail

      • getCurrentWordCount

        public int getCurrentWordCount()
        Returns the current number of words that have been processed
        Specified by:
        getCurrentWordCount in interface WordTokenizer
        Returns:
        number of words so far iterated.
      • getCurrentWordEnd

        public int getCurrentWordEnd()
        Returns the end of the current word in the text
        Specified by:
        getCurrentWordEnd in interface WordTokenizer
        Returns:
        index in string of the end of the current word.
        Throws:
        WordNotFoundException - current word has not yet been set.
      • getCurrentWordPosition

        public int getCurrentWordPosition()
        Returns the index of the start of the current word in the text
        Specified by:
        getCurrentWordPosition in interface WordTokenizer
        Returns:
        index in string of the start of the current word.
        Throws:
        WordNotFoundException - current word has not yet been set.
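        A sketch of the position/end contract described above. It assumes the end index is exclusive (so text.substring(position, end) yields the word, as with java.lang.String) and uses IllegalStateException in place of WordNotFoundException. SpanDemo and its method names are hypothetical.

```java
import java.text.BreakIterator;

// Hypothetical class; mirrors getCurrentWordPosition()/getCurrentWordEnd() semantics only.
class SpanDemo {
    private final String text;
    private final BreakIterator it = BreakIterator.getWordInstance();
    private int start = -1; // -1 means no current word has been set yet
    private int end = -1;

    SpanDemo(String text) {
        this.text = text;
        it.setText(text);
        it.first();
    }

    // Advance to the next segment that begins with a letter or digit.
    boolean next() {
        int s = it.current();
        for (int e = it.next(); e != BreakIterator.DONE; s = e, e = it.next()) {
            if (Character.isLetterOrDigit(text.charAt(s))) {
                start = s;
                end = e;
                return true;
            }
        }
        return false;
    }

    int currentWordPosition() { requireWord(); return start; }

    // Exclusive end index (an assumption of this sketch).
    int currentWordEnd() { requireWord(); return end; }

    private void requireWord() {
        if (start < 0) throw new IllegalStateException("current word has not yet been set");
    }
}
```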
      • hasMoreWords

        public boolean hasMoreWords()
        Returns true if there are more words that can be processed in the string
        Specified by:
        hasMoreWords in interface WordTokenizer
        Returns:
        true if there are further words in the text.
      • nextWord

        public java.lang.String nextWord()
        Searches for the next word in the text and returns that word.
        Specified by:
        nextWord in interface WordTokenizer
        Returns:
        the string representing the current word.
        Throws:
        WordNotFoundException - search string contains no more words.
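        The hasMoreWords()/nextWord() pair follows the familiar iterator protocol, and the word count advances once per word returned. The hypothetical CountingTokenizer below illustrates that contract with plain whitespace splitting (a simplification; the real class delegates word detection to its WordFinder) and throws NoSuchElementException where the documented method throws WordNotFoundException.

```java
import java.util.NoSuchElementException;

// Hypothetical tokenizer illustrating hasMoreWords()/nextWord()/getCurrentWordCount().
class CountingTokenizer {
    private final String[] words;
    private int wordCount = 0; // number of words returned so far

    CountingTokenizer(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    boolean hasMoreWords() { return wordCount < words.length; }

    String nextWord() {
        if (!hasMoreWords()) {
            throw new NoSuchElementException("search string contains no more words");
        }
        return words[wordCount++]; // returning a word also advances the count
    }

    int getCurrentWordCount() { return wordCount; }
}
```

        Callers typically guard each nextWord() with hasMoreWords(), so the exception only signals a programming error rather than normal end of input.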
      • replaceWord

        public abstract void replaceWord(java.lang.String newWord)
        Replaces the current word token
        Specified by:
        replaceWord in interface WordTokenizer
        Parameters:
        newWord - replacement word.
        Throws:
        WordNotFoundException - current word has not yet been set.
      • getContext

        public java.lang.String getContext()
        Returns the current text that is being tokenized (includes any changes that have been made)
        Specified by:
        getContext in interface WordTokenizer
        Returns:
        the text being tokenized.
      • isNewSentence

        public boolean isNewSentence()
        Returns true if the current word is at the start of a sentence
        Specified by:
        isNewSentence in interface WordTokenizer
        Returns:
        true if the current word starts a sentence.
        Throws:
        WordNotFoundException - current word has not yet been set.
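        One way to decide whether a word starts a sentence is with a sentence-instance java.text.BreakIterator, the same type as the sentenceIterator field. The sketch below treats a word as starting a sentence when only whitespace or punctuation (such as an opening quote) lies between the nearest preceding sentence boundary and the word. That rule is an assumption, not necessarily the one Jazzy applies, and SentenceStarts is a hypothetical name.

```java
import java.text.BreakIterator;

// Hypothetical helper; the "only whitespace or punctuation before the word" rule is an assumption.
class SentenceStarts {
    static boolean isNewSentence(String text, int wordStart) {
        BreakIterator sentences = BreakIterator.getSentenceInstance();
        sentences.setText(text);
        // Find the sentence boundary at or before the word.
        int sentenceStart = sentences.isBoundary(wordStart)
                ? wordStart
                : sentences.preceding(wordStart);
        // The word starts the sentence if no other letter or digit precedes it within that sentence.
        for (int i = sentenceStart; i < wordStart; i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```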