Wintertree Software Inc.

Sentry Spelling Checker Engine for Java


Explanation of Mixed Digits and Mixed Case options

Product: Sentry Spelling Checker Engine Java SDK

This topic explains how the IGNORE_MIXED_CASE_OPT, IGNORE_MIXED_DIGITS_OPT, REPORT_MIXED_CASE_OPT, and REPORT_MIXED_DIGITS_OPT options operate.

When the Sentry engine checks spelling, it performs the following steps for each word:

  1. It determines if the word matches the criteria for any enabled "IGNORE" options, and if so, it skips the word and proceeds to the next word without performing the next step.

  2. It determines if the word matches the criteria for any enabled "REPORT" options, and if so, it reports the word through a bit set in the value returned by the SpellingSession.check method.

The two steps above are performed in the order listed. A word that matches an enabled IGNORE option is filtered out, so no further tests (including the spelling test) are performed on it. For this reason, enabling both IGNORE_MIXED_CASE_OPT and REPORT_MIXED_CASE_OPT is pointless: a word containing mixed case is filtered out because IGNORE_MIXED_CASE_OPT is enabled, and is therefore never reported as containing mixed case. IGNORE options take precedence over similarly named REPORT options.
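The ordering described above can be illustrated with a small stand-alone model. This is only a sketch and does not use the Sentry library: the class name, the bit values, the Set-based dictionary, and the isMixedCase criterion are all assumptions made for illustration. Only the option and result names are borrowed from this topic.

```java
import java.util.Set;

// Simplified model of the two-step flow; NOT the Sentry API.
// The bit values and the mixed-case criterion are hypothetical.
final class TwoStepCheckSketch {
    // Option flags (names from this topic, values invented here).
    static final int IGNORE_MIXED_CASE_OPT = 1 << 0;
    static final int REPORT_MIXED_CASE_OPT = 1 << 1;
    static final int REPORT_SPELLING_OPT   = 1 << 2;

    // Result bits (names from this topic, values invented here).
    static final int MISSPELLED_WORD_RSLT = 1 << 0;
    static final int MIXED_CASE_WORD_RSLT = 1 << 1;

    // Assumed criterion: an uppercase letter after the first character
    // plus at least one lowercase letter anywhere in the word.
    static boolean isMixedCase(String word) {
        boolean upperAfterFirst = false;
        boolean anyLower = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (i > 0 && Character.isUpperCase(c)) upperAfterFirst = true;
            if (Character.isLowerCase(c)) anyLower = true;
        }
        return upperAfterFirst && anyLower;
    }

    static int check(String word, int options, Set<String> dictionary) {
        // Step 1: a word matching an enabled IGNORE option is skipped entirely.
        if ((options & IGNORE_MIXED_CASE_OPT) != 0 && isMixedCase(word)) {
            return 0;
        }
        // Step 2: every enabled REPORT test runs and contributes its own bit.
        int result = 0;
        if ((options & REPORT_SPELLING_OPT) != 0 && !dictionary.contains(word)) {
            result |= MISSPELLED_WORD_RSLT;
        }
        if ((options & REPORT_MIXED_CASE_OPT) != 0 && isMixedCase(word)) {
            result |= MIXED_CASE_WORD_RSLT;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("hello");
        int reportOnly = REPORT_MIXED_CASE_OPT | REPORT_SPELLING_OPT;
        int withIgnore = reportOnly | IGNORE_MIXED_CASE_OPT;

        // With the IGNORE option enabled, the REPORT option never fires.
        System.out.println(check("PrintScreen", withIgnore, dictionary) == 0);
        // Without it, both conditions are reported in the same value.
        System.out.println(check("PrintScreen", reportOnly, dictionary)
                == (MISSPELLED_WORD_RSLT | MIXED_CASE_WORD_RSLT));
    }
}
```

In this model the early return in Step 1 is what makes the IGNORE option win: the REPORT tests are never reached for a filtered word.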

The tests performed in Step 2 above are independent of each other. The tests for spelling, doubled words, mixed-case words, and words containing digits can each be enabled or disabled through the corresponding REPORT option. Every test selected by an enabled REPORT option is carried out, even if one or more of the other tests has already matched. Thus, the result returned by the check() method may indicate several conditions (problems) with a single word: the word may be misspelled, doubled, contain mixed case, and contain mixed digits all at once. This is why the value returned by the check() method is a bit mask: several conditions can be reported at once.
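Because several conditions can arrive in one value, each bit should be tested separately with a bitwise AND; comparing the whole value with == against a single constant would miss combined results. The sketch below is stand-alone and does not call the Sentry library. The bit values are invented, and the names DOUBLED_WORD_RSLT and MIXED_DIGITS_WORD_RSLT are assumed by analogy with the two result names mentioned in this topic; check them against the SDK before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

// Decoding a bit-mask result; NOT the Sentry API.
// All bit values are hypothetical, and two of the names are assumed.
final class ResultMaskSketch {
    static final int MISSPELLED_WORD_RSLT   = 1 << 0;
    static final int DOUBLED_WORD_RSLT      = 1 << 1; // assumed name
    static final int MIXED_CASE_WORD_RSLT   = 1 << 2;
    static final int MIXED_DIGITS_WORD_RSLT = 1 << 3; // assumed name

    // Test each bit independently: any combination may be set at once.
    static List<String> describe(int result) {
        List<String> conditions = new ArrayList<>();
        if ((result & MISSPELLED_WORD_RSLT) != 0)   conditions.add("misspelled");
        if ((result & DOUBLED_WORD_RSLT) != 0)      conditions.add("doubled");
        if ((result & MIXED_CASE_WORD_RSLT) != 0)   conditions.add("mixed case");
        if ((result & MIXED_DIGITS_WORD_RSLT) != 0) conditions.add("mixed digits");
        return conditions;
    }

    public static void main(String[] args) {
        int result = MISSPELLED_WORD_RSLT | MIXED_CASE_WORD_RSLT;
        System.out.println(describe(result)); // prints [misspelled, mixed case]
    }
}
```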

The presence of a word in a dictionary or lexicon affects only its spelling. If a word exists in a dictionary with the default IGNORE_ACTION, the Sentry engine will not report it as misspelled. However, if the word matches the criteria for a test indicated by one of the enabled REPORT options, the word will still be reported even if it exists in a dictionary.

For example, assume both REPORT_MIXED_CASE_OPT and REPORT_SPELLING_OPT are enabled, and that the word "PrintScreen" does not exist in any open dictionary. The check() method will return both MISSPELLED_WORD_RSLT and MIXED_CASE_WORD_RSLT for "PrintScreen". If "PrintScreen" is then added to an open dictionary, the check() method will still stop when it encounters "PrintScreen", but this time it will return only MIXED_CASE_WORD_RSLT. Adding the word to a dictionary does not change the fact that it contains mixed case, and because REPORT_MIXED_CASE_OPT is enabled, the Sentry engine will still report to your application that the word contains mixed case.

The same situation exists when "Ignore All" is selected as the disposition for a reported word. Usually, selecting "Ignore All" for a reported word adds the word to a temporary dictionary, which prevents it from being reported as misspelled (i.e., prevents the check() method from returning MISSPELLED_WORD_RSLT for the word). However, as we have seen, adding a word to a dictionary (including a temporary dictionary) does not alter the fact that the word contains mixed case, so selecting "Ignore All" does not prevent the check() method from subsequently reporting the word with MIXED_CASE_WORD_RSLT.
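The "PrintScreen" behavior, before and after the word is added to a dictionary or dismissed with "Ignore All", can be modeled as follows. This is a stand-alone sketch, not the Sentry API: the class, its addToDictionary and ignoreAll methods, the two HashSet dictionaries, the bit values, and the mixed-case criterion are all hypothetical. It assumes REPORT_SPELLING_OPT and REPORT_MIXED_CASE_OPT are both enabled and no IGNORE option is set.

```java
import java.util.HashSet;
import java.util.Set;

// Model of how dictionary membership affects only the spelling bit.
// NOT the Sentry API; every name except the two result names is invented.
final class DictionarySketch {
    static final int MISSPELLED_WORD_RSLT = 1 << 0; // hypothetical value
    static final int MIXED_CASE_WORD_RSLT = 1 << 1; // hypothetical value

    private final Set<String> openDictionary = new HashSet<>();
    private final Set<String> temporaryDictionary = new HashSet<>();

    void addToDictionary(String word) { openDictionary.add(word); }

    // "Ignore All" is modeled as adding the word to a temporary dictionary.
    void ignoreAll(String word) { temporaryDictionary.add(word); }

    // Assumed criterion: an uppercase letter after the first character
    // plus at least one lowercase letter anywhere in the word.
    static boolean isMixedCase(String word) {
        return word.chars().skip(1).anyMatch(Character::isUpperCase)
                && word.chars().anyMatch(Character::isLowerCase);
    }

    int check(String word) {
        int result = 0;
        boolean known = openDictionary.contains(word)
                || temporaryDictionary.contains(word);
        if (!known) result |= MISSPELLED_WORD_RSLT;            // spelling test
        if (isMixedCase(word)) result |= MIXED_CASE_WORD_RSLT; // independent test
        return result;
    }

    public static void main(String[] args) {
        DictionarySketch session = new DictionarySketch();
        int before = session.check("PrintScreen");
        session.ignoreAll("PrintScreen");
        int after = session.check("PrintScreen");

        // Before: both bits are set. After: only the mixed-case bit remains.
        System.out.println(before == (MISSPELLED_WORD_RSLT | MIXED_CASE_WORD_RSLT));
        System.out.println(after == MIXED_CASE_WORD_RSLT);
    }
}
```

An application that wants "Ignore All" to silence a word completely would need to track dismissed words itself, or leave REPORT_MIXED_CASE_OPT disabled.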

All of the above applies equally to REPORT_MIXED_DIGITS_OPT and to words that contain embedded digits, such as "Win2000".
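This topic does not spell out the engine's exact criterion for "mixed digits". As a rough illustration only, the sketch below assumes a word qualifies when it contains at least one letter and at least one digit. The class and method names are hypothetical and are not part of the Sentry API.

```java
// Hypothetical predicate for "embedded digits"; NOT the Sentry API.
// Assumed criterion: at least one letter and at least one digit.
final class MixedDigitsSketch {
    static boolean hasMixedDigits(String word) {
        boolean letter = false;
        boolean digit = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isLetter(c)) {
                letter = true;
            } else if (Character.isDigit(c)) {
                digit = true;
            }
        }
        return letter && digit;
    }

    public static void main(String[] args) {
        System.out.println(hasMixedDigits("Win2000")); // prints true
        System.out.println(hasMixedDigits("June5"));   // prints true
        System.out.println(hasMixedDigits("2000"));    // prints false
        System.out.println(hasMixedDigits("Windows")); // prints false
    }
}
```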

Consider leaving REPORT_MIXED_CASE_OPT and REPORT_MIXED_DIGITS_OPT disabled. As stated above, these options cause the Sentry engine to perform separate, special tests against each word, and to report the results of those tests through bits set in the return value of the check() method. Unless you have a specific need to know whether a word contains mixed case or embedded digits, the tests may be superfluous. Even with these two REPORT options disabled, words such as "PrintScreen" or "Win2000" will still be reported as misspelled if they are not contained in an open dictionary. If they are contained in a dictionary, the intention is presumably that they are correct and should not be reported for any reason. Case errors such as "TUesday" and missing-space errors such as "June5" will still be reported as misspellings (unless "TUesday" and "June5" happen to exist in an open dictionary).



Copyright © 2015 Wintertree Software Inc.