Wintertree Software Inc.

Sentry Spelling Checker Engine for Java



"Curly apostrophe" from Microsoft Word not recognized

Product: Sentry Spelling Checker Engine Java SDK Version 5.7 - 5.10

Description: Microsoft Word automatically replaces the ASCII apostrophe character (') with a "curly apostrophe." When text copied from Microsoft Word is submitted to the Sentry engine for a spelling check, contracted words containing the curly apostrophe are reported as misspelled.

Discussion: The curly apostrophe, also known as Right Single Quotation Mark, is Unicode character '\u2019'. For a variety of reasons, Wintertree Software's English dictionaries (including American, British and Canadian variants) use only characters from the ASCII character set, which is identical to the lowest 128 characters of the Unicode character set. The Right Single Quotation Mark is not a member of the ASCII character set, so words containing it are not included in the English dictionaries.

However, a supplemental dictionary file named accent.tlx is included with Sentry Java SDK. Prior to Sentry Java SDK version 5.10, this dictionary file contains words used in English with non-ASCII characters, including foreign words (such as naïve and soufflé) and English contractions using the curly apostrophe. The pre-5.10 accent.tlx stores these as single-byte characters, mainly from the ISO-8859-1 character set, which is identical to the lowest 256 characters of the Unicode character set. The curly apostrophe, however, is stored as character code 146, and code 146 is not the curly apostrophe in ISO-8859-1 or in Unicode, where it is a control character. The use of code 146 for the curly apostrophe is specific to the code page used by Western versions of Microsoft Windows (Windows-1252).

Java software, including the Sentry spelling engine, uses the Unicode character set, in which the curly apostrophe is '\u2019'. Therefore, the contracted words in the pre-5.10 accent.tlx, which contain curly apostrophes with code 146, are not compatible with Sentry Java SDK.
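The difference between the two encodings can be seen with a few lines of plain Java. The sketch below is only an illustration and does not use the Sentry API; the class name Byte146Demo and its decode helper are made up for this example. It also assumes the JVM provides the "windows-1252" charset, which common JDKs do but the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper (not part of Sentry): shows what byte 146 means in each charset.
final class Byte146Demo {

    /** Decodes one byte with the given charset and returns the resulting Unicode code point. */
    static int decode(int byteValue, Charset charset) {
        String s = new String(new byte[] {(byte) byteValue}, charset);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: this charset is available on the running JVM.
        Charset windows1252 = Charset.forName("windows-1252");

        // Prints "windows-1252: U+2019" (Right Single Quotation Mark).
        System.out.printf("windows-1252: U+%04X%n", decode(146, windows1252));

        // Prints "ISO-8859-1:   U+0092" (a control character, not an apostrophe).
        System.out.printf("ISO-8859-1:   U+%04X%n", decode(146, StandardCharsets.ISO_8859_1));
    }
}
```

The same byte therefore becomes a usable curly apostrophe only when it is decoded as Windows-1252; read as ISO-8859-1 or treated directly as a Unicode code point, it becomes U+0092.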

(Beginning in Sentry Java SDK version 5.10, the included accent.tlx file uses the '\u2019' character. Any text containing curly apostrophes must use the '\u2019' character for the apostrophe, or the Sentry engine will report any words using the apostrophe as misspelled.)
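If text reaches your application with the curly apostrophe carried as code 146 (for example, because Windows bytes were decoded as ISO-8859-1), it may help to normalize it before passing it to the spelling checker. The following is a minimal sketch under that assumption; ApostropheNormalizer and toUnicodeCurly are hypothetical names, not part of the Sentry API, and whether your text actually contains such characters depends on how it was decoded.

```java
// Illustrative helper (not part of Sentry): maps code 146 to the Unicode curly apostrophe.
final class ApostropheNormalizer {

    /** Unicode Right Single Quotation Mark, U+2019. */
    static final char CURLY_APOSTROPHE = '\u2019';

    /** Code 146 as it appears after Windows bytes are decoded as ISO-8859-1, U+0092. */
    static final char CODE_146 = '\u0092';

    /** Returns the text with every code-146 character replaced by U+2019. */
    static String toUnicodeCurly(String text) {
        return text.replace(CODE_146, CURLY_APOSTROPHE);
    }
}
```

Text that already uses U+2019 passes through unchanged, so the helper can be applied unconditionally before a spelling check.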

Solution:

If you are using Sentry Java SDK 5.10 or later, add accent.tlx (located in ssce/runtime/lex) to any properties files as a main lexicon.

If you are using Sentry Java SDK 5.9 or earlier, you can download a Unicode form of accent.tlx (accentu.tlx) by clicking here (be sure to save the URL). Save accentu.tlx to the ssce/runtime/lex directory. This lexicon uses '\u2019' to represent the curly apostrophe, so be sure the text you check uses that character for curly apostrophes. To use accentu.tlx, replace accent.tlx with accentu.tlx in any properties files.
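If you have several properties files to update, the replacement can be scripted. The sketch below treats each file as plain text and makes no assumptions about Sentry's property key names; LexiconSwitcher and its methods are hypothetical names for this example. It assumes the files are readable as ISO-8859-1 (the traditional encoding for Java properties files) and that the lexicon name appears in lower case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not part of Sentry): swaps accent.tlx for accentu.tlx in a text file.
final class LexiconSwitcher {

    /** Replaces every occurrence of "accent.tlx" in one line with "accentu.tlx". */
    static String switchLexicon(String line) {
        // "accentu.tlx" does not contain "accent.tlx", so re-running this is harmless.
        return line.replace("accent.tlx", "accentu.tlx");
    }

    /** Rewrites the given properties file in place, line by line. */
    static void rewrite(Path propertiesFile) throws IOException {
        List<String> updated = new ArrayList<>();
        for (String line : Files.readAllLines(propertiesFile, StandardCharsets.ISO_8859_1)) {
            updated.add(switchLexicon(line));
        }
        Files.write(propertiesFile, updated, StandardCharsets.ISO_8859_1);
    }
}
```

Keep a backup of each properties file before calling rewrite, since the file is overwritten in place.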



Copyright © 2015 Wintertree Software Inc.