Sentry Spelling Checker Engine for Java

Misspelled words not detected after "&" or "<"

Product: Sentry Spelling Checker Engine Java SDK, version 5.8 and later

Problem: When I use HTMLStringWordParser, any misspelled words after a "&" or "<" character in the text are not detected.

Discussion: This behavior is by design. HTMLStringWordParser expects the string to contain correctly formatted HTML. In correctly formatted HTML, "&" and "<" are special characters (known as meta-characters). The "&" character signals the beginning of an HTML character entity, such as "&copy;" for a copyright symbol. The "<" character signals the beginning of HTML markup, such as "<b>" for boldface. When HTMLStringWordParser sees either character, it expects a character entity or markup to follow. More precisely, it skips text until it encounters the terminator of the character entity or markup: ";" for a character entity and ">" for markup. Thus, a "&" in the text acts like a switch that causes text to be skipped until a ";" appears (and similarly for "<" and ">"). This behavior is documented in the Javadoc for HTMLStringWordParser.

Beginning in version 5.10, when HTMLStringWordParser encounters a "&" or "<" character, it skips text only until the corresponding terminator or any white-space character appears, whichever comes first.
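
Example: The two skipping rules described above can be imitated with a small, self-contained sketch. It is not the Sentry implementation and does not call the Sentry API. The class SkipRuleSketch, the method visibleWords, and the stopAtWhitespace flag are hypothetical names invented for this illustration, and treating a word as a run of letters is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: imitates the skipping rules described in this article.
// This is NOT the Sentry implementation; all names here are hypothetical.
final class SkipRuleSketch {

    // Returns the words that remain visible after skipping.
    // stopAtWhitespace = false imitates the original rule (skip until ';' or '>').
    // stopAtWhitespace = true imitates the version 5.10 rule
    // (skip until the terminator or any white space, whichever comes first).
    static List<String> visibleWords(String text, boolean stopAtWhitespace) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        char terminator = 0; // non-zero while text is being skipped

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (terminator != 0) {
                boolean endOfSkip = c == terminator
                        || (stopAtWhitespace && Character.isWhitespace(c));
                if (endOfSkip) {
                    terminator = 0;
                }
                continue; // the skipped character is never part of a word
            }
            if (c == '&' || c == '<') {
                flush(word, words);
                terminator = (c == '&') ? ';' : '>';
            } else if (Character.isLetter(c)) {
                word.append(c); // simplification: a word is a run of letters
            } else {
                flush(word, words);
            }
        }
        flush(word, words);
        return words;
    }

    // Moves the word collected so far (if any) into the result list.
    private static void flush(StringBuilder word, List<String> words) {
        if (word.length() > 0) {
            words.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        String text = "Tom & Jerry are freinds";
        System.out.println(visibleWords(text, false)); // prints [Tom]
        System.out.println(visibleWords(text, true));  // prints [Tom, Jerry, are, freinds]
        System.out.println(visibleWords("Tom &amp; Jerry are <b>freinds</b>", false));
        // prints [Tom, Jerry, are, freinds]
    }
}
```

Under the original rule, the unescaped "&" in the first call hides everything after it, so the misspelled "freinds" is never seen. Under the 5.10 rule, or when the "&" is written as a proper entity, the word remains visible and could be checked.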

Solution: Either use StringWordParser instead of HTMLStringWordParser, or ensure the text contains correctly formatted HTML. A literal "&" character should be entered as "&amp;", and a literal "<" character should be entered as "&lt;".
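
Example: If the text is plain text that must nevertheless be parsed as HTML, the two meta-characters can be escaped before the text is handed to the parser. The following is a minimal sketch under that assumption. The class HtmlEscapeSketch and the method escapeMetaChars are hypothetical helpers written for this illustration and are not part of the Sentry SDK.

```java
// Illustration only: a hypothetical helper, not part of the Sentry SDK.
final class HtmlEscapeSketch {

    // Replaces each literal '&' with "&amp;" and each literal '<' with "&lt;".
    // Intended for plain text only: applying it to text that already contains
    // HTML entities or markup would escape them a second time.
    static String escapeMetaChars(String plainText) {
        StringBuilder out = new StringBuilder(plainText.length() + 16);
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeMetaChars("Tom & Jerry: 1 < 2"));
        // prints Tom &amp; Jerry: 1 &lt; 2
    }
}
```

The escaped string can then be passed to HTMLStringWordParser in place of the original text. If the text is not HTML at all, using StringWordParser avoids the need for escaping.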