Sentry Spelling Checker Engine Java SDK
A new example, BackgroundDemo, demonstrates how the Sentry engine can be used to check spelling in JTextComponents in the background (a.k.a. "as-you-type," "on-the-fly," or passive spell-checking). Users are likely to expect this feature, since virtually all major word processors provide it. Misspelled words are highlighted with a red zigzag underline (both the line color and the highlighting method are customizable). Highlighting is removed from a misspelled word as soon as it is corrected, giving the user immediate feedback when a word changes from misspelled to correct. A popup menu can be displayed over a misspelled word. The popup menu contains suggested replacements for the misspelled word, plus optional Ignore All and Add items. Selecting a suggestion in the popup menu replaces the misspelled word with the suggestion. Background checking is implemented by the BackgroundChecker class, which works with any component derived from JTextComponent, including JTextField, JTextArea, and JTextPane.
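The core computation behind background checking is finding the character ranges to underline. The sketch below illustrates that step under stated assumptions; it is not the BackgroundChecker implementation. The class name `MisspelledRanges` is hypothetical, and a simple `Set<String>` of lower-case words stands in for the Sentry engine and its lexicons. In a Swing application, each returned range could be passed to the component's `Highlighter` via `addHighlight(start, end, painter)` whenever a `DocumentListener` reports an edit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of the Sentry API: computes the [start, end)
// ranges a background checker would underline. A Set<String> of lower-case
// words stands in for the Sentry engine's lexicons.
final class MisspelledRanges {
    private MisspelledRanges() {}

    static List<int[]> find(String text, Set<String> dictionary) {
        List<int[]> ranges = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++; // skip spaces, punctuation, and digits
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++; // extend the word over consecutive letters
            }
            String word = text.substring(start, i).toLowerCase(Locale.ROOT);
            if (!dictionary.contains(word)) {
                ranges.add(new int[] {start, i});
            }
        }
        return ranges;
    }
}
```

For "The qick fox" with a word set containing "the" and "fox", the only range returned covers "qick" (offsets 4 to 8). Re-running the computation after each edit is what makes a corrected word lose its highlight immediately.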
A new example, JTextPaneInteractiveDemo, demonstrates use of the Sentry engine with JTextPane components. Text in JTextPanes may be formatted (different font faces, sizes, and attributes); the demo preserves formatting when misspelled words are corrected. The new example uses JTextComponentWordParser, which replaces JTextAreaWordParser and is usable with any class derived from JTextComponent, including JTextField, JTextArea, and JTextPane. JTextComponentWordParser is also used in JTextAreaInteractiveDemo, a replacement for SwingDemo.
A set of six servlet examples is now included with Sentry Java SDK. The servlet examples demonstrate various typical ways a spelling checker can be employed on the server side of a web-based application. See the Sentry programmer's guide for more information on the servlet examples.
The SpellingSession, PropSpellingSession, Lexicon, WordComparator, and WordParser classes and interfaces (and any classes which extend or implement them) now implement the java.io.Serializable interface.
TypographicalComparator now handles misspelled words better when they contain incorrectly doubled letters or single letters that should have been doubled.
Examples included with Sentry Java SDK that can be run from a command line now accept the license key on the command line. The license key is specified following the "-k" switch, and should be entered as a hexadecimal constant (e.g., 0x1234ABCD).
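As an illustration of the "-k" convention, the following is a minimal sketch of how a command-line program might locate and parse the key. It is not the examples' actual argument handling; the class name `LicenseKeyArg`, the use of a `long`, and the rejection of values without a "0x" prefix are assumptions, since the parameter type expected by LicenseKey.setKey is not stated here.

```java
import java.util.OptionalLong;

// Hypothetical sketch, NOT the examples' actual code: finds "-k" and parses
// the following hexadecimal constant (e.g., 0x1234ABCD).
final class LicenseKeyArg {
    private LicenseKeyArg() {}

    static OptionalLong parse(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-k")) {
                String value = args[i + 1];
                if (!value.startsWith("0x")) {
                    throw new IllegalArgumentException(
                            "License key must be a hexadecimal constant such as 0x1234ABCD: " + value);
                }
                // A long holds every unsigned 32-bit value, so keys such as
                // 0xFFFFFFFF do not overflow.
                return OptionalLong.of(Long.parseLong(value.substring(2), 16));
            }
        }
        return OptionalLong.empty(); // no "-k" switch on the command line
    }
}
```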
TextAreaWordParser and JTextComponentWordParser (formerly known as JTextAreaWordParser) now strip '\r' characters from text obtained from text components. Some text components on some platforms represent newlines internally as a single character ('\n') but externally (in the value returned by getText()) as "\r\n". This throws off position calculations, making highlighting of misspelled words inaccurate. By stripping the '\r' character, the positions calculated by the WordParser derivatives remain synchronized with the contents of the text component.
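The offset mismatch is easy to see in a small example. This sketch (the class and method names are hypothetical, and the SDK's own implementation may differ) strips '\r' and also shows how an offset measured in "\r\n" text maps to the offset of the same character once the '\r' characters are gone.

```java
// Hypothetical sketch, NOT the SDK's code: keeps offsets in step with a text
// component that stores each newline internally as a single '\n'.
final class NewlineNormalizer {
    private NewlineNormalizer() {}

    /** Removes every '\r' from text returned by getText(). */
    static String stripCarriageReturns(String text) {
        return text.replace("\r", "");
    }

    /** Maps an offset in "\r\n" text to the offset of the same character
     *  after every '\r' has been removed. */
    static int toInternalOffset(String external, int externalOffset) {
        int carriageReturns = 0;
        for (int i = 0; i < externalOffset; i++) {
            if (external.charAt(i) == '\r') {
                carriageReturns++;
            }
        }
        return externalOffset - carriageReturns;
    }
}
```

In "one\r\ntwo", the word "two" starts at offset 5, but in the component's internal text "one\ntwo" it starts at offset 4; highlighting at offset 5 would be off by one character for every preceding line.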
LexCompressor now sorts word list files before compression, meaning it is no longer necessary to sort them externally. This also applies to the SqLex program.
When IGNORE_DOMAIN_NAMES_OPT is enabled, underscores and hyphens are now included wherever alphanumerics can be included, in accordance with the syntax rules for domain names.
The Sentry Java SDK programmer's guide is now included in PDF format for easier printing and searching. JavaDoc documentation for the classes in the Sentry class library is still included.
JavaDoc documentation for generally useful example classes, such as JTextComponentWordParser and BackgroundChecker, is now included.
The Sentry programmer's guide now contains information on the examples included with Sentry Java SDK to better demonstrate how those examples provide typical solutions for spell-checking requirements in applications.
The spelling dialog boxes included in the examples now disable the "Ignore All" button when problems other than misspelled words are encountered. This prevents confusion when, for example, a word would be reported as containing mixed case, and clicking the "Ignore All" button would not prevent it from being reported as having mixed case again.
A lexicon of technical and Internet-related terms such as "blog" and "spammer" is now included in ssce/runtime/lex/tech.tlx.
The Validate program now accepts the name of an existing lexicon file on its command line following the "-t" switch. When "-t filename" is specified, Validate performs some additional tests to ensure the existing lexicon file can be read.
PropSpellingSession and SpellingSession now implement the Cloneable interface, which makes it easy to create a private clone of a template object.
PropSpellingSession now includes new methods (getTempLexicon and setTempLexicon) which permit the temporary lexicon used by PropSpellingSession to be changed at run-time. This makes it easier to create a shared, read-only PropSpellingSession template object and create clones of it as needed, each with private temporary lexicons.
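The template-and-clone pattern can be sketched with a stand-in class. `TemplateSession` below is hypothetical and is NOT PropSpellingSession; only the getTempLexicon/setTempLexicon names mirror the methods described above, and a `Set<String>` replaces a real lexicon. The point it shows is that a shallow clone shares the template's temporary word list until the clone is given its own.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a spelling session, NOT Sentry's PropSpellingSession.
class TemplateSession implements Cloneable {
    private final Set<String> mainWords;           // shared by all clones, read-only
    private Set<String> tempWords = new HashSet<>(); // replaced per clone

    TemplateSession(Set<String> mainWords) {
        this.mainWords = Collections.unmodifiableSet(new HashSet<>(mainWords));
    }

    Set<String> getTempLexicon() { return tempWords; }

    void setTempLexicon(Set<String> tempWords) { this.tempWords = tempWords; }

    boolean isKnown(String word) {
        return mainWords.contains(word) || tempWords.contains(word);
    }

    @Override
    public TemplateSession clone() {
        try {
            // Shallow copy: the clone initially shares both word sets with the
            // template, which is why each clone calls setTempLexicon.
            return (TemplateSession) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class is Cloneable
        }
    }
}
```

Typical use: clone the shared template, then call `setTempLexicon(new HashSet<>())` on the clone so that words ignored in one session (for example, by one servlet request) are not visible to any other.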
New get/set property access methods for tempLexicon, comparator, userLexicons, and minSuggestDepth have been defined in PropSpellingSession. The tempLexicon, comparator, userLexicons, and minSuggestDepth field members of PropSpellingSession have been documented as deprecated and will likely be changed from "public" to "protected" in a future version.
The PropSpellingSession constructor now includes mainLexPath and userLexPath parameters. If the file name specified in MainLexiconN entries in the Properties collection contains no path, PropSpellingSession will now prefix the file name with the value of the mainLexPath parameter (similarly for UserLexiconN and userLexPath). This is helpful in situations where the full path to the directory containing the lexicons is not known until run-time. If the file name specified in the Properties collection includes a path, then that path is used. If the mainLexPath parameter is set to an empty string, the lexicons will be opened in the current directory.
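The path rule above can be expressed compactly. This is a minimal sketch of the described behavior, assuming "contains no path" means the name has no parent directory component; the class name `LexiconPaths` is hypothetical, and PropSpellingSession's own logic may differ in detail.

```java
import java.io.File;

// Hypothetical sketch of the lexicon path rule, NOT PropSpellingSession's code.
final class LexiconPaths {
    private LexiconPaths() {}

    static String resolve(String fileName, String defaultDir) {
        if (new File(fileName).getParent() != null) {
            return fileName; // the property value includes a path: use it as-is
        }
        if (defaultDir.isEmpty()) {
            return fileName; // empty default directory: open in the current directory
        }
        return new File(defaultDir, fileName).getPath(); // prefix with the default directory
    }
}
```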
The PropSpellingSession constructor now includes a prefix parameter. The prefix is prepended to property names accessed in the Properties collection. This allows, for example, spelling-related properties to be stored with general application properties (prefixed with "Spelling."), or properties for a number of users to be stored in the same properties collection (prefixed with the user's name). For example, if the prefix is "Spelling.", PropSpellingSession will search for properties named "Spelling.MainLexicon1" and "Spelling.CASE_SENSITIVE_OPT".
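The effect of the prefix on lookups can be illustrated with plain java.util.Properties. The wrapper class `PrefixedProps` is hypothetical and only mirrors the described behavior; it does not show PropSpellingSession's constructor, whose parameter order is not given here.

```java
import java.util.Properties;

// Hypothetical sketch of prefixed property lookup, NOT the SDK's code.
final class PrefixedProps {
    private final Properties props;
    private final String prefix;

    PrefixedProps(Properties props, String prefix) {
        this.props = props;
        this.prefix = prefix;
    }

    /** Looks up prefix + name, e.g. "Spelling." + "MainLexicon1". */
    String get(String name) {
        return props.getProperty(prefix + name);
    }
}
```

With one Properties object holding both "Spelling.MainLexicon1" and "alice.CASE_SENSITIVE_OPT", a lookup of "MainLexicon1" with prefix "Spelling." and a lookup of "CASE_SENSITIVE_OPT" with prefix "alice." each find their own entry, while other users' prefixes find nothing.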
PropSpellingSession now supports a new access method, "stream". The "stream" access method can be used to open lexicons as files via FileInputStream. Lexicons opened as streams may be safely shared among threads (provided the lexicon is not modified at run-time). The "stream" access method is therefore useful in multi-threaded applications such as servlets.
Examples included with Sentry Java SDK are now better organized by purpose, and some existing examples have been renamed for clarity. Examples are now included under directories applet (demonstrating use of the spelling engine in an applet), background (demonstrating background, as-you-type, or on-the-fly spell-checking), interactive (demonstrating spelling checks where the user interacts with a dialog box to dispose of spelling errors), and servlet (demonstrating use of the spelling engine in servlets).
The toString() method in SuggestionSet now returns a simple comma-separated list of words.
The language ID in the build file named on the SqLex command line can now be specified in either decimal or hexadecimal (previously it was always interpreted as hexadecimal). A hexadecimal value must be preceded by "0x", as in "0xa1b2".
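A parser for this rule has to distinguish the two forms explicitly. The sketch below is hypothetical (not SqLex's code); it avoids `Integer.decode`, which would also treat a leading "0" as octal and "#" as hexadecimal, neither of which the rule above mentions.

```java
// Hypothetical sketch, NOT SqLex's code: "0x"-prefixed hexadecimal, otherwise decimal.
final class LanguageId {
    private LanguageId() {}

    static int parse(String s) {
        String t = s.trim();
        if (t.startsWith("0x")) {
            return Integer.parseInt(t.substring(2), 16); // "0xa1b2" -> 41394
        }
        return Integer.parseInt(t); // plain decimal; a leading zero is not octal
    }
}
```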
Bug fixes:
AWTDemoApplet did not call LicenseKey.setKey when the "Check Spelling (TextArea)" menu item was selected. This problem has been corrected.
When the "Check Spelling (String)" menu items were selected in various examples, hyphenated terms were incorrectly reported as misspelled, even if they were contained in a dictionary. This problem has been corrected.
HTMLStringWordParser no longer treats an ampersand (&) appearing in a markup (<...>) as a character entity. For example, if the URL in the "HREF" part of an <A> tag contained an ampersand, HTMLStringWordParser would treat it like a character entity, and ignore all text until a semicolon (;) was encountered. HTMLStringWordParser now ignores ampersands contained within markups.
A problem where HTMLStringWordParser would incorrectly report words as doubled has been corrected.
HTMLStringWordParser now stops ignoring text in a character entity when whitespace is encountered. If the text being checked incorrectly contains an ampersand that does not mark the beginning of an HTML character entity, HTMLStringWordParser will no longer skip over vast amounts of text until the next semicolon is encountered.
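The two rules above (ampersands inside a markup are not entities, and whitespace ends a would-be entity) can be sketched as a masking pass that blanks out everything the spelling checker should skip while preserving character offsets. This is NOT HTMLStringWordParser's code; the class name `HtmlMask` is hypothetical, and for simplicity the sketch treats every recognized entity as non-text, unlike HTMLStringWordParser, which treats entity references for alphabetic characters as part of a word.

```java
// Hypothetical sketch, NOT HTMLStringWordParser's code: replaces markups and
// character entities with spaces so that offsets of the remaining text are kept.
final class HtmlMask {
    private HtmlMask() {}

    static String mask(String html) {
        char[] out = html.toCharArray();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<') {
                // Inside a markup everything, including '&', is skipped up to '>'.
                int end = html.indexOf('>', i);
                if (end < 0) {
                    end = html.length() - 1; // unterminated markup runs to the end
                }
                blank(out, i, end + 1);
                i = end + 1;
            } else if (c == '&') {
                // A possible entity ends at ';'. Whitespace first means this '&'
                // does not start an entity, so the text after it is kept.
                int j = i + 1;
                while (j < html.length() && html.charAt(j) != ';'
                        && !Character.isWhitespace(html.charAt(j))) {
                    j++;
                }
                if (j < html.length() && html.charAt(j) == ';') {
                    blank(out, i, j + 1);
                    i = j + 1;
                } else {
                    i++; // lone '&': leave it and the following text alone
                }
            } else {
                i++;
            }
        }
        return new String(out);
    }

    private static void blank(char[] chars, int from, int to) {
        for (int k = from; k < to; k++) {
            chars[k] = ' ';
        }
    }
}
```

For example, in `<a href="x?a=1&b=2">link</a>` only "link" survives, and "AT&T rocks; really" is returned unchanged rather than having "T rocks" swallowed up to the semicolon.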
StreamTextLexicon and FileTextLexicon no longer allow words to be added with actions other than IGNORE_ACTION if the lexicon is in "external" format (i.e., a format other than the one used by the Sentry engine). These classes also validate added words to ensure they do not contain characters that could corrupt the lexicon format.
A problem where the "s" in a possessive initialism such as "D.A.'s" would be incorrectly reported as a misspelled word has been corrected.
A problem where unusual suggestions for capitalized words would be offered when IGNORE_MIXED_CASE_WORD_OPT was enabled (e.g., POtimization would be suggested for Optimization) has been corrected.
WordCatcher, ContainsWordCatcher, and SuggestWordCatcher, which were used only by CompressedLexicon, have now been made sub-classes of that class rather than separate classes.
For lexicons specified using the "file" access method, PropSpellingSession will now use the lexicon format specifier if one is provided (even if the one provided is incorrect), and will attempt to determine the format of the lexicon from its contents only if a format specifier is not provided.
Canadian English lexicons are now included with the Sentry Java SDK.
Documentation for the example dialog boxes used in AWTDemo and SwingDemo has been added to the readme.html file in each example's directory.
A getLanguage method has been added to the CompressedLexicon class.
URLs and e-mail addresses are now ignored when the IGNORE_DOMAIN_NAMES_OPT option is enabled.
Performance of the check method has been improved when the SPLIT_WORDS_OPT option is enabled.
A "Frequently Asked Questions" section has been added to the programmer's guide.
The HTMLStringWordParser class now considers HTML character entity references for alphabetic characters (e.g., &aring;) to be part of a word. In addition, HTMLStringWordParser now includes convenience methods that convert Strings containing character entity references to text, and vice versa.
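Conversion in both directions can be sketched as follows. The class `Entities` and its method names are hypothetical and are not HTMLStringWordParser's convenience methods; the sketch handles numeric references plus only a handful of named entities, whereas a real converter would need the full HTML entity table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of entity <-> text conversion, NOT the SDK's methods.
// Only numeric references and a few named entities are recognized.
final class Entities {
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", (int) '&');
        NAMED.put("lt", (int) '<');
        NAMED.put("gt", (int) '>');
        NAMED.put("quot", (int) '"');
        NAMED.put("aring", 0xE5); // a with ring above
    }

    private Entities() {}

    /** Replaces recognized entity references with the characters they name. */
    static String toText(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int semi = s.charAt(i) == '&' ? s.indexOf(';', i) : -1;
            if (semi > i + 1) {
                int cp = decode(s.substring(i + 1, semi));
                if (cp >= 0) {
                    sb.appendCodePoint(cp);
                    i = semi + 1;
                    continue;
                }
            }
            sb.append(s.charAt(i++)); // not a recognized entity: copy as-is
        }
        return sb.toString();
    }

    /** Returns the code point for an entity name, or -1 if unrecognized. */
    private static int decode(String name) {
        int cp = -1;
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                cp = Integer.parseInt(name.substring(2), 16);
            } else if (name.startsWith("#")) {
                cp = Integer.parseInt(name.substring(1));
            } else if (NAMED.containsKey(name)) {
                cp = NAMED.get(name);
            }
        } catch (NumberFormatException e) {
            return -1;
        }
        return Character.isValidCodePoint(cp) ? cp : -1;
    }

    /** Escapes markup characters and writes non-ASCII characters as numeric references. */
    static String toEntities(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == '&') sb.append("&amp;");
            else if (cp == '<') sb.append("&lt;");
            else if (cp == '>') sb.append("&gt;");
            else if (cp > 126) sb.append("&#").append(cp).append(';');
            else sb.appendCodePoint(cp);
        });
        return sb.toString();
    }
}
```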
A new example which measures the performance of the Sentry engine in words checked per minute is now included in ssce/examples/PerformanceTest.
The PropSpellingSession class now writes an error message to System.err when a lexicon cannot be opened. The error message should aid in diagnosing run-time configuration problems.
Examples demonstrating techniques outlined in the Sentry programmer's guide are now included in a separate directory. Each example is compilable and runnable.
A problem where the suggestion set produced by the suggest method would sometimes contain duplicate words has been corrected.
A problem where EOFException was raised in SqLex when compressing a small word list has been corrected.
A problem where ArrayIndexOutOfBounds was raised in SwingDemo has been corrected.
A problem where a word beginning with an apostrophe was incorrectly reported as a doubled word has been corrected.
A new class, HTMLStringWordParser, has been added to the Sentry class library. This class works like StringWordParser, but ignores any HTML markups (such as <FONT> and &copy;) contained within the string.
A new example has been added showing how to use the Sentry engine in a Java servlet.
Problems with suggestions for words with plural possessives (e.g., workers') were corrected.
The example programs now apply consistent case when replacing words marked as "Change All". Previously, if the replacement word contained mixed case (such as foobar replaced with Foo Bar), the case pattern of the first replacement would differ from that of subsequent replacements.
A StringIndexOutOfBounds exception raised when the REPORT_UNCAPPED_OPT option was set has been corrected.
StringWordParser will no longer accept a trailing period in a word terminated by a single quote (unless the word is an initialism).
StringWordParser will no longer accept a trailing period in a word containing embedded periods and three or more consecutive alpha-numeric characters. For example, StringWordParser will include the trailing periods in "U.S.A." and "Ph.D." but not in "wintertree-software.com."
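The trailing-period rule can be stated as a small predicate. This is a hypothetical sketch, not StringWordParser's code, and it covers only the rule in the preceding item (embedded periods and no run of three or more alphanumerics), not the single-quote case.

```java
// Hypothetical sketch, NOT StringWordParser's code: decides whether a token's
// trailing period belongs to the word.
final class TrailingPeriod {
    private TrailingPeriod() {}

    static boolean keepsTrailingPeriod(String token) {
        if (!token.endsWith(".")) {
            return false;
        }
        String body = token.substring(0, token.length() - 1);
        if (body.indexOf('.') < 0) {
            return false; // no embedded periods: the period ends the sentence
        }
        int run = 0;
        for (int i = 0; i < body.length(); i++) {
            if (Character.isLetterOrDigit(body.charAt(i))) {
                if (++run >= 3) {
                    return false; // e.g. "wintertree" in a domain name
                }
            } else {
                run = 0;
            }
        }
        return true; // e.g. "U.S.A." or "Ph.D."
    }
}
```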
The mechanism used to handle misspelled sub-words in compound words (e.g., hyphenated terms) has been improved. The SpellingSession class now includes two new methods: getMisspelledWord and getMisspelledWordOffset. If the SPLIT_HYPHENATED_WORDS_OPT or SPLIT_CONTRACTED_WORDS_OPT options are true, and part of a hyphenated or contracted term is misspelled, these new methods can be used to access the misspelled parts. For example, if the compound word bright-bluex was checked, the check method would return MISSPELLED_WORD_RSLT, getMisspelledWord would return bluex, and getMisspelledWordOffset would return 7.
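The relationship between the compound word, the misspelled part, and its offset can be illustrated with a hyphen-only sketch. The class `SubWordCheck` is hypothetical and does not reproduce the Sentry check method; a `Set<String>` stands in for the lexicons, and contractions are not handled.

```java
import java.util.Set;

// Hypothetical sketch, NOT the Sentry API: checks each part of a hyphenated
// term and reports the first misspelled part with its offset in the term.
final class SubWordCheck {
    private SubWordCheck() {}

    static final class Misspelling {
        final String word;
        final int offset;

        Misspelling(String word, int offset) {
            this.word = word;
            this.offset = offset;
        }
    }

    /** Returns the first misspelled part, or null if every part is known. */
    static Misspelling firstMisspelledPart(String compound, Set<String> dictionary) {
        int start = 0;
        while (true) {
            int hyphen = compound.indexOf('-', start);
            int end = hyphen < 0 ? compound.length() : hyphen;
            String part = compound.substring(start, end);
            if (!part.isEmpty() && !dictionary.contains(part)) {
                return new Misspelling(part, start);
            }
            if (hyphen < 0) {
                return null;
            }
            start = hyphen + 1; // continue after the hyphen
        }
    }
}
```

For "bright-bluex" with a word set containing "bright" and "blue", the result is "bluex" at offset 7, matching the values described for getMisspelledWord and getMisspelledWordOffset.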
The TextAreaWordParser and JTextAreaWordParser classes in AWTDemo and SwingDemo now extend from class StringWordParser. This means that changes and improvements made to StringWordParser will automatically be reflected in the example programs, and that the example programs can easily be modified to use classes which extend StringWordParser, such as the new HTMLStringWordParser class.
The save method in StreamTextLexicon is now declared public rather than protected.
References to the Swing package in SwingDemo have been changed from the old "com.sun.java.swing" to the more current "javax.swing".
Problems compiling the Sentry source code, caused by a conflict between com.wintertree.util.Comparable and java.lang.Comparable in JDK 2, have been corrected.
A problem where an exception would sometimes be raised when AWTDemoApplet was run under Internet Explorer has been corrected.
A problem where words were incorrectly reported as misspelled when ALLOW_ACCENTED_CAPS_OPT was false has been corrected.
StringWordParser will no longer throw StringIndexOutOfBounds exception when it encounters a word ending in a hyphen.
When IGNORE_NON_ALPHA_WORDS_OPT is true, the same string of digits appearing twice in a row will no longer be reported as a doubled word.
A constructor which takes a java.io.InputStream parameter was added to the CompressedLexicon class. This constructor permits compressed lexicons to be initialized from stream sources such as resources and URLs, rather than files. When a CompressedLexicon is constructed from a stream, the entire lexicon is loaded into memory when the lexicon is constructed. When a CompressedLexicon is constructed from a file, parts of the lexicon are loaded during construction and other parts are loaded as needed.
A new class, StreamTextLexicon, was added. StreamTextLexicon is similar to FileTextLexicon except that it is loaded initially from the contents of a stream and includes a save method which writes the contents of the lexicon to a stream.
FileTextLexicon and StreamTextLexicon will now read lexicon files or streams which contain either native characters or Unicode characters. When writing files or streams, these classes use the native character set if possible, and the Unicode character set only if the lexicon contains characters which cannot be represented in the native character set.
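The "native if possible, otherwise Unicode" decision can be sketched with java.nio.charset. This assumes that "native character set" means the platform default charset and that "Unicode" means UTF-16; the class `LexiconCharset` is hypothetical, and the SDK classes may choose encodings differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT the SDK's code: picks the platform default charset
// if it can encode every word, otherwise falls back to UTF-16.
final class LexiconCharset {
    private LexiconCharset() {}

    static Charset choose(Iterable<String> words) {
        Charset nativeCs = Charset.defaultCharset();
        if (nativeCs.canEncode()) {
            CharsetEncoder encoder = nativeCs.newEncoder();
            boolean allEncodable = true;
            for (String word : words) {
                if (!encoder.canEncode(word)) {
                    allEncodable = false;
                    break;
                }
            }
            if (allEncodable) {
                return nativeCs;
            }
        }
        // Java's "UTF-16" encoder writes a big-endian byte-order mark first.
        return StandardCharsets.UTF_16;
    }
}
```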
The text lexicons shipped with the SSCE Java SDK now use single-byte native characters, and their file-name extension has changed from ".utlx" to ".tlx".
A new class, PropSpellingSession, was added. PropSpellingSession is derived from SpellingSession. It initializes the spelling session by setting options and opening lexicons specified in a property set (java.util.Properties).
A demonstration program showing how to add a spelling checker to a Java applet was added. The new demo applet is named AWTDemoApplet and is located in the examples/AWTDemo directory. See readme.html in that directory for more information.
The word list files and suffix files read by the SqLex program and LexCompressor class can now contain native characters or Unicode characters. If the files contain Unicode characters, the first character in the file must be a Unicode byte-order mark (BOM).
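Detecting the byte-order mark amounts to peeking at the first two bytes. The sketch below assumes the Unicode files are UTF-16 (either byte order) and that "native" means the platform default charset; the class `WordListReader` is hypothetical and is not how SqLex or LexCompressor read their input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the SDK's code: opens a word list as UTF-16 if it
// starts with a UTF-16 byte-order mark, otherwise with the default charset.
final class WordListReader {
    private WordListReader() {}

    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 2);
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = pin.read(head, n, 2 - n);
            if (r < 0) {
                break; // stream shorter than two bytes
            }
            n += r;
        }
        if (n > 0) {
            pin.unread(head, 0, n); // put the peeked bytes back
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        boolean hasBom = n == 2
                && ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE));
        // The UTF-16 decoder uses the BOM to pick the byte order and does not
        // return it as a character.
        return new InputStreamReader(pin, hasBom ? StandardCharsets.UTF_16 : Charset.defaultCharset());
    }

    /** Reads one word per line, skipping blank lines. */
    static List<String> readWords(InputStream in) {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(open(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```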
The Sentry class library is now packaged in a JAR file rather than a ZIP file.
The SqLex utility program will now construct a temporary suffix file if no suffix file is specified. The Suffix program is no longer included with the SSCE Java SDK.
A problem where TextDemo would loop when doubled words were ignored was corrected.
Suggestions for misspelled plural possessives (e.g., girls') are now handled correctly.
The phonetic suggestion algorithm was modified to work better with words which contain several misspellings.
A new option, ALLOW_ACCENTED_CAPS_OPT, has been added to the SpellingSession class. This option is intended to support checking French Canadian text with Wintertree Software's French dictionary.
Characters ` and ^ are no longer considered invalid by the LexCompressor class.
Fixed a problem with Latin1 (single-byte) compressed lexicons.
Fixed a problem where MIXED_CASE_WORD_RSLT was returned for words with a non-letter in the second position (for example, "i.e.").
SpellingSession.check: Fixed an exception sometimes raised when the otherWord parameter was empty.
Version 5.1 is the first release of the SSCE Java SDK. The major release number is "5" for consistency with Wintertree Software's other Sentry Spelling Checker Engine SDKs.
Copyright © 2015 Wintertree Software Inc.