Adding a spell checker to a Java applet using Sentry Spelling Checker Engine Java SDK

Introduction

This document describes how to use the Sentry Spelling Checker Engine Java SDK to create a Java applet that can check text entered in Web pages. The technique presented here

Source code for a working example applet is included with the Sentry Spelling Checker Engine Java SDK.

This document assumes familiarity with the Java language, developing Java applets, adding applets to Web pages, etc. See http://java.sun.com for a good source of information on Java and applets.

Why spell checking in an applet isn't as easy as you might think

Spelling checkers work by comparing the words being checked against a dictionary of words known to be spelled correctly. Any word not found in the dictionary is reported as misspelled. To avoid annoying the user with spurious error reports, the dictionary should contain most of the common words in a given language (see "How many words should be in the spell checker's dictionary?" for a discussion of dictionary size). As a result, the dictionary must contain a large number of words, typically 100,000 or more. For efficiency, the dictionary should be compressed to reduce memory and disk usage, and indexed for fast access. Dictionaries are therefore implemented as large, complex data structures that are typically stored in disk files.
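To make the comparison step concrete, here is a minimal sketch that assumes the dictionary is a plain in-memory word set. SimpleLexicon is a hypothetical class for illustration only; it is not part of Sentry, whose compressed, indexed lexicons are far more compact than a HashSet of 100,000 strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical minimal lexicon: an in-memory set of correctly spelled words. */
class SimpleLexicon {
    private final Set<String> words = new HashSet<>();

    SimpleLexicon(String... entries) {
        for (String w : entries) {
            words.add(w.toLowerCase(Locale.ROOT));
        }
    }

    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Returns, in order, the words in text that are not in the lexicon. */
    List<String> findMisspelled(String text) {
        List<String> misspelled = new ArrayList<>();
        // Treat anything other than letters and apostrophes as a separator.
        for (String token : text.split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !contains(token)) {
                misspelled.add(token);
            }
        }
        return misspelled;
    }
}
```

Every word missing from the set is reported, which is why a small dictionary produces so many spurious errors.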

For security reasons, most Web browsers prevent unsigned applets from accessing files on the local computer. One solution to this restriction is to sign the applet digitally. However, this approach introduces complications of its own (see "Creating signed, persistent Java applets," Dr. Dobb's Journal, Feb. 1999 for more information).

Two alternative approaches to accessing dictionary files avoid these complications:

  1. Store the dictionary files in the archive (JAR or ZIP file) that contains the applet, and access them as resources.

  2. Store the dictionary files on the same Web server as the applet, and access them as URLs.
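Both approaches end with an InputStream. The sketch below shows how each one can be opened with standard Java calls; LexiconStreams is a hypothetical helper, not a Sentry class. In an applet, the base URL for the second approach would typically come from Applet.getCodeBase() or Applet.getDocumentBase().

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

/** Hypothetical helper showing the two stream sources for dictionary files. */
final class LexiconStreams {
    private LexiconStreams() {}

    /** Approach 1: open a file packaged in the same archive as the given class. */
    static InputStream openAsResource(Class<?> anchor, String name) throws IOException {
        // Class.getResourceAsStream returns null (rather than throwing) when the resource is missing.
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("resource not found: " + name);
        }
        return in;
    }

    /** Approach 2: open a file on the Web server, resolved relative to a base URL. */
    static InputStream openAsUrl(URL base, String name) throws IOException {
        URL url = new URL(base, name); // relative name, e.g. "ssceam2.clx"
        return url.openStream();
    }
}
```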

Both of these approaches require that the dictionary files be accessed as InputStreams. Beginning with version 5.7, Wintertree Software's Sentry Spelling Checker Engine Java SDK allows lexicons (dictionaries) to be constructed from InputStreams.
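Constructing a lexicon from an InputStream means the loading code no longer cares where the bytes came from. The sketch below illustrates that idea with a hypothetical text format of one word per line; Sentry's actual "tlx" format and its lexicon constructors are not described in this paper, so this is not their API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical loader: the same code serves resource, URL, or file streams. */
final class StreamLexiconLoader {
    private StreamLexiconLoader() {}

    /** Reads one word per line (an assumed format), skipping blank lines. */
    static Set<String> load(InputStream in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```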

One further complication exists: Netscape allows applets to access file resources in JAR or ZIP archives only if the file's extension is on a list of acceptable extensions (see http://developer.netscape.com/docs/technote/java/getresource/getresource.html for more information). Wintertree Software's dictionaries use "clx" for compressed lexicons and "tlx" for text lexicons, and neither extension is on Netscape's list. New extensions can be added to the list, but only with Netscape-specific code, which conflicts with the design goal of a single solution for all browsers. A simpler solution is to rename the dictionary files to use allowed extensions, such as "t" in place of "clx" and "txt" in place of "tlx".
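If the renaming is done as part of a build step, the mapping from the text ("clx" to "t", "tlx" to "txt") can be captured in a small helper such as the hypothetical one below. Whatever name it produces must also be the name used in the properties file.

```java
import java.util.Map;

/** Hypothetical build-time helper: maps lexicon extensions to archive-friendly ones. */
final class ExtensionRenamer {
    // Mapping taken from the text: "clx" -> "t", "tlx" -> "txt".
    private static final Map<String, String> RENAMES = Map.of("clx", "t", "tlx", "txt");

    private ExtensionRenamer() {}

    /** Returns the file name with its extension replaced, or unchanged if no mapping applies. */
    static String rename(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName;
        }
        String replacement = RENAMES.get(fileName.substring(dot + 1));
        return replacement == null ? fileName : fileName.substring(0, dot + 1) + replacement;
    }
}
```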

Adding a spell checker to an applet

We will assume that the applet spell-checks text contained in a Java TextArea component, and that it has a button or some other event source to start the spelling check. We will use the SpellingDialog class from Sentry's AWTDemo to interact with the user when spelling errors are detected. (We used an AWT-based applet, but JFC/Swing could be used just as well.) We'll also use the PropSpellingSession class, which is part of the Sentry class library, to construct a spelling session and initialize it from settings contained within a properties (java.util.Properties) file. A spelling session is an instance of the spell-check engine. It contains methods for checking the spelling of text, looking up suggestions for misspelled words, etc. When the applet is deployed, we will store its classes and the properties file in a JAR file.

The properties file lists the spelling options (e.g., "ignore capitalized words" or "report doubled words") and the dictionaries used by the spelling checker. More importantly, it specifies the location of the dictionaries, and the method used to access them. In this design, the dictionary files will be located on the Web server in the same directory as the Web page containing the applet. They could also be located in sub-directories, but cannot be located in higher-level directories because some browsers won't allow this. The dictionary files will be accessed through URL streams for reasons that will be given shortly. The properties-file lines that specify the location and access method for dictionaries (lexicons) might look like the following:

UserLexicon1=correct.tlx,url,t
MainLexicon2=ssceam.tlx,url,t
MainLexicon3=ssceam2.clx,url,c

The properties file lines specify the name of the dictionary file (e.g., correct.tlx), the method of accessing the file ("url", meaning the files are accessed as URL streams), and the format of the dictionary ("t" for text lexicons and "c" for compressed). Note that Netscape's restriction on file extensions does not apply when files are accessed as URL streams.
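PropSpellingSession parses these lines itself. Purely to show how each value breaks down into its three fields, here is a hypothetical parser; the class name and the rule for recognizing lexicon keys (a name ending in "Lexicon" plus a number) are assumptions, not Sentry's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical holder for one lexicon entry such as "correct.tlx,url,t". */
final class LexiconSpec {
    final String key;      // property name, e.g. "UserLexicon1"
    final String fileName; // e.g. "correct.tlx"
    final String access;   // e.g. "url": open the file as a URL stream
    final String format;   // "t" = text lexicon, "c" = compressed lexicon

    private LexiconSpec(String key, String fileName, String access, String format) {
        this.key = key;
        this.fileName = fileName;
        this.access = access;
        this.format = format;
    }

    /** Splits a value of the form file,access,format. */
    static LexiconSpec parse(String key, String value) {
        String[] parts = value.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(key + ": expected file,access,format but got " + value);
        }
        return new LexiconSpec(key, parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    /** Collects entries whose keys end in "Lexicon" plus a number (assumed rule), sorted by key. */
    static List<LexiconSpec> fromProperties(Properties props) {
        List<LexiconSpec> specs = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (key.matches(".*Lexicon\\d+")) {
                specs.add(parse(key, props.getProperty(key)));
            }
        }
        return specs;
    }
}
```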

We could have chosen to store the dictionary files in the JAR file containing the applet. The PropSpellingSession class supports this, and the JAR-file approach has the advantage of keeping the applet and its files together in one place. However, compressed main dictionary files tend to be large (ssceam2.clx, the American English dictionary, is over 300 KB). If the applet's JAR file is large, the Web page containing the applet will take a long time to load on computers with slow Internet connections. If the dictionaries are accessed as URL streams, loading them can be deferred until the spelling check starts.
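Deferring the load amounts to lazy initialization: nothing is fetched when the page loads, and the first spelling check pays the download cost once. The Lazy class below is a hypothetical, generic sketch of that pattern; in the applet, the loader would open the dictionaries and the button handler would call get().

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: the expensive load runs at most once, on first use. */
final class Lazy<T> implements Supplier<T> {
    private Supplier<? extends T> loader;
    private T value;

    Lazy(Supplier<? extends T> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized T get() {
        if (loader != null) {
            value = loader.get(); // e.g. download the dictionaries from the Web server
            loader = null;        // drop the loader so it never runs again
        }
        return value;
    }
}
```

If the loader throws, it stays in place, so a failed download can be retried on the next spelling check.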

The user enters some text in the applet's TextArea, then clicks the button to start the spelling check. In response to the button press, the applet creates a PropSpellingSession object, which initializes the spelling-checker engine by setting options and opening dictionaries specified in the properties file. Because the properties file is stored in the applet's JAR file, we use getResourceAsStream, which is a method of java.lang.Class. The getResourceAsStream method locates a file in the applet's code base (the JAR file), opens it, and returns an InputStream object. The InputStream is used to load properties into the java.util.Properties object. PropSpellingSession takes care of the details required to load the dictionary files as URLs. See http://java.sun.com/products//jdk/1.1/docs/guide/misc/resources.html for more information on accessing files as resources.
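The properties-loading step uses only standard library calls, so it can be sketched directly. SpellingProps and the resource name in the usage note are hypothetical. The resulting Properties object is what would be handed to PropSpellingSession, whose constructor is not shown in this paper and is therefore left out.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper for loading spelling options packaged in the applet's JAR. */
final class SpellingProps {
    private SpellingProps() {}

    /** Loads a properties file found through the given class's loader. */
    static Properties fromResource(Class<?> anchor, String name) throws IOException {
        // A null resource is allowed in try-with-resources; close() is simply skipped.
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                throw new FileNotFoundException("resource not found: " + name);
            }
            return fromStream(in);
        }
    }

    /** Properties.load(InputStream) reads the stream as ISO 8859-1. */
    static Properties fromStream(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }
}
```

For example, SpellingProps.fromResource(getClass(), "/spelling.properties") inside the applet would read a hypothetical spelling.properties stored at the root of the JAR; without the leading "/", the name is resolved relative to the class's package.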

Because we will be checking the contents of a TextArea component, we can use the TextAreaWordParser class, which is part of Sentry's AWTDemo program. This class implements Sentry's WordParser interface, which the engine uses to enumerate the individual words in a text source. WordParser implementations such as TextAreaWordParser also allow misspelled words to be corrected.
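The two jobs of a word parser (handing out words one at a time and replacing the current word in place) can be illustrated with a simplified, String-based class. SimpleWordParser is hypothetical and is not Sentry's WordParser interface. A TextArea-backed version would perform the same steps through java.awt.TextArea's replaceRange and select methods, using the tracked offsets to replace and highlight the word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical word enumerator over a text buffer (not Sentry's WordParser). */
final class SimpleWordParser {
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");
    private final StringBuilder text;
    private int start = -1; // start offset of the current word, or -1 if none
    private int end = 0;    // end offset (exclusive) of the current word

    SimpleWordParser(String text) {
        this.text = new StringBuilder(text);
    }

    /** Advances to the next word; returns null when no words remain. */
    String nextWord() {
        Matcher m = WORD.matcher(text); // fresh matcher, since the buffer may have changed
        if (!m.find(end)) {
            start = -1;
            return null;
        }
        start = m.start();
        end = m.end();
        return m.group();
    }

    /** Replaces the current word; scanning resumes after the replacement. */
    void replaceWord(String replacement) {
        if (start < 0) {
            throw new IllegalStateException("no current word");
        }
        text.replace(start, end, replacement);
        end = start + replacement.length();
    }

    int wordStart() { return start; } // offsets a UI would use to highlight the word
    int wordEnd() { return end; }

    String getText() { return text.toString(); }
}
```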

The next and final step for the applet is to construct a SpellingDialog object. SpellingDialog takes over from this point. It calls on the TextAreaWordParser object to obtain words from the TextArea one by one and passes them to the spelling-checker engine for checking. When it encounters a misspelled word, it displays the word and asks the engine for a set of suggested replacements, which it also displays. SpellingDialog also asks TextAreaWordParser to highlight the misspelled word in the TextArea so the user can see the word in context. The user can dispose of misspelled words by ignoring them or replacing them. Any replacements are made directly in the TextArea. When all words have been checked, the SpellingDialog closes. The TextArea contains the checked and possibly corrected text at this point.
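The control flow driven by SpellingDialog can be summarized in a short loop: take the next word, check it, and, if it is misspelled, ask the user what to do. In the hypothetical sketch below, the lexicon is a plain Set and the user is a callback that returns a replacement or null for "ignore". Neither is Sentry's API, and suggestion lookup is omitted.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the check loop a SpellingDialog-like class drives. */
final class CheckLoop {
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    private CheckLoop() {}

    /** Returns the checked and possibly corrected text. */
    static String check(String text, Set<String> lexicon, Function<String, String> user) {
        StringBuilder buf = new StringBuilder(text);
        Matcher m = WORD.matcher(buf);
        int pos = 0;
        while (m.find(pos)) {
            String word = m.group();
            int start = m.start();
            pos = m.end();
            if (!lexicon.contains(word.toLowerCase(Locale.ROOT))) {
                // A dialog would display the word and suggestions here.
                String replacement = user.apply(word);
                if (replacement != null) {
                    buf.replace(start, pos, replacement);
                    pos = start + replacement.length();
                    m = WORD.matcher(buf); // buffer changed, so start a new matcher
                }
            }
        }
        return buf.toString();
    }
}
```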

Installing the applet

Once the applet has been compiled and tested locally (using AppletViewer), it is ready for deployment. The applet doesn't have to be signed to support the spell-check features; of course, you can sign the applet if necessary for other purposes. The following steps are required to deploy the applet in a Web page:

  1. Create a JAR file containing the applet's classes and properties file.

  2. Upload the JAR file to the Web site directory where the Web page which uses the applet will reside.

  3. Upload any dictionary files to the same directory on the Web site as the JAR file.

  4. Upload the ssce.jar file (the Sentry class library) to the same directory on the Web site as the JAR file.

  5. Create a Web page with an APPLET tag similar to the following:

    <APPLET
    CODE="AWTDemoApplet.class"
    ARCHIVE="AWTDemoApplet.jar,ssce.jar"
    WIDTH=463 HEIGHT=315>
    </APPLET>

  6. Upload the Web page to the same directory on the Web site as the JAR file.

Open the Web page in a browser, and you should be able to enter text in the TextArea and check its spelling.

POSTing the text

At this point, we've described how to create and deploy an applet that can check the spelling of some text, but not much else. If you need to check the spelling of text entered into an existing applet, or into a new applet you plan to develop, the technique described so far will be useful to you. Presumably your applet does something useful with the text entered by the user.

Many Web pages accept text entry from the user in HTML forms. These forms typically contain a "Submit" button that sends the form's data via a POST operation to a CGI script on the Web server. An applet can implement the entire form with AWT (or JFC) components and submit the text to the CGI script itself. This is a general Java programming technique, so we will let the Java experts explain it: see http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html.
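As a rough sketch of that technique, the class below URL-encodes form fields and POSTs them with java.net.HttpURLConnection. FormPoster and any field names or script URL you pass it are hypothetical. It is written for a current JDK (Java 10 or later); an applet targeting older browser VMs would need the older URLEncoder.encode overloads and a manual read loop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that submits form fields to a CGI script. */
final class FormPoster {
    private FormPoster() {}

    /** Encodes fields as application/x-www-form-urlencoded (map iteration order is kept). */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            body.add(URLEncoder.encode(f.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(f.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** POSTs the encoded fields and returns the script's response body. */
    static String post(URL script, Map<String, String> fields) throws IOException {
        byte[] payload = encode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) script.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

An unsigned applet can normally open connections only to the host it was loaded from, so the script URL should be on the applet's own Web server.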

Checking text in HTML forms

An alternative to the approach presented here involves checking text entered into HTML forms. In this approach, the applet provides public methods and properties that can be used by JavaScript code contained in the Web page to check spelling.

For example, an applet could provide a public method named "check" that takes a String as a parameter. When this method is called, the applet invokes the SpellingDialog to check the text passed in the String. Another public method, "getText", returns the corrected text when the spelling check is complete. Text in a TEXTAREA element of an HTML form could be checked using JavaScript code similar to the following:

    document.spellingApplet.check(document.emailForm.body.value);
    document.emailForm.body.value = document.spellingApplet.getText();

This code would be placed in the "onclick" attribute of a button in the form.
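On the Java side, the two scriptable methods can be sketched as below. ScriptableSpeller is hypothetical: in a real applet, check and getText would be public methods of the java.applet.Applet subclass, and the corrector would display the SpellingDialog. Because the JavaScript calls getText immediately after check, the sketch assumes check does not return until the checking (for example, a modal dialog) has finished.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the methods JavaScript would call on the applet. */
class ScriptableSpeller {
    private final UnaryOperator<String> corrector; // stands in for the SpellingDialog
    private String text = "";

    ScriptableSpeller(UnaryOperator<String> corrector) {
        this.corrector = corrector;
    }

    /** Called from JavaScript; blocks until the text has been checked. */
    public synchronized void check(String input) {
        text = corrector.apply(input == null ? "" : input);
    }

    /** Called from JavaScript after check() returns; yields the corrected text. */
    public synchronized String getText() {
        return text;
    }
}
```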




Copyright © 2015 Wintertree Software Inc.

Wintertree Software Inc.