String handling is an important part of most applications. While Strings are a kind of Collection, they have a number of unique features and behaviors.
Characters and Unicode
Describes Characters.
String classes
Introduces the GemStone Smalltalk objects that store collections of Characters.
String Sorting and Collation
Describes collation, including traditional string collation and collation using the ICU libraries and Unicode strings.
Encrypting Strings
Explains how to encrypt strings.
A Character is a special object: an object whose value is encoded in the OOP. Literal Characters are formed with a leading $.
Each Character has a code or codePoint, which, for lower order Characters, is the ASCII value. Either of these terms may be used, though ASCII is an incorrect term for the higher code points. GemStone supports Characters with values from 0 to 16r10FFFF, the full Unicode range, except for the Unicode reserved range.
The Unicode range of codePoints from 16rD800-16rDFFF is reserved for encoding leading/trailing surrogate pairs for UTF-16 encoding. These can never be legal Unicode characters, and as such can never be present in Unicode strings.
To get the Character for a given codePoint, use the Character class methods withValue: or codePoint:.
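As a minimal sketch of these class methods, the expressions below create Characters from code points and read the code point back. Each expression is meant to be evaluated on its own, and the instance method codePoint is assumed from the terminology used above.

```smalltalk
"Evaluate each expression separately; the comment shows the expected result."
Character withValue: 65.       "$A"
Character codePoint: 8364.     "the Euro sign Character"
$A codePoint.                  "65"
```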
Characters have “type”, and know if they are a digit, letter, separator, or other similar kind. This information is defined in the Unicode database as the Unicode general category, and a variety of testing methods are available. The Unicode database also defines the upper and lower case equivalents, and case conversion methods are available. See the image for a full list of available protocol.
$Z isUppercase
true
$u isDigit
false
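A few further tests and case conversions, as an illustrative sketch. The selectors isLetter, isSeparator, asUppercase, and asLowercase are the conventional Smalltalk names and are assumed to be present in the image; check the Character protocol for the exact set.

```smalltalk
"Evaluate each expression separately; the comment shows the expected result."
$a isLetter.        "true"
$7 isDigit.         "true"
$  isSeparator.     "true (the receiver is the space Character)"
$a asUppercase.     "$A"
$Q asLowercase.     "$q"
```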
Characters are ordered (collated) using internal character tables, which provide a Unicode-like collation order for Characters up to code point 255. Characters above that are collated by code point. Character collation can be modified by installing character data tables, although this use is deprecated.
Character collation is used in collating instances of Traditional string classes, in Legacy String Comparison Mode. This character-based string collation has limitations outside the ASCII range; the ICU-library based string collation should be used if the default collation is not sufficient. For more on collation, see String Sorting and Collation.
An empty String literal, that is, a String or Unicode7 created by evaluating '', is canonicalized to a predefined kernel OOP; such literals do not use additional OOPs in the repository, do not require additional space, and do not affect garbage collection.
If you will be appending to a new empty string, you should start with String new; if the empty string will remain empty, using a literal String is more efficient.
The Unicode Consortium is an international standards organization that produces the Unicode Database. Unicode is a commonly used standard which provides unique codes for all Characters in all Character sets, in the range 0 to 0x10FFFF. It also describes the category of each Character and relationship between it and other Characters, and provides a default collation order with the Default Unicode Collation Element Table (DUCET).
For more information on this database, see http://www.unicode.org/Public/UNIDATA/UCD.html
The Unicode Consortium provides code charts by script as well as a single master list of all characters, presented in an ASCII-only, comma-delimited version. The current version of this database can be found at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
GemStone’s Unicode strings and collation in Unicode Comparison Mode use external libraries that support Unicode. Traditional strings in Legacy String Comparison Mode use a historic native GemStone implementation that broadly conforms to Unicode, but has not been updated as the standard has evolved.
A string is a sequence of Characters, implemented as a subclass of CharacterCollection.
Each element in a CharacterCollection is a Character. Since characters may require more than one byte of storage, the class of string may be transparently converted to an instance of the class with the appropriate capacity for that Character. The semantics of the CharacterCollection remain the same; access by index will return the Character at the given index, regardless of how many bytes the Character actually requires.
A fundamental quality of strings is collation. Since the scope of collation includes equality, the collation of strings affects a repository in many ways, such as dictionary lookups. Collation in GemStone has historically been handled using character-based tables. Unicode string-based collation using ICU open source libraries is included in recent releases and provides a much richer set of collation features. To ensure that legacy applications function correctly, GemStone supports both of these encoding/collation schemes.
In Legacy String Comparison Mode, Traditional strings collate using internal character-based collation tables. When the repository is in Unicode Comparison mode, however, Traditional strings use ICU-based Unicode collation.
Traditional strings are implemented in three classes:
String
Strings hold Characters with codepoints in the range 0..255 (8 bits).
DoubleByteString
DoubleByteStrings are required when one or more Characters in a string needs more than one byte of storage. DoubleByteStrings hold Characters with codepoints in the range 0..16rFFFF (64K).
QuadByteString
QuadByteStrings are required when one or more Characters in a string needs more than two bytes of storage. QuadByteStrings hold Characters with codepoints in the range 0..16r10FFFF.
While Traditional strings normally hold human-readable text characters, this is not a requirement. Generally, raw byte data would be held in an instance of ByteArray, but it may be more convenient to use a String. In particular, there are cases when an instance of String will be used to hold raw UTF-8 encoded bytes.
Unicode strings always use ICU string-based collation. As with Traditional strings, there are three classes based on range, but note that the codePoint ranges differ from those of Traditional strings.
Unicode7
A subclass of String, limited to holding Characters with codepoints in the range 0..127 that are represented in 7 bits.
Unicode16
A subclass of DoubleByteString, holding Characters with codepoints in the range 0..16rFFFF (64K), excluding the range 16rD800-16rDFFF. This range is reserved for surrogates that allow encoding into UTF-16.
Unicode32
A subclass of QuadByteString, holding Characters with codepoints in the range 0..16r10FFFF. Again, this excludes the range 16rD800-16rDFFF.
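The following sketch shows one way to obtain a Unicode string from a Traditional one and inspect its class. It assumes that asUnicodeString answers an instance of the smallest Unicode class able to hold the contents, which is not stated above; evaluate it in your own image to confirm.

```smalltalk
| unicodeString |
unicodeString := 'abc' asUnicodeString.
"Assumption: the smallest fitting class is chosen, so this is expected to be Unicode7."
unicodeString class.
"Unicode7 is a subclass of String, so a Unicode7 is also a kind of String."
unicodeString isKindOf: String.
```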
In Legacy String Comparison Mode, Traditional strings and symbols are compared for equality and ordered using character-based comparison, and equality includes non-printing characters as well as printing characters.
Unicode strings use the ICU string-based string collation, in which equality does not consider non-printing characters.
Since Traditional and Unicode string equality rules are different, comparing Traditional strings or symbols with Unicode strings while the repository is in Legacy String Comparison Mode may produce inconsistent results. In this mode it is an error to mix Unicode strings with Traditional strings or symbols, either for ordering comparisons or for equality.
Also note that Unicode comparison of Symbols using = uses identity, while >= and <= compare according to the ICU rules for UnicodeString. Symbol comparison using = may produce inconsistent results for Symbols containing special characters, such as nuls, that are not counted by ICU comparison.
A symbol is similar to a string, but each symbol with a unique set of Characters is guaranteed to have only one canonical instance in GemStone. Symbols are created by a special process, the SymbolGem, to ensure this uniqueness. Creating a new symbol will return an existing symbol, if one exists; a new symbol is only created if it has not been previously defined. Existing symbols cannot be modified.
Like strings, symbols may also contain Characters with values that require more than a byte of storage, and will convert from class Symbol into DoubleByteSymbol or QuadByteSymbol as needed. Since symbols are canonical, the class of a symbol always depends on the contents. While you can create a DoubleByteString with only characters in the range of String, you cannot create a DoubleByteSymbol that does not contain at least one character in the DoubleByte range, and the same restriction applies to QuadByteSymbol.
All symbols may be viewed by all users. Private information should be maintained in strings, not in symbols.
Symbols, DoubleByteSymbols, and QuadByteSymbols are restricted to 1024 or fewer characters.
Symbols that have no references from anywhere in the system may eventually be garbage collected, if the system is configured to do so. See the System Administration Guide for more information on symbol garbage collection.
Symbols, like strings, collate using character-based tables in Legacy String Comparison Mode and using ICU string-based collation in Unicode Comparison Mode. As a result, they cannot be compared to Unicode strings in Legacy String Comparison Mode.
Symbol equality comparisons with = compare identity, while other comparisons (> >= < <= ) use the ICU string-based collation. GemStone symbols that contain the same printing characters but different non-printing characters will return false for = but compare true for >= and <=.
The literal form of a Symbol is specified using a leading #. The body of the symbol may additionally include single quotes. This is optional for symbols that are legal identifiers and keywords, but required for symbols that start with a number, include punctuation/spaces, etc. For example:
#'22 skidoo'
#fooBar
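Because symbols are canonical, the quoted and unquoted literal forms and a symbol created from a string all refer to the same object. A brief sketch, using the standard asSymbol conversion:

```smalltalk
"Evaluate each expression separately; == tests identity."
#fooBar == #'fooBar'.                     "true"
'fooBar' asSymbol == #fooBar.             "true"
'22 skidoo' asSymbol == #'22 skidoo'.     "true"
```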
ByteArray is a specialized collection that is restricted to holding Integers between 0 and 255 (inclusive). While ByteArray is not a kind of String, the contents may be interpreted as a String.
Instances of ByteArray can be created using the literal syntax #[]. For example:
#[ 1 2 3 4 ]
Utf8 is a subclass of ByteArray. It is not a kind of String, but may easily be converted back and forth from a traditional or Unicode string. A Utf8 holds the UTF-8 encoded bytes created by sending encodeAsUTF8 to a string, or by reading encoded data from a GsFile using contentsAsUTF8. Utf8 instances should not be directly created or edited.
'šamas' encodeAsUTF8
anUtf8( 197, 161, 97, 109, 97, 115)
Instances of Utf8 can be read from and written to instances of GsFile, which cannot directly handle Characters with codePoints over 255. FileSystem by default reads and writes in UTF-8 encoding, transparently encoding and decoding to string instances.
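A round trip between a string and its UTF-8 encoding might look like the sketch below. The decoding selector decodeToString is an assumption here, since only encodeAsUTF8 is named above; check the Utf8 protocol in your image for the decoding methods actually available.

```smalltalk
| original encoded decoded |
original := 'šamas'.
"encodeAsUTF8 answers a Utf8 holding the encoded bytes."
encoded := original encodeAsUTF8.
"Assumed selector: decode the bytes back into a Traditional string."
decoded := encoded decodeToString.
decoded = original
```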
Strings created as literals, that is, in text encased in single quotes, are invariant; they cannot be modified after they are created.
In addition to creating strings as literals, you can use the inherited instance creation methods, such as new: and withAll:. For example:
String withAll: #($a $z $u $r $e).
azure
A string responds to the comma operator by returning a new string in which the argument to the comma has been appended to the string’s original contents. For example:
'String ' , 'con' , 'catenation'
String concatenation
Although this technique is handy, it’s not very efficient; each #, message send creates a new instance of String, so this example creates two new Strings in addition to the three literals, returning the final one.
To build a string efficiently, by appending onto the original object, you can use add:, which modifies the original string. Note that you cannot start with a literal string, since a literal string is invariant.
| resultString |
resultString := String new.
resultString add: 'String ';
add: 'con';
add: 'catenation'.
resultString
%
String concatenation
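Another common way to build a string incrementally is to write onto a stream. This sketch uses the standard WriteStream protocol (on:, nextPutAll:, contents) rather than add:; it is an alternative to the technique described above, not a replacement for it.

```smalltalk
| stream |
"Start from a non-literal String, since literal strings are invariant."
stream := WriteStream on: String new.
stream
    nextPutAll: 'String ';
    nextPutAll: 'con';
    nextPutAll: 'catenation'.
stream contents     "'String concatenation'"
```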
To convert between UTF-8 encoded bytes and the various kinds of string classes, there are a number of methods:
CharacterCollection and its subclasses define messages that let you perform various conversions.
Strings can be converted in case:
'abcde' asUppercase
ABCDE
You can remove leading and/or trailing whitespace separators using methods such as trimSeparators. There are a number of variants; see the image for details.
' abcde ' trimSeparators
'abcde'
Strings can be split using the subStrings: method, which allows you to specify one or more characters to use as markers.
For example, to split a string at each / character:
'owa/tagu/siam' subStrings: '/'
anArray( 'owa', 'tagu', 'siam')
Strings can be converted to numbers and other types of objects as well. For example:
'15' asFloat
15.0
Note that not all Strings can be converted to all kinds of other objects; if the String does not contain the representation of a number, for example, it’s meaningless to convert it to an Integer, so this will return an error.
Traditional strings are equal to each other if they contain the exact same Characters in the same case; equality is case-sensitive.
Unicode strings compared using = follow the ICU library comparison rules for equality, which are similar, although any non-whitespace control characters (such as null) are ignored for the comparison.
As mentioned above, Traditional strings and Unicode strings cannot be compared to each other for equality using =, when the repository is in Legacy String Comparison Mode. To compare traditional and Unicode strings in any combination, use compareTo:collator:, specifying nil for the collator to indicate the default collator.
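A minimal sketch of such a mixed comparison follows. The variable names are illustrative, and the form of the answer (how relative order is reported) is not described above, so no result is shown.

```smalltalk
| traditional unicode |
traditional := 'apple'.
unicode := 'apple' asUnicodeString.
"nil selects the default collator, as described above."
traditional compareTo: unicode collator: nil
```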
Strings can be compared for case-insensitive equality using the methods isEquivalent: or equalsNoCase:.
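For instance, the following expressions contrast case-sensitive equality with the two case-insensitive tests; each is evaluated separately.

```smalltalk
'GemStone' = 'gemstone'.                 "false: = is case-sensitive"
'GemStone' isEquivalent: 'gemstone'.     "true"
'GemStone' equalsNoCase: 'gemstone'.     "true"
```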
Literal and nonliteral Strings behave differently in identity comparisons. Each nonliteral String (created, for example, with new, withAll:, or asString) has a unique identity. That is, two Strings that are equal are not necessarily identical.
| nonlitString1 nonlitString2 |
nonlitString1 := String withAll: #($a $b $c).
nonlitString2 := String withAll: #($a $b $c).
(nonlitString1 == nonlitString2)
false
However, literal strings that contain the same character sequences and are compiled at the same time are both equal and identical:
| litString1 litString2 |
litString1 := 'abc'.
litString2 := 'abc'.
(litString1 == litString2)
true
This distinction can become significant in building sets. If you add both litString1 and litString2 to the same IdentitySet, the set will contain only one instance of 'abc'; however, an IdentitySet would include both nonlitString1 and nonlitString2.
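The effect on an IdentitySet can be sketched as follows, assuming the whole example is compiled as a single unit so that the two literals are compiled at the same time.

```smalltalk
| litString1 litString2 nonlitString1 nonlitString2 set |
litString1 := 'abc'.
litString2 := 'abc'.
nonlitString1 := String withAll: #($a $b $c).
nonlitString2 := String withAll: #($a $b $c).
set := IdentitySet new.
set add: litString1; add: litString2; add: nonlitString1; add: nonlitString2.
"Per the description above: one entry for the shared literal plus two nonliterals."
set size     "3"
```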
CharacterCollection and its subclasses define methods that can tell you whether a string contains a particular sequence of characters and, if so, where the sequence begins. This search can be case sensitive, case insensitive, and may include wild cards.
Below are some common methods; see the image for further methods.
Pattern matching arguments (patternArray) consist of an Array containing combinations of Strings and the wildcard characters $* and $?. The character $? matches any single character in the receiver, and $* matches any sequence of characters in the receiver.
This is an example of the use of wildcard characters in pattern matching.
'weimaraner' matchPattern: #('w' $* 'r')
true
Since $* is interpreted as “any sequence of characters”, this returns true.
Similarly, the following example returns the index at which a sequence of characters beginning and ending with $r occurs in the receiver.
'weimaraner' findPattern: #('r' $* 'r') startingAt: 1
6
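The single-character wildcard $? can be illustrated in the same way. In this sketch the first pattern succeeds because $? stands in for exactly one character ($e); the second fails because no $r follows the single character after $w.

```smalltalk
"Evaluate each expression separately."
'weimaraner' matchPattern: #('weimaran' $? 'r').    "true"
'weimaraner' matchPattern: #('w' $? 'r').           "false"
```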
If a wildcard character $* or $? occurs in the receiver or within a string in the argument array, it is interpreted literally.
The following expressions illustrate what happens when the * is within the string and interpreted literally:
'w*r' matchPattern: #('weimaraner')
false
'weimaraner' findPattern: #('w*r') startingAt: 1
0
While strings clearly have a natural sort order (collation), the details of that order are complex. Different languages may sort the same set of strings differently, according to the particular rules in that language. Even within one language, different applications may want to order string data differently. To complicate matters, some languages may treat certain sequences of characters as a unit when sorting strings.
Collation depends on the results of a comparison between two strings, which in turn depends on how the Characters within the string are collated. While this simple view breaks down with some sorting requirements and linguistic rules, basic string comparison is adequate for many uses and is faster than the more complete external collation.
The Comparison Mode of a repository controls the way comparisons are done between instances of Traditional strings. The modes are:
In Legacy String Comparison Mode, Traditional strings and symbols cannot be compared to Unicode strings without using special protocol. Traditional strings and symbols collate using character-based collation.
In Unicode Comparison Mode, Traditional strings and Symbols use ICU string-based collation, and can interoperate easily with Unicode strings.
A new repository can be easily switched to Unicode Comparison Mode. Since the collation rules may be subtly different, and affect system operations such as looking up class names in SymbolDictionaries, changing the mode for existing applications should be done with great care and thorough testing. To be safe, all indexes and sorted collections should be rebuilt, and all hashed collections re-hashed. The mode of a repository must be managed as part of System Administration, not by individual developers on a shared repository.
The Comparison Mode is controlled by the Global #StringConfiguration. By default, StringConfiguration is set to String, and the repository is therefore in Legacy String Comparison Mode.
To enable Unicode Comparison Mode, as SystemUser, execute:
StringConfiguration enableUnicodeComparisonMode
This returns the previous setting for Unicode Comparison Mode. Note that this commits, but the current session is not affected; the new mode will take effect for all subsequent logins.
To enable Legacy String Comparison Mode, as SystemUser, execute:
StringConfiguration disableUnicodeComparisonMode
Again, note that this operation commits, but the change does not affect the current session; the new mode will take effect for all subsequent logins.
To verify the mode in this repository, execute:
StringConfiguration isInUnicodeComparisonMode
When you create or update a kind of Traditional or Unicode string with a Character that requires more bits than the specific class of string can hold, it is transparently auto-converted to the appropriate class.
For example, if you add the Euro character (codePoint 8364) to an instance of String, which can only hold codePoints up to 255, it will auto-convert to an instance of DoubleByteString. Likewise, if you add a Yen symbol (codePoint 165) to an instance of Unicode7, which can only hold codePoints up to 127, it is auto-converted to an instance of Unicode16.
When the repository is in Unicode Comparison Mode, an instance of String that would otherwise auto-convert to an instance of DoubleByteString is converted to an instance of Unicode16, for improved comparison and collation since comparisons use the ICU libraries.
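The auto-conversion can be observed with a sketch like the following. It assumes that add: accepts a single Character as well as a string; if it does not in your version, append a one-character string instead.

```smalltalk
| str |
str := String new.
str add: 'price: '.
str class.      "String"
"Assumption: add: accepts a Character. 8364 is the Euro sign, beyond the range of String."
str add: (Character codePoint: 8364).
"DoubleByteString in Legacy String Comparison Mode,
 Unicode16 in Unicode Comparison Mode, per the description above."
str class
```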
Traditional strings (String, DoubleByteString, and QuadByteString) and symbols (Symbol, DoubleByteSymbol, and QuadByteSymbol) are collated, in Legacy String Comparison Mode, by individual character. Characters with values up to 255 are compared according to the Default Unicode Collation Element Table (DUCET), and Characters with values of 256 and above are sorted by codePoint, the Unicode numeric value.
Legacy applications may have installed non-default internal character tables, which modified the character-based collation. This is no longer recommended; if the default character-based collation is not sufficient for your application, you should integrate the ICU string-based collation.
Enabling Unicode Comparison Mode (see Comparison Mode) causes Traditional strings and symbols to collate following the same rules as Unicode strings. This section only applies when in Legacy String Comparison Mode, not in Unicode Comparison Mode.
String ordering using <= (as well as <, >, and >=) is not case-sensitive. When instances of String, DoubleByteString, and QuadByteString are compared using <= or related operations, the comparison is first done without regard to case. If the strings differ only in case, they are then collated according to the Character Data Table, which specifies that uppercase comes before lowercase.
#( 'MM' 'c' 'Mm' 'mb' 'mM' 'x' 'mm' )
sortAscending
anArray( 'c', 'mb', 'MM', 'Mm', 'mM', 'mm', 'x')
Since ordering is by character, with only case being excluded, the default ordering is sensitive to accents and other diacritical marks on characters. Characters with diacritical marks are not related to the base character.
For example, all words beginning with 'Co' and 'co' would sort before all words beginning with 'Có' and 'có':
#('Cór' 'COz' 'Coa' 'cóa')
sortAscending
anArray( 'Coa', 'COz', 'cóa', 'Cór')
Unicode strings, and all strings when in Unicode Comparison Mode, use the ICU (International Components for Unicode) libraries to provide string-based collation. The ICU libraries are a widely-used, open-source implementation of language-specific sorting and collation.
For a complete explanation of the features and subtleties of language-specific collation, you should refer to documentation on the ICU website, http://icu-project.org/.
The classes IcuLocale and IcuCollator provide an interface to the ICU libraries. Unicode strings (instances of Unicode7, Unicode16, and Unicode32) and instances of Utf8 use IcuCollator and IcuLocale to perform sorting operations using the ICU libraries. The collation is performed by considering the entire string, not on a character-by-character basis, and requires a specific language and locale to determine the rules for the comparison.
In addition to specific language rules, ICU sorting is highly configurable for other application-specific sorting requirements.
While collation will vary according to specific language and locale, in general ICU collation orders characters with diacritical marks with the base character, and sorts lowercase before uppercase.
For example, using the sorting examples in the previous section and the default collator for the US, a different sort ordering is produced from that of legacy collation:
#( 'MM' 'c' 'Mm' 'mb' 'mM' 'x' 'mm' )
sortAscending
anArray( 'c', 'mb', 'mm', 'mM', 'Mm', 'MM', 'x')
#('Cór' 'COz' 'Coa' 'cóa')
sortAscending
anArray( 'Coa', 'cóa', 'Cór', 'COz')
This is the default US collation; by configuring the IcuCollator, however, many other orderings may be produced.
Instances of IcuLocale represent a specific language, country, and language variant. The available IcuLocales are in the shared library and can be listed using IcuLocale class >> availableLocales.
A default instance of IcuLocale is instantiated on first reference, and stored in session state. The default IcuLocale is based on the operating system locale setting for the gem. The default IcuLocale affects collation, so some care should be taken in configuring the operating system locale for the gem processes. In applications with distributed locales, it may be safer to set a default IcuLocale on login, using UserProfile >> loginHook: (see the System Administration Guide).
To set a specific default IcuLocale, use the method IcuLocale class >> default:. This sets the default locale for the session executing this code. While the instance of IcuLocale can be made persistent, the default IcuLocale does not persist from session to session.
To determine which IcuLocale is currently in use, use the method IcuLocale class >> default.
IcuLocale default
IcuLocale en_US
An IcuCollator encapsulates the rules involved in collation for a specific IcuLocale. A default instance of IcuCollator is instantiated on first reference, based on the default IcuLocale, and stored in session state.
When comparing instances of Unicode string classes, the comparison always uses an IcuCollator, using the method compareTo:collator:. If an IcuCollator is not specified, such as when Unicode strings are compared using >, IcuCollator default is used, which in turn uses IcuLocale default.
You can also create an instance of IcuCollator for a specific locale, if you need to use specific collation rules other than the default. You can do this using IcuCollator class methods forLocale: anIcuLocale or forLocaleNamed: aString. For example, to create an IcuCollator for the German language as used in Germany:
IcuCollator forLocaleNamed: 'de_DE'
The actual string comparison is done by the ICU libraries, and follows the ICU comparison rules for that locale. Collation rules are similar in most western languages, but there are differences in specific languages.
For example, in the Hungarian language, ’cs’ is considered a single letter, so words that start with ’cs’ are sorted together and follow other words beginning with ’c’. The following example sets up a collection that is sorted according to Hungarian rules:
| hungarianWords collator |
collator := IcuCollator forLocaleNamed: 'hu_HU'.
hungarianWords := IcuSortedCollection newUsingCollator: collator.
hungarianWords
add: 'csak' asUnicodeString;
add: 'cukor' asUnicodeString;
add: 'comb' asUnicodeString.
hungarianWords
a IcuSortedCollection
sortBlock a ExecBlock2
collator a IcuCollator
#1 comb
#2 cukor
#3 csak
IcuCollator includes a number of attributes that can be used to customize the sort. These attributes work within the specific language rules of the associated IcuLocale.
Keep in mind that while the default values and the descriptions listed in Table 5.2 apply to most locales, particularly with non-Western scripts, the defaults may be different in different locales, and the attribute may have different behaviors.
See the ICU site, particularly the pages under http://userguide.icu-project.org/collation, for more precise descriptions and more detailed documentation.
Strength sets the level of detail that a sort considers, such as whether accents and case are taken into account. The default strength is TERTIARY for most locales (the main exception being Japanese). The following are the sort strengths:
As an example, when two strings are compared using TERTIARY strength, characters in the strings are compared first by the base character, ignoring any case or diacritical marks. If the base characters are the same, they are compared by diacritical mark, ignoring case. If both base characters and diacritical marks are the same, then case is considered. Note that unlike GemStone’s Strings or ASCII ordering, the default sort places lowercase before uppercase.
Keep in mind that with lower sort strengths, when a factor such as case is not used, the relative position in the results of similar strings is not deterministic; the strings compare as the same, and so their position will depend on the order of the input.
By using the IcuCollator sort attributes, you have a great deal of control over your specific sorting.
For example, using the alternate handling attribute, you can sort strings that include spaces, dashes, and other punctuation without considering the punctuation characters when doing the comparison:
| blues collator|
collator := IcuCollator forLocale: IcuLocale default.
collator alternateHandling: true.
blues := IcuSortedCollection newUsingCollator: collator.
blues add: (Unicode7 withAll: 'blue berry').
blues add: (Unicode7 withAll: 'blue moon').
blues add: (Unicode7 withAll: 'bluebird').
blues add: (Unicode7 withAll: 'blue bird').
blues add: (Unicode7 withAll: 'blue-bird').
blues add: (Unicode7 withAll: 'bluetooth').
blues
%
a IcuSortedCollection
sortBlock a ExecBlock2
collator a IcuCollator
#1 blue berry
#2 bluebird
#3 blue bird
#4 blue-bird
#5 blue moon
#6 bluetooth
An IcuSortedCollection is a specialized subclass of SortedCollection for which you do not set the sortBlock. An IcuSortedCollection may only hold instances of subclasses of CharacterCollection. It is associated with an IcuCollator, which in turn is associated with an IcuLocale, and the sorting behavior is specific to the configuration of these instances. IcuSortedCollections rely on the open-source ICU libraries to perform the comparisons and produce correctly collated results.
Using IcuSortedCollection is recommended if you will have sorted collections containing Unicode strings. This avoids lookup failures that can occur when the collator used to look up an element differs from the one used to sort the elements in the collection.
The Unicode Consortium periodically releases new versions of the Unicode Standard, with (usually minor) changes in collation and the addition of new characters. The ICU organization then periodically releases new versions of their libraries reflecting these changes in the standard. Major GemStone releases include the latest version of the ICU libraries.
The indexing structures depend on collation encodings from ICU that may change between versions, even if the collation changes would not otherwise affect the application. So even in cases where the Unicode differences are minor, the ICU library version loaded in an application must match the ICU version used to build indexes.
To accommodate the (generally) low value of upgrading to a new ICU library, and the potentially high cost of rebuilding structures in your application that depend on collation, GemStone preserves the existing ICU library version across upgrades.
The version of the ICU library that is used in a repository is stored under (Globals at: #IcuLibraryVersion). This is a string, which must correspond to one of the versions of the ICU libraries in the product distribution. When a session logs in, it will select the ICU shared libraries to load based on the IcuLibraryVersion value.
As with StringConfiguration, IcuLibraryVersion is a global, repository-wide setting that can be only changed by SystemUser, to avoid the risk of lookup failures and incorrect query results. It should be managed as part of System Administration, not by individual developers on a shared repository.
To update the version of ICU libraries in your repository, you will need to follow this procedure:
1. Ensure no other users are on the system.
2. Login as SystemUser and execute:
Globals at: #IcuLibraryVersion put: newVersionString
3. Shut down and restart the Stone.
4. Login as DataCurator, or a user with the appropriate object access rights. If you are using a linked session, you may need to restart the application to allow the new version of the ICU shared library to be loaded.
5. Update any persistent data structures that may be affected. This involves dropping and rebuilding indexes that involve Unicode strings, resorting SortedCollections, and resorting any application data structures that depend on Unicode string collation.
6. When this is complete and all changes have been committed, other users may be allowed to login.
There are times when you may wish to encrypt strings in your repository or for transmittal to other systems. GemStone provides an interface to Advanced Encryption Standard (AES) encryption/decryption, provided by the OpenSSL open source libraries included with GemStone.
The AES specification is available at: http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf.
All encryptions/decryptions are in cipher block chaining (CBC) mode; see the AES specification document for further details.
Encryption and decryption API methods are provided for 128-bit/16-byte keys, 192-bit/24-byte keys, and 256-bit/32-byte keys, using the following methods.
Encryption can be done on instances of ByteArray or Utf8, or subclasses of CharacterCollection. For encryption, you must provide a key that is a ByteArray of the appropriate size (16, 24, or 32 bytes) containing key bytes, and a salt that is a 16-byte ByteArray containing salt values.
The following methods encrypt or decrypt using the specified key and salt, and return the encrypted or decrypted result:
aesEncryptWith128BitKey: aKey salt: aSalt
aesDecryptWith128BitKey: aKey salt: aSalt
aesEncryptWith192BitKey: aKey salt: aSalt
aesDecryptWith192BitKey: aKey salt: aSalt
aesEncryptWith256BitKey: aKey salt: aSalt
aesDecryptWith256BitKey: aKey salt: aSalt
These methods place the encrypted or decrypted result into aByteObjOrNil, starting at offset 1, and resizing if necessary. If aByteObjOrNil is nil, a new instance of the same class as the receiver will be created containing the results.
aesEncryptWith128BitKey: aKey salt: aSalt into: aByteObjOrNil
aesDecryptWith128BitKey: aKey salt: aSalt into: aByteObjOrNil
aesEncryptWith192BitKey: aKey salt: aSalt into: aByteObjOrNil
aesDecryptWith192BitKey: aKey salt: aSalt into: aByteObjOrNil
aesEncryptWith256BitKey: aKey salt: aSalt into: aByteObjOrNil
aesDecryptWith256BitKey: aKey salt: aSalt into: aByteObjOrNil
You may use ByteArray withRandomBytes: N to produce pseudo-random key and salt values for encryption. For example: