Although you designed your schema with care and thought, after using it for a while you will probably find a few things you would like to improve. Furthermore, even if your design was perfect, real-world changes usually require changes to the schema sooner or later.
This chapter discusses the mechanisms GemStone Smalltalk provides to allow you to make changes in your schema and manage the migration of existing objects to the new schema.
Versions of Classes
defines the concept of a class version and describes two different approaches you can take to specify one class as a version of another.
ClassHistory
describes the GemStone Smalltalk class that encapsulates the notion of class versioning.
Migrating Objects
explains how to migrate either certain instances, or all of them, from one version of a class to another while retaining the data that these instances hold, including transforming the data as needed.
Multi-threaded instance migration
describes how to perform simple migrations more quickly by creating mappings and performing the migration in a single operation.
In order to create instances of a class, the class must be invariant, and invariant classes cannot be modified, except in some specific ways. While you defined your schema to be as complete as you could at the time you created the classes, inevitably further changes are needed. You may now have instances of invariant classes populating your database and a need to modify your schema by redefining certain of these classes.
To support this schema modification, GemStone allows you to define different versions of classes. Every class in GemStone has a class history—an object that maintains a list of all versions of the class—and every class is listed in at least one class history, the class history for the class itself. You can define as many different versions of a class as required, and declare that the different versions belong to the same class history. You can migrate some or all instances of one version of a class to another version when you need to. The values of the instance variables of the migrating instances are retained if you have defined the new version to do so.
In GemStone Smalltalk, classes have versions. Each version is a unique and independent class object, but the versions are related to each other through a common class history. The classes need not share a similar structure, nor even a similar implementation. The classes need not even share a name, although it is less confusing if they do, or if you establish and adhere to some naming convention.
If you define a new class in a SymbolDictionary that already contains an existing class with the same name, it automatically becomes a new version of the previously existing class. This is the most common way of creating new class versions. Instances that predate the creation of the new version remain unchanged, and continue to access the old class’s methods, although tools such as GemBuilder may provide options to automatically migrate instances to the new class. Instances created after the redefinition have the new class’s structure and access to the new class’s methods.
When you define a class, the class creation protocol includes an option to specify the existing class of which the new class is a version. See the keyword newVersionOf:.
When you create a new version of a class—for example, Animal—subclasses of the old version of Animal still point to the old version of Animal as their superclass (unless you are using a tool which provides the option to automatically version and recompile subclasses). If you wish these classes to become subclasses of the new version, you need to recompile the subclass definitions to make new versions of the subclasses, specifying the new version of Animal as their superclass.
One way to do this is to file in the subclasses of Animal after making the new version of Animal (assuming the new version of the superclass has the same name).
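As a minimal sketch of what recompiling a subclass definition can look like, the following re-executes the definition of a subclass after the new version of Animal exists. The subclass Dog and its instance variable breed are hypothetical names used only for illustration; the class creation message is the same one used elsewhere in this chapter.

```smalltalk
"Dog and 'breed' are hypothetical; they are not part of the chapter's examples.
 This assumes Dog was originally defined in UserGlobals as a subclass of the
 old version of Animal, and that the name Animal now resolves to the new version."
Animal subclass: 'Dog'
    instVarNames: #('breed')
    classVars: #()
    classInstVars: #()
    poolDictionaries: {}
    inDictionary: UserGlobals.
```

Because a class named Dog already exists in UserGlobals, this definition should create a new version of Dog whose superclass is the new version of Animal. Existing instances of the old Dog remain unchanged until they are migrated.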
When you create a new version of a class (such as Animal), you typically want your existing code to use the new version rather than the old version. That is, without being recompiled, existing methods containing code like the following should create an instance of the new version rather than of the old version of Animal:
pet := Animal new.
As long as the new class version replaces an existing class in the same SymbolDictionary, then references from existing methods will be automatically updated to the new class version.
This works because a compiled method does not directly reference a global (e.g., the class Animal), but references a SymbolAssociation in a SymbolDictionary. When you originally compile the method, it resolves the name using an expression similar to the following:
System myUserProfile resolveSymbol: #theClassName
The compiled method includes the resulting SymbolAssociation, whose key is the name of the global and whose value is the class (or other object). The value can be updated at any time, for example when you create a new version of a class.
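The lookup described above can be tried directly. This is a small sketch that assumes a class named Animal is visible through your symbol list; it relies only on resolveSymbol: as shown above and on the standard key and value messages of an association.

```smalltalk
| assoc |
"Look up the association that the compiler would use for the name Animal."
assoc := System myUserProfile resolveSymbol: #Animal.
"The key is the name of the global."
assoc key.
"The value is the class currently bound to that name."
assoc value
```

After you create a new version of Animal in the same SymbolDictionary, sending value to the same association should answer the new version, which is why methods compiled earlier pick up the new class without being recompiled.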
Referencing the global indirectly through the SymbolAssociation carries a tiny performance penalty, but this indirection is what allows global variables to vary. If you have a global that you know will be constant, then you can reference the value directly from a compiled method by making the SymbolAssociation invariant before compiling the method.
While the SymbolAssociation is updated with the new value by versioning the class within the same SymbolDictionary, keep in mind that under some circumstances you may have a SymbolAssociation that does not reference the latest version, or the version you expect. If you have a newer class with the same name in a different SymbolDictionary, or if you delete and recreate the class, the SymbolAssociation will continue to point to the older class.
Adding a class variable does not require a new version of your class, but adding a class instance variable does.
When you create a new version of a class, the values in any class variables or class instance variables in the old class are referenced by the new class as well. By default, all versions of a class refer to the same objects referenced from class variables or class instance variables.
When you define a class with the same name and variables, but with a different set of options (passed in via the options: keyword to the class creation method), it does not always need to create a new version of the class.
See here for a description of the class options and how they can be used for specific class behavior.
Class options are automatically inherited by a new version or by an updated version, unless the first element in the options: array is #noInheritOptions.
If the new definition includes #instancesNonPersistent, then the existing class will be modified to add #instancesNonPersistent.
If the new definition includes #subclassesDisallowed, #disallowGciStore, #traverseByCallback, #dbTransient, or #instancesInvariant, and if #noInheritOptions is not the first element, the options will not be removed.
To remove an unwanted option, you may need to include #noInheritOptions as the first element of the options: array.
In GemStone Smalltalk, every class has a class history, represented by the system as an instance of the class ClassHistory. A class history is an array of classes that are meant to be different versions of each other. While they often have the same class name, this is not a requirement; you can rename classes as well as change their structure.
When you define a new class in the same symbol dictionary as an existing class with the same name, it is by default created as the latest version of the existing class and shares its class history.
When you define a new class by a name that is new to a symbol dictionary, the class is by default created with a unique class history. If you use a class creation message that includes the keyword newVersionOf:, you can specify an existing class whose history you wish the new class to share. This is useful if you want to create a version of a class with a different name or in a different symbol dictionary. If the new class version has the same name and is in the same symbol dictionary, it is not necessary to use newVersionOf:, since the new class will become a version of the existing class automatically.
For example, suppose your existing class Animal was defined like this:
Object subclass: 'Animal'
    instVarNames: #('habitat' 'name' 'favoriteFood' 'predator')
    classVars: #()
    classInstVars: #()
    poolDictionaries: {}
    inDictionary: UserGlobals.
Animal compileMissingAccessingMethods.
Example 11.2 creates a class named NewAnimal and specifies that the class shares the class history used by the existing class Animal.
Object subclass: 'NewAnimal'
    instVarNames: #('name' 'diet' 'predator' 'species')
    classVars: #()
    classInstVars: #()
    poolDictionaries: {}
    inDictionary: UserGlobals
    newVersionOf: Animal
    description: nil
    options: #().
NewAnimal compileMissingAccessingMethods.
If you wish to define a new class Animal with its own unique class history—in other words, the new class Animal is not a version of the old class Animal—you can add it to a different symbol dictionary, and specify the argument nil to the keyword newVersionOf:. However, this can easily create confusion, and make it difficult to diagnose problems. It is only recommended when the two symbol dictionaries will not normally both be loaded and in use at the same time.
If you try to define a new class with the same name as an existing class that you did not create, you will most likely get an error, because you are trying to modify the class history of that class — an object which you are probably not permitted to modify.
You can access the class history of a given class by sending the message classHistory to the class. For example, the following expression returns the class history of the class Animal:
Animal classHistory
This is a collection that includes all versions of the class, in order, with the most recent version being the last one in the collection. For example:
Animal classHistory last == NewAnimal
true
Animal classHistory first == Animal
true
In the usual case, in which all classes in the class history have the same name, the compiler will resolve that name as the last one in the classHistory.
In some tools (such as GemBuilder for Smalltalk), a class whose class history contains more than one version is displayed with its position in the history, for example:
Animal [1]
NewAnimal [2]
You can assign a class to a class history by sending the message addNewVersion: to the class whose class history you wish to amend; the argument to this message is the class whose history is to be reassigned.
For example, suppose that we created NewAnimal using the regular class creation protocol, and did not use the method with the keyword newVersionOf:. To later specify that it is a new version of Animal, execute the following expression:
Animal addNewVersion: NewAnimal
Once you have defined a new version of your class, you may want to migrate your existing instances from the old class version to the new version. Migration in GemStone Smalltalk is a flexible, configurable operation.
You can establish the default destination class for migration, for use in later instance migration operations. To do so, send a message of the form:
OldClass migrateTo: NewClass
This configures the old class so it knows to migrate its instances to become instances of the new class. Migration does not occur as a result of sending the above message; this only sets the destination of migration.
It is not necessary to set a migration destination ahead of time; other protocol allows you to specify the migration class for a specific instance migration. If you use methods that include a specific migration destination class, the default destination is ignored.
Once you have set the migration destination, you can migrate a single specific instance, using a message of the form:
anInstanceOfOldClass migrate
Provided the object is an instance of a class for which a migration destination has been defined, the object becomes an instance of the new class. If no destination has been defined, no change occurs.
The following series of expressions, for example, creates a new instance of Animal, sets Animal’s migration destination to be NewAnimal, and then causes the new instance of Animal to become an instance of NewAnimal.
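A minimal sketch of that sequence, assuming the classes Animal and NewAnimal defined earlier in this chapter and using only the messages introduced in this section:

```smalltalk
| aLemming |
"Create an instance of the old class."
aLemming := Animal new.
"Set the default migration destination; no migration happens yet."
Animal migrateTo: NewAnimal.
"Migrate this one instance to the destination class."
aLemming migrate.
"Answer the class of the object, to confirm what it has become."
aLemming class
```

The last expression is included only so that you can check the class of the object after the migration.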
You can bypass the migration destination, or migrate instances of classes for which no migration destination has been specified. To do so, specify the destination directly in the message that performs the migration.
OldClass migrateInstances: setOfInstancesOfOldClass to: NewClass
Migrate the specific instances in setOfInstancesOfOldClass, which are instances of OldClass, to become instances of NewClass. Ignores any existing migration destination, and does not set the default migration destination.
OldClass migrateInstancesTo: NewClass
Migrate all instances of OldClass to become instances of NewClass. Ignores any existing migration destination, and does not set the default migration destination.
Example 11.4 uses migrateInstances:to: to migrate all instances of all versions of a class, except the latest version, to the latest version.
For more on listing instances, see Finding Instances.
| animalHist allAnimals |
animalHist := Animal classHistory.
allAnimals := SystemRepository listInstances: animalHist.
1 to: animalHist size - 1 do: [:index |
    (animalHist at: index)
        migrateInstances: (allAnimals at: index)
        to: Animal currentVersion].
When you migrate a set of instances, it is possible that not all instances in the set can be successfully migrated.
The migration methods migrateInstancesTo: and migrateInstances:to: return an array of collections:
1. The set of objects for which the current user does not have read permission.
2. The set of objects for which the current user has read permission but not write permission.
3. The set of objects that could not be migrated due to index incompatibilities.
4. The set of objects whose class was not identical to the receiver (presumably, incorrectly gathered instances) and therefore were not migrated; this will always be empty if using migrateInstancesTo:.
5. The set of objects that failed migration by signalling a MigrationError. This error may be signalled, for example, by customized migration code that encounters an object that cannot be migrated.
If all these collections are empty, all requested migrations have occurred.
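One way to act on this result is sketched below. It assumes the classes Animal and NewAnimal from the earlier examples, and it treats the result as an ordinary array of collections, as described above; the temporary variable names are illustrative.

```smalltalk
| result problems |
result := Animal migrateInstancesTo: NewAnimal.
"Keep only the result collections that are not empty; each of these
 holds objects that were not migrated for one of the reasons listed above."
problems := result reject: [:each | each isEmpty].
"Answers true when every requested migration occurred."
problems isEmpty
```

If the final expression answers false, the position of each non-empty collection within result indicates which of the five reasons applies.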
If an instance participates in an index (for example, because it is part of the path on which that index was created), then the indexing structure can, under certain circumstances, cause migration to fail. Three scenarios are possible:
You can commit your transaction, if you have done other meaningful work since you last committed, and then follow these steps:
1. Remove the index in which the instance participates.
2. Migrate the instance.
3. Modify the indexing code as appropriate for the new class version and re-create the index.
The purpose of migration is to retain the data contained in the old object, while updating the object to a possibly entirely new structure.
In most cases, it makes sense for the object contents at particular instance variable names to have the same values as before the migration, and this is the default behavior.
There are cases in which you might want to change the use of a particular instance variable name, and relocate the data to a new instance variable slot. This is described under Customized Instance Variable Mappings.
The default migration behavior:
Suppose, for example, you create an instance of class Animal and initialize its instance variables as shown in Example 11.5.
| aLemming |
aLemming := Animal new
    name: 'Leopold';
    favoriteFood: 'grass';
    habitat: 'tundra';
    predator: 'owl';
    yourself.
UserGlobals at: #aLemming put: aLemming.
You then decide that class Animal needs a different set of instance variables. You create the class called NewAnimal (as described in Example 11.2), with four instance variables: name, diet, predator, and species. You then migrate aLemming.
Example 11.6 performs the migration, and then shows the results of printing the values of the instance variables.
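A sketch of such a migration follows. It assumes the definitions of Animal and NewAnimal from Example 11.2, the object stored in UserGlobals by Example 11.5, and that NewAnimal does not yet override migrateFrom:instVarMap:.

```smalltalk
| aLemming |
aLemming := UserGlobals at: #aLemming.
"Set the destination, then migrate the single instance."
Animal migrationDestination: NewAnimal.
aLemming migrate.
"Collect the values of the instance variables of the migrated object."
{ aLemming name . aLemming predator . aLemming diet . aLemming species }
```

Under the default behavior, name and predator would be expected to keep their values, because both classes define instance variables with those names. The variables diet and species have no counterpart in Animal, so they would be expected to be nil.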
To initialize an instance variable with the value of a variable that has a different name, you must provide an explicit mapping from the instance variable names of the older class to the instance variable names of the migration destination.
This can be done by overriding the implementation of the method migrateFrom:instVarMap: in the destination class.
When you migrate an object, migrateFrom:instVarMap: is sent to the new instance, with the old instance as the first argument. The instVarMap: argument is a mapping structure that can be further customized, but is usually left at the default.
In our example, the class Animal has the instance variables: habitat, name, favoriteFood, and predator, and NewAnimal has variables: name, diet, predator, and species.
When instances of Animal migrate to NewAnimal, the value of diet ought to be initialized with the value presently held in favoriteFood.
Also, we will keep habitat, by making it a dynamic instance variable on the new instance.
To accomplish this, NewAnimal implements migrateFrom:instVarMap:, as in Example 11.7.
method: NewAnimal
migrateFrom: anOldInstance instVarMap: aMap
super migrateFrom: anOldInstance instVarMap: aMap.
self diet: anOldInstance favoriteFood.
self dynamicInstVarAt: #habitat put: anOldInstance habitat.
%
Animal migrationDestination: NewAnimal.
aLemming migrate
%
a NewAnimal
name Leopold
diet grass
predator owl
species nil
t1 habitat
t2 tundra
Another kind of customization is required when the format of data changes. For example, suppose that you have a class named OldPoint, which defines two instance variables x and y. These instance variables define the position of the point in Cartesian two-dimensional coordinate space.
Suppose that you define a class named Point to use polar coordinates. The class has two instance variables named radius and angle. The default mapping strategy is not helpful here; migrating an instance of OldPoint to become an instance of Point loses its data (the actual position) completely. Nor is it correct to map x to radius and y to angle. Instead, what is needed is a method that applies the appropriate trigonometric functions to transform the point to its position in polar coordinate space.
In this case, the method to override is again migrateFrom:instVarMap:, which you implement as an instance method of the class Point. Then, when you request an instance of OldPoint to migrate to an instance of Point, the migration code that calls migrateFrom:instVarMap: executes the method in Point instead of the one in Object.
Object subclass: #OldPoint
    instVarNames: #(x y)
    classVars: #()
    classInstVars: #()
    poolDictionaries: {}
    inDictionary: UserGlobals.
OldPoint compileMissingAccessingMethods.
Object subclass: #Point
    instVarNames: #(radius angle)
    classVars: #()
    classInstVars: #()
    poolDictionaries: {}
    inDictionary: UserGlobals.
Point compileMissingAccessingMethods.
method: Point
migrateFrom: oldPoint instVarMap: aMap
    "Convert the Cartesian x and y of oldPoint to polar radius and angle.
     Note that (y/x) arcTan assumes x > 0: it raises an error when x = 0
     and does not distinguish between quadrants."
    | x y |
    super migrateFrom: oldPoint instVarMap: aMap.
    x := oldPoint x.
    y := oldPoint y.
    radius := ((x*x) + (y*y)) asFloat sqrt.
    angle := (y/x) asFloat arcTan.
    ^self
%
OldPoint migrationDestination: Point.
(OldPoint new x: 123; y: 456) migrate.
%
a Point
radius 472.2975756871932
angle 1.307329785759979
a Point
radius 472.2975756871932
angle 1.307329785759979
Preparing the set of objects that needs to be migrated can be done in a number of ways. You may, for example, have application collections of the instances that need to be migrated.
Alternatively, there are several methods available to allow you to find instances of one or more classes.
Finding instances requires scanning the entire repository, which can take significant time for very large repositories. Likewise, in a large repository there may be many instances in the result set, potentially more than can fit into memory. The choice of methods to use to locate objects for migration depends on the size of the repository and the number of instances.
The following tables list the methods and some considerations for use. Other methods are available; see the image for details.
Your session is configured to have a certain amount of object memory, defined by the configuration parameter GEM_TEMPOBJ_CACHE_SIZE. The collection containing the instances you are going to migrate must fit into memory, as well as the instances themselves. If the number of instances you are going to migrate is large, you will likely not have enough memory to hold everything.
For a large migration, you will want to:
For large result sets, it may be helpful to use Repository >> allInstances:, which returns one or more instances of GsBitmap. A GsBitmap uses heap memory, not object memory, and the result can be arbitrarily large.
Once you have the GsBitmap or GsBitmaps, you may enumerate them using do:, or retrieve objects from them using methods such as removeCount:, and perform the migration. For more on how to use GsBitmaps, see the section GsBitmap.
While a GsBitmap cannot be committed, it can be saved to a file and reloaded later. However, note that if any of the objects becomes dereferenced after this file is written, and is then garbage collected, this cannot be detected. When you read in the GsBitmap file, the oopNumber may reference an entirely different object. It is important to read and process the file as soon as possible after it is created.
Repository-wide scans such as the ones described above use a multi-threaded scan that can be tuned to use more or less resources of the system, thereby impacting performance of anything else running on this system to a greater or lesser degree.
The regular methods use a conservative amount of system resources, while the "fast" variants allow the scan to complete faster and use a greater percentage of system resources. If you are performing a scan in an offline or single-user system, the fast variants may be more appropriate. It is also possible to tune the scan to use fewer resources.
For details on tuning the multi-threaded scan, see the System Administration Guide.
You will, of course, always commit or abort immediately before starting the migration and commit at the end, to ensure all your migrations are committed successfully.
For larger migrations, you will likely need to perform periodic commits to avoid running out of memory, since the migrated objects must be kept in memory until they are committed.
Example 11.10 brings instances into memory and commits in chunks of 10000 objects. The appropriate chunk size will depend on your memory and result set size; much larger chunks may be more appropriate.
| searchResults |
searchResults := SystemRepository fastAllInstances: { Animal }.
searchResults do: [:entry | | bm |
    bm := entry last.
    [bm isEmpty] whileFalse: [ | chunk |
        chunk := bm removeCount: 10000.
        entry first migrateInstances: chunk to: NewAnimal.
        System commitTransaction
            ifFalse: [self error: 'commit failed']]]
For the most efficient migration of large sets of objects of multiple classes, you should perform the migration in page order—the same order as the objects are stored on disk. This allows multiple objects of several different classes on the same page in the repository to be migrated at the same time.
If the repository is in active use, the objects will move from page to page (reclaim moves the live objects off of a page, in order to reclaim the page). If there has been movement such as this between the time the file was written and read, then the page-order efficiency will be lost.
Getting the collection of objects to migrate in page order uses GsBitmap file protocol:
GsBitmap >> writeToFileInPageOrder: aFileName
GsBitmap >> readFromFile: aFileName withLimit: int startingAt: startIndex
For example, to migrate all instances of Animal in page order, in chunks of 2000:
| searchResults bm startIndex limit |
searchResults := (SystemRepository allInstances: { Animal }) first.
bm := searchResults last.
bm writeToFileInPageOrder: 'animal_instances.bm'.
limit := bm size.
startIndex := 1.
Animal migrationDestination: NewAnimal.
[startIndex <= limit] whileTrue: [
    bm := GsBitmap new.
    bm readFromFile: 'animal_instances.bm' withLimit: 2000 startingAt: startIndex.
    bm do: [:ea | ea migrate].
    startIndex := startIndex + 2000.
    System commitTransaction
        ifFalse: [self error: 'commit failed']].
For simple migrations, the migration can be considerably faster using multi-threaded instance migration. This allows you to migrate all instances of up to 2000 classes in a single operation.
Only simple migration operations on non-collection classes are supported.
The operation commits if successful, so this should be done when no other users would be committing, to avoid concurrency conflicts.
Multi-threaded migration requires the MigrateObjects privilege.
Multi-threaded migration is configured using an instance of InstVarMappingArray, which maps an old class instance to the migration destination new class instance, and specifies the changes to be made during migration.
To ensure that your migration produces the results you expect and there are no unforeseen issues, it is recommended that prior to the actual migration, you test the migration with a more limited number of test objects. Repository >> testMigrateMt:with: allows you to perform the same migration as Repository >> migrateMt:, on a limited set of objects, and does not commit; after running this method commits are disallowed, so you must abort after running this test.
The input to multi-threaded migration is an array of instances of InstVarMappingArray. You create an individual instance of InstVarMappingArray for each class that you want to migrate. This instance specifies the migration of instance variables defined on that class and of inherited instance variables in the same way.
Create the instance using mappingFrom:to::
InstVarMappingArray class >> mappingFrom: originalClass to: newClass
By default, instance variable values and dynamic instance variable values are preserved in the new instance. You may configure some specific changes using the following methods:
InstVarMappingArray >> mapInstVarNamed: originalClassInstVarName to: newClassInstVarName
InstVarMappingArray >> mapInstVarToNil: originalClassInstVarName
InstVarMappingArray >> preserveDynamic: aBoolean
For example, to create a mapping from OrigClass to NewClass that moves the value of the instance variable oldName to the location of the instance variable newName in the NewClass:
| im |
im := InstVarMappingArray mappingFrom: OrigClass to: NewClass.
im mapInstVarNamed: #oldName to: #newName.
im mapInstVarToNil: #oldName.
im preserveDynamic: false.
SystemRepository migrateMt: { im }
Note that if you do not specify to set #oldName to nil, the value at oldName will be retained in the migrated instance (now a NewClass) in #oldName, as well as being set in the instance variable #newName.