10. Class versions and Instance Migration

Although you designed your schema with care and thought, after using it for a while you will probably find a few things you would like to improve. Furthermore, even if your design was perfect, real-world changes usually require changes to the schema sooner or later.

This chapter discusses the mechanisms GemStone Smalltalk provides to allow you to make changes in your schema and manage the migration of existing objects to the new schema.

Versions of Classes
defines the concept of a class version and describes two different approaches you can take to specify one class as a version of another.

ClassHistory
describes the GemStone Smalltalk class that encapsulates the notion of class versioning.

Migrating Objects
explains how to migrate either certain instances, or all of them, from one version of a class to another while retaining the data that these instances hold.

10.1 Versions of Classes

In order to create instances of a class, the class must be invariant, and invariant classes cannot be modified. While you defined your schema to be as complete as you could at the time you created the classes, inevitably further changes are needed. You may now have instances of invariant classes populating your database and a need to modify your schema by redefining certain of these classes.

To support this schema modification, GemStone allows you to define different versions of classes. Every class in GemStone has a class history—an object that maintains a list of all versions of the class—and every class is listed in at least one class history, the class history for the class itself. You can define as many different versions of a class as required, and declare that the different versions belong to the same class history. You can migrate some or all instances of one version of a class to another version when you need to. The values of the instance variables of the migrating instances are retained if you have defined the new version to do so.

Defining a New Version

In GemStone Smalltalk classes have versions. Each version is a unique and independent class object, but the versions are related to each other through a common class history. The classes need not share a similar structure, nor even a similar implementation. The classes need not even share a name, although it is probably less confusing if they do, or if you establish and adhere to some naming convention.

If you define a new class in a SymbolDictionary that already contains an existing class with the same name, it automatically becomes a new version of the previously existing class. This is the most common way of creating new class versions. Instances that predate the creation of the new version remain unchanged, and continue to access the old class’s methods, although tools such as GemBuilder or GemTools may provide options to automatically migrate instances to the new class. Instances created after the redefinition have the new class’s structure and access to the new class’s methods.

When you define a class, the class creation protocol includes an option to specify the existing class of which the new class is a version. See the keyword newVersionOf:.

New Versions and Subclasses

When you create a new version of a class—for example, Animal—subclasses of the old version of Animal still point to the old version of Animal as their superclass (unless you are using a tool which provides the option to automatically version and recompile subclasses). If you wish these classes to become subclasses of the new version, you need to recompile the subclass definitions to make new versions of the subclasses, specifying the new version of Animal as their superclass.

One way to do this is to file in the subclasses of Animal after making the new version of Animal (assuming the new version of the superclass has the same name).

New Versions and References in Methods

When you create a new version of a class (such as Animal) you typically want your existing code to use the new version rather than the old version. That is, without being recompiled, existing methods containing code like the following should create an instance of the new version rather than of the old version of Animal class:

pet := Animal new.

As long as the new class version replaces an existing class in the same SymbolDictionary, then references from existing methods will be automatically updated to the new class version.

This works because a compiled method does not directly reference a global (e.g., the class Animal), but references a SymbolAssociation in a SymbolDictionary. When you originally compile the method, it resolves the name using an expression similar to the following:

System myUserProfile resolveSymbol: #theClassName

The compiled method includes the resulting SymbolAssociation, whose key is the name of the global and whose value is the class (or other object). The value can be updated at any time, for example when you create a new version of a class.

The indirection through the SymbolAssociation carries a tiny performance penalty, but it is what allows global variables to vary. If you have a global that you know will be constant, you can reference the value directly from a compiled method by making the SymbolAssociation invariant before compiling the method.

While the SymbolAssociation is updated with the new value by versioning the class within the same SymbolDictionary, keep in mind that under some circumstances you may have a SymbolAssociation that does not reference the latest version, or the version you expect. If you have a newer class with the same name in a different SymbolDictionary, or if you delete and recreate the class, the SymbolAssociation will continue to point to the older class.
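
As an informal illustration of the lookup described above, the following sketch resolves a name in the same way and inspects the resulting association. It assumes a class named Animal is visible in your symbol list (otherwise the lookup answers nil); the temporary variable assoc is only for illustration.

| assoc |
"Resolve the name through the current user's symbol list."
assoc := System myUserProfile resolveSymbol: #Animal.
assoc key.      "the symbol #Animal"
assoc value.    "the class currently bound to that name"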

Class Variables and Class Instance Variables

When you create a new version of a class, the values in any class variables or class instance variables of the old class are referenced by the new class as well. By default, all versions of a class refer to the same objects referenced from class variables or class instance variables.

10.2 ClassHistory

In GemStone Smalltalk, every class has a class history, represented by the system as an instance of the class ClassHistory. A class history is an array of classes that are meant to be different versions of each other. While they often have the same class name, this is not a requirement; you can rename classes as well as change their structure.

Defining a Class as a new version of an existing Class

When you define a new class in the same symbol dictionary as an existing class with the same name, it is by default created as the latest version of the existing class and shares its class history.

When you define a new class by a name that is new to a symbol dictionary, the class is by default created with a unique class history. If you use a class creation message that includes the keyword newVersionOf:, you can specify an existing class whose history you wish the new class to share. This is useful if you want to create a version of a class with a different name or in a different symbol dictionary. If the new class version has the same name and is in the same symbol dictionary, it is not necessary to use newVersionOf:, since the new class will become a version of the existing class automatically.

For example, suppose your existing class Animal was defined like this:

Example 10.1
Object subclass: 'Animal'
	instVarNames: #('habitat' 'name' 'predator')
	classVars: #()
	classInstVars:  #()
	poolDictionaries:  {}
	inDictionary: UserGlobals
 

Example 10.2 creates a class named NewAnimal and specifies that the class shares the class history used by the existing class Animal.

Example 10.2
Object subclass: 'NewAnimal'
	instVarNames: #('diet' 'favoriteFood' 'habitat' 'name'
		'predator')
	classVars: #()
	classInstVars:  #()
	poolDictionaries: {}
	inDictionary: UserGlobals
	description: nil
	newVersionOf: Animal
	options: #()
 

If you wish to define a new class Animal with its own unique class history—in other words, the new class Animal is not a version of the old class Animal—you can add it to a different symbol dictionary, and specify the argument nil to the keyword newVersionOf:. See Example 10.3.

Example 10.3
Object subclass: 'Animal'
	instVarNames: #('favoriteFood' 'habitat' 'name'
		'predator')
	classVars: #()
	classInstVars:  #()
	poolDictionaries:  {}
	inDictionary: Published
	description: nil
	newVersionOf: nil
	options: #()
 

If you try to define a new class with the same name as an existing class that you did not create, you will most likely get an error, because you are trying to modify the class history of that class — an object which you are probably not permitted to modify. By specifying a newVersionOf: of nil, you can still create this class.

However, we recommend against creating multiple unrelated versions of classes with the same name; this can be confusing and it may be difficult to diagnose problems.

Accessing a Class History

You can access the class history of a given class by sending the message classHistory to the class. For example, the following expression returns the class history of the class Employee:

Employee classHistory

Assigning a Class History

You can assign a class history by sending the message addNewVersion: to the class whose class history you wish to use; the argument to this message is the class whose history is to be reassigned. For example, suppose that we created NewAnimal using the regular class creation protocol, and did not use the method with the keyword newVersionOf:. To later specify that it is a new version of Animal, execute the following expression:

Animal addNewVersion: NewAnimal 
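
A quick way to confirm the assignment is to compare the two histories afterward. This is a sketch that assumes, as described above, that a class history behaves like an array of classes and that both classes are visible by these names.

"Both classes should now answer the identical ClassHistory object."
NewAnimal classHistory == Animal classHistory.
"The shared history should list the newly added version."
Animal classHistory includes: NewAnimal.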

10.3 Migrating Objects

Once you define two or more versions of a class, you may wish to migrate instances of the class from one version to another. Migration in GemStone Smalltalk is a flexible, configurable operation, as the following sections describe.

Migration Destinations

If you know the appropriate class to which you wish to migrate instances of an older class, you can set a migration destination for the older class. To do so, send a message of the form:

OldClass migrateTo: NewClass

This message configures the old class to migrate its instances to become instances of the new class, but only when it is instructed to do so. Migration does not occur as a result of sending the above message.

It is not necessary to set a migration destination ahead of time. You can specify the destination class when you decide to migrate instances. It is also possible to set a migration destination, and then migrate the instances of the old class to a completely different class, by specifying a different migration destination in the message that performs the migration.

You can erase the migration destination for a class by sending it the message cancelMigration. For example:

OldClass cancelMigration

If you are in doubt about the migration destination of a class, you can query it with an expression of the form:

MyClass migrationDestination

The message migrationDestination returns the migration destination of the class, or nil if it has none.
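
Taken together, the three messages can be exercised as follows. This sketch assumes the Animal and NewAnimal classes from the earlier examples.

Animal migrateTo: NewAnimal.      "records the destination; no instance changes yet"
Animal migrationDestination.      "answers NewAnimal"
Animal cancelMigration.           "erases the destination"
Animal migrationDestination.      "answers nil"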

Migrating Instances

A number of mechanisms are available to allow you to migrate one instance, or a specified set of instances, to either the migration destination, or to an alternate explicitly specified destination.

No matter how you choose to migrate your data, however, you should migrate data in its own transaction. That is, as part of preparing for migration, commit your work so far. In this way, if migration should fail because of some error, you can abort your transaction and you will lose no other work; your database will be in a consistent state from which you can try again.

Moreover, many of the methods discussed below — allInstances, listInstances:, migrateInstancesTo:, and others — abort your current view and thus must be executed in a separate transaction.

After migration succeeds, commit your transaction again before you do any further work. Again, this technique ensures a consistent database from which to proceed.

If you need to migrate many instances of a class, break your work into multiple transactions.
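
One possible shape for this commit, migrate, commit sequence is sketched below. It assumes the Animal and NewAnimal classes from the earlier examples and relies on System commitTransaction answering false when a commit fails; the error handling shown is only illustrative.

"Save all pending work before migration begins."
System commitTransaction
	ifFalse: [Error signal: 'Commit failed; resolve the conflict before migrating'].
"Perform the migration in its own transaction."
Animal migrateInstancesTo: NewAnimal.
"Make the migration permanent, or discard it so you can try again."
System commitTransaction
	ifFalse: [
		System abortTransaction.
		Error signal: 'Migration was not committed']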

Finding Instances and References

To prepare for instance migration, several methods are available to help you find instances of specified classes or references to such instances. An expression of the form:

SystemRepository listInstances: anArray

takes as its argument an array of classes, and returns an array of sets. Each set contains all instances whose class is equal to the corresponding element in the argument anArray.

NOTE
The above method searches the database once for all classes in the array. Executing allInstances for each class would require searching the database once per class.
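
For instance, a single scan can gather the instances of both versions of Animal. This sketch assumes the Animal and NewAnimal classes from the earlier examples, and that the session has no uncommitted changes when the scan starts.

| results |
results := SystemRepository listInstances: { Animal . NewAnimal }.
(results at: 1) size.    "number of instances of Animal"
(results at: 2) size.    "number of instances of NewAnimal"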

An expression of the form:

SystemRepository listReferences: anArray

takes as its argument an array of objects, and returns an array of sets. Each set contains all instances that refer to the corresponding element in the argument anArray.

NOTE
Executing either listInstances: or listReferences: causes an abort. However, if the abort would cause any modifications to persistent objects to be lost, the method will signal a TransactionError instead.

Repository-wide scans such as listInstances: use a multi-threaded scan that can be tuned to use more or less resources of the system, thereby impacting performance of anything else running on this system to a greater or lesser degree. For details on tuning the multi-threaded scan, see the System Administration Guide.

What If the Result Set Is Very Large?

If Repository>>listInstances: returns a very large result set, there is a risk of out of memory errors. To avoid the need to have the entire result set in memory, the following methods are provided:

Repository >> listInstances: anArray limit: aSmallInteger

This method is similar to listInstances:, but returns just the first aSmallInteger instances of each of the classes in anArray.

Repository >> listInstancesToHiddenSet: aClass

This method puts the set of all instances of aClass in a new hidden set (an internal memory structure that, while not an object, is treated as one).

To enumerate the hidden set, you can use this method:

System class >> hiddenSetEnumerate: hiddenSetId limit: maxElements

using a hiddenSetId of 1, which is the number of the “ListInstancesResult” hidden set in GemStone/S 64 Bit v3.3. This is the hidden set in which listInstances results are placed. This hidden set number is subject to change in new releases. To determine which hidden sets are in a particular release, use the GemStone Smalltalk method System class >> HiddenSetSpecifiers.

For more on how to use hidden sets, see the section Hidden Sets.

You can also list instances to an external binary file, which can later be read into a hidden set. To do this, use the method:

Repository >> listInstances: anArray toDirectory: aString

This method scans the repository for the instances of classes in anArray and writes the results to binary bitmap files in the directory specified by aString. Binary bitmap files have an extension of .bm and may be loaded into hidden sets using class methods in System.

Bitmap files are named:

className-classOop-instances.bm 

where className is the name of the class and classOop is the object ID of the class.

The result is an Array of pairs. For each element of the argument anArray, the result array contains aClass, numberOfInstances. The numberOfInstances is the total number written to the output bitmap file.

List Instances in Page Order

For even more efficient migration of large sets of objects of multiple classes, you can list all the instances of all the classes in page order, that is, in the same order as the objects are stored on disk. This allows multiple objects of several different classes on the same page in the repository to be migrated at the same time.

If migration performance is an issue for your application, the following methods can be used to write the list of instances to a file, and open, read, and process the instances from the file.

Repository >> listInstancesInPageOrder: anArray toFile: aString
Repository >> openPageOrderOopFile: aString
Repository >> readObjectsFromFileWithId: fileId 
	startingAt: startIndex upTo: endIndex into: anArray
Repository >> closePageOrderOopFileWithId: fileId
Repository >> auditPageOrderOopFileWithId: fileId

For details on these methods and how to use them, refer to the method comments in the image.

Since the normal operation of the repository, where objects are added, removed, and modified, will cause objects to move from page to page, over time the actual ordering of the objects by page will diverge from the order of the results. When the file is read later, it will (of course) not contain any references to objects that were created since the listInstances was run. During the read, if any of the instances have been garbage collected, the Array of results will contain a nil. Given these issues, it is important to read and process the file as soon as possible after it is created.

Using the Migration Destination

The simplest way to migrate an instance of an older class is to send the instance the message migrate. If the object is an instance of a class for which a migration destination has been defined, the object becomes an instance of the new class. If no destination has been defined, no change occurs.

The following series of expressions, for example, creates a new instance of Animal, sets Animal’s migration destination to be NewAnimal, and then causes the new instance of Animal to become an instance of NewAnimal.

Example 10.4
| aLemming |
aLemming := Animal new.
Animal migrateTo: NewAnimal.
aLemming migrate.
 

Other instances of Animal remain unchanged until they, too, receive the message to migrate.

If you have collected the instances you wish to migrate into a collection named allAnimals, execute:

allAnimals do: [:each | each migrate]
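
When allAnimals is large, the same loop can commit periodically, following the earlier advice to break the work into multiple transactions. This is a sketch only: the batch size of 1000 is an arbitrary assumption, and a production version would need to decide how to recover when a commit fails.

| count |
count := 0.
allAnimals do: [:each |
	each migrate.
	count := count + 1.
	"Commit after every 1000 migrated instances."
	count \\ 1000 = 0
		ifTrue: [
			System commitTransaction
				ifFalse: [Error signal: 'Commit failed during migration']]].
"Commit whatever remains from the final partial batch."
System commitTransaction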

Bypassing the Migration Destination

You can bypass the migration destination, if you wish, or migrate instances of classes for which no migration destination has been specified. To do so, you can specify the destination directly in the message that performs the migration. Two methods are available to do this.

Neither of these messages changes the class’s persistent migration destination. Instead, they specify a one-time-only operation that migrates the specified instances, or all instances, to the specified class, ignoring any migration destination that has been defined for the class.

The message migrateInstances:to: takes a collection of instances as the argument to the first keyword, and a destination class as the argument to the second. The following example migrates the specified instances of Animal to instances of NewAnimal:

Animal migrateInstances: {aDugong . aLemming} to: NewAnimal.

Alternatively, the message migrateInstancesTo: migrates all instances of the receiver to the specified destination class. The following example migrates all instances of Animal to instances of NewAnimal:

Animal migrateInstancesTo: NewAnimal.
 

NOTE
Executing either migrateInstances:to: or migrateInstancesTo: causes an abort. To avoid loss of work, always commit your transaction before you begin data migration.

Example 10.5 uses migrateInstances:to: to migrate all instances of all versions of a class, except the latest version, to the latest version.

Example 10.5
| animalHist allAnimals |
animalHist := Animal classHistory.
allAnimals := SystemRepository listInstances: animalHist.
"Returns an array of the same size as the class history.
 Each element in the array is a set corresponding to one 
 version of the class.  Each set contains all the  
 instances of that version of the class."
 
1 to: animalHist size-1 do: [:index | 
	(animalHist at: index) 
		migrateInstances:(allAnimals at: index)
		to: Animal currentVersion].
 

The migration methods migrateInstancesTo: and migrateInstances:to: return an array of four collections. The first two collections in the array are always empty.

  • The third collection is a set of objects that are instances of indexed collections, and were not migrated. See the following discussion, Migration Errors.
  • The fourth collection is a set of objects whose class was not identical to the receiver—presumably, incorrectly gathered instances—and thus, were not migrated. See Instance Variable Mappings.

If all four of these collections are empty, all requested migrations have occurred.
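
A caller can check the returned array before committing. The following sketch assumes the Animal and NewAnimal classes from the earlier examples; the variable names result and leftOver are illustrative.

| result leftOver |
result := Animal migrateInstancesTo: NewAnimal.
"Count every object that was reported as not migrated."
leftOver := 0.
result do: [:each | leftOver := leftOver + each size].
leftOver = 0
	ifTrue: [System commitTransaction]
	ifFalse: [result]    "examine (result at: 3) and (result at: 4) before continuing"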

Migration Errors

Several problems can occur with migration:

  • You may be trying to migrate an object that the interpreter needs to remain in a constant state (migrating self).
  • You may be trying to migrate an instance that is indexed, or participates in an index.

Migrating self

Sometimes a requested migration operation can cause the interpreter to halt and display an error message of the following form:

The object <anObject> is present on the GemStone Smalltalk
stack, and cannot participate in a become.

This error occurs when you try to send the message migrate (or one of its variants) to self. Migration can change the structure of an object. If the interpreter was already accessing the object whose structure you are trying to change, the database can become corrupted. To avoid this undesirable consequence, the interpreter checks for the presence of the object in its stack before trying to migrate it, and notifies you if it finds it.

If you receive such a notifier, rewrite the method that sends the migration message to self, so as to accomplish its purpose in some other manner.

Migrating Instances that Participate in an Index

If an instance participates in an index (for example, because it is part of the path on which that index was created), then the indexing structure can, under certain circumstances, cause migration to fail. Three scenarios are possible:

  • Migration succeeds. In this case, the indexing structure you have made remains intact. Commit your transaction.
  • GemStone examines the structures of the existing version of the class and the version to which you are trying to migrate, and determines that migration is incompatible with the indexing structure. In this case, GemStone raises an error notifying you of the problem, and migration does not occur.

You can commit your transaction, if you have done other meaningful work since you last committed, and then follow these steps:

1. Remove the index in which the instance participates.

2. Migrate the instance.

3. Modify the indexing code as appropriate for the new class version and re-create the index.

4. Commit the transaction.

  • In the final case, GemStone fails to determine that migration is incompatible with the indexing structure, and so migration occurs and the indexing structure is corrupted. In this case, GemStone raises an error notifying you of the problem, and you will not be permitted to commit the transaction. Abort the transaction and then follow the steps explained above.
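
The recovery steps listed above might look like the following sketch. The collection name animalBag, the indexed path 'name', and the choice of the equality-index messages removeEqualityIndexOn: and createEqualityIndexOn:withLastElementClass: are assumptions for illustration; substitute the index protocol your application already uses. The sketch also assumes that Animal's migration destination has already been set to NewAnimal.

"1. Remove the index in which the instances participate."
animalBag removeEqualityIndexOn: 'name'.
"2. Migrate the instances."
animalBag do: [:each | each migrate].
"3. Re-create the index for the new class version."
animalBag createEqualityIndexOn: 'name' withLastElementClass: String.
"4. Commit the transaction."
System commitTransaction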

For more information about indexing, see Chapter 7, “Indexes and Querying”.

For more information about committing and aborting transactions, see Chapter 8, “Transactions and Concurrency Control”.

Instance Variable Mappings

Earlier, we explained that migration can involve changing the structure of an object. Since migration is only useful if you can retain the data that is contained in these instances, you can set up mappings so instances using the old structure can be migrated to a new structure and updated appropriately.

The following discussion describes the default manner in which instance variables are mapped. This default arrangement can be modified if necessary.

Default Instance Variable Mappings

The simplest way to retain the data held in instance variables is to use instance variables with the same names in both class versions. If two versions of a class have instance variables with the same name, then the values of those variables are automatically retained when the instances migrate from one class to the other.

Suppose, for example, you create two instances of class Animal and initialize their instance variables as shown in Example 10.6.

Example 10.6
| aLemming aDugong |
aLemming := Animal new.
aLemming name: 'Leopold'.
aLemming favoriteFood: 'grass'.
aLemming habitat: 'tundra'.
aDugong := Animal new.
aDugong name:  'Maybelline'.
aDugong favoriteFood: 'seaweed'.
aDugong habitat: 'ocean'.
 

You then decide that class Animal really needs an additional instance variable, predator, which is a Boolean—true if the animal is a predator, false otherwise. You create a class called NewAnimal, and define it to have four instance variables: name, favoriteFood, habitat, and predator, creating accessing methods for all four. You then migrate aLemming and aDugong. What values will they have?

Example 10.7 takes the class and method definitions for granted and performs the migration. It then shows the results of printing the values of the instance variables.

Example 10.7
| bagOfAnimals |
bagOfAnimals := IdentityBag new.
bagOfAnimals add: aLemming; add: aDugong.
Animal migrateInstances: bagOfAnimals to: NewAnimal.
aLemming name.
Leopold
 
aLemming favoriteFood.
grass
 
aLemming habitat.
tundra
 
aLemming predator.
nil
 
aDugong name.
Maybelline
 
aDugong favoriteFood.
seaweed
 
aDugong habitat.
ocean
 
aDugong predator.
nil
 

As you see, the migrated instances retained the data they held. They have done so because the class to which they migrated defined instance variables that had the same names as the class from which they migrated. The new instance variable name was initialized with the value of the old instance variable name, and so on.

The new class also defined an instance variable, predator, for which the old class defined no corresponding variable. This instance variable therefore retains its default value of nil.

If the class to which you migrate instances defines no instance variable having the same name as that of the class from which the instance migrates, the data is dropped. For example, if you migrated an instance of NewAnimal back to become an instance of the original Animal class, any value in predator would be lost. Because Animal defines no instance variable named predator, there is no slot in which to place this value.

To summarize, then:

  • If an instance variable in the new class has the same name as an instance variable in the old class, it retains its value when migrated.
  • If the new class has an instance variable for which no corresponding variable exists in the old class, it is initialized to nil upon migration.
  • If the old class has an instance variable for which no corresponding variable exists in the new class, the value is dropped and the data it represents is no longer accessible from this object.

Customizing Instance Variable Mappings

This section describes two kinds of customization:

  • To initialize an instance variable with the value of a variable that has a different name, you must provide an explicit mapping from the instance variable names of the older class to the instance variable names of the migration destination.
  • To perform a specific operation on the value of a given variable before initializing the corresponding variable in the class to which the object is migrating, you can implement methods to transform the variable values.

Explicit Mapping by Name

The first situation requires providing an explicit mapping from the instance variable names of the older class to the instance variable names of the migration destination. To provide such a customized mapping, override the default mapping strategy by implementing a class method named instVarMappingTo: in your destination class.

For example, suppose that you define the class NewAnimal with three instance variables: species, name, and diet. When instances of Animal migrate to NewAnimal, it is impossible to determine the value to which species ought to be initialized. The value of name can be retained, and the value of diet ought to be initialized with the value presently held in favoriteFood. In that case, the class NewAnimal must define a class method as shown in Example 10.8.

Example 10.8
instVarMappingTo: anotherClass
| result myNames itsNames dietIndex |
"Use the default strategy first to properly fill in inst vars
having the same name."
result := super instVarMappingTo: anotherClass.
myNames := self allInstVarNames.
itsNames := anotherClass allInstVarNames.
dietIndex := myNames indexOfValue: #diet.
dietIndex > 0 
   ifTrue: [(result at: dietIndex) = 0 
	ifTrue:[ result at: dietIndex
		put:(itsNames indexOfValue: #favoriteFood)]].
^result
 

The method allInstVarNames is used because it would also migrate all inherited instance variables, although at the expense of performance. If your class inherits no instance variables, you could use the method instVarNames instead, for efficiency.

Transforming Variable Values

Another kind of customization is required when the format of data changes. For example, suppose that you have a class named Point, which defines two instance variables x and y. These instance variables define the position of the point in Cartesian two-dimensional coordinate space.

Suppose that you define a class named NewPoint to use polar coordinates. The class has two instance variables named radius and angle. Obviously the default mapping strategy is not going to be helpful here; migrating an instance of Point to become an instance of NewPoint loses its data—its position—completely. Nor is it correct to map x to radius and y to angle. Instead, what is needed is a method that implements the appropriate trigonometric function to transform the point to its appropriate position in polar coordinate space.

In this case, the method to override is migrateFrom:instVarMap:, which you implement as an instance method of the destination class. Then, when you request an instance of the Cartesian class to migrate to the polar class, the migration code that calls migrateFrom:instVarMap: executes the method in the destination class instead of the one in Object. Note that Example 10.9 names the Cartesian class OldPoint and the polar class Point, rather than Point and NewPoint.

Example 10.9
Object subclass: #OldPoint
	instVarNames: #( #x #y )
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.
 
OldPoint compileAccessingMethodsFor: OldPoint instVarNames.
 
Object subclass: #Point
	instVarNames: #( #radius #angle )
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.
 
Point compileAccessingMethodsFor: Point instVarNames.
 
method: Point 
migrateFrom: oldPoint instVarMap: aMap
	| x y |
	x := oldPoint x.
	y := oldPoint y.
	radius := ((x*x) + (y*y)) asFloat sqrt.
	angle := (y/x) asFloat arcTan.
	^self
 
Point new migrateFrom: (OldPoint new x: 456; y: 123) 
   instVarMap: 'unused argument'. 
a Point 
  radius           4.7229757568719322E02 
  angle            2.6346654103491746E-01 
 

Of course, if you believe there is a chance that you might be migrating instances from a completely separate version of class Point that does not have the instance variables x and y, nor use the Cartesian coordinate system, then it is wise to check for the class of the old instance before you determine which method migrateFrom:instVarMap: to use.

For example, you could define an instance method isCartesian for your old class Point that returns true. Other versions of class Point could define the same method to return false. (You could even define the method in class Object to return false.) You could then modify the above method as follows:

Example 10.10
method: Point
migrateFrom: oldPoint instVarMap: aMap
| x y |
oldPoint isCartesian
    ifTrue: [
	x := oldPoint x.
	y := oldPoint y.
	radius := ((x*x) + (y*y)) asFloat sqrt.
	angle := (y/x) asFloat arcTan.
	^self]
    ifFalse: [^super migrateFrom: oldPoint instVarMap: aMap]
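
The isCartesian test used above could be supplied as follows. This sketch uses the class names from Example 10.9 and follows the suggestion of a default in Object, which assumes you have permission to add methods to Object.

method: OldPoint
isCartesian
	"Instances of this version store x and y."
	^true

method: Object
isCartesian
	"Default answer for all other classes."
	^false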
 

 
