10. Class versions and Instance Migration

Although you designed your schema with care and thought, after using it for a while you will probably find a few things you would like to improve. Furthermore, even if your design was perfect, real-world changes usually require changes to the schema sooner or later.

This chapter discusses the mechanisms GemStone Smalltalk provides to allow you to make changes in your schema and manage the migration of existing objects to the new schema.

Versions of Classes defines the concept of a class version and describes two different approaches you can take to specify one class as a version of another.

ClassHistory describes the GemStone Smalltalk class that encapsulates the notion of class versioning.

Migrating Objects explains how to migrate either certain instances, or all of them, from one version of a class to another while retaining the data that these instances hold.

10.1 Versions of Classes

In order to create instances of a class, the class must be invariant, and invariant classes cannot be modified, except in some specific ways. While you defined your schema to be as complete as you could at the time you created the classes, inevitably further changes are needed. You may now have instances of invariant classes populating your database and a need to modify your schema by redefining certain of these classes.

To support this schema modification, GemStone allows you to define different versions of classes. Every class in GemStone has a class history—an object that maintains a list of all versions of the class—and every class is listed in at least one class history, the class history for the class itself. You can define as many different versions of a class as required, and declare that the different versions belong to the same class history. You can migrate some or all instances of one version of a class to another version when you need to. The values of the instance variables of the migrating instances are retained if you have defined the new version to do so.

Defining a New Version

In GemStone Smalltalk, classes have versions. Each version is a unique and independent class object, but the versions are related to each other through a common class history. The classes need not share a similar structure, nor even a similar implementation. The classes need not even share a name, although it is less confusing if they do, or if you establish and adhere to some naming convention.

If you define a new class in a SymbolDictionary that already contains an existing class with the same name, it automatically becomes a new version of the previously existing class. This is the most common way of creating new class versions. Instances that predate the creation of the new version remain unchanged, and continue to access the old class’s methods, although tools such as GemBuilder may provide options to automatically migrate instances to the new class. Instances created after the redefinition have the new class’s structure and access to the new class’s methods.

When you define a class, the class creation protocol includes an option to specify the existing class of which the new class is a version. See the keyword newVersionOf:.

New Versions and Subclasses

When you create a new version of a class—for example, Animal—subclasses of the old version of Animal still point to the old version of Animal as their superclass (unless you are using a tool which provides the option to automatically version and recompile subclasses). If you wish these classes to become subclasses of the new version, you need to recompile the subclass definitions to make new versions of the subclasses, specifying the new version of Animal as their superclass.

One way to do this is to file in the subclasses of Animal after making the new version of Animal (assuming the new version of the superclass has the same name).
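
As a minimal sketch of this step, assume a hypothetical subclass named Dog (not part of this chapter's examples) that was originally defined as a subclass of the old Animal. Re-executing its definition after Animal has been versioned might look like the following. Because Dog already exists in UserGlobals, the redefinition is expected to create a new version of Dog, and the name Animal resolves to the new version of the superclass:

```smalltalk
"Dog and its instance variable 'breed' are hypothetical names used only for illustration."
Animal subclass: 'Dog'
	instVarNames: #('breed')
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.

"Check that the new version of Dog names the current Animal as its superclass."
Dog superclass == Animal
```

Existing instances of the old version of Dog are not affected by this redefinition; they remain instances of the old version until they are migrated.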

New Versions and References in Methods

When you create a new version of a class (such as Animal) you typically want your existing code to use the new version rather than the old version. That is, without being recompiled, existing methods containing code like the following should create an instance of the new version rather than of the old version of Animal class:

pet := Animal new.

As long as the new class version replaces an existing class in the same SymbolDictionary, then references from existing methods will be automatically updated to the new class version.

This works because a compiled method does not directly reference a global (e.g., the class Animal), but references a SymbolAssociation in a SymbolDictionary. When you originally compile the method, it resolves the name using an expression similar to the following:

System myUserProfile resolveSymbol: #theClassName

The compiled method includes the resulting SymbolAssociation, whose key is the name of the global and whose value is the class (or other object). The value can be updated at any time, for example when you create a new version of a class.

Looking up the value through the SymbolAssociation carries a tiny performance penalty, but this indirection is what allows global variables to vary. If you have a global that you know will be constant, then you can reference the value directly from a compiled method by making the SymbolAssociation invariant before compiling the method.

While the SymbolAssociation is updated with the new value by versioning the class within the same SymbolDictionary, keep in mind that under some circumstances you may have a SymbolAssociation that does not reference the latest version, or the version you expect. If you have a newer class with the same name in a different SymbolDictionary, or if you delete and recreate the class, the SymbolAssociation will continue to point to the older class.
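
To see which class a name is currently bound to, you can inspect the SymbolAssociation directly. The following is a small sketch that uses only the resolveSymbol: expression shown above plus the standard key and value accessors of an association; it assumes that the name Animal is resolvable in your symbol list:

```smalltalk
| assoc |
"Resolve the name the same way the compiler does."
assoc := System myUserProfile resolveSymbol: #Animal.

assoc key.      "the Symbol #Animal"
assoc value.    "the class object currently bound to that name"

"Compare the bound class with the versions recorded in its class history."
assoc value classHistory
```

If the class returned by assoc value is not the version you expect, check whether a class of the same name exists in another SymbolDictionary, or whether the class was deleted and re-created rather than versioned.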

Class Variables and Class Instance Variables

Adding a class variable does not require a new version of your class, but adding a class instance variable does.

When you create a new version of a class, the values in any class variables or class instance variables in the old class are referenced by the new class as well. By default, all versions of a class refer to the same objects referenced from class variables or class instance variables.

Class versioning and Class options

When you define a class with the same name and variables, but with a different set of options (passed in via the options: keyword to the class creation method), it does not always need to create a new version of the class.

See the description of the class options for details on how they can be used for specific class behavior.

Class options are automatically inherited by a new version or by an updated version, unless the first element in the options: array is #noInheritOptions.

If the new definition includes #instancesNonPersistent, then the existing class will be modified to add #instancesNonPersistent.

If the new definition includes #subclassesDisallowed, #disallowGciStore, #traverseByCallback, #dbTransient, or #instancesInvariant, and if #noInheritOptions is not the first element, the options will not be removed.

To remove an unwanted option, you may need to include #noInheritOptions as the first element of the options: array.
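
As an illustrative sketch only, the following redefines Animal using the same class creation keywords as the other examples in this chapter, with #noInheritOptions as the first element of the options: array so that options from the existing class are not carried over. Whether this produces a new version or updates the existing class depends on which options change, as described above:

```smalltalk
"Redefine Animal without inheriting the options of the existing class.
 #noInheritOptions must be the first element of the options: array."
Object subclass: 'Animal'
	instVarNames: #('habitat' 'name' 'favoriteFood' 'predator')
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals
	newVersionOf: Animal
	description: nil
	options: #(#noInheritOptions).
```

Any options you still want must be listed explicitly after #noInheritOptions, since nothing is inherited in that case.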

10.2 ClassHistory

In GemStone Smalltalk, every class has a class history, represented by the system as an instance of the class ClassHistory. A class history is an array of classes that are meant to be different versions of each other. While they often have the same class name, this is not a requirement; you can rename classes as well as change their structure.

Defining a Class as a new version of an existing Class

When you define a new class in the same symbol dictionary as an existing class with the same name, it is by default created as the latest version of the existing class and shares its class history.

When you define a new class by a name that is new to a symbol dictionary, the class is by default created with a unique class history. If you use a class creation message that includes the keyword newVersionOf:, you can specify an existing class whose history you wish the new class to share. This is useful if you want to create a version of a class with a different name or in a different symbol dictionary. If the new class version has the same name and is in the same symbol dictionary, it is not necessary to use newVersionOf:, since the new class will become a version of the existing class automatically.

For example, suppose your existing class Animal was defined like this:

Example 10.1

Object subclass: 'Animal'
	instVarNames: #('habitat' 'name' 'favoriteFood' 'predator')
	classVars: #()
	classInstVars:  #()
	poolDictionaries:  {}
	inDictionary: UserGlobals.
 
Animal compileMissingAccessingMethods.
 
 

Example 10.2 creates a class named NewAnimal and specifies that the class shares the class history used by the existing class Animal.

Example 10.2

Object subclass: 'NewAnimal'
	instVarNames: #('name' 'diet' 'predator' 'species')
	classVars: #()
	classInstVars:  #()
	poolDictionaries: {}
	inDictionary: UserGlobals
	newVersionOf: Animal
	description: nil
	options: #().
 
NewAnimal compileMissingAccessingMethods.
 
 

If you wish to define a new class Animal with its own unique class history—in other words, the new class Animal is not a version of the old class Animal—you can add it to a different symbol dictionary, and specify the argument nil to the keyword newVersionOf:. However, this can easily create confusion, and make it difficult to diagnose problems. It is only recommended when the two symbol dictionaries will not normally both be loaded and in use at the same time.

If you try to define a new class with the same name as an existing class that you did not create, you will most likely get an error, because you are trying to modify the class history of that class — an object which you are probably not permitted to modify.

Accessing a Class History

You can access the class history of a given class by sending the message classHistory to the class. For example, the following expression returns the class history of the class Animal:

Animal classHistory

This is a collection that includes all versions of the class, in order, with the most recent version being the last one in the collection. For example:

Animal classHistory last == NewAnimal
true
Animal classHistory first == Animal
true

In the usual case, in which all classes in the class history have the same name, the compiler will resolve that name as the last one in the classHistory.

In some tools (such as GemBuilder for Smalltalk), a class whose class history contains more than one version is displayed with its position in the history, for example:

Animal [1]
NewAnimal [2]

Assigning to a Class History

You can assign a class to a class history by sending the message addNewVersion: to the class whose class history you wish to amend; the argument to this message is the class whose history is to be reassigned.

For example, suppose that we created NewAnimal using the regular class creation protocol, and did not use the method with the keyword newVersionOf:. To later specify that it is a new version of Animal, execute the following expression:

Animal addNewVersion: NewAnimal 
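
One way to confirm the result, sketched here using only the classHistory message described earlier and the standard collection message includes:, is to check that both classes now answer the same class history object and that NewAnimal appears in it:

```smalltalk
"Both classes are expected to share a single class history object."
NewAnimal classHistory == Animal classHistory.

"NewAnimal is expected to be listed in that history."
Animal classHistory includes: NewAnimal.
```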

10.3 Migrating Objects

Once you have defined a new version of your class, you may want to migrate your existing instances from the old class version to the new version. Migration in GemStone Smalltalk is a flexible, configurable operation.

Migration Destinations

You can establish the default destination class for migration, for use in later instance migration operations. To do so, send a message of the form:

OldClass migrateTo: NewClass

This configures the old class so it knows to migrate its instances to become instances of the new class. Migration does not occur as a result of sending the above message; this only sets the destination of migration.

It is not necessary to set a migration destination ahead of time; other protocol allows you to specify the destination class for a specific instance migration. If you use a method that includes a specific migration destination class, the default destination is ignored.

Once you have set the migration destination, you can migrate a single specific instance, using a message of the form:

anInstanceOfOldClass migrate

Provided the object is an instance of a class for which a migration destination has been defined, the object becomes an instance of the new class. If no destination has been defined, no change occurs.

The following series of expressions, for example, creates a new instance of Animal, sets Animal’s migration destination to be NewAnimal, and then causes the new instance of Animal to become an instance of NewAnimal.

Example 10.3

| aLemming |
aLemming := Animal new.
Animal migrateTo: NewAnimal.
aLemming migrate.
 

Other instances of Animal remain unchanged, until such time as they receive the message to migrate.

Bypassing the Migration Destination

You can bypass the migration destination, or migrate instances of classes for which no migration destination has been specified. To do so, specify the destination directly in the message that performs the migration.

OldClass migrateInstances: setOfInstancesOfOldClass to: NewClass
Migrate the specific instances in setOfInstancesOfOldClass, which are instances of OldClass, to become instances of NewClass. Ignores any existing migration destination, and does not set the default migration destination.

OldClass migrateInstancesTo: NewClass
Migrate all instances of OldClass to become instances of NewClass. Ignores any existing migration destination, and does not set the default migration destination.

Example 10.4 uses migrateInstances:to: to migrate all instances of all versions of a class, except the latest version, to the latest version.

For more on listing instances, see Finding Instances.

Example 10.4

| animalHist allAnimals |
animalHist := Animal classHistory.
allAnimals := SystemRepository listInstances: animalHist.
1 to: animalHist size-1 do: [:index | 
	(animalHist at: index) 
		migrateInstances:(allAnimals at: index)
		to: Animal currentVersion].
 

When you migrate a set of instances, it is possible that not all instances in the set can be successfully migrated.

The migration methods migrateInstancesTo: and migrateInstances:to: return an array of collections:

1. The set of objects for which the current user does not have read permission.

2. The set of objects for which the current user has read permission but not write permission.

3. The set of objects that could not be migrated due to index incompatibilities.

4. The set of objects whose class was not identical to the receiver (presumably, incorrectly gathered instances) and therefore were not migrated; this will always be empty if using migrateInstancesTo:.

5. The set of objects that failed migration by signalling a MigrationError. This error may be signalled, for example, by customized migration code that encounters an object that cannot be migrated.

If all these collections are empty, all requested migrations have occurred.
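
The following sketch shows one way to check the returned array. It assumes the five-element layout listed above; the temporary variable names and the label strings are illustrative only. It collects a label and a count for each non-empty collection, so an empty result means that every requested migration occurred:

```smalltalk
| results labels failures |
results := Animal migrateInstancesTo: NewAnimal.

"Labels follow the order of the collections described above."
labels := #('no read permission'
	'no write permission'
	'index incompatibility'
	'class not identical to receiver'
	'MigrationError signalled').

failures := OrderedCollection new.
1 to: (results size min: labels size) do: [:i |
	(results at: i) isEmpty
		ifFalse: [failures add: (labels at: i) -> (results at: i) size]].

"Answer the list of problems; it is empty if everything migrated."
failures
```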

Migrating Instances that Participate in an Index

If an instance participates in an index (for example, because it is part of the path on which that index was created), then the indexing structure can, under certain circumstances, cause migration to fail. Three scenarios are possible:

  • Migration succeeds. In this case, the indexing structure you have made remains intact. Commit your transaction.
  • GemStone examines the structures of the existing version of the class and the version to which you are trying to migrate, and determines that migration is incompatible with the indexing structure. In this case, GemStone raises an error notifying you of the problem, and migration does not occur.

You can commit your transaction, if you have done other meaningful work since you last committed, and then follow these steps:

1. Remove the index in which the instance participates.

2. Migrate the instance.

3. Modify the indexing code as appropriate for the new class version and re-create the index.

4. Commit the transaction.

  • In the final case, GemStone fails to determine that migration is incompatible with the indexing structure, and so migration occurs and the indexing structure is corrupted. In this case, GemStone raises an error notifying you of the problem, and you will not be permitted to commit the transaction. Abort the transaction and then follow the steps explained above.
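
The recovery steps above can be sketched as follows. This is an illustration under several assumptions: MyAnimals is a hypothetical indexed collection of Animal instances stored in UserGlobals, the index is an equality index on the path 'name', and the index is managed with the removeEqualityIndexOn: and createEqualityIndexOn:withLastElementClass: protocol. If your application defines its indexes in another way, substitute the corresponding calls:

```smalltalk
| myAnimals |
"Hypothetical indexed collection holding the instances to migrate."
myAnimals := UserGlobals at: #MyAnimals.

"1. Remove the index in which the instances participate."
myAnimals removeEqualityIndexOn: 'name'.

"2. Migrate the instances."
Animal migrateInstances: myAnimals asArray to: NewAnimal.

"3. Re-create the index as appropriate for the new class version."
myAnimals createEqualityIndexOn: 'name' withLastElementClass: String.

"4. Commit the transaction."
System commitTransaction
```

The path 'name' is kept here because both versions of the class define that instance variable; if the indexed variable was renamed or removed in the new version, the path in step 3 must change accordingly.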

Default Instance Variable Mappings

The purpose of migration is to retain the data contained in the old object, while updating the object to a possibly entirely new structure.

In most cases, it makes sense for the object contents at particular instance variable names to have the same values as before the migration, and this is the default behavior.

There are cases in which you might want to change the use of a particular instance variable name, and relocate the data to a new instance variable slot. This is described under Customized Instance Variable Mappings.

The default migration behavior:

  • If the new class has an instance variable with the same name as the old class, data at that instance variable in the instance of the old class is moved to the instance variable in the migrated instance.
  • If the new class has an instance variable that was not present in the old class, the migrated instance has that instance variable set to nil.
  • If the new class does not have an instance variable that was present in the old class, the value at that instance variable is dereferenced. The data it represents is no longer accessible from this object.
  • Values of dynamic instance variables in the instance of the old class remain as dynamic instance variables in the migrated instance.

Suppose, for example, you create an instance of class Animal and initialize its instance variables as shown in Example 10.5.

Example 10.5

| aLemming |
aLemming := Animal new
	name: 'Leopold';
	favoriteFood: 'grass';
	habitat: 'tundra';
	predator: 'owl';
	yourself.
UserGlobals at: #aLemming put: aLemming.
 

You then decide that class Animal needs a different set of instance variables. You create the class called NewAnimal (as described in Example 10.2), with four instance variables: name, diet, predator, and species. You then migrate aLemming.

Example 10.6 performs the migration, and then shows the results of printing the values of the instance variables.

Example 10.6

Animal migrateInstances: { aLemming } to: NewAnimal.
 
aLemming name.
Leopold
 
aLemming predator.
owl
 
aLemming diet.
nil
 

Customized Instance Variable Mappings

To initialize an instance variable with the value of a variable that has a different name, you must provide an explicit mapping from the instance variable names of the older class to the instance variable names of the migration destination.

This can be done by overriding the implementation of the method migrateFrom:instVarMap: in the destination class.

When you migrate an object, migrateFrom:instVarMap: is sent to the new instance, with the old instance as the first argument. The instVarMap: argument is a mapping structure that can be further customized, but is usually left at the default.

In our example, the class Animal has the instance variables: habitat, name, favoriteFood, and predator, and NewAnimal has variables: name, diet, predator, and species.

When instances of Animal migrate to NewAnimal, the value of diet ought to be initialized with the value presently held in favoriteFood.

Also, we will keep habitat, by making it a dynamic instance variable on the new instance.

To accomplish this, NewAnimal implements migrateFrom:instVarMap:, as in Example 10.7.

Example 10.7 NewAnimal >> migrateFrom:instVarMap:

method: NewAnimal
migrateFrom: anOldInstance instVarMap: aMap
	super migrateFrom: anOldInstance instVarMap: aMap.
	self diet: anOldInstance favoriteFood.
	self dynamicInstVarAt: #habitat put: anOldInstance habitat.
%
 
Animal migrationDestination: NewAnimal.
aLemming migrate
%
a NewAnimal
  name                Leopold
  diet                grass
  predator            owl
  species             nil
  t1 habitat
  t2 tundra
 
 

Transforming Variable Values

Another kind of customization is required when the format of data changes. For example, suppose that you have a class named Point, which defines two instance variables x and y. These instance variables define the position of the point in Cartesian two-dimensional coordinate space.

Suppose that you define a class named NewPoint to use polar coordinates. The class has two instance variables named radius and angle. Obviously the default mapping strategy is not going to be helpful here; migrating an instance of Point to become an instance of NewPoint loses its data (the actual position) completely. Nor is it correct to map x to radius and y to angle. Instead, what is needed is a method that implements the appropriate trigonometric function to transform the point to its appropriate position in polar coordinate space.

In this case, the method to override is migrateFrom:instVarMap:, which you implement as an instance method of the class NewPoint. Then, when you request an instance of Point to migrate to an instance of NewPoint, the migration code that calls migrateFrom:instVarMap: executes the method in NewPoint instead of in Object. In the code of Example 10.8, the Cartesian class is named OldPoint and the polar class is named Point.

Example 10.8 Point >> migrateFrom:instVarMap:

Object subclass: #OldPoint
	instVarNames: #(x y )
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.
 
OldPoint compileMissingAccessingMethods.
 
Object subclass: #Point
	instVarNames: #(radius angle )
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.
 
Point compileMissingAccessingMethods.
 
method: Point 
migrateFrom: oldPoint instVarMap: aMap
	| x y |
	super migrateFrom: oldPoint instVarMap: aMap.
	x := oldPoint x.
	y := oldPoint y.
	radius := ((x*x) + (y*y)) asFloat sqrt.
	angle := (y/x) asFloat arcTan.
	^self
%
 
OldPoint migrationDestination: Point.
(OldPoint new x: 123; y: 456) migrate.
%
a Point 
  radius               472.2975756871932
  angle                1.307329785759979
 
 

If you might migrate instances from a completely separate version of class Point, one that neither has the instance variables x and y nor uses the Cartesian coordinate system, then you should add behavior to the old class to make it polymorphic with the class that the migration method expects.

Finding Instances

Preparing the set of objects that needs to be migrated can be done in a number of ways. You may, for example, have application collections of the instances that need to be migrated.

Alternatively, there are several methods available to allow you to find instances of one or more classes.

Finding instances requires scanning the entire repository, which can take significant time for very large repositories. Likewise, in a large repository there may be many instances in the result set, potentially more than can fit into memory. The choice of methods to use to locate objects for migration depends on the size of the repository and the number of instances.

The following table lists the methods and some considerations for use. Other methods are available; see the image for details.

Table 10.9 Finding instances

Expression:
SystemRepository listInstances: anArrayOfClasses
SystemRepository fastListInstances: anArrayOfClasses

Return value: Returns an Array of Arrays; each contains all instances whose class is equal to the corresponding element in anArrayOfClasses. anArrayOfClasses may include multiple classes with the same class name, such as versions of the same class.

Utility: Performs one repository scan finding instances of all the specified classes. The result set objects are in-memory. Both temporary instances and persistent instances are included in the results.

Expression:
SystemRepository allInstances: classOrCollOfClasses
SystemRepository fastAllInstances: classOrCollOfClasses

Return value: Returns an instance of GsBitmap, or a set of class/GsBitmap instances corresponding to the elements of the argument.

Utility: Performs one repository scan finding instances of all the specified classes. Only committed objects are included in the results. GsBitmap and its content objects do not consume object memory, but you will still need to bring each instance into memory in order to migrate.

Tuning migration and managing memory

Your session is configured to have a certain amount of object memory, defined by the configuration parameter GEM_TEMPOBJ_CACHE_SIZE. The collection containing the instances you are going to migrate must fit into memory, as well as the instances themselves. If the number of instances you are going to migrate is large, you will likely not have enough memory to hold everything.

For a large migration, you will want to:

  • Increase the amount of temporary object memory by tuning GEM_TEMPOBJ_CACHE_SIZE. See the System Administration Guide for details.
  • Commit your transaction after migrating some number of instances. This allows that memory to be reused.
  • Use GsBitmaps to hold the collections of objects to be migrated, and bring them into memory (fault them in) in manageable chunks. See Example 10.10.
  • When finding instances, consider the time it will take to scan your repository. Repository scans can be tuned; a fast variant may be more appropriate, or a reduced-impact scan that can run in the background.
  • Dividing the work between multiple gem sessions allows the migration to complete more quickly.
  • Migration generates shadowed objects, and large migrations should tune reclaim to make sure these shadow objects do not cause the repository to grow excessively. See the System Administration Guide for how to manage reclaim.
  • In some cases, migrating in page order can provide benefit. See Example 10.11 below.

Using GsBitmaps to manage memory for large result sets

For large result sets, it may be helpful to use Repository >> allInstances:, which returns one or more instances of GsBitmap. A GsBitmap uses heap memory, not object memory, and the result can be arbitrarily large.

Once you have the GsBitmap or GsBitmaps, you may enumerate them using do:, or retrieve objects from them using methods such as removeCount:, and perform the migration. For more on how to use GsBitmaps, see the section GsBitmap.

While a GsBitmap cannot be committed, it can be saved to a file and reloaded later. However, if any of the objects becomes dereferenced after the file is written and is then garbage collected, this cannot be detected. When you read in the GsBitmap file, the oopNumber may then reference an entirely different object. It is important to read and process the file as soon as possible after it is created.

Tuning system resource use when finding instances

Repository-wide scans such as the ones described above use a multi-threaded scan that can be tuned to use more or less resources of the system, thereby impacting performance of anything else running on this system to a greater or lesser degree.

The regular methods use a conservative amount of system resources, while the "fast" variants allow the scan to complete faster and use a greater percentage of system resources. If you are performing a scan in an offline or single-user system, the fast variants may be more appropriate. It is also possible to tune the scan to use fewer resources.

For details on tuning the multi-threaded scan, see the System Administration Guide.

Committing the migration in chunks

You will, of course, always commit or abort immediately before starting the migration and commit at the end, to ensure all your migrations are committed successfully.

For larger migrations, you will likely need to perform periodic commits to avoid running out of memory, since the migrated objects must be kept in memory until they are committed.

Example 10.10 brings instances into memory and commits in chunks of 10000 objects. The appropriate chunk size will depend on your memory and result set size; much larger chunks may be more appropriate.

Example 10.10 Migration using GsBitmap in chunks

| searchResults |
searchResults := (SystemRepository fastAllInstances: { Animal } ).
searchResults do: [:entry | | bm |
	bm := entry last.
	[bm isEmpty] whileFalse: 
		[ | chunk |
		chunk := bm removeCount: 10000.
		entry first migrateInstances: chunk to: NewAnimal.
		System commitTransaction 
			ifFalse: [self error: 'commit failed'].
		]
	]
 

Migrating instances in Page Order

For the most efficient migration of large sets of objects of multiple classes, you should perform the migration in page order—the same order as the objects are stored on disk. This allows multiple objects of several different classes on the same page in the repository to be migrated at the same time.

If the repository is in active use, the objects will move from page to page (reclaim moves the live objects off of a page, in order to reclaim the page). If there has been movement such as this, then the page-order efficiency will be lost.

Getting the collection of objects to migrate in page order uses GsBitmap file protocol:

GsBitmap >> writeToFileInPageOrder: aFileName
GsBitmap >> readFromFile: aFileName withLimit: int startingAt: startIndex

For example, to migrate all instances of Animal in page order, in chunks of 2000:

Example 10.11 Page order migration

| searchResults bm startIndex limit |
searchResults := (SystemRepository allInstances: { Animal }) first.
searchResults last writeToFileInPageOrder: 'animal_instances.bm'.
limit := searchResults last size.
startIndex := 1.
Animal migrationDestination: NewAnimal.
[ startIndex <= limit ]  whileTrue: 
	[bm := GsBitmap new.
	(bm readFromFile: 'animal_instances.bm' withLimit: 2000 
		startingAt: startIndex).
	bm do: [:ea | ea migrate]. 
	startIndex := startIndex + 2000.
	System commitTransaction 
		ifFalse: [self error: 'commit failed'].
	].
 
