14. Performance and Optimization

GemStone Smalltalk includes several tools to help you tune your applications for faster performance.

Profiling Smalltalk Execution
Profiling tools that allow you to pinpoint the problem areas in your application code.

Clustering Objects for Faster Retrieval
How to cluster objects that are often accessed together so that many of them can be found in the same disk access.

Modifying Cache Sizes for Better Performance
How to increase or decrease the size of various caches in order to minimize disk access and storage reclamation.

Managing VM Memory
Issues to consider when managing temporary object memory, and techniques for diagnosing and addressing OutOfMemory conditions.

NotTranloggedGlobals
Optimize certain operations by avoiding writing tranlog entries.

Other Optimization Hints
Allow operations on large collections without using temporary object memory.

14.1 Profiling Smalltalk Execution

Many things affect performance, and cache size and disk access often have the largest impact on application performance. However, your GemStone Smalltalk code can also affect the speed of your application, and a number of tools can help you identify issues and optimize your code.

Time to execute a block

If you simply want to know how long a given block takes to return its value, you can use GemStone Smalltalk methods that execute the block and return the time it took.

CPU Time

The familiar method System class >> millisecondsToRun: takes a zero-argument block as its argument and returns the time in milliseconds required to evaluate the block.

topaz 1> run
System millisecondsToRun: [
   System performOnServer: 'ping -c1 gemtalksystems.com']
%

For microsecond resolution, use the parallel method microsecondsToRun:

topaz 1> run
System microsecondsToRun:  [
   System performOnServer: 'ping -c1 gemtalksystems.com']
%

Elapsed Time

Time class >> millisecondsElapsedTime: works similarly, but returns the elapsed (wall-clock) time rather than the CPU time.

topaz 1> run
Time millisecondsElapsedTime:  [
   System performOnServer: 'ping -c1 gemtalksystems.com']
%

For finer resolution, use Time class >> secondsElapsedTime:, which returns a float with system-dependent resolution. For example, to get a result in microseconds:

topaz 1> run
((Time secondsElapsedTime:  [
   System performOnServer: 'ping -c1 gemtalksystems.com']) * 1000000) asInteger
%
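The CPU and elapsed timing methods can report very different numbers for the same block. The following sketch reuses the ping command from the examples above (which may not be available or permitted on your server), evaluates the block once under each method, and returns both results in an Array. Because ping spends most of its time waiting on the network, you would typically expect the CPU figure to be much smaller than the elapsed figure; the exact values depend on your system. Note that the block runs twice.

```smalltalk
| pingBlock cpuMs elapsedMs |
"The block being measured; performOnServer: runs a shell command on the Gem's host"
pingBlock := [ System performOnServer: 'ping -c1 gemtalksystems.com' ].
"CPU time consumed while evaluating the block, in milliseconds"
cpuMs := System millisecondsToRun: pingBlock.
"Wall-clock time for a second evaluation of the same block, in milliseconds"
elapsedMs := Time millisecondsElapsedTime: pingBlock.
{ cpuMs . elapsedMs }
```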

ProfMonitor

The ProfMonitor class samples the methods that execute during a given block of code and analyzes the percentage of total execution time attributable to each method. When an instance starts profiling, it samples the method call stack at a specified interval for a specified period of time. When it is done, it collects the results and returns them as a string formatted as a report.

ProfMonitorTree is a subclass of ProfMonitor that, by default, returns one or more execution tree reports in addition to the reports generated by ProfMonitor. ProfMonitor can also generate these tree reports if you request them in its reports: argument.

Sample intervals

By default, ProfMonitor takes a sample every millisecond (1 ms). You can specify a different sampling interval using the instance methods interval: or intervalNs:, or class methods with these keywords. The interval: keyword specifies the interval in milliseconds, while intervalNs: specifies it in nanoseconds (a nanosecond is a billionth of a second). The minimum interval is 1000 nanoseconds.

It may be convenient to refer to Table 14.1 when determining the sample interval and reading the results:

Table 14.1 Subsecond time conversions
seconds (s)	milliseconds (ms)	microseconds (μs)	nanoseconds (ns)
1	1,000	1,000,000	1,000,000,000
	1	1,000	1,000,000
		1	1,000
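As a worked example of these conversions, a sample interval of 250 microseconds is 250 times 1,000, or 250,000 nanoseconds, comfortably above the 1,000 ns minimum. The following sketch passes that value to monitorBlock:intervalNs:, one of the convenience methods described later in this section; the 250 μs figure is arbitrary and chosen only for illustration.

```smalltalk
"Profile the block, sampling every 250 microseconds.
 intervalNs: expects nanoseconds: 250 us * 1000 ns/us = 250000 ns."
ProfMonitor
	monitorBlock: [ 100 timesRepeat: [ System myUserProfile dictionaryNames ] ]
	intervalNs: 250 * 1000
```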

Reporting limits

By default, ProfMonitor reports every method it found executing. It is often useful to report only the methods that appear most frequently, which reduces clutter in the results and lets you focus on what is taking the most time.

To limit the reporting results, set a lower limit using the instance method reportDownTo: or methods with the keyword downTo:. Methods sampled at or above the limit are included in the report.

These methods accept a limit that is either an Integer, which is an absolute number of samples, or a SmallDouble, which is a fraction of the total number of samples (for example, 0.2 for 20%).

For example, a downTo: of 50 includes every method that was sampled at least 50 times, whether the total number of samples was 100 or 1000. A downTo: of 0.50 includes methods that were sampled 50% of the time or more: with 100 total samples, that is 50 actual samples; with 1000 total samples, it is 500.
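The following sketch requests the same profile twice, once with an absolute limit and once with a relative limit, using monitorBlock:downTo: from the list of convenience methods later in this section. The block and the particular limits (5 samples, 5%) are arbitrary choices for illustration. It answers both report strings in an Array so you can compare which methods each limit keeps.

```smalltalk
| workBlock absoluteReport relativeReport |
workBlock := [ 200 timesRepeat: [ System myUserProfile dictionaryNames ] ].
"Integer limit: keep methods sampled at least 5 times, however many samples were taken"
absoluteReport := ProfMonitor monitorBlock: workBlock downTo: 5.
"SmallDouble limit: keep methods that appear in at least 5% of all samples"
relativeReport := ProfMonitor monitorBlock: workBlock downTo: 0.05.
{ absoluteReport . relativeReport }
```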

Reports

ProfMonitor returns its profiling results as a string containing up to six individual reports, each analyzing the raw profiling data in a different way. You can specify which reports to produce using methods with the reports: keyword. Specifying reports can also enable object creation tracking.

Available reports include:

#samples—sample counts report, labeled STATISTICAL SAMPLING RESULTS.

#stackSamples—stack sampling report, labeled STATISTICAL STACK SAMPLING RESULTS.

#senders—method senders report, labeled STATISTICAL METHOD SENDERS RESULTS.

#objCreation—object creation report, labeled OBJECT CREATION REPORT. Including this in the reports: argument enables object creation tracking.

#tree—method execution tree report, labeled STACK SAMPLING TREE RESULTS. Including this in the reports: argument causes ProfMonitorTree to be used for profiling.

#objCreationTree—object creation tree report, labeled OBJECT CREATION TREE REPORT. Including this in the reports: argument enables object creation tracking and causes ProfMonitorTree to be used for profiling.

The default reports depend on the class you send the profiling message to:

ProfMonitor defaults to { #samples . #stackSamples . #senders }
ProfMonitorTree defaults to { #samples . #stackSamples . #senders . #tree }
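For example, the following sketch asks ProfMonitor for only two reports: the flat sample counts and the object creation tree. Because #objCreationTree is in the reports: array, object creation tracking is enabled and ProfMonitorTree is used for the profile, as described above. The profiled block is the same one used in other examples in this section.

```smalltalk
"Request only two of the six reports.
 #objCreationTree turns on object creation tracking and selects ProfMonitorTree."
ProfMonitor
	monitorBlock: [ 100 timesRepeat: [ System myUserProfile dictionaryNames ] ]
	reports: { #samples . #objCreationTree }
```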

Temporary results file

ProfMonitor temporarily stores its raw data in a file with the default filename /tmp/gempid.tmp. You can specify a different filename using ProfMonitor’s instance creation method newWithFile: and its variants. This file is deleted by the block profiling methods, by profileOff, and by the reportAfterRun* methods. If the Gem that is executing the profile terminates abnormally, it may leave this file behind; you must delete such files manually.

Real vs. CPU time

Profiling works by sampling the stack at the interval specified by the interval: or intervalNs: argument. By default, this interval is measured in CPU time, which shows the relative performance of operations in terms of how much CPU time they use.

You can also profile based on real time, in which case a sample is taken each time the specified interval of real (wall-clock) time elapses. This can detect performance issues that are not caused by CPU execution, such as time spent in sleep:, and exposes the impact of disk access and other factors external to the executing code.

The sampling clock is one of the options you can set, using either the options: keyword of the convenience profiling methods or the instance method setOptions:. Include #real or #cpu in the options array.
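The following sketch samples on the real-time clock by passing #real to monitorBlock:intervalNs:options:, one of the convenience methods listed below. It profiles the ping command used earlier, which spends most of its time waiting on the network; with #real, that waiting should show up in the report, whereas with #cpu it would likely contribute few samples. The 1 ms interval (1,000,000 ns) is an arbitrary choice.

```smalltalk
"Sample every 1 ms of wall-clock time rather than CPU time,
 so time spent waiting for the external ping process is captured."
ProfMonitor
	monitorBlock: [ System performOnServer: 'ping -c1 gemtalksystems.com' ]
	intervalNs: 1000000
	options: { #real }
```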

Profiling Code

Convenience Profiling of a Block of Code

ProfMonitor provides several class methods that profile a block of code and report the results in a single message send.

The following profiling methods are available:

monitorBlock:

monitorBlock:reports:

monitorBlock:intervalNs:

monitorBlock:intervalNs:options:

monitorBlock:downTo:

monitorBlock:downTo:interval:

monitorBlock:downTo:intervalNs:

monitorBlock:downTo:intervalNs:options:

monitorBlock:downTo:intervalNs:reports:

The following defaults apply:

When no downTo: keyword is provided, the default is 1; each method sampled is reported.
When no interval: or intervalNs: keyword is provided, the default is 0.5ms (500 microseconds).
When no options: are provided, #cpu is used.
When no reports: are specified, the default is {#samples . #stackSamples . #senders} for ProfMonitor, and {#samples . #stackSamples . #senders . #tree} for ProfMonitorTree.

For example, to take a sample every millisecond and report only methods that were sampled at least 10 times:

ProfMonitor
	monitorBlock: [ 100 timesRepeat:
		[ System myUserProfile dictionaryNames ]]
	downTo: 10
	interval: 1

For a more detailed report, you could take a sample every 1/10 of a millisecond, an interval of 100000 nanoseconds. This creates many more samples, so to make the reporting limit easier to control, the following example uses a fraction and includes only methods whose number of samples was 20% or more of the total.

ProfMonitor
	monitorBlock: [ 100 timesRepeat:
		[ System myUserProfile dictionaryNames ]]
	downTo: 0.2
	intervalNs: 100000

These two reports give similar results, but because the second takes many more samples, random sampling error has less effect. The best sampling interval and report limit depend on the specific code you are profiling. You may need several iterations, starting with a coarse-grained profile and refining it in subsequent runs.

Background Profiling

To profile a block of code, the convenience profiling methods are sufficient. You can also start and stop profiling explicitly, which lets you profile an arbitrary sequence of GemStone Smalltalk statements.

To start and stop profiling, use the class method profileOn, which creates an instance of ProfMonitor and starts profiling; when you are done, send the instance method profileOff to stop profiling and report the results.

For example:

run
UserGlobals at: #myMonitor put: ProfMonitor profileOn.
%
run
100 timesRepeat: [ System myUserProfile dictionaryNames ].
%
run
(UserGlobals at: #myMonitor) profileOff.
%

Manual Profiling

You can also create and configure an instance of ProfMonitor yourself. To profile in this way, perform the following steps:

Step 1. Create an instance using ProfMonitor new, newWithFile:, newWithFile:interval:, or newWithFile:intervalNs:.

Step 2. Configure it as desired, using instance methods such as interval:, intervalNs:, setOptions:, and traceObjectCreation:.

Step 3. Start profiling using the instance method startMonitoring.

Step 4. Execute your code.

Step 5. Stop profiling using the instance method stopMonitoring.

Steps 3, 4, and 5 can also be done in one step using runBlock:.

Step 6. Gather the results and report them, using reportAfterRun or reportAfterRunDownTo:.

For example:

| aMonitor |
aMonitor := ProfMonitor newWithFile:
	'$GEMSTONE/data/profMon.dat'.
aMonitor interval: 2.
aMonitor setOptions: {#objCreation}.
aMonitor startMonitoring.
100 timesRepeat: [ System myUserProfile dictionaryNames ].
aMonitor stopMonitoring.
aMonitor reportAfterRun.
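Steps 3 through 5 can be collapsed with runBlock:, as noted above. The following sketch is a variant of the previous example that does so; it also samples on real time and limits the report to methods sampled at least 5 times. It uses ProfMonitor new, so the raw data goes to the default temporary file; the particular interval, option, and limit are arbitrary illustrative choices.

```smalltalk
| aMonitor |
"Steps 1 and 2: create and configure the monitor"
aMonitor := ProfMonitor new.
aMonitor intervalNs: 500000.          "0.5 ms"
aMonitor setOptions: { #real }.
"Steps 3 to 5: start monitoring, run the code, and stop monitoring"
aMonitor runBlock: [ 100 timesRepeat: [ System myUserProfile dictionaryNames ] ].
"Step 6: report only methods sampled at least 5 times"
aMonitor reportAfterRunDownTo: 5
```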

Saving a ProfMonitor for later analysis

ProfMonitor writes its raw data to a disk file. As long as that file is available, you can save the instance of ProfMonitor, and it will reopen the file to perform the analysis later or in a different session.

To ensure that the file is kept, use methods such as runBlock:, which do not automatically create the report and delete the file.

For example:

run
UserGlobals at: #aProfMon put:
   (ProfMonitor runBlock: [
	200 timesRepeat: [System myUserProfile dictionaryNames]
]).
%
commit
logout
login
run
aProfMon reportAfterRun
%

The Profile Report

The profiling methods discussed in the previous sections return a string formatted as a report. The following example shows a sample run and the resulting report.

Example 14.1

topaz 1> printit
ProfMonitor
        monitorBlock:[
                200 timesRepeat:[ System myUserProfile dictionaryNames] ]
	reports: { #samples . #stackSamples . #senders . #tree}
%

================

STATISTICAL SAMPLING RESULTS

elapsed CPU time:    90 ms

monitoring interval: 1.0 ms

report limit threshold: 2 hits / 2.2%

0 pageFaults  2061 objFaults  0 gcMs  824413 edenBytesUsed

 tally       %   class and method name

------   -----   --------------------------------------

    23   24.21   Array                     >> _at:

    22   23.16   IdentityDictionary        >> associationsDo:

    18   18.95   block in SymbolList       >> names

    18   18.95   AbstractDictionary        >> _at:

    11   11.58   block in AbstractDictionary >> associationsDetect:ifNone:

     2    2.11   Object                    >> _basicSize

     1    1.05   11 other methods

    95  100.00   Total

================

STATISTICAL STACK SAMPLING RESULTS

elapsed CPU time:    90 ms

monitoring interval: 1.0 ms

report limit threshold: 2 hits / 2.2%

0 pageFaults  2061 objFaults  0 gcMs  824413 edenBytesUsed

 total       %   class and method name

------   -----   --------------------------------------

    95  100.00   GsNMethod class           >> _gsReturnToC

    95  100.00   executed code

    95  100.00   ProfMonitor class         >> monitorBlock:downTo:

    95  100.00   ProfMonitor               >> monitorBlock:

    94   98.95   block in executed code

    94   98.95   UserProfile               >> dictionaryNames

    94   98.95   SymbolList                >> namesReport

    94   98.95   SymbolList                >> names

    94   98.95   AbstractDictionary        >> associationsDetect:ifNone:

    94   98.95   IdentityDictionary        >> associationsDo:

    29   30.53   block in AbstractDictionary >> associationsDetect:ifNone:

    23   24.21   Array                     >> _at:

    18   18.95   block in SymbolList       >> names

    18   18.95   AbstractDictionary        >> _at:

     2    2.11   Object                    >> _basicSize

     1    1.05   2 other methods

    95  100.00   Total

================

STATISTICAL METHOD SENDERS RESULTS

elapsed CPU time:    90 ms

monitoring interval: 1.0 ms

report limit threshold: 2 hits / 2.2%

     %       %                     Parent

  self  total   total  local  Method

  Time   Time      ms    %         Child

------ ------  ------  -----  -----------

=    0.0  100.0    90.0    0.0  GsNMethod class >> _gsReturnToC

                   90.0  100.0       executed code

-----------------------------------------------------

                   90.0  100.0       GsNMethod class >> _gsReturnToC

=    0.0  100.0    90.0    0.0  executed code

                   90.0  100.0       ProfMonitor class >> monitorBlock:downTo:

-----------------------------------------------------

                   90.0  100.0       executed code

=    0.0  100.0    90.0    0.0  ProfMonitor class >> monitorBlock:downTo:

                   90.0  100.0       ProfMonitor  >> monitorBlock:

-----------------------------------------------------

                   90.0  100.0       ProfMonitor class >> monitorBlock:downTo:

=    0.0  100.0    90.0    0.0  ProfMonitor  >> monitorBlock:

                   89.1   98.9       block in executed code

                    0.9    1.1       ProfMonitor  >> startMonitoring

-----------------------------------------------------

                   89.1  100.0       ProfMonitor  >> monitorBlock:

=    0.0   98.9    89.1    0.0  block in executed code

                   89.1  100.0       UserProfile  >> dictionaryNames

-----------------------------------------------------

                   89.1  100.0       block in executed code

=    0.0   98.9    89.1    0.0  UserProfile  >> dictionaryNames

                   89.1  100.0       SymbolList   >> namesReport

-----------------------------------------------------

                   89.1  100.0       UserProfile  >> dictionaryNames

=    0.0   98.9    89.1    0.0  SymbolList   >> namesReport

                   89.1  100.0       SymbolList   >> names

-----------------------------------------------------

                   89.1  100.0       SymbolList   >> namesReport

=    0.0   98.9    89.1    0.0  SymbolList   >> names

                   89.1  100.0       AbstractDictionary >> associationsDetect:ifNone:

-----------------------------------------------------

                   89.1  100.0       SymbolList   >> names

=    0.0   98.9    89.1    0.0  AbstractDictionary >> associationsDetect:ifNone:

                   89.1  100.0       IdentityDictionary >> associationsDo:

-----------------------------------------------------

                   89.1  100.0       AbstractDictionary >> associationsDetect:ifNone:

=   23.2   98.9    89.1   23.4  IdentityDictionary >> associationsDo:

                    1.9    2.1       Object       >> _basicSize

                   27.5   30.9       block in AbstractDictionary >> associationsDetect:ifNone:

                   21.8   24.5       Array        >> _at:

                   17.1   19.1       AbstractDictionary >> _at:

-----------------------------------------------------

                   27.5  100.0       IdentityDictionary >> associationsDo:

=   11.6   30.5    27.5   37.9  block in AbstractDictionary >> associationsDetect:ifNone:

                   17.1   62.1       block in SymbolList >> names

-----------------------------------------------------

                   21.8  100.0       IdentityDictionary >> associationsDo:

=   24.2   24.2    21.8  100.0  Array        >> _at:

-----------------------------------------------------

                   17.1  100.0       block in AbstractDictionary >> associationsDetect:ifNone:

=   18.9   18.9    17.1  100.0  block in SymbolList >> names

-----------------------------------------------------

                   17.1  100.0       IdentityDictionary >> associationsDo:

=   18.9   18.9    17.1  100.0  AbstractDictionary >> _at:

-----------------------------------------------------

                    1.9  100.0       IdentityDictionary >> associationsDo:

=    2.1    2.1     1.9  100.0  Object       >> _basicSize

-----------------------------------------------------

================

STACK SAMPLING TREE RESULTS

elapsed CPU time:    90 ms

monitoring interval: 1.0 ms

report limit threshold: 2 hits / 2.2%

 100.0% (95) executed code        [UndefinedObject]

   100.0% (95) ProfMonitor class    >> monitorBlock:downTo: [ProfMonitor class]

     100.0% (95) ProfMonitor          >> monitorBlock: [ProfMonitor]

       98.9% (94) block in executed code [ExecBlock0]

        |  98.9% (94) UserProfile          >> dictionaryNames

        |    98.9% (94) SymbolList           >> namesReport

        |      98.9% (94) SymbolList           >> names

        |        98.9% (94) AbstractDictionary >> associationsDetect:ifNone: [SymbolDictionary]

        |          98.9% (94) IdentityDictionary   >> associationsDo: [SymbolDictionary]

        |            30.5% (29) block in AbstractDictionary >> associationsDetect:ifNone: [ExecBlock1]

        |             |  18.9% (18) block in SymbolList  >> names [ExecBlock1]

        |            24.2% (23) Array                >> _at: [IdentityCollisionBucket]

        |            18.9% (18) AbstractDictionary   >> _at: [SymbolDictionary]

        |             2.1% (2) Object               >> _basicSize [IdentityCollisionBucket]

As you can see, the report has four sections, corresponding to the requested reports:

#samples: STATISTICAL SAMPLING RESULTS
#stackSamples: STATISTICAL STACK SAMPLING RESULTS
#senders: STATISTICAL METHOD SENDERS RESULTS
#tree: STACK SAMPLING TREE RESULTS

Each section covers the same set of methods that the profile monitor encountered when it sampled the execution stack every millisecond; the sections present different views of this data.

Keep in mind that these numbers are based on sampling. Depending on the size and number of samples, they may not exactly reflect the actual percentage of time spent in each method, and they will likely vary from run to run. Also, calls out to the operating system, to user actions, or to other C libraries distort the results for the invoking method.

Profiling Beyond Performance

Profiling as described so far focuses on the performance of a block of code. ProfMonitor provides additional options that let you track other activity that occurs alongside your code execution and can affect application performance. These are:

number of persistent and temporary objects created
number of object faults by this Gem
number of page faults by this Gem
space used in the temporary object "eden" space
time spent in in-memory garbage collection

To profile these attributes, use the profiling methods with the options: keyword or the instance method setOptions:, and specify the option you want to profile.

When profiling with these options, you get one or more standard time-based reports along with the profile of each specified attribute, so you can correlate the attribute data with basic performance over the same execution run.

Type of Profile	Options Keyword	Units	Default time profiling
Object creation	#objCreation		#cpu
Object faults	#objFaults	faults	#real
Page faults	#pageFaults	faults	#real
GC operations	#gcTime	milliseconds	#cpu
Temporary memory eden space	#edenUsage	bytes	#real

Object Creation Tracking

You can enable object creation tracking in any of these ways:

Include #objCreation in the setOptions: array.
Include #objCreation or #objCreationTree in the reports: array.
Use the ProfMonitor instance method traceObjectCreation:.

When object creation tracking is enabled, an additional section follows the standard report sections. It reports how many objects of each class were created, along with the call stacks that created them.

For example:

Example 14.2 Object creation report

OBJECT CREATION REPORT:

elapsed CPU time:    40 ms

monitoring interval: 2.0 ms

tally  class of created object

           call stack

------  -----------------------------------------

   600  String class

 - - -  - - - - - - - - - - - - - - - - - - - - -

            500  SmallInteger >> asString

              500  SymbolList   >> namesReport

                500  UserProfile  >> dictionaryNames

                  500  executed code

                    500  GsNMethod class >> _gsReturnToC

 - - -  - - - - - - - - - - - - - - - - - - - - -

            100  String class >> new

              100  SymbolList   >> namesReport

                100  UserProfile  >> dictionaryNames

                  100  executed code

                    100  GsNMethod class >> _gsReturnToC

------  -----------------------------------------

   100  Array class

 - - -  - - - - - - - - - - - - - - - - - - - - -

            100  SymbolList   >> names

              100  SymbolList   >> namesReport

                100  UserProfile  >> dictionaryNames

                  100  executed code

                    100  GsNMethod class >> _gsReturnToC

Memory Use Profiling

While object creation is the most important aspect of a Gem’s memory use to profile, ProfMonitor provides several other options that let you track how methods affect the Gem’s memory use.

Including one of the following keys in the setOptions: argument selects that kind of profile. Only one of these can be used at a time.

#objFaults—profiles object faults
#pageFaults—profiles page faults
#edenSpace—profiles eden space, the area of temporary object memory into which new temporary objects are put.
#gcTime—profiles the time spent in in-memory garbage collection, as tracked by the cache statistic TimeInScavenges.

By default, for object faults, page faults, and eden space, the sampling frequency is calculated in real time, rather than CPU time. This can be changed by also including #cpu in the setOptions: array. Garbage collection time is sampled by default in CPU time.

When using these options, the first two reports are expressed in the units of the profiled attribute, such as faults, rather than in milliseconds. The third report provides millisecond timing information so you can correlate it with the profiled data.
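The following sketch profiles page faults while forcing CPU-time sampling, by including both #pageFaults and #cpu in the setOptions: array. It uses the manual profiling steps described earlier; the choice of #pageFaults and the profiled block are illustrative only. Remember that only one of the memory-use keys may appear in the array.

```smalltalk
| aMonitor |
aMonitor := ProfMonitor new.
"One memory-use key (#pageFaults) plus #cpu to override its real-time default"
aMonitor setOptions: { #pageFaults . #cpu }.
aMonitor runBlock: [ 100 timesRepeat: [ System myUserProfile dictionaryNames ] ].
"The first two reports count page faults; the third shows milliseconds"
aMonitor reportAfterRun
```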

14.2 Clustering Objects for Faster Retrieval

As you’ve seen, GemStone ordinarily manages the placement of objects on disk automatically; you never need to worry about it. Occasionally, however, you might choose to group related objects on secondary storage so that GemStone can read all of the objects in the group with as few disk accesses as possible.

Because accessing the first object in such a group usually means the others will soon be needed, it makes sense to arrange those objects on the fewest possible disk pages. Placing objects in physically contiguous regions of the disk is the function of class Object’s clustering protocol. By clustering small groups of objects that are often accessed together, you can sometimes improve performance.

Clustering a group of objects packs them into disk pages, each page holding as many of the objects as possible. The objects are contiguous within a page, but pages are not necessarily contiguous on the disk.

Will Clustering Solve the Problem?

Clustering objects solves a specific problem: slow performance due to excessive disk access. However, disk access is not the only cause of poor performance, so you need to do some diagnosis to determine whether clustering will solve your problem. You can use GemStone’s VSD utility to find out how often your application accesses the disk. VSD lets you chart system statistics over time to better understand the performance of your system; see the VSD User’s Guide for more information on using VSD.

The following statistics are of interest:

PageReads — how many pages your session has read from the disk since the session began
PageWrites — how many pages your session has written to the disk since the session began

You can examine the values of these statistics before and after you commit each transaction to discover how many pages were read to perform a particular query, and how many disk accesses were required to commit the transaction.

It is tempting to ignore these issues until you experience a problem such as an extremely slow application, but if you track these statistics regularly (even if intermittently), you will have a better idea of what “normal” behavior is when a problem does arise.

Cluster Buckets

You can think of clustering as writing the receivers of clustering messages onto a stream of disk pages. When a page is filled, another is randomly chosen, and subsequent objects are written on the new page. A new page is ordinarily selected only when the previous page is filled or when a transaction ends. Sending the message cluster to objects in repeated transactions will, within the limits imposed by page capacity, place the receivers in adjacent disk locations. (Sending the message cluster to an object repeatedly within a transaction has no effect.)

The stream of disk pages used by cluster and its companion methods is called a bucket. GemStone captures this concept in the class ClusterBucket.

If you determine that clustering will improve your application’s performance, you can use instances of the class ClusterBucket to help. All objects assigned to the same instance of ClusterBucket are clustered together. When the objects are written, they are moved to contiguous locations on the same page if possible; otherwise, they are written to contiguous locations on several pages.

Once an object has been clustered into a particular bucket and committed, that bucket remains associated with the object until you specify otherwise. When the object is modified, it continues to cluster with the other objects in the same bucket, although it might move to another page within the same bucket.

Using Existing Cluster Buckets

By default, a global array called AllClusterBuckets defines seven instances of ClusterBucket. Each can be accessed by specifying its offset in the array. For example, the first instance, AllClusterBuckets at: 1, is the default bucket when you log in. It specifies an extentId of nil. This bucket is invariant—you cannot modify it.

The second, third, and seventh cluster buckets in the array also specify an extentId of nil. They can be used for whatever purposes you require and can all be modified.

The GemStone system makes use of the fourth, fifth, and sixth buckets of the array AllClusterBuckets:

AllClusterBuckets at: 4 is the bucket used to cluster the methods associated with kernel classes.
AllClusterBuckets at: 5 is the bucket used to cluster the strings that define source code for kernel classes.
AllClusterBuckets at: 6 is the bucket used to cluster other kernel objects such as globals.

You can determine how many cluster buckets are currently defined by executing:

System maxClusterBucket

A given cluster bucket’s offset in the array specifies its clusterId. A cluster bucket’s clusterId is an integer in the range of 1 to (System maxClusterBucket).

NOTE
For compatibility with previous versions of GemStone, you can use a clusterId as an argument to any keyword that takes an instance of ClusterBucket as an argument.

You can determine which cluster bucket is currently the system default by executing:

System currentClusterBucket

You can access all instances of cluster buckets in your system by executing:

ClusterBucket allInstances

You can change the current default cluster bucket by executing an expression of the form:

System clusterBucket: aClusterBucket
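A common pattern is to change the default bucket only while clustering a particular group of objects, then restore the previous default. The following sketch does this with the messages shown above. The global name #exampleRecord and the Array it holds are hypothetical stand-ins for your own persistent objects; bucket 2 is one of the predefined, modifiable buckets described earlier.

```smalltalk
| previousBucket record |
"Remember the current default so it can be restored afterward"
previousBucket := System currentClusterBucket.
"Make predefined bucket 2 the default for this session"
System clusterBucket: (AllClusterBuckets at: 2).
"Hypothetical persistent object, made reachable from UserGlobals"
record := Array with: 'Smith' copy with: 'Jones' copy.
UserGlobals at: #exampleRecord put: record.
"cluster assigns the receiver to the current default bucket"
record cluster.
"Commit so the object is written, then restore the original default"
System commitTransaction.
System clusterBucket: previousBucket
```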

Creating New Cluster Buckets

You are not limited to the predefined instances of ClusterBucket. You can create new instances of ClusterBucket with the simple expression ClusterBucket new.

This expression creates a new instance of ClusterBucket and adds it to the array AllClusterBuckets. You can then access the bucket in one of two ways. You can assign it a name:

UserGlobals at: #empClusterBucket put: (ClusterBucket new)

You could then refer to it in your application as empClusterBucket. Alternatively, you can use the offset into the array AllClusterBuckets. For example, if this is the first cluster bucket you have created, you could refer to it this way:

AllClusterBuckets at: 8

(Recall that the first seven elements of the array are predefined.)

You can determine the clusterId of a cluster bucket by sending it the message clusterId. For example:

empClusterBucket clusterId

You can access an instance of ClusterBucket with a specific clusterId by sending it the message bucketWithId:.

You can create and use as many cluster buckets as you need; up to thousands, if necessary.

NOTE
For best performance and disk space usage, use no more than 32 cluster buckets in a single session.
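The relationship between a bucket’s clusterId and its offset in AllClusterBuckets can be checked directly. The following sketch creates a bucket, registers it under the hypothetical global name #myClusterBucket, and compares it with the array element at its clusterId; based on the description above, the final expression should answer true. Committing makes the new bucket, and its entry in AllClusterBuckets, persistent.

```smalltalk
| newBucket id |
"Creating the bucket also adds it to AllClusterBuckets"
newBucket := ClusterBucket new.
"Hypothetical global name for this example"
UserGlobals at: #myClusterBucket put: newBucket.
"The clusterId is the bucket's offset in AllClusterBuckets"
id := newBucket clusterId.
System commitTransaction.
(AllClusterBuckets at: id) == newBucket
```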

Cluster Buckets and Concurrency

Cluster buckets are designed to minimize concurrency conflicts. Any number of users can cluster objects at the same time, using the same cluster bucket, without experiencing concurrency conflicts. Cluster buckets do not contain or reference the objects clustered on them; instead, each clustered object keeps track of its bucket. This design also avoids problems with authorization.

However, creating a new instance of ClusterBucket automatically adds it to the global array AllClusterBuckets. Adding an instance to AllClusterBuckets causes a concurrency conflict when more than one transaction tries to create new cluster buckets at the same time, because all of the transactions are trying to write the same array object.

To avoid concurrency conflicts, you should design your clustering when you design your application. Create all the instances of ClusterBucket you anticipate needing and commit them in one or a few transactions.

To facilitate this kind of design, GemStone lets you associate a description with each instance of ClusterBucket, so that you can communicate the intended use of a given cluster bucket to your fellow users. To do so, use the message description:. For example:

Example 14.3

UserGlobals at: #empClusterBucket put: (ClusterBucket new).
(UserGlobals at: #empClusterBucket) description:
	'Use this bucket for clustering employees and their instance variables.'

As you can see, the message description: takes a string of text as an argument.

Changing the attributes of a cluster bucket, such as its description or clusterId, writes that cluster bucket and thus can cause a concurrency conflict. Only change these attributes when necessary.

NOTE
For best performance and disk space usage as well as avoiding concurrency conflicts, create the required instances of ClusterBucket all at once, instead of on a per-transaction basis, and update their attributes infrequently.
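The following sketch applies that advice: it creates every bucket the application expects to need, describes each one, and commits them in a single transaction, so that AllClusterBuckets is written only once. The global name addrClusterBucket and both description strings are hypothetical. Temporary variables are used because a new global cannot be referenced by name in the same compiled code that first defines it.

```smalltalk
| empBucket addrBucket |
"Create all the buckets the application needs, up front."
empBucket := ClusterBucket new.
empBucket description: 'Employees and their Name objects'.
addrBucket := ClusterBucket new.
addrBucket description: 'Employee address strings'.
"Publish them under well-known names (addrClusterBucket is hypothetical)."
UserGlobals at: #empClusterBucket put: empBucket.
UserGlobals at: #addrClusterBucket put: addrBucket.
"A single commit writes AllClusterBuckets once for both new buckets."
System commitTransaction
```

Later sessions then refer to the buckets by name and leave their attributes alone, which keeps writes to the buckets themselves rare.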

Cluster Buckets and Indexing

Indexes on instances of subclasses of UnorderedCollection are created and modified using the cluster bucket associated with the specific collection, if any. To change the clustering of an indexed collection:

1. Remove its index.

2. Recluster the collection.

3. Re-create its index.
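The three steps might look like this in practice. This is a sketch only: myEmployees is assumed to be a persistent collection of Employees with an equality index on age, and the index messages removeEqualityIndexOn: and createEqualityIndexOn:withLastElementClass: are assumed from the indexing protocol described elsewhere in this guide.

```smalltalk
"myEmployees: hypothetical indexed collection of Employees."
"1. Remove its index."
myEmployees removeEqualityIndexOn: 'age'.
"2. Recluster the collection in the desired bucket."
myEmployees clusterInBucket: empClusterBucket.
"3. Re-create its index; the new index structures use the collection's bucket."
myEmployees createEqualityIndexOn: 'age' withLastElementClass: SmallInteger.
System commitTransaction
```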

Clustering Objects

Class Object defines several clustering methods. One, cluster, is simple and fundamental. Another, clusterDepthFirst, is more sophisticated: it attempts to order the receiver’s instance variables as well as writing the receiver itself.

The Basic Clustering Message

The basic clustering message defined by class Object is cluster. For example:

myObject cluster

This simplest clustering method assigns the receiver to the current default cluster bucket; it does not attempt to cluster the receiver’s instance variables. When the object is next written to disk, it will be clustered according to the attributes of the current default cluster bucket.

If you wish to cluster the instance variables of an object, you can define a special method to do so.

CAUTION
Do not redefine the method cluster in the class Object, because other methods rely on the default behavior of the cluster method. You can, however, define a cluster method for classes in your application if required.

Suppose, for example, that you defined class Name and class Employee as shown in Example 14.4.

Example 14.4

Object subclass: 'Name'
	instVarNames: #('first' 'middle' 'last')
	classVars: #( )
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.

Object subclass: 'Employee'
	instVarNames: #('name' 'job' 'age' 'address')
	classVars: #( )
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.

The following clustering method might be suitable for class Employee. (A more purely object-oriented approach would embed the information on clustering first, middle, and last names in the cluster method for Name, but such an approach does not exemplify the breadth-first clustering technique we wish to show here.)

Example 14.5

method: Employee
clusterBreadthFirst
	self cluster.
	name cluster.
	job cluster.
	address cluster.
	name first cluster.
	name middle cluster.
	name last cluster.
	^false

| Lurleen |
Lurleen := Employee new name: (Name new first: #Lurleen);
	job: 'busdriver'; age: 24; address: '540 E. Sixth'.
Lurleen clusterBreadthFirst

The elements of byte objects such as instances of String and Float are always clustered automatically. A string’s characters, for example, are always written contiguously within disk pages. Consequently, you need not send cluster to each element of each string stored in job or address; clustering the strings themselves is sufficient. Sending cluster to individual special objects (instances of SmallInteger, Character, Boolean, SmallDouble, or UndefinedObject) has no effect. Hence no clustering message is sent to age in the previous example.

After sending clusterBreadthFirst to an Employee, the Employee is clustered as follows:

anEmp aName job address first middle last

cluster returns a Boolean value. You can use that value to eliminate the possibility of infinite recursion when you’re clustering the variables of an object that can contain itself. Here are the rules that cluster follows in deciding what to return:

If the receiver has already been clustered during the current transaction or if the receiver is a special object, cluster declines to cluster the object and returns true to indicate that all of the necessary work has been done.
If the receiver is a byte object that has not been clustered in the current transaction, cluster writes it on a disk page and, as in the previous case, returns true to indicate that the clustering process is finished for that object.
If the receiver is a pointer object that has not been clustered in the current transaction, cluster writes the object and returns false to indicate that the receiver might have instance variables that could benefit from clustering.
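These return values make a simple recursion guard. The sketch below uses a hypothetical class Node whose instance variable next may eventually point back to an earlier node, forming a cycle. Because cluster answers true for a node already clustered in the current transaction, the walk stops instead of looping forever.

```smalltalk
Object subclass: 'Node'
	instVarNames: #('value' 'next')
	classVars: #()
	classInstVars: #()
	poolDictionaries: {}
	inDictionary: UserGlobals.

method: Node
next: aNode
	next := aNode

method: Node
clusterChain
	"Stop when cluster reports that no further work is needed."
	self cluster ifTrue: [^true].
	"nil is a special object and does not understand clusterChain."
	next isNil ifFalse: [next clusterChain].
	^false
```

With two nodes a and b linked in a cycle (a next: b. b next: a), sending a clusterChain clusters a, then b; when b’s recursive call reaches a again, a cluster answers true and the recursion ends.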

Depth-First Clustering

clusterDepthFirst differs from cluster only in one way: it traverses the tree representing its receiver’s instance variables (named, indexed, or unordered) in depth-first order, assigning each node to the current default cluster bucket as it is visited. That is, it writes the receiver’s first instance variable, then the first instance variable of that instance variable, then the first instance variable of that instance variable, and so on to the bottom of the tree. It then backs up and visits the nodes it missed before, repeating the process until the whole tree has been written.

After sending clusterDepthFirst to an Employee, the Employee is clustered as follows:

anEmp aName first middle last job address
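A sketch of depth-first clustering for one employee follows, reusing the Name and Employee classes and the accessor methods that Example 14.5 relies on. Because clustering only tags objects, and placement happens at the next successful commit, the employee is first made reachable from a persistent root; the global name Lurleen used here is an assumption for illustration.

```smalltalk
| anEmp |
"Build an employee (accessors as assumed by Example 14.5)."
anEmp := Employee new.
anEmp name: (Name new first: 'Lurleen'; yourself);
	job: 'busdriver';
	age: 24;
	address: '540 E. Sixth'.
"Make it reachable from a persistent root so the commit writes it."
UserGlobals at: #Lurleen put: anEmp.
"Tag anEmp, then its name and the name's parts, then job and address."
anEmp clusterDepthFirst.
"The tagged objects are placed on data pages when this commit succeeds."
System commitTransaction
```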

Assigning Cluster Buckets

Both cluster and clusterDepthFirst use the current default cluster bucket. If you wish to use a specific cluster bucket instead, you can use the method clusterInBucket:. For example, the following expression clusters aBagOfEmployees using the specific cluster bucket empClusterBucket:

aBagOfEmployees clusterInBucket: empClusterBucket

In order to determine the cluster bucket associated with a given object, you can send it the message clusterBucket. For example, after executing the example above, the following example would return the value shown below:

aBagOfEmployees clusterBucket

empClusterBucket

Clustering and Memory Use

Clustering tags objects in memory so that when the next successful commit occurs, the objects are clustered onto data pages according to the method specified. After an object has been clustered, it is considered to be “dirty”. If you cluster a large number of objects, you may need to increase temporary object memory to avoid running out of session memory. See Managing VM Memory.

Using Several Cluster Buckets

When you want to write a loop that clusters parts of each object in a group into separate pages, it is helpful to have multiple cluster buckets available.

Suppose that you had defined class Employee as in Example 14.4, together with a collection class SetOfEmployees to hold employees. Suppose, in addition, that you wanted a clustering method to write all employees contiguously and then write all employee addresses contiguously.

With only one cluster bucket at your disposal, you would need to define your clustering method as shown in Example 14.6. In this approach, each employee is fetched once for clustering, then fetched again in order to cluster the employee’s address.

Example 14.6

method: SetOfEmployees
clusterEmployees
	self do: [:n | n cluster].
	self do: [:n | n address cluster].

myEmployees clusterEmployees
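With two cluster buckets available, a single pass over the collection is enough: each employee is tagged for one bucket and its address for the other, so each employee is fetched only once. The method below is a sketch; the selector clusterEmployeesIn:addressesIn: and the global addrClusterBucket are hypothetical, and it relies only on clusterInBucket: as described in this section.

```smalltalk
method: SetOfEmployees
clusterEmployeesIn: empBucket addressesIn: addrBucket
	"Fetch each employee once: the employee goes to empBucket and
	 its address to addrBucket, so the two groups land on separate
	 pages at the next commit."
	self do: [:n |
		n clusterInBucket: empBucket.
		n address clusterInBucket: addrBucket]

myEmployees clusterEmployeesIn: empClusterBucket addressesIn: addrClusterBucket.
System commitTransaction
```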

Clustering Class Objects

Clustering provides the most benefit for small groups of objects that are often accessed together — for example, a class with its instance variables. Those instance variables of a class that describe the class’s variables are often accessed in a single operation, as are the instance variables that contain a class’s methods. Therefore, class Behavior defines the following special clustering methods for classes:

Table 14.2 Clustering Protocol
clusterBehavior	Clusters in depth-first order the parts of the receiver required for executing GemStone Smalltalk code (the receiver and its method dictionary).
clusterDescription	Clusters in depth-first order those instance variables in the receiver that describe the structure of the receiver’s instances. (Does not cluster the receiver itself.) The instance variables clustered are instVarNames, classVars, categories, and class histories.
clusterBehaviorExceptMethods: aCollectionOfMethodNames	This method can sometimes provide a better clustering of the receiving class and its method dictionary by omitting those methods that are seldom used. This omission allows frequently used methods to be packed more densely.

The code in Example 14.7 clusters class Employee’s structure-describing variables, then its class methods, and finally its instance methods.

Example 14.7

| behaviorBucket descriptionBucket |
behaviorBucket := AllClusterBuckets at: 4.
descriptionBucket := AllClusterBuckets at: 5.
System clusterBucket: descriptionBucket.
Employee clusterDescription.
System clusterBucket: behaviorBucket.
Employee class clusterBehavior.
Employee clusterBehavior.

The following clusters all of class Employee’s instance methods except for address and address:

Employee clusterBehaviorExceptMethods: #(#address #address:).

Maintaining Clusters

Once you have clustered certain objects, they do not necessarily stay clustered in the same way forever. If you modify objects in a clustered structure, each modified object is placed on a new page in the same cluster bucket. The performance benefit of clustering comes from objects sharing a page; because a cluster bucket can span many pages, objects in the same cluster bucket are not necessarily on the same page.

You may therefore wish to check an object’s location, especially if you suspect that such declustering is causing your application to run more slowly than it used to.

Determining an Object’s Location

To enable you to check your clustering methods for correctness, class Object defines the message page, which returns an integer identifying the disk page on which the receiver resides. For example:

anEmp page

Disk page identifiers are returned only for temporary use in examining the results of your custom clustering methods—they are not stable pointers to storage locations. The page on which an object is stored can change for several reasons, as discussed in the next section.

For special objects (instances of SmallInteger, Character, Boolean, SmallDouble, or UndefinedObject), the page number returned is 0.
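One way to use page while testing a clustering method is to count how many distinct pages an object and its parts occupy. In this sketch, emp is assumed to be a committed Employee that was clustered earlier; the UserGlobals key #Lurleen is hypothetical.

```smalltalk
| emp pages |
"emp: a committed, previously clustered Employee (hypothetical key)."
emp := UserGlobals at: #Lurleen.
pages := Set new.
pages add: emp page; add: emp name page; add: emp job page; add: emp address page.
"1 means all four objects share one disk page; a larger count
 suggests the group has been split across pages."
pages size
```

Because page identifiers are not stable, compare them only within a single check like this one, not across transactions.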

Why Do Objects Move?

The page on which an object is stored can change for any of the following reasons:

A clustering message is sent to the object or to another object on the same page.
The current transaction is aborted.
The object is modified.
Another object on the page with the object is modified.
The extent in which you requested the object be clustered had insufficient space.

As your application updates clustered objects, new values are placed on secondary storage using GemStone’s normal space allocation algorithms. When objects are moved, they are automatically reclustered within the same clusterId. If a specific clusterId was specified, it continues to be used; if not, the default clusterId is used.

If, for example, you replace the string at position 2 of the clustered array ProscribedWords, the replacement string is stored in a page separate from the one containing the original, although it will still be within the same clusterId. Therefore, it might be worthwhile to recluster often-modified collections occasionally to counter the effects of this fragmentation. You’ll probably need some experience with your application to determine how often the time required for reclustering is justified by the resulting performance enhancement.
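A periodic reclustering step for a collection such as ProscribedWords could look like the following sketch. It makes the collection’s current bucket the default, so that clusterDepthFirst keeps the array and its strings in that bucket, then commits. Note that System clusterBucket: changes the session’s default cluster bucket for subsequent clustering.

```smalltalk
"Use the bucket ProscribedWords already belongs to as the default."
System clusterBucket: ProscribedWords clusterBucket.
"Tag the array and, depth-first, its strings again."
ProscribedWords clusterDepthFirst.
"The objects are rewritten together when this commit succeeds."
System commitTransaction
```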

14.3 Modifying Cache Sizes for Better Performance

As code executes in GemStone, committed objects must be fetched from disk or from cache, and temporary objects must be managed. This is handled transparently by the GemStone repository monitor. The performance of your application can be affected both by the tuning of the caches, and the structure and usage patterns of your application.

GemStone Caches

GemStone uses four kinds of caches: temporary object space, the Gem private page cache, the Stone private page cache, and the shared page cache.

Two caches are associated with Gem processes: the temporary object space and the Gem private page cache. The other two caches (Stone private page cache and shared page cache) are associated with the Stone (although the Gem also makes use of the shared page cache).

Temporary Object Space

The temporary object space cache is used to store temporary objects created by your application. Each Gem session has a temporary object memory that is private to the Gem process and its corresponding session. When you fault persistent (committed) objects into your application, they are copied to temporary object memory.

Some of the temporary objects in the cache may ultimately become permanent and reside on the disk, but probably not all of them. Temporary objects that your application creates merely in order to do its work reside in temporary object space until they are no longer needed, when the Gem’s garbage collector reclaims the storage they use.

It is important to provide sufficient temporary object space. At the same time, you must design your application so that it does not create an unbounded number of reachable temporary objects. Temporary object memory must be large enough to accommodate the sum of live temporary objects and modified persistent objects. If that sum exceeds the allocated temporary object memory, the Gem can encounter an OutOfMemory condition and terminate.

The amount of memory allocated for temporary object space is primarily determined by the GEM_TEMPOBJ_CACHE_SIZE configuration option. You should increase this value for applications that create a large number of temporary objects — for example, applications that make heavy use of the reduced conflict classes or sessions performing a bulk load.

You will probably need to experiment somewhat before you determine the optimum size of the temporary object space for the application. The default of 10000 KB (about 10 MB) should be adequate for normal user sessions. For sessions that place a high demand on the temporary object cache, such as sessions performing an upgrade, you may wish to use 100000 KB (about 100 MB).

For a more exhaustive discussion of the issues involved in managing the size of temporary object memory, and a general discussion of garbage collection, see the “Garbage Collection” chapter of the System Administration Guide.

For details about how to set the size of GEM_TEMPOBJ_CACHE_SIZE in the Gem configuration file, see the “GemStone Configuration Options” appendix of the System Administration Guide.

Gem Private Page Cache

The Gem private page cache is used only to hold bitmap pages and shadow object table pages during commit processing. When you commit objects created by your application, they move directly from temporary object memory to the shared page cache.

The amount of memory allocated for the Gem private page cache is determined by the GEM_PRIVATE_PAGE_CACHE_KB configuration option. The default size is 1000 KB; the minimum is 128 KB; the maximum is 524288 KB.

NOTE
Under normal circumstances, you should not need to modify the default values of the Gem private page cache.

Stone Private Page Cache

The Stone private page cache is used to maintain lists of allocated object identifiers and pages for each active Gem process that the Stone is monitoring. The single active Stone process per repository has one Stone private page cache.

The amount of memory allocated for the Stone private page cache is determined by the STN_PRIVATE_PAGE_CACHE_KB configuration option. The default size is 2000 KB; the minimum is 128 KB; the maximum is 524288 KB.

NOTE
Under normal circumstances, you should not need to modify the default values of the Stone private page cache.

Shared Page Cache

The shared page cache is used to hold the object table—a structure containing pointers to all the objects in the repository—and copies of the disk pages that hold the objects with which users are presently working. The system administrator must enable the shared page cache in the configuration file for a host. The single active Stone process per repository has one shared page cache per host machine. The shared page cache is automatically enabled for the host machine on which the Stone process is running.

Whenever the Gem needs to read an object, it reads the entire page on which the object resides into the shared page cache. If the Gem then needs to access another object, GemStone first checks whether that object is already in the shared page cache. If it is, no further disk access is necessary. If it is not, the Gem reads the page containing that object into the shared page cache.

For acceptable performance, the shared page cache should be large enough to hold the entire object table. To get the best possible performance, make the shared page cache as large as possible.

The amount of memory allocated for the shared page cache is determined by the SHR_PAGE_CACHE_SIZE_KB configuration parameter (in the Stone configuration file). The default size is 75000 KB; the minimum is 512 KB; the maximum is limited by the available system memory and the kernel configuration.

For details about how to set the size of SHR_PAGE_CACHE_SIZE_KB in the Stone configuration file, see the System Administration Guide (Appendix A, GemStone Configuration Options).

By default, only the system administrator is privileged to set this parameter, which is set at repository startup. However, if a Gem session is running remotely and it is the first Gem session on its host, its configuration file sets the size of the shared page cache on that host.

Getting Rid of Non-Persistent Objects

As discussed in Chapter 4, you can create instances of KeySoftValueDictionary to enable your session to free up temporary object memory as needed. The entries in a KeySoftValueDictionary are non-persistent; that is, they cannot be committed to the database. When there is a demand on memory, you can configure GemStone to clear non-persistent entries as needed during a VM mark/sweep garbage collection.

The action taken during mark/sweep depends on two configuration parameters, GEM_SOFTREF_CLEANUP_PERCENT_MEM and GEM_KEEP_MIN_SOFTREFS, along with startingMemUsed, the percentage of temporary object memory in use at the beginning of the VM mark/sweep.

Case 1: GEM_SOFTREF_CLEANUP_PERCENT_MEM < startingMemUsed < 80%

If startingMemUsed is greater than GEM_SOFTREF_CLEANUP_PERCENT_MEM but less than 80%, the VM mark/sweep will attempt to clear an internally determined number of least recently used SoftReferences (non-persistent entries). Under rare circumstances, you might choose to specify a minimum number (GEM_KEEP_MIN_SOFTREFS) that will not be cleared.

Case 2: startingMemUsed < GEM_SOFTREF_CLEANUP_PERCENT_MEM

No SoftReferences will be cleared.

Case 3: startingMemUsed > 80%

VM mark/sweep will attempt to clear all SoftReferences.
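Whichever case applies, code that reads a KeySoftValueDictionary must expect an entry to disappear after any mark/sweep. A minimal sketch of a recompute-on-miss cache follows; the key #report and the String standing in for an expensive value are hypothetical, and at:ifAbsent: is assumed to be available through the dictionary protocol KeySoftValueDictionary inherits.

```smalltalk
| cache report |
cache := KeySoftValueDictionary new.
"Populate the cache; the entry is non-persistent and may be cleared."
cache at: #report put: (String new: 100000).
"Later: if a mark/sweep cleared the entry, recompute and re-cache it."
report := cache at: #report ifAbsent: [
	cache at: #report put: (String new: 100000)].
report size
```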

For more about these and other configuration parameters, see the “GemStone Configuration Options” appendix of the System Administration Guide.

Several cache statistics may also be of interest: NumSoftRefsCleared, NumLiveSoftRefs, and NumNonNilSoftRefs. For more about these statistics, see the “Monitoring GemStone” chapter of the System Administration Guide.

14.4 Managing VM Memory

As mentioned earlier in this chapter, each Gem session has a temporary object memory that is private to the Gem process and its corresponding session. When you fault persistent (committed) objects into your application, they are copied to temporary object memory.

It is important to provide sufficient temporary object space. At the same time, you must design your application so that it does not create an unbounded number of reachable temporary objects. Temporary object memory must be large enough to accommodate the sum of live temporary objects and modified persistent objects. If that sum exceeds the allocated temporary object memory, the Gem can encounter an OutOfMemory condition and terminate.

There is a limit on how large a transaction can be, either in terms of the total size of previously committed objects that are modified, or of the total size of temporary objects that are transitively reachable from modified committed objects. For large applications, you may need to commit incrementally, rather than waiting to commit all at once.

The remainder of this section discusses issues to consider when allocating and managing temporary object memory, and presents techniques for diagnosing and addressing OutOfMemory conditions. It assumes you have read the general discussion of memory organization in the “Managing Memory” chapter of the System Administration Guide.

Large Working Set

If your application requires a large working set of committed objects in memory, you can configure the pom area to be large (compared to other object spaces) without having an adverse effect on in-memory garbage collection. To do this, increase the setting for the configuration parameter GEM_TEMPOBJ_POMGEN_SIZE. For details on how to do this, see the System Administration Guide, Appendix A.

Class Hierarchy

If your application references a very deep class hierarchy, you may need to adjust the memory configuration accordingly to allow a larger temporary object memory. When an object is in memory, its class is also faulted into the perm area of temporary object memory, along with the class’s superclass, extending up through the hierarchy all the way to Object. While this approach provides for significantly faster message lookups, it also increases the consumption of temporary object memory.

For example, the default configuration provides 1 MB for the perm area. Each class consumes about 400 bytes (including the metaclass). Thus, the default configuration can accommodate about 2500 classes in memory at once.

UserAction Considerations

NOTE
Do not compact the code region of temporary object memory while a UserAction is executing.

When using GemBuilder for C, you may encounter an OutOfMemory error within a UserAction in either of the following situations:

The UserAction faults in a large number of methods via GciPerform.
The UserAction compiles a large number of anonymous methods via GciExecute.

Exported Set

The Export Set is a collection of objects for which the Gem process has handed out its OOP to one of the interfaces (GCI, GBS, objects returned from topaz run commands). Objects in the export set are prevented from being garbage collected by any of the garbage collection processes (that is, by a Gem’s in-memory collection of temporary objects, markForCollection, or the epoch garbage collection). The export set is used to guarantee referential integrity for objects only referenced by an application, that is, objects that have no references to them within the Gem.

The application program is responsible for timely removal of objects from the export set. The contents of the export set can be examined using hidden set methods defined in class System.

In general, the smaller the size of the export set, the better the performance is likely to be. There are several reasons for this relationship. The export set is one of the root sets used for garbage collection. The larger the export set, the more likely it is that objects that would otherwise be considered garbage are being retained. One threshold for performance is when the size of the export set exceeds 16K objects. When its size is smaller than 16K objects, the export set is a small object in object memory. When its size is larger than 16K, the export set becomes a large object, implemented as a tree of small objects in memory.

The configuration parameter #GemDropCommittedExportedObjs allows committed objects to be removed from the export set when memory is low, at the expense of having to re-fault these objects when they are needed.

Debugging out of memory errors

If you find that your application is running out of temporary memory, you can set several GemStone environment variables to help you identify which parts of your application are triggering OutOfMemory conditions. These environment variables allow you to obtain multiple Smalltalk stack printouts and other useful information before your application runs out of temporary object memory, such as how many objects of each class are in temporary memory.

Details on these environment variables are provided in the System Administration Guide, and they are listed in the $GEMSTONE/sys/gemnetdebug file, which is a debug version of the gemnetobject script. gemnetdebug enables some, but not all, available memory related environment variables. By using gemnetdebug instead of gemnetobject in your RPC login parameters, you can generate memory logging information. For help with analysis, contact GemTalk Technical Support.

Once you’ve identified the causes of the problem, you can modify your application to reduce the demand on memory, or adjust your GemStone configuration options to provide a larger amount of memory.

Signal on low memory condition

When a session runs low on temporary object memory, there are actions it can take to avoid running out of memory altogether; for example, the session may commit or abort, or discard temporary objects. By enabling handling for the notification AlmostOutOfMemory, an application can take appropriate action before memory is entirely full. This notification is asynchronous, so it may be received at any time that memory use exceeds the threshold at the end of an in-memory mark/sweep. However, if the session is executing a user action or performing index maintenance, the notification is deferred and generated when execution returns.

After an AlmostOutOfMemory notification is delivered, handling is automatically disabled, so it must be re-enabled each time the notification occurs. To enable handling, execute either of the following:

System enableAlmostOutOfMemoryError

System signalAlmostOutOfMemoryThreshold: 0

When handling is enabled, the default threshold is 85%. You can find out the current threshold using:

System almostOutOfMemoryErrorThreshold

This will return -1 if handling is not enabled.

The threshold can be modified using:

System class >> signalAlmostOutOfMemoryThreshold: anInteger
Controls the generation of an error when the session's temporary object memory is almost full. Calling this method with 0 < anInteger < 100 sets the threshold to the given value and enables generation of the error.

Calling this method with an argument of -1 disables generation of the error and resets the threshold to the default.

Calling this method with an argument of 0 enables the generation of the error and does not change the threshold.
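Putting these messages together, a bulk-loading loop can commit whenever memory use crosses a chosen threshold. This is a sketch only: BulkData is a hypothetical persistent collection, 75 is an arbitrary threshold, and a production version would check the Boolean result of commitTransaction rather than ignore it.

```smalltalk
| target |
"target: a hypothetical persistent collection stored as BulkData."
target := UserGlobals at: #BulkData.
"Enable the notification at 75% of temporary object memory."
System signalAlmostOutOfMemoryThreshold: 75.
[1 to: 1000000 do: [:i | target add: i printString]]
	on: AlmostOutOfMemory
	do: [:ex |
		"Commit the work so far; committed objects no longer count
		 as modified persistent objects."
		System commitTransaction.
		"Delivery disabled handling; 0 re-enables it, keeping 75%."
		System signalAlmostOutOfMemoryThreshold: 0.
		ex resume].
System commitTransaction.
"Disable the notification and reset the threshold to the default."
System signalAlmostOutOfMemoryThreshold: -1.
System almostOutOfMemoryErrorThreshold
```

Since handling is disabled at the end, the final expression should answer -1.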

Methods for Computing Temporary Object Space

To find out how much space is left in the old area of temporary memory, the following methods in class System (category Performance Monitoring) are provided:

System _tempObjSpaceUsed
Returns the approximate number of bytes of temporary object memory being used to store objects.

System _tempObjSpaceMax
Returns the size of the old area of temporary object memory; that is, the approximate maximum number of bytes of temporary object memory that are usable for storing objects. When the old area fills up, the Gem process may terminate with an OutOfMemory error.

System _tempObjSpacePercentUsed
Returns the approximate percentage of temporary object memory that is being used to store temporary objects. This is equivalent to the expression:

(System _tempObjSpaceUsed * 100) // System _tempObjSpaceMax.

Note that it is possible for the result to be slightly greater than 100%. Such a result indicates that temporary memory is almost completely full.

To measure the size of complex objects, you might create a known object graph containing typical instances of the classes in question, and then execute the following methods at various points in your test code to get memory usage information:

CAUTION
Do not execute this sequence in your production code!

Example 14.8

System _vmMarkSweep.

System _tempObjSpaceUsed.
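A slightly fuller version of the same measurement takes a reading before and after building a known object graph and divides the difference by the number of objects created. The choice of 1000 plain Employee instances is a hypothetical test graph, and the same caution about production code applies.

```smalltalk
| before after employees |
"Mark/sweep first so the baseline excludes unreachable garbage."
System _vmMarkSweep.
before := System _tempObjSpaceUsed.
"Build a known object graph: 1000 test employees held in an Array."
employees := (1 to: 1000) collect: [:i | Employee new].
System _vmMarkSweep.
after := System _tempObjSpaceUsed.
"Approximate bytes per Employee, including its share of the Array."
(after - before) // employees size
```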

Statistics for monitoring memory use

You can monitor the following statistics to better understand your application’s memory usage. The statistics are grouped here with related statistics, rather than alphabetically.

Table 14.3 Statistics Related to the Objects Copied into Memory
ObjectsRead	The number of committed objects copied into VM memory since the start of the session.
ClassesRead	The number of classes copied into the perm generation area of VM memory since the start of the session.
MethodsRead	The number of GsNMethods copied into the code generation area of VM memory since the start of the session.
ObjectsRefreshed	The number of committed objects in VM memory that have been re-read from the shared page cache after transaction boundaries, since the start of the session.

Table 14.4 Statistics Related to Mark/Sweeps and Scavenges
NumberOfMarkSweeps	The number of mark/sweeps executed by the in-memory garbage collector.
NumberOfScavenges	The number of scavenges executed by the in-memory garbage collector. Only updated at mark/sweeps.
TimeInMarkSweep	The real time (in milliseconds) spent in in-memory garbage collector mark/sweeps.
TimeInScavenge	The real time (in milliseconds) spent in in-memory garbage collector scavenges. Only updated at mark/sweeps.

Table 14.5 Statistics Related to Object Memory Regions
CodeCacheSizeBytes	Total size in bytes of copies of GsNMethods that are in the code generation area and ready for execution, as of the end of mark/sweep.
NewGenSizeBytes	The number of used bytes in the new generation at the end of mark/sweep.
OldGenSizeBytes	The number of used bytes in the old generation at the end of mark/sweep.
PomGenSizeBytes	The number of used bytes in the pom generation area at the end of mark/sweep. Pom generation holds clean copies of committed objects.
PermGenSizeBytes	The number of used bytes in the perm generation area at the end of mark/sweep. Perm generation holds copies of Classes.
MeSpaceUsedBytes	The number of bytes occupied by the remembered set (remSet), in-memory oopMap, and in-use map entries.
MeSpaceAllocatedBytes	The number of bytes allocated for the remembered set (remSet), in-memory oopMap, and map entries.

Table 14.6 Statistics Related to Stubbing
NumRefsStubbedMarkSweep	The number of in-memory references that were stubbed (converted to a POM objectId) by in-memory mark/sweep.
NumRefsStubbedScavenge	The number of in-memory references that were stubbed (converted to a POM objectId) by in-memory scavenge.

Table 14.7 Statistics Related to Garbage Collection
CodeGenGcCount	The number of times the code generation area has been garbage collected.
PomGenScavCount	The number of times scavenge has thrown away the oldest pom generation space.

Symbol Creation

When a new symbol is needed (which may just be from evaluating a code snippet that includes a symbol), it is created by the SymbolGem. The SymbolGem process runs in the background and is responsible for creating all new Symbols, based on session requests that are managed by the Stone. You can examine the following statistics to track the effect of symbol creation activity on temporary object memory.

Table 14.8 Statistics Related to Symbol Creation
NewSymbolRequests	The number of symbol creation requests by a session to the symbol creation gem.
NewSymbolsCount	The number of symbol creation requests by a session that did not resolve to an already committed symbol.
TimeWaitingForSymbols	Cumulative elapsed time (in milliseconds) waiting for symbol creation requests to be processed.

Table 14.9 Other Statistics
ExportedSetSize	The number of objects in the ExportSet (Exported Set).
TrackedSetSize	The number of objects in the Tracked Objects Set, as defined by the GCI. You can use GciReleaseObjs to remove objects from the Tracked Objects Set. For details, see the GemStone/S 64 Bit GemBuilder for C manual.
DirtyListSize	The number of modified committed objects in the temporary object memory dirty list.
WorkingSetSize	The number of objects in memory that have an objectId assigned to them; approximately the number of committed objects that have been faulted in plus the number that have been created and committed.
TempObjSpacePercentUsed	The approximate percentage of temporary object memory for this session that is being used to store temporary objects. If this value approaches or exceeds 100%, sessions will probably encounter an OutOfMemory error. This statistic is only updated at the end of a mark/sweep operation. Compare with System _tempObjSpacePercentUsed, which is computed whenever the primitive is executed.
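As a minimal sketch, the following topaz snippet samples System _tempObjSpacePercentUsed directly. Unlike the TempObjSpacePercentUsed statistic, which is only updated at the end of a mark/sweep, this value is computed each time the expression is executed. The 90 percent threshold and the warning text are illustrative assumptions, not GemStone defaults.

```smalltalk
run
| pct |
"Sample temporary object memory use on demand.
 The threshold of 90 is an illustrative value, not a GemStone default."
pct := System _tempObjSpacePercentUsed.
pct > 90
  ifTrue: ['warning: temporary object memory is ' , pct printString , '% used']
  ifFalse: [pct]
%
```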

14.5 NotTranloggedGlobals

All changes to the repository are written to the transaction logs when the transaction is committed. This ensures that the changes are recoverable in case of unexpected shutdown, and allows them to be applied to warm standby copies of the repository. However, you may have data that you commit changes to, but that does not need to be recovered in case of system crash or corruption. For this kind of data, you can avoid both the overhead of writing each change to the transaction logs and the disk space the transaction logs would need to archive large amounts of non-critical data.

Objects that are intended to be persistent, but whose changes should not be recorded in the transaction logs, must not be referenced by regular persistent objects. Instead, they should be referenced from the variable NotTranloggedGlobals, which is in the Globals SymbolDictionary.

For example:

NotTranloggedGlobals at: #perfLog put: PerformanceLogger new.

If an object in NotTranloggedGlobals is also reachable from AllUsers (the regular root for all persistent objects), the commit fails with an error.

On system crash or unexpected shutdown, the state of the objects reachable from NotTranloggedGlobals will be as recorded in the most recent checkpoint prior to the shutdown; changes made after that checkpoint are lost. If the repository is restored from backup and transaction logs are applied, the state of these objects will be as of the time the backup was taken; all changes made since the backup was taken are lost.
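A minimal topaz sketch of this pattern follows, assuming your session is permitted to modify NotTranloggedGlobals. An OrderedCollection stands in for the hypothetical PerformanceLogger class used earlier, and #perfLog is only an example key. Each commit makes the log persistent, but the changes are not written to the transaction logs.

```smalltalk
run
"Create the non-tranlogged root. Do NOT also store this object anywhere
 reachable from AllUsers (for example, in UserGlobals); that commit would fail."
NotTranloggedGlobals at: #perfLog put: OrderedCollection new.
System commitTransaction
%
run
"Later updates commit as usual but are not recorded in the tranlogs;
 after a crash they revert to the state of the last checkpoint."
(NotTranloggedGlobals at: #perfLog) add: DateTime now printString.
System commitTransaction
%
```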

14.6 Other Optimization Hints

While optimization is an application-specific problem, we can provide a few ideas for improving application performance:

Arrays tend to be faster than sets. If you do not need the particular semantics that a set affords, use an array instead.
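One way to check whether this holds for your own data is to time both alternatives with System millisecondsToRun:. The sketch below fills an Array and a Set with the same SmallIntegers; the element count is arbitrary, and absolute times will vary from system to system.

```smalltalk
run
| n arrayMs setMs |
n := 100000.
"Fill a presized Array by index"
arrayMs := System millisecondsToRun: [
  | arr |
  arr := Array new: n.
  1 to: n do: [:i | arr at: i put: i]].
"Add the same SmallIntegers to a Set, which must hash each element"
setMs := System millisecondsToRun: [
  | set |
  set := Set new.
  1 to: n do: [:i | set add: i]].
"Answer both timings, in milliseconds"
Array with: arrayMs with: setMs
%
```

This compares only building the collections; for membership tests with includes: on large collections, a Set is usually faster than an Array.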
The following Number classes are listed in decreasing order of performance:

SmallInteger
SmallDouble
Float
LargeInteger
ScaledDecimal
DecimalFloat

Avoid coercing integers to floating point numbers. Although GemStone Smalltalk can easily handle mixing integers and floating point numbers in computations, the coercion required can be time-consuming.
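The sketch below illustrates the kind of mixed arithmetic this hint refers to: the first loop multiplies a floating point value by the integer literal 2, while the second uses the floating point literal 2.0, so both operands already have the same kind of representation. The loop count is arbitrary, and how large the difference is will depend on the classes involved.

```smalltalk
run
| x mixedMs floatMs |
x := 1.5.
"Mixed: floating point * SmallInteger, the pattern this hint warns about"
mixedMs := System millisecondsToRun: [
  | sum |
  sum := 0.0.
  1 to: 1000000 do: [:i | sum := sum + (x * 2)]].
"Homogeneous: both operands are already floating point"
floatMs := System millisecondsToRun: [
  | sum |
  sum := 0.0.
  1 to: 1000000 do: [:i | sum := sum + (x * 2.0)]].
Array with: mixedMs with: floatMs
%
```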
If you create an instance of a Dictionary class (or subclass) that you intend to load with values later, create it to be approximately the final required size in order to avoid rehashing, which can significantly slow performance.
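For example, if you expect a Dictionary to end up with roughly 100,000 entries, you can pass that estimate to new: when you create it. The sketch times growing a default-sized Dictionary against a presized one; the size is an illustrative assumption.

```smalltalk
run
| n growingMs presizedMs |
n := 100000.
"Default size: the Dictionary is rehashed repeatedly as it grows"
growingMs := System millisecondsToRun: [
  | d |
  d := Dictionary new.
  1 to: n do: [:i | d at: i put: i printString]].
"Presized to the expected final size, so growth-driven rehashing is avoided"
presizedMs := System millisecondsToRun: [
  | d |
  d := Dictionary new: n.
  1 to: n do: [:i | d at: i put: i printString]].
Array with: growingMs with: presizedMs
%
```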
Prefer methods that invoke primitives, if possible, or methods that cause primitives to be invoked after fewer intermediate message-sends. (For information on writing your own primitive methods, see the GemBuilder for C manual.)
Prefer message-sends over path notation, where possible. (This is not possible in indexed queries, however.)
Prefer simpler blocks to more complex blocks. The most efficient blocks refer only to one or more literals, global variables, pool variables, class variables, local block arguments, or block temporaries; they also do not include a return statement.

Less efficient blocks include a return statement, or refer to one or more of the pseudovariables super or self, instance variables of self, arguments to the enclosing method, temporary variables of the enclosing method, block arguments, or block temporaries of an enclosing block.

The least efficient blocks enclose a less efficient block of the kind described in the above paragraph.
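To make these categories concrete, the following snippet evaluates one block of each kind over the same literal Array. The variable names are illustrative; in a topaz run, data and factor are temporaries of the enclosing code, so a block that refers to them falls into the less efficient category.

```smalltalk
run
| data factor simple lessEfficient leastEfficient |
data := #(3 1 2).
factor := 10.
"Most efficient: the block refers only to its own argument and a literal"
simple := data collect: [:x | x * 2].
"Less efficient: the block refers to factor, a temporary of the enclosing code"
lessEfficient := data collect: [:x | x * factor].
"Least efficient: the outer block encloses a less efficient block, because
 the inner block refers to y, an argument of the enclosing block"
leastEfficient := data collect: [:y |
  data inject: 0 into: [:sum :x | sum + (x * y)]].
Array with: simple with: lessEfficient with: leastEfficient
%
```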

Blocks provided as arguments to the methods ifTrue:, ifFalse:, ifTrue:ifFalse:, ifFalse:ifTrue:, whileFalse:, and whileTrue: are specially optimized. Unless they contain block temporary variables, you need not count them when counting levels of block nesting.

Use optimized selectors whenever possible. For example, iterations using to:do: are specially optimized; using to:do: instead of another collection iteration method avoids a message send and a level of block nesting, possibly avoiding the cost of using a block altogether. For a list of optimized selectors, see Reserved and Optimized Selectors.

In the same way, for fastest performance in iterating over Collections, use the to:do: or to:by:do: methods to iterate, rather than do: or other collection iteration methods.
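As an illustration, the sketch below sums the same Array once with do: and once with to:do:, timing each loop with System millisecondsToRun:. The Array size is arbitrary, and the actual difference will depend on what the loop body does.

```smalltalk
run
| n data doMs toDoMs |
n := 1000000.
data := Array new: n.
1 to: n do: [:i | data at: i put: i].
"Collection iteration: do: is a message send that evaluates a block per element"
doMs := System millisecondsToRun: [
  | sum |
  sum := 0.
  data do: [:each | sum := sum + each]].
"Optimized selector: to:do: avoids the message send and a level of block nesting"
toDoMs := System millisecondsToRun: [
  | sum |
  sum := 0.
  1 to: data size do: [:i | sum := sum + (data at: i)]].
Array with: doMs with: toDoMs
%
```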

Append to strings rather than concatenating them. String >> , creates a new string that combines the receiver and the argument, while String >> add: modifies the receiver. Appending is much more efficient in memory use, since it does not create a new combined string each time, although otherwise performance is similar.
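The following sketch builds the same String both ways. Each , answers a new String and leaves the previous one as garbage to be reclaimed, while add: grows the original String in place. It assumes String >> add: accepts a String argument, as the hint above describes; the loop count is arbitrary.

```smalltalk
run
| viaComma viaAdd |
"Concatenation: every , answers a new String; the old one becomes garbage"
viaComma := String new.
1 to: 1000 do: [:i | viaComma := viaComma , i printString].
"Appending: add: modifies viaAdd in place instead of creating a new combined String"
viaAdd := String new.
1 to: 1000 do: [:i | viaAdd add: i printString].
"Both approaches produce the same characters"
viaComma = viaAdd
%
```

The final expression answers true; only the number of intermediate objects differs.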
If you have a choice between a method that modifies an object and one that returns a modified copy, use the method that modifies the object directly if your application allows it. This creates fewer temporary objects whose storage will have to be reclaimed.
Avoid generating temporary objects whose storage will need to be reclaimed. Storage reclamation can slow your application significantly.
Keep repository files on a disk reserved for their use, if possible. Particularly avoid putting repository files on the disk used for swapping.
For large applications, you may need to commit incrementally, rather than waiting to commit all at once. There is a limit on how large a transaction can be, either in terms of the total size of previously committed objects that are modified, or of the total size of temporary objects that are transitively reachable from modified committed objects.
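A minimal sketch of committing incrementally while loading a large collection follows. The global name #BigCollection, the batch size, and the element count are illustrative assumptions. The sketch also assumes the session is in automatic transaction mode, so that a new transaction begins after each commit, and relies on System commitTransaction answering false when a commit fails.

```smalltalk
run
| coll batchSize |
batchSize := 10000.
coll := OrderedCollection new.
"Hypothetical global used only for this example"
UserGlobals at: #BigCollection put: coll.
1 to: 100000 do: [:i |
  coll add: i printString.
  "Commit every batchSize additions so no single transaction grows too large"
  (i \\ batchSize) = 0 ifTrue: [
    System commitTransaction
      ifFalse: [^ 'commit failed after ' , i printString , ' additions']]].
"Commit any remaining changes"
System commitTransaction
%
```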
Consider the trade-offs in indexing. While indexes can improve query performance on large collections, they carry overhead. If a collection has fewer than about 2000 objects, the extra overhead in internal objects and index maintenance may not be worth the negligible performance gain in queries.
