GemStone Smalltalk includes several tools to help you tune your applications for faster performance.
Profiling Smalltalk Execution
Profiling tools that allow you to pinpoint the problem areas in your application code.
Clustering Objects for Faster Retrieval
How to cluster objects that are often accessed together so that many of them can be found in the same disk access.
Modifying Cache Sizes for Better Performance
How to increase or decrease the size of various caches in order to minimize disk access and storage reclamation.
Managing VM Memory
Issues to consider when managing temporary object memory, and techniques for diagnosing and addressing OutOfMemory conditions.
NotTranloggedGlobals
Optimize certain operations by avoiding writing tranlog entries.
Other Optimization Hints
Allow operations on large collections without using temporary object memory.
Many things impact performance, and cache size and disk access often have the largest impact on application performance. However, your GemStone Smalltalk code can also affect the speed of your application. There are a number of tools to help you identify issues and optimize your code.
If you simply want to know how long it takes a given block to return its value, you can use GemStone Smalltalk methods that execute a block and return a number.
The familiar method System class >> millisecondsToRun: takes a zero-argument block as its argument and returns the CPU time, in milliseconds, required to evaluate the block.
topaz 1> printit
System millisecondsToRun: [
System performOnServer: 'ping -c1 gemtalksystems.com']
%
0
For microsecond resolution, use the parallel method microsecondsToRun:.
topaz 1> printit
System microsecondsToRun: [
System performOnServer: 'ping -c1 gemtalksystems.com']
%
484
Time class >> millisecondsElapsedTime: works similarly, but returns the elapsed rather than the CPU time required.
topaz 1> printit
Time millisecondsElapsedTime: [
System performOnServer: 'ping -c1 gemtalksystems.com']
%
20
To get further resolution, use Time class >> secondsElapsedTime:, which returns a float with system-dependent resolution. For example, to get a result in microseconds:
topaz 1> printit
((Time secondsElapsedTime: [
System performOnServer: 'ping -c1 gemtalksystems.com']) * 1000000) asInteger
%
19961
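The difference between CPU time and elapsed time is easiest to see with an operation that waits rather than computes. The following is a minimal sketch, assuming that System sleep: suspends the session for the given number of seconds; the variable names are illustrative only.

```smalltalk
| cpuMs elapsedMs |
"CPU time used while the session sleeps for one second;
 this is expected to be small, since sleeping uses little CPU."
cpuMs := System millisecondsToRun: [ System sleep: 1 ].
"Elapsed (wall-clock) time for the same operation;
 this is expected to be roughly 1000 ms."
elapsedMs := Time millisecondsElapsedTime: [ System sleep: 1 ].
"Return both values for comparison."
{ cpuMs . elapsedMs }
```

If the two numbers differ widely for your own code, the time is probably being spent waiting (on disk, the network, or another process) rather than executing Smalltalk.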
The ProfMonitor class allows you to sample the methods that are executed in a given block of code and analyze the percentage of total execution time represented by each method. When an instance starts profiling, it will take a method call stack at specified intervals for a specified period of time. When it is done, it collects the results and returns them in the form of a string formatted as a report.
ProfMonitorTree is a subclass of ProfMonitor that, by default, returns one or more execution tree reports in addition to the reports generated by ProfMonitor. By specifying the desired reports in arguments to ProfMonitor, these tree reports can also be generated from ProfMonitor.
ProfMonitor, by default, will take a sample every millisecond (1 ms). You can specify the interval at which ProfMonitor takes samples using the instance methods interval: or intervalNs:, or class methods with these keywords. interval: specifies milliseconds, while intervalNs: specifies the interval in nanoseconds (a nanosecond is a billionth of a second). The minimum interval is 1000 nanoseconds.
It may be convenient to refer to Table 16.1 when determining the sample interval and reading the results:
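The unit relationships can also be checked with plain arithmetic. The following sketch uses only integer arithmetic and does not call ProfMonitor; the variable names are illustrative.

```smalltalk
| nsPerMs nsPerUs oneTenthMsInNs minimumInUs |
nsPerMs := 1000000.                 "1 millisecond = 1,000,000 nanoseconds"
nsPerUs := 1000.                    "1 microsecond = 1,000 nanoseconds"
"The intervalNs: argument that corresponds to 1/10 of a millisecond."
oneTenthMsInNs := nsPerMs // 10.
"The minimum interval of 1000 nanoseconds, expressed in microseconds."
minimumInUs := 1000 // nsPerUs.
{ oneTenthMsInNs . minimumInUs }    "100000 and 1"
```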
By default, ProfMonitor reports every method it found executing. It is usually useful to limit the reporting of methods to the ones that appear more frequently, to reduce clutter in the results and allow you to focus on what is taking the most time.
To limit the reporting results, set the lower limit using the instance method reportDownTo: limit or methods with the keyword downTo:. Each result at the limit or larger is included in the report.
These methods accept a limit of either an integer, which is an absolute number of samples, or a SmallDouble, which defines a percentage of the total number of samples.
For example, a downTo: of 50 would specify that the reports include information for every method that was sampled at least 50 times, regardless of whether the total number of samples was 100 or 1000. A downTo: of 0.50 would specify that the reports include information for methods that were sampled 50% of the time or more. If the total number of samples is 100, this would be 50 actual samples; for a sample set size of 1000, this would be 500 samples.
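The arithmetic behind the two kinds of limit can be sketched as follows. This is an illustration only, not the ProfMonitor implementation; the block name minSamplesFor and the choice to round a fractional result up are assumptions made for the example.

```smalltalk
| minSamplesFor |
"Answer the smallest sample count that would be reported
 for a given limit and total number of samples."
minSamplesFor := [:limit :totalSamples |
    (limit isKindOf: Integer)
        ifTrue: [ limit ]                            "absolute number of samples"
        ifFalse: [ (limit * totalSamples) ceiling ]  "fraction of the total"
    ].
{ minSamplesFor value: 50 value: 100 .     "50"
  minSamplesFor value: 50 value: 1000 .    "50"
  minSamplesFor value: 0.5 value: 100 .    "50"
  minSamplesFor value: 0.5 value: 1000 }   "500"
```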
ProfMonitor provides profiling results in the form of a string, containing up to six individual reports that analyze the profiling raw data in different ways. The desired reports to be output can be specified using methods with the reports: keyword. By specifying reports, you can also enable object creation tracking.
#samples—sample counts report, labeled STATISTICAL SAMPLING RESULTS.
#stackSamples—stack sampling report, labeled STATISTICAL STACK SAMPLING RESULTS.
#senders—method senders report, labeled STATISTICAL METHOD SENDERS RESULTS.
#objCreation—object creation report, labeled OBJECT CREATION REPORT. Including this in the reports: argument enables object tracking.
#tree—method execution tree report, labeled STACK SAMPLING TREE RESULTS. Including this in the reports: argument causes ProfMonitorTree to be used for profiling.
#objCreationTree—object creation tree report, labeled OBJECT CREATION TREE REPORT. Including this in the reports: argument enables object tracking and causes ProfMonitorTree to be used for profiling.
The default reports that are provided depend on the initial class specified: ProfMonitor or ProfMonitorTree.
ProfMonitor stores its results temporarily in a file with the default filename /tmp/gempid.tmp. You can specify a different filename by using ProfMonitor’s instance creation method newWithFile: and variants. This file is deleted by profiling block methods, profileOff, and reportAfterRun* methods. Note that if the Gem that is executing profiling terminates abnormally, it may leave this file behind; such files must be manually deleted.
Profiling operates by taking samples of the stack at intervals specified by the interval: or intervalNs: arguments. Generally, this specifies that samples are taken at the given intervals in CPU time, which provides information about the relative performance of operations based on how much CPU time they use.
It is also possible to profile based on real time, in which case samples are taken after the specified interval of real time has elapsed. This can detect performance issues that are not based on CPU execution, such as a sleep:, and expose the performance impact of disk access and other performance issues external to the code executing.
The sampling time base is one of the options set using the setOptions: keyword, in either the convenience profiling methods or the instance method. You may include #real or #cpu in the options array.
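As a hedged sketch of the difference, the following profiles the same block twice using the convenience method monitorBlock:intervalNs:options:, once in CPU time and once in real time. It assumes that System sleep: suspends the session for the given number of seconds; the variable names are illustrative.

```smalltalk
| cpuReport realReport |
"Sample every millisecond of CPU time. A sleeping session uses
 little CPU, so few samples are expected."
cpuReport := ProfMonitor
    monitorBlock: [ System sleep: 1 ]
    intervalNs: 1000000
    options: { #cpu }.
"Sample every millisecond of real time. The time spent sleeping
 is expected to show up in this report."
realReport := ProfMonitor
    monitorBlock: [ System sleep: 1 ]
    intervalNs: 1000000
    options: { #real }.
realReport
```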
ProfMonitor provides several methods that allow you to profile a block of code and report the results with a single class method.
The following profiling methods are available:
monitorBlock:
monitorBlock:reports:
monitorBlock:intervalNs:
monitorBlock:intervalNs:options:
monitorBlock:downTo:
monitorBlock:downTo:interval:
monitorBlock:downTo:intervalNs:
monitorBlock:downTo:intervalNs:options:
monitorBlock:downTo:intervalNs:reports:
For example, to take samples every millisecond, and only report methods that were sampled at least 10 times:
ProfMonitor
monitorBlock: [ 100 timesRepeat:
[ System myUserProfile dictionaryNames ]]
downTo: 10
interval: 1
For a more detailed report, you could take samples every 1/10 of a millisecond; this interval is 100000 nanoseconds. This creates many more samples; to make it easier to control the reporting limit we’ll use a percent, and only include methods whose number of samples was 20% or more of the total.
ProfMonitor
monitorBlock: [ 100 timesRepeat:
[ System myUserProfile dictionaryNames ]]
downTo: 0.2
intervalNs: 100000
These two reports will give you similar results, but since there are many more samples, the effect of chance sampling error will be less. The choice of sampling interval and report limit depends on the specific code you are profiling. You may need to run a number of iterations, starting with a more coarse-grained profile and refining for subsequent runs.
To sample blocks of code, the quick profiling methods are sufficient. You can also explicitly start and stop profiling, allowing you to profile any arbitrary sequence of GemStone Smalltalk statements.
To start and stop profiling, use the class method profileOn, which creates an instance of ProfMonitor and starts profiling; when you are done, the instance method profileOff stops profiling and reports the results.
run
UserGlobals at: #myMonitor put: ProfMonitor profileOn.
%
run
100 timesRepeat: [ System myUserProfile dictionaryNames ].
%
run
(UserGlobals at: #myMonitor) profileOff.
%
You can also create and configure the instance of ProfMonitor. To profile in this way, perform the following steps:
Step 1. Create an instance using ProfMonitor new, newWithFile:, newWithFile:interval:, or newWithFile:intervalNs:.
Step 2. Configure it as desired, using instance methods including interval:, intervalNs:, setOptions:, and traceObjectCreation:.
Step 3. Start profiling using the instance method startMonitoring.
Step 4. Execute the code that you want to profile.
Step 5. Stop profiling using the instance method stopMonitoring.
Steps 3, 4, and 5 can also be done using runBlock:.
Step 6. Gather results and report, using reportAfterRun or reportAfterRunDownTo:.
| aMonitor |
aMonitor := ProfMonitor newWithFile:
'$GEMSTONE/data/profMon.dat'.
aMonitor interval: 2.
aMonitor setOptions: {#objCreation}.
aMonitor startMonitoring.
100 timesRepeat: [ System myUserProfile dictionaryNames ].
aMonitor stopMonitoring.
aMonitor reportAfterRun.
ProfMonitor raw data is written to a disk file, and as long as the disk file is available, you may save the instance of ProfMonitor and it will reopen its file to perform the analysis later or in a different session.
To ensure that the file is saved, use methods such as runBlock:, which do not automatically create the report and delete the file.
run
UserGlobals at: #aProfMon put:
(ProfMonitor runBlock: [
200 timesRepeat: [System myUserProfile dictionaryNames]
]).
%
commit
logout
login
run
aProfMon reportAfterRun
%
The profiling methods discussed in the previous sections return a string formatted as a report. The following example shows a sample run and the resulting report.
topaz 1> printit
ProfMonitor
monitorBlock:[
200 timesRepeat:[ System myUserProfile dictionaryNames] ]
reports: { #samples . #stackSamples . #senders . #tree}
%
================
STATISTICAL SAMPLING RESULTS
elapsed CPU time: 90 ms
monitoring interval: 1.0 ms
report limit threshold: 2 hits / 2.2%
0 pageFaults 2061 objFaults 0 gcMs 824413 edenBytesUsed
tally % class and method name
------ ----- --------------------------------------
23 24.21 Array >> _at:
22 23.16 IdentityDictionary >> associationsDo:
18 18.95 block in SymbolList >> names
18 18.95 AbstractDictionary >> _at:
11 11.58 block in AbstractDictionary >> associationsDetect:ifNone:
2 2.11 Object >> _basicSize
1 1.05 11 other methods
95 100.00 Total
================
STATISTICAL STACK SAMPLING RESULTS
elapsed CPU time: 90 ms
monitoring interval: 1.0 ms
report limit threshold: 2 hits / 2.2%
0 pageFaults 2061 objFaults 0 gcMs 824413 edenBytesUsed
total % class and method name
------ ----- --------------------------------------
95 100.00 GsNMethod class >> _gsReturnToC
95 100.00 executed code
95 100.00 ProfMonitor class >> monitorBlock:downTo:
95 100.00 ProfMonitor >> monitorBlock:
94 98.95 block in executed code
94 98.95 UserProfile >> dictionaryNames
94 98.95 SymbolList >> namesReport
94 98.95 SymbolList >> names
94 98.95 AbstractDictionary >> associationsDetect:ifNone:
94 98.95 IdentityDictionary >> associationsDo:
29 30.53 block in AbstractDictionary >> associationsDetect:ifNone:
23 24.21 Array >> _at:
18 18.95 block in SymbolList >> names
18 18.95 AbstractDictionary >> _at:
2 2.11 Object >> _basicSize
1 1.05 2 other methods
95 100.00 Total
================
STATISTICAL METHOD SENDERS RESULTS
elapsed CPU time: 90 ms
monitoring interval: 1.0 ms
report limit threshold: 2 hits / 2.2%
% % Parent
self total total local Method
Time Time ms % Child
------ ------ ------ ----- -----------
= 0.0 100.0 90.0 0.0 GsNMethod class >> _gsReturnToC
90.0 100.0 executed code
-----------------------------------------------------
90.0 100.0 GsNMethod class >> _gsReturnToC
= 0.0 100.0 90.0 0.0 executed code
90.0 100.0 ProfMonitor class >> monitorBlock:downTo:
-----------------------------------------------------
90.0 100.0 executed code
= 0.0 100.0 90.0 0.0 ProfMonitor class >> monitorBlock:downTo:
90.0 100.0 ProfMonitor >> monitorBlock:
-----------------------------------------------------
90.0 100.0 ProfMonitor class >> monitorBlock:downTo:
= 0.0 100.0 90.0 0.0 ProfMonitor >> monitorBlock:
89.1 98.9 block in executed code
0.9 1.1 ProfMonitor >> startMonitoring
-----------------------------------------------------
89.1 100.0 ProfMonitor >> monitorBlock:
= 0.0 98.9 89.1 0.0 block in executed code
89.1 100.0 UserProfile >> dictionaryNames
-----------------------------------------------------
89.1 100.0 block in executed code
= 0.0 98.9 89.1 0.0 UserProfile >> dictionaryNames
89.1 100.0 SymbolList >> namesReport
-----------------------------------------------------
89.1 100.0 UserProfile >> dictionaryNames
= 0.0 98.9 89.1 0.0 SymbolList >> namesReport
89.1 100.0 SymbolList >> names
-----------------------------------------------------
89.1 100.0 SymbolList >> namesReport
= 0.0 98.9 89.1 0.0 SymbolList >> names
89.1 100.0 AbstractDictionary >> associationsDetect:ifNone:
-----------------------------------------------------
89.1 100.0 SymbolList >> names
= 0.0 98.9 89.1 0.0 AbstractDictionary >> associationsDetect:ifNone:
89.1 100.0 IdentityDictionary >> associationsDo:
-----------------------------------------------------
89.1 100.0 AbstractDictionary >> associationsDetect:ifNone:
= 23.2 98.9 89.1 23.4 IdentityDictionary >> associationsDo:
1.9 2.1 Object >> _basicSize
27.5 30.9 block in AbstractDictionary >> associationsDetect:ifNone:
21.8 24.5 Array >> _at:
17.1 19.1 AbstractDictionary >> _at:
-----------------------------------------------------
27.5 100.0 IdentityDictionary >> associationsDo:
= 11.6 30.5 27.5 37.9 block in AbstractDictionary >> associationsDetect:ifNone:
17.1 62.1 block in SymbolList >> names
-----------------------------------------------------
21.8 100.0 IdentityDictionary >> associationsDo:
= 24.2 24.2 21.8 100.0 Array >> _at:
-----------------------------------------------------
17.1 100.0 block in AbstractDictionary >> associationsDetect:ifNone:
= 18.9 18.9 17.1 100.0 block in SymbolList >> names
-----------------------------------------------------
17.1 100.0 IdentityDictionary >> associationsDo:
= 18.9 18.9 17.1 100.0 AbstractDictionary >> _at:
-----------------------------------------------------
1.9 100.0 IdentityDictionary >> associationsDo:
= 2.1 2.1 1.9 100.0 Object >> _basicSize
-----------------------------------------------------
================
STACK SAMPLING TREE RESULTS
elapsed CPU time: 90 ms
monitoring interval: 1.0 ms
report limit threshold: 2 hits / 2.2%
100.0% (95) executed code [UndefinedObject]
100.0% (95) ProfMonitor class >> monitorBlock:downTo: [ProfMonitor class]
100.0% (95) ProfMonitor >> monitorBlock: [ProfMonitor]
98.9% (94) block in executed code [ExecBlock0]
| 98.9% (94) UserProfile >> dictionaryNames
| 98.9% (94) SymbolList >> namesReport
| 98.9% (94) SymbolList >> names
| 98.9% (94) AbstractDictionary >> associationsDetect:ifNone: [SymbolDictionary]
| 98.9% (94) IdentityDictionary >> associationsDo: [SymbolDictionary]
| 30.5% (29) block in AbstractDictionary >> associationsDetect:ifNone: [ExecBlock1]
| | 18.9% (18) block in SymbolList >> names [ExecBlock1]
| 24.2% (23) Array >> _at: [IdentityCollisionBucket]
| 18.9% (18) AbstractDictionary >> _at: [SymbolDictionary]
| 2.1% (2) Object >> _basicSize [IdentityCollisionBucket]
As you can see, the report is in four sections, corresponding to the requested reports: #samples, #stackSamples, #senders, and #tree.
Each section includes the same set of methods that the profile monitor encountered when it checked the execution stack every millisecond; the report is presented to give different views of this data.
Keep in mind that these numbers are based on sampling. Depending on the size and number of samples, they may not exactly reflect the actual percentage of time spent in each method, and they will likely vary from run to run. Also, if you make external calls to the OS, to user actions, or to other C libraries, this will distort the results for the invoking method.
Profiling as previously described is focused on the performance of a block of code. ProfMonitor provides additional options that let you track other things that are going on alongside your code execution and that can impact application performance: object creation, object faults, page faults, eden space usage, and garbage collection time.
To profile these attributes, use profiling methods with the setOptions: keyword and specify the option that you want to profile.
When profiling these options, you get one or more standard time-based reports along with the specific attribute profile or profiles. This allows you to correlate with the basic performance over a specific execution run.
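Object creation tracking can be enabled in a number of ways, for example by including #objCreation or #objCreationTree in a reports: argument, or by configuring an instance of ProfMonitor using setOptions: or traceObjectCreation:.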
When object creation tracking is enabled, an additional section follows the standard report sections. It reports the number of objects created, grouped by class and by the call stack that created them.
OBJECT CREATION REPORT:
elapsed CPU time: 40 ms
monitoring interval: 2.0 ms
tally class of created object
call stack
------ -----------------------------------------
600 String class
- - - - - - - - - - - - - - - - - - - - - - - -
500 SmallInteger >> asString
500 SymbolList >> namesReport
500 UserProfile >> dictionaryNames
500 executed code
500 GsNMethod class >> _gsReturnToC
- - - - - - - - - - - - - - - - - - - - - - - -
100 String class >> new
100 SymbolList >> namesReport
100 UserProfile >> dictionaryNames
100 executed code
100 GsNMethod class >> _gsReturnToC
------ -----------------------------------------
100 Array class
- - - - - - - - - - - - - - - - - - - - - - - -
100 SymbolList >> names
100 SymbolList >> namesReport
100 UserProfile >> dictionaryNames
100 executed code
100 GsNMethod class >> _gsReturnToC
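A report of this kind can be requested with the reports: keyword. The following minimal sketch asks for the sample counts report together with the object creation report; the block being profiled is the same one used in the earlier examples, and the variable name is illustrative.

```smalltalk
| report |
"Including #objCreation in the reports: argument enables object
 creation tracking for this run."
report := ProfMonitor
    monitorBlock: [
        100 timesRepeat: [ System myUserProfile dictionaryNames ] ]
    reports: { #samples . #objCreation }.
report
```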
While object creation is the most important way to profile a Gem’s use of memory, ProfMonitor provides several other options to allow you to track the impact of the operations by methods on the Gem’s memory use.
Using the following keys in the setOptions: argument changes the profiling to the specific kind of profile. Only one of these can be used at a time.
By default, for object faults, page faults, and eden space, the sampling frequency is calculated in real time, rather than CPU time. This can be changed by also including #cpu in the setOptions: array. Garbage collection time is sampled by default in CPU time.
When using these options, the first two reports describe faults, rather than milliseconds. The third report provides millisecond performance information to allow correlation with the faulting data.
As you’ve seen, GemStone ordinarily manages the placement of objects on the disk automatically—you’re never forced to worry about it. Occasionally, you might choose to group related objects on secondary storage to enable GemStone to read all of the objects in the group with as few disk accesses as possible.
Because an access to the first element usually presages the need to read the other elements, it makes sense to arrange those elements on the disk in the smallest number of disk pages. This placement of objects on physically contiguous regions of the disk is the function of class Object’s clustering protocol. By clustering small groups of objects that are often accessed together, you can sometimes improve performance.
Clustering a group of objects packs them into disk pages, each page holding as many of the objects as possible. The objects are contiguous within a page, but pages are not necessarily contiguous on the disk.
Clustering objects solves a specific problem—slow performance due to excessive disk accessing. However, disk access is not the only factor in poor performance. In order to determine if clustering will solve your problem, you need to do some diagnosis. You can use GemStone’s VSD utility to find out how many times your application is accessing the disk. VSD allows you to chart system statistics over time to better understand the performance of your system. See the VSD User’s Guide for more information on using VSD.
The following statistics are of interest:
You can examine the values of these statistics before and after you commit each transaction to discover how many pages it read in order to perform a particular query, and to determine the number of disk accesses required by the process of committing the transaction.
It is tempting to ignore these issues until you experience a problem such as an extremely slow application, but if you keep track of such statistics on a regular (even if intermittent) basis, you will have a better idea of what is “normal” behavior when a problem crops up.
You can think of the clustering methods as writing the components of their receivers on a stream of disk pages. When a page is filled, another is randomly chosen and subsequent objects are written on the new page. A new page is ordinarily selected for use only when the previous page is filled, or when a transaction ends. Sending the message cluster to objects in repeated transactions will, within the limits imposed by page capacity, place its receivers in adjacent disk locations. (Sending the message cluster to objects repeatedly within a transaction has no effect.)
The stream of disk pages used by cluster and its companion methods is called a bucket. GemStone captures this concept in the class ClusterBucket.
If you determine that clustering will improve your application’s performance, you can use instances of the class ClusterBucket to help. All objects assigned to the same instance of ClusterBucket are to be clustered together. When the objects are written, they are moved to contiguous locations on the same page, if possible. Otherwise the objects are written to contiguous locations on several pages.
Once an object has been clustered into a particular bucket and committed, that bucket remains associated with the object until you specify otherwise. When the object is modified, it continues to cluster with the other objects in the same bucket, although it might move to another page within the same bucket.
By default, a global array called AllClusterBuckets defines seven instances of ClusterBucket. Each can be accessed by specifying its offset in the array. For example, the first instance, AllClusterBuckets at: 1, is the default bucket when you log in. This bucket is invariant—you cannot modify it.
The second, third, and seventh cluster buckets in the array can be used for whatever purposes you require and can all be modified.
The GemStone system makes use of the fourth, fifth, and sixth buckets of the array AllClusterBuckets:
You can determine how many cluster buckets are currently defined by executing:
System maxClusterBucket
A given cluster bucket’s offset in the array specifies its clusterId. A cluster bucket’s clusterId is an integer in the range of 1 to (System maxClusterBucket).
NOTE
For compatibility with previous versions of GemStone, you can use a clusterId as an argument to any keyword that takes an instance of ClusterBucket as an argument.
You can determine which cluster bucket is currently the system default by executing:
System currentClusterBucket
You can access all instances of cluster buckets in your system by executing:
ClusterBucket allInstances
You can change the current default cluster bucket by executing an expression of the form:
System clusterBucket: aClusterBucket
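Because the default bucket is a session-wide setting, it can be useful to restore the previous bucket after clustering. The following is a sketch only: it assumes that the second predefined bucket is free for application use, and myObject is a placeholder created just for the example.

```smalltalk
| previousBucket myObject |
myObject := Array with: 'first' with: 'second'.   "placeholder object"
"Remember the bucket that is currently the default."
previousBucket := System currentClusterBucket.
"Make the second predefined bucket the default."
System clusterBucket: (AllClusterBuckets at: 2).
"Cluster, then restore the previous default even if an error occurs."
[ myObject cluster ]
    ensure: [ System clusterBucket: previousBucket ]
```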
You are not limited to the predefined instances of ClusterBucket. You can create new instances of ClusterBucket with the simple expression ClusterBucket new.
This expression creates a new instance of ClusterBucket and adds it to the array AllClusterBuckets. You can then access the bucket in one of two ways. You can assign it a name:
UserGlobals at: #empClusterBucket put: (ClusterBucket new)
You could then refer to it in your application as empClusterBucket. Alternatively, you can use the offset into the array AllClusterBuckets. For example, if this is the first cluster bucket you have created, you could refer to it this way:
AllClusterBuckets at: 8
(Recall that the first seven elements of the array are predefined.)
You can determine the clusterId of a cluster bucket by sending it the message clusterId. For example:
empClusterBucket clusterId
8
You can access an instance of ClusterBucket with a specific clusterId by sending the class ClusterBucket the message bucketWithId:.
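As a brief sketch, assuming that bucketWithId: is sent to the class ClusterBucket with a clusterId as its argument, a lookup of the second predefined bucket could look like this:

```smalltalk
| bucket |
"Look up the bucket whose clusterId is 2."
bucket := ClusterBucket bucketWithId: 2.
"Ask the bucket for its clusterId to confirm the lookup."
bucket clusterId
```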
You can create and use as many cluster buckets as you need; up to thousands, if necessary.
NOTE
For best performance and disk space usage, use no more than 32 cluster buckets in a single session.
Cluster buckets are designed to minimize concurrency conflicts. As many users as necessary can cluster objects at the same time, using the same cluster bucket, without experiencing concurrency conflicts. Cluster buckets do not contain or reference the objects clustered on them; instead, the objects that are clustered keep track of their bucket. This also avoids problems with authorizations.
However, creating a new instance of ClusterBucket automatically adds it to the global array AllClusterBuckets. Adding an instance to AllClusterBuckets causes a concurrency conflict when more than one transaction tries to create new cluster buckets at the same time, since the transactions are all trying to write the same array object.
To avoid concurrency conflicts, you should design your clustering when you design your application. Create all the instances of ClusterBucket you anticipate needing and commit them in one or a few transactions.
To facilitate this kind of design, GemStone allows you to associate descriptions with specific instances of ClusterBucket. In this way, you can communicate to your fellow users the intended use of a given cluster bucket with the message description:. For example:
UserGlobals at: #empClusterBucket put: (ClusterBucket new).
empClusterBucket description: 'Use this bucket for
clustering employees and their instance variables.'
As you can see, the message description: takes a string of text as an argument.
Changing the attributes of a cluster bucket, such as its description or clusterId, writes that cluster bucket and thus can cause concurrency conflict. Only change these attributes when necessary.
NOTE
For best performance and disk space usage as well as avoiding concurrency conflicts, create the required instances of ClusterBucket all at once, instead of on a per-transaction basis, and update their attributes infrequently.
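The advice above might be applied as in the following sketch, which creates several buckets, names them in UserGlobals, and commits them in a single transaction. The bucket names other than empClusterBucket and the description text are hypothetical.

```smalltalk
| bucketNames |
"Hypothetical names for the buckets this application expects to need."
bucketNames := #( #empClusterBucket #addrClusterBucket #nameClusterBucket ).
"Create every bucket in the same transaction."
bucketNames do: [:each |
    UserGlobals at: each put: ClusterBucket new ].
"Describe a bucket's intended use once, at creation time."
(UserGlobals at: #empClusterBucket)
    description: 'Employees and their instance variables'.
"One commit writes AllClusterBuckets a single time."
System commitTransaction
```

Creating the buckets in an administrative session, before other users log in, further reduces the chance of a conflict on AllClusterBuckets.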
Class Object defines several clustering methods. One method is simple and fundamental. Another method is more sophisticated and attempts to order the receiver’s instance variables as well as writing the receiver itself.
The basic clustering message defined by class Object is cluster. For example:
myObject cluster
This simplest clustering method simply assigns the receiver to the current default cluster bucket; it does not attempt to cluster the receiver’s instance variables. When the object is next written to disk, it will be clustered according to the attributes of the current default cluster bucket.
If you wish to cluster the instance variables of an object, you can define a special method to do so.
CAUTION
Do not redefine the method cluster in the class Object, because other methods rely on the default behavior of the cluster method. You can, however, define a cluster method for classes in your application if required.
Suppose, for example, that you defined class Name and class Employee as shown in Example 16.4.
Object subclass: 'Name'
instVarNames: #('first' 'middle' 'last')
classVars: #( )
classInstVars: #()
poolDictionaries: {}
inDictionary: UserGlobals.
Object subclass: 'Employee'
instVarNames: #('name' 'job' 'age' 'address')
classVars: #( )
classInstVars: #()
poolDictionaries: {}
inDictionary: UserGlobals.
The following clustering method might be suitable for class Employee. (A more purely object-oriented approach would embed the information on clustering first, middle, and last names in the cluster method for Name, but such an approach does not exemplify the breadth-first clustering technique we wish to show here.)
method: Employee
clusterBreadthFirst
self cluster.
name cluster.
job cluster.
address cluster.
name first cluster.
name middle cluster.
name last cluster.
^false
%
| Lurleen |
Lurleen := Employee new name: (Name new first: #Lurleen);
job: 'busdriver'; age: 24; address: '540 E. Sixth'.
Lurleen clusterBreadthFirst
The elements of byte objects such as instances of String and Float are always clustered automatically. A string’s characters, for example, are always written contiguously within disk pages. Consequently, you need not send cluster to each element of each string stored in job or address; clustering the strings themselves is sufficient. Sending cluster to individual special objects (instances of SmallInteger, Character, Boolean, SmallDouble, or UndefinedObject) has no effect. Hence no clustering message is sent to age in the previous example.
After sending clusterBreadthFirst to an Employee, the Employee is clustered as follows:
anEmp aName job address first middle last
cluster returns a Boolean value. You can use that value to eliminate the possibility of infinite recursion when you’re clustering the variables of an object that can contain itself. Here are the rules that cluster follows in deciding what to return:
clusterDepthFirst differs from cluster only in one way: it traverses the tree representing its receiver’s instance variables (named, indexed, or unordered) in depth-first order, assigning each node to the current default cluster bucket as it is visited. That is, it writes the receiver’s first instance variable, then the first instance variable of that instance variable, then the first instance variable of that instance variable, and so on to the bottom of the tree. It then backs up and visits the nodes it missed before, repeating the process until the whole tree has been written.
After sending clusterDepthFirst to an Employee, the Employee is clustered as follows:
anEmp aName first middle last job address
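For comparison with the breadth-first method shown earlier, the same depth-first order can be written out by hand. This is a sketch for illustration only; clusterNameFirst is a hypothetical method name, and it assumes the same accessors on Name (first, middle, last) that the earlier example uses.

```smalltalk
method: Employee
clusterNameFirst
    "Hypothetical method: cluster the receiver in depth-first order,
     writing the name and all of its parts before job and address."
    self cluster.
    name cluster.
    name first cluster.
    name middle cluster.
    name last cluster.
    job cluster.
    address cluster.
    ^false
%
```

In practice clusterDepthFirst does this traversal for you; a hand-written method is useful mainly when only some instance variables should be clustered.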
Both cluster and clusterDepthFirst use the current default cluster bucket. If you wish to use a specific cluster bucket instead, you can use the method clusterInBucket:. For example, the following expression clusters aBagOfEmployees using the specific cluster bucket empClusterBucket:
aBagOfEmployees clusterInBucket: empClusterBucket
In order to determine the cluster bucket associated with a given object, you can send it the message clusterBucket. For example, after executing the example above, the following example would return the value shown below:
aBagOfEmployees clusterBucket
empClusterBucket
Clustering tags objects in memory so that when the next successful commit occurs, the objects are clustered onto data pages according to the method specified. After an object has been clustered, it is considered to be “dirty”. If you cluster a large number of objects, you may need to increase temporary object memory to avoid running out of session memory. See Managing VM Memory.
When you want to write a loop that clusters parts of each object in a group into separate pages, it is helpful to have multiple cluster buckets available.
Suppose that you had defined class SetOfEmployees and class Employee as in Example 16.4. Suppose, in addition, that you wanted a clustering method to write all employees contiguously and then write all employee addresses contiguously.
With only one cluster bucket at your disposal, you would need to define your clustering method as shown in Example 16.6. In this approach, each employee is fetched once for clustering, then fetched again in order to cluster the employee’s address.
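The difference between the two approaches can be sketched as follows. This is an illustration only, not the guide's own example: it assumes a collection named aSetOfEmployees, an accessor method address on Employee, and that cluster buckets exist at indexes 2 and 3 of AllClusterBuckets. None of these is confirmed by the text above.

```smalltalk
| empBucket addressBucket |

"Alternative A: one bucket, two passes. Every employee is visited twice:
 once to cluster the employee, and again to cluster its address."
aSetOfEmployees do: [:each | each cluster].
aSetOfEmployees do: [:each | each address cluster].

"Alternative B (use instead of A): two buckets, one pass.
 The bucket indexes are assumptions chosen for illustration."
empBucket := AllClusterBuckets at: 2.
addressBucket := AllClusterBuckets at: 3.
aSetOfEmployees do: [:each |
    each clusterInBucket: empBucket.
    each address clusterInBucket: addressBucket].
```

With separate buckets, employees and addresses are each written contiguously even though they are visited in interleaved order, so each employee needs to be fetched only once.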
Clustering provides the most benefit for small groups of objects that are often accessed together — for example, a class with its instance variables. Those instance variables of a class that describe the class’s variables are often accessed in a single operation, as are the instance variables that contain a class’s methods. Therefore, class Behavior defines the following special clustering methods for classes:
The code in Example 16.7 clusters class Employee’s structure-describing variables, then its class methods, and finally its instance methods.
| behaviorBucket descriptionBucket |
behaviorBucket := AllClusterBuckets at: 4.
descriptionBucket := AllClusterBuckets at: 5.
System clusterBucket: descriptionBucket.
Employee clusterDescription.
System clusterBucket: behaviorBucket.
Employee class clusterBehavior.
Employee clusterBehavior.
Once you have clustered certain objects, they do not necessarily stay clustered in the same way forever. If you edit an object in the data structure, the edited object is placed on a new page in the same cluster bucket. The performance benefit of clustering comes from related objects residing on the same page; because a cluster bucket can span multiple pages, edited objects may remain in the same cluster bucket yet no longer be on the same page.
You may therefore wish to check an object’s location, especially if you suspect that such declustering is causing your application to run more slowly than it used to.
To enable you to check your clustering methods for correctness, class Object defines the message page, which returns an integer identifying the disk page on which the receiver resides. For example:
anEmp page
2539
Disk page identifiers are returned only for temporary use in examining the results of your custom clustering methods—they are not stable pointers to storage locations. The page on which an object is stored can change for several reasons, as discussed in the next section.
For special objects (instances of SmallInteger, Character, Boolean, SmallDouble, or UndefinedObject), the page number returned is 0.
The page on which an object is stored can change for any of the following reasons:
As your application updates clustered objects, new values are placed on secondary storage using GemStone’s normal space allocation algorithms. When objects are moved, they are automatically reclustered within the same clusterId. If a specific clusterId was specified, it continues to be used; if not, the default clusterId is used.
If, for example, you replace the string at position 2 of the clustered array ProscribedWords, the replacement string is stored in a page separate from the one containing the original, although it will still be within the same clusterId. Therefore, it might be worthwhile to recluster often-modified collections occasionally to counter the effects of this fragmentation. You’ll probably need some experience with your application to determine how often the time required for reclustering is justified by the resulting performance enhancement.
As code executes in GemStone, committed objects must be fetched from disk or from cache, and temporary objects must be managed. This is handled transparently by the GemStone repository monitor. The performance of your application can be affected both by the tuning of the caches, and the structure and usage patterns of your application.
GemStone uses two kinds of caches: temporary object space and the shared page cache.
The temporary object space cache is used to store temporary objects created by your application. Each Gem session has a temporary object memory that is private to the Gem process and its corresponding session. When you fault persistent (committed) objects into your application, they are copied to temporary object memory.
Some of the temporary objects in the cache may ultimately become permanent and reside on the disk, but probably not all of them. Temporary objects that your application creates merely in order to do its work reside in temporary object space until they are no longer needed, when the Gem’s garbage collector reclaims the storage they use.
It is important to provide sufficient temporary object space. At the same time, you must design your application so that it does not create an unbounded number of reachable temporary objects. Temporary object memory must be large enough to accommodate the sum of live temporary objects and modified persistent objects. If that sum exceeds the allocated temporary object memory, the Gem can encounter an OutOfMemory condition and terminate.
The amount of memory allocated for temporary object space is primarily determined by the GEM_TEMPOBJ_CACHE_SIZE configuration option. You should increase this value for applications that create a large number of temporary objects — for example, applications that make heavy use of the reduced conflict classes or sessions performing a bulk load.
You will probably need to experiment somewhat before you determine the optimum size of the temporary object space for the application. The default of 50000 (50 MB) should be adequate for normal user sessions. For sessions that place a high demand on the temporary object cache, such as upgrade, you may wish to use a much larger cache size.
For a more exhaustive discussion of the issues involved in managing the size of temporary object memory, and a general discussion of garbage collection, see the “Garbage Collection” chapter of the System Administration Guide.
For details about how to set the size of GEM_TEMPOBJ_CACHE_SIZE in the Gem configuration file, see the “GemStone Configuration Options” appendix of the System Administration Guide.
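As a rough illustration, a Gem configuration file entry that raises the temporary object cache above the default might look like the following. The value 200000 is an arbitrary example, not a recommendation, and the unit is assumed to be KB, consistent with the default of 50000 (50 MB) mentioned above.

```text
# Gem configuration file (illustrative value only)
# Temporary object cache size, assumed to be in KB: about 200 MB here
GEM_TEMPOBJ_CACHE_SIZE = 200000;
```

Sessions with ordinary memory demands can keep the default. A larger setting like this one would typically be reserved for bulk loads or similar work.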
The shared page cache is used to hold the object table—a structure containing pointers to all the objects in the repository—and copies of the disk pages that hold the objects with which users are presently working. The system administrator defines the size of the shared page cache in the configuration file for the Stone. The single active Stone process per repository has one shared page cache per host machine.
Whenever the Gem needs to read an object, it reads the entire page on which an object resides into the shared page cache. If the Gem then needs to access another object, GemStone first checks to see if the object is already in the shared page cache. If it is, no further disk access is necessary. If it is not, it reads another page into the shared page cache.
For acceptable performance, the shared page cache should be large enough to hold the entire object table. To get the best possible performance, make the shared page cache as large as possible, ideally large enough to contain all objects in the repository.
You can determine the size of the object table by examining the results of a pageaudit (see the System Administration Guide), which reports the number and size of the various kinds of pages in the repository, including object table pages.
The amount of memory allocated for the shared page cache is determined by the SHR_PAGE_CACHE_SIZE_KB configuration parameter (in the Stone configuration file). This is described in the System Administration Guide (Appendix A, GemStone Configuration Options).
As discussed in Chapter 4, you can create instances of KeySoftValueDictionary to enable your session to free up temporary object memory as needed. The entries in a KeySoftValueDictionary are non-persistent; that is, they cannot be committed to the database. When there is a demand on memory, you can configure GemStone to clear non-persistent entries as needed during a VM mark/sweep garbage collection.
The action taken during mark/sweep depends on two configuration parameters, along with startingMemUsed — the percentage of temporary object memory in-use at the beginning of the VM mark/sweep.
Case 1: GEM_SOFTREF_CLEANUP_PERCENT_MEM < startingMemUsed < 80%
If startingMemUsed is greater than GEM_SOFTREF_CLEANUP_PERCENT_MEM but less than 80%, the VM mark/sweep will attempt to clear an internally determined number of least recently used SoftReferences (non-persistent entries). Under rare circumstances, you might choose to specify a minimum number (GEM_KEEP_MIN_SOFTREFS) that will not be cleared.
Case 2: startingMemUsed < GEM_SOFTREF_CLEANUP_PERCENT_MEM
No SoftReferences will be cleared.
Case 3: startingMemUsed > 80%
VM mark/sweep will attempt to clear all SoftReferences.
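The cases above can be summarized as a small decision procedure. The block below is only a model of the documented behavior and is not GemStone's implementation. It assumes that the remaining case, in which all SoftReferences are cleared, applies when startingMemUsed is 80% or more. How exact boundary values are treated is also an assumption, since the text uses strict inequalities. The block and the symbols it returns are hypothetical names.

```smalltalk
| softRefAction |
"Hypothetical model: both arguments are percentages (0 to 100)."
softRefAction := [:startingMemUsed :cleanupPercent |
    startingMemUsed < cleanupPercent
        ifTrue: [#clearNone]                      "below the cleanup threshold"
        ifFalse: [
            startingMemUsed < 80
                ifTrue: [#clearSomeLeastRecentlyUsed]   "between threshold and 80%"
                ifFalse: [#clearAll]]].                 "80% or more (assumed)"

"With a cleanup threshold of 50 and 60% of memory in use at the start of mark/sweep:"
softRefAction value: 60 value: 50
```

The final expression evaluates to #clearSomeLeastRecentlyUsed.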
For more about these and other configuration parameters, see the “GemStone Configuration Options” appendix of the System Administration Guide.
Several cache statistics may also be of interest: NumSoftRefsCleared, NumLiveSoftRefs, and NumNonNilSoftRefs. For more about these statistics, see the “Monitoring GemStone” chapter of the System Administration Guide.
As mentioned earlier in this chapter, each Gem session has a temporary object memory that is private to the Gem process and its corresponding session. When you fault persistent (committed) objects into your application, they are copied to temporary object memory.
It is important to provide sufficient temporary object space. At the same time, you must design your application so that it does not create an unbounded number of reachable temporary objects. Temporary object memory must be large enough to accommodate the sum of live temporary objects and modified persistent objects. If that sum exceeds the allocated temporary object memory, the Gem can encounter an OutOfMemory condition and terminate.
There is a limit on how large a transaction can be, either in terms of the total size of previously committed objects that are modified, or of the total size of temporary objects that are transitively reachable from modified committed objects. For large applications, you may need to commit incrementally, rather than waiting to commit all at once.
The remainder of this chapter discusses issues to consider when allocating and managing temporary object memory, and presents techniques for diagnosing and addressing OutOfMemory conditions. This section assumes you have read the general discussion of memory organization in the “Managing Memory” chapter of the System Administration Guide.
If your application requires a large working set of committed objects in memory, you can configure the pom area to be large (compared to other object spaces) without having an adverse effect on in-memory garbage collection. To do this, increase the setting for the configuration parameter GEM_TEMPOBJ_POMGEN_SIZE. For details on how to do this, see the System Administration Guide, Appendix A.
If your application references a very deep class hierarchy, you may need to adjust the memory configuration accordingly to allow a larger temporary object memory. When an object is in memory, its class is also faulted into the perm area of temporary object memory, along with the class’s superclass, extending up through the hierarchy all the way to Object. While this approach provides for significantly faster message lookups, it also increases the consumption of temporary object memory.
For example, the default configuration provides 1 MB for the perm area. Each class consumes about 400 bytes (including the metaclass). Thus, the default configuration can accommodate about 2500 classes in memory at once.
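The estimate can be reproduced with simple integer arithmetic. The figures below are the approximate ones quoted above, with 1 MB taken as 1,000,000 bytes; the variable names are illustrative.

```smalltalk
| permAreaBytes bytesPerClass |
permAreaBytes := 1000000.   "1 MB perm area, rounded to a decimal megabyte"
bytesPerClass := 400.       "approximate cost per class, including its metaclass"
permAreaBytes // bytesPerClass   "2500"
```

Taking 1 MB as 1,048,576 bytes instead gives 2621, so the figure of 2500 is best read as an order-of-magnitude guide.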
NOTE
Do not compact the code region of temporary object memory while a UserAction is executing.
When using GemBuilder for C, you may encounter an OutOfMemory error within a UserAction in either of the following situations:
The Export Set is a collection of objects whose OOPs the Gem process has handed out to one of the interfaces (GCI, GBS, objects returned from topaz run commands). Objects in the export set are prevented from being garbage collected by any of the garbage collection processes (that is, by a Gem’s in-memory collection of temporary objects, markForCollection, or the epoch garbage collection). The export set is used to guarantee referential integrity for objects only referenced by an application, that is, objects that have no references to them within the Gem.
The application program is responsible for timely removal of objects from the export set. The contents of the export set can be examined using hidden set methods defined in class System.
In general, the smaller the size of the export set, the better the performance is likely to be. There are several reasons for this relationship. The export set is one of the root sets used for garbage collection. The larger the export set, the more likely it is that objects that would otherwise be considered garbage are being retained. One threshold for performance is when the size of the export set exceeds 16K objects. When its size is smaller than 16K objects, the export set is a small object in object memory. When its size is larger than 16K, the export set becomes a large object, implemented as a tree of small objects in memory.
The configuration parameter #GemDropCommittedExportedObjs allows committed objects to be removed from the export set when memory is low, at the expense of having to re-fault these objects when they are needed.
If you find that your application is running out of temporary memory, you can set several GemStone environment variables to help you identify which parts of your application are triggering OutOfMemory conditions. These environment variables allow you to obtain multiple Smalltalk stack printouts and other useful information before your application runs out of temporary object memory. For example, they can display how many objects of each class are in temporary memory.
Details on these environment variables are provided in the System Administration Guide, and they are listed in the $GEMSTONE/sys/gemnetdebug file, which is a debug version of the gemnetobject script. gemnetdebug enables some, but not all, of the available memory-related environment variables. By using gemnetdebug instead of gemnetobject in your RPC login parameters, you can generate memory logging information. For help with analysis, contact GemTalk Technical Support.
Once you’ve identified the cause or causes of the problem, you can modify your application to reduce the demand on memory, or adjust your GemStone configuration options to provide a larger amount of memory.
When a session runs low on temporary object memory, there are actions it can take to avoid running out of memory altogether; for example, the session may commit or abort, or discard temporary objects.
By enabling handling for AlmostOutOfMemoryError, an application can take appropriate action before memory is entirely full. This exception is asynchronous, so it may be received at any time that memory use is greater than the threshold at the end of an in-memory mark/sweep. However, if the session is executing a user action, or is in index maintenance, the error is deferred and generated when execution returns.
After an AlmostOutOfMemoryError notification is delivered, the handling is automatically disabled. Handling must be reenabled each time the signal occurs. Handling this signal is enabled by executing either of the following:
AlmostOutOfMemoryError enable
The default threshold (the percentage of temporary object memory in use) is 90%. You can specify a different threshold using:
AlmostOutOfMemoryError enableAtThreshold: integerBetween1And125
Once this error is enabled, your session receives an error when temporary object memory is 90% full (or reaches the threshold you specified), but the session does not terminate with an out of memory error. You can then take whatever steps are needed to recover temporary object space (this will likely happen automatically, as temporary variables in your executing code become available for reclaim).
You may also set a handler block to take specific action, such as commit or abort, on an AlmostOutOfMemoryError. The following example shows how you can catch an AlmostOutOfMemoryError, automatically commit your work, and resume execution.
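A minimal sketch of such a handler follows. It is not the guide's original example. MyApplication and doSomethingWithLotsOfMemory are hypothetical names, the threshold of 75 is arbitrary, and the sketch assumes the exception can be resumed with resume, as the description above implies.

```smalltalk
| threshold |
threshold := 75.   "illustrative; the default threshold is 90"
AlmostOutOfMemoryError enableAtThreshold: threshold.

[ MyApplication doSomethingWithLotsOfMemory ]   "hypothetical application code"
    on: AlmostOutOfMemoryError
    do: [:ex |
        "Commit so that modified objects no longer have to be kept in temporary memory."
        System commitTransaction
            ifFalse: [ Error signal: 'commit failed' ].
        "Handling is disabled once the error has been delivered, so enable it again."
        AlmostOutOfMemoryError enableAtThreshold: threshold.
        "Continue the interrupted work (assumes the exception is resumable)."
        ex resume ]
```

Whether to commit or abort in the handler depends on the application. Committing is only appropriate if the work in progress is in a consistent state whenever the error might arrive.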
To find out how much space is left in the old area of temporary memory, the following methods in class System (category Performance Monitoring) are provided:
System _tempObjSpaceUsed
Returns the approximate number of bytes of temporary object memory being used to store objects.
System _tempObjSpaceMax
Returns the size of the old area of temporary object memory; that is, the approximate maximum number of bytes of temporary object memory that are usable for storing objects. When the old area fills up, the Gem process may terminate with an OutOfMemory error.
System _tempObjSpacePercentUsed
Returns the approximate percentage of temporary object memory that is being used to store temporary objects. This is equivalent to the expression:
(System _tempObjSpaceUsed * 100) // System _tempObjSpaceMax.
Note that it is possible for the result to be slightly greater than 100%. Such a result indicates that temporary memory is almost completely full.
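One way these methods might be used is to commit incrementally during a long-running operation once memory use passes a chosen level. The loop below is a sketch under stated assumptions: MyApplication processItem: is a hypothetical method standing in for the real work, and the limit of 75 and the check interval of 1000 items are arbitrary.

```smalltalk
| limit |
limit := 75.   "illustrative percentage of temporary object memory"
1 to: 100000 do: [:i |
    MyApplication processItem: i.   "hypothetical unit of work"
    "Check memory use every 1000 items, and commit if it exceeds the limit."
    ((i \\ 1000 = 0) and: [System _tempObjSpacePercentUsed > limit])
        ifTrue: [
            System commitTransaction
                ifFalse: [ Error signal: 'commit failed' ]]]
```

Checking only every 1000 items keeps the cost of the memory query small relative to the work being done.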
To measure the size of complex objects, you might create a known object graph containing typical instances of the classes in question, and then execute the following methods at various points in your test code to get memory usage information:
CAUTION
Do not execute this sequence in your production code!
You can monitor the following statistics to better understand your application’s memory usage. The statistics are grouped here with related statistics, rather than alphabetically.
When a new symbol is needed (which may just be from evaluating a code snippet that includes a symbol), it is created by the SymbolGem. The SymbolGem process runs in the background and is responsible for creating all new Symbols, based on session requests that are managed by the Stone. You can examine the following statistics to track the effect of symbol creation activity on temporary object memory.
The number of objects in the ExportSet (Exported Set).
The number of objects in the Tracked Objects Set, as defined by the GCI. You can use GciReleaseObjs to remove objects from the Tracked Objects Set. For details, see the GemStone/S 64 Bit GemBuilder for C manual.
The number of modified committed objects in the temporary object memory dirty list.
The number of objects in memory that have an objectId assigned to them; approximately the number of committed objects that have been faulted in plus the number that have been created and committed.
The approximate percentage of temporary object memory for this session that is being used to store temporary objects. If this value approaches or exceeds 100%, sessions will probably encounter an OutOfMemory error. This statistic is only updated at the end of a mark/sweep operation. Compare with System _tempObjSpacePercentUsed, which is computed whenever the primitive is executed.
All changes to the repository are written to the transaction logs when the transaction is committed, to ensure these changes are recoverable in case of unexpected shutdown, and to allow these changes to be applied to warm standby copies of the repository. However, you may have data that you will be committing changes to, but that does not need to be recovered in case of system crash or corruption. For this kind of data, you can avoid the overhead of writing each change to the transaction logs, and the disk space required for the transaction logs to archive large amounts of non-critical data.
For objects that are intended to be persistent, but whose changes should not be written to the transaction logs, there must be no reference from other persistent objects; the only reference should be from the variable NotTranloggedGlobals, which is in the Globals SymbolDictionary.
NotTranloggedGlobals at: #perfLog put: PerformanceLogger new.
If the object in NotTranloggedGlobals is reachable from AllUsers (the regular root for all persistent objects), it will generate an error on commit.
On system crash or unexpected shutdown, the state of the objects reachable from NotTranloggedGlobals will be as was recorded in the most recent checkpoint prior to the shutdown; changes made after that checkpoint will be lost. If the repository is restored from backup, and transaction logs applied, the state of these objects will be as of the time the backup was taken; all changes made since the backup was taken are lost.
While optimization is an application-specific problem, we can provide a few ideas for improving application performance:
SmallInteger
SmallDouble
Float
LargeInteger
ScaledDecimal
DecimalFloat
Less efficient blocks include a return statement and can also refer to one or more of the pseudovariables super or self, instance variables of self, arguments to the enclosing method, temporary variables of the enclosing method, block arguments, or block temporaries of an enclosing block.
The least efficient blocks enclose a less efficient block of the kind described in the above paragraph.
Blocks provided as arguments to the methods ifTrue:, ifFalse:, ifTrue:ifFalse:, ifFalse:ifTrue:, whileFalse:, and whileTrue: are specially optimized. Unless they contain block temporary variables, you need not count them when counting levels of block nesting.
In the same way, for fastest performance in iterating over Collections, use the to:do: or to:by:do: methods to iterate, rather than do: or other collection iteration methods.
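For an indexable collection, the two styles compare as follows. This is a small self-contained illustration; the collection and variable names are arbitrary.

```smalltalk
| coll sum |
coll := Array new: 1000.
"Fill the array with the integers 1 through 1000 using to:do:"
1 to: coll size do: [:i | coll at: i put: i].

"Preferred for speed: iterate by index with to:do:"
sum := 0.
1 to: coll size do: [:i | sum := sum + (coll at: i)].

"Equivalent result, but using do: instead:
     coll do: [:each | sum := sum + each]"
sum   "500500"
```

The to:do: form only applies to collections that can be accessed by index. For unordered collections such as Bag or Set, do: or another iteration method is still needed.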