15. Managing Repository Growth

In the course of everyday operations, your GemStone/S 64 Bit repository will grow. Some of this growth will be the result of new data in your repository, but some will represent unreferenced or outdated objects. These objects, no longer needed, must be removed to prevent the repository from growing arbitrarily large. The process of removing unwanted objects to reclaim their storage is referred to as garbage collection.

This chapter describes GemStone’s garbage collection mechanisms and explains how and when to use them.

This chapter discusses the following topics:

Basic Concepts
The main concepts underlying garbage collection.

Garbage Collection Operations
These include MarkForCollection, Epoch Garbage Collection, and Reclaim.

Running Admin and Reclaim Gems
How to configure, start, and stop the Admin Gem and the Reclaim Gem.

Further Tuning Garbage Collection
Tuning multi-threaded scan operations, and other special issues affecting Garbage Collection.

15.1  Basic Concepts

Smalltalk execution can produce a number of objects needed only for the moment. In addition, normal business operations can cause previously committed objects to become obsolete. To make the best use of system resources, it is desirable to reclaim the resources these objects use as soon as possible.

Different Types of Garbage

Garbage collection mechanisms vary according to where garbage collection occurs — temporary (scratch) memory or permanent object space — and how it occurs — automatically, or in response to an administrator’s action.

Each Gem session has its own private memory intended for scratch space, known as local object memory. The Gem session uses local object memory for a variety of temporary objects, which can be garbage-collected individually. This type of garbage collection is handled automatically by the session and is (for the most part) not configurable, although memory can be configured for specific gem requirements. These issues are covered in Chapter 14, “Managing Gem Memory”.

Permanent objects are organized in units of 16 KB called pages. Pages exist in the shared page cache and on disk in the extents. When first created, each page is associated with a specific transaction; after its transaction has completed, GemStone does not write to that page again until all its storage can be reclaimed.

Objects on pages are not garbage-collected individually. Instead, the presence of a shadow object or dead object triggers reclaim of the page on which the object resides. Live objects on this page are copied to another page.

The Process of Garbage Collection

Removing unwanted objects is a two-phase process:

1. Identify—mark—superfluous objects.

2. Reclaim the resources they consume.

Together, marking and reclaiming unwanted objects is collecting garbage.

Complications ensue because each Gem in a transaction is guaranteed a consistent snapshot view of the repository: all visible objects are guaranteed to remain in the same state as when the transaction began. If another Gem commits a change to a mutually visible object, both states of the object must somehow coexist until the older transaction commits or aborts, refreshing its snapshot view. Therefore, resources can be reclaimed only after all transactions concurrent with marking have committed or aborted.

Older views of committed, modified objects are called shadow objects.

Garbage collection reclaims three kinds of resources:

  • The storage occupied by dead objects
  • The storage occupied by shadow objects
  • Object identifiers (OOPs) for dead objects

Live objects

GemStone considers an object live if it can be reached by traversing a path from AllUsers, the root object of the GemStone repository. By definition, AllUsers contains a reference to each user’s UserProfile. Each UserProfile contains a reference to the symbol list for a given user, and the symbol dictionaries in these lists in turn point to classes and instances created by that user’s applications. Thus, AllUsers is the root node of a tree whose branches and leaves encompass all the objects that the repository requires at a given time to function as expected.

Transitive closure

Traversing such a path from a root object to all its branches and leaves is called transitive closure.

Dead objects

An object is dead if it cannot be reached from the AllUsers root object. Other dead objects may refer to it, but no live object does. Without living references, the object is visible only to the system, and is a candidate for reclaim of both its storage and its OOP.

Shadow objects

A shadow object is a committed object with an outdated value. A committed object becomes shadowed when it is modified during a transaction. Unlike a dead object, a shadow object is still referenced in the repository because the old and new values share a single object identifier. The shadow object must be maintained as long as it is visible to other transactions on the system; then the system can reclaim only its storage, not its OOP (which is still in use identifying the committed object with its current value).

Commit records

Views of the repository are based on commit records, structures written when a transaction is committed. Commit records detail every object modified (the write set), as well as the new values of modified objects. The Stone maintains these commit records; when a Gem begins a transaction or refreshes its snapshot view of the repository, the starting snapshot view is based on the most recent commit record available.

Each session’s snapshot view is based on exactly one commit record at a time, but any number of sessions’ snapshot views can be based on the same commit record.

NOTE
The repository must retain each commit record and the shadow objects to which it refers as long as that commit record defines the transaction snapshot view of any session.

Commit record backlog

The list of commit records that the Stone maintains in order to support multiple repository views is the commit record backlog.

Shadow or Dead?

The following example illustrates the difference between dead and shadow objects. In Figure 15.1, a user creates a SymbolAssociation in the SymbolDictionary Published. The SymbolAssociation is an object (oop 27111425) that refers to two other objects, its instance variables key (#City, oop 20945153), and value ('Beaverton', oop 27110657).

The Topaz command “display oops” causes Topaz to display within brackets ( [ ] ) the identifier, size, and class of each object. This display is helpful in examining the initial SymbolAssociation and the changes that occur.

Figure 15.1   An Association Is Created and Committed

topaz 1> display oops
topaz 1> printit
Published at: #City put: 'Beaverton'.
Published associationAt: #City
%
[27111425 sz:2 cls: 111617 SymbolAssociation] a SymbolAssociation
  key                 [20945153 sz:4 cls: 110849 Symbol] City
  value               [27110657 sz:9 cls: 74753 String] Beaverton

Figure 15.2 shows a second Topaz session that logs in at this point. Notice that the Topaz prompt identifies the session by displaying a digit. Because Session 1 committed the SymbolAssociation to the repository, Session 2 can see the SymbolAssociation.

Figure 15.2   A Second Session Can See the Association

topaz 2> display oops
topaz 2> printit
Published associationAt: #City.
%
[27111425 sz:2 cls: 111617 SymbolAssociation] a SymbolAssociation
  key                 [20945153 sz:4 cls: 110849 Symbol] City
  value               [27110657 sz:9 cls: 74753 String] Beaverton

Now Session 1 changes the value instance variable, creating a new SymbolAssociation (Figure 15.3). Notice in the oops display that the new SymbolAssociation object has the same identifier (27111425) as the previous Association.

Figure 15.3   The Value Is Replaced, Changing the Association

topaz 1> printit
City := 'Portland'.
Published associationAt: #City.
%
[27111425 sz:2 cls: 111617 SymbolAssociation] a SymbolAssociation
  key                 [20945153 sz:4 cls: 110849 Symbol] City
  value               [27109121 sz:8 cls: 74753 String] Portland
topaz 1> commit
Successful commit

  • The SymbolAssociation is now shadowed. Because the shadow SymbolAssociation was part of the committed repository and is still visible to other transactions (such as that of Session 2), it cannot be overwritten. Instead, the new SymbolAssociation is written to another page, one allocated for the current transaction.
  • The previous value (oop 27110657) is no longer referenced in the repository. For now, this object is considered possibly dead; we cannot be sure it is dead because, although the object has been dereferenced by a committed transaction, other, concurrent transactions might have created a reference to it.

Even though Session 1 committed the change, Session 2 continues to see the original SymbolAssociation and its value (Figure 15.4). Session 2 (and any other concurrent sessions) will not see the new SymbolAssociation and value until it either commits or aborts the transaction that was ongoing when Session 1 committed the change.

Figure 15.4   Session 2 Sees Change After Renewing Transaction View of Repository

topaz 2> printit
Published associationAt: #City.
%
[27111425 sz:2 cls: 111617 SymbolAssociation] a SymbolAssociation
  key                 [20945153 sz:4 cls: 110849 Symbol] City
  value               [27110657 sz:9 cls: 74753 String] Beaverton
 
topaz 2> abort
topaz 2> printit
Published associationAt: #City.
%
[27111425 sz:2 cls: 111617 SymbolAssociation] a SymbolAssociation
  key                 [20945153 sz:4 cls: 110849 Symbol] City
  value               [27109121 sz:8 cls: 74753 String] Portland

Only when all sessions with concurrent transactions have committed or aborted can the shadow object be garbage collected.

What Happens to Garbage?

This section describes the steps involved in garbage collection. Specific garbage collection mechanisms will follow these steps, although the details will vary when using different garbage collection mechanisms.

The basic garbage collection process encompasses nine steps:

1. Find all the live objects in the system by traversing references, starting at the system root AllUsers. This step is called mark/sweep.

2. The Gem that performed mark/sweep now has a list of all live objects. It also knows the universe of all possible objects: objects whose OOPs range from zero to the highest OOP in the system. It can now compute the set of possible dead objects as follows:

a. Subtract the live objects from the universe of possible objects.

b. Subtract all the unassigned (free) OOPs in that range.

This step is called the object table sweep because the Gem uses the object table to determine the universe of possible objects and the unassigned OOPs.

3. The Gem performing this work now has a list of possibly dead objects. We can’t be sure they’re dead because, during the time that the mark/sweep and object table sweep were occurring, other concurrent transactions might have created references to some of them.

The Gem sends the Stone the possible dead set and returns.

4. Now, in a step called voting, each Gem logged into the system must search its private memory to see if it has created any references to objects in the possible dead set. When it next commits or aborts, it votes on every object in the possible dead set. Objects referenced by a Gem are removed from the possible dead set.

Gems do not vote until they complete their current transaction. If a Gem is sleeping or otherwise engaged in a long transaction, the vote cannot be finalized and garbage collection pauses at this point. Commit records accumulate, garbage accumulates, and a variety of problems can ensue. Repository scan operations such as listInstances cannot be executed until voting is complete.

5. Because all the previous steps take time, it’s possible that some Gems were on the system when the mark/sweep began, created a reference to an object now in the possible dead set, and then logged out. They cannot vote on the possible dead set, but objects they’ve modified are in the write sets of their commit records. The Admin Gem, a process dedicated to administrative garbage collection tasks, scans all these write sets (the write set union), and votes on their behalf. This is called the write set union sweep.

6. After all voting is complete, the resulting set now holds nothing but unreferenced objects. The Stone now promotes the objects from possibly dead to dead.

7. The Reclaim Gem reclaims pages: it copies live objects on the page onto a new page, thereby compacting live objects in page space. The page now contains only recyclable objects and perhaps free space.

8. The Reclaim Gem commits. The reclaimed OOPs are returned to their free pool.

9. The Reclaim Gem’s commit record is disposed of. The reclaimed pages are returned to their free pool.

Admin and Reclaim Gems

It is useful to understand the distinction between the Admin Gem and the Reclaim Gem:

  • The Admin Gem finalizes the vote on possibly dead objects (Step 5), and performs the write set union sweep. The Admin Gem also performs Epoch Garbage Collection, if enabled.
  • The Reclaim Gem is dedicated to the task of reclaiming shadowed pages and dead objects repository-wide, along with their OOPs.

The Reclaim Gem includes a master session and multiple reclaim sessions, each being a thread within the Reclaim Gem process. This allows reclaim to occur in parallel.

By default, the Admin Gem and the Reclaim Gem are configured to run and are started automatically when the Stone is started. The Reclaim Gem starts with one reclaim session, depending on the repository size. By default, epoch garbage collection is disabled.

  • We recommend that you leave the Admin Gem running at all times, although it is required only following a markForCollection or markGcCandidatesFromFile:, or after an epoch garbage collection operation. (Subsequent sections of this chapter describe these operations in detail.) If the Admin Gem is not running following one of these operations, the garbage collection process cannot complete, and garbage can build up in your system.
  • We recommend that you have the Reclaim Gem running at all times, to reclaim shadow objects that are continually created.

Admin and Reclaim Gem configuration parameters

Both the Admin and Reclaim Gems are run from the GcUser account, a special user account that logs in to the repository to perform garbage collection tasks. This account is used to set configuration values for the GcGems.

The configuration parameters that apply to the Admin or Reclaim Gems can be set either persistently or at runtime. Note that these are not configuration file parameters; these values are stored in the UserGlobals of the GcUser user profile account. While you can log in as GcUser to make persistent updates, it is not necessary to do this.

To set parameters persistently, a user with GarbageCollection privilege, such as DataCurator, can execute the System class methods setPersistentReclaimConfig:toValue: or setPersistentAdminConfig:toValue:. For example, to set #reclaimMinPages to 90:

topaz> set user DataCurator password thePassword
topaz> login
...
topaz 1> run
System setPersistentReclaimConfig: #reclaimMinPages toValue: 90
%

This sets the value both in the current environment and persistently. These methods perform a commit, and they signal an error if there are uncommitted changes in the image or if the value is out of range or invalid.

GcGem configuration parameters can also be set transiently, so they do not persist if the Admin or Reclaim GcGem is restarted, using the System class methods setReclaimConfig:toValue: or setAdminConfig:toValue:. These methods also require GarbageCollection privilege.
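A transient setting follows the same pattern as the persistent example above. The sketch below reuses #reclaimMinPages with the value 90 purely for illustration; as described above, the setting does not persist if the Reclaim Gem is restarted.

```smalltalk
topaz 1> run
"Set #reclaimMinPages transiently; 90 is an illustrative value only."
System setReclaimConfig: #reclaimMinPages toValue: 90
%
```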

GemStone’s Garbage Collection Mechanisms

GemStone provides the following mechanisms that together mark and reclaim garbage, thereby helping you to control repository growth.

Marking

Repository-wide marking — To prevent the repository from growing large enough to cause problems on a regular basis, you can run Repository >> markForCollection. This method combines a full sweep of all objects in the repository and the marking of each possible dead object in a single operation.

Epoch garbage collection — If enabled, the Admin Gem periodically examines all transactions written since a specific, recent time (the beginning of this epoch) for objects that were created and then dereferenced during that period. However, epoch garbage collection cannot reclaim objects that are created in one epoch but dereferenced in another. In spite of its name, epoch garbage collection only marks; it does not reclaim. You can configure various aspects to maximize its usefulness. Epoch garbage collection is disabled by default. For details about epoch garbage collection, see Epoch Garbage Collection.

Reclaiming

Reclaim — Once you’ve run markForCollection or epoch garbage collection, the Reclaim Gem will reclaim pages that contain either dead or shadow objects. When there are a high number of objects needing to be reclaimed, you may increase the number of sessions under the Reclaim Gem. For details about reclaiming pages, see Reclaim.

GcLocks

Garbage collection processes such as mark/sweep, and some other repository-wide scan operations, cannot safely be run concurrently. To prevent this, there is a shared internal set of locks, collectively called the GcLock. Operations that cannot run concurrently get a GcLock, which prevents another incompatible operation from starting up. There are a number of types of GcLock, including locks for MarkForCollection, Backup, Epoch GC, and Repository Scan (operations such as listInstances and listReferences).

If another task that requires the GcLock is in progress when you try to run markForCollection or findDisconnectedObjects..., the operation does not execute and instead reports an error similar to that shown below.

-- Request for MFC gclock by session 10 denied, reason: vote state
is voting, sessionId not voted 2 
ERROR 2501 , a Error occurred (error 2501), Request for gcLock
timed out. 

You can find out more about active locks in your system by invoking System class >> gcLocksReport.
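For example, after a request for the GcLock is denied or times out, you might run the report from a logged-in Topaz session. This is a minimal sketch; the content and layout of the report are not shown here.

```smalltalk
topaz 1> run
"Report the GcLocks currently held in the system."
System gcLocksReport
%
```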

Symbol Garbage Collection

Symbols in GemStone are a special case of Object, since they must always have a unique OOP across all sessions. To ensure this, symbol creation is managed by the SymbolUser, who creates all new Symbols. Symbols are stored in the AllSymbols dictionary, and are not removed, to avoid any risk of creating duplicate symbols.

However, there are cases where a large number of unimportant symbols are created, perhaps inadvertently. To reclaim this space and to manage the size of AllSymbols, you can configure GemStone to collect unreferenced symbols in a multi-step process that ensures that symbols in use are not collected.

By default, Symbol garbage collection is not enabled. It can be enabled using the configuration parameter, STN_SYMBOL_GC_ENABLED, or by the runtime equivalent, #StnSymbolGcEnabled. If enabled, symbol garbage collection is performed automatically in the background and requires no management.

When enabled, unused symbols are located and put in a possibleDeadSymbols collection as part of a markForCollection. These symbols are hidden, to remove references from AllSymbols but retain the OOPs until the voting, union, and finalization are done. Any lookup on a hidden symbol returns the existing hidden symbol and restores it to the AllSymbols dictionary.

Once voting and write-set union sweep are done, the symbols that are otherwise unreferenced are removed from the possibleDeadSymbols and automatically moved to the DeadNotReclaimed set. The Reclaim Gem will then process them, to reclaim the used space and OOPs.

15.2  MarkForCollection

The method Repository>>markForCollection sweeps the entire repository and marks as live all objects that can be reached through a transitive closure on the symbol lists in AllUsers, as described under Basic Concepts. The remaining objects become the list of possible dead objects.

markForCollection only provides a set of possible dead objects for voting and eventual reclaiming as described under What Happens to Garbage?. It does not reclaim the space or OOPs itself; the Reclaim Gem does that, as described under Reclaim.

To mark unreferenced GemStone objects for collection, log in to GemStone as a user with GarbageCollection privilege (normally DataCurator), and execute SystemRepository markForCollection.
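A minimal Topaz session for this might look like the following sketch. The login lines reuse the form shown earlier in this chapter, and thePassword is a placeholder.

```smalltalk
topaz> set user DataCurator password thePassword
topaz> login
topaz 1> run
"Mark possible dead objects; reclaim is done later by the Reclaim Gem."
SystemRepository markForCollection
%
```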

Running markForCollection, and the subsequent reclaim tasks, places demands on system resources. On production systems, consider scheduling MFC for off hours and otherwise reducing the impact, as described in the following sections.

markForCollection aborts the current transaction and runs the mark/sweep operation inside a transaction, but monitors the commit record backlog so it can abort as necessary to prevent the backlog from growing. When markForCollection completes, the session reenters a transaction, if it was in one when this method was invoked.

When markForCollection completes successfully, the Gem that started it displays a message such as the one below:

Warning: a Warning occurred (notification 2515), markForCollection
found 110917 live objects, 3496 dead objects(occupying approx
314640 bytes), 25 possibleDeadSymbols

If another garbage collection task is in progress at the time you try to do markForCollection, this method will retry for a fixed period, reporting status. If the other operation does not complete within the timeout period, it reports an error indicating it could not get the GcLock. See GcLocks for more details. In addition, if a previous epoch or MFC completed the mark phase, but voting on possible dead objects has not completed, the markForCollection will not run. For voting to complete, the Admin Gem must be running. Also, any long-running session that neither aborts nor commits will prevent the vote from completing.

By default, the markForCollection method waits for up to a minute for the other operation to complete. To have the markForCollection wait for a longer period, use markForCollectionWait: waitTimeSeconds.
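As a sketch, the following waits up to five minutes for the other operation to complete. The value 300 is illustrative only.

```smalltalk
topaz 1> run
"Wait up to 300 seconds for the GcLock before giving up."
SystemRepository markForCollectionWait: 300
%
```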

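To avoid markForCollection having to wait, make sure that the Admin Gem is running and that no session remains in a long-running transaction without committing or aborting, so that any pending vote on possible dead objects can complete.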

Impact on Other Sessions

The markForCollection operation uses multi-threaded scan. For more details on this, see Multi-Threaded Scan.

By default, markForCollection limits its use of CPU resources if the CPU load on the system reaches 90%. It starts the operation with two threads and a page buffer size of 128. If the CPU limit is reached, the code automatically causes threads to sleep until the load is less than 90%. Depending upon the I/O required, the system may never reach this limit.

To enable markForCollection to complete as quickly as possible, you can use:

SystemRepository fastMarkForCollection

This uses higher settings (95% of CPU, and a number of threads based on the current hardware) to use as many system resources as possible. The performance of anything else running on the same system may be heavily degraded.

For maximum control, use the method

SystemRepository markForCollectionWithMaxThreads: threadsCt
	waitForLock: seconds
	pageBufSize: pageBufSize
	percentCpuActiveLimit: percentLimit

This allows you to specify the precise limits.
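The following sketch shows one possible invocation. All four argument values are illustrative assumptions, not recommendations; choose values that suit your hardware and workload.

```smalltalk
topaz 1> run
"Illustrative limits: 4 threads, wait up to 120 seconds for the GcLock,
 a page buffer size of 128, and throttle above 75% CPU active."
SystemRepository markForCollectionWithMaxThreads: 4
	waitForLock: 120
	pageBufSize: 128
	percentCpuActiveLimit: 75
%
```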

Starting markForCollection with these limits specifies the trade-off you wish to make between time to completion and impact on other sessions. The desired trade-off may vary over time; for example, if your markForCollection extends over both business hours and non-business hours, you may accept greater impact during periods of light load. The multi-threaded scan parameters can be changed at runtime, as described under Tuning a running Multi-Threaded Scan.

After the markForCollection has completed, there may be additional impact on other sessions, since it is likely that dead objects that require reclaim were identified. After the remaining Garbage Collection steps have completed, the Reclaim Gem Sessions may become busy reclaiming the dead objects.

Scheduling markForCollection

To invoke markForCollection using the cron facility, create a three-line script file that wraps the expression SystemRepository markForCollection in a Topaz run command and its terminating % line. Use this script as standard input to topaz, and redirect the standard output to another file:

topaz < scriptName > logName

Make sure that $GEMSTONE and any other required environment variables are defined during the cron job. Either create a .topazini file for a user who has GarbageCollection privilege, or insert those login settings at the beginning of the script.
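As a sketch, a script file with the login settings inserted at the beginning might look like this. thePassword is a placeholder, and the script assumes the default Stone name; if a .topazini file supplies the login settings, only the three lines from run to % are needed.

```smalltalk
! Login settings; omit these if .topazini provides them.
set user DataCurator password thePassword
login
! The three-line markForCollection script.
run
SystemRepository markForCollection
%
logout
exit
```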

15.3  Epoch Garbage Collection

Epoch garbage collection operates on a finite set of recent transactions: the epoch. Using the write set that the Stone maintains for each transaction, the Admin Gem examines every object created during the epoch. If an object is unreferenced by the end of the epoch, it is marked as garbage and added to the list of possible dead objects.

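Epoch collection is efficient because it examines only the objects created during the epoch, using the write sets that the Stone already maintains, rather than sweeping the entire repository.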

Although epoch collection identifies a lot of dead objects, it cannot replace markForCollection because it will never detect objects created in one epoch and dereferenced in another.

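By default, epoch garbage collection is disabled. You can enable it in either of two ways:

  • Before the Stone starts, by setting the configuration parameter STN_EPOCH_GC_ENABLED to TRUE.
  • While the Stone is running, by setting the runtime equivalent of that parameter.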

After your installation has been operating for a while and you have collected operational statistics, review the epoch length: epochs of the wrong length can be notably inefficient. The section Determining the Epoch Length includes an in-depth discussion of the performance trade-offs of short and long epochs.

Running Epoch Garbage Collection

When epoch garbage collection is enabled, it will run automatically according to the GcUser configuration parameters #epochGcTimeLimit and #epochGcTransLimit.

You can force an epoch garbage collection to begin using System class >> forceEpochGc. forceEpochGc will return false, and not start an epoch garbage collection, if any of the following are true:

  • Checkpoints are suspended.
  • Another garbage collection operation is in progress.
  • Unfinalized possible dead objects exist (that is, System voteState returns a non-zero value).
  • The system is in restore mode.
  • The Admin Gem is not running.
  • Epoch garbage collection is disabled (that is, STN_EPOCH_GC_ENABLED = FALSE).
  • The system is performing a reclaimAll.
  • A previous forceEpochGc operation was performed and the epoch has not yet started or completed.
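One way to use this is to check the vote state before forcing an epoch, as in the sketch below. It uses only the two methods named above and does not show their output.

```smalltalk
topaz 1> run
"A non-zero result means unfinalized possible dead objects exist,
 in which case forceEpochGc will return false."
System voteState
%
topaz 1> run
"Returns false if any of the conditions listed above applies."
System forceEpochGc
%
```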

Tuning Epoch

Epoch Configuration Parameters

There are a number of configuration parameters that control the performance of epoch garbage collection.

The number of threads used for Epoch GC is computed each time it runs, based on the larger of the Stone’s current setting of STN_NUM_GC_RECLAIM_SESSIONS and the setting for #epochGcMaxThreads.

The following parameters can be used to tune Epoch garbage collection:

#epochGcTimeLimit

The minimum time between epoch garbage collections, in seconds, which sets their maximum frequency. This value should be at least 1800 (30 minutes), since the aging of objects faulted into Gem memory uses 5 minute aging for each of 10 subspaces of the POM generation. Default: 3600 (1 hour). Minimum: 5. Maximum: 2147483647.

#epochGcTransLimit

The minimum number of transactions required to trigger epoch garbage collection. Default: 5000. Minimum: 0. Maximum: 2147483647.

#epochGcPercentCpuActiveLimit

Limit active epoch threads when system percentCpuActive is above this limit. Default: 90. Minimum: 0. Maximum: 100.

#epochGcPageBufferSize

Size in pages of buffer used for epoch GC. Must be a power of 2. Default: 64. Minimum: 8. Maximum: 1024.

#epochGcMaxThreads

The MaxThreads used in computing the number of threads for the next epoch GC. Default: 1. Minimum: 1. Maximum: 32.

You can get the current descriptions from the image by executing

System adminGemConfigs

which returns a report of all supported Admin Gem configuration parameters. For details on modifying values, see Admin and Reclaim Gem configuration parameters.
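As a sketch, an epoch parameter can be changed with the persistent Admin Gem setter described earlier in this chapter. The value 7200 (two hours) is illustrative only; the method performs a commit and reports an error if the session has uncommitted changes.

```smalltalk
topaz 1> run
"Persistently set the minimum time between epochs to 7200 seconds."
System setPersistentAdminConfig: #epochGcTimeLimit toValue: 7200
%
topaz 1> run
"Report the supported Admin Gem configuration parameters."
System adminGemConfigs
%
```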

Determining the Epoch Length

Epoch garbage collection’s ability to identify unreferenced objects depends on the relationship between three variables:

  • The rate of production R of short-lived objects.
  • The lifetime L of these objects.
  • The epoch length E.

The only variable under your direct control is epoch length. Although you cannot specify it explicitly, the following configuration parameters jointly control the length of an epoch:

  • #epochGcTimeLimit
  • #epochGcTransLimit

Epoch garbage collection occurs when:

(the time since last epoch > epochGcTimeLimit ) AND
	(transactions since last epoch > epochGcTransLimit) 

The following discussion assumes that the epoch is determined by the minimum time interval (#epochGcTimeLimit) because the other threshold is always met.

Figure 15.5 shows the effect of the epoch on the number of items marked. If L = E, for example, five minutes, every object’s lifetime spans epochs (top part of graph), and none are collected.

When the epoch is longer than an average object’s lifetime, however, some objects live and die within the same epoch, and can be marked. The lower part of Figure 15.5 shows an example where E = 3L and objects are created at a uniform rate. Objects created during the first two-thirds of the interval die before its end and are marked. Only those created during the final third survive to the next epoch.

The results shown in Figure 15.5 can be expressed as:

Objects Missed by EpochGC = R x L
Objects Recovered by EpochGC = R(E – L)

For example, assume R = 1000 objects per minute, L = 5 minutes, and E = 15 minutes. Then, for each epoch:

Objects Missed = 1000 x 5 = 5000
Objects Recovered = 1000 (15 – 5) = 10000

Figure 15.5   Effect of Collection Interval on Epoch Garbage Collection

Therefore:

  • Set #epochGcTimeLimit E > lifetime L of short-lived objects.

Figure 15.6 graphs the effect of the epoch. When E = L, epoch garbage collection is in effect disabled; all objects survive into the next epoch; the number of unmarked yet dead objects in the repository grows at the creation rate. These dead objects remain unidentified until you run markForCollection.

When the epoch is extended so that E = 3L, each epoch garbage collection marks those objects both created and dereferenced during that interval. This ratio causes the sawtooth pattern in the graph. If the creation rate is uniform, two-thirds of the dead objects are marked ((E-L)/E), and one-third are missed (L/E). Consequently, the repository grows at one-third the rate of the case E = L.

This configuration trades short bursts of epoch garbage collection activity for:

  • moderate growth in the repository, and
  • the need to run markForCollection often enough to mark dead objects that survive between epochs.

Figure 15.6   Repository Growth with Short Epoch

Suppose we extend the epoch to E = 12L. The result is shown in Figure 15.7, superimposed on part of the previous figure.

Figure 15.7   Effect of Longer Epoch on Repository Growth

Although the longer epoch allows many more dead objects to accumulate, the growth rate of the repository is substantially less—25% of the previous case.

This configuration trades a slower growth rate for:

  • a need for greater headroom on the disk, and
  • longer bursts of epoch garbage collection activity.
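The 25% figure can be checked from the formulas given earlier, assuming a uniform creation rate R. Each epoch creates R x E objects (missed plus recovered) and misses R x L of them, so the fraction missed, and hence the relative growth rate, is L/E:

```latex
\[
\frac{\text{Objects Missed}}{\text{Objects Created}}
  = \frac{R \times L}{R \times E} = \frac{L}{E},
\qquad
\left.\frac{L}{E}\right|_{E=3L} = \frac{1}{3},
\qquad
\left.\frac{L}{E}\right|_{E=12L} = \frac{1}{12},
\qquad
\frac{1/12}{1/3} = \frac{1}{4} = 25\%.
\]
```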

Certain cases have needed an epoch as long as several hours, or even a day.

Cache Statistics

Several cache statistics include information about the epoch garbage collection process. These are visible by using statmonitor data viewed in VSD (the visual statistics display tool). You may also access methods in System to get the values programmatically; see Programmatic Access to Cache Statistics for more information.

The following statistics may be useful in monitoring epoch:

EpochGcCount

The number of times that the epoch garbage collection process was run by the Admin Gem since the Admin Gem was started. For a system in steady state, look for uniform periods between runs or a uniform run rate.

EpochNewObjs

The number of new objects that were created during the last epoch.

EpochPossibleDeadObjs

The number of possible dead objects found by the last epoch garbage collection.

EpochScannedObjs

The number of objects scanned by the last epoch garbage collection.

15.4  Reclaim

The Reclaim Gem is responsible for reclaiming both dead and shadowed objects (see Shadow or Dead? for the difference between these types of garbage).

Shadowed objects are created naturally as your application modifies existing objects, so it is a good idea to always have the Reclaim Gem running to avoid shadowed objects accumulating. Some operations, such as migration, create a very large number of shadowed objects that need to be reclaimed.

After a mark/sweep operation — markForCollection or epoch — completes, there will be a number of dead objects that need to be reclaimed.

Although it is individual objects that become dead or shadowed, reclaim operates on whole pages. Pages that contain dead or shadowed objects may also contain some live objects; these live objects are copied to fresh pages, and the original page may then be reclaimed.

Reclaim is multi-threaded. Each thread within the Reclaim Gem is similar to a session, but runs within the Reclaim Gem process.

When the Reclaim Gem is running, its sessions examine pages marked reclaimable because they contain either dead or shadow objects, and reclaim fragments of space left by transactions that did not fill an entire page. This occurs in the background, with no specific action required.

Although it is recommended to allow the background processes to perform the reclaim, you can explicitly invoke it by executing SystemRepository reclaimAll.

Note that objects are not eligible for reclaim immediately after a mark/sweep. All sessions must vote, and the Admin Gem must complete the write set union sweep, before objects can be reclaimed. The method Repository>>reclaimAllWait: timeoutSeconds allows reclaimAll to be invoked immediately; it waits for these tasks to complete.
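As a minimal sketch, the two forms described above might be used as follows. The 600-second timeout is an arbitrary illustrative value, and the return values are not described here; check the method comments in the image.

```smalltalk
"Explicitly reclaim everything that is currently eligible."
SystemRepository reclaimAll.

"Right after a mark/sweep: request the reclaim now and let it wait for
 voting and the write set union sweep. 600 is an illustrative timeout."
SystemRepository reclaimAllWait: 600.
```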

Reclaimed space does not appear as free space in the repository until other sessions have committed or aborted all transactions concurrent with the reclaim transaction, and the Stone has disposed the commit record. Make sure there is some extra space in your repository extents to hold the dead and shadowed objects that are in the process of being reclaimed, until that space becomes available again.

Tuning Reclaim

Reclaim Configuration Parameters

The following configuration parameters are available to control the reclaim task. You can get the current descriptions from the image by executing

System reclaimGemConfigs

which returns a report of all supported Reclaim Gem configuration parameters. For details on modifying values, see here.

#deadObjsReclaimedCommitThreshold

The maximum number of dead objects to reclaim in a single transaction, including dead objects reclaimed when reclaiming shadow pages. The default is 20000, the minimum is 32, and the maximum is 2147483647.

#deferReclaimCacheDirtyThreshold

If the primary shared page cache (the shared cache on the stone's machine) is more than this percentage dirty, then Reclaim Gems will wait until the cache is less than 5% below this threshold before resuming reclaims. The default is 75%, the maximum is 100% (which disables this feature), and the minimum is 10%.

#maxTransactionDurationUs

The maximum length (in microseconds) of a GcGem transaction. The transaction will be committed once this time is exceeded. Must be ≥ 1; the default is 100000 (0.1 second), minimum 100 and the maximum is 20000000 (20 seconds).

#objsMovedPerCommitThreshold

The approximate maximum number of live objects to move in a reclaim transaction. Must be ≥ 100; the default is 20000, maximum is 2147483647.

#reclaimDeadEnabled

A Boolean indicating whether or not to reclaim dead objects; the default is true.

#reclaimMinFreeSpaceMb

Minimum repository free space which must be available in order for reclaims to proceed. Reclaims will be temporarily suspended if the repository free space drops below this threshold. The default value of 0 specifies a limit computed as the current size of the repository divided by 1000, with a minimum value of 5MB. Default and minimum 0, maximum 65536.

#reclaimMinPages

The minimum number of pages to process in a single reclaim operation (reclaiming does not start until this threshold is reached). Must be ≥ 1; the default is 30 pages, maximum is 2147483647.

#sleepTimeBetweenReclaimUs

The minimum amount of time in microseconds that the process will sleep between reclaims, even when work is scheduled. The default is 0 microseconds, maximum 300000000 (300 seconds).

#sleepTimeWithCrBacklogUs

Amount of time (in microseconds) to sleep after a commit when the commit record backlog is larger than 1.25 * the current setting for STN_CR_BACKLOG_THRESHOLD. For each 25 percent above the threshold, the sleep time is increased, so that the ReclaimGem does fewer commits the further the number of commit records is above the threshold. Must be between 0 and 300000000 (300 seconds); the default is 0.

Reclaim Commit Frequency

A Reclaim Gem session will commit reclaim changes as soon as any one of the following conditions is met:

  • Number of live objects moved exceeds #objsMovedPerCommitThreshold.
  • Duration of the transaction exceeds #maxTransactionDurationUs.
  • Number of dead objects reclaimed exceeds #deadObjsReclaimedCommitThreshold.
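
The rule amounts to a logical OR of three tests. The following sketch illustrates it with the default thresholds listed above; the block and its argument names are hypothetical and are not part of the GemStone API.

```smalltalk
| shouldCommit |
"Hypothetical illustration only; the thresholds are the documented defaults."
shouldCommit := [:liveMoved :elapsedUs :deadReclaimed |
    (liveMoved > 20000)                      "#objsMovedPerCommitThreshold"
        or: [(elapsedUs > 100000)            "#maxTransactionDurationUs"
            or: [deadReclaimed > 20000]]].   "#deadObjsReclaimedCommitThreshold"

"Few objects handled, but the transaction has run for 0.15 second: commit."
shouldCommit value: 500 value: 150000 value: 10.
```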

Controlling the impact of reclaim

Reclaim, particularly with a larger number of sessions configured for the Reclaim Gem, can perform quickly but place a large load on your system. If you are likely to be doing reclaim during periods where users will also need to use the system, you may wish to slow down reclaim. This can be done in a number of ways:

  • Reduce the number of reclaim sessions using System class >> changeNumberOfReclaimGemThreads: with an argument of 1 or 0.
  • Set #sleepTimeBetweenReclaimUs to ensure that reclaim Gem sessions pause between reclaim operations.
  • Set #sleepTimeWithCrBacklogUs so that in case your system encounters a commit record backlog, the impact of reclaim is automatically reduced.

Speeding up reclaim

You can also set up your system to run reclaim with the maximum impact during off-hours. If you have a large amount of reclaim to perform, this allows the reclaim to finish more quickly. You can increase the number of Reclaim sessions to the maximum using:

System class >> startMaxReclaimGemSessions

This will start the number of sessions specified by STN_MAX_GC_RECLAIM_SESSIONS.
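
For example, a site might switch between the two modes at the start and end of the business day. This sketch uses only the methods named above; the choice of a single session during working hours is an assumption to adapt to your own load.

```smalltalk
"Start of the business day: throttle reclaim to a single session."
System changeNumberOfReclaimGemThreads: 1.

"Off-hours: run the number of sessions given by STN_MAX_GC_RECLAIM_SESSIONS."
System startMaxReclaimGemSessions.
```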

Avoiding disk space issues

Reclaim needs free pages in the repository to hold the live objects that it copies. The Stone must complete further steps before the space on the reclaimed pages becomes available again, so reclaim initially causes the amount of free space in the repository to drop.

Depending on overhead required by your system and the largest amount of reclaim that needs to be done at any time, you may want to configure a larger #reclaimMinFreeSpaceMb. This will ensure that reclaim pauses before your repository becomes dangerously low in free space.
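
To judge whether the default is adequate, it can help to compute what it comes to for your repository. This sketch follows the rule given for #reclaimMinFreeSpaceMb (repository size divided by 1000, but at least 5 MB); the repository sizes are made-up figures, and the use of integer division is an assumption about rounding.

```smalltalk
| defaultLimitMb |
defaultLimitMb := [:repositorySizeMb | (repositorySizeMb // 1000) max: 5].

defaultLimitMb value: 200000.   "200 MB for a 200000 MB repository"
defaultLimitMb value: 2000.     "5 MB: the 5 MB floor applies"
```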

Reclaim Logging

In addition to the above configuration parameters that control the reclaim behavior, there is an additional configuration parameter, #reclaimVerboseLogging.

By default, this is set to false or 0, and a summary of reclaim operations is printed every 15 minutes (or less frequently, if no reclaims take place) to the ReclaimGem log file. Values of true or > 0 cause additional updates to be printed to the ReclaimGem log file; these are normally only useful for debugging.

These are the definitions for the levels:

0 - only summary information every 5 minutes in slow, 15 minutes in a fast

1 - log sigAbort, gcGemAlert, gcHwPage, etc

2 - numDead from stone, numDead processed per thread

3 - summary: commitCount, stayInTrans, crPage, numLive, numDead, numReclPages

4 - abort, start phases

5 - conflict set

6 - num live and dead found per page

7 - summary of info added to ot slot

8 - deltas in rootPage

9 - push live or dead

Cache Statistics

Several cache statistics provide information about reclaim. These are visible by using statmonitor data viewed in VSD (the visual statistics display tool). You may also access methods in System to get the values programmatically; see Programmatic Access to Cache Statistics for more information.

The following Stone statistics may be useful in monitoring reclaim:

DeadNotReclaimedObjs

The number of objects known to be dead but not yet reclaimed.

DeadObjsReclaimedCount

The total number of dead objects reclaimed since the Stone repository monitor process was last started.

GcVoteState

Indicates the current phase of garbage collection: Gems voting, voting complete, Possible Dead Write-Set Union Sweep (PDWSUS) in progress, or PDWSUS complete.

PagesNeedReclaimSize

The amount of work waiting for the reclaim task.

PossibleDeadObjs

The number of objects marked as dereferenced but not yet declared to be dead.

ReclaimCount

The number of times the reclaim process has been run.

ReclaimedPagesCount

The number of scavenged pages.

15.5  Running Admin and Reclaim Gems

Admin Gem Privileges required: GarbageCollection

The initial configuration for the Admin and Reclaim Gems is provided in the system configuration file for the stone; by default, $GEMSTONE/data/system.conf. These settings determine what is started automatically when the stone starts up. During runtime, you can start and stop the Admin Gem and change the number of Reclaim sessions that are running.

Configuring Admin Gem

The Admin Gem is enabled or disabled by the setting for the STN_ADMIN_GC_SESSION_ENABLED configuration option. By default, this is enabled, and normally you should leave this enabled. You can stop and restart the Admin Gem at runtime as needed.

Configuring Reclaim Gem

The number of Reclaim sessions is set by the STN_NUM_GC_RECLAIM_SESSIONS configuration option. By default, this is one, and you should normally keep at least one Reclaim session running. Most systems will benefit from increasing the number of Reclaim sessions. In general, we recommend running one Reclaim session for every 5 to 10 extents. You may need to experiment to find the correct balance for your system. The number of Reclaim sessions can be changed at runtime as needed.

To ensure that Reclaim sessions do not impact the number of user sessions, a separate configuration setting, STN_MAX_GC_RECLAIM_SESSIONS, configures the maximum number of Reclaim sessions you will be running.

By default, this is set to the number of extents on your system. This parameter cannot be changed without restarting the stone. The upper limit for the number of Reclaim sessions that can be run under any configuration is 255.

While the number of Reclaim sessions should normally be less than or equal to STN_MAX_GC_RECLAIM_SESSIONS, it is possible to start a larger number of Reclaim sessions. However, this will reduce the number of user sessions that can log in to this Stone. If your system does not have excess unused user sessions, you should be careful to configure STN_MAX_GC_RECLAIM_SESSIONS high enough that you will never want to run a larger number of Reclaim sessions.

Starting GcGems

You can ensure all configured GcGems are running using:

System startAllGcGems
If the Admin Gem is not running, start it. If the Reclaim Gem is not running, start it with the configured number of Reclaim sessions. Return true if the Admin Gem and at least one Reclaim session are started.

or by executing both:

System startAdminGem
If the Admin Gem is not running, start it. Return true if the Admin Gem is running, false if the Admin Gem could not be started.

System startReclaimGem
If the Reclaim Gem is not running, start it with the configured number of Reclaim sessions. Return the number of Reclaim sessions that will be running. If the Reclaim Gem is already running, has no effect and returns the number of Reclaim sessions already running.

It may take a little time for the GcGems to complete login. The above methods do not block; they initiate the startup and return immediately. To wait a given period of time for the GcGems to start up:

System waitForAllGcGemsToStartForUpToSeconds: anInt
If the Admin Gem is not running, start it. If the Reclaim Gem is not running, start it with the configured number of Reclaim sessions. If all the GcGems have not started up within that time, return false. However, this does not necessarily mean that any GcGems have failed to start; on a slow system with a short timeout, this method may return false, even though all GcGems eventually start correctly.

To confirm that the GcGems are running:

System hasMissingGcGems
Returns true if either the Admin Gem or the Reclaim Gem is not running.

To determine the number of Reclaim sessions that are currently running:

System reclaimGcSessionCount
Returns the total number of Reclaim sessions that are running.
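
Putting these together, a startup check might look like the following sketch. The 60-second timeout is an arbitrary choice, and, as noted above, a false result from the wait does not by itself mean that a GcGem failed to start.

```smalltalk
| allStarted |
"Start any missing GcGems and wait up to 60 seconds for them to log in."
allStarted := System waitForAllGcGemsToStartForUpToSeconds: 60.

"Report the outcome together with the number of Reclaim sessions now running."
{ allStarted . System reclaimGcSessionCount }
```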

Stopping GcGems

To ensure that the Admin Gem and all Reclaim sessions are stopped:

System stopAllGcGems

or you may execute both:

System stopAdminGem.
System stopReclaimGem.

Adjusting the number of Reclaim sessions

You can adjust the number of Reclaim sessions that are running during the course of operation of your application. When there is a large amount of reclaim and little other load on your system, running a large number of Reclaim sessions will allow the reclaim work to complete more quickly. During normal operation, reducing the number of Reclaim sessions avoids using too many system resources and impacting users.

To set the number of Reclaim sessions that are running:

System changeNumberOfReclaimGemThreads: targetReclaimThreadCount
Start the ReclaimGem, if it is not running, with targetReclaimThreadCount Reclaim threads.

targetReclaimThreadCount should be a number less than or equal to the value for STN_MAX_GC_RECLAIM_SESSIONS. Using a larger argument does not error, but may have consequences for user logins; see the discussion here.

Return the new target number of Reclaim threads; Reclaim threads will be started or stopped to reach this number. This method does not block, so it may take a little time before the correct number of Reclaim threads is actually running.

Using this method only changes the currently running number of Reclaim threads, but does not affect the configured number. After stopping the ReclaimGem, on restart the regular configured number of threads will be started.

To change the number of Reclaim threads that will be started by default when the ReclaimGem starts up:

System configurationAt: #StnNumGcReclaimSessions 
	put: targetReclaimThreadCount

This does not affect the number of Reclaim threads that are currently running, if any. Changes to the runtime parameter do not persist if the Stone is restarted. For a permanent change, you should edit the configuration parameters in the configuration file used by the stone: STN_NUM_GC_RECLAIM_SESSIONS, and if necessary, STN_MAX_GC_RECLAIM_SESSIONS.
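
As a sketch, the two settings can be combined so that both the running Reclaim Gem and any later restart of it use the same count. The value 4 is illustrative and is assumed to be no larger than STN_MAX_GC_RECLAIM_SESSIONS.

```smalltalk
| targetReclaimThreadCount |
targetReclaimThreadCount := 4.

"Adjust the Reclaim threads that are running now."
System changeNumberOfReclaimGemThreads: targetReclaimThreadCount.

"Adjust the count used the next time the ReclaimGem starts (until the Stone restarts)."
System configurationAt: #StnNumGcReclaimSessions put: targetReclaimThreadCount.
```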

15.6  Further Tuning Garbage Collection

Multi-Threaded Scan

For large systems, it can take a considerable amount of time to scan the entire repository, as is required by a mark/sweep operation (or other operations such as listInstances). To allow these scans to complete faster, operations that scan the entire repository use multiple threads running in parallel. There is a trade-off between how fast the operation completes and how much of the system resources it uses. The faster the scan runs, the fewer resources remain for other work on that system while it is running.

Methods with “fast”

Most repository scan operations, including markForCollection, listInstances:, etc., have variants that use a larger number of threads based on the number of CPUs and the number of configured sessions. These method selectors start with “fast”, e.g. fastMarkForCollection, and increase the maximum CPU load as well. The fast* repository scan methods are not affected by setting the default number of threads.

Methods with threads: keyword

Most repository scan operations, including markForCollection, listInstances:, etc., have variants that accept, as an argument, the number of threads to use. These method selectors end in a keyword such as “threads:” or “WithMaxThreads:”, e.g. markForCollectionWithMaxThreads:. Methods that allow you to specify the number of threads are not affected by setting the default number of threads.

Large Repositories

On a large repository, you should use the fast variants, or a threads: variant with 5 to 10 or more threads, to run markForCollection or to make or restore a backup.

Tuning a running Multi-Threaded Scan

While the initial number of threads is set before the scan operation starts, you can also update it while the scan is running. This enables you, for example, to reduce impact during working hours, while allowing more resources to be used during off hours.

The following values can be tuned while the scan is running:

  • MtThreadsLimit: The upper limit on the number of threads that can be activated.
  • MtPercentCpuActiveLimit: The total CPU load level at which the scan starts to deactivate threads.

Since the scan is running, you need to update these variables from a second session, using the sessionId of the session that is running the scan.

One way to determine the session Id of the session that is running a scan operation is by checking the session holding the GcLock.

To access the upper limit on the number of threads:

System mtThreadsLimit: aSessionId

To update the upper limit on the number of threads:

System mtThreadsLimit: aSessionId setValue: anInt

To access the CPU load limit:

System mtPercentCpuActiveLimit: aSessionId

To update the CPU load limit:

System mtPercentCpuActiveLimit: aSessionId setValue: anInt

Both of these variables are used in tuning, but they have somewhat different uses. The primary way you will tune the impact on your system is by setting MtPercentCpuActiveLimit. The operation then controls its impact by activating or deactivating threads, up to a limit of MtThreadsLimit. The operation will proceed, using more or fewer resources at any particular time depending on what else is executing on your system. Note that the CPU load includes non-GemStone processes running on the same machine, so if a machine is heavily used by non-GemStone processes, the operation may make little progress even if the GemStone repository itself is idle.

MtThreadsLimit acts as a ceiling on the impact as well. Since this limit is of more relevance within GemStone, on heavily loaded machines you may want to pay more attention to this limit to control the impact within the repository. This limit is also useful when you want to pause the scan. Setting MtThreadsLimit to 0 means that the scan cannot perform work, but it does not stop executing; it waits until a non-zero limit is set.
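
The following sketch shows how a second session might throttle, pause, and resume a running scan using the methods above. The session ID 7 and the limit values are placeholders; substitute the ID of the session that holds the GcLock.

```smalltalk
| scanSessionId |
scanSessionId := 7.   "placeholder: the session running the scan"

"Working hours: deactivate threads once total CPU load reaches 50%, and use at most 2 threads."
System mtPercentCpuActiveLimit: scanSessionId setValue: 50.
System mtThreadsLimit: scanSessionId setValue: 2.

"Pause the scan without stopping it, then resume with up to 8 threads."
System mtThreadsLimit: scanSessionId setValue: 0.
System mtThreadsLimit: scanSessionId setValue: 8.

"Read back the current thread limit."
System mtThreadsLimit: scanSessionId.
```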

Cache Statistics

The following cache statistics are important for tuning multi-threaded scans. These are visible by using statmonitor data viewed in VSD (the visual statistics display tool); see VSD User’s Guide. You may also access methods in System to get the values programmatically; see Programmatic Access to Cache Statistics for more information.

MtThreadsLimit

The upper limit on the number of threads that can be running at any one time.

MtPercentCpuActiveLimit

The upper limit on percent of CPU that can be active before threads are deactivated.

percentCpuActive

The current percentage of CPU that is active.

MtActiveThreads

The current number of active threads.

Memory Impact

Multi-threaded operations may require considerable C Heap memory. This memory requirement is not part of temporary object cache memory. You can configure your GEM_TEMPOBJ_CACHE_SIZE according to other application Gem requirements, or even configure the sessions performing repository scan operations with a very small temporary object cache size.

The amount of memory space that is needed depends primarily upon the current oopHighWater value, the number of threads, and the page buffer size. markForCollection uses a pageBufferSize of 128, epoch and writeSetUnionSweep use a size of 64, and it is an explicit argument to FDC.

The overhead associated with the oopHighWater value can be computed as:

(stnOopHighWater + 10M) / 2

The memory cost per thread is:

50K + (180K * pageBufSize)

For example, a system with an oopHighWater mark of 500M running eight threads with a page buffer size of 128 would require a minimum of about 440 MB of free memory.
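
The arithmetic behind this estimate can be reproduced as follows. The sketch assumes that M and K denote 10^6 and 10^3 and that the results are in bytes; the formulas above do not state the units.

```smalltalk
| oopHighWater threads pageBufSize oopOverhead perThread total |
oopHighWater := 500000000.   "500M"
threads := 8.
pageBufSize := 128.

oopOverhead := (oopHighWater + 10000000) // 2.    "255000000"
perThread := 50000 + (180000 * pageBufSize).      "23090000"
total := oopOverhead + (threads * perThread).     "439720000, about 440 MB"
```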

Identifying Sessions Holding Up Voting

Voting is the 4th phase of garbage collection, described in Step 4. During this phase, each logged-in gem must vote on possibly dead objects. Sessions perform this vote on the next abort or commit that they execute, or on logout. If there are idle sessions that do not commit or abort, voting will not be able to complete.

You can determine the status of voting using VSD to examine cache statistics in your system. See the Gem statistic VoteOnDeadCount.

To find the sessions that have not yet voted, use:

System class >> notVotedSessionsReport

This returns the sessions that have not voted, including information such as user ID, process ID, and host.

The result of executing the method System class >> descriptionOfSession: includes the not voted status, as element 20. For details, see the comment in the image.

Repository scan operations, such as listInstances:, are not allowed while voting is in progress.
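
For example, the following sketch lists the sessions that have not voted and then reads the not voted status of one session. The session ID 5 is a placeholder, and the form of element 20 is not described here; see the method comment in the image.

```smalltalk
"Report the sessions that have not yet voted."
System notVotedSessionsReport.

"Element 20 of the description holds the not voted status of the session."
(System descriptionOfSession: 5) at: 20.
```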

Tuning Write Set Union Sweep

The write set union sweep is the 5th phase of garbage collection, described in Step 5. It is performed by the Admin Gem.

The number of threads used for Write Set Union Sweep is computed each time it runs based on the larger of the Stone’s current setting of STN_NUM_GC_RECLAIM_SESSIONS and the setting for #sweepWsUnionMaxThreads.

The following AdminGem parameters apply:

#sweepWsUnionPercentCpuActiveLimit

Limit active wsUnion threads when system percentCpuActive is above this limit. Default: 90. Minimum: 0. Maximum: 100.

#sweepWsUnionPageBufferSize

Size (in pages) of buffer used for wsUnion sweep. Must be a power of 2. Default: 64. Minimum: 8. Maximum: 1024.

#sweepWsUnionMaxThreads

MaxThreads used in computing the number of threads for the next wsUnion sweep. Default: 1. Minimum: 1. Maximum: 32.

You can get the current descriptions from the image by executing

System adminGemConfigs

which returns a report of all supported Admin Gem configuration parameters. For details on modifying values, see here.

Identifying Sessions Holding Up Page Reclaim

Reclaiming pages can proceed only up to those pages currently providing some session’s transaction snapshot view of the repository—that is, only up to the oldest commit record. When other sessions are logged in, reclaim stops at that point until all sessions using that commit record either commit or abort their transaction.

It can be helpful to identify which sessions are holding on to the oldest commit record.

You can identify these sessions using VSD to examine cache statistics in your system. See the Stone statistics OldestCrSession and OldestCrSessionNotInTrans.

The method System class>>sessionsReferencingOldestCr returns an array of session IDs, which can be mapped to GemStone logins through various System class methods, including currentSessionsReport and descriptionOfSession: aSessionId. For example:

topaz 1> exec System sessionsReferencingOldestCr %
an Array
  #1 5
 
topaz 1> exec System currentSessionsReport %
2 SymbolUser symbolgem 27555
3 GcUser reclaimgcgem 27550
4 GcUser admingcgem 27552
5 DataCurator gem 27906 on localhost

The method currentSessionsReport prints out information in human-readable form; descriptionOfSession: returns this information, and many other details, in an array. For details, see the comment in the image.
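
The two calls can also be combined in a single expression. This sketch uses only the methods named above and returns one description array per session that references the oldest commit record.

```smalltalk
"Describe every session that is holding on to the oldest commit record."
System sessionsReferencingOldestCr collect: [:sessionId |
    System descriptionOfSession: sessionId]
```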

Finding References to Objects that prevent garbage collection

Objects that have references from a live object anywhere within the GemStone repository are “live”, and not garbage collected. If you believe you have de-referenced an object, but it does not get garbage collected, you can perform a find references operation to track down references that you were not aware of.

Note that a repository-wide scan may take considerable time and use noticeable system resources. See Multi-Threaded Scan for how these can be tuned.

Using GsBitmaps

These methods return instances of GsBitmap, which do not require temporary object memory and so can return an unlimited number of objects. To analyze the results, for small result sets you can send GsBitmap >> asArray to convert the results to an Array. Otherwise, retrieve only a specified number of results using removeCount: (which removes elements from the bitmap) or peekCount: (which does not remove them).

Finding all references

To find references to a particular object, use one of the following messages:

SystemRepository allReferences: anObject
SystemRepository fastAllReferences: anObject

These methods return a GsBitmap of all persistent objects in the repository that reference anObject. They perform the search using the Multi-Threaded Scan. allReferences: uses moderate resources; fastAllReferences: uses as much of the system resources as it can, to complete more quickly.

You can find all references to all instances of a particular class using:

SystemRepository allReferencesToInstancesOfClasses: aClass

These methods all accept one argument, or an array of arguments. If an Array is passed in, the result will be an Array of Arrays mapping argument element to GsBitmaps; the order may not be preserved. See the method comments in the image for more details.
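
A typical use is to run the scan and then look at only the first few referencing objects. In this sketch, PlusInfinity stands in for the object you are investigating, and the count of 10 is arbitrary.

```smalltalk
| referrers |
"PlusInfinity is a stand-in for the object that is unexpectedly still alive."
referrers := SystemRepository allReferences: PlusInfinity.

"Look at up to 10 referencing objects without removing them from the bitmap."
referrers peekCount: 10.
```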

Full reference path

To find a complete reference path to a particular object, you can use methods on GsSingleRefPathFinder. This allows you to determine the complete reference path from a root object to the argument.

NOTE
This method runs in a transaction, and may take a considerable time to run. Avoid using it in production systems.

To perform the scan, you create an instance of the GsSingleRefPathFinder for the object or objects, run the scan, and collect/view the results. This must be done on the same node that the Stone is running on.

Steps to find a reference path:

Step 1. Create an instance with default settings:

inst := GsSingleRefPathFinder newForSearchObjects: 
	(Array with: mySearchObject).

Step 2. Run the scan

inst runScan.

Step 3. Build the result

resultObjs := inst buildResultObjects.

buildResultObjects returns a collection of instances of GsSingleRefPathResult, which is a subclass of Array. Each element in the GsSingleRefPathResult represents an element in the reference path; it also has instance variables for the searchOop and status.

Step 4. For display, collect the results as strings:

resultObjs collect: [:e | e resultString].

Steps 2-4 can be done using the GsSingleRefPathFinder method scanAndReport. For example:

(GsSingleRefPathFinder newForSearchObjects: { mySearchObject })
	scanAndReport

These methods locate the first path from a root object to the argument object that is found, but there may be multiple paths.

Note that you cannot find references to a Class or Metaclass using these methods.

Once you have found the references to the unwanted object, set those references to nil. This allows the object to be removed during normal garbage collection.

Example 15.1 Finding reference path

| inst |
inst := GsSingleRefPathFinder newForSearchObjects: 
	{ PlusInfinity }.
inst runScan.
(inst buildResultObjects) 
   collect: [:e | e resultString]
%
 anArray( 'Reference path for search oop 21393665 (Float)
	1   207361 (SymbolDictionary)
	2   1126401 (IdentityCollisionBucket)
	3   1790721 (SymbolAssociation)
	4   21393665 (Float)
')

In this example, PlusInfinity is a Float (#4). It is referred to in a SymbolDictionary (#1), using internal implementation objects (an IdentityCollisionBucket and a SymbolAssociation) that actually have the references.

 

GsSingleRefPathFinder provides a number of instance variables to parameterize the search:

  • maxThreads
  • lockWaitTime
  • pageBufferSize
  • percentCpuLimit
  • maxLimitSetDescendantObjs
  • maxLimitSetDescendantLevels
  • printToLog

Defaults are provided for all of these. You may also send messages to change them, for example to perform a less aggressive scan.
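
As an illustration, a less aggressive scan might be set up as in the following sketch. The setter selectors maxThreads: and percentCpuLimit: are assumptions derived from the instance variable names listed above, and the values are arbitrary; confirm the actual selectors in the image before relying on them.

```smalltalk
| finder |
finder := GsSingleRefPathFinder newForSearchObjects: { PlusInfinity }.

"Assumed setters, named after the instance variables listed above."
finder maxThreads: 2.
finder percentCpuLimit: 50.

finder scanAndReport.
```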

Finding large objects that are using excessive space

Repositories usually contain some large objects, such as collections of business objects, but there may also be inadvertent large objects, such as collections that were intended to be temporary or log strings. If you have large objects that are no longer needed, you can free space by explicitly removing the references to them.

Identify Larger Objects in the Repository

The following methods return all objects that are over a specified size:

Repository >> allObjectsLargerThan: aSize
Repository >> fastAllObjectsLargerThan: aSize 

These methods return a GsBitmap of all objects in the repository larger than aSize; objects for which you do not have read authorization are not included.

These methods perform the repository-wide search using the Multi-Threaded Scan. allObjectsLargerThan: uses moderate resources; fastAllObjectsLargerThan: uses as much of the system resources as it can, to complete more quickly.

Finding named objects that are large

Named objects are Global variables; objects that have a reference by a Symbol name in some user’s SymbolDictionary. While there are some legitimate uses of Globals for environment-wide information, generally using global variables (other than for classes) is not good software engineering practice.

A common pattern is to use a global to keep some objects persistent, with an expression such as:

UserGlobals at: #tempCollection put: IdentityBag new

If this collection and its contents are not deleted, they may continue to use space and may not be easily noticed. Using SessionTemps is preferred to avoid this problem.

The following expression causes GemStone to look through the symbol list for each user in AllUsers and gather information on any named objects larger than the SmallInteger aSize. Since it is looking for named references, it does not need to do a repository scan.

topaz 1> printit
AllUsers findObjectsLargerThan: aSize limit: aSmallInt
%

This method locates large collections or strings referenced by name; it will not locate collections stored within the class variables of classes, or in instances of classes. It returns an Array of up to aSmallInt elements, each of the form { aUserId . aKey . anObject }, where anObject is an object larger than aSize defined in the symbol list of aUserId, and aKey is the Symbol associated with that object.
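
The result can then be summarized for review. This sketch assumes a threshold of 100000 for aSize and a limit of 20, both arbitrary, and relies only on the element layout described above.

```smalltalk
| found |
found := AllUsers findObjectsLargerThan: 100000 limit: 20.

"Reduce each { aUserId . aKey . anObject } triple to the user, the key, and the object's class."
found collect: [:each |
    { each at: 1 . each at: 2 . (each at: 3) class }]
```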

 
