5. Monitoring GemStone

A properly configured GemStone repository will run normally with little attention. It is still important to monitor the repository, to catch unexpected problems before they become serious. If you have unexpected problems, you will need to examine logs, monitor your system, and perform other analysis. The relevant logs and tools are described in this chapter.

GemStone System Logs
details what logs GemStone/S 64 Bit processes create, and where they are located.

Repository Page and Object Audit
provides instructions on how to perform a page audit and an object audit of the repository.

Profiling Repository Contents
describes how to analyze the repository contents.

Monitoring Performance
describes how to monitor the performance of the GemStone server and its clients using GemStone Smalltalk methods.

If you decide to keep a GemStone session running for occasional use, be careful not to leave it in an active transaction. A prolonged transaction can cause an excessive commit record backlog and undesirable repository growth, until you either commit or abort.

NOTE
Monitoring on active repositories should be done in manual transaction mode or transactionless mode. See Disk Space and Commit Record Backlogs.

5.1  GemStone System Logs

In addition to transaction logs, GemStone creates three types of log files:

If a GemStone server is running, you can use the gslist utility to locate its logs. Use gslist -x to display the location of the current log file for Stones, NetLDIs, logsenders, logreceivers, and the shared page cache monitors.

The logs for the AIO page servers, free frame page servers, SymbolGem, Page Manager, and Admin and Reclaim Gems are in the same location as the corresponding Stone’s log.

WARNING
The Stone writes several files to the /opt/gemstone/locks or equivalent directory (as discussed here). These lock files are usually regenerated if deleted, but we recommend not removing them manually. Use gslist -c to clear out unnecessary lock files.

See gslist for more details on this command.

GemStone Server Logs

The Stone repository monitor and its child processes each create a log file in a single location. By default, the files are in $GEMSTONE/data and have a name beginning with the Stone name. Table 5.1 shows typical log names for a Stone with the default name of gs64stone. Log names for child processes also include the process id and a descriptive suffix.

Table 5.1 Representative Log Names for GemStone Server Processes

gs64stone.log                     Stone repository monitor
gs64stone_14033admingcgem.log     Admin Gem
gs64stone_2963pcmon.log           Shared page cache monitor
gs64stone_2967pgsvrff.log         Free frame page server
gs64stone_2984pgsvraio.log        AIO page server
gs64stone_2987pagemanager.log     Page Manager (Stone thread)
gs64stone_2992reclaimgcgem.log    Reclaim Gem
gs64stone_2994symbolgem.log       Symbol Gem

Several factors can alter the name and location of these logs. The precedence is

1. A path and filename supplied by startstone -l logFile. logFile may be a filename, or a relative or absolute path and filename, to which the account starting the Stone has write permission. If logFile is a filename without a path, logFile is created in the current directory. Logs for the child processes in Table 5.1 are placed in the same directory.

2. A path and filename specified by the GEMSTONE_LOG environment variable. As with startstone -l, this may be set to a filename or to a relative or absolute path and filename. Child process log files are created in the same location.

3. $GEMSTONE/data/gemStoneName.log.
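The precedence above can be sketched as a small lookup. This is an illustrative helper, not part of GemStone: the function name and parameters are hypothetical, and it assumes that a bare filename or relative path in either setting is resolved against the directory from which the Stone is started.

```python
import posixpath


def resolve_stone_log(stone_name, gemstone_dir, dash_l=None, gemstone_log=None, cwd="."):
    """Return the Stone log path implied by the three precedence rules.

    dash_l       -- value given to `startstone -l`, or None if the option is absent
    gemstone_log -- value of the GEMSTONE_LOG environment variable, or None
    cwd          -- directory from which the Stone is started (assumption: relative
                    names are resolved against it)
    """
    # Rules 1 and 2: an explicit -l value wins over GEMSTONE_LOG.
    for candidate in (dash_l, gemstone_log):
        if candidate:
            # posixpath.join keeps an absolute candidate unchanged.
            return posixpath.normpath(posixpath.join(cwd, candidate))
    # Rule 3: fall back to $GEMSTONE/data/<stoneName>.log
    return posixpath.join(gemstone_dir, "data", stone_name + ".log")
```

Because child process logs are written to the same directory as the Stone log, the directory part of the returned path is also where the logs listed in Table 5.1 would be expected.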

Log file deletion on shutdown

When a GemStone server shuts down, some server process log files are deleted. Other files are retained in case they are needed later for tuning or problem diagnosis. Table 5.2 details the specific behavior for each process’s logs, and how to change that behavior in cases where the behavior is configurable.

Log files are never deleted if the process exits abnormally. In such cases, the log file may be requested when you contact GemStone Technical Support.

Since some logs are not deleted, if you restart GemStone often, you may need to manually remove log files periodically.
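As one way to approach periodic cleanup, the following sketch lists log files that have not been modified for a given number of days. The function name and the 30-day threshold in the usage note are illustrative assumptions; the helper only reports candidates and deletes nothing, since the Stone log and the logs of running processes should be reviewed before removal.

```python
import os
import time


def stale_logs(log_dir, max_age_days, now=None):
    """Return sorted paths of *.log files in log_dir not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(log_dir):
        if (entry.is_file()
                and entry.name.endswith(".log")
                and entry.stat().st_mtime < cutoff):
            stale.append(entry.path)
    return sorted(stale)
```

For example, stale_logs(log_directory, 30) would return the logs in that directory that are more than 30 days old, which you can then archive or remove after review.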

Table 5.2 Log file handling by process type 

Stone

The same log file is appended to on restart.

Log file is never deleted.

Shared Page Cache Monitor

A new log file is created on restart, including the process PID.
Log file is never deleted on exit.

Page Manager

(Not an independent process)
A new log file is created when the Stone is restarted, including the Stone’s PID. Log file is never deleted on exit.

Symbol Gem

A new log file is created on restart, including the process PID.
Log file is never deleted on exit.

Free frame page servers, AIO page servers

A new log file is created on restart, including the process PID.
Log file is deleted by default on normal exit. To prevent this log from being deleted on normal exit, edit
$GEMSTONE/sys/runpgsvrmain to look like this:

# unset $GEMSTONE_KEEP_LOG.

This affects all page servers: free frame page servers, AIO page servers, and page servers for remote Gems.

Admin Gem

A new log file is created on restart, including the process PID.
Log file is deleted by default on normal exit. To prevent this log from being deleted on normal exit, edit
$GEMSTONE/sys/runadmingcgem to look like this:

# unset $GEMSTONE_KEEP_LOG.

Reclaim Gem

A new log file is created on restart, including the process PID.
Log file is deleted by default on normal exit. To prevent this log from being deleted on normal exit, edit
$GEMSTONE/sys/runreclaimgcgem to look like this:

# unset $GEMSTONE_KEEP_LOG.

Stone Log

The log for the Stone repository monitor is always appended to, and is therefore cumulative across runs by default. This log is the first one you should check when a GemStone system problem is suspected. In addition to possible warnings and error messages, the log records the following useful information:

  • The GemStone version.
  • The configuration files that were read at startup and, if the DUMP_OPTIONS configuration option is set to True, the resulting Stone configuration.
  • Each startup and shutdown of the Stone, the reason for the shutdown, and whether recovery from transaction logs was necessary at startup.
  • Each expansion of a repository extent and its current size.
  • Each opening of a new transaction log.
  • Each startup and shutdown of each GcGem session, and the corresponding processId.
  • Each #abortErrLostOtRoot sent to a Gem.
  • Each suspension and resumption of logins.
  • Certain changes to the login security system.
  • Each time a backup is started and when the backup is completed.

Admin Gem Logs

Each time the Stone repository monitor starts an administrative garbage collection session (Admin Gem) process, a new log is created in the same location as the Stone’s log. The log name is formed using the pattern:

stoneName_PIDadmingcgem.log

where stoneName is the name of the Stone, and PID is the process Id of the Admin Gem process.

This log shows the startup value of the Admin Gem parameters that are stored in GcUser’s UserGlobals, and any changes to them, and records other Admin Gem functions.

Reclaim Gem Log

Each time the Stone repository monitor starts a reclaim garbage collection session (Reclaim Gem) process, a new log is created in the same location as the Stone’s log. The log name is formed using the pattern:

stoneName_PIDreclaimgcgem.log

where stoneName is the name of the Stone, and PID is the process Id of the Reclaim Gem process.

This log shows the startup value of the Reclaim Gem parameters that are stored in GcUser’s UserGlobals, and any changes to them, and records other Reclaim Gem functions.

Shared Page Cache Monitor Log

The log for the shared page cache monitor on the Stone’s machine is located in the same directory as the Stone’s log. This log file has a name of the form

stoneName_PIDpcmon.log

Check this log if other messages refer to a shared page cache failure.

When a session logs in from another node, a log is created for the shared page cache monitor on the remote node. This log is located by default in the home directory of the account that started the Stone, but this location can be modified by environment variable settings. The default name is of the form

startshrpcmonPIDNode.log

where PID is the process Id of the monitor process, and Node is the name of the remote node.

Among the items included in the log for the shared page cache monitor are:

  • Its configuration (for remote nodes, this may be different from the configuration on the Stone’s node).
  • The number of processes that can attach (which can limit the number of logins).
  • The UNIX identifiers for the memory region and the semaphore array (these identifiers are helpful in the event you must remove them manually using the ipcrm command).

Free Frame Page Server Log

One or more free frame page servers are started up on repository startup. Each one has an individual log file, located in the same directory as the log for the shared page cache monitor. These log files have names of the form

stoneName_PIDpgsvrff.log

where stoneName is the name of the Stone, and PID is the process Id of the free frame page server process. These logs ordinarily are not of interest unless they contain an error message.

AIO Page Server Log

One or more AIO page servers are started up on repository startup. Each one has an individual log file, located in the same directory as the log for the shared page cache monitor. These log files have names of the form

stoneName_PIDpgsvraio.log

where stoneName is the name of the Stone, and PID is the process Id of the AIO page server process. These logs ordinarily are not of interest unless they contain an error message.

Page Manager Log

The Page Manager is a thread in the Stone, and is not a separate process, but it writes to a separate log for ease of maintenance. The Page Manager log is located in the same directory as the log for the shared page cache monitor. This log file has a name of the form

stoneName_PIDpagemanager.log

where stoneName is the name of the Stone, and PID is the process Id of the Stone process. Ordinarily these logs are not of interest unless errors occur or tuning is required.

Symbol Gem Log

The Symbol Gem log is located in the same directory as the Stone’s log. This log file has a name of the form

stoneName_PIDsymbolgem.log

where stoneName is the name of the Stone, and PID is the process Id of the Symbol Gem process. Ordinarily these logs are not of interest unless an error has occurred.
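The stoneName_PIDsuffix.log patterns described in the preceding subsections lend themselves to simple parsing, for example when sorting a log directory by process type. The sketch below is an illustration only; the function and table names are hypothetical, and the suffix list is limited to the patterns named in this section.

```python
import re

# Suffixes taken from the log-name patterns described in this section.
SUFFIX_TO_PROCESS = {
    "admingcgem": "Admin Gem",
    "reclaimgcgem": "Reclaim Gem",
    "pcmon": "Shared page cache monitor",
    "pgsvrff": "Free frame page server",
    "pgsvraio": "AIO page server",
    "pagemanager": "Page Manager",
    "symbolgem": "Symbol Gem",
}

_LOG_NAME = re.compile(
    r"^(?P<stone>.+)_(?P<pid>\d+)(?P<suffix>"
    + "|".join(SUFFIX_TO_PROCESS)
    + r")\.log$"
)


def parse_server_log_name(filename):
    """Split a child-process log name into (stoneName, PID, process type).

    Returns None for names that do not follow the pattern, such as the
    Stone's own log (stoneName.log).
    """
    match = _LOG_NAME.match(filename)
    if match is None:
        return None
    return (
        match.group("stone"),
        int(match.group("pid")),
        SUFFIX_TO_PROCESS[match.group("suffix")],
    )
```

Note that for a Page Manager log the PID component is the process Id of the Stone, since the Page Manager is a thread within the Stone.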

Logs Related to Gem Sessions

Except for linked sessions that are running on the same node as the Stone, login depends on NetLDI services to spawn one or more supporting processes. In each case, the NetLDI creates a log file that includes in its name the identity of the node on which the process is running.

Linked logins do not have separate log files. Log file output is sent to stdout of the linked process.

All RPC logins spawn a Gem session process.

For RPC logins where the Gem is not on the same node as the Stone, or for linked logins that are not on the same node as the Stone, the following additional processes are also spawned:

  • A page server (for the session) to access a repository extent on the server node.
  • A page server (for the Stone) to start or access a shared page cache on the client’s node.
  • A shared page cache monitor (for the Stone) to manage the cache on the client’s node.

By default, the log files for these processes are located in the home directory of the account that owns the corresponding process. For the Gem session process and the page server on the server node, that account ordinarily is the application user. For the shared page cache monitor and page server on the client node, that account is the one that invoked startstone.

You can change the default location by setting #dir or #log in the GEMSTONE_NRS_ALL environment variable for the NetLDI itself or for individual clients (see To Set a Default NRS). Alternatively, when you log in to GemStone, you can specify a different network resource string (NRS) in your login parameters.

Table 5.3 shows typical log names for session-related processes, given a Stone and repository on node1 with a login from a Gem session process on node2.

Table 5.3 Typical Logs Supporting Gem Sessions

Typical Name                   GemStone Process
gemnetobject27853node2.log     Gem session process on node2 (serves an RPC session)
pgsvrmain27819node2.log        Page server on node2 that the repository monitor uses to create and access its shared page cache on node2
startshrpcmon27820node2.log    Shared page cache monitor on node2
pgsvrmain12397node1.log        Page server on node1 that the Gem session process uses to access the repository extents on node1
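The names in Table 5.3 follow a scriptName, PID, node pattern, which can be split as sketched below. This helper is hypothetical and covers only the three name prefixes shown in the table; it also assumes the node name does not begin with a digit, since otherwise the boundary between PID and node name is ambiguous.

```python
import re

# Prefixes as they appear in Table 5.3.
_SESSION_LOG = re.compile(
    r"^(?P<script>gemnetobject|pgsvrmain|startshrpcmon)"
    r"(?P<pid>\d+)(?P<node>.+)\.log$"
)


def parse_session_log_name(filename):
    """Split a session-related log name into (script, PID, node), or return None."""
    match = _SESSION_LOG.match(filename)
    if match is None:
        return None
    return match.group("script"), int(match.group("pid")), match.group("node")
```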

If a process shuts down normally, the log file may be automatically removed. (See Table 5.2 for specific behavior by process type.) Log files are never deleted if the process shuts down abnormally. This way, the log files that remain may provide helpful diagnostic information.

If you want to retain the log even when a Gem session process exits normally, edit the scripts according to the instructions in Table 5.2. If the NetLDI on the client node has a separate $GEMSTONE directory, edit the appropriate scripts in the client’s installation directory.

NetLDI Logs

Each NetLDI creates a log file (netLdiName.log) in /opt/gemstone/log (or an equivalent, as described here) on the node on which it runs. This location and name can be overridden by including the option -l logname when starting the NetLDI. Each NetLDI you start with the same name appends to one log, so it’s a good idea to remove outdated messages occasionally.

By default, the NetLDI log contains only configuration information and error messages. The configuration information reflects the environment at the time the NetLDI was started and the effect of any authentication switches specified as part of the startnetldi command.

In some cases it is helpful to log additional information by starting the NetLDI in debug mode (startnetldi -d). The debug log records each exchange between the NetLDI and a client. Because the log becomes much larger, you probably won’t want to use this mode routinely.

Logsender and logreceiver logs

The logsender and logreceiver processes are started only if you are setting up a hot standby system.

Each logsender and logreceiver creates a log file in /opt/gemstone/log on the node on which it runs. The log file’s name, by default, is logsender_listeningPort.log or logreceiver_listeningPort.log. This location and name can be overridden by including the option -l logname when starting the logsender or logreceiver. Each logsender and logreceiver you start with the same log file name and path (whether explicitly specified or the default) appends to one log, so you should periodically remove outdated messages.

Localizing timestamps in log files

The timestamps printed in the log headers and in log messages are formatted according to the current system locale. You can override this using the GS_CFTIME environment variable. If this is set in the environment for the process, then the setting is used to control printing in log headers and log messages.

The setting for GS_CFTIME must be a valid strftime format string, and must contain fields for:

  • Month: %m or %b or %B or %h
  • Day: %d
  • Hour: %H, or %I and %p, or %I and %P
  • Minutes: %M
  • Seconds: %S

If the criteria are not met, the default date format based on the system’s LOCALE is used, or otherwise the US-centric date format.
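Before setting GS_CFTIME, it can be useful to check a candidate format string against the field list above. The following sketch mirrors only the criteria listed in this section; the function name is hypothetical, it does not handle strftime flag or modifier characters (such as %-d or %Ed), and GemStone's own validation may differ.

```python
import re


def has_required_fields(fmt):
    """Check a strftime format string for month, day, hour, minute, and second fields."""
    # Collect the character following each '%'. A literal '%%' is consumed as a
    # pair and yields '%', so '%%d' is not mistaken for the day field.
    fields = set(re.findall(r"%(.)", fmt))
    month = bool(fields & {"m", "b", "B", "h"})
    day = "d" in fields
    # Hour: %H alone, or %I together with %p or %P.
    hour = "H" in fields or ("I" in fields and bool(fields & {"p", "P"}))
    return month and day and hour and "M" in fields and "S" in fields
```

For example, "%m/%d/%Y %H:%M:%S" satisfies the criteria, while "%b %d %I:%M:%S" does not, because %I appears without %p or %P.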

Programmatically adding messages to logs

You can write a message to the stone log using

System addAllToStoneLog: aString

To write a message to the gem log for the current session, use

GsFile gciLogServer: aString

The following writes a log message to the GCI client, such as the topaz console:

GsFile gciLogClient: aString

5.2  Repository Page and Object Audit

This section describes two levels of checks that you can perform on the repository.

Page Audit

Page audits allow you to diagnose problems in the system repository by checking for consistency at the page level.

The pageaudit utility can be run only on a repository that is not in use.

Pageaudit scans the rootpages in a repository, pages used in the bitmap structures referenced by the rootpage, and all other pages (including data pages) to confirm page-level consistency. It does not check that the data on data pages is valid. For that, you need to run object audit; see Object Audit and Repair.

To check for page-level problems, run pageaudit on the repository defined in your ordinary GemStone configuration by issuing this command at the operating system level:

% pageaudit [ gemStoneName ] [ -e exeConfig ] [ -z systemConfig ] [ -f ] [ -d ] [ -l logfile ] [ -h ]

where:

  • gemStoneName is the name of the GemStone repository monitor.
  • systemConfig is the system configuration file.
  • exeConfig is the executable configuration file.
  • logfile is the name of the output file.

All of these arguments are optional in a standard GemStone configuration. If these options are not supplied, pageaudit uses gs64stone-audit for gemStoneName and writes output to stdout.

You can get details of the available options for pageaudit by executing pageaudit -h, which returns usage. For more information, see pageaudit.

In addition to the audit results, pageaudit prints status updates to the screen as it runs, and the output includes repository statistics. For example:

	GemStone is starting a page audit of the Repository.
	Finished auditing reserved pages in extent   0.
	Finished pages past end
	Begin auditing current checkpoint.
	Finished auditing checkpoint bitmaps.
	Finished auditing scavengable pages.
	Start auditing object table pages.
	Finished auditing object table pages.
	Finished auditing commit records.
	Finished auditing alloc pages shadowed.
	Begin auditing data pages.  Count=970
	Finished auditing data pages
PAGE AUDIT STATISTICS paris sun4u (Solaris 2.10 Generic_141444-09)
- 03/12/2014 09:57:26.234 PDT
16384 bytes = 1 GemStone Page
1048576 bytes = 1 Mbytes
Repository Size                          53 Mbytes
Data Pages                                6 Mbytes
Meta Information Pages                    1 Mbytes
Shadow Pages                              0 Mbytes
Free Space in Repository                 44 Mbytes
**** Number of differences found in page allocation = 0.
   Page Audit of Repository completed successfully.

The report contains the following statistics:

Repository Size
The total physical size of the repository; this is the same size that the operating system reports for an extent file.

Data Pages
This includes all pages referenced from the object table.

Meta Information Pages
Pages that contain only internal information about the repository, such as the object table.

Shadow Pages
Pages scheduled for scavenging by the reclaim task.

Free Space in Repository
Free space in the repository is computed as the number of free pages times the size of a page (16 KB). That value reflects the number of pages available for allocation to Gem session processes. It excludes space fragments on partially filled data pages.
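The free space arithmetic can be checked with a few lines of code. The constants below come from the report header shown earlier (16384 bytes per page, 1048576 bytes per Mbyte); the function names are illustrative.

```python
PAGE_SIZE_BYTES = 16384   # 1 GemStone page, per the report header
MBYTE = 1048576           # 1 Mbyte, per the report header


def free_space_bytes(free_pages):
    """Free space is the number of free pages times the page size."""
    return free_pages * PAGE_SIZE_BYTES


def pages_in_mbytes(mbytes):
    """Number of whole pages that fit in the given number of Mbytes."""
    return mbytes * MBYTE // PAGE_SIZE_BYTES
```

With these values one Mbyte holds 64 pages, so the 44 Mbytes of free space in the sample report corresponds to about 2816 free pages (the report rounds to whole Mbytes, so the exact page count may differ slightly).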

If the page audit finds problems, the output to the screen ends with a message like this:

-------------- PAGE AUDIT RESULTS --------------
**** NumberOfFreePages = 980 does not agree with audit
	results = 988
 
**** Problems were found in Page Audit.
**** Refer to recovery procedures in System Administrator's Guide.

If there are problems in the page audit, you will need to restore the repository file from backups. (See the section How to Restore from Backup.)
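If you capture pageaudit output (for example with the -l logfile option), a script can check the result and extract the statistics. The sketch below assumes the output matches the sample text shown in this section, which may vary between versions; the function name and the returned dictionary layout are hypothetical.

```python
import re

# Matches statistic lines such as "Repository Size      53 Mbytes".
_STAT_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s{2,}(?P<mbytes>\d+) Mbytes\s*$"
)


def summarize_page_audit(output):
    """Return pass/fail status and the Mbyte statistics found in pageaudit output."""
    stats = {}
    succeeded = False
    problems = False
    for line in output.splitlines():
        if "Page Audit of Repository completed successfully" in line:
            succeeded = True
        if "Problems were found in Page Audit" in line:
            problems = True
        match = _STAT_LINE.match(line)
        if match:
            stats[match.group("name")] = int(match.group("mbytes"))
    # Treat the audit as passed only if the success line is present
    # and no problem line was seen.
    return {"passed": succeeded and not problems, "stats_mbytes": stats}
```

Requiring the explicit success line means that truncated or unexpected output is reported as not passed rather than silently accepted.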

Object Audit and Repair

Privileges required: SystemControl.

Object audits check the consistency of the repository at the object level. Starting with the Object Table, each object is located and validated.

Object audit is performed using multiple threads (lightweight sessions), and can be configured to perform as quickly as possible using a large amount of system resources, or configured to use fewer resources and take longer to run.

Object audit should be run from linked Topaz, and on the same machine as the Stone.

Repository >> objectAudit
objectAudit is the normal way to perform the audit. You may have other sessions logged in and running simultaneously, but the audit will impact performance. This audit uses two threads and up to 90% of the CPU.

Repository >> fastObjectAudit
fastObjectAudit is like objectAudit, but is configured to use most or all system resources to complete as quickly as possible. This is useful when running an audit on offline systems.

Repository >> objectAuditWithMaxThreads: maxThreads percentCpuActiveLimit: aPercent
This method allows you to specify the exact performance/impact parameters for the audit, if neither objectAudit nor fastObjectAudit is satisfactory for your requirements.

Performing the Object Audit

To perform an object audit:

Step 1. Log in to GemStone using linked Topaz (topaz -l).

Step 2. Send one of the audit messages to the repository. For example:

topaz 1> printit
SystemRepository objectAudit
%

The audit involves a number of checks and specific error messages. Checks include:

  • Object corruption — The object header should contain valid (legal) information about the object’s tag size, body size (number of instance variables), and physical size (bytes or OOPs).
  • Object reference consistency — No object should contain a reference to a nonexistent object, including references to a nonexistent class.
  • Identifier consistency — OOPs within the range in use (that is, up to the high-water mark) should be in either the Object Table or the list of free OOPs, and OOPs for objects existing in data pages should be in the Object Table.

If the repository is consistent and no errors are found, the audit will complete with the line:

Object Audit: Audit successfully completed; no errors were detected.

Otherwise, the reasons for failure, with the specific problems found, are reported to standard output.

Error Recovery

If an object audit reports errors, these issues should be addressed. You may want to contact GemStone Technical Support for advice.

The following are general approaches to errors from object audit.

Collect and reclaim garbage and retry

If errors are reported during the object audit, you may wish to perform a markForCollection and reclaimAll and repeat the object audit. This may clear up problems if the corrupt objects are not referenced from any live objects. Whether this is useful will depend on the particular errors reported.

Restore from backup

The safest approach when you find object audit errors is to restore from backup. GemStone recommends that you make regular backups, run in full transaction logging mode, and archive transaction logs as needed to recover. This would allow you to recover at any time from unexpected problems such as repository corruption.

If you do not have the set of backups and transaction logs that would allow you to restore from a backup and recover later transactions, or if you are in partial transaction logging mode, you can still make and restore a backup. Backups made using fullBackupTo:, when restored, rebuild the internal data structures. Depending on the specific problems found in audit, this may clear up the problem.

Manual repair of invalid object references

Invalid object references can be repaired manually, if you know what the missing data should be, or if the referenced data is not important.

Use the Topaz object identity specification format @identifier to substitute nil or an appropriate reference for an invalid reference.

For example, given an instance of Array with the OOP 51369729, if the element at slot 3 is an object that does not exist, it can be repaired by setting the reference to nil using the following expression:

topaz 1> send @51369729 at: 3 put: nil

Repository repair

You can have GemStone attempt appropriate repairs during the re-scan by invoking Repository>>repair. The following examples illustrate the kinds of repairs performed:

  • nil is substituted for an invalid object reference.
  • Class String is substituted for an invalid class of a byte object, class Array for a pointer object, or class IdentitySet for a nonsequenceable collection object.
  • Oops in the Object Table for which the referenced object does not exist are inserted into the list of free Oops. Oops for which an object exists but which are also in the list of free Oops are removed from the free list.

The repair audits the repository, keeping track of errors. After the initial audit completes, each error found is repaired. A descriptive message is displayed for each repair. The repair will commit periodically to avoid memory issues, and log off when it is complete. For example:

Example 5.1 Repository Repair
topaz 1> run
SystemRepository repair
%
Object [20897537] references class [27554561] which does not exist
Object [27551745] references class [27554561] which does not exist
In object [27553281] of class Widget [26374657], the logical size
42 is invalid
In object [27554049] of class Widget [26374657], the object format
1 disagrees with the class format 0
Object [27554817] references class [27554561] which does not exist
 
Object Audit: 5 errors were found
Repairing error: BadClassId - Object [20897537] references class
[27554561] which does not exist
  Changing class to String [74753]
Repairing error: BadClassId - Object [27551745] references class
[27554561] which does not exist
  Changing class to IdentitySet [73985]
Repairing error: BadLogicalSize - In object [27553281] of class
Widget [26374657], the logical size 42 is invalid
  resetting logialSize to 8
Repairing error: BadFormat - In object [27554049] of class Widget
[26374657], the object format 1 disagrees with the class format 0
  Changing class to String [74753]
Repairing error: BadClassId - Object [27554817] references class
[27554561] which does not exist
  Changing class to Array [66817]
 
[Info]: Logging out at 03/12/2014 09:57:26.234 PDT
ERROR 4061 , The session is terminating abnormally, completed the
repair of 5 objects, forcing logout.
 

5.3  Profiling Repository Contents

Some questions — such as “what is using up all the space in my Repository?”— can only be answered by examining the types and numbers of objects in your repository. To find out this information, you can use methods on GsObjectInventory.

The methods in GsObjectInventory count all instances of all classes in the repository — or in any collection, or in a hidden set, or in a file of disconnected possible garbage objects — and report the results, ordered by the number of instances or by space consumed.

GsObjectInventory performs a multi-threaded scan of the repository, and thus should only be run in a session on the same machine as the Stone. To tune the impact of the scan, additional protocol allows you to perform fast scans or to specify the impact levels. For details, see methods in the image.

The scans require the GcLock, and so cannot be run while any garbage collection operation is running, nor can garbage collection operations be started while a GsObjectInventory scan is going on.

The following code will report the number of instances and the space required for all Classes whose total space requirements are more than 10000 bytes.

Example 5.2 Object Inventory
topaz 1> run
GsObjectInventory profileRepository byteCountReportDownTo: 10000
%
   *** GsObjectInventory byteCountReport printed at: 25/02/2014 20:17:27 ***    
Hidden classes are included in this report.
______________________________________________________________
Class                                 Instances          Bytes
______________________________________________________________
String                                    22497        8126360
GsNMethod                                 15289        3005728
Array                                     18921        2945904
GsMethodDictionary                         2570        1292696
Symbol                                    15146         658624
LargeObjectNode                              30         456512
CanonStringBucket                          2016         269984
Class                                      1254         195776
IdentityKeyValueDictionary                 1271         172880
SymbolAssociation                          3923         157416
ExecBlock                                  2312         148128
SymbolDictionary                            766         141008
IdentityCollisionBucket                    1336         131824
SymbolSet                                  3502         103808
GsClassDocumentation                        464          48272
DepListBucket                               751          42064
DateTime                                    634          35528
ClassHistory                                625          30176
GsDocText                                   714          28624
LargeInteger                                  9          22976
TimeZoneTransition                          313          17528
CanonSymbolDict                               1          16176
WordArray                                    13          13920
EqualityCollisionBucket                     128          13248
 

The same profiling with an instance count report is much shorter, since the number of instances, rather than the bytes of space used, limits the results.

topaz 1> run
GsObjectInventory profileRepository instanceCountReportDownTo: 10000
%
 *** GsObjectInventory instanceCountReport printed at: 25/02/2014 20:17:27 ***  
Hidden classes are included in this report.
______________________________________________________________
Class                                 Instances          Bytes
______________________________________________________________
String                                    22497        8126360
Array                                     18921        2945904
GsNMethod                                 15289        3005728
Symbol                                    15147         658680
______________________________________________________________

These reports include instances of hidden classes. Hidden classes are used to implement internal GemStone objects and are invisible to the image. One such class is LargeObjectNode; instances of LargeObjectNode are used to implement the tree structures that underlie large collections. To leave hidden classes out of the report, so that the space used by hidden-class instances is counted within the root, public object, profile using the method profileRepositoryAndSkipHiddenClasses rather than profileRepository.

For more on GsObjectInventory, see the methods in the image.
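If you save a report like those shown above to a file, the rows can be parsed for further analysis, such as computing the average size per instance. The sketch below assumes the three-column layout of the sample reports (class name, instances, bytes); the function names are hypothetical.

```python
import re

# A data row: class name, instance count, byte count, separated by whitespace.
_ROW = re.compile(r"^\s*(?P<cls>[A-Za-z_]\w*)\s+(?P<instances>\d+)\s+(?P<bytes>\d+)\s*$")


def parse_inventory_report(text):
    """Return a list of (className, instances, bytes) tuples from a report."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:  # header, separator, and title lines do not match
            rows.append(
                (match.group("cls"), int(match.group("instances")), int(match.group("bytes")))
            )
    return rows


def average_bytes_per_instance(rows):
    """Map each class name to its average bytes per instance."""
    return {cls: nbytes / count for cls, count, nbytes in rows if count}
```

In the sample report, for instance, LargeObjectNode has few instances but a much larger average size per instance than String, which is consistent with its role in implementing large collections.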

5.4  Monitoring Performance

As part of your ongoing responsibilities, you may find it useful to monitor performance of the object server or individual session processes.

GemStone includes tools that allow you to record statistics to a file and analyze this data graphically. You can also programmatically access these statistics.

A full list of the statistics that are recorded and are available programmatically can be found in the VSD User’s Guide.

Statmonitor and VSD

GemStone includes the statmonitor utility, which records statistics about GemStone processes to a disk file. You can configure the statistics recorded, how frequently the statistics are collected, and other details. See statmonitor for more information.

Both GemStone-specific and operating system statistics are collected. The operating system statistics include general host information as well as information specific to the individual GemStone processes.

We recommend running statmonitor at all times, as it provides a valuable record of many aspects of system behavior.

To view this data, use VSD (Visual Statistics Display), which displays the statistics graphically. For more details on using VSD, see the VSD User’s Guide.

Programmatic Access to Cache Statistics

A set of methods on the System class provides a way for you to analyze performance by programmatically examining the statistics that are collected in the shared page cache. This is the same data that is visible using statmonitor and VSD, although statmonitor and VSD can collect additional OS level information. This additional OS level information is also available programmatically; see Host Statistics.

A process can only access statistics that are kept in the shared page cache to which it is attached. Sessions that are running on a different node than the Stone use a separate shared cache on that remote node. This means that processes on a different node than the Stone cannot access statistics for the Stone or for other server processes that are attached to the Stone's shared page cache.

Within the shared page cache, GemStone statistics are stored as an array of process slots, each of which corresponds to a specific process. Process slot 0 is the shared page cache monitor. On the Stone’s shared page cache, process slot 1 is the Stone; on remote caches, slot 1 is the page server for the Stone that started the cache. Subsequent process slots are the page servers, Page Manager, Admin and Reclaim Gems, Symbol Gem, and user Gems. The order of these slots depends on the order in which the processes are started up, and is different on remote caches.

The specific set of statistics is different for each type of process that can attach to the shared page cache. The types of processes are numbered:

1 = Shared page cache monitor
2 = Stone
4 = Page server
8 = Gem (including Topaz, GBS, and other GCI applications).

Statistics by name

To obtain the value of a specific statistic for the Stone, the Stone’s SPC monitor, or the current session, use the following methods:

System class >> stoneCacheStatisticWithName:
System class >> primaryCacheMonitorCacheStatisticWithName:
System class >> myCacheStatisticWithName:

These methods return the value of the statistic with the given name for that process. If the statistic name is not found, they return nil.

For example, to retrieve the statistic named ‘CommitRecordCount’ for the Stone:

topaz 1> printit
System stoneCacheStatisticWithName: 'CommitRecordCount'.
%
23

To retrieve the current session’s PageReads:

topaz 1> printit
System myCacheStatisticWithName: 'PageReads'.
%
548

All statistics for a process

The general way to retrieve statistics is as an array of values. To understand what the value at each index refers to, there are corresponding description methods to return an array of Strings. Matching the index of the statistic name to the index within the values locates the value for that statistic.

Since the statistics are different for the different types of processes, you will need to use corresponding methods to collect the statistics and the descriptions.

For the Stone, the Gem that is running the code, and the Stone’s shared page cache monitor, no further information is needed to identify them within the cache, so the following pairs of methods can be used:

System cacheStatisticsDescriptionForGem.
System myCacheStatistics.
 
System cacheStatisticsDescriptionForStone.
System stoneCacheStatistics.
 
System cacheStatisticsDescriptionForMonitor.
System sharedPageCacheMonitorCacheStatistics.

For example, while you would normally use stoneCacheStatisticWithName:, here is another possible way to get the CommitRecordCount:

topaz 1> printit
| index |
index := System cacheStatisticsDescriptionForStone 
		indexOf: 'CommitRecordCount'.
System stoneCacheStatistics at: index.
%
23
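
When several statistics are needed, it may be more convenient to fetch the description and value arrays once and look up each name against them. The following is a minimal sketch using only the Gem methods listed above; the two statistic names are taken from examples elsewhere in this section, and any name not found in the description array yields nil rather than an error.

```smalltalk
topaz 1> printit
| names values |
"Fetch the names and the values once; the two arrays are index-aligned."
names := System cacheStatisticsDescriptionForGem.
values := System myCacheStatistics.
#( 'PageReads' 'TimeInFramesFromFindFree' ) collect: [:statName | | index |
    "indexOf: answers 0 when the name is not present."
    index := names indexOf: statName.
    statName -> (index = 0 ifTrue: [nil] ifFalse: [values at: index])]
%
```

The result is an Array of Associations, each pairing a statistic name with its current value for this session.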

To collect statistics for other Gems, and for page servers, you need to determine the process Id, session Id, or slot of the specific Gem or page server, or the cache name of the Gem. There are a variety of ways you might determine this, but one way is to examine the results of:

System cacheStatisticsForAllSlotsShort

This method returns the name, process Id, session Id, statistics type, and process slot for each process currently attached to the cache. For example:

topaz 1> printit
(System cacheStatisticsForAllSlotsShort) collect: 
	[:ea | ea printString]
%
an Array
  #1 anArray( 'ShrPcMonitor', 7722, 4294967295, 1, 0)
  #2 anArray( 'gs64stone', 7721, 0, 2, 1)
  #3 anArray( 'FreeFrmPgsvr2', 7725, 4294967294, 4, 2)
  #4 anArray( 'AioPgsvr3', 7726, 4294967294, 4, 3)
  #5 anArray( 'pagemgrThread', 7729, 1, 8, 4)
  #6 anArray( 'GcAdmin5', 7734, 2, 8, 5)
  #7 anArray( 'SymbolGem6', 7735, 3, 8, 6)
  #8 anArray( 'GcReclaim6_7', 7733, 4, 8, 7)
  #9 anArray( 'Gem26', 2271, 5, 8, 8)
  #10 anArray( 'Gem27', 16924, 6, 8, 9)

Of course, a Gem may log out between the time you execute this and the time you collect statistics, so be sure that your code handles that condition gracefully.

The methods you use to get the statistics and the corresponding descriptions will depend on how you have determined the specific process you want information about.

By name:

System cacheStatisticsForProcessWithCacheName: aString
(You must manually determine the process type)

or

System cacheStatsForGemWithName: aString.
System cacheStatisticsDescriptionForGem.

By operating system Process Id (PID):

System cacheStatisticsProcessId: aPid.
System cacheStatisticsDescriptionAt: 
	(System cacheSlotForProcessId: aPid).

By process slot:

System class >> cacheStatisticsAt: aProcessSlot
System class >> cacheStatisticsDescriptionAt: aProcessSlot

By session Id:

The page server for a Gem assumes the same sessionId as its Gem.

System gemCacheStatisticsForSessionId: aSessionId.
System cacheStatisticsDescriptionForGem.

or

System cacheStatsForPageServerWithSessionId: aSessionId.
System cacheStatisticsDescriptionForPageServer.

For example, to find an aggregate value for TimeInFramesFromFindFree of all Gems in the system:

topaz 1> printit
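| gemPids index time |
"Collect the PIDs of all processes whose statistics type is 8 (Gem)."
gemPids := Array new.
System cacheStatisticsForAllSlotsShort do: [:anArray |
   (anArray at: 4) = 8 ifTrue: [gemPids add: (anArray at: 2)]].
"Locate the statistic within the Gem statistics array."
index := System cacheStatisticsDescriptionForGem
   indexOf: 'TimeInFramesFromFindFree'.
"Sum the statistic, skipping any Gem whose statistics are no longer
 available (for example, one that has since logged out)."
time := 0.
gemPids do: [:aPid | | stats |
   stats := System cacheStatisticsProcessId: aPid.
   stats ifNotNil: [time := time + (stats at: index)]].
time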
%

Setting the name for the Gem in the cache

To make it easier for you to track cache statistics for specific Gems, you can explicitly give each Gem a unique name. The method

System cacheName: aString

sets the name for the current Gem session in the cache statistics, thus making it much easier to read the statistics in VSD.

Set the cache name soon after login. Otherwise, if you are collecting statistics using statmonitor, some information may already have been logged under the default name for the Gem, and you may have two separate lines of data for the same session.
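
As a brief sketch, a session might name itself immediately after login and then confirm that its statistics can be found under that name. The name 'OrderImport1' is purely illustrative, and the nil check assumes that the by-name lookup answers nil when no Gem with that name is attached to the cache.

```smalltalk
topaz 1> printit
| stats |
"Give this session a recognizable name in the cache statistics."
System cacheName: 'OrderImport1'.
"Look the session up again by the name just assigned."
stats := System cacheStatsForGemWithName: 'OrderImport1'.
stats isNil
    ifTrue: ['no Gem with that name in this cache']
    ifFalse: [stats size]
%
```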

Session Statistics

In addition to the system-generated statistics listed below, GemStone provides a facility for defining session statistics — user-defined statistics that can be written and read by each session, to monitor and profile the internal operations specific to your application.

There are 48 session cache statistic slots available, with names of the form SessionStat00...SessionStat47.

You can use the following methods to read and write the session cache statistics:

System class >> sessionCacheStatAt: anIndex

Returns the value of the statistic at the designated index. anIndex must be in the range -2 to 47. Negative values are reserved for internal use.

System class >> sessionCacheStatAt: anIndex put: aValue

Assigns a value to the statistic at the designated index and returns the new value. anIndex must be in the range -2 to 47. Negative values are reserved for internal use.

System class >> sessionCacheStatAt: anIndex incrementBy: anInt

Increments the statistic at the designated index by anInt and returns the new value. anIndex must be in the range -2 to 47. Negative values are reserved for internal use.

System class >> sessionCacheStatAt: anIndex decrementBy: anInt

Decrements the statistic at the designated index by anInt and returns the new value. anIndex must be in the range -2 to 47. Negative values are reserved for internal use.

System class >> sessionCacheStatsForProcessSlot: aProcessSlot

Return an array containing the 48 session statistics for the given process slot, or nil if the process slot is not found or is not in use.

System class >> sessionCacheStatsForSessionId: aSessionId

Return an array containing the 48 session statistics for the given session id, or nil if the session is not found or is not in use.
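
For instance, an application could dedicate one slot to counting the items it processes. The sketch below uses only the methods listed above; the choice of slot 0 and the three placeholder items are arbitrary assumptions for illustration.

```smalltalk
topaz 1> printit
"Slot 0 is chosen arbitrarily here to count processed items."
System sessionCacheStatAt: 0 put: 0.
#( 'a' 'b' 'c' ) do: [:each |
    "... application work on each item would go here ..."
    System sessionCacheStatAt: 0 incrementBy: 1].
"Read the counter back."
System sessionCacheStatAt: 0
%
```

Because the value is kept in the shared page cache, it can also be read from another session on the same node with sessionCacheStatsForSessionId:, or viewed in VSD under the corresponding session statistic name.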

Global Session Statistics

In addition to the Gem session statistics, GemStone/S 64 Bit provides global session statistics — user-defined statistics that can be written and read by any Gem on any Gem server. Unlike session cache statistics, which are stored in the shared page cache of the machine that the Gem is running on, global session statistics are stored in the shared page cache of the Stone. Global session statistics are not transactional. For a given statistic, every session sees the same value, regardless of its transactional view.

There are 48 global cache statistic slots available, with names of the form GlobalStat00...GlobalStat47.

You can use the following methods to read and write the global cache statistics:

System class >> globalSessionStatAt: aProcessSlot
Returns the value of the statistic at the designated slot (must be in the range 0..47).

System class >> globalSessionStatAt: aProcessSlot put: aValue
Assigns a value to the statistic at the designated slot (must be in the range 0..47) and returns the new value. The value must be a SmallInteger in the range of -2147483648 to 2147483647.

System class >> incrementGlobalSessionStatAt: aProcessSlot by: anInt
Increments the value of the statistic at the designated slot by anInt and returns the new value of the statistic. The value anInt must be a SmallInteger in the range of -2147483648 to 2147483647.
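
A minimal sketch of a repository-wide counter follows. It uses only the methods listed above; slot 3 is an arbitrary choice, and in practice an application would need its own convention for which slot means what.

```smalltalk
topaz 1> printit
"Slot 3 is an arbitrary choice for a repository-wide request counter."
System incrementGlobalSessionStatAt: 3 by: 1.
"Read the value now visible to every session."
System globalSessionStatAt: 3
%
```

Since global session statistics are not transactional, no commit is involved: the incremented value is what any other session reads, whatever its transactional view.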

Host Statistics

Host Statistics for processes

Process-level host statistics require an operating system call, which can affect performance. For this reason, they are not part of the information returned by the regular cache statistics interface methods. To get this information, use the following methods.

System class >> hostProcessStatisticsNames
Returns an array of Strings which are the names of the per-process statistics provided by this host.

System class >> hostStatisticsForMyProcess
Returns an array of SmallIntegers which represent the host statistics for this process. The names of each statistic are returned by the #hostProcessStatisticsNames method.

System class >> hostStatisticsForProcess: processId
Returns an array of SmallIntegers which represent the host statistics for the process with the given process ID. The names of each statistic are returned by the #hostProcessStatisticsNames method.

Specific methods are also available to return the host CPU statistics only:

System class >> hostCpuStatsForProcessId: anInt
Return an Array of two integers as follows:

1 - user mode CPU milliseconds
2 - system mode CPU milliseconds

Both array elements will be -1 if the process slot is out of range or not in use or if this method is not supported for the host architecture.

It is not required that the process with pid anInt is attached to the shared page cache or even is a GemStone process. The method will succeed for any process for which the Gem session executing the method has permission to view the target process’ CPU usage statistics.

System class >> hostCpuStatsForProcessSlot: anInt
For the process using the cache process slot anInt, return an Array of two integers as follows:

1 - user mode CPU milliseconds used
2 - system mode CPU milliseconds used

Both array elements are set to -1 if the process slot is out of range or not in use, or if this method is not supported for the host architecture.
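
As an illustration, the slot-based method can be combined with cacheStatisticsForAllSlotsShort to list CPU usage for every process attached to the cache. This is a sketch only; it relies on the element order described earlier for each entry (name first, process slot fifth).

```smalltalk
topaz 1> printit
System cacheStatisticsForAllSlotsShort collect: [:ea | | cpu |
    "Element 5 of each entry is the process slot; element 1 is the name."
    cpu := System hostCpuStatsForProcessSlot: (ea at: 5).
    "Answer name, user mode CPU ms, system mode CPU ms."
    Array with: (ea at: 1) with: (cpu at: 1) with: (cpu at: 2)]
%
```

An entry whose two CPU values are both -1 indicates a slot that went out of use between the two calls, or a host that does not support the method.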

Host Statistics for OS

While most monitoring is of the object server and session processes, it is also useful to monitor the performance of the operating system that is running GemStone. On host platforms that support it, the following methods return statistics provided by the operating system. This is the same information that is available via statmonitor; see statmonitor.

System class >> fetchSystemStatNames
Return an array of Strings with the names of the available OS level statistics. The length is host-dependent. If the host system does not support system statistics, this method returns nil.

System class >> fetchSystemStats
Return an array of Numbers corresponding to the names returned by the #fetchSystemStatNames method. The length of the result array is host-dependent. While most elements in the result array will be SmallIntegers, the result may also contain other types of Numbers such as SmallDoubles, Floats, LargeIntegers, etc. If the host system does not support system statistics, this method returns nil.
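
The two arrays can be combined into a name-to-value lookup, as in the following sketch. It guards against the nil result that both methods return on hosts without system statistics; the variable names are illustrative.

```smalltalk
topaz 1> printit
| names values report |
names := System fetchSystemStatNames.
values := System fetchSystemStats.
(names isNil or: [values isNil])
    ifTrue: ['system statistics are not supported on this host']
    ifFalse: [
        "Pair each statistic name with the value at the same index."
        report := Dictionary new.
        1 to: (names size min: values size) do: [:i |
            report at: (names at: i) put: (values at: i)].
        report]
%
```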

You can also monitor CPU usage for the host using the following method:

System class >> hostCpuUsage
Returns an Array of 5 SmallIntegers with values between 0 and 100 which have the following meanings:

1 - Percent CPU active (user + system)
2 - Percent CPU idle
3 - Percent CPU user
4 - Percent CPU system (kernel)
5 - Percent CPU I/O wait

On hosts with multiple CPUs, these figures represent the average across all processors. The results of the first call to this method are invalid and should be discarded. Returns nil if the host system does not support collecting CPU statistics.
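
The following sketch shows one way to allow for the invalid first result: call the method once, wait, and then use the second result. The pause uses System sleep:, which is assumed here to suspend the session for the given number of seconds, and the five-second interval is an arbitrary choice.

```smalltalk
topaz 1> printit
| usage |
"The first result is invalid; call once and discard it."
System hostCpuUsage.
"Assumed: pause this session for 5 seconds before sampling again."
System sleep: 5.
usage := System hostCpuUsage.
usage isNil
    ifTrue: ['CPU statistics are not supported on this host']
    ifFalse: ['active: ', (usage at: 1) printString,
              '%  idle: ', (usage at: 2) printString, '%']
%
```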
