7. Monitoring GemStone

A properly configured GemStone repository will run normally with little attention. It is still important to monitor the repository, to catch unexpected problems before they become serious. If unexpected problems do occur, you will need to examine logs, monitor your system, and perform other analysis. The relevant logs and tools are described in this chapter.

GemStone Process Logs
details what logs are created by GemStone/S 64 Bit processes, where they are located, and what configuration is possible.

Repository Page and Object Audit
provides instructions on how to perform a page audit and object audit of the repository.

Profiling Repository Contents
describes how to analyze the repository contents.

Monitoring Performance
describes how to monitor the performance of the GemStone server and its clients using GemStone Smalltalk methods.

Use caution if keeping a GemStone session running for monitoring purposes. A GemStone session that is in transaction and does not abort can cause an excessive commit record backlog and undesirable repository growth. See Disk Space and Commit Record Backlogs.

7.1  GemStone Process Logs

All GemStone processes create log files, including startup configuration information, tracking details on certain operations, and details for any errors that were encountered.

For some kinds of processes, the log files are only of interest if an error occurs, so they are deleted when the process exits without error. For other kinds of processes, the log files may provide diagnostic information about problems that occur later in other parts of the system; these log files are not deleted when the process exits.

The log file names and directory locations, and the log file deletion policies, are all configurable if you prefer a customized way to manage your log files. What is important is to know where your log files are, to monitor them for error conditions, and to know how to find the relevant logs if a problem occurs.

GemStone log contents and names are UTF-8 encoded.

Finding log files

By default, GemStone writes log files to a number of specific locations:

  • The Stone log defaults to the $GEMSTONE/data directory. The Stone log file name and location can be configured in a number of ways; see Stone Log for details.
  • The logs for the Symbol Gem, Page Manager, and Admin and Reclaim Gems are normally in the same location as their Stone’s log; the Admin, Reclaim, and Symbol Gem log locations can also be configured. See Admin Gem Log, Reclaim Gem Log, and Symbol Gem Log.
  • RPC Gem logs, and log files for processes running on nodes remote from the Stone, are by default in the $HOME directory of the UNIX user. The locations and names can be configured by the NRS used to start the Gem; see Gem Logs and logs related to Gem Sessions for the options.
  • The NetLDI log by default is in /opt/gemstone/log; see NetLDI Log for ways this can be configured.

The logs for a running Stone or NetLDI and some other services can be found using the gslist utility. gslist -x displays the location of the current log file. For example,

os$ gslist -x gs64stone
gs64stone
  status=  exists
  type=    Stone
  version= 3.7.0
  owner=   gsadmin
  started= Jul 08 10:51
  pid=     1705239
  port=    45119
  options= 
  logfile= /gshost/GemStone3.7/data/gs64stone.log
  sysconf= /gshost/GemStone3.7/data/system.conf
  GEMSTONE=/gshost/GemStone3.7
  exe=/gshost/GemStone3.7/sys/stoned
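
When scripting log checks, it can help to extract just the logfile= value from gslist -x. The following is a minimal sketch, assuming the output layout shown above; the function name gs_logfile_from_gslist is hypothetical and not part of GemStone.

```shell
#!/bin/sh
# Hypothetical helper: read `gslist -x` output on stdin and print the path
# from each "logfile=" line. Assumes the "  logfile= /path" layout shown above.
gs_logfile_from_gslist() {
  sed -n 's/^[[:space:]]*logfile=[[:space:]]*//p'
}

# Usage (requires a running Stone named gs64stone):
#   gslist -x gs64stone | gs_logfile_from_gslist
```

For example, tail -f "$(gslist -x gs64stone | gs_logfile_from_gslist)" follows the current Stone log.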

Some process log files are deleted automatically when the process exits cleanly, to avoid an excessive number of unimportant log files. The details for each process describe the applicable log file deletion policy, which in most cases can be overridden per process. You can force deletion of these logs using the environment variable GS_FORCE_CLEAN_LOG_FILE_DELETE, or retain all logs using GS_KEEP_ALL_LOGS. Log files for processes that exited with an error are never deleted, and the NetLDI and Stone logs are never automatically deleted.

Stone Log

The log for the Stone repository monitor is always appended to, and is therefore cumulative across runs by default. This log is the first one you should check when a GemStone system problem is suspected. In addition to possible warnings and error messages, the log records the following useful information:

  • The GemStone version.
  • The configuration files that were read at startup and, if the DUMP_OPTIONS configuration option is set to True, the resulting Stone configuration.
  • Each startup and shutdown of the Stone, the reason for the shutdown, and whether recovery from transaction logs was necessary at startup.
  • Each expansion of a repository extent and its current size.
  • Each opening of a new transaction log.
  • Each startup and shutdown of each GcGem session, and the corresponding processId.
  • Each #abortErrLostOtRoot sent to a Gem.
  • Each suspension and resumption of logins.
  • Certain changes to the login security system.
  • Each time a backup is started and when the backup is completed.

Log name and location

The Stone log by default is stonename.log, where stonename is the name of the running Stone repository monitor. If a specific name was not specified for startstone, the stonename defaults to gs64stone.

The Stone log file name and location are determined in the following precedence:

1. A path and filename supplied by startstone -l logFile. logFile may be a filename, or a relative or absolute path and filename, to which the account starting the Stone has write permission. If logFile is not an absolute path, it is created relative to the current directory.

2. A path and filename specified by the GEMSTONE_LOG environment variable. As with startstone -l’s argument, this may be set to a filename or to a relative or absolute path and filename.

3. $GEMSTONE/data/stonename.log.

Log file deletion policy

The Stone log is never deleted; each restart appends to an existing log file of the same name, if one exists.

It is strongly recommended to retain this file over restarts and upgrades; the information in this log file may be useful for problem diagnosis for a significant time. If this file becomes too large, or the log file name or location is changed, we recommend archiving the older Stone logs.
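
One way to archive the Stone log is to move it aside, timestamped and compressed, just before restarting the Stone. This is a minimal sketch: archive_stone_log is a hypothetical helper, and it assumes the Stone is stopped, since on UNIX a process that still holds the log open would keep writing to the renamed file.

```shell
#!/bin/sh
# Hypothetical helper: move a Stone log aside as LOG.YYYYmmdd-HHMMSS.gz so
# the next startstone begins a fresh file. Run only while the Stone is stopped.
# Usage: archive_stone_log LOGFILE [DEST_DIR]
archive_stone_log() {
  log=$1
  dest_dir=${2:-$(dirname "$log")}
  [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }
  stamp=$(date +%Y%m%d-%H%M%S)
  archived="$dest_dir/$(basename "$log").$stamp"
  # gzip replaces the moved file with a .gz copy; print the final path.
  mv "$log" "$archived" && gzip "$archived" && echo "$archived.gz"
}
```

For example, archive_stone_log $GEMSTONE/data/gs64stone.log /archive/gemstone, followed by startstone.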

Shared Page Cache Monitor Log

The shared page cache monitor log includes, among other things:

  • Its configuration (for remote nodes, this may be different from the configuration on the Stone’s node).
  • The number of processes that can attach (which can limit the number of logins).
  • The UNIX identifiers for the memory region and the semaphore array (these identifiers are helpful in the event you must remove them manually using the ipcrm command).

Log name and location

The log for the shared page cache monitor on the Stone’s machine is located in the same directory as the Stone’s log. This log file has a name of the form

stoneName_PIDpcmon.log

Check this log if other messages refer to a shared page cache failure.

When a session logs in from another node, a log is created for the shared page cache monitor on the remote node. This log is located by default in the home directory of the account that started the Stone, but this location can be modified by environment variable settings. The default name is of the form

stoneName_PIDpcmon_Node.log

where PID is the process Id of the monitor process, and Node is the name of the remote node.

Log file deletion policy

The shared page cache monitor log is not deleted on exit. A new log is created each time the stone is restarted, and old log files should be manually deleted from time to time.

Admin Gem Log

This log shows the startup values of the Admin Gem parameters that are stored in GcUser’s UserGlobals and any changes to them, and records other Admin Gem functions.

Log name and location

Each time the Stone repository monitor starts an administrative garbage collection session (Admin Gem) process, a new log is created. By default, this log is in the same location as the Stone’s log. The location of this log file can be set specifically using the environment variable $GEMSTONE_ADMIN_GC_LOG_DIR.

The log name is formed using the pattern:

stoneName_PIDadmingcgem.log

where stoneName is the name of the Stone, and PID is the process Id of the Admin Gem process.

Log file deletion policy

By default, the Admin Gem log is not deleted on clean exit.

The Admin Gem is started using the script $GEMSTONE/sys/runadmingcgem. You may create a customized version of this script, commenting out the line that sets $GEMSTONE_KEEP_LOG, to allow this log to be automatically deleted.

Reclaim Gem Log

This log shows the startup values of the Reclaim Gem parameters that are stored in GcUser’s UserGlobals and any changes to them, and records other Reclaim Gem functions.

Log name and location

Each time the Stone repository monitor starts a reclaim garbage collection session (Reclaim Gem) process, a new log is created. By default, this log is in the same location as the Stone’s log. The location of this log file can be set specifically using the environment variable $GEMSTONE_RECLAIM_GC_LOG_DIR.

The log name is formed using the pattern:

stoneName_PIDreclaimgcgem.log

where stoneName is the name of the stone, and PID is the process Id of the Reclaim Gem process.

Log file deletion policy

By default, the Reclaim Gem log is not deleted on clean exit.

The Reclaim Gem is started using the script $GEMSTONE/sys/runreclaimgcgem. You may create a customized version of this script, commenting out the line that sets $GEMSTONE_KEEP_LOG, to allow this log to be automatically deleted.

Page Manager Log

This log is not usually of interest, unless errors occur or tuning is required.

Log name and location

The Page Manager is a thread in the Stone, and is not a separate process, but it writes to a separate log for ease of maintenance. The Page Manager log is located in the same directory as the log for the Stone. This log file has a name of the form:

stoneName_PIDpagemanager.log

where stoneName is the name of the stone, and PID is the process Id of the Stone process.

Log file deletion policy

This log is not deleted by default on clean exit. Since it is a thread in the stone, it is not started by a specific script, and will only be deleted on clean exit when $GS_FORCE_CLEAN_LOG_FILE_DELETE is set.

Symbol Gem Log

This log is not usually of interest, unless errors occur or tuning is required.

Log name and location

The Symbol Gem log is located in the same directory as the Stone’s log, by default. The location of this log file can be set specifically using the environment variable GEMSTONE_SYMBOL_GEM_LOG_DIR.

The Symbol Gem log file has a name of the form:

stoneName_PIDsymbolgem.log

where stoneName is the name of the stone, and PID is the process Id of the Symbol Gem process.

Log file deletion policy

This log is deleted by default if the Symbol Gem exits cleanly, no (nonfatal) errors were reported during its lifetime, and no Symbol GC operations were performed while it was running. The Symbol Gem is started using the script $GEMSTONE/sys/runsymbolgem; you may create a customized version of this script, uncommenting the line that sets $GEMSTONE_KEEP_LOG, to allow this log to be retained.

NetLDI Log

By default, the NetLDI log contains only configuration information and error messages. The configuration information reflects the environment at the time the NetLDI was started and the effect of any authentication switches specified as part of the startnetldi command.

In some cases it is helpful to log additional information by starting the NetLDI in debug mode (startnetldi -d). In this mode, the NetLDI writes a record of each communication to or from all clients to its log. Because the log for NetLDI running in debug mode is much larger, you probably won’t want to use this mode routinely.

Log name and location

The NetLDI writes a log file (netLdiName.log) in /opt/gemstone/log (or an equivalent, as described here) on the node on which it runs.

The startnetldi script allows you to specify a log file name and location using the -l option, and optionally the NetLDI name, netLdiName. If no log file name is specified using the -l argument, the default is /opt/gemstone/log/netLdiName.log.

Log file deletion policy

The NetLDI log file with the specified or default name is appended to, and is never deleted. You should manually remove outdated messages occasionally.

Gem Logs and logs related to Gem Sessions

The log file written by the Gem includes the Gem’s startup configuration details, configuration parameters settings, and login information, as well as the messages generated if an error occurs. This information is important when diagnosing client-related problems.

Normally a Gem log is created on login and continues to be used until the process logs out or otherwise terminates. A Gem log can be closed, and a new one opened, using System class >> startNewGemLog: aFileName. This creates (or reopens) a log file named aFileName in the same directory as the previous Gem log.

When the RPC or linked Gem is not running on the same node as the Stone, the login to GemStone also requires other supporting processes to be spawned. Each of these processes has its own log file.

Linked Gems

Linked logins, in which the Gem is part of the client process, do not write a separate log file to disk. The log output is sent to stdout of the linked process; for example, the linked topaz console. Topaz commands such as output push allow this information to be written to disk. See the Topaz User’s Guide for more information.

RPC Gems on Stone’s host

An RPC login spawns a separate Gem session process. When this process is on the same node as the Stone, the RPC Gem can connect directly to the server processes, and does not require further supporting processes to be spawned.

By default, the log file for an RPC Gem is located in the home directory of the account that owns the Gem process, which depends in turn on the NetLDI configuration.

You can change the default location for Gem log files by setting #dir or #log in the GEMSTONE_NRS_ALL environment variable for the NetLDI itself or for individual clients; see Controlling log file directory locations. Alternatively, when you log in to GemStone, you can specify a different network resource string (NRS) in your login parameters.

Log file deletion policy for RPC Gems

The log file for an RPC Gem is deleted by default on a clean shutdown. If the Gem terminates with an error, the log file is not deleted. Since a log is created for each RPC login, you should periodically examine log files for errors and delete older logs, especially if you configure your system to keep Gem logs.

To configure RPC Gem log files to be kept on clean shutdown:

  • Use the Gem service gemnetobject_keeplog instead of gemnetobject to login. gemnetobject_keeplog works just like gemnetobject, but sets the environment variable $GEMSTONE_KEEP_LOG, and so does not delete the log file.
  • You can set the environment variable $GEMSTONE_KEEP_LOG using an argument to gemnetobject and the GEM_ENV environment variable:
set gemnetid 'gemnetobject -C GEM_ENV=GEMSTONE_KEEP_LOG=1'

gemnetobject and gemnetobject_keeplog are described starting here.

Additional processes for Remote RPC or Linked Gems

For RPC logins where the Gem is not on the same node as the Stone, or for linked logins that are not on the same node as the Stone, the following additional processes are also spawned:

  • A page server on the server node (if one does not exist already for the remote node). This allows the session to access a repository extent on the server node.
  • A page server on the remote node (if one does not already exist for a previous login), to allow the Stone to start or access a shared page cache on the remote client node. The free frame page servers for the remote cache are threads within this process.
  • A shared page cache monitor on the remote node, to manage the remote cache on the client’s node.

The default location for the log files of these processes is based on any #dir or #log settings specified for the Gem; otherwise, it is the home directory of the account that owns the corresponding process. For the page server on the server node, that account ordinarily is the application user. For the shared page cache monitor and page server on the client node, that account is the one that invoked startstone.

The following table shows typical log names for processes related to remote logins, given a Stone named gs64stone running on node1, with a login from a Gem session process on node2.

Typical name                          GemStone process

gemnetobject27853node2.log            Gem session process on node2 (serves an RPC session)
gs64stone_27819cachepgsvr_node2.log   Page server on node2 that the repository monitor uses to create and access its shared page cache on node2
gs64stone_27820pcmon_node2.log        Shared page cache monitor on node2
runpgsvr12397node1.log                Page server on node1 that the Gem session process uses to access the repository extents on node1
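
When remote logins leave many logs in a home directory, a name-based classifier can help sort them. This sketch maps file names to process types using only the default naming patterns given in this section; classify_gs_log is hypothetical, and names that do not follow these defaults (for example, logs renamed with #log, or the Stone log itself) are reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: guess the GemStone process type from a log file name,
# using the default naming patterns documented in this section only.
classify_gs_log() {
  case $(basename "$1") in
    gemnetobject*.log) echo "RPC Gem" ;;
    *cachepgsvr*.log)  echo "remote cache page server" ;;
    runpgsvr*.log)     echo "Stone-node page server for remote Gem" ;;
    *pcmon*.log)       echo "shared page cache monitor" ;;
    *admingcgem.log)   echo "Admin Gem" ;;
    *reclaimgcgem.log) echo "Reclaim Gem" ;;
    *symbolgem.log)    echo "Symbol Gem" ;;
    *pagemanager.log)  echo "Page Manager" ;;
    *)                 echo "unknown" ;;
  esac
}
```

For example: for f in "$HOME"/*.log; do printf '%s\t%s\n' "$(classify_gs_log "$f")" "$f"; done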

Further control over log file location and name

The startnetldi -D option provides a default log file path for forked processes. NRS includes the #dir and #log directives, which allow you to specify the log file name and directory location, including pattern substitution, either in Gem login parameters or using GEMSTONE_NRS_ALL.

The options are described under Controlling log file directory locations.

Logsender and logreceiver Logs

The logsender and logreceiver processes are started only if you are setting up a hot standby system.

Log name and location

Each logsender and logreceiver creates a log file in /opt/gemstone/log on the node on which it runs.

The log file’s name, by default, is logsender_listeningPort.log or logreceiver_listeningPort.log.

This location and name can be overridden by including the option -l logname when starting the logsender or logreceiver.

Log file deletion policy

The logsender and logreceiver log files with the specified or default names are appended to, and are never deleted. You should occasionally manually examine these logs and remove outdated messages.

Other Log Files

Other GemStone processes also create log files. Some are only of interest if an error occurs; these logs are deleted by default, and you may only ever see them if you set the $GS_KEEP_ALL_LOGS environment variable. Others are specific to particular utilities, and are described in separate parts of this manual.

  • Extent pregrow produces log files named stonename_pidpgsvrPreGrow.log in the Stone’s log file directory.
  • pageaudit produces a log file for the audit gem, as well as the log file with audit results.
  • statmonitor, when started from the configuration option, invokes the script runstatmonitor and creates a log file named stonename_pidrunstatmonitor_type.log in the Stone’s log file directory, or for remote caches, in the home directory of the corresponding UNIX process owner.
  • cache warmers, from the configuration file option or using startcachewarmer, produce log files.

Summary of GemStone Process Log Behaviors

This table provides a summary of the various GemStone process log behaviors.

Table 7.1 GemStone process types and log file details

Stone
    Log file:  stonename.log, in $GEMSTONE/data by default; the default for stonename is gs64stone. Override with the -l argument to startstone or with $GEMSTONE_LOG.
    Deletion:  Never deleted; the same file is appended to with each restart.
    Script:    startstone

NetLDI
    Log file:  /opt/gemstone/log/netldiname.log; the default for netldiname is gs64ldi. Override with the -l argument to startnetldi.
    Deletion:  Never deleted; the same file is appended to with each restart.
    Script:    startnetldi

Shared Page Cache Monitor
    Log file:  stonename_pidpcmon.log, in the Stone’s log file directory. Remote shared page cache monitor logs are $HOME/stoneName_PIDpcmon_Node.log.
    Deletion:  Not deleted by default.
    Script:    shrpcmonitor

Page Manager (thread in Stone)
    Log file:  stonename_pidpagemanager.log, in the Stone’s log file directory.
    Deletion:  Not deleted by default.
    Script:    none (thread in Stone)

Symbol Gem
    Log file:  stonename_pidsymbolgem.log, in the Stone’s log file directory. Override the location with $GEMSTONE_SYMBOL_GEM_LOG_DIR.
    Deletion:  Deleted on clean exit by default.
    Script:    runsymbolgem

Admin Gem
    Log file:  stonename_pidadmingcgem.log, in the Stone’s log file directory. Override the location with $GEMSTONE_ADMIN_GC_LOG_DIR.
    Deletion:  Not deleted by default.
    Script:    runadmingcgem

Reclaim Gem
    Log file:  stonename_pidreclaimgcgem.log, in the Stone’s log file directory. Override the location with $GEMSTONE_RECLAIM_GC_LOG_DIR.
    Deletion:  Not deleted by default.
    Script:    runreclaimgcgem

RPC Gem
    Log file:  gemnetobjectpidnode.log, in the $HOME directory of the UNIX user (for Gems started using the gemnetobject script). Override with #dir or #log in the NRS of the login parameters.
    Deletion:  Deleted by default on clean exit.
    Script:    gemnetobject (alternate scripts are also provided)

Page server on remote node
    Log file:  $HOME/stonename_pidcachepgsvr_node.log. Override with #dir in the NRS used for the Gem login.
    Script:    runcachpgsvr

Page server on Stone’s node for remote Gems
    Log file:  $HOME/runpgsvrpidnode.log. Override with #dir in the NRS used for the Gem login.
    Script:    runpgsvr

pageaudit
    Log file:  Produces both Gem and Stone logs; the “Stone” log holds the audit results.
    Deletion:  The “Gem” log is deleted by default on clean shutdown; this behavior is controlled by environment variables. The “Stone” log is never deleted.
    Script:    runpageauditgem

logsender
    Log file:  logsender_port.log; override with the -l argument to startlogsender.
    Deletion:  Never deleted; the same file is appended to with each restart.
    Script:    startlogsender

logreceiver
    Log file:  logreceiver_port.log; override with the -l argument to startlogreceiver.
    Deletion:  Never deleted; the same file is appended to with each restart.
    Script:    startlogreceiver

cachewarmer (from config file)
    Log file:  stonename_cachewarmer.log, in the Stone’s log file directory.
    Deletion:  Deleted by default on clean exit.
    Script:    runcachewarmergem

statmonitor (from config file)
    Log file:  stonename_pidrunstatmonitor_type.log, in the Stone’s log file directory (for remote caches, in the home directory of the corresponding UNIX process owner). Holds statmonitor startup information; statistics are written to a separate file.
    Deletion:  Deleted by default on clean exit.
    Script:    runstatmonitor

Managing log files

Since some log files are not deleted by default, and the occasional minor error will leave behind log files that would normally be deleted automatically on process exit, the number of log files will accumulate over time. The Stone and NetLDI log files are reopened and appended to each time the process is restarted, so these logs grow indefinitely. Some maintenance of GemStone log files is therefore required. Your application’s requirements for diagnostics after an incident, as well as your application design, will dictate which log files you need to retain and for how long.
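
As a starting point for periodic cleanup, the following sketch deletes *.log files older than a given number of days from one directory, while sparing named files such as the cumulative Stone log. prune_gs_logs is a hypothetical helper; the retention period and the exclusion list are assumptions you should set from your own diagnostic requirements. Setting DRY_RUN=1 lists candidates without deleting them.

```shell
#!/bin/sh
# Hypothetical cleanup: remove *.log files in DIR (not subdirectories) whose
# modification time is more than DAYS days old (find -mtime rounding applies).
# Any KEEP_NAME arguments (e.g. gs64stone.log) are never removed.
# Prints each file it removes; with DRY_RUN=1 it only prints.
# Usage: prune_gs_logs DIR DAYS [KEEP_NAME...]
prune_gs_logs() {
  dir=$1 days=$2
  shift 2
  find "$dir" -maxdepth 1 -type f -name '*.log' -mtime +"$days" |
  while IFS= read -r f; do
    name=$(basename "$f")
    keep=no
    for k in "$@"; do
      [ "$name" = "$k" ] && keep=yes
    done
    [ "$keep" = yes ] && continue
    echo "$f"
    [ "${DRY_RUN:-0}" = 1 ] || rm -f -- "$f"
  done
}
```

For example, DRY_RUN=1 prune_gs_logs "$HOME" 30 first, then the same command without DRY_RUN once the list looks right.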

Retaining or deleting all log files

The environment variables GS_KEEP_ALL_LOGS and GS_FORCE_CLEAN_LOG_FILE_DELETE override the individual defaults and configuration for the processes, to (respectively) force all log files to be retained, or all log files (except Stone and NetLDI) to be deleted on clean exit.

Customizing individual process deletion behavior

Many processes can have their log deletion behavior configured by setting the GEMSTONE_KEEP_LOG environment variable in the service script that starts that process.

Refer to Table 7.1 for specific service script names. Scripts that begin with “run”, and gemnetobject and its variants, are found in the $GEMSTONE/sys directory. To configure the delete behavior:

1. Make a copy of the specific script, giving it your own name.

2. Edit the copy to set or unset GEMSTONE_KEEP_LOG.

3. Edit $GEMSTONE/sys/services.dat to point the service name to your customized script. For example, if you have created a customized Admin Gem script in $GEMSTONE/scripts/myadmingcgemscript, edit services.dat so the lines look something like this:

# runadmingcgem 	        $GEMSTONE/sys/runadmingcgem
runadmingcgem           $GEMSTONE/scripts/myadmingcgemscript

Note that customizations to scripts and services.dat are lost on upgrade. After upgrading, repeat this process starting from the new version’s scripts and services.dat, so that you do not miss any changes to service or script names and contents.
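
Step 3 can be scripted so that it is easy to repeat after an upgrade. This sketch assumes services.dat holds one "name whitespace path" entry per line, as in the example above; repoint_service is a hypothetical helper, and it keeps a .orig copy of the file before replacing it.

```shell
#!/bin/sh
# Hypothetical helper: comment out the entry for SERVICE in a services.dat
# file and add a new entry pointing SERVICE at SCRIPT, directly below it.
# Fails (and leaves the file untouched) if SERVICE has no entry.
# Usage: repoint_service FILE SERVICE SCRIPT
repoint_service() {
  file=$1 service=$2 script=$3
  tmp="$file.new.$$"
  if awk -v svc="$service" -v path="$script" '
       $1 == svc { print "# " $0; print svc "\t" path; found = 1; next }
       { print }
       END { if (!found) exit 1 }
     ' "$file" > "$tmp"; then
    cp -p "$file" "$file.orig" && mv "$tmp" "$file"
  else
    rm -f "$tmp"
    echo "service $service not found in $file" >&2
    return 1
  fi
}
```

For example: repoint_service $GEMSTONE/sys/services.dat runadmingcgem '$GEMSTONE/scripts/myadmingcgemscript' (single quotes keep $GEMSTONE literal, as in the example entries).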

Localizing timestamps in log files

The timestamps printed in the log headers and in log messages are formatted according to the current system locale. You can override this using the GS_CFTIME environment variable. If this is set in the environment for the process, then the setting is used to control printing in log headers and log messages.

The setting for GS_CFTIME must be a valid strftime format string, and must contain fields for:

  • Month: %m or %b or %B or %h
  • Day: %d
  • Hour: %H, or %I and %p, or %I and %P
  • Minutes: %M
  • Seconds: %S

If these criteria are not met, the default date format for the system locale is used or, failing that, a US-centric date format.
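
Because an invalid GS_CFTIME silently falls back to the default format, it may be worth checking a value before exporting it. This sketch applies the field rules listed above to a format string; gs_cftime_ok is hypothetical, it only checks for the presence of the required specifiers (a literal %%m would still count as %m), and date +"$GS_CFTIME" is used only to preview the output.

```shell
#!/bin/sh
# True if format string $1 contains the substring $2.
_cftime_has() {
  case $1 in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Hypothetical pre-flight check for a GS_CFTIME value. Requires:
#   month (%m, %b, %B or %h), day (%d),
#   hour (%H, or %I together with %p or %P), minutes (%M), seconds (%S).
gs_cftime_ok() {
  f=$1
  { _cftime_has "$f" %m || _cftime_has "$f" %b ||
    _cftime_has "$f" %B || _cftime_has "$f" %h; } &&
  _cftime_has "$f" %d &&
  { _cftime_has "$f" %H ||
    { _cftime_has "$f" %I &&
      { _cftime_has "$f" %p || _cftime_has "$f" %P; }; }; } &&
  _cftime_has "$f" %M &&
  _cftime_has "$f" %S
}
```

For example: GS_CFTIME='%Y-%m-%d %H:%M:%S'; gs_cftime_ok "$GS_CFTIME" && date +"$GS_CFTIME" && export GS_CFTIME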

Programmatically adding messages to logs

It may be useful for your application to deliberately write messages to the Stone or Gem logs. For example, if you are performing some automated batch processing, it may be useful to know when this started and completed in relation to other system maintenance tasks such as garbage collection.

You can write a message to the stone log using:

System addAllToStoneLog: aString

To write to the Gem log or console, you may use the following:

GsFile gciLogServer: aString
GsFile gciLogClient: aString

Logging to the server writes to the Gem log for an RPC session, or to the topaz console or stdout for a linked session. Logging to the client writes to the topaz console or stdout for both linked and RPC clients.

The correct place to log messages depends on your session configuration and the nature of your client application; in general, it is safer to log to the server. However, the RPC Gem log is deleted by default if the session logs out cleanly, so any messages in it will not be retained. See Log file deletion policy for RPC Gems for how to configure your login to preserve log files on clean exit. On the other hand, GUI applications may not provide access to stdout, making messages to the client inaccessible.
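
For the batch-processing case described above, one approach is to generate a topaz command file that brackets the work with System addAllToStoneLog: messages. This is a sketch under assumptions: make_batch_topaz is a hypothetical helper, the login parameters are assumed to come from your .topazini, the batch body is a placeholder, and the job name must not contain a single quote (it is embedded in a Smalltalk string).

```shell
#!/bin/sh
# Hypothetical generator: print a topaz command file that writes
# "Batch JOB started" and "Batch JOB completed" to the Stone log.
# Assumes login parameters are already set (for example, in .topazini).
make_batch_topaz() {
  job=$1
  cat <<EOF
login
printit
System addAllToStoneLog: 'Batch $job started'.
%
printit
"... batch processing goes here ..."
System addAllToStoneLog: 'Batch $job completed'.
%
logout
exit
EOF
}
```

For example: make_batch_topaz nightly > nightly.tpz && topaz -l < nightly.tpz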

7.2  Repository Page and Object Audit

This section describes two levels of checks that you can perform on the repository.

Page Audit

Page audits allow you to diagnose problems in the system repository by checking for consistency at the page level. Page audit can be run only on repository extents that are not in use; either shut down your Stone, or make an extent copy backup and audit the copy.

Page audit scans the root pages in a repository, the pages used in the bitmap structures referenced by the root page, and all other pages (including data pages) to confirm page-level consistency. Although data pages are audited, page audit does not check that the data on them is valid; for that, you need to run an object audit separately (see Object Audit and Repair).

To run page audit, use the pageaudit utility. This utility starts up an audit gem and a Stone repository monitor in audit mode, to perform the audit.

The options to pageaudit are all optional, and include:

-e exeConfig is the executable configuration file.

-z systemConfig is the system configuration file.

-l logfile is the location and name of the output file. If not specified, then the log is written to a file named gemStoneName-pageAudit.log in the standard Stone log file location.

-d specifies to skip audit of data pages.

-f specifies to keep running after an audit error is found, if possible.

-n specifies the number of threads to use; by default, the number of extents plus the number of CPUs. Using a smaller value will cause pageaudit to take more time to complete, but reduces the impact on other processes.

gemStoneName is the name under which the pageaudit repository monitor runs; if not specified, pageaudit uses gs64stone-pageAudit.

The full set of options is described under pageaudit.

When pageaudit completes, it writes a message to stdout:

Page Audit of Repository completed successfully - no issues found.
For details, see /gemstone/logs/gs64stone-pageAudit.log

The details in the log file include Stone startup and configuration information, and audit steps performed. In addition, it produces statistics on the pages in the repository. For example:

PAGE AUDIT STATISTICS
    RepositorySize                 112.00 Mbytes       7168 pages
    Data Pages                      41.98 Mbytes       2687 pages
    Object Table Pages               1.20 Mbytes         77 pages
    Dependency Map Pages             0.02 Mbytes          1 pages
    Meta Information Pages           0.33 Mbytes         21 pages
    Commit Record Shadow Pages       0.03 Mbytes          2 pages
    Checkpoint Shadow Pages          0.00 Mbytes          0 pages
    Free Space in Repository        68.38 Mbytes       4376 pages
    OT Internal Pages                0.05 Mbytes          3 pages
    OT Leaf Pages                    1.16 Mbytes         74 pages
    Empty OT leaf pages              0.00 Mbytes          0 pages
    Empty data pages                 0.00 Mbytes          0 pages
    Data pages with 25%+ free        6.52 Mbytes        417 pages
    Data pages with 50%+ free        4.56 Mbytes        292 pages
    Free space in data pages         4.77 Mbytes     305.53 pages
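
To track repository free space across regular audits, the statistics can be reduced to a single figure. This sketch reads the statistics section of the pageaudit log on stdin and prints free pages as a percentage of total pages, using the page counts in the next-to-last column; pageaudit_free_pct is a hypothetical helper that assumes the line labels shown above.

```shell
#!/bin/sh
# Hypothetical helper: percentage of repository pages that are free, from the
# "RepositorySize" and "Free Space in Repository" lines of PAGE AUDIT STATISTICS.
# Exits 1 if no RepositorySize line is found.
pageaudit_free_pct() {
  awk '
    /RepositorySize/           { total = $(NF-1) }
    /Free Space in Repository/ { free  = $(NF-1) }
    END {
      if (total > 0) printf "%.1f\n", 100 * free / total
      else exit 1
    }'
}
```

With the sample statistics above, 4376 free pages out of 7168 gives 61.0 (percent).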

If the page audit finds problems, the message to the screen ends with a message like this:

-------------- PAGE AUDIT RESULTS --------------
**** NumberOfFreePages = 980 does not agree with audit
	results = 988
 
**** Problems were found in Page Audit.
**** Refer to recovery procedures in System Administrator's Guide.

If there are problems in the page audit, you will need to restore the repository file from backups. (See the section How to Restore from Backup.)
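
When page audit is run from a scheduled job, its captured stdout can be classified by the two messages shown above. This is a minimal sketch: pageaudit_result is hypothetical, it matches only the message text reproduced in this section, and it treats output containing neither message as unknown.

```shell
#!/bin/sh
# Hypothetical check of captured pageaudit stdout in file $1.
# Returns 0 for the success message, 1 if problems were reported, 2 otherwise.
pageaudit_result() {
  out=$1
  if grep -q 'Problems were found in Page Audit' "$out"; then
    echo "FAILED: restore from backup"; return 1
  elif grep -q 'Page Audit of Repository completed successfully' "$out"; then
    echo "OK"; return 0
  else
    echo "UNKNOWN: no result line found"; return 2
  fi
}
```

For example: pageaudit > pageaudit.out 2>&1; pageaudit_result pageaudit.out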

Object Audit and Repair

Privileges required: SystemControl.

Object audits check the consistency of the repository at the object level. Starting with the Object Table, each object is located and validated.

Object audit is performed using multiple threads (lightweight sessions), and can be configured to perform as quickly as possible using a large amount of system resources, or configured to use fewer resources and take longer to run.

Object audit should be run from linked Topaz, and on the same machine as the Stone.

Repository >> objectAudit
objectAudit runs a complete audit in transaction. You may have other sessions logged in and running simultaneously, at the risk of a commit record backlog if the other sessions are performing commits, and the audit will affect performance. This audit uses two threads and up to 90% of the CPU.

Repository >> objectAuditPartial
objectAuditPartial can be run to perform regular audits on production systems without the risk of a commit record backlog, at the expense of not detecting certain limited kinds of corruption. You may have other sessions logged in and running simultaneously, but the audit will affect performance. This audit uses two threads and up to 90% of the CPU.

Repository >> fastObjectAudit
fastObjectAudit is like objectAudit, but is configured to use most or all system resources to complete as quickly as possible. This is useful when running an audit on offline systems.

Repository >> fastObjectAuditPartial
fastObjectAuditPartial is like objectAuditPartial, but is configured to use most or all system resources to complete as quickly as possible. This is useful when running an audit on offline systems.

Repository >> objectAuditWithMaxThreads: maxThreads percentCpuActiveLimit: aPercent

Repository >> objectAuditPartialWithMaxThreads: maxThreads percentCpuActiveLimit: aPercent
These methods allow you to specify the exact performance/impact parameters for an object audit.

Performing the Object Audit

To perform an object audit:

Step 1. Log in to GemStone using linked Topaz (topaz -l).

Step 2. Send one of the audit messages to the repository. For example:

topaz 1> printit
SystemRepository objectAudit
%

The audit involves a number of checks and specific error messages. Checks include:

  • Object corruption — The object header should contain valid (legal) information about the object’s tag size, body size (number of instance variables), and physical size (bytes or OOPs).
  • Object reference consistency — No object should contain a reference to a nonexistent object, including references to a nonexistent class.

When the audit scan finds references to objects that do not exist, it handles them immediately, to avoid logical corruption of the repository. objectAudit removes these OOPs from the free list, so that another session cannot reuse them for new objects, and immediately performs a checkpoint. If the checkpoint fails, or if any of the invalid referenced OOPs has already been reused, the Stone shuts down to prevent commits of new values for these OOPs.

  • Identifier consistency — OOPs within the range in use (that is, up to the high-water mark) should be in either the Object Table or the list of free OOPs, and OOPs for objects existing in data pages should be in the Object Table.

If the repository is consistent and no errors are found, the audit will complete with the line:

Object Audit: Audit successfully completed; no errors were detected.

Otherwise, the audit reports the reasons for failure, including the specific problems found, to standard output.

Error Recovery

If an object audit reports errors, these issues should be addressed. You may want to contact GemTalk Technical Support for advice.

The following are general approaches to errors from object audit.

Collect and reclaim garbage and retry

If errors are reported during the object audit, you may wish to perform a markForCollection and reclaimAll, and then repeat the object audit. This may clear up the problems if the corrupt objects are not referenced from any live objects. Whether this helps depends on the particular errors reported.
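A minimal sketch of this sequence in linked Topaz follows. It assumes the logged-in user has the privileges required for garbage collection operations; see the garbage collection documentation for the conditions that affect how much can actually be reclaimed.

```smalltalk
topaz 1> printit
"Identify objects that are no longer reachable from live objects"
SystemRepository markForCollection.
%
topaz 1> printit
"Reclaim the dead objects, then audit again"
SystemRepository reclaimAll.
SystemRepository objectAudit
%
```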

Restore from backup

The safest approach when you find object audit errors is to restore from backup. GemStone recommends that you make regular backups, run in full transaction logging mode, and archive transaction logs as needed to recover. This would allow you to recover at any time from unexpected problems such as repository corruption.

If you do not have the set of backups and transaction logs that would allow you to restore from a backup and recover later transactions, or if you are in partial transaction logging mode, you can still make and restore a backup. Backups made using fullBackupTo:, when restored, rebuild the internal data structures. Depending on the specific problems found in audit, this may clear up the problem.

Attempt repair

GemStone includes the ability to repair invalid references, but this can only repair detectable corruption. If a number of errors are reported, whatever caused the objects to disappear or become invalid may also have introduced undetectable logical corruption. Repair is not recommended; restore from backup if at all possible.

However, a single invalid reference may not indicate a widespread problem, and repair may allow important data to be recovered.

To manually repair an individual invalid reference, use the Topaz object specification format @identifier to substitute nil or an appropriate reference for an invalid reference.

For example, given an instance of Array with the OOP 51369729, if the element at slot 3 is an object that does not exist, it can be repaired by setting the reference to nil using the following expression:

topaz 1> send @51369729 at: 3 put: nil

The method Repository >> repair will perform an audit and make repairs during the re-scan. The following repairs are done:

  • nil is substituted for an invalid object reference.
  • Class String is substituted for an invalid class of a byte object, class Array for a pointer object, or class IdentitySet for a nonsequenceable collection object.
  • OOPs in the Object Table for which the referenced object does not exist are inserted into the list of free OOPs.
  • OOPs for which an object exists but which are also in the list of free OOPs are removed from the free list.

The repair audits the repository, keeping track of errors. After the initial audit completes, each error found is repaired. A descriptive message is displayed for each repair.
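Because repair modifies the repository, it is prudent to take a backup first, so that the pre-repair state can be restored if needed. A sketch, in which the backup file path is a hypothetical placeholder:

```smalltalk
topaz 1> printit
"Hypothetical path: substitute a location with enough free space"
SystemRepository fullBackupTo: '/backups/before_repair.dbf'.
%
topaz 1> printit
"Audit and repair; a descriptive message is displayed for each repair"
SystemRepository repair
%
```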

7.3  Profiling Repository Contents

Some questions, such as “what is using up all the space in my repository?”, can only be answered by examining the types and numbers of objects in your repository. To find out this information, you can use methods on GsObjectInventory.

The methods in GsObjectInventory count all instances of all classes in the repository — or in any collection, or in a hidden set, or in a file of disconnected possible garbage objects — and report the results, ordered by the number of instances or by space consumed.

GsObjectInventory performs a multi-threaded scan of the repository, and thus should only be run from a session on the same machine as the Stone. To tune the impact of the scan, additional protocol allows you to perform fast scans or to specify the impact levels. For details, see the methods in the image.

The following code reports the number of instances and the space required for all classes whose total space requirements are more than 50000 bytes.

Example 7.2 Object Inventory byteCountReport

topaz 1> printit
GsObjectInventory profileRepository byteCountReportDownTo: 50000
%
   *** GsObjectInventory byteCountReport printed at: 16/07/2023 10:54:49 *** 
Hidden classes are included in this report.
_________________________________________________________________
Class                                    Instances          Bytes
_________________________________________________________________
String                                       32291        8263560
GsNMethod                                    23113        4628608
Array                                        26775        4273072
GsMethodDictionary                            3844        1963336
Symbol                                       20253         909944
CanonStringBucket                             2019         307888
Class                                         1888         294800
IdentityKeyValueDictionary                    1913         260216
SymbolAssociation                             5525         221584
ExecBlock                                     3212         205768
LargeObjectNode                                 16         199072
SymbolDictionary                               991         165584
SymbolSet                                     5081         159200
IdentityCollisionBucket                       1275         136408
_________________________________________________________________
 
 

The same profiling with an instance count report is much shorter, since the number of instances, rather than the bytes of space used, limits the results.

Example 7.3 Object Inventory instanceCountReport

topaz 1> printit
GsObjectInventory profileRepository instanceCountReportDownTo: 10000
%
 *** GsObjectInventory instanceCountReport printed at: 16/07/2023 11:02:01 ***  
Hidden classes are included in this report.
_________________________________________________________________
Class                                    Instances          Bytes
_________________________________________________________________
String                                       32291        8263560
Array                                        26775        4273072
GsNMethod                                    23113        4628608
Symbol                                       20253         909944
_________________________________________________________________
 

Both of these reports include instances of hidden classes: classes that are used to implement internal GemStone objects and are invisible to the image. One such class is LargeObjectNode; its instances implement the tree structures that underlie large collections. To exclude hidden classes, profile using the method profileRepositoryAndSkipHiddenClasses.
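For example, the byte count report shown earlier could be produced without hidden classes as follows. This sketch assumes that profileRepositoryAndSkipHiddenClasses, like profileRepository, is sent to the GsObjectInventory class and returns an inventory that understands the same report messages.

```smalltalk
topaz 1> printit
"Same threshold as before, with hidden classes such as LargeObjectNode excluded"
GsObjectInventory profileRepositoryAndSkipHiddenClasses byteCountReportDownTo: 50000
%
```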

For more on GsObjectInventory, see the methods in the image.

7.4  Monitoring Performance

Part of administration requires monitoring the health and performance of the Stone, cache, and/or individual session processes.

GemStone includes graphical tools that allow you to record statistics to a file and analyze this data graphically. You can also access these statistics programmatically.

Statmonitor and VSD

GemStone includes the statmonitor utility, which records statistics about GemStone processes to a disk file, and the vsd utility to view statistics graphically, including monitoring a live system by invoking statmonitor.

You can configure the processes for which statistics are recorded, how frequently the statistics are collected, and other details; see the statmonitor documentation for options and examples. Both GemStone-specific and operating system statistics are collected. The operating system statistics include general host information as well as information specific to the individual GemStone processes.

VSD has a rich set of controls for viewing statistics, which are described in the VSD User’s Guide.

Reliable constant monitoring

We recommend running statmonitor at all times, as it provides a valuable record of many aspects of system behavior. If you encounter certain kinds of problems in your application, GemTalk Technical Support will request statmonitor data for the period leading up to the problem, to diagnose possible causes.

You can configure statmonitor to start automatically on Stone startup using the STN_STATMONITOR_ARGS configuration option. Similar options allow you to automatically start statmonitor on nodes that run a remote or mid-level shared page cache.

By using the -R or -r argument in combination with the -K argument to statmonitor, you can ensure statmonitor is continuously running while avoiding issues with statmonitor files consuming excessive space. However, ensure that enough files are retained so that when a problem is detected, the relevant statistics files will still be available.

VSD supports monitoring a live system, continually updating the display with the most recent statistics for your selected metrics.

Programmatic Access to Cache Statistics

A set of methods on the System class provides a way to analyze performance by programmatically examining the statistics that are collected in the shared page cache. This is the same data that is visible using statmonitor and VSD, although statmonitor and VSD can also collect additional OS-level information. This OS-level information is also available programmatically; see Host Statistics.

A process can only access statistics that are kept in the shared page cache to which it is attached. Sessions that are running on a different node than the Stone use a separate shared cache on that remote node. This means that processes on a different node than the Stone cannot access statistics for the Stone or for other server processes that are attached to the Stone's shared page cache.

Within the shared page cache, GemStone statistics are stored as an array of process slots, each of which corresponds to a specific process. Process slot 0 is the shared page cache monitor. On the Stone’s shared page cache, process slot 1 is the Stone; on remote caches, slot 1 is the page server for the Stone that started the cache. Subsequent process slots are the page servers, Admin and Reclaim Gems, Symbol Gem, and user Gems. The order of these slots depends on the order in which the processes are started up, and is different on remote caches.

The specific set of statistics is different for each type of process that can attach to the shared page cache. The types of processes that are programmatically accessible are numbered:

1 = Shared page cache monitor
2 = Stone
4 = Gem Page server
8 = Gem (including Topaz, GBS, and other GCI applications).
128 = Page Manager thread
256 = Stone restore thread
512 = A thread within the Gem
1024 = Stone AIO thread
2048 = Stone free frame thread
4096 = Remote cache page server thread
8192 = Remote gem page server thread

Other numbers include those for shared counters, platform-specific OS system statistics, and so on. The specific process types, process type numbers, and the specific statistics associated with each process type are written in the header portion of the statmonitor data files. All of these may vary between GemStone releases, as statistics are added or removed.

The following examples demonstrate access to statistics for the Stone, Shared Page Cache and the current Gem. There are equivalent methods for other process types.

Statistics by name

To obtain the value of a specific statistic for the Stone, the Stone’s SPC monitor, or the current session, use the following methods:

System class >> stoneCacheStatisticWithName:
System class >> primaryCacheMonitorCacheStatisticWithName:
System class >> myCacheStatisticWithName:

These methods return the value of the statistic with the given name for that process. If the statistic name is not found, they return nil.
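Because a name that is not found yields nil, and statistic names may vary between releases, code that should run across versions can guard the lookup. A minimal sketch:

```smalltalk
topaz 1> printit
| value |
value := System stoneCacheStatisticWithName: 'CommitRecordCount'.
"nil means no statistic with this name exists for the Stone"
value isNil
   ifTrue: ['CommitRecordCount is not available in this release']
   ifFalse: [value]
%
```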

For example, to retrieve the statistic named ‘CommitRecordCount’ for the Stone:

topaz 1> printit
System stoneCacheStatisticWithName: 'CommitRecordCount'.
%
23

To retrieve the current session’s PageReads:

topaz 1> printit
System myCacheStatisticWithName: 'PageReads'.
%
548

All statistics for a process

The general way to retrieve statistics is as an array of values. To understand what the value at each index refers to, there are corresponding description methods to return an array of Strings. Matching the index of the statistic name to the index within the values locates the value for that statistic.

Since the statistics are different for the different types of processes, you will need to use corresponding methods to collect the statistics and the descriptions.

For the Stone, the Gem that is running the code, and the Stone’s shared page cache monitor, no further information is needed to identify them within the cache, so the following pairs of methods can be used:

System cacheStatisticsDescriptionForGem.
System myCacheStatistics.
 
System cacheStatisticsDescriptionForStone.
System stoneCacheStatistics.
 
System cacheStatisticsDescriptionForMonitor.
System sharedPageCacheMonitorCacheStatistics.
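Each pair can also be combined into a lookup table keyed by statistic name, which is convenient when you want several values from the same snapshot. A sketch for the current Gem; using a Dictionary is just one option:

```smalltalk
topaz 1> printit
"Pair each description string with the value at the same index"
| names values stats |
names := System cacheStatisticsDescriptionForGem.
values := System myCacheStatistics.
stats := Dictionary new.
1 to: (names size min: values size) do: [:i |
   stats at: (names at: i) put: (values at: i)].
stats at: 'PageReads'
%
```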

For example, while you would normally use stoneCacheStatisticWithName:, here is another way to get the CommitRecordCount:

topaz 1> printit
| index |
index := System cacheStatisticsDescriptionForStone 
		indexOf: 'CommitRecordCount'.
System stoneCacheStatistics at: index.
%
23

To collect statistics for other Gems, and for page servers, you need to determine the process Id, session Id, or slot of the specific Gem or page server, or the cache name of the Gem. There are a variety of ways you might determine this, but one way is to examine the results of:

System cacheStatisticsForAllSlotsShort

This method returns the name, process Id, session Id, statistics type, and process slot for each process currently attached to the cache. For example:

topaz 1> printit
(System cacheStatisticsForAllSlotsShort) collect: 
	[:ea | ea printString]
%
an Array
  #1 anArray( 'ShrPcMonitor', 7722, 4294967295, 1, 0)
  #2 anArray( 'gs64stone', 7721, 0, 2, 1)
  #3 anArray( 'FreeFrmPgsvr2', 7725, 4294967294, 4, 2)
  #4 anArray( 'AioPgsvr3', 7726, 4294967294, 4, 3)
  #5 anArray( 'pagemgrThread', 7729, 1, 8, 4)
  #6 anArray( 'GcAdmin5', 7734, 2, 8, 5)
  #7 anArray( 'SymbolGem6', 7735, 3, 8, 6)
  #8 anArray( 'GcReclaim6_7', 7733, 4, 8, 7)
  #9 anArray( 'Gem26', 2271, 5, 8, 8)
  #10 anArray( 'Gem27', 16924, 6, 8, 9)

Of course, a Gem may log out between the time you execute this and the time you collect statistics, so be sure that your code handles that condition gracefully.
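A sketch that collects PageReads for each process of type 8, keyed by cache name, and skips any process that has exited between the two calls. It assumes, as the aggregation example later in this section does, that cacheStatisticsProcessId: returns nil for a process that is no longer attached.

```smalltalk
topaz 1> printit
| index result |
index := System cacheStatisticsDescriptionForGem indexOf: 'PageReads'.
result := Dictionary new.
System cacheStatisticsForAllSlotsShort do: [:entry |
   "entry is (name, pid, sessionId, type, slot)"
   (entry at: 4) = 8 ifTrue: [ | stats |
      stats := System cacheStatisticsProcessId: (entry at: 2).
      "nil: the process logged out after the slot list was read"
      stats isNil ifFalse: [result at: (entry at: 1) put: (stats at: index)]]].
result
%
```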

The methods you use to get the statistics and the corresponding descriptions will depend on how you have determined the specific process you want information about.

By name:

System cacheStatisticsForProcessWithCacheName: aString
(You must manually determine the process type)

or

System cacheStatsForGemWithName: aString.
System cacheStatisticsDescriptionForGem.

By operating system Process Id (PID):

System cacheStatisticsProcessId: aPid.
System cacheStatisticsDescriptionAt: 
	(System cacheSlotForProcessId: aPid).

By process slot:

System class >> cacheStatisticsAt: aProcessSlot
System class >> cacheStatisticsDescriptionAt: aProcessSlot

By session Id:

The page server for a Gem assumes the same sessionId as its Gem.

System gemCacheStatisticsForSessionId: aSessionId.
System cacheStatisticsDescriptionForGem.

or

System cacheStatsForPageServerWithSessionId: aSessionId.
System cacheStatisticsDescriptionForPageServer.
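For example, since System session returns the session Id of the current session, the following sketch looks up a value through the session Id interface; substitute the Id of another session to inspect that session instead.

```smalltalk
topaz 1> printit
| index stats |
index := System cacheStatisticsDescriptionForGem indexOf: 'PageReads'.
stats := System gemCacheStatisticsForSessionId: System session.
"assumed nil if no session with this Id is attached to the cache"
stats isNil ifTrue: [nil] ifFalse: [stats at: index]
%
```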

For example, to find an aggregate value for TimeInFramesFromFindFree of all Gems in the system:

topaz 1> printit
| gemPids index time |
gemPids := Array new. 
System cacheStatisticsForAllSlotsShort do: 
	[:anArray | 
   (anArray at: 4) = 8 ifTrue: 
		[gemPids add: (anArray at: 2)].
   ].
index := System cacheStatisticsDescriptionForGem indexOf:  
		'TimeInFramesFromFindFree'.
time := 0.
gemPids do: [:aPid | | stats |
   stats := System cacheStatisticsProcessId: aPid.
   stats ifNotNil: [time := time + (stats at: index)].
   ].
time
%

Setting the name for the Gem in the cache

To make it easier for you to track cache statistics for specific Gems, you can explicitly give each Gem a unique name. The method

System cacheName: aString 

sets the name for the current Gem session in the cache statistics, thus making it much easier to read the statistics in VSD.

When using topaz, the -u command line argument will set the cache name for logins from topaz.

Otherwise, set the cache name soon after login. If you are collecting statistics using statmonitor, information may be logged under the default name for the Gem when the session first logs in, in which case you may have two separate lines of data for the same session.
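A sketch of naming a session and then locating its statistics by that name. 'NightlyReport' is a hypothetical name; each Gem should use a name that is unique in your system.

```smalltalk
topaz 1> printit
System cacheName: 'NightlyReport'.
"Expected to find this session's statistics under the new name"
(System cacheStatsForGemWithName: 'NightlyReport') notNil
%
```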

Setting GemKind

The method

System setGemKind: anInteger

is another way to uniquely identify Gems in the cache. This sets the cache statistic GemKind to the given number. GemKind is 0 by default; you can set it to any positive integer. GemStone uses negative values for system Gems such as the ReclaimGem and AdminGem.
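For example, an application might tag its batch-processing sessions with a GemKind of 7 (an arbitrary value chosen for illustration) and read the value back through the statistics interface:

```smalltalk
topaz 1> printit
System setGemKind: 7.
System myCacheStatisticWithName: 'GemKind'
%
```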

Session Statistics

In addition to the system-generated statistics, GemStone provides a facility for defining session statistics: user-defined statistics that can be written and read by each session, to monitor and profile internal operations specific to your application.

There are 48 session cache statistic slots available, with names of the form SessionStat00...SessionStat47.

You can use the following methods to read and write the session cache statistics:

System class >> sessionCacheStatAt: anIndex
Returns the value of the statistic at the designated index. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.

System class >> sessionCacheStatAt: anIndex put: aValue
Assigns a value to the statistic at the designated index and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.

System class >> sessionCacheStatAt: anIndex incrementBy: anInt
Increments the statistic at the designated index by anInt, and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.

System class >> sessionCacheStatAt: anIndex decrementBy: anInt
Decrements the statistic at the designated index by anInt, and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.

System class >> sessionCacheStatsForProcessSlot: aProcessSlot
Return an array containing the 48 session statistics for the given process slot, or nil if the process slot is not found or is not in use.

System class >> sessionCacheStatsForSessionId: aSessionId
Return an array containing the 48 session statistics for the given session id, or nil if the session is not found or is not in use.
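As a sketch, an application could use SessionStat00 as a counter of units of work completed by the session; the meaning assigned to a slot is entirely up to the application.

```smalltalk
topaz 1> printit
"Reset the counter, then count three hypothetical units of work"
System sessionCacheStatAt: 0 put: 0.
1 to: 3 do: [:i | System sessionCacheStatAt: 0 incrementBy: 1].
System sessionCacheStatAt: 0
%
```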

Global Session Statistics

In addition to the Gem session statistics, GemStone/S 64 Bit provides global session statistics — user-defined statistics that can be written and read by any Gem on any Gem server. Unlike session cache statistics, which are stored in the shared page cache of the machine that the Gem is running on, global session statistics are stored in the shared page cache of the Stone. Global session statistics are not transactional. For a given statistic, every session sees the same value, regardless of its transactional snapshot view.

There are 48 global cache statistic slots available, with names of the form GlobalStat00...GlobalStat47.

You can use the following methods to read and write the global cache statistics:

System class >> globalSessionStatAt: aProcessSlot
Returns the value of the statistic at the designated slot (must be in the range 0..47).

System class >> globalSessionStatAt: aProcessSlot put: aValue
Assigns a value to the statistic at the designated slot (must be in the range 0..47) and returns the new value. The value must be a SmallInteger in the range of -2147483648 to 2147483647.

System class >> incrementGlobalSessionStatAt: aProcessSlot by: anInt
Increments the value of the statistic at the designated slot by anInt and returns the new value of the statistic. The value anInt must be a SmallInteger in the range of -2147483648 to 2147483647.
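A sketch that uses GlobalStat05 as a cluster-wide counter. Slot 5 is an arbitrary choice; all cooperating applications would need to agree on which slots they use, since any Gem can change them.

```smalltalk
topaz 1> printit
System incrementGlobalSessionStatAt: 5 by: 1.
System globalSessionStatAt: 5
%
```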

Host Statistics

Host Statistics for processes

Collecting process-level statistics requires an OS call, which has a performance cost, so these statistics are not part of the information returned by the regular cache statistics interface methods. To get this information, use the following methods.

System class >> hostProcessStatisticsNames
Returns an array of Strings which are the names of the per-process statistics provided by this host.

System class >> hostStatisticsForMyProcess
Returns an array of SmallIntegers which represent the host statistics for this process. The names of each statistic are returned by the #hostProcessStatisticsNames method.

System class >> hostStatisticsForProcess: processId
Returns an array of SmallIntegers which represent the host statistics for the process with the given process ID. The names of each statistic are returned by the #hostProcessStatisticsNames method.
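A sketch that pairs the names with the values for the current process, using the same index-matching approach as for cache statistics:

```smalltalk
topaz 1> printit
| names values report |
names := System hostProcessStatisticsNames.
values := System hostStatisticsForMyProcess.
report := Dictionary new.
"Match each name to the value at the same index"
1 to: (names size min: values size) do: [:i |
   report at: (names at: i) put: (values at: i)].
report
%
```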

Specific methods are also available to return the host CPU statistics only:

System class >> hostCpuStatsForProcessId: anInt
Return an Array of two integers as follows:

1 - user mode CPU milliseconds
2 - system mode CPU milliseconds

Both array elements will be -1 if the process slot is out of range or not in use or if this method is not supported for the host architecture.

It is not required that the process with pid anInt is attached to the shared page cache or even is a GemStone process. The method will succeed for any process for which the Gem session executing the method has permission to view the target process’ CPU usage statistics.

System class >> hostCpuStatsForProcessSlot: anInt
For the process using the cache process slot anInt, return an Array of two integers as follows:

1 - user mode CPU milliseconds used
2 - system mode CPU milliseconds used

Both array elements are set to -1 if the process slot is out of range or not in use, or if this method is not supported for the host architecture.
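For example, since process slot 1 on the Stone’s shared page cache is the Stone, the following sketch reports the Stone’s total CPU time, assuming the session is attached to the Stone’s shared page cache:

```smalltalk
topaz 1> printit
| cpu |
cpu := System hostCpuStatsForProcessSlot: 1.
(cpu at: 1) = -1
   ifTrue: ['CPU statistics not available']
   ifFalse: [(cpu at: 1) + (cpu at: 2)]   "user + system, in milliseconds"
%
```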

Host Statistics for OS

While most monitoring is of the object server and session processes, it is also useful to monitor the performance of the operating system that is running GemStone. On host platforms that support it, the following methods return statistics provided by the operating system. This is the same information that is available via statmonitor; see statmonitor.

System class >> fetchSystemStatNames
Return an array of Strings with the names of the available OS level statistics. The length is host-dependent. If the host system does not support system statistics, this method returns nil.

System class >> fetchSystemStats
Return an array of Numbers corresponding to the names returned by the #fetchSystemStatNames method. The length of the result array is host dependent. While most elements in the result array will be SmallIntegers, the result may also contain other types of Numbers such as SmallDoubles, Floats, LargeIntegers, etc. If the host system does not support system statistics, this method returns nil.
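Since the names and their number are host-dependent, a sketch that prints every available statistic, rather than looking up a fixed name, may be the most portable starting point:

```smalltalk
topaz 1> printit
| names values out |
names := System fetchSystemStatNames.
values := System fetchSystemStats.
(names isNil or: [values isNil])
   ifTrue: ['System statistics are not supported on this host']
   ifFalse: [
      "One line per statistic: name, colon, value"
      out := WriteStream on: String new.
      1 to: (names size min: values size) do: [:i |
         out nextPutAll: (names at: i); nextPutAll: ': ';
            nextPutAll: (values at: i) printString; nextPut: Character lf].
      out contents]
%
```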

You can also monitor specific CPU usage for the host using the following method:

System class >> hostCpuUsage
Returns an Array of 5 SmallIntegers with values between 0 and 100 which have the following meanings:

1 - Percent CPU active (user + system)
2 - Percent CPU idle
3 - Percent CPU user
4 - Percent CPU system (kernel)
5 - Percent CPU I/O wait

On hosts with multiple CPUs, these figures represent the average across all processors. The results of the first call to this method are invalid and should be discarded. Returns nil if the host system does not support collecting CPU statistics.
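A sketch that discards the first result, waits briefly (presumably so that there is an interval over which usage can be measured), and then reports the I/O wait percentage. The two-second delay is an arbitrary choice.

```smalltalk
topaz 1> printit
| usage |
System hostCpuUsage.              "first result is invalid; discard it"
(Delay forSeconds: 2) wait.
usage := System hostCpuUsage.
usage isNil ifTrue: [nil] ifFalse: [usage at: 5]   "percent CPU I/O wait"
%
```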

Memory Statistics with NetLDI in authenticated mode on Linux

On Linux, some memory statistics are read from /proc/pid/smaps, which has restricted access. statmonitor and Gem methods that collect per-process statistics cannot collect smaps statistics for processes owned by other users. When the NetLDI runs in guest mode with a captive account, there is no problem, since each Gem process is owned by that account.

However, when running with the NetLDI owned by root with the s bit set, Gem processes are owned by the account logging in. The specific memory statistics normally based on smaps will be reported as zero.

If these statistics are required, statmonitor requires additional configuration to be able to read these memory statistics.

  • You may give statmonitor the cap_sys_ptrace capability:
os$ setcap cap_sys_ptrace=pe $GEMSTONE/bin/statmonitor
  • Alternatively, statmonitor can be run as root with the s bit set:
os$ cd $GEMSTONE/sys
os$ chown root $GEMSTONE/bin/statmonitor 
os$ chmod u+s $GEMSTONE/bin/statmonitor

Programmatic access

Additional configuration is needed, in a system with the NetLDI running as root with the s bit set, for one Gem to access the memory statistics of another Gem using methods such as System hostStatisticsForProcess:.

For security reasons, it is not recommended to give all Gem processes the cap_sys_ptrace capability; and setting this capability prevents gdb attaching, so pstack and kill -USR cannot write C stack traces. If a Gem must programmatically monitor the memory statistics for other Gem processes, you should configure the monitor Gem to run with a different gem executable than standard Gems in your environment, and give this gem executable the required capability.

1. Make a copy of $GEMSTONE/sys/gem for an RPC Gem, or $GEMSTONE/bin/topaz if this process will run as linked topaz. For example,

os$ cp $GEMSTONE/sys/gem $GEMSTONE/sys/gemTrace
os$ cp $GEMSTONE/bin/topaz $GEMSTONE/bin/topazTrace

2. Give this copy the cap_sys_ptrace capability.

os$ setcap cap_sys_ptrace=pe $GEMSTONE/bin/topazTrace
os$ setcap cap_sys_ptrace=pe $GEMSTONE/sys/gemTrace

3. For an RPC gem, create a custom gemnetobject, and use this for login.

os$ cp $GEMSTONE/sys/gemnetobject $GEMSTONE/sys/gemnetobject_trace

Edit your custom gemnetobject, for example gemnetobject_trace, to set the gemname that invokes this executable, by modifying this line:

gemname="gemTrace" 

Login using your custom gemnetobject in the login parameters:

topaz> set gemnetid gemnetobject_trace

4. If you are running NetLDI with the startnetldi -n option, you must also add an entry to the services.dat file. See the System Administration Guide, Appendix A, for more details.

Monitoring with Prometheus

On Linux, the statprom utility allows you to use Prometheus monitoring software to monitor GemStone cache statistics.

Prometheus is an open-source systems monitoring and alerting toolkit that collects and stores metrics (numeric data) as time series, along with key-value tags. Prometheus is widely used, and the Prometheus GitHub project has an active developer and user community.

In addition to Prometheus, Grafana can be installed and used for live monitoring of Prometheus data; Grafana provides out of the box support for Prometheus, and no additional configuration is needed to collect the GemStone data from Prometheus.

When statprom is started, it takes as an argument a JSON-format configuration file that is customized for the cache and monitoring requirements; this includes the port number, the Stone name, and the specific statistics to report.

For details on configuring statprom and using it with Prometheus, see the supplemental documentation:

https://docs.gemtalksystems.com/current/monitorwithprometheus

 
