A properly configured GemStone repository will run normally with little attention. It is still important to monitor the repository, to catch unexpected problems before they become serious. If you have unexpected problems, you will need to examine logs, monitor your system, and perform other analysis. The relevant logs and tools are described in this chapter.
GemStone Process Logs: details what logs are created by GemStone/S 64 Bit processes, where they are located, and what configuration is possible.
Repository Page and Object Audit: provides instructions on how to perform a page audit and object audit of the repository.
Profiling Repository Contents: describes how to analyze the repository contents.
Monitoring Performance: describes how to monitor the performance of the GemStone server and its clients using GemStone Smalltalk methods.
Use caution if keeping a GemStone session running for monitoring purposes. A GemStone session that is in transaction and does not abort can cause an excessive commit record backlog and undesirable repository growth. See Disk Space and Commit Record Backlogs.
All GemStone processes create log files, which include startup configuration information, details on certain operations, and details of any errors that were encountered.
For some kinds of processes, these log files are only interesting if an error occurs, and these log files are deleted when the process exits (provided no error occurs). For other kinds of processes, information reported in the log files may provide diagnostic information for problems that occur later to other parts of the system. The log files for these kinds of processes are not deleted when the process exits.
The log file names and directory locations, and the log file deletion policies, are all configurable if you prefer to set up a customized way to manage your log files. What is important is to know where your log files are, to monitor these logs for error conditions, and to know how to find the relevant logs if a problem occurs.
GemStone log contents and names are UTF-8 encoded.
By default, GemStone writes log files to a number of specific locations:
The logs for a running Stone or NetLDI and some other services can be found using the gslist utility. gslist -x displays the location of the current log file. For example,
os$ gslist -x gs64stone
gs64stone
status= exists
type= Stone
version= 3.7.0
owner= gsadmin
started= Jul 08 10:51
pid= 1705239
port= 45119
options=
logfile= /gshost/GemStone3.7/data/gs64stone.log
sysconf= /gshost/GemStone3.7/data/system.conf
GEMSTONE=/gshost/GemStone3.7
exe=/gshost/GemStone3.7/sys/stoned
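Because the current log path appears on a line with a fixed `logfile=` prefix, a monitoring script can extract it from this output. The following is a minimal sketch. It assumes the `key= value` layout shown above, and the function name `gs_logfile_from_gslist` is hypothetical.

```shell
# Sketch: print the value of the "logfile=" line(s) from `gslist -x` output.
# Assumes the "key= value" layout shown above; check it against your release.
gs_logfile_from_gslist() {
  # Read gslist -x output on stdin, strip the "logfile=" prefix,
  # and print only the lines that matched.
  sed -n 's/^[[:space:]]*logfile=[[:space:]]*//p'
}

# Usage on a live system (GemStone utilities must be on the PATH):
#   gslist -x gs64stone | gs_logfile_from_gslist
```

If `gslist -x` reports several processes, the function prints one path per process.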
Some process log files are deleted automatically when the process exits cleanly, to avoid an excessive number of unimportant log files. Details on specific processes describe the applicable log file deletion policy. In most cases, these policies can be overridden per process. You can force or disable deletion using the environment variables GS_FORCE_CLEAN_LOG_FILE_DELETE and GS_KEEP_ALL_LOGS, respectively. Log files for processes that exited with an error are never deleted, and the NetLDI and Stone logs are never automatically deleted.
The log for the Stone repository monitor is always appended to, and is therefore cumulative across runs by default. This log is the first one you should check when a GemStone system problem is suspected. In addition to possible warnings and error messages, the log records the following useful information:
The Stone log by default is stonename.log, where stonename is the name of the running Stone repository monitor. If a specific name was not specified for startstone, the stonename defaults to gs64stone.
The Stone log file name and location are determined by the following, in order of precedence:
1. A path and filename supplied by startstone -l logFile. logFile may be a filename, or a relative or absolute path and filename, to which the account starting the Stone has write permission. If logFile is a filename only, or a relative path, it is created in, or relative to, the current directory.
2. A path and filename specified by the GEMSTONE_LOG environment variable. As with startstone -l’s argument, this may be set to a filename or to a relative or absolute path and filename.
The Stone log is never deleted; each restart appends to an existing log file of the same name, if one exists.
It is strongly recommended to retain this file over restarts and upgrades; the information in this log file may be useful for problem diagnosis for a significant time. If this file becomes too large, or the log file name or location is changed, we recommend archiving the older Stone logs.
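Archiving the Stone log can be scripted. The following is a minimal sketch with two assumptions. First, it runs only while the Stone is shut down, so that no process still has the old file open. Second, the Stone creates a fresh log at its usual path on the next startstone. `archive_stone_log` is a hypothetical helper, not a GemStone utility.

```shell
# Sketch: rename the cumulative Stone log with a timestamp suffix and compress it.
# Assumption: the Stone is NOT running, so nothing is still writing to the file.
archive_stone_log() {
  local log=$1 stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }
  # e.g. gs64stone.log -> gs64stone.log.20240101-120000.gz
  mv -- "$log" "$log.$stamp" && gzip -- "$log.$stamp"
}

# Usage (path taken from gslist -x output before the Stone was stopped):
#   archive_stone_log /gshost/GemStone3.7/data/gs64stone.log
```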
The shared page cache monitor log includes, among other things:
The log for the shared page cache monitor on the Stone’s machine is located in the same directory as the Stone’s log. This log file has a name of the form
stoneName_PIDpcmon.log
Check this log if other messages refer to a shared page cache failure.
When a session logs in from another node, a log is created for the shared page cache monitor on the remote node. This log is located by default in the home directory of the account that started the Stone, but this location can be modified by environment variable settings. The default name is of the form
stoneName_PIDpcmon_Node.log
where PID is the process Id of the monitor process, and Node is the name of the remote node.
This log shows the startup value of the Admin Gem parameters that are stored in GcUser’s UserGlobals, and any changes to them, and records other Admin Gem functions.
Each time the Stone repository monitor starts an administrative garbage collection session (Admin Gem) process, a new log is created. By default, this log is in the same location as the Stone’s log. The location of this log file can be set specifically using the environment variable $GEMSTONE_ADMIN_GC_LOG_DIR.
The log name is formed using the pattern:
where stoneName is the name of the Stone, and PID is the process Id of the Admin Gem process.
By default, the AdminGem log is not deleted on clean exit.
The Admin Gem is started using the script $GEMSTONE/sys/runadmingem. You may create a customized version of this script, commenting out the line that sets $GEMSTONE_KEEP_LOG, to allow this log to be automatically deleted.
This log shows the startup value of the Reclaim Gem parameters that are stored in GcUser’s UserGlobals, and any changes to them, and records other Reclaim Gem functions.
Each time the Stone repository monitor starts a reclaim garbage collection session (Reclaim Gem) process, a new log is created. By default, this log is in the same location as the Stone’s log. The location of this log file can be set specifically using the environment variable $GEMSTONE_RECLAIM_GC_LOG_DIR.
The log name is formed using the pattern:
where stoneName is the name of the stone, and PID is the process Id of the Reclaim Gem process.
By default, the Reclaim Gem log is not deleted on clean exit.
The Reclaim Gem is started using the script $GEMSTONE/sys/runreclaimgem. You may create a customized version of this script, commenting out the line that sets $GEMSTONE_KEEP_LOG, to allow this log to be automatically deleted.
This log is not usually of interest, unless errors occur or tuning is required.
The Page Manager is a thread in the Stone, and is not a separate process, but it writes to a separate log for ease of maintenance. The Page Manager log is located in the same directory as the log for the Stone. This log file has a name of the form:
stoneName_PIDpagemanager.log
where stoneName is the name of the stone, and PID is the process Id of the Stone process.
This log is not deleted by default on clean exit. Since it is a thread in the stone, it is not started by a specific script, and will only be deleted on clean exit when $GS_FORCE_CLEAN_LOG_FILE_DELETE is set.
This log is not usually of interest, unless errors occur or tuning is required.
The Symbol Gem log is located in the same directory as the Stone’s log, by default. The location of this log file can be set specifically using the environment variable GEMSTONE_SYMBOL_GEM_LOG_DIR.
The Symbol Gem log file has a name of the form:
stoneName_PIDsymbolgem.log
where stoneName is the name of the stone, and PID is the process Id of the Symbol Gem process.
This log is deleted by default if the Symbol Gem exits cleanly, no (nonfatal) errors were reported during the lifetime of the Symbol Gem, and no Symbol GC operations were performed while the Symbol Gem was running. To retain this log, you may create a customized version of the script that starts the Symbol Gem, uncommenting the line that sets $GEMSTONE_KEEP_LOG.
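Several of these logs share the stoneName_PID prefix. A small helper can therefore build the expected file name for a given Stone and process Id, which is handy when searching a log directory. The following sketch encodes only the patterns shown in this section. `gs_expected_log` and its kind keywords are hypothetical.

```shell
# Sketch: build expected log file names from the patterns given in this section.
# Kinds: pcmon, pcmon-remote (needs a node name), pagemanager, symbolgem.
gs_expected_log() {
  local stone=$1 pid=$2 kind=$3 node=$4
  case $kind in
    pcmon)        echo "${stone}_${pid}pcmon.log" ;;
    pcmon-remote) echo "${stone}_${pid}pcmon_${node}.log" ;;
    pagemanager)  echo "${stone}_${pid}pagemanager.log" ;;   # PID of the Stone
    symbolgem)    echo "${stone}_${pid}symbolgem.log" ;;     # PID of the Symbol Gem
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}

# Passing '*' as the PID yields a glob that matches logs from any run, e.g.:
#   ls -- "$stonelogdir"/$(gs_expected_log gs64stone '*' pagemanager)
```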
By default, the NetLDI log contains only configuration information and error messages. The configuration information reflects the environment at the time the NetLDI was started and the effect of any authentication switches specified as part of the startnetldi command.
In some cases it is helpful to log additional information by starting the NetLDI in debug mode (startnetldi -d). In this mode, the NetLDI writes a record of each communication to or from all clients to its log. Because the log for NetLDI running in debug mode is much larger, you probably won’t want to use this mode routinely.
The NetLDI writes a log file (netLdiName.log) in /opt/gemstone/log (or an equivalent, as described here) on the node on which it runs.
The startnetldi script allows you to specify a log file name and location using the -l option, and optionally the name netLdiName. If no log file name is specified using the -l argument, the default is /opt/gemstone/log/netLdiName.log.
The log file written by the Gem includes the Gem’s startup configuration details, configuration parameters settings, and login information, as well as the messages generated if an error occurs. This information is important when diagnosing client-related problems.
Normally a Gem log is created on login and continues to be used until the process logs out or otherwise terminates. A Gem log can be closed, and a new one opened, using System class >> startNewGemLog: aFileName. This creates (or reopens) a log file named aFileName in the same directory as the previous Gem log.
When the RPC or linked Gem is not running on the same node as the Stone, the login to GemStone also requires other supporting processes to be spawned. Each of these processes has its own log file.
Linked logins, in which the Gem is part of the client process, do not write a separate log file to disk. The log output is sent to stdout of the linked process; for example, the linked topaz console. Topaz commands such as output push allow this information to be written to disk. See the Topaz User’s Guide for more information.
An RPC login spawns a separate Gem session process. When this process is on the same node as the Stone, the RPC Gem can connect directly to the server processes, and does not require further supporting processes to be spawned.
By default, the log file for an RPC Gem is located in the home directory of the account that owns the Gem process, which depends in turn on the NetLDI configuration.
You can change the default location for Gem log files by setting #dir or #log in the GEMSTONE_NRS_ALL environment variable for the NetLDI itself or for individual clients; see Controlling log file directory locations. Alternatively, when you log in to GemStone, you can specify a different network resource string (NRS) in your login parameters.
The log file for an RPC Gem is deleted by default on a clean shutdown. If the Gem terminates with an error, the log file is not deleted. Since a log is created for each RPC login, you should periodically examine log files for errors and delete older logs, especially if you configure your system to keep Gem logs.
To configure RPC Gem log files to be kept on clean shutdown, set gemnetid in topaz before logging in:
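Periodic review of retained Gem logs can start with a simple search. The following is a minimal sketch that lists the `*.log` files in one directory that contain a given pattern. The default pattern `error` is an assumption; adjust it to the messages you actually see in your Gem logs. `find_logs_with_errors` is a hypothetical helper.

```shell
# Sketch: list *.log files in a directory whose contents match a pattern
# (case-insensitive). The default pattern "error" is only an assumption.
find_logs_with_errors() {
  local dir=$1 pattern=${2:-error}
  # -l prints only file names; errors about an empty glob are suppressed.
  grep -l -i -e "$pattern" "$dir"/*.log 2>/dev/null
}

# Usage:
#   find_logs_with_errors "$HOME"
```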
set gemnetid 'gemnetobject -C GEM_ENV=GEMSTONE_KEEP_LOG=1'
gemnetobject and gemnetobject_keeplog are described starting here.
For RPC logins where the Gem is not on the same node as the Stone, or for linked logins that are not on the same node as the Stone, the following additional processes are also spawned:
The default location for the log files of these processes is based on any settings for #dir or #log that are specified for the Gem; otherwise, it is the home directory of the account that owns the corresponding process. For the page server on the server node, that account ordinarily is the application user. For the shared page cache monitor and page server on the client node, that account is the one that invoked startstone.
The following table shows typical log names for processes related to remote logins, given a Stone named gs64stone on node1 with a login from a Gem session process on node2.
The startnetldi -D option provides a default log file path for forked processes. NRS includes the #dir and #log directives, which allow you to specify the log file name and directory location, including pattern substitution, either in Gem login parameters or using GEMSTONE_NRS_ALL.
The options are described under Controlling log file directory locations.
The logsender and logreceiver processes are started only if you are setting up a hot standby system.
Each logsender and logreceiver creates a log file in /opt/gemstone/log on the node on which it runs.
The log file’s name, by default, is logsender_listeningPort.log or logreceiver_listeningPort.log.
This location and name can be overridden by including the option -l logName when starting the logsender or logreceiver.
Other GemStone processes also create log files, which are only of interest if an error occurs; these logs are deleted by default, and you may only ever see them if you use the $GS_KEEP_ALL_LOGS environment variable. Others are specific to particular utilities, and details are described in separate parts of this manual.
This table provides a summary of the various GemStone process log behaviors.
Since some log files are not deleted by default, and the occasional minor error will leave behind log files that would normally be deleted automatically on process exit, the number of log files will accumulate over time. The Stone and NetLDI log files are reopened each time the process is restarted and are cumulative, so these logs will grow indefinitely. Some maintenance of GemStone log files is therefore required. Your application’s requirements for diagnostics after an incident, as well as your application design, will dictate which log files you need to retain and for how long.
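Part of this maintenance can be automated. The following is a minimal sketch that lists or deletes `*.log` files older than a given number of days in a single directory. It skips any file names you pass as extra arguments, such as the cumulative Stone and NetLDI logs. `prune_old_logs` is a hypothetical helper. Run it with `--list` first and review the output, and avoid deleting logs that belong to processes that are still running.

```shell
# Sketch: list or delete *.log files older than DAYS in DIR, skipping KEEP_NAMEs.
# Usage: prune_old_logs DIR DAYS --list|--delete [KEEP_NAME...]
prune_old_logs() {
  local dir=$1 days=$2 mode=$3 f name keep k
  shift 3
  find "$dir" -maxdepth 1 -type f -name '*.log' -mtime +"$days" |
  while read -r f; do
    name=$(basename "$f")
    keep=no
    for k in "$@"; do
      [ "$name" = "$k" ] && keep=yes   # e.g. gs64stone.log, the NetLDI log
    done
    [ "$keep" = yes ] && continue
    if [ "$mode" = --delete ]; then
      rm -f -- "$f"
    else
      echo "$f"                        # dry run: just report the candidate
    fi
  done
}
```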
The environment variables GS_KEEP_ALL_LOGS and GS_FORCE_CLEAN_LOG_FILE_DELETE override the individual defaults and configuration for the processes, to (respectively) force all log files to be retained, or all log files (except Stone and NetLDI) to be deleted on clean exit.
Many processes may have their log deletion behavior configured by setting the GEMSTONE_KEEP_LOG environment variable in the service script that starts that process.
Refer to Table 7.1 for specific service script names. Scripts that begin with “run”, and gemnetobject and its variants, are found in the $GEMSTONE/sys directory. To configure the delete behavior:
1. make a copy of the specific script, providing your own name
2. edit the copy to set or unset GEMSTONE_KEEP_LOG.
3. edit $GEMSTONE/sys/services.dat to point the service name to your customized script. For example, if you have created a customized AdminGem script in $GEMSTONE/scripts/myadmingcgemscript, edit services.dat so the lines look something like this:
# runadmingcgem $GEMSTONE/sys/runadmingcgem
runadmingcgem $GEMSTONE/scripts/myadmingcgemscript
Note that customizations to scripts and services.dat will be lost on upgrade. After upgrading, repeat this process starting from the new versions of the scripts, to avoid the risk of missing any changes in service or script names and contents.
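Steps 1 and 2 can be done with a copy and a `sed` edit. The following is a minimal sketch that comments out every uncommented line mentioning GEMSTONE_KEEP_LOG, so that the corresponding log is deleted on clean exit. `make_nokeeplog_copy` and the destination path are hypothetical. Review the resulting script by hand, and edit services.dat as described in step 3.

```shell
# Sketch: copy a service script and comment out the lines that set or export
# GEMSTONE_KEEP_LOG. Review the result; scripts vary between releases.
make_nokeeplog_copy() {
  local src=$1 dst=$2
  cp -- "$src" "$dst" || return 1
  # Prefix "# " to any line where GEMSTONE_KEEP_LOG appears before any "#".
  sed -i.bak -e '/^[^#]*GEMSTONE_KEEP_LOG/s/^/# /' "$dst" && rm -f -- "$dst.bak"
  chmod +x "$dst"
}

# Usage (destination is a hypothetical location):
#   make_nokeeplog_copy "$GEMSTONE/sys/runadmingem" "$GEMSTONE/scripts/myadmingcgemscript"
```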
The timestamps printed in the log headers and in log messages are formatted according to the current system locale. You can override this using the GS_CFTIME environment variable. If this is set in the environment for the process, then the setting is used to control printing in log headers and log messages.
The setting for GS_CFTIME must be a valid strftime format string, and must contain fields for:
If these criteria are not met, the default date format for the system’s locale is used or, failing that, a US-centric date format.
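Because GS_CFTIME is a strftime format string, you can preview a candidate value with date(1) before exporting it into the environment of the process whose logs should use it. The format below is only an example. Check it against the required fields listed above, because this sketch does not verify them.

```shell
# Sketch: preview a candidate GS_CFTIME value, then export it so processes
# started from this environment inherit it. The format is an example only.
GS_CFTIME='%Y-%m-%d %H:%M:%S'
date +"$GS_CFTIME"     # shows the current time in the candidate format
export GS_CFTIME
```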
It may be useful for your application to deliberately write messages to the Stone or Gem logs. For example, if you are performing some automated batch processing, it may be useful to know when this started and completed in relation to other system maintenance tasks such as garbage collection.
You can write a message to the stone log using:
System addAllToStoneLog: aString
To write to the Gem log or console, you may use the following:
GsFile gciLogServer: aString
GsFile gciLogClient: aString
Logging to the server writes to the Gem log for an RPC session, or to the topaz console or stdout for a linked session. Logging to the client writes to the topaz console or stdout for both linked and RPC clients.
The correct place to log messages depends on your session configuration and the nature of your client application; in general, it is safer to log to the server. However, the RPC Gem log is deleted by default if the session logs out cleanly, so any messages in it will not be retained. See Log file deletion policy for RPC Gems for how to configure your login to preserve log files on clean exit. On the other hand, GUI applications may not provide access to stdout, making messages to the client inaccessible.
This section describes two levels of checks that you can perform on the repository.
Page audits allow you to diagnose problems in the system repository by checking for consistency at the page level. Page audit can be run only on repository extents that are not in use; shut down your Stone, or make an extent copy backup.
Page audit scans the root pages in a repository, the pages used in the bitmap structures referenced by the rootpage, and all other pages (including data pages) to confirm page-level consistency. While data pages are audited, it does not check that the data on data pages is valid. For that, you need to separately run an object audit; see Object Audit and Repair.
To run page audit, use the pageaudit utility. This utility starts up an audit gem and a Stone repository monitor in audit mode, to perform the audit.
The options to pageaudit are all optional, and include:
-e exeConfig is the executable configuration file.
-z systemConfig is the system configuration file.
-l logfile is the location and name of the output file. If not specified, then the log is written to a file named gemStoneName-pageAudit.log in the standard Stone log file location.
-d specifies to skip audit of data pages.
-f specifies to keep running after an audit error is found, if possible.
-n specifies the number of threads to use; by default, the number of extents plus the number of CPUs. Using a smaller value will cause pageaudit to take more time to complete, but reduces the impact on other processes.
gemStoneName is the name under which the pageaudit repository will run; if not specified, pageaudit uses gs64stone-pageAudit.
The full set of options is described under pageaudit.
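When pageaudit runs from a maintenance script, it helps to build the command line in one place and print it before running it. The following is a minimal sketch using only the -l and -n options listed above. It assumes the optional gemStoneName is given after the options. `pageaudit_cmd` is a hypothetical wrapper that only prints the command unless RUN=1 is set. pageaudit itself must run against extents that are not in use.

```shell
# Sketch: assemble a pageaudit command line; print it unless RUN=1.
# Usage: pageaudit_cmd LOGFILE [THREADS] [GEMSTONE_NAME]
pageaudit_cmd() {
  local log=$1 threads=$2 name=$3
  set -- pageaudit -l "$log"
  [ -n "$threads" ] && set -- "$@" -n "$threads"   # fewer threads = less impact
  [ -n "$name" ] && set -- "$@" "$name"
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Usage:
#   pageaudit_cmd /gemstone/logs/gs64stone-pageAudit.log 4
#   RUN=1 pageaudit_cmd /gemstone/logs/gs64stone-pageAudit.log 4
```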
When pageaudit completes, it writes a message to stdout:
Page Audit of Repository completed successfully - no issues found.
For details, see /gemstone/logs/gs64stone-pageAudit.log
The details in the log file include Stone startup and configuration information, and audit steps performed. In addition, it produces statistics on the pages in the repository. For example:
PAGE AUDIT STATISTICS
RepositorySize 112.00 Mbytes 7168 pages
Data Pages 41.98 Mbytes 2687 pages
Object Table Pages 1.20 Mbytes 77 pages
Dependency Map Pages 0.02 Mbytes 1 pages
Meta Information Pages 0.33 Mbytes 21 pages
Commit Record Shadow Pages 0.03 Mbytes 2 pages
Checkpoint Shadow Pages 0.00 Mbytes 0 pages
Free Space in Repository 68.38 Mbytes 4376 pages
OT Internal Pages 0.05 Mbytes 3 pages
OT Leaf Pages 1.16 Mbytes 74 pages
Empty OT leaf pages 0.00 Mbytes 0 pages
Empty data pages 0.00 Mbytes 0 pages
Data pages with 25%+ free 6.52 Mbytes 417 pages
Data pages with 50%+ free 4.56 Mbytes 292 pages
Free space in data pages 4.77 Mbytes 305.53 pages
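These statistics can be summarized by script. In the example above, 112.00 Mbytes over 7168 pages gives a 16 KB page, and 4376 of 7168 pages (about 61%) are free. The following is a minimal sketch that derives both figures from a saved log. It assumes the "label, Mbytes, pages" line layout shown above. `page_audit_summary` is a hypothetical helper.

```shell
# Sketch: derive page size and free-page share from PAGE AUDIT STATISTICS.
# Assumes lines like "RepositorySize 112.00 Mbytes 7168 pages".
page_audit_summary() {
  awk '
    /^[ \t]*RepositorySize/            { mb = $2; total = $(NF-1) }
    /^[ \t]*Free Space in Repository/  { free = $(NF-1) }
    END {
      if (total > 0) {
        printf "page size: %.0f KB\n", mb * 1024 / total
        printf "free pages: %d of %d (%.1f%%)\n", free, total, 100 * free / total
      }
    }' "$1"
}

# Usage:
#   page_audit_summary /gemstone/logs/gs64stone-pageAudit.log
```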
If the page audit finds problems, the message to the screen ends with a message like this:
-------------- PAGE AUDIT RESULTS --------------
**** NumberOfFreePages = 980 does not agree with audit
results = 988
**** Problems were found in Page Audit.
**** Refer to recovery procedures in System Administrator's Guide.
If there are problems in the page audit, you will need to restore the repository file from backups. (See the section How to Restore from Backup.)
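A script that runs pageaudit unattended can decide whether to alert an administrator by looking for the two messages quoted above. The following is a minimal sketch that assumes pageaudit's screen output was saved to a file. `page_audit_ok` is a hypothetical helper.

```shell
# Sketch: succeed only if saved pageaudit output has the success message
# and does not report problems.
page_audit_ok() {
  if grep -q 'Problems were found in Page Audit' "$1"; then
    return 1
  fi
  grep -q 'Page Audit of Repository completed successfully' "$1"
}

# Usage:
#   pageaudit > pageaudit.out 2>&1
#   page_audit_ok pageaudit.out || echo "page audit problems: restore from backup"
```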
Privileges required: SystemControl.
Object audits check the consistency of the repository at the object level. Starting with the object table, each object is located and validated.
Object audit is performed using multiple threads (lightweight sessions), and can be configured to perform as quickly as possible using a large amount of system resources, or configured to use fewer resources and take longer to run.
Object audit should be run from linked Topaz, and on the same machine as the Stone.
Repository >> objectAudit
objectAudit runs a complete audit in transaction. You may have other sessions logged in and running simultaneously, at the risk of a commit record backlog if the other sessions are performing commits; and the audit will impact performance. This audit uses two threads and up to 90% of the CPU.
Repository >> objectAuditPartial
objectAuditPartial can be run to perform regular audits on production systems without the risk of a commit record backlog, at the expense of not detecting certain limited kinds of corruption. You may have other sessions logged in and running simultaneously, but the audit will impact performance. This audit uses two threads and up to 90% of the CPU.
Repository >> fastObjectAudit
fastObjectAudit is like objectAudit, but is configured to use most or all system resources to complete as quickly as possible. This is useful when running an audit on offline systems.
Repository >> fastObjectAuditPartial
fastObjectAuditPartial is like objectAuditPartial, but is configured to use most or all system resources to complete as quickly as possible. This is useful when running an audit on offline systems.
Repository >> objectAuditWithMaxThreads: maxThreads
percentCpuActiveLimit: aPercent
Repository >> objectAuditPartialWithMaxThreads: maxThreads
percentCpuActiveLimit: aPercent
These methods allow you to specify the exact performance/impact parameters for an object audit.
Step 1. Log in to GemStone using linked Topaz (topaz -l).
Step 2. Send one of the audit messages to the repository. For example:
topaz 1> printit
SystemRepository objectAudit
%
The audit involves a number of checks and specific error messages. Checks include:
When the audit scan reports objects that do not exist, this is handled immediately, to avoid logical corruption of the repository. The objectAudit removes these OOPs from the free list, so that they cannot be reused for a new object by another session, and immediately checkpoints. If the checkpoint fails, or if any of the invalid referenced OOPs has already been reused, the Stone will shut down to avoid any commits of new values for these OOPs.
If the repository is consistent and no errors are found, the audit will complete with the line:
Object Audit: Audit successfully completed; no errors were detected.
Otherwise, the reasons for failure, with the specific problems found, are reported to standard output.
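If the audit is run from a script, its saved output can be checked for the exact success line quoted above, and anything else can be treated as a failure. The following is a minimal sketch. `object_audit_ok` is a hypothetical helper. In the commented usage, the stone name, user, and password are placeholders for your own login.

```shell
# Sketch: succeed only if saved object audit output contains the success line.
object_audit_ok() {
  grep -qF 'Object Audit: Audit successfully completed; no errors were detected.' "$1"
}

# One way to capture the output with linked topaz (values are placeholders):
#   topaz -l > objectaudit.out 2>&1 <<'EOF'
#   set gemstone gs64stone
#   set user SystemUser
#   set password swordfish
#   login
#   printit
#   SystemRepository objectAudit
#   %
#   logout
#   exit
#   EOF
#   object_audit_ok objectaudit.out || echo "object audit reported problems"
```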
If an object audit reports errors, these issues should be addressed. You may want to contact GemStone Technical Support for advice.
The following are general approaches to errors from object audit.
If errors are reported during the object audit, you may wish to perform a markForCollection and reclaimAll and repeat the object audit. This may clear up problems if the corrupt objects are not referenced from any live objects. Whether this is useful depends on the particular errors reported.
The safest approach when you find object audit errors is to restore from backup. GemStone recommends that you make regular backups, run in full transaction logging mode, and archive transaction logs as needed to recover. This would allow you to recover at any time from unexpected problems such as repository corruption.
If you do not have the set of backups and transaction logs that would allow you to restore from a backup and recover later transactions, or if you are in partial transaction logging mode, you can still make and restore a backup. Backups made using fullBackupTo:, when restored, rebuild the internal data structures. Depending on the specific problems found in audit, this may clear up the problem.
GemStone includes the ability to repair invalid references, but this can only repair detectable corruption. If a number of errors are reported, whatever caused the objects to disappear or become invalid may easily have also introduced undetectable logical corruption. Repair is not recommended; you should restore from backup, if at all possible.
However, a single invalid reference may not indicate a widespread problem, and repair may allow important data to be recovered.
To manually repair an individual invalid reference, use the Topaz object specification format @identifier to substitute nil or an appropriate reference for an invalid reference.
For example, given an instance of Array with the OOP 51369729, if the element at slot 3 is an object that does not exist, it can be repaired by setting the reference to nil using the following expression:
topaz 1> send @51369729 at: 3 put: nil
The method Repository >> repair will perform an audit and make repairs during the re-scan. The following repairs are done:
The repair audits the repository, keeping track of errors. After the initial audit completes, each error found is repaired. A descriptive message is displayed for each repair.
Some questions, such as “what is using up all the space in my Repository?”, can only be answered by examining the types and numbers of objects in your repository. To find out this information, you can use methods on GsObjectInventory.
The methods in GsObjectInventory count all instances of all classes in the repository — or in any collection, or in a hidden set, or in a file of disconnected possible garbage objects — and report the results, ordered by the number of instances or by space consumed.
GsObjectInventory performs a multi-threaded scan of the repository, and thus should only be run in a session on the same machine as the Stone. To tune the impact of the scan, additional protocol allows you to perform fast scans or to specify the impact levels. For details, see the methods in the image.
The following code reports the number of instances and the space required for all classes whose total space requirements are more than 50000 bytes.
topaz 1> printit
GsObjectInventory profileRepository byteCountReportDownTo: 50000
%
*** GsObjectInventory byteCountReport printed at: 16/07/2023 10:54:49 ***
Hidden classes are included in this report.
_________________________________________________________________
Class Instances Bytes
_________________________________________________________________
String 32291 8263560
GsNMethod 23113 4628608
Array 26775 4273072
GsMethodDictionary 3844 1963336
Symbol 20253 909944
CanonStringBucket 2019 307888
Class 1888 294800
IdentityKeyValueDictionary 1913 260216
SymbolAssociation 5525 221584
ExecBlock 3212 205768
LargeObjectNode 16 199072
SymbolDictionary 991 165584
SymbolSet 5081 159200
IdentityCollisionBucket 1275 136408
_________________________________________________________________
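A saved copy of this report can be post-processed to show megabytes and each class's share of the listed bytes. For example, String accounts for about 7.88 MB here. The following is a minimal sketch. It assumes the three-column "Class Instances Bytes" layout shown above and treats only rows with numeric second and third fields as data. Shares are relative to the listed rows only, because the report is cut off at the byte threshold. `inventory_shares` is a hypothetical helper.

```shell
# Sketch: add MB and percent-of-listed-bytes columns to a saved report.
inventory_shares() {
  awk '
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { cls[++n] = $1; bytes[n] = $3; sum += $3 }
    END {
      for (i = 1; i <= n; i++)
        printf "%-28s %10.2f MB %5.1f%%\n", cls[i], bytes[i] / 1048576, 100 * bytes[i] / sum
    }' "$1"
}

# Usage:
#   inventory_shares bytecount-report.txt
```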
The same profiling with an instance count report is much shorter, since the number of instances, rather than the bytes of space used, limits the results.
topaz 1> printit
GsObjectInventory profileRepository instanceCountReportDownTo: 10000
%
*** GsObjectInventory instanceCountReport printed at: 16/07/2023 11:02:01 ***
Hidden classes are included in this report.
_________________________________________________________________
Class Instances Bytes
_________________________________________________________________
String 32291 8263560
Array 26775 4273072
GsNMethod 23113 4628608
Symbol 20253 909944
_________________________________________________________________
Both of these reports include instances of hidden classes, classes that are used to implement internal GemStone objects, which are invisible to the image. One such class is LargeObjectNode. Instances of LargeObjectNode are used to implement the tree structures that underlie large collections. To avoid seeing hidden classes, profile using the method profileRepositoryAndSkipHiddenClasses.
For more on GsObjectInventory, see the methods in the image.
Part of administration requires monitoring the health and performance of the Stone, cache, and/or individual session processes.
GemStone includes graphical tools that allow you to record statistics to a file and analyze this data graphically. You can also access these statistics programmatically.
GemStone includes the statmonitor utility, which records statistics about GemStone processes to a disk file, and the vsd utility to view statistics graphically, including monitoring a live system by invoking statmonitor.
You can configure the processes for which statistics are recorded, how frequently the statistics are collected, and other details. See the options and examples here. Both GemStone-specific and operating system statistics are collected. The operating system statistics include general host information as well as information specific to the individual GemStone processes.
VSD has a rich set of controls for viewing statistics, which are described in the VSD User’s Guide.
We recommend running statmonitor at all times, as it provides a valuable record of many aspects of system behavior. If you encounter certain kinds of problems in your application, GemTalk Technical Support will request statmonitor data for the period leading up to the problem, to diagnose possible causes.
You can configure statmonitor to start automatically on Stone startup using the STN_STATMONITOR_ARGS configuration parameter. Similar options allow you to automatically start statmonitor on nodes that run a remote or mid-level shared page cache.
By using the -R or -r argument in combination with the -K argument to statmonitor, you can ensure statmonitor is continuously running while avoiding issues with statmonitor files consuming excessive space. However, ensure that enough files are retained so that when a problem is detected, the relevant statistics files will still be available.
VSD supports monitoring a live system, continually updating the display with the most recent statistics for your selected metrics.
A set of methods on the System class provides a way for you to analyze performance by programmatically examining the statistics that are collected in the shared page cache. This is the same data that is visible using statmonitor and VSD, although statmonitor and VSD can collect additional OS-level information. This additional OS-level information is also available programmatically; see Host Statistics.
A process can only access statistics that are kept in the shared page cache to which it is attached. Sessions that are running on a different node than the Stone use a separate shared cache on that remote node. This means that processes on a different node than the Stone cannot access statistics for the Stone or for other server processes that are attached to the Stone's shared page cache.
Within the shared page cache, GemStone statistics are stored as an array of process slots, each of which corresponds to a specific process. Process slot 0 is the shared page cache monitor. On the Stone’s shared page cache, process slot 1 is the Stone; on remote caches, slot 1 is the page server for the Stone that started the cache. Subsequent process slots are the page servers, Admin and Reclaim Gems, Symbol Gem, and user Gems. The order of these slots depends on the order in which the processes are started up, and is different on remote caches.
The specific set of statistics is different for each type of process that can attach to the shared page cache. The types of processes that are programmatically accessible are numbered:
1 = Shared page cache monitor
2 = Stone
4 = Gem Page server
8 = Gem (including Topaz, GBS, and other GCI applications).
128 = Page Manager thread
256 = Stone restore thread
512 = A thread within the Gem
1024 = Stone AIO thread
2048 = Stone free frame thread
4096 = Remote cache page server thread
8192 = Remote gem page server thread
Other numbers include those for shared counters, platform-specific OS system statistics, and so on. The specific process types, process type numbers, and the specific statistics associated with each process type are written in the header portion of the statmonitor data files. All of these may vary between GemStone releases, as statistics are added or removed.
The following examples demonstrate access to statistics for the Stone, the shared page cache monitor, and the current Gem. There are equivalent methods for other process types.
To obtain the value of a specific statistic for the Stone, the Stone’s shared page cache monitor, or the current session, use the following methods:
System class >> stoneCacheStatisticWithName:
System class >> primaryCacheMonitorCacheStatisticWithName:
System class >> myCacheStatisticWithName:
These methods return the value of the named statistic for that process, or nil if no statistic with that name is found.
For example, to retrieve the statistic named ‘CommitRecordCount’ for the Stone:
topaz 1> printit
System stoneCacheStatisticWithName: 'CommitRecordCount'.
%
23
To retrieve the current session’s PageReads:
topaz 1> printit
System myCacheStatisticWithName: 'PageReads'.
%
548
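Because these methods answer nil for an unknown name, code that reads a statistic by name can guard against a misspelled name, or a statistic that is not present in the current release. The following is a minimal sketch using only myCacheStatisticWithName:; the strings it answers are illustrative.

```smalltalk
topaz 1> printit
| value |
"myCacheStatisticWithName: answers nil if the name is not found"
value := System myCacheStatisticWithName: 'PageReads'.
value isNil
  ifTrue: ['PageReads is not available in this release']
  ifFalse: [value]
%
```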
More generally, statistics are retrieved as an Array of values. Corresponding description methods return an Array of Strings that name each statistic. The value of a given statistic is at the same index in the values Array as its name is in the descriptions Array.
Since the statistics are different for the different types of processes, you will need to use corresponding methods to collect the statistics and the descriptions.
For the Stone, the Gem that is running the code, and the Stone’s shared page cache monitor, no further information is needed to identify them within the cache, so the following pairs of methods can be used:
System cacheStatisticsDescriptionForGem.
System myCacheStatistics.
System cacheStatisticsDescriptionForStone.
System stoneCacheStatistics.
System cacheStatisticsDescriptionForMonitor.
System sharedPageCacheMonitorCacheStatistics.
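Since the names and values share the same indexes, one pair of these methods can be combined into a lookup table. The following sketch does this for the current Gem using cacheStatisticsDescriptionForGem and myCacheStatistics; the Dictionary and the final lookup of PageReads are illustrative choices, not a built-in facility.

```smalltalk
topaz 1> printit
| names values stats |
"Descriptions and values share the same indexes"
names := System cacheStatisticsDescriptionForGem.
values := System myCacheStatistics.
stats := Dictionary new.
1 to: (names size min: values size) do: [:i |
  stats at: (names at: i) put: (values at: i)].
stats at: 'PageReads'
%
```

The same pattern works for the Stone or the cache monitor by substituting the corresponding pair of methods listed above.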
For example, while you would normally use stoneCacheStatisticWithName:, here is another way to get the CommitRecordCount:
topaz 1> printit
| index |
index := System cacheStatisticsDescriptionForStone
indexOf: 'CommitRecordCount'.
System stoneCacheStatistics at: index.
%
23
To collect statistics for other Gems, and for page servers, you need to determine the process Id, session Id, or slot of the specific Gem or page server, or the cache name of the Gem. There are a variety of ways you might determine this, but one way is to examine the results of:
System cacheStatisticsForAllSlotsShort
This method returns the name, process Id, session Id, statistics type, and process slot for each process currently attached to the cache. For example:
topaz 1> printit
(System cacheStatisticsForAllSlotsShort) collect:
[:ea | ea printString]
%
an Array
#1 anArray( 'ShrPcMonitor', 7722, 4294967295, 1, 0)
#2 anArray( 'gs64stone', 7721, 0, 2, 1)
#3 anArray( 'FreeFrmPgsvr2', 7725, 4294967294, 4, 2)
#4 anArray( 'AioPgsvr3', 7726, 4294967294, 4, 3)
#5 anArray( 'pagemgrThread', 7729, 1, 8, 4)
#6 anArray( 'GcAdmin5', 7734, 2, 8, 5)
#7 anArray( 'SymbolGem6', 7735, 3, 8, 6)
#8 anArray( 'GcReclaim6_7', 7733, 4, 8, 7)
#9 anArray( 'Gem26', 2271, 5, 8, 8)
#10 anArray( 'Gem27', 16924, 6, 8, 9)
Of course, a Gem may log out between the time you execute this and the time you collect statistics, so be sure that your code handles that condition gracefully.
The methods you use to get the statistics and the corresponding descriptions will depend on how you have determined the specific process you want information about.
By cache name:
System cacheStatisticsForProcessWithCacheName: aString
(You must manually determine the process type.)
System cacheStatsForGemWithName: aString.
System cacheStatisticsDescriptionForGem.
By operating system Process Id (PID):
System cacheStatisticsProcessId: aPid.
System cacheStatisticsDescriptionAt: (System cacheSlotForProcessId: aPid).
By process slot:
System class >> cacheStatisticsAt: aProcessSlot
System class >> cacheStatisticsDescriptionAt: aProcessSlot
By session Id:
The page server for a Gem assumes the same sessionId as its Gem.
System gemCacheStatisticsForSessionId: aSessionId.
System cacheStatisticsDescriptionForGem.
System cacheStatsForPageServerWithSessionId: aSessionId
System cacheStatisticsDescriptionForPageServer
For example, to find an aggregate value for TimeInFramesFromFindFree of all Gems in the system:
topaz 1> printit
| gemPids index time |
gemPids := Array new.
System cacheStatisticsForAllSlotsShort do:
[:anArray |
(anArray at: 4) = 8 ifTrue:
[gemPids add: (anArray at: 2)].
].
index := System cacheStatisticsDescriptionForGem indexOf:
'TimeInFramesFromFindFree'.
time := 0.
gemPids do: [:aPid | | stats |
stats := System cacheStatisticsProcessId: aPid.
stats ifNotNil: [time := time + (stats at: index)].
].
time
%
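The same slot information can drive a per-Gem report using session Ids rather than PIDs. This sketch collects PageReads for each Gem, keyed by its cache name, using gemCacheStatisticsForSessionId:. It assumes, as the sample output above suggests, that element 1 of each slot entry is the cache name, element 3 the session Id, and element 4 the statistics type; it skips Gems that log out before their statistics are read.

```smalltalk
topaz 1> printit
| index pageReads |
index := System cacheStatisticsDescriptionForGem indexOf: 'PageReads'.
pageReads := Dictionary new.
System cacheStatisticsForAllSlotsShort do: [:slotInfo | | stats |
  "Element 4 is the statistics type (8 = Gem); element 3 is the session Id"
  (slotInfo at: 4) = 8 ifTrue: [
    stats := System gemCacheStatisticsForSessionId: (slotInfo at: 3).
    "The Gem may have logged out since the slot list was taken"
    stats ifNotNil: [pageReads at: (slotInfo at: 1) put: (stats at: index)]]].
pageReads
%
```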
To make it easier for you to track cache statistics for specific Gems, you can explicitly give each Gem a unique name. The method
System cacheName: aString
sets the name for the current Gem session in the cache statistics, thus making it much easier to read the statistics in VSD.
When using topaz, the -u command line argument will set the cache name for logins from topaz.
Otherwise, set the cache name soon after login. If you are collecting statistics using statmonitor, information may be logged under the default name for the Gem when the session first logs in, in which case you may have two separate lines of data for the same session.
System setGemKind: anInteger
is another way to identify Gems in the cache. It sets the cache statistic GemKind to the given number. GemKind is 0 by default; you can set it to any positive integer. GemStone uses negative values for system Gems such as the ReclaimGem and AdminGem.
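For example, a batch job might label itself immediately after login. In this sketch the cache name 'NightlyReport' and the GemKind value 3 are hypothetical, chosen only for illustration; the last statement reads back the GemKind statistic for the current session.

```smalltalk
topaz 1> printit
"Label this session so its statistics are easy to find in VSD"
System cacheName: 'NightlyReport'.
System setGemKind: 3.
System myCacheStatisticWithName: 'GemKind'
%
```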
In addition to the system-generated statistics listed below, GemStone provides a facility for defining session statistics — user-defined statistics that can be written and read by each session, to monitor and profile the internal operations specific to your application.
There are 48 session cache statistic slots available, with names of the form SessionStat00...SessionStat47.
You can use the following methods to read and write the session cache statistics:
System class >> sessionCacheStatAt: anIndex
Returns the value of the statistic at the designated index. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.
System class >> sessionCacheStatAt: anIndex put: aValue
Assigns a value to the statistic at the designated index and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.
System class >> sessionCacheStatAt: anIndex incrementBy: anInt
Increments the statistic at the designated index by anInt, and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.
System class >> sessionCacheStatAt: anIndex decrementBy: anInt
Decrements the statistic at the designated index by anInt, and returns the new value. anIndex must be in the range -2 to 47. Negative indexes are reserved for internal use.
System class >> sessionCacheStatsForProcessSlot: aProcessSlot
Returns an array containing the 48 session statistics for the given process slot, or nil if the process slot is not found or is not in use.
System class >> sessionCacheStatsForSessionId: aSessionId
Returns an array containing the 48 session statistics for the given session id, or nil if the session is not found or is not in use.
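As an illustration, an application can use one session statistic as a progress counter that VSD or another session can watch. This sketch assumes slot 0 is not already used by your application; the loop body stands in for real work.

```smalltalk
topaz 1> printit
"Use session statistic 0 as a count of processed items"
System sessionCacheStatAt: 0 put: 0.
1 to: 100 do: [:i |
  "... application work for item i ..."
  System sessionCacheStatAt: 0 incrementBy: 1].
System sessionCacheStatAt: 0
%
100
```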
In addition to the Gem session statistics, GemStone/S 64 Bit provides global session statistics — user-defined statistics that can be written and read by any Gem on any Gem server. Unlike session cache statistics, which are stored in the shared page cache of the machine that the Gem is running on, global session statistics are stored in the shared page cache of the Stone. Global session statistics are not transactional. For a given statistic, every session sees the same value, regardless of its transactional snapshot view.
There are 48 global cache statistic slots available, with names of the form GlobalStat00...GlobalStat47.
You can use the following methods to read and write the global cache statistics:
System class >> globalSessionStatAt: aProcessSlot
Returns the value of the statistic at the designated slot (must be in the range 0..47).
System class >> globalSessionStatAt: aProcessSlot put: aValue
Assigns a value to the statistic at the designated slot (must be in the range 0..47) and returns the new value. The value must be a SmallInteger in the range of -2147483648 to 2147483647.
System class >> incrementGlobalSessionStatAt: aProcessSlot by: anInt
Increments the value of the statistic at the designated slot by anInt and returns the new value of the statistic. The value anInt must be a SmallInteger in the range of -2147483648 to 2147483647.
Process-level host statistics require an OS call, so collecting them along with the cache statistics could impact performance. For this reason, they are not part of the information returned by the regular cache statistics interface methods. To get this information, use the following methods.
System class >> hostProcessStatisticsNames
Returns an array of Strings which are the names of the per-process statistics provided by this host.
System class >> hostStatisticsForMyProcess
Returns an array of SmallIntegers which represent the host statistics for this process. The names of each statistic are returned by the #hostProcessStatisticsNames method.
System class >> hostStatisticsForProcess: processId
Returns an array of SmallIntegers which represent the host statistics for the process with the given process ID. The names of each statistic are returned by the #hostProcessStatisticsNames method.
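As with cache statistics, the names and values correspond by index. This sketch pairs them for the current process into a readable list of Strings; it uses only hostProcessStatisticsNames and hostStatisticsForMyProcess, and the formatting is an illustrative choice.

```smalltalk
topaz 1> printit
| names values |
names := System hostProcessStatisticsNames.
values := System hostStatisticsForMyProcess.
"Names and values correspond by index"
(1 to: (names size min: values size)) collect: [:i |
  (names at: i) , ' = ' , (values at: i) printString]
%
```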
Specific methods are also available to return the host CPU statistics only:
System class >> hostCpuStatsForProcessId: anInt
Return an Array of two integers as follows:
1 - user mode CPU milliseconds
2 - system mode CPU milliseconds
Both array elements will be -1 if the process slot is out of range or not in use or if this method is not supported for the host architecture.
It is not required that the process with pid anInt is attached to the shared page cache or even is a GemStone process. The method will succeed for any process for which the Gem session executing the method has permission to view the target process’ CPU usage statistics.
System class >> hostCpuStatsForProcessSlot: anInt
For the process using the cache process slot anInt, return an Array of two integers as follows:
1 - user mode CPU milliseconds used
2 - system mode CPU milliseconds used
Both array elements are set to -1 if the process slot is out of range or not in use, or if this method is not supported for the host architecture.
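For example, to check the CPU time used by the Stone from a session on the Stone's node, you can find the Stone's PID in the slot list and pass it to hostCpuStatsForProcessId:. This sketch assumes, as in the sample output of cacheStatisticsForAllSlotsShort above, that element 2 of each entry is the PID and element 4 is the statistics type (2 = Stone). It returns a message if the Stone is not attached to this cache or if CPU statistics are unavailable.

```smalltalk
topaz 1> printit
| stoneInfo cpu |
"Element 4 is the statistics type (2 = Stone); element 2 is the process Id"
stoneInfo := System cacheStatisticsForAllSlotsShort
  detect: [:slotInfo | (slotInfo at: 4) = 2]
  ifNone: [nil].
stoneInfo isNil
  ifTrue: ['The Stone is not attached to this cache']
  ifFalse: [
    cpu := System hostCpuStatsForProcessId: (stoneInfo at: 2).
    (cpu at: 1) = -1
      ifTrue: ['CPU statistics are not available']
      ifFalse: ['user ms: ' , (cpu at: 1) printString ,
                '  system ms: ' , (cpu at: 2) printString]]
%
```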
While most monitoring is of the object server and session processes, it is also useful to monitor the performance of the operating system that is running GemStone. On host platforms that support it, the following methods return statistics provided by the operating system. This is the same information that is available via statmonitor; see statmonitor.
System class >> fetchSystemStatNames
Return an array of Strings with the names of the available OS level statistics. The length is host-dependent. If the host system does not support system statistics, this method returns nil.
System class >> fetchSystemStats
Return an array of Numbers corresponding to the names returned by the #fetchSystemStatNames method. The length of the result array is host dependent. While most elements in the result array will be SmallIntegers, the result may also contain other types of Numbers such as SmallDoubles, Floats, LargeIntegers, etc. If the host system does not support system statistics, this method returns nil.
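Since both methods return nil on hosts without system statistics support, code that runs on several platforms should check for nil before pairing names with values. The following is a minimal sketch of that check; the message string is illustrative.

```smalltalk
topaz 1> printit
| names values |
names := System fetchSystemStatNames.
names isNil
  ifTrue: ['System statistics are not supported on this host']
  ifFalse: [
    values := System fetchSystemStats.
    "Values may include non-SmallInteger Numbers; printString handles all of them"
    (1 to: (names size min: values size)) collect: [:i |
      (names at: i) , ' = ' , (values at: i) printString]]
%
```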
You can also monitor specific CPU usage for the host using the following method:
System class >> hostCpuUsage
Returns an Array of 5 SmallIntegers with values between 0 and 100 which have the following meanings:
1 - Percent CPU active (user + system)
2 - Percent CPU idle
3 - Percent CPU user
4 - Percent CPU system (kernel)
5 - Percent CPU I/O wait
On hosts with multiple CPUs, these figures represent the average across all processors. The results of the first call to this method are invalid and should be discarded. Returns nil if the host system does not support collecting CPU statistics.
On Linux, some memory statistics are read from /proc/pid/smaps, which has restricted access. statmonitor and Gem methods that collect per-process statistics cannot collect smaps statistics for processes owned by other users. When the NetLDI runs in guest mode with a captive account, there is no problem, since each Gem process is owned by that account.
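Because the first result is invalid, a sampling script should call hostCpuUsage once to prime it, wait for a measurement interval, and then use the second result. This sketch assumes a 5-second interval and uses Delay forSeconds: to wait; check that Delay is available in your GemStone version before relying on it.

```smalltalk
topaz 1> printit
| usage |
"The first call primes the measurement; its result is discarded"
System hostCpuUsage.
(Delay forSeconds: 5) wait.
usage := System hostCpuUsage.
usage isNil
  ifTrue: ['CPU statistics are not supported on this host']
  ifFalse: ['active: ' , (usage at: 1) printString ,
            '%  I/O wait: ' , (usage at: 5) printString , '%']
%
```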
However, when running with the NetLDI owned by root with the s bit set, Gem processes are owned by the account logging in. The specific memory statistics normally based on smaps will be reported as zero.
If these statistics are required, statmonitor requires additional configuration to be able to read these memory statistics.
os$ setcap cap_sys_ptrace=pe $GEMSTONE/bin/statmonitor
os$ cd $GEMSTONE/sys
os$ chown root $GEMSTONE/bin/statmonitor
os$ chmod u+s $GEMSTONE/bin/statmonitor
In a system where the NetLDI runs as root with the s bit set, additional configuration is also needed for one Gem to access the memory statistics of another Gem using methods such as System hostStatisticsForProcess:.
For security reasons, it is not recommended to give all Gem processes the cap_sys_ptrace capability. Setting this capability also prevents gdb from attaching, so pstack and kill -USR cannot write C stack traces. If a Gem must programmatically monitor the memory statistics of other Gem processes, configure the monitoring Gem to run with a different gem executable than the standard Gems in your environment, and give only that gem executable the required capability.
1. Make a copy of $GEMSTONE/sys/gem for an RPC Gem, or $GEMSTONE/bin/topaz if this process will run as linked topaz. For example,
os$ cp $GEMSTONE/sys/gem $GEMSTONE/sys/gemTrace
os$ cp $GEMSTONE/bin/topaz $GEMSTONE/bin/topazTrace
2. Give this copy the cap_sys_ptrace capability.
os$ setcap cap_sys_ptrace=pe $GEMSTONE/bin/topazTrace
os$ setcap cap_sys_ptrace=pe $GEMSTONE/sys/gemTrace
3. For an RPC gem, create a custom gemnetobject, and use this for login.
os$ cp $GEMSTONE/sys/gemnetobject $GEMSTONE/sys/gemnetobject_trace
Edit your custom gemnetobject, for example gemnetobject_trace, to set the gemname that invokes this executable, by modifying this line:
gemname="gemTrace"
Login using your custom gemnetobject in the login parameters:
topaz> set gemnetid gemnetobject_trace
4. If you are running NetLDI with the startnetldi -n option, you must also add an entry to the services.dat file. See the System Administration Guide, Appendix A, for more details.
On Linux, the statprom utility allows you to use Prometheus monitoring software to monitor GemStone cache statistics.
Prometheus is an open-source systems monitoring and alerting toolkit that collects and stores metrics (numeric data) as time series, along with key-value tags. Prometheus is widely used, and the Prometheus GitHub project has an active developer and user community.
In addition to Prometheus, Grafana can be installed and used for live monitoring of Prometheus data; Grafana provides out of the box support for Prometheus, and no additional configuration is needed to collect the GemStone data from Prometheus.
When statprom is started, it takes as an argument a JSON-format configuration file that is customized for the cache and monitoring requirements; this includes the port number, the Stone name, and the specific statistics to report.
For details on configuring statprom and using it with Prometheus, see the supplemental documentation:
https://docs.gemtalksystems.com/current/monitorwithprometheus