GemStone/S 64 Bit™ 3.6.6 is a new version of the GemStone/S 64 Bit object server. Version 3.6.6 provides new features and fixes for a number of significant bugs. We recommend everyone using or planning to use GemStone/S 64 Bit upgrade to this new version.
These Release Notes include changes between the previous version of GemStone/S 64 Bit, v3.6.5, and v3.6.6. If you are upgrading from a version prior to 3.6.5, review the release notes for each intermediate release to see the full set of changes.
The Installation Guide has not been updated for this release. For installation, upgrade and conversion instructions, use the Installation Guide for version 3.6.2.
GemStone/S 64 Bit version 3.6.6 is supported on the following platforms:
Distributions for Solaris/x86 and Solaris/SPARC are no longer available.
For more information and detailed requirements for each supported platform, please refer to the GemStone/S 64 Bit Installation Guide for that platform.
The following versions of GBS are supported with GemStone/S 64 Bit version 3.6.6:
The GemStone/S 64 Bit v3.6.6 distribution includes VSD version 5.6. The previous version of GemStone/S 64 Bit, v3.6.5, included VSD v5.5.4. VSD version 5.6 includes several bug fixes and new features. For details on the changes, see the Release Notes for VSD v5.6.
VSD v5.6 is included with the GemStone distribution, and can also be downloaded as a separate product. For details or to download, go to https://gemtalksystems.com/vsd/.
The previous implementation of Repository >> objectAudit could in some cases miss problems (#50419). The earlier implementation aborted as needed after an initial scan, to avoid creating a commit record backlog. While it could safely run while the repository is in use, it could not perform a definitive scan to ensure there were no invalid references.
As of v3.6.6, objectAudit runs in transaction. Running in transaction allows objectAudit to perform a complete audit, but it risks a commit record backlog if it is run on a system that is in use. The earlier behavior is available using new methods.
If you use objectAudit on active systems, you should review your scripts or monitor for commit record backlog.
The following methods provide the old objectAudit behavior (aborting if necessary during the scan), now described as partial.
fastObjectAuditPartial
Similar to objectAuditPartial, except that it executes as fast as possible.
objectAuditPartial
Similar to objectAudit, except that it aborts after processing the scavengable pages, and then audits the rest of the repository, aborting as needed to avoid causing a commit record backlog. Does not check all object table references for validity, but will detect most references to non-existent objects.
objectAuditPartialWithMaxThreads: numThreads
Similar to objectAuditPartial except that it allows specifying the number of threads to use for the scan.
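The following topaz-style sketch shows how you might choose between the full and partial audits. It is not a prescribed procedure, and you would normally run only one of these statements. SystemRepository is the global Repository instance. The thread count of 8 is an arbitrary example value.

```smalltalk
"Full audit (v3.6.6 behavior): runs in transaction and gives a definitive
 result, but can cause a commit record backlog while other sessions commit.
 Best suited to an idle system or a maintenance window."
SystemRepository objectAudit.

"Partial audit (pre-3.6.6 behavior): aborts as needed during the scan, so it
 is safer on an active system, but it is not a definitive check."
SystemRepository objectAuditPartial.

"Partial audit with an explicit number of threads (8 is only an example)."
SystemRepository objectAuditPartialWithMaxThreads: 8.
```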
Normally, you should always ensure that the GemStone image and executables are the same version; that is, you should always run upgradeImage immediately after starting a stone using executables from a new version. However, in the hot standby upgrade process, you may temporarily run the slave system with executables that do not match the image version.
In previous releases, no error or warning was raised at login time if the first two version digits matched; for example, there was no error or warning on login if 3.6.5 executables were started with a 3.6.4 image. Errors occurred only across major releases, e.g. a 3.5.x image with 3.6.x executables.
Now, a version mismatch error is always signalled, to avoid accidentally using the wrong executable/image combination. This is a fatal error for any user other than SystemUser; with a mismatch between executables and image version, you may only log in as SystemUser.
During the upgrade process, after the stone has started but before upgradeImage has completed, it is helpful for cache warmers to be able to log in, since cache warming may improve upgrade performance. Cache warmers are now allowed to log in before upgradeImage is complete, for major as well as minor version upgrades.
The method AbstractDictionary >> at:ifPresent: was added in v3.5 with code to support X509 logins, and inadvertently removed from the image during code cleanup for v3.6. This method has been restored.
In previous releases, if a String was compared to a ByteArray using = and both contained the same bytes (e.g. ' ' = #[32]), the comparison returned true, while the equivalent comparison between a ByteArray and a String (e.g. #[32] = ' ') returned false. (#49031)
Time has included microsecond resolution in previous releases; however, methods to support working with microsecond times were limited.
The following methods have been added:
asStringUs
Returns a String that expresses the receiver in local time in the format HH:MM:SS.ssssss, where ssssss is the microseconds.
addMicroseconds: anInteger
Returns a Time that describes a time of day anInteger microseconds later than that of the receiver.
subtractMicroseconds: anInteger
Returns a Time that describes a time of day anInteger microseconds earlier than that of the receiver.
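The sketch below shows how the three new Time methods fit together. The microsecond offsets are arbitrary example values. The exact strings depend on the current time, but each one should follow the HH:MM:SS.ssssss format described above.

```smalltalk
"Microsecond arithmetic on Time using the methods added in v3.6.6.
 The offsets (1500 and 250 microseconds) are arbitrary examples."
| now later earlier |
now := Time now.
later := now addMicroseconds: 1500.       "1.5 ms after now"
earlier := now subtractMicroseconds: 250.  "250 microseconds before now"
Array with: earlier asStringUs with: now asStringUs with: later asStringUs
```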
copydbf -i on a programmatic backup now includes an additional line if the repository was in partial logging mode or has tranlogs set to /dev/null:
Repository in partial logging mode or writing tranlogs to /dev/null
The calculation of the number of large memory pages required was not entirely accurate, and could produce values that were too low for very large caches. (#50216)
In addition, a -r option has been added to largememorypages; it calculates the page requirements for a remote cache, which are somewhat smaller than those for the Stone’s cache.
The multithreaded stopnetldi utility did not poll and wait efficiently between threads, and is now much faster.
The way GemStone code handled the madvise kernel parameter setting was insufficient to reliably allow a temporary object cache (TOC) to use transparent huge pages. This has been adjusted so that transparent huge pages are used with TOC sizes of 400MB or greater, if /sys/kernel/mm/transparent_hugepage/enabled is set to "madvise".
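You can check the kernel setting from within a Gem session with System class >> performOnServer:, which runs a shell command on the Gem's host. This sketch assumes a Linux host that exposes the standard transparent huge page sysfs file. The active mode is the word shown in brackets, for example "always [madvise] never".

```smalltalk
"Return the transparent huge page mode of the Gem's host.
 GemStone uses THP for TOCs of 400MB or more when the bracketed
 (active) value is madvise."
System performOnServer: 'cat /sys/kernel/mm/transparent_hugepage/enabled'
```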
Character codePoint: 16rA0 is the non-breaking space. Previously, if this character was included in a string, topaz displayed it the same as a standard space (code point 16r20). Now, non-breaking spaces are displayed as '\ua0' to avoid confusion.
White space characters outside of the ASCII range, which includes codePoint 16rA0 and others, are not considered whitespace by the GemStone compiler.
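To see the new topaz display, you can build a String that contains a non-breaking space. The sketch below uses only standard String and Character protocol. It answers the string, the code point of its middle character (160), and false, because that character is not the ASCII space.

```smalltalk
"A String with a non-breaking space (code point 16rA0) between $a and $b.
 In v3.6.6, topaz displays the middle character as \ua0 rather than a space."
| s |
s := String with: $a with: (Character codePoint: 16rA0) with: $b.
Array
  with: s
  with: (s at: 2) codePoint
  with: (s at: 2) = Character space
```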
In some cases, GEM_HALT_ON_ERROR may be set in a Gem's configuration, but when the error occurs, the gem log reports the error but does not print stacks.
Client logins invoke the GCI login function GciLoginEx_(), which includes an argument, haltOnError. If this argument is set to 0, it disables the GEM_HALT_ON_ERROR setting from the Gem configuration; it should be set to -1 to use the configuration value.
The GciLogin() call incorrectly invoked GciLoginEx_() with 0 for haltOnError. (#50275)
GCI applications, or client applications such as GBS and Jadeite, may be calling GciLoginEx_() with 0 for the haltOnError argument. Correcting this requires a change in the application code to pass -1 instead.
If the value of STN_GEM_TIMEOUT is zero, the login timeout is now 1 minute rather than 5 minutes.
The following methods have been added, which report the SessionId and other information for active cache warmers. This allows you to detect when startup cache warming is complete.
System class >> cacheWarmerSessions
Returns an Array of sessionIds of cache warmer sessions.
System class >> cacheWarmerSessionsReport
Returns a String describing cache warmer sessions.
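As a sketch, a script could poll cacheWarmerSessions until it is empty and then return the report. This assumes that cache warming was started together with the stone, so an empty result means warming has finished. If the script runs before any warmer has logged in, the loop exits immediately. The 5-second interval is an arbitrary choice.

```smalltalk
"Poll until no cache warmer sessions remain, then return the report.
 Assumes warmers are already running when this starts."
[ System cacheWarmerSessions isEmpty ]
  whileFalse: [ System _sleepMs: 5000 ].
System cacheWarmerSessionsReport
```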
The timestamp printed in the statmonitor data file header is now formatted as ISO 8601, rather than in GemStone’s legacy DateTime format.
SharedCounters (AppStats) and PersistentCounters are customer-usable cache statistics which are recorded directly in the cache, unlike SessionCacheStats. These are collected by specifying the statmonitor -n and -B flags, respectively.
In previous releases, the numbering of SharedCounters in Smalltalk methods was off by one relative to the numbering in statmonitor/VSD. This has been corrected; for example, for an expression such as
System sharedCounter: 1 setValue: 0.
the statistic was previously named sharedCounter2 (under the process AppStat); now it is sharedCounter0001.
The number of shared counters is defined in the configuration file, and defaults to 1900; the leading zeros allow correct sorting.
SharedCounters are now recorded correctly as 64 bit signed integers.
The offset for PersistentCounters was correct, but their names have also been reformatted for sorting; names that were previously similar to PersistentCounter001 (under the process PersistentCounters) are now similar to PersistentCounter0001. Since the maximum number of persistent counters is 1536, this ensures that sorting in VSD is correct.
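The following sketch sets two SharedCounters. Following the naming scheme described above, they should appear in statmonitor/VSD as sharedCounter0001 and sharedCounter0002 under the AppStat process, assuming statmonitor was started with -n. The counter numbers and values are arbitrary examples.

```smalltalk
"Counter 1 is recorded as sharedCounter0001 in v3.6.6 (sharedCounter2 before)."
System sharedCounter: 1 setValue: 0.

"SharedCounters are now recorded as signed 64-bit integers, so a negative
 value should be preserved in the statistics."
System sharedCounter: 2 setValue: -1.
```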
The following private methods have been removed:
IdentityBag >> _basicAdd:
IdentityBag >> _rcIncludes:
IdentityBag >> _rcIncludesValue:
Integer >> _floatParts
Repository >> _objectAuditWithMaxThreads:waitForLock:pageBufSize:percentCpuActiveLimit:csvFile:repair:
SmallInteger >> _floatParts
System class >> _sessionsReportExcluding:
Previously, when a slave system was being converted into normal mode, you had to perform a stopContinuousRestore, and then log in again to perform the commitRestore.
Now, sending commitRestore will stop continuous restore if it is running, and then perform the commit restore. It is unnecessary to perform a separate stopContinuousRestore.
The following methods have been added to make hot standby failover simpler.
Repository >> commitRestoreForFailoverAfterWaitingUpTo: seconds
On a slave stone, waits up to seconds for the failover record from the master stone to be replayed. If seconds is 0, this method checks for the failover record once and does not block. If seconds is -1, this method waits forever for the failover record. If the failover record is detected, a commitRestore is executed and the session is terminated with a RestoreLogSuccess error (4048). If the failover record is not detected within the timeout, returns false.
Repository >> failoverFromMasterFinished
On a slave stone, returns a Boolean indicating whether the failover record from the master has been replayed: true if it has been replayed, false otherwise.
Repository >> waitForFailoverFromMasterUpToSeconds: seconds
On a slave stone, waits up to seconds for the failover record from the master stone to be replayed. If seconds is 0, this method returns a status immediately and does not block. If seconds is -1, this method waits forever for the failover record. Returns true if the failover record has been replayed, or false if the timeout expires.
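The sketch below shows the new failover methods in use on a slave stone whose image is already at v3.6.6. The upgrade procedure notes that these methods are not available on a slave still running an older image. The 300-second timeout is an arbitrary example.

```smalltalk
"Non-blocking check: has the master's failover record been replayed yet?"
SystemRepository failoverFromMasterFinished.

"Wait up to 300 seconds for the failover record, then commitRestore.
 On success the session is terminated with RestoreLogSuccess (4048),
 so this expression only returns a value if the timeout expires."
(SystemRepository commitRestoreForFailoverAfterWaitingUpTo: 300)
  ifFalse: [ 'Failover record not replayed within 300 seconds' ]
```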
It is now possible to upgrade a hot standby system to a new GemStone version by upgrading the slave, failing over to make it the master, and then allowing the former master (now the new slave) to be upgraded via transaction logs.
This does not require changes in the originating version, and this process can be used to upgrade from earlier versions to v3.6.6. However, some methods that make this process easier are not present in the earlier version, and thus may not be available at the point where they will be useful.
This upgrade process was developed recently and may need refinement; it has not been verified on a system with multiple slaves. It has been tested for upgrades from 3.6.5 and 3.6.4 to 3.6.6.
1. Stop the slave system’s logreceiver and the slave stone. Restart both using the v3.6.6 binaries, and restart the continuous restore. Do not perform the upgradeImage yet.
The slave will continue to restore transactions from the master, although logins will report version mismatch errors. Login as SystemUser is allowed. These errors are temporary, until the upgrade process is complete.
You should not attempt to execute other code on the slave; the version mismatch means that methods (other than those required for running as a slave) may fail unexpectedly. This includes user errors; for example, if you enter an incorrect path for continuous restore, the error cannot be handled and the error message cannot provide details.
2. On the master, which is still on the original version, perform failOverToSlave. This stops further commits and checkpoints, and puts the now-former master into restore mode.
Commits are now disallowed on the master.
3. Wait for the master’s failover transaction to be replayed on the slave. You can determine this using SystemRepository restoreStatusInfo at: 13, which returns a non-zero timestamp when the failover transaction is replayed. On the slave system, execute:
[ 0 == (SystemRepository restoreStatusInfo at: 13) ]
whileTrue: [ System _sleepMs: 500 ].
In future upgrades, Repository >> waitForFailoverFromMasterUpToSeconds: can be used here; but since the slave’s image has not yet been upgraded, this method is not available.
Once the failover transaction has been replayed, stop the logsender on the master. The former master stone may be stopped now, or left running to service read-only operations until Step 6.
4. On the slave, stop the logreceiver, then perform stopContinuousRestore followed by commitRestore. This makes the former slave the new master.
5. On the new master (the former slave), perform the upgradeImage and any other required upgrade steps.
The new master is now available for logins.
Start the logsender on this new master.
6. On the former master (now the slave system), restart the stone and logreceiver using the 3.6.6 binaries, and start continuous restore (continuousRestoreFromArchiveLogs:). Do not perform upgradeImage.
As the transactions coming from the new master are replayed on the new slave, the new slave will be upgraded to v3.6.6. After the transactions are restored, there will be a lag until the next checkpoint completes and the version is updated. Until then, logins to the new slave system will continue to report a version mismatch error.
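Step 4 of the procedure above can be sketched as follows. At that point the slave is still running its pre-3.6.6 image. This sketch therefore conservatively assumes the older behavior described earlier for commitRestore, where you had to log in again after stopContinuousRestore. Both sessions are SystemUser logins, since that is the only login allowed during the version mismatch.

```smalltalk
"Step 4, session 1 (SystemUser), after stopping the slave's logreceiver."
SystemRepository stopContinuousRestore.

"Log out, then log in again as SystemUser. Session 2:"
SystemRepository commitRestore.
```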
A master stone's logsender did not correctly service the logreceivers of multiple slave systems. If multiple logreceivers were connected to a single logsender, the data sent to the logreceivers was out of sync, and the logreceivers reported validation errors. (#50064)
After a failOverToSlave from NodeA to NodeB and a subsequent second failOverToSlave from NodeB back to NodeA, there were issues on the new master NodeA.
NodeA's transient free oop list could have contained the oops of committed objects, which could result in corruption; this issue was cleared by a stone restart. (#50147)
The master stone NodeA was also left with commits suspended, which required executing resumeCommits. (#50155)
When the STN_TRAN_LOG_DIRECTORIES directory of the slave system in a hot standby system contained obsolete tranlogs, starting continuous restore could crash when attempting to read logs. In addition, if a previously running continuous restore (hot standby) has stopped due to a failure, restoreStatus now includes "continuous restore failed". (#49863)
When a Gem crashes, the Shrpcmon performs recovery to clean up spin locks. In versions before v3.6.5, the Shrpcmon process could SEGV if an address into the process table was out of range (bug #49988, fixed in v3.6.5). Other code paths outside the Shrpcmon, such as in the Stone, could also SEGV on an attempt to access an invalid offset; these are fixed in this version. (#50005)
The shared page cache monitor frame lock recovery may crash under some circumstances with ongoing page reads, resulting in the message "Freezing shared page cache" (#50082)
Gems may encounter a GCI protocol error, due to socket read returning bytes that contain an incomplete LGC_PAD_N packet (#50045)
When the Gem configuration parameter #GemAutoServiceSigAbort is set to false, sessions could still receive signal #3007/TransactionBacklog. (#50142)
The multithreaded scan operation that executes a listInstances scan could miss objects if one or more objects were shadowed (that is, updated during the scan) and the scan aborted its view. (#50407)
During the voteNotDead stage of garbage collection, the revote requires keeping a closure of candidate dead objects; this could grow to consume excessive memory on a system that had many changes, such as after upgrade or index rebuild. (#50362)
The primitive that supports Array and OrderedCollection >> copyFrom:to: (prim 817) contained an unsafe object allocation. There was a race condition: if a scavenge was triggered by an object faulting into memory after the new object was allocated, the new object could be initialized to zero rather than OOP_NIL, causing a Gem SEGV. (#50277)
A timing condition when the Stone shuts down during a window in the Gem login process may leave the Gems alive after the Stone is shut down. (#50204)
With very large temporary object memory, a Gem may take considerable time to verify memory before shutting down. The verification is not critical, and is now skipped in the standard (fast) environment. (#50314)
After a crash, it was possible for restart to have incorrect state for new epoch objects and write set union. (#50470)
There was a code path in which a class could be found dead by MFC or epoch garbage collection when only one object referenced that class, due to a non-thread-safe update. (#50460)
The multithreaded scan operations update values for a number of session stats; it was unnecessarily also clearing cache statistics that it did not update. (#40411)
To see session statistics that are in use for a specific scan operation, open the statistics in VSD and apply the relevant aliases; this provides internally-useful labels for stats that are updated by that scan.
The statistics TimeInUpdateUnionsCommit, TimeProcessingCommit, and TimeStoneCommit may be incorrectly understated or reported as zero on a fast system; these were rounded to ms prior to summing, and thus lost elapsed times in microsecond ranges. (#50422)
If the network connection table was full, including with sessions that were zombies or in login or logout status, new sessions attempting to log in could fail login with the error (depending on version), "the maximum number of users is already logged in". Now, the Stone is more aggressive about processing zombies when the table is full. (#49927)
There is a race condition between the multiple threads of the cache warmer, which could result in a state flag being set incorrectly, such that all threads were waiting and no progress was made. (#50258)
After invoking GsTsExternalSession >> nbLogin, you should call waitForReadReady, waitForReadReadyTimeOut:, waitForResult, or waitForResultForSeconds: to detect when the login has completed. Previously:
waitForResult resulted in a SIGSEGV.
waitForReadReady, waitForReadReadyTimeOut:, and waitForResultForSeconds: failed with a "call not in progress" error, but did not crash. (#50203)
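A non-blocking login followed by a wait might look like the sketch below. It is written as a block so that it does not depend on how the session is created. The argument is assumed to be a GsTsExternalSession that is already configured with stone, gem, and user parameters (setup not shown). The 30-second timeout is an arbitrary example.

```smalltalk
"Block that starts a non-blocking login on aSession and waits for it
 to complete. aSession is assumed to be a configured GsTsExternalSession."
| loginAndWait |
loginAndWait := [:aSession |
  aSession nbLogin.
  "Wait up to 30 seconds for the login result."
  aSession waitForResultForSeconds: 30 ].
loginAndWait
```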
If a repository has GsPackagePolicy enabled (generally a Seaside or GsDevKit application), and the Locale does not specify a period (US-style) decimalPoint, previousVersion testing failed, and upgradeImage reported an error. (#50063)
The ObsoleteClasses dictionary contains classes that are no longer part of the GemStone kernel, but that are retained so that upgraded images that contain objects of these classes continue to work. These classes were unnecessarily being updated during upgradeImage, which for some upgrade paths produced errors in upgrade. (#50327)
It was possible for programmatic backup, possibly due to hardware issues, to produce a backup file that included OOPs that were higher than the OOP high water for the backup file (which is stored in the backup when the backup is initiated). Restoring this backup produced errors and potentially crashed or corrupted the Gem and/or Stone. Now, OOPs higher than the high water mark are detected and rejected. (#50081)
The SymbolGem’s log file could grow excessively large with log messages, particularly commit messages related to symbol garbage collection. (#50112)
Setting the name recorded in statmonitor data using the topaz command set cachename, in linked topaz, did not actually update the name of the process as recorded in statmonitor. (#50053)
If a class has no instance variables, instVarNames is an empty Array, which is canonicalized to the object with oop 233217, which is in SystemObjectSecurityPolicy. If this class is sent objectSecurityPolicy:, the code was incorrectly attempting to reassign the object security policy of this empty Array, which reported a SecurityError. (#49921)
Dividing an Integer by SmallInteger minimumValue resulted in an InternalError, StoreSmallInt out of range. (#50111)
If the parseLiterals: argument to Number class >> parseLiterals:exponent: was nil, rather than a valid character, it resulted in an error. Now, the method completes without effect as documented. (#49817)
For some input values, the methods ScaledDecimal >> asFloat and Fraction >> asFloat could return slightly incorrect values. These methods have been reimplemented, and now return the closest floating point number, rounding to nearest even in case of a tie. (#50071, #50170)
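The tie-breaking rule can be exercised with Fractions that fall exactly halfway between two adjacent doubles. This assumes GemStone Float is an IEEE 754 double, whose spacing between 2^52 and 2^53 is exactly 1. If asFloat behaves as documented, both comparisons in this sketch should answer true.

```smalltalk
"2^52 + 1/2 is halfway between the doubles 2^52 (even significand) and
 2^52 + 1 (odd), so it should round to 2^52.
 2^52 + 3/2 is halfway between 2^52 + 1 (odd) and 2^52 + 2 (even),
 so it should round to 2^52 + 2."
| base |
base := 2 raisedTo: 52.
Array
  with: (base + (1/2)) asFloat = base asFloat
  with: (base + (3/2)) asFloat = (base + 2) asFloat
```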
The login log, enabled by the configuration parameter STN_LOGIN_LOG_ENABLED, was intended to record failed logins as well as logins that succeeded. However, failed logins were not being recorded. (#50169)
When a Gem is executing GsFile read primitives, and the Stone was shutdown, the Gem was not interrupted, which could leave the shared memory allocated. (#50186)
The errors reported when a Gem is unexpectedly disconnected have been improved to make debugging easier: the specific code and the affected socket are now included in the error messages. (#50242)
The methods Repository >> reclaimAllWait: and waitForVoteStateIdleSecs: check whether voting is occurring, but they were checking the wrong voteState result. This bypassed reporting on the specific sessions that were holding up voting. (#50114)
The ExportedDirtyList is used in GBS to correctly handle the state of objects replicated to the client, independent of having references on the client. However, the process of performing an ICU-based unicode comparison (regardless of whether the server is in Unicode Comparison Mode), caused a bit to be unset and the object was not returned in the ExportedDirtyList. (#50266)
When the client executable (linked topaz or Gem) had the s-bit set and was started by a user other than the owner of the executable, System class >> performOnServer: failed due to permission errors on temporary files. (#50191)
The messages composed by subclassResponsibility and subclassResponsibility: reported the receiver, rather than the class of the receiver; in addition subclassResponsibility: incorrectly parsed the selector for class methods. (#50072)
The configuration parameter STN_TRAN_LOG_DEBUG_LEVEL, when set to values greater than 1, causes additional information to be written to the transaction logs; this should only be done as instructed by GemTalk Engineering. In v3.6.6, the output when STN_TRAN_LOG_DEBUG_LEVEL is set to 1 includes page allocation information that previously required a higher level. The additional transaction log space required at level 1 is expected to be on the order of 1% over the requirement for level 0. (#50241)