4. Bug Fixes

The following bugs were present in v3.7.1 and are fixed in v3.7.2.

Gem out-of-memory issues

Incorrect handling of code_gen space in Gem memory

The code_gen space, used to keep temporary instances of methods, handled large numbers of methods poorly, which could result in out-of-memory errors. Specifically, instances of GsNMethod in the code_gen memory area were not pruned correctly during in-memory garbage collection, and in-memory GC of code_gen was not sufficiently aggressive under heavy load. (#51019, #51066)

Computed size of code_gen too small

The calculation of the code_gen size based on GEM_TEMPOBJ_CACHE_SIZE in 3.7.x produced a smaller area than in earlier versions.

A configuration parameter has been added to control the code_gen size; see GEM_TEMPOBJ_CODE_SIZE.

Abort failed to clear references from modified committed objects to in-memory objects

References from committed objects to in-memory objects that were no longer valid were not cleared on abort, so the subsequent in-memory garbage collection did not free as much memory as it could have. (#51073)

Memory issues from fillFrom:resizeTo:with:

The primitive invoked by Array >> fillFrom:resizeTo:with: and KeyValueDictionary >> fillFrom:resizeTo:with: was subject to out-of-memory errors. The primitive has been removed; for details on how this has been handled, see fillFrom:resizeTo:with: no longer usable. (#50990)

AlmostOutOfMemory handlers not effective if needed while in primitive

Defining exception handlers for AlmostOutOfMemory or AlmostOutOfMemoryError allows you to catch conditions in which your Gem would otherwise run out of temporary object memory and exit, and perform some actions to make more memory available. However, if the operation that required more memory than was available occurred while the code was executing in a primitive call, neither AlmostOutOfMemory nor AlmostOutOfMemoryError handlers were able to intercept the exception and take action to avoid an exit with out of memory.

Now, some primitives that are more likely to encounter this condition check before attempting to allocate memory, including the primitives invoked by methods such as String >> copy, String >> addAll:, and String >> ,. If the result would consume more than 16K bytes of temporary object memory, the primitive checks for available space in temporary object memory. If space is insufficient, a VM memory mark/sweep is executed; if space is still insufficient, the primitive signals a not-resumable AlmostOutOfMemoryError, regardless of the state of AlmostOutOfMemoryError class >> enabled or AlmostOutOfMemoryError class >> threshold. An AlmostOutOfMemoryError handler will need to determine how to handle the situation; it is recommended to use AlmostOutOfMemoryError rather than AlmostOutOfMemory (which is a kind of Notification). (#51076)
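
The check sequence described above can be pictured with a small toy model. This is a Python sketch, not GemStone code: the class ToyTempObjMemory, its fields, and the reading of "16K bytes" as 16 * 1024 are assumptions made only for illustration, and the real primitives run inside the VM.

```python
# Toy model of the pre-allocation check described above; not GemStone code.
CHECK_THRESHOLD_BYTES = 16 * 1024  # assumption: "16K bytes" means 16 * 1024


class AlmostOutOfMemoryError(Exception):
    """Stand-in for the not-resumable GemStone error of the same name."""


class ToyTempObjMemory:
    """Hypothetical temporary object memory with a fixed capacity."""

    def __init__(self, capacity, live, garbage):
        self.capacity = capacity  # total bytes in the toy memory
        self.live = live          # bytes held by reachable objects
        self.garbage = garbage    # bytes a mark/sweep could reclaim

    def free(self):
        return self.capacity - self.live - self.garbage

    def mark_sweep(self):
        # Reclaim everything that is no longer reachable.
        self.garbage = 0

    def allocate(self, nbytes):
        # Only large results are checked up front in this model.
        if nbytes > CHECK_THRESHOLD_BYTES:
            if self.free() < nbytes:
                self.mark_sweep()
            if self.free() < nbytes:
                # Signaled regardless of any enabled/threshold settings.
                raise AlmostOutOfMemoryError(nbytes)
        self.live += nbytes
        return nbytes


demo = ToyTempObjMemory(capacity=100_000, live=40_000, garbage=50_000)
demo.allocate(30_000)      # succeeds, but only after the mark/sweep
try:
    demo.allocate(40_000)  # still too large after the mark/sweep
except AlmostOutOfMemoryError as err:
    print("handler must free memory or abandon the operation:", err)
```

In the model, as in the description, the error is raised before the allocation is attempted, which is what gives a handler the chance to act.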

Issues in Native code generation

Results of ifTrue/ifFalse blocks

Issues have been found in native code that caused code errors or SEGV. These issues are exposed by expressions that assign the result of ifTrue/ifFalse expressions that return Array constructors. (#51093, #51094)

Incorrect handling of AlmostOutOfStack after process switch

When a GsProcess switch occurred, the stack limit was set too low for a GsProcess that was executing within the yellowZone, handling an AlmostOutOfStack or AlmostOutOfStackError. (#51168)

Conflict handling in Reduced-conflict collections

A number of improvements have been made in RC collections to avoid commit conflicts. In particular, RcIdentityBag has internal changes. See Improvements to reduced-conflict collections for more information.

Failure to detect write conflict after reduced-conflict replay

When two sessions make changes to the same object in a reduced-conflict collection, the second session to commit sees a write-write conflict during commit, selectively aborts, and replays the change. If another non-RC commit to the same object occurred in the meantime, which should have produced a second write-write conflict, the replay did not detect it. This caused a later Stone crash with errors such as "page NNN already shadowed". (#50235)

RcIdentityBag conflicts with unexpected larger sessionIds

When a new session with a sessionId greater than that of any session that has previously made changes to an RcIdentityBag adds or removes an element, an internal structure may need to be enlarged. If another session was also making potentially conflicting changes to the RcIdentityBag, the replay did not handle this correctly and could trigger a concurrency conflict. (#51033)

RcKeyValueDictionary rebuildTable likely to cause commit conflicts

When multiple sessions are modifying an RcKeyValueDictionary and adding keys that were not previously present, the internal structure supporting the dictionary may need to be rebuilt to a larger size to accommodate the additional keys. If a commit conflict occurred, the table rebuild was replayed, but the replay was likely to encounter further commit conflicts. (#51032)

Issues related to reclaim

ReclaimGem parameter #reclaimMinFreeSpaceMb not respected

This GcGem parameter was designed to suspend reclaim activity when the amount of free space in the repository is low, to avoid using up all free space. The check for this condition was not done at the point where it would have been effective, and so did not prevent reclaim from reducing free space. This parameter has been removed; see Improved reclaim behavior with low free space. (#51260, #51258)

After commitRestore, ReclaimGem may not be restarted

Under rare conditions, after a commitRestore, the automatic restart of the ReclaimGem might not be performed. (#51001)

Signal handler chaining may cause SEGV in linked session

If the Java library is loaded into a process before the GemStone linked library, then handling of a SEGV from Java may cause a SEGV in the GemStone code. Loading libjvm into a linked or RPC gem is not an issue, since the libjvm signal handler will run first and the SEGV is not likely to occur; this bug primarily affects GemBuilder for Java. The GemStone VM does not produce SEGVs normally, unless native code is disabled (in which case SEGV is used to implement AlmostOutOfStack). (#51106)

Improved performance of restoreFromTranlogs

During restoreFromTranlogs, the restore operation did not track the volume of changes, and thus did not commit often enough. When the number of modified data pages, the write set, or both became very large, commit was inefficient. restoreFromTranlogs now tracks the commit size and commits more frequently. (#50960)

Insufficient information on extent write errors that succeed on retry

When a write to the extent encounters a write error, it retries the write, which may succeed. These errors are noted in the stone log in case they indicate I/O issues. Now, more detailed information is provided about the write error. (#51105)

Risk of stuck spin lock on LostOT

If a LostOT or SIGTERM occurs when a Gem is performing a multithreaded operation, and a slave thread is in the middle of a cache read, this thread may have left a stuck spin lock when it terminated. (#50984)

Issues related to repository scan operations

Possible missing results from findReferencesToInstancesOfClasses, etc.

The C code that supports the multithreaded repository scan methods findReferencesToInstancesOfClasses:, allReferencesToInstancesOfClasses:, and related methods may theoretically miss some results due to a missing mutex. (#50929)

GsObjectInventory overstated String bytes used, especially for large strings

GsObjectInventory reports the number of bytes used by instances of particular classes in the repository. For Strings, when the repository contains many large Strings, the number of bytes used was considerably overstated; overages by as much as 8x have been observed. (#51053)

Listing instances in memory did not observe limit argument

The method Repository >> _listInstancesInMemory:limit:option:, and methods that invoke it such as Repository >> listInstances:limit:toDirectory:withMaxThreads:maxCpuUsage:memory:, did not observe the limit: argument, and returned all instances. (#51080)

Repository >> objectsInMemoryLargerThan: not working

The method Repository >> objectsInMemoryLargerThan: incorrectly returned empty results. (#51125)

Class >> instancesInMemory may incorrectly return an empty Array

Due to a C heap memory initialization issue, Class >> instancesInMemory could return an empty Array when there were instances in memory to return. This has been observed only on macOS on Apple silicon (ARM) in internal testing. (#50680)

Numerics issues

Values of different classes of numeric values that are close but not equal may compare as equal

When numbers of different classes, such as 0.3s1 (ScaledDecimal) and 0.3 (SmallDouble) are compared, one value must be coerced for the comparison. This did not always preserve the difference when the values are close. Now, precise comparisons are done. Some values that were close in value and previously compared as equal may now not be equal. This affects comparisons done using =, ~=, >, >=, <, and <=. (#50599)
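
The difference between a lossy and a precise mixed-class comparison can be illustrated outside GemStone. The Python sketch below uses Fraction as a stand-in for the exact decimal value 0.3s1 and a Python float as a stand-in for the SmallDouble 0.3. It assumes, for illustration only, that the earlier behavior resembled coercing the exact value to a double; the release note does not say which way the coercion went.

```python
from fractions import Fraction

scaled = Fraction(3, 10)  # exact decimal 0.3, standing in for 0.3s1
as_double = 0.3           # nearest binary double to 0.3, not exactly 3/10

# Lossy comparison: round the exact value to a double first.
coerced_equal = float(scaled) == as_double

# Precise comparison: compare the exact rational values.
exact_equal = scaled == Fraction(as_double)

print(coerced_equal)  # True: rounding hides the difference
print(exact_equal)    # False: the double is slightly below 3/10
```

Under a precise comparison the two values are close but unequal, which is why some comparisons that previously answered true may now answer false.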

Non-numeric arguments to comparison operators

Using a non-numeric argument to <, <=, >=, > now signals an ArgumentTypeError, rather than a MessageNotUnderstood or other error. (#50599)

Incorrect results from -1 bitShift: with arguments greater than 60

When the argument to -1 bitShift: is greater than 60, the result was incorrect, due to a failure to overflow to a large integer. This only affected a receiver of -1. (#51097)
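
As a cross-check of the expected results, the Python sketch below computes the same shifts with arbitrary-precision integers. It assumes that SmallInteger covers -(2**60) through 2**60 - 1, which would explain why 60 is the boundary; the helper name bit_shift is hypothetical and only mirrors the Smalltalk selector.

```python
# Assumption: SmallInteger covers -(2**60) .. 2**60 - 1.
SMALL_INTEGER_MIN = -(2 ** 60)


def bit_shift(receiver, count):
    """Left shift for a non-negative count, right shift otherwise."""
    return receiver << count if count >= 0 else receiver >> -count


for count in (59, 60, 61, 64):
    result = bit_shift(-1, count)
    kind = "small" if result >= SMALL_INTEGER_MIN else "large"
    print(count, result, kind)
```

With a shift of 60 the result is exactly the smallest value in the assumed range; from 61 onward the mathematically correct result needs a large integer.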

Some kinds of Number did not understand #isZero

Kinds of Integer, BinaryFloat and DecimalFloat did not understand #isZero. (#51193)

OffsetError from ScaledDecimal >> kind

The ScaledDecimal >> kind method yielded an OffsetError, rather than an expected kind (#normal, #zero, etc.). (#51133)

Issues with GsUuidV4 comparison operators with unexpected argument types

The method GsUuidV4 >> < SEGVed when the argument is certain specials (such as SmallDoubles). (#51121)

The methods GsUuidV4 >> = and ~= returned ArgumentErrors when the argument is a special (such as SmallIntegers). (#51120)

DateTime newWithDate:time: truncated Time to seconds

Instances of Time support microsecond resolution. When creating a DateTime using newWithDate:time: or newWithDate:time:timeZone:, the Time argument was truncated to seconds, although DateTime supports millisecond resolution. Note that any sub-millisecond resolution of the Time instance is still lost. (#50971)
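
The arithmetic behind the old and new behavior can be sketched in Python. The function names are hypothetical, and a time of day is represented simply as microseconds since midnight, which is an assumption about representation made only for this example.

```python
def truncated_to_seconds_ms(microseconds):
    """Old behavior: drop everything below one second, answer milliseconds."""
    return (microseconds // 1_000_000) * 1000


def truncated_to_milliseconds_ms(microseconds):
    """New behavior: keep milliseconds, drop only the sub-millisecond part."""
    return microseconds // 1000


# 12:34:56.789123 expressed as microseconds since midnight.
time_us = (12 * 3600 + 34 * 60 + 56) * 1_000_000 + 789_123

old_ms = truncated_to_seconds_ms(time_us)
new_ms = truncated_to_milliseconds_ms(time_us)
lost_us = time_us - new_ms * 1000

print(old_ms)   # 45296000
print(new_ms)   # 45296789
print(lost_us)  # 123
```

The remaining 123 microseconds correspond to the sub-millisecond resolution that DateTime cannot hold.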

In solo mode, transactionMode: could cause abort

In solo mode, sending transactionMode: should have no effect, since commits are disallowed. This operation incorrectly aborted the session (performed a soloAbort). (#50986)

Stone continued running if ShrPcMon SEGV or other death

If the shared page cache monitor exits normally, such as by a kill -TERM, the Stone is also shut down. However, with a SEGV or kill -9, the Stone continued running and processing session commands, although logins would fail. (#51104)

NetLDI in root mode reported owner as regular user

When the NetLDI is running in root mode (with the executable owned by root with the s bit set), the owner was reported as the user that started the NetLDI, not as root. Now, gslist -x will report the owner as root. (#50995)

File descriptor leak in GsHostProcess

There was a file descriptor leak in GsHostProcess that was fixed in earlier versions; this fix was not complete, and additional leaks have been found and closed. (#47630)

This results in behavior changes in read:* and readWillNotBlock messages to the GsHostProcess’s child’s sockets. For more details, see Change in behavior in 3.7.2 that may affect existing application code.

Cache Statistics Issues

SEGV on programmatic cache statistics access by name for ProcessName

Executing System stoneCacheStatisticWithName: 'ProcessName', or other by-name programmatic cache statistics named 'ProcessName', resulted in a SEGV. (#51122)

Memory corruption from hostEasyStatistics*

The System class methods hostEasyStatisticsForProcess: and hostEasyStatisticsForMyProcess were added in 3.7.1, for a more performant partial set of statistics. The C code supporting this function had a risk of corrupting static C memory. (#50998)

System cacheStatsForGemWithName: some stats incorrectly 0

When using System cacheStatsForGemWithName: to access statistics values, the values for a number of statistics were incorrectly reported as 0, rather than the correct values reported by System myCacheStatistics. The specific statistics that were incorrect varied over GS64 versions. (#51130)

Cache stats of -1 may be returned as 4294967295 by stoneCacheStatistics

Some statistics, such as CommitQueueThreshold, that are intended to be reported as -1 were reported as 4294967295 by methods such as System class >> stoneCacheStatistics.
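
The value 4294967295 is what the bit pattern of a 32-bit -1 looks like when read as an unsigned integer. The release note does not state the cause, so the Python sketch below only shows that the two numbers share one bit pattern; it is not a description of the GemStone implementation.

```python
import struct

raw = struct.pack("<i", -1)                # 32-bit signed -1 as four bytes
as_unsigned = struct.unpack("<I", raw)[0]  # same bytes read as unsigned
as_signed = struct.unpack("<i", raw)[0]    # same bytes read as signed

print(raw.hex())    # ffffffff
print(as_unsigned)  # 4294967295
print(as_signed)    # -1
```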

In hot standby, currentSessionNames may intermittently show an internal session

In a hot standby setup, the Stone uses an internal session to manage the restore into the slave. This session was intermittently visible from System class >> currentSessionNames, but was not identified. In particular, after a failover this session may not have been completely logged out. (#51226)

Repository >> addTransactionLog:size: did not correctly handle size argument

When this method is executed to add a tranlog directory programmatically, updated configuration lines for STN_TRAN_LOG_DIRECTORIES and STN_TRAN_LOG_SIZES are added to the configuration file used by the Stone (normally system.conf).

STN_TRAN_LOG_SIZES can be specified in units of KB, MB, or GB. The line written to system.conf used the units of the existing STN_TRAN_LOG_SIZES setting, which may not be MB. While the internal value in the Stone was correct, on Stone restart the incorrect unit was read and applied, which could result in unexpected tranlog sizes. (#51005)
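
The effect of a unit mismatch is easy to underestimate, so here is the arithmetic as a Python sketch. Everything in it is hypothetical: the helper size_in_mb, the 1024-based multipliers, and the example value of 500 are assumptions, and the sketch does not reproduce the configuration file syntax.

```python
# Assumed multipliers, expressed in MB; not taken from GemStone documentation.
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024}


def size_in_mb(value, unit):
    """Interpret a numeric size under the given unit and answer megabytes."""
    return value * UNIT_TO_MB[unit]


intended = size_in_mb(500, "MB")  # what the caller meant
misread = size_in_mb(500, "GB")   # same number read under a GB setting

print(intended)  # 500
print(misread)   # 512000
```

In this hypothetical case the same number, read under the wrong unit after a restart, describes a tranlog 1024 times larger than intended.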

LostOT may result in errors described as PageLocate error

During LostOT handling, if a remote session does not respond and is marked as invalid, the specific error reporting depends on the process that first encounters the invalid session. This could be reported as a PageLocate error, which was unnecessarily alarming. This is now reported more clearly as the result of a LostOT. (#51084, #50722)

Issues affecting X509-Secure GemStone

Working-set cache warming was not supported for X509-Secured caches

When using the X509-Secure GemStone feature, warming caches that are started by the NetLDI is supported via the configuration parameter NETLDI_WARMER_ARGS. This did not support working-set cache warming, in which active pages are tracked and preferentially warmed on restart. Now, you may include the -w interval argument in this configuration parameter.

Note that, in order to warm a mid-level cache, NETLDI_PORT_RANGE and STN_PGSVR_PORT_RANGE must match.

GemStoneX509Parameters >> extraGemArgs: were ignored

When using GemStoneX509Parameters >> extraGemArgs: to apply gem options such as a larger temp obj cache, the composition of the Gem command was not correct, and these arguments were not used. (#51221)

Multithreaded scan operations could hang in a remote session

Operations such as listInstances that perform a multithreaded scan hung in an X509 gem running on a host remote from the Stone. (#51220)

System currentUserSessionCount included hostagent

System class >> currentUserSessionCount incorrectly counted the hostagent as a user session. (#51219)

SEGV handler now includes raw Smalltalk stack details

Previously, if the Gem signalled a SEGV, even if it was executing Smalltalk code, the Smalltalk stack was not printed. Now, this is automatically included in the log. (#51055)

Darwin error "attempt to create a CByteArray or CPointer that would reference VM memory"

In some cases with very large TOC on Darwin, this error was seen. This issue is related to incorrect handling of the result of a dlopen() call. (#51060)

Object >> _primitiveAt:put: could incorrectly grow non-indexable objects

Instances of classes that are not indexable could still be grown using Object >> _primitiveAt:put:. This should not happen in normal use; the method is private and should not be invoked directly. (#50970)
