2. Bug Fixes

The following bugs were present in v3.7 and are fixed in this version.

Sending #value to special resulted in SEGV

Sending #value to a special (such as a SmallInteger) resulted in a SEGV in the Gem process. (#50748)

startstone recovery may fail with tranlogs on raw partitions

After an unclean Stone shutdown, such as a kill -9, on restart the Stone performs recovery by reading the current tranlog that was being written before the shutdown. When tranlogs are configured as a list of raw partitions, there was a code path in which the Stone could fail to find the tranlogs needed for recovery; in this case, the Stone would not start unless startstone -N was used. (#50843)

Stone may shut down on checkpoint when tranlog disks are full

When the tranlog disks were full, and the Stone had paused checkpoints while waiting for tranlog disk space to become available, the Stone shut down if the checkpoint timeout interval expired and a checkpoint was initiated. (#50744)

A similar issue, bug #49704, was fixed in v3.7; this is a slightly different code path. The Stone may now only shut down due to lack of tranlog disk space during startstone or when restoring from log or backup.

Garbage collection issues

Performance regression in markForCollection

In v3.6, code changes were introduced to avoid thread-safety issues in garbage collection leaf caches. This change resulted in significantly slower performance (for some configurations) in v3.6.x, compared with 3.5.x and earlier. The fix has been rewritten, and performance is now significantly faster than in pre-3.6 releases as well as faster than in 3.6.x versions. (#50770)

markForCollectionWait: with argument of -1 gets prim failure

Executing Repository >> markForCollectionWait: with an argument of -1 (meaning wait forever), resulted in a primitive failure. (#50769)
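
As a minimal sketch of the call that previously failed, the expression below sends the message to the repository with the wait-forever argument. It assumes the global SystemRepository as the receiver and a session whose user is permitted to run garbage collection operations; neither detail comes from this note.

```smalltalk
"Assumption: SystemRepository is the Repository instance being collected.
 -1 means wait forever, as described above."
SystemRepository markForCollectionWait: -1
```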

reclaimAll may fail to reclaim some dead

It was possible for the reclaimGem to retain an old copy of the deadObjs and not actually reclaim all dead. (#50798)

The printed progress information now includes additional detail, and reclaimAll signals a Repository error if it is stuck; that is, if there is no progress after 10 minutes.

C mutex deadlock possible from ProfMonitor/SIGUSR1

There was a small timing window in which it was possible for a Gem to deadlock on a C mutex when processing a VM interrupt. This occurred periodically with ProfMonitor running, and when the Gem was sent SIGUSR1. (#50729)

NetLDI issues

NetLDI shutdown timing issue resulted in unable to restart

When a NetLDI services a request to fork a gem, it forks a child process and then does an exec. During the period after the fork and before the exec, a thread in the child process makes system calls to collect information for printing the log file banner. If the NetLDI is stopped using either kill -TERM or stopnetldi while a thread is making a system call, this could have resulted in a deadlock in the child.

The fix removes the system call in which the deadlock has been observed. The forked child also now immediately closes its inherited file descriptor for the NetLDI's listening socket, rather than waiting for the close-on-exec of that socket to take effect.

Since informational system calls are no longer made during the forked period, the Gem log headers now show a reduced section for "NetLDI Child Task", which includes only the Gem command line. The system information is already duplicated in the header information printed by the Gem. (#50697)

netldid executes without $GEMSTONE but does not function

The netldid executable started up even when the $GEMSTONE environment variable was not correctly set, but it did not function correctly. Now, this case results in an error on the startup attempt. (#50782)

GsSecureSocket instance method did not properly use chained certificate file

The method:

GsSecureSocket >> useCertificateFile:withPrivateKeyFile:privateKeyPassphrase:

did not handle chained certificates; only the leaf certificate of a chained cert file would be used. (#50900)

The related class methods did work correctly with chained certs.
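
The affected call can be sketched as follows. This is an illustration only: the file paths and passphrase are hypothetical placeholders, and creating the socket with GsSecureSocket newServer is an assumption about how the instance is obtained.

```smalltalk
| sock |
"Hypothetical paths and passphrase; the certificate file is assumed to
 contain the leaf certificate followed by its intermediate certificates."
sock := GsSecureSocket newServer.
sock
    useCertificateFile: '/path/to/server-chain.pem'
    withPrivateKeyFile: '/path/to/server-key.pem'
    privateKeyPassphrase: 'hypothetical-passphrase'.
```

With the fix, the whole chain in the certificate file should be used, not just the leaf certificate.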

GsSocket >> writeWillNotBlockWithin: does not handle nil returned by writeWillNotBlock

It was possible for GsSocket >> writeWillNotBlock to return a nil, which resulted in an error in GsSocket >> writeWillNotBlockWithin:. (#50775)

Unicode string copyReplaceAll:with:, insertAll:at:, broken with String arguments

When the receiver is a Unicode string and the argument is a legacy string, the methods copyReplaceAll:with:, insertAll:at:, and any methods that invoke these methods, would see a Message Not Understood error. This is related to changes in string conversion introduced in v3.7. (#50805)

Collection issues

Array >> replaceFrom:to:with:startingAt: may not have copied elements correctly

When replaceFrom:to:with:startingAt: is used with the receiver itself as the source, and the starting position in the source is less than the target starting index, the replace must copy in reverse. For large collections, this case did not correctly perform all the copy operations and produced incorrect results. (#50721)
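
The overlapping case can be illustrated with a small array. This is only a sketch of the intended semantics: the bug was reported for large collections, so an array this small would not necessarily have reproduced it.

```smalltalk
| a |
"Source and receiver are the same object; the source start (1) is below
 the target start (3), so elements must be copied from the end backward."
a := (1 to: 10) asArray.
a replaceFrom: 3 to: 10 with: a startingAt: 1.
a   "with a correct reverse copy: #(1 2 1 2 3 4 5 6 7 8)"
```

A forward copy in this situation would overwrite source elements before they are read, which is why the reverse direction is required.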

Dictionary become: does not update internal variables

After sending become: to a Dictionary containing EqualityCollisionBuckets, the back references from each EqualityCollisionBucket to the Dictionary incorrectly pointed to the become: argument dictionary, not the receiver. (#50717)

Numeric and Time related issues

Strings with leading '-0.' lose sign when converted using asNumber or Number fromString/Stream:

Converting a string containing a decimal number that is less than zero and greater than -1, such as '-0.1', using the non-specific conversion methods asNumber, Number fromString:, or fromStream:, resulted in a positive value: the absolute value of the correct result. (#50751)
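
A quick check of the corrected behavior, using only the conversion methods named above, might look like this:

```smalltalk
| viaString viaClass |
viaString := '-0.1' asNumber.
viaClass := Number fromString: '-0.1'.
"Both values should now be negative; before the fix the sign was lost."
(viaString < 0) & (viaClass < 0)
```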

Duration new created an uninitialized instance

Duration new now returns Duration zero rather than an uninitialized instance. (#50710)
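
As a one-line illustration of the new behavior, the comparison below is expected to answer true in this version:

```smalltalk
"Duration new is now equivalent to Duration zero."
Duration new = Duration zero
```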

Issues related to upgrade

upgradeImage reports success even if upgrade fails

The upgradeImage script returned true in 3.7, even when the upgrade actually did not succeed. (#50750)

Excessive attempts to start AdminGem during upgrade

During upgrade, after the Stone has been started but before upgradeImage has been run, the Stone continuously attempted to start the Admin GcGem, which failed due to the version mismatch until upgradeImage completed. (#50773)

copydbf returns wrong version for secondary extents after upgrade

After upgrade, the dbf version in the extent is updated with the new version; this is the version reported by copydbf -i. This update was incorrectly only applied to the primary extent. For repositories with multiple extents, copydbf -i on any of the secondary extents reported the old version. (#50749)

Session methods must be disabled for upgrade

Session methods are a feature used by GsDevKit and GLASS, but should not be enabled in the ordinary base GemStone without knowledgeable assistance. If GsPackagePolicy current is inadvertently enabled, it causes a number of problems. This is not a supported configuration for upgrade; now, upgrade will error if GsPackagePolicy current enabled returns true. (#50453)
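
Before upgrading, the state can be checked from a logged-in session. The sketch below uses only the expression named above; the result strings are illustrative, not GemStone output.

```smalltalk
"Answers a descriptive string; the wording of the strings is hypothetical."
GsPackagePolicy current enabled
    ifTrue: [ 'Session methods are enabled; upgrade will report an error' ]
    ifFalse: [ 'Session methods are disabled' ]
```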

Upgrade created instances of ObsoleteMetaclass

The upgrade process created instances of ObsoleteMetaclass. This did not affect operation, since they were automigrated when faulted into memory. (#50810)

GsNMethod >> recompileFromSource loses method category

The method GsNMethod>>recompileFromSource changed the method category to '(as yet unclassified)'. (#50731)

perform:withArguments: may crash on bad arguments

When the arguments passed to Object >> perform:withArguments: were invalid (for example, nil), native code did not handle this, and the session was terminated. Linked topaz reported Segmentation fault (core dumped), but an RPC gem log did not report a message. (#50864)

Spurious OutOfMemory when in-memory pom_gen scavenge fails

When pom_gen scavenge failed, in-memory GC may have signaled an OutOfMemory error when not actually out of memory. (#50711)

Fileout format reverted to older style in v3.7

The class fileout format lines for removing existing methods were updated in v3.6 to use topaz removeAllMethods commands. In v3.7, this was inadvertently reverted to the older style using doits. This has been corrected, and fileouts from v3.7.1 images include topaz removeAllMethods commands. (#50732)

GsExternalSession >> executeString: fails if attempting to return an empty String

When the result of executing GsExternalSession executeString: cmdString is an empty String, it resulted in an ArgumentError with "source of copy goes past end of source memory". (#50882)

defaultGemNRSFromCurrent may be incorrect

The method GsNetworkResourceString >> defaultGemNRSFromCurrent composes a gem login parameter, such as !@hostname#netldi:33333!gemnetobject. This used a netldi port specified by GEMSTONE_NRS_ALL in the session’s environment, which could have been different than the NetLDI actually used to spawn the session.

By restricting a NetLDI to have a matching port and GEMSTONE_NRS_ALL (see startnetldi now requires port to match GEMSTONE_NRS_ALL), this is now handled correctly. (#50046)

GLASS/Seaside auto-migrate may error after GC operation

In the GLASS/Seaside environment, Class versioning (including that done by upgradeSeasideImage, as well as code development) automatically migrates instances to the new Class version, which invokes a repository scan operation to detect the instances. This scan operation could conflict with voting or promote operations that occur as part of an Epoch or after a markForCollection. (#50078)

ensure blocks in superDoit scripts do not execute on error

Ensure blocks in the doit Block of a superDoit script were not executed if the code encountered an error. (#50874)

Issues with GsProcess terminate

Overlapping terminate request semantics

If a GsProcess that was in the process of terminating another GsProcess was itself terminated, it could have resulted in an error such as "termination already started", or potentially no error.

Now, the in-process terminations of the other GsProcesses are tracked. If a GsProcess receives a request to terminate, it will first wait for its own earlier termination requests to complete or timeout. (#50754, #50756)

SEGV when a terminated GsProcess is sent terminate

From GBS, it was possible to interrupt a GsProcess that was in the process of being terminated, and send another #terminate message to it. This SEGVed if the timing was such that the GsProcess had already been terminated. (#50702)

Gem failed to respond to SIGTERM on 1-core VM

On a 1-core virtual machine, a gem may have failed to honor a SIGTERM. (#50844)

CodeModification required to list Pragmas with session methods enabled

When session methods are enabled (this is a feature that is used in Seaside and GsDevKit), finding or enumerating Pragmas required the user to have #CodeModification privilege, which should not be needed for a view-only operation. (#50620)

waitstone may return before Stone is ready to accept logins

Under some conditions, observed with startstone -R, waitstone could return with the message "stone is ready" while the Stone was still in startup and not ready to accept logins. (#50642)

Array with:* primitive failure error message is incorrect

The Array >> with:with:with:[with:]+ methods invoke a primitive. If this primitive fails, the error message did not correctly report the number of with: arguments in the selector and/or the argument list. (#50786)

 
