"
A hash tree internal node is an internal node of a B+tree-like structure used as 
part of the implementation of collections like TreeDictionary and TreeSet.

Unlike conventional B+trees, allows duplicate keys.
Keys here are known as ""hashes"" because they're derived from a permutation of 
what is returned by the #hash of a key object for the collection.

The permutation typically produces hashes that are close to evenly distributed
in the range [0..2^60). The even distribution makes it efficient to search the node 
by making a ""good guess"" as to where a given hash will be, and do a linear search 
from there. It is typical for the ""good guess"" to be optimal in over 90% of searches, 
and for most of the remaining few percent to find the hash within one cache line
of the initial probe. The shortness of the search and the small number of cache line
reads make this search strategy considerably faster than the binary search that 
sorted trees need to use when keys are not uniformly distributed.

In the atypical case where there are a large number of hash collisions (identical 
hashes) searches get slower, degrading in the worst case to overall linear search,
which is also the case in any collection containing many elements with identical
lookup keys.

Internal Representation and Key Implementation Points.
  Instance Variables
  multiplier:  <SmallInteger> A cache of a scaled approximation of 
                  tally / (highestHash - lowestHash)
               Used in computing the ""good guess"" of initial search index
  					
  preShift:  <SmallInteger> If (targetHash - lowestHash) has more than 30 significant bits 
         (which is typical) this shift is used to discard the low-order bits, 
          to prevent overflow out of the SmallInteger range when multiplied by multiplier.
         Always non-positive.  Used in computing the ""good guess"" of initial search index

  postShift:  <SmallInteger> Since multiplier is scaled by a power of two to become
         an integer with a reasonable number of significant bits (typically 29 or 30), 
         this shift is used to undo that shift when computing the ""good guess"" index.

  tally:   <SmallInteger> How many child pointers I currently contain.
  
  lowestHash:  <SmallInteger> A cache of (self at: 1), the lowest hash value that any
  	      of my children can contain.

  Note that multiplier, preShift, postShift, and lowestHash are the most frequently 
  accessed named instvars, used for every search. Therefore, they, along with tally, which
  is accessed every time my size changes, are placed first in the object, so that
  they will be in the same cache line as the object header and not incur the cost of
  another cache line read on these operations.

  highestHash: 	<SmallInteger> A cache of    (self at: (tally * 2 + 1)) , 
               the highest hash value that any of my children can contain.

  collection	<TreeDictionary | TreeSet> The collection that I am part of.

  My size is the maximal odd number to fit in a GemStone 16KB page.
  Indexed instvars are alternating hash SmallIntegers and child pointers. 
  If tally is 5 (indicating there are five children) it will look like

    1    2    3    4   5   6    7    8   9   10  11  12...
  | H | C | H | C | H | C | H | C | H | C | H | nils...
  
  The lowestHash (@1) and highestHash (@11) values are redundant; they
  duplicate the hash values to the left and right of the child pointer 
  in the parent. Duplicating them here simplifies the search loop logic.
  
Implementation Points

  The main invariant is that all entries in the subtree of a child pointer
  will have a hash >= to the hash to the left of the child pointer, and a hash <= 
  the hash to the right of the child pointer. 
  Note that since there's an equal on both sides, the left and right hashes 
  may be the same. Hash tree collections support unlimited hash collisions, so this
  is necessary. If there are enough keys with identical hashes, the lowestHash 
  and highestHash of an internal node may be equal.
"
Class {
	#name : 'HtInternalNode',
	#superclass : 'Array',
	#instVars : [
		'multiplier',
		'preShift',
		'postShift',
		'tally',
		'lowestHash',
		'highestHash',
		'collection'
	],
	#category : 'HashTree-Core'
}

{ #category : 'instance creation' }
HtInternalNode class >> forCollection: aCollection [

	^ (self new: self nodeBasicSize)
		  initialize;
		  collection: aCollection;
		  yourself
]

{ #category : 'instance creation' }
HtInternalNode class >> nodeBasicSize [
	"Since named instvar structure is 
	| hash | child | hash | ... | hash | child | hash |
	we want an odd basicSize.
	Max size to fit in a page is 2034, including named instvers.
	I have seven named instvars, so max size is 2027"

	^ 2027
]

{ #category : 'walker access' }
HtInternalNode >> absorbLeftSibling: otherInternalNode [
	"Copy all elements in otherInternalNode to myself.
	otherInternalNode must be the sibling adjacent to my left
	(lower hash values).
	Sender is responsible for assuring that I have sufficient room.
	otherInternalNode is about to be discarded, don't bother to 
	remove elements from it."

	| otherTally roomNeeded |
	otherTally := otherInternalNode tally.
	roomNeeded := otherTally * 2.
"
	self maxHashIndex to: 1 by: -1 do: [ :i | 
		self at: i + roomNeeded put: (self at: i) ].
	self replaceFrom: 1 to: roomNeeded with: otherInternalNode startingAt: 1.
"
  self _insertAt: 1 from: otherInternalNode fromStart: 1 fromEnd: roomNeeded 
       numToMoveDown: self maxHashIndex .
	tally := tally + otherTally.
	lowestHash := otherInternalNode lowestHash.
	self computeConstants
]

{ #category : 'walker access' }
HtInternalNode >> absorbRightSibling: otherInternalNode [
	"Copy all elements in otherInternalNode to myself.
	otherInternalNode must be the sibling adjacent to my right
	(higher hash values).
	Sender is responsible for assuring that I have sufficient room.
	otherInternalNode is about to be discarded, don't bother to 
	remove elements from it."

	| otherTally newTally |
	otherTally := otherInternalNode tally.
	newTally := tally + otherTally.
	self
		replaceFrom: (tally + 1) * 2
		to: newTally * 2 + 1
		with: otherInternalNode
		startingAt: 2.
	highestHash := otherInternalNode highestHash.
	tally := newTally.
	self computeConstants
]

{ #category : 'accessing' }
HtInternalNode >> appendHash: hash [
	"Add the given hash value after my last entry,
	making it the highest hash for this node.
	Used in initializing new root nodes."

	self at: self highestHashIndex put: hash
]

{ #category : 'accessing' }
HtInternalNode >> appendSortedChild: childNode [
	"Add childNode and its lower hash as my new last entry.
	The sender is responsible for sending this message in sorted order.
	Used for initializing a new root node."

	| hashIndex |
	tally = 0 ifTrue: [ lowestHash := childNode lowestHash ].
	hashIndex := self highestHashIndex.
	self
		at: hashIndex put: childNode lowestHash;
		at: hashIndex + 1 put: childNode.
	tally := tally + 1
]

{ #category : 'auditing' }
HtInternalNode >> auditOnto: stream for: aCollection lowestHash: parentLow highestHash: parentHigh [
	"Report any problems onto stream.
	Answer the total number of key/value pairs found in my subtree."

	| identifier pairCount previousHash nextHash |
	identifier := 'InternalNode ' , self asOop printString , ' '.

	self _isLarge
		ifTrue: [ 
			stream
				nextPutAll: identifier , 'is a large object, should fit in one page.';
				lf ].

	highestHash = parentHigh
		ifFalse: [ 
			stream
				nextPutAll:
						identifier , 'highestHash should be equal to parent hash of '
								, parentHigh printString , ' but is ' , highestHash printString;
				lf ].

	lowestHash = parentLow
		ifFalse: [ 
			stream
				nextPutAll:
						identifier , 'lowestHash should be equal to parent hash of '
								, parentLow printString , ' but is ' , lowestHash printString;
				lf ].

	highestHash < lowestHash
		ifTrue: [ 
			stream
				nextPutAll:
						identifier , 'lowestHash ' , lowestHash printString
								, ' is greater than highestHash ' , highestHash printString;
				lf ].

	(multiplier class ~~ SmallInteger or: [ multiplier < 0 ])
		ifTrue: [ 
			stream
				nextPutAll:
						identifier , 'multiplier should be a non-negative SmallInteger, but is '
								, multiplier printString;
				lf ]
		ifFalse: [ 
			multiplier = 0
				ifFalse: [ 
					(multiplier highBit between: 29 and: 30)
						ifFalse: [ 
							stream
								nextPutAll:
										identifier , 'multiplier out of expected range, is ' , multiplier printString;
								lf ] ] ].

	postShift class ~~ SmallInteger
		ifTrue: [ 
			stream
				nextPutAll:
						identifier , 'postShift should be a SmallInteger, but is '
								, postShift printString;
				lf ]
		ifFalse: [ 
			postShift = 0
				ifFalse: [ 
					(postShift between: -58 and: -20)
						ifFalse: [ 
							stream
								nextPutAll:
										identifier , 'postShift should be between -58 and -20, but is '
												, postShift printString;
								lf ] ] ].

	preShift class ~~ SmallInteger
		ifTrue: [ 
			stream
				nextPutAll:
						identifier , 'preShift should be a SmallInteger, but is ' , preShift printString;
				lf ]
		ifFalse: [ 
			(preShift between: -30 and: 0)
				ifFalse: [ 
					stream
						nextPutAll:
								identifier , 'preShift should be between -30 and 0, but is '
										, preShift printString;
						lf ] ].

	tally class ~~ SmallInteger
		ifTrue: [ 
			stream
				nextPutAll:
						identifier , 'tally should be a SmallInteger, but is ' , tally printString;
				lf ]
		ifFalse: [ 
			(tally between: 2 and: 1013)
				ifFalse: [ 
					stream
						nextPutAll:
								identifier , 'tally should be between 2 and 1013, but is ' , tally printString;
						lf ] ].

	collection == aCollection
		ifFalse: [ 
			stream
				nextPutAll:
						identifier , 'collection is ' , collection printString , ' with oop '
								, collection asOop printString
								, ' should be the Collection I belong to';
				lf ].

	lowestHash = (self at: 1)
		ifFalse: [ 
			stream
				nextPutAll:
						identifier , 'lowestHash is ' , lowestHash printString
								, ', should be equal to first indexed instvar, which is '
								, (self at: 1) printString;
				lf ].

	pairCount := 0.
	previousHash := lowestHash.
	1 to: tally * 2 - 1 by: 2 do: [ :hashIndex | 
		| currentHash child |
		currentHash := self at: hashIndex.
		nextHash := self at: hashIndex + 2.
		currentHash >= previousHash
			ifFalse: [ 
				stream
					nextPutAll:
							identifier , 'Hash at ' , hashIndex printString , ' with value '
									, currentHash printString
									, ' is out of order, should be >= previous hash with value '
									, previousHash printString;
					lf ].
		nextHash >= currentHash
			ifFalse: [ 
				"This check is really only necessary for the highest hash."
				stream
					nextPutAll:
							identifier , 'Hash at ' , (hashIndex + 2) printString , ' with value '
									, nextHash printString
									, ' is out of order, should be >= previous hash with value '
									, currentHash printString;
					lf ].
		child := self at: hashIndex + 1.
		pairCount := pairCount
			+
				(child
					auditOnto: stream
					for: aCollection
					lowestHash: currentHash
					highestHash: nextHash).
		previousHash := currentHash ].

	highestHash = nextHash
		ifFalse: [ 
			stream
				nextPutAll:
						identifier , 'highestHash is ' , highestHash printString
								, ', should be equal to indexed instvar at tally * 2 + 1, which is '
								, nextHash printString;
				lf ].

	(tally + 1) * 2 to: self size do: [ :i | 
		| found |
		found := self at: i.
		found
			ifNotNil: [ 
				stream
					nextPutAll:
							identifier , 'indexed instvar at free index ' , i printString
									, ' should be nil, but is ' , found printString;
					lf ] ].

	^ pairCount
]

{ #category : 'walker access' }
HtInternalNode >> bulkCopyFrom: otherNode startIndex: startIndex endIndex: endIndex [
	"Should be able to use primitive for the actual copy."

	| myIndex |
	myIndex := 1.
	startIndex to: endIndex do: [ :theirIndex | 
		self at: myIndex put: (otherNode at: theirIndex).
		myIndex := myIndex + 1 ].
	lowestHash := self at: 1.
	highestHash := self at: myIndex - 1.
	tally := (myIndex - 1) // 2.
	self computeConstants
]

{ #category : 'private' }
HtInternalNode >> clearFrom: startIndex to: endIndex [

	startIndex to: endIndex do: [ :i | self at: i put: nil ]
]

{ #category : 'accessing' }
HtInternalNode >> collection: aCollection [
	collection := aCollection
]

{ #category : 'initializing' }
HtInternalNode >> computeConstants [
	"Set multiplier, preShift, and postShift, based on lowestHash, 
    highestHash, and tally, so that the expression in 
    lowestChildIndexForHash:

    ((hash - lowestHash bitShift: preShift) * multiplier bitShift: postShift)
		bitOr: 1
		
    closely approximates the ideal expression
    
    tally * (hash - lowestHash) // (highestHash - lowestHash) * 2 + 1
    
    but using only SmallInteger intermediate results, and avoiding
    division since it is expensive at the machine-instruction level."

	| range rangeShift shiftedRange rangeBits tallyBits tallyShift shiftedTally |
	range := highestHash - lowestHash.
	range = 0
		ifTrue: [ 
			multiplier := preShift := postShift := 0.	"All hashes in the node are the same, set up for linear search"
			^ self ].
	rangeShift := 30 - range highBit min: 0.	"-30 to 0, commonly -30 to ~ -20"
	shiftedRange := (range bitShift: rangeShift) + 1.	"+ 1 to make sure we round multiplier in the lower direction"
	rangeBits := shiftedRange highBit.	"2 to 31, 31 for root nodes, 30 for most others"
	tallyBits := tally highBit.	"2 to 10"
	tallyShift := rangeBits + 29 - tallyBits.	"21 to 58, commonly 49 to 58"
	shiftedTally := tally bitShift: tallyShift.	"highBit is 31 to 60, commonly 59 or 60"
	multiplier := shiftedTally // shiftedRange.	"highBit 29 or 30"
	preShift := 60 - multiplier highBit - range highBit min: 0.	"-30 to 0. Commonly -30 to ~ -20"
	postShift := rangeShift - preShift - tallyShift + 1. "either -tallyShift or -tallyShift + 1"
]

{ #category : 'copying' }
HtInternalNode >> copyForCollection: coll [
	| copy |
	copy := self shallowCopy.
	copy
		collection: coll;
		postCopyForCollection: coll.
	^ copy
]

{ #category : 'accessing' }
HtInternalNode >> highestHash [
	^highestHash
]

{ #category : 'accessing' }
HtInternalNode >> highestHash: aSmallInteger [
	highestHash := aSmallInteger
]

{ #category : 'private' }
HtInternalNode >> highestHashIndex [

	^ tally * 2 + 1
]

{ #category : 'initializing' }
HtInternalNode >> initialize [

	super initialize.
	tally := 0
]

{ #category : 'initializing' }
HtInternalNode >> initializeFromHashLimits [

	| highest range highBit ddivisor sshift |
	highest := self at: self highestHashIndex.
	range := highest - lowestHash.
	range = 0 ifTrue: [ "All hash values in node will be identical due to large
		numbers of collisions. Just set up for linear search."
		sshift := -60.
		ddivisor := 1.
		^ self ].
	highBit := range highBit. "range < 2 ** highBit"
	sshift := 60 - highBit min: 5.
	ddivisor := range bitShift: sshift - 5
]

{ #category : 'walker access' }
HtInternalNode >> insertChildNode: node atIndex: childIndex [

	| hashIndex nMove |
	hashIndex := childIndex - 1.
  nMove := self maxHashIndex - hashIndex + 1 .
  self _insertAt: hashIndex value: node lowestHash value: node numToMoveDown: nMove . 
	tally := tally + 1.
	self computeConstants
]

{ #category : 'testing' }
HtInternalNode >> isDegenerate [
	"An internal node, even if it's the root, must have at least two children."

	^ tally = 1
]

{ #category : 'testing' }
HtInternalNode >> isFull [

	"self basicSize // 2"

	^ tally >= 1013
]

{ #category : 'testing' }
HtInternalNode >> isLeaf [

	^ false
]

{ #category : 'walker access' }
HtInternalNode >> lowestChildIndexForHash: targetHash [
	"Where is the child pointer for my first child that could 
	have entries for the given hash?"

	| searchIndex foundHash |
	searchIndex := ((targetHash - lowestHash bitShift: preShift) * multiplier
		bitShift: postShift) bitOr: 1.
	foundHash := self at: searchIndex.
	foundHash < targetHash
		ifTrue: [ 
			| limit |
			"Search right"
			limit := self maxHashIndex.
			[ 
			searchIndex < limit
				and: [ 
					searchIndex := searchIndex + 2.
					foundHash := self at: searchIndex.
					foundHash < targetHash ] ] whileTrue: [  ].
			foundHash >= targetHash
				ifTrue: [ ^ searchIndex - 1 ].
			self error: 'Searching internal node for hash it cannot contain' ]
		ifFalse: [ 
			"Search left"
			[ 
			searchIndex > 1
				and: [ 
					searchIndex := searchIndex - 2.
					foundHash := self at: searchIndex.
					foundHash >= targetHash ] ] whileTrue: [  ].
			foundHash <= targetHash
				ifTrue: [ ^ searchIndex + 1 ].
			self error: 'Searching internal node for hash it cannot contain' ]
]

{ #category : 'accessing' }
HtInternalNode >> lowestHash [

	^ lowestHash
]

{ #category : 'private-indexing' }
HtInternalNode >> maxChildIndex [
	"Index for at: of the last child pointer I currently contain"

	^ tally * 2
]

{ #category : 'private-indexing' }
HtInternalNode >> maxHashIndex [
	"Index for at: of the last child pointer I currently contain"

	^ tally * 2 + 1
]

{ #category : 'accessing' }
HtInternalNode >> objectSecurityPolicy: anObjectSecurityPolicy [
	super objectSecurityPolicy: anObjectSecurityPolicy.
	2 to: tally * 2 do: [ :i | (self at: i) objectSecurityPolicy: anObjectSecurityPolicy ]
]

{ #category : 'accessing' }
HtInternalNode >> percentFull [

	^ tally * 100 // 1013
]

{ #category : 'copying' }
HtInternalNode >> postCopyForCollection: coll [
	2 to: tally * 2 by: 2 do: [ :i | self at: i put: ((self at: i) copyForCollection: coll) ]
]

{ #category : 'walker access' }
HtInternalNode >> removeLeftwardChildAtIndex: childIndex [
	"Remove the child at childIndex, and the key to its 'left' (lower index).
	Sender is responsible for ensuring that childIndex is valid and does not
	refer to the leftmost child (childIndex 2)."

	| lastIndex |
	tally := tally - 1.
	lastIndex := tally * 2 + 1.
	self
		replaceFrom: childIndex - 1
		to: lastIndex
		with: self
		startingAt: childIndex + 1.
	self
		at: lastIndex + 1 put: nil;
		at: lastIndex + 2 put: nil.
	self computeConstants
]

{ #category : 'walker access' }
HtInternalNode >> removeRightwardChildAtIndex: childIndex [
	"Remove the child at childIndex, and the key to its 'right' (higher index).
	Sender is responsible for ensuring that childIndex is valid and does not
	refer to the rightmost child (childIndex tally * 2)."

	| lastIndex |
	tally := tally - 1.
	lastIndex := tally * 2 + 1.
	self
		replaceFrom: childIndex
		to: lastIndex
		with: self
		startingAt: childIndex + 2.
	self
		at: lastIndex + 1 put: nil;
		at: lastIndex + 2 put: nil.
	self computeConstants
]

{ #category : 'accessing' }
HtInternalNode >> soleChild [
	^ self isDegenerate
		ifFalse: [ self error: 'I do not have a sole child.' ]
		ifTrue: [ self at: 2 ]
]

{ #category : 'walker access' }
HtInternalNode >> split [
	"Split into two equal-size nodes. The left-hand (lower hash) node will be me,
	and the right-hand node will be new. Answer the new node."

	| newNode firstHalfCount sharedHashIndex endIndex |
	newNode := self class forCollection: collection.
	firstHalfCount := tally bitShift: -1.

	sharedHashIndex := firstHalfCount * 2 + 1.
	endIndex := self maxHashIndex.

	newNode bulkCopyFrom: self startIndex: sharedHashIndex endIndex: endIndex.

	self clearFrom: sharedHashIndex + 1 to: endIndex.
	highestHash := self at: sharedHashIndex.
	tally := firstHalfCount.
	self computeConstants.
	^ newNode
]

{ #category : 'accessing' }
HtInternalNode >> tally [
	^tally
]
