Game Script Implementation Notes (Draft)

I’ve previously blogged about the “Shimlang” language. This is not a continuation of that design. Instead, this page solidifies some of the internal implementation details of the programming language.

A large amount of virtual memory (4 GB) will be dedicated to the interpreter. All memory operations are done relative to the base address of this memory chunk, so a dump of this memory should be sufficient to completely restore the state of a game written in the language. That includes loading the state from an earlier point in a gameplay session, which makes it easy to reproduce errors and tweak behaviour.

This memory is divided into 2^24 addressable 64-bit words, so a 24-bit index is enough to refer to any of them.
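As a minimal sketch of this memory model (in Rust, using a plain `Vec<u64>` in place of a reserved 4 GB virtual-memory region), every access goes through a word index relative to the start of the array rather than a native pointer. That is what lets a raw dump double as a save state. `VmMemory`, `read`, `write`, `dump` and `restore` are hypothetical names, and the little-endian byte order of the dump is an assumption.

```rust
/// Number of addressable 64-bit words: one per 24-bit index.
pub const WORD_COUNT: usize = 1 << 24;

/// Hypothetical VM heap: a flat array of 64-bit words. Accesses use a word
/// index relative to the start of the array, never a native pointer, so the
/// whole state can be saved and restored as raw bytes.
pub struct VmMemory {
    words: Vec<u64>,
}

impl VmMemory {
    /// Allocates `word_count` zeroed words (`WORD_COUNT` in a full VM;
    /// smaller sizes are convenient for tests).
    pub fn new(word_count: usize) -> Self {
        VmMemory { words: vec![0; word_count] }
    }

    pub fn read(&self, index: u32) -> u64 {
        self.words[index as usize]
    }

    pub fn write(&mut self, index: u32, value: u64) {
        self.words[index as usize] = value;
    }

    /// Serializes every word in little-endian order (assumed format).
    pub fn dump(&self) -> Vec<u8> {
        self.words.iter().flat_map(|w| w.to_le_bytes()).collect()
    }

    /// Rebuilds memory from bytes produced by `dump`.
    pub fn restore(bytes: &[u8]) -> Self {
        let words = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                u64::from_le_bytes(buf)
            })
            .collect();
        VmMemory { words }
    }
}
```

Because nothing in the dump depends on where the host process mapped the memory, a restored state behaves identically to the original.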

A ShimValue is one of:

Unit () - return value of functions that don’t return a value
Null - empty value of an optional
32-bit integer
32-bit float
Boolean
Set
Dict
List
Range
String (multiple internal representations?)

Why not have variable-length data?

Variable-length values would complicate lists, which store their elements in contiguous chunks of memory and need every element to be the same size. The same applies to objects, whose fields are expected to occupy a fixed number of slots.

Maybe in the future, if lists and objects are typed, their elements could be variably sized?
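To illustrate why fixed-size slots matter, here is a sketch of list indexing under an assumed layout (not specified in these notes): one header word holding the length, followed by one 64-bit ShimValue per element. With every element exactly one word wide, element `i` sits at a fixed offset. With variable-length values, finding it would mean walking every earlier element. The function names are hypothetical.

```rust
// Hypothetical list layout: memory[list_addr] holds the element count and
// the elements follow, one 64-bit ShimValue per word.

pub fn list_len(memory: &[u64], list_addr: u32) -> u64 {
    memory[list_addr as usize]
}

/// Element `i` is always one header word plus `i` words past the list start,
/// so it can be located in O(1) without scanning earlier elements.
pub fn list_element_index(list_addr: u32, i: u32) -> u32 {
    list_addr + 1 + i
}

/// Bounds-checked element read.
pub fn list_get(memory: &[u64], list_addr: u32, i: u32) -> Option<u64> {
    if (i as u64) < list_len(memory, list_addr) {
        Some(memory[list_element_index(list_addr, i) as usize])
    } else {
        None
    }
}
```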

ShimID

A ShimID is a u24 that points into the VM memory space. It can point to anything, whether that’s raw bytes (for lists/dicts/sets), types, bytecode, closures, etc.

These IDs absolutely cannot be used to write data to the VM memory space. Programs can’t read an ID’s numeric value either (outside of debugging), since we may want to support compaction, which requires updating IDs.
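A sketch of how a ShimID might be wrapped in Rust, which has no `u24`: a `u32` newtype that is range-checked on construction. Keeping the raw index `pub(crate)` and offering no write method mirrors the two restrictions above. `ShimId`, `new` and `index` are hypothetical names.

```rust
/// Hypothetical 24-bit ShimID stored in a u32 and checked on construction.
/// There is deliberately no method for writing through an ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ShimId(u32);

impl ShimId {
    /// Largest value that fits in 24 bits.
    pub const MAX: u32 = (1 << 24) - 1;

    /// Returns None if `raw` does not fit in 24 bits.
    pub fn new(raw: u32) -> Option<ShimId> {
        if raw <= Self::MAX {
            Some(ShimId(raw))
        } else {
            None
        }
    }

    /// Word index into VM memory. Crate-private so script-facing code cannot
    /// observe the number, leaving room for a compacting collector to
    /// renumber objects later.
    pub(crate) fn index(self) -> usize {
        self.0 as usize
    }
}
```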

ShimValue

These variants can fit where a ShimValue is expected. All of these values can be copied/moved without issue since nothing will point to the address of a ShimValue.

Tombstone:
0000_0000 0000_0000 0000_0000 0000_0000  0000_0000 0000_0000 0000_0000 0000_0000
A value that should never be readable by a program.
Dicts and sets use this null address to handle deletions more efficiently.

ID:
0000_0000 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Address of object
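The tombstone and ID layouts share the tag byte `0000_0000`, so the all-zero word is simply an ID whose address is 0, which means address 0 can never hold a real object. A minimal encode/decode sketch, assuming the unused X bits are written as zero (the notes leave them unspecified). The function and constant names are hypothetical.

```rust
// Hypothetical helpers for the tombstone and ID encodings.
const TAG_SHIFT: u32 = 56;
const ADDR_MASK: u64 = (1 << 24) - 1;

/// The all-zero word: tag 0x00 with address 0.
pub const TOMBSTONE: u64 = 0;

/// Encodes an object address as an ID: tag byte 0x00, address in the low
/// 24 bits. Address 0 is refused because it would collide with the tombstone.
pub fn encode_id(addr: u32) -> u64 {
    assert!(addr != 0, "address 0 is reserved for the tombstone");
    assert!((addr as u64) <= ADDR_MASK, "address does not fit in 24 bits");
    addr as u64
}

/// Returns the address if `value` is an ID. The tombstone also has tag 0 but
/// is rejected, since it should never be readable.
pub fn decode_id(value: u64) -> Option<u32> {
    if value >> TAG_SHIFT != 0 || value == TOMBSTONE {
        return None;
    }
    Some((value & ADDR_MASK) as u32)
}
```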

Null:
1111_1111 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX XXXX_XXXX XXXX_XXXX XXXX_XXXX

Unit:
0000_0001 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX XXXX_XXXX XXXX_XXXX XXXX_XXXX
Data: Reserved for future use?

Boolean:
0000_0010 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX XXXX_XXXX XXXX_XXXX XXXX_XXXY
Data: Only one bit used

i32:
0000_0011 XXXX_XXXX XXXX_XXXX XXXX_XXXX  YYYY_YYYY YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Only 32 bits used
Maybe every int could be a bigint if a high bit is set?

f32:
0000_0100 XXXX_XXXX XXXX_XXXX XXXX_XXXX  YYYY_YYYY YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Only 32 bits used
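The scalar variants (null, unit, boolean, i32, f32) all keep their tag in the top byte and their payload, if any, in the low bits. A minimal sketch of encoding and decoding them, assuming the unused X bits are zero; the helper names are hypothetical. The i32 goes through `u32` so that negative numbers are zero-extended rather than sign-extended into the tag byte, and the f32 is stored as its IEEE 754 bit pattern via `f32::to_bits`.

```rust
// Hypothetical helpers; tag values are taken from the layouts above and
// unused X bits are assumed to be zero.
const TAG_SHIFT: u32 = 56;
const TAG_NULL: u64 = 0xFF;
const TAG_UNIT: u64 = 0x01;
const TAG_BOOL: u64 = 0x02;
const TAG_I32: u64 = 0x03;
const TAG_F32: u64 = 0x04;

fn tag(v: u64) -> u64 {
    v >> TAG_SHIFT
}

pub fn encode_null() -> u64 {
    TAG_NULL << TAG_SHIFT
}

pub fn encode_unit() -> u64 {
    TAG_UNIT << TAG_SHIFT
}

pub fn encode_bool(b: bool) -> u64 {
    (TAG_BOOL << TAG_SHIFT) | b as u64
}

pub fn encode_i32(i: i32) -> u64 {
    // `as u32` first so the upper 32 bits stay zero for negative values.
    (TAG_I32 << TAG_SHIFT) | (i as u32 as u64)
}

pub fn encode_f32(f: f32) -> u64 {
    (TAG_F32 << TAG_SHIFT) | f.to_bits() as u64
}

pub fn decode_bool(v: u64) -> Option<bool> {
    (tag(v) == TAG_BOOL).then(|| v & 1 == 1)
}

pub fn decode_i32(v: u64) -> Option<i32> {
    (tag(v) == TAG_I32).then(|| v as u32 as i32)
}

pub fn decode_f32(v: u64) -> Option<f32> {
    (tag(v) == TAG_F32).then(|| f32::from_bits(v as u32))
}
```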

set:
0000_0101 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Address of set

dict:
0000_0110 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Address of dict

list:
0000_0111 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Address of list

range:
0000_1000 YYYY_YYYY YYYY_YYYY YYYY_YYYY  YYYY_ZZZZ ZZZZ_ZZZZ ZZZZ_ZZZZ ZZZZ_ZZZZ
Maybe flags could make this a 0..n or 1..n or address range?
Each extent is limited to an i28, since the 56 bits after the tag are split
evenly between the two extents (Y and Z). If a larger range is ever needed, it
has to be handled without the convenience of a range.
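Following the range layout above, the 56 bits after the tag give each extent 28 bits. A sketch of packing and unpacking, assuming Y holds the start, Z holds the end, and both are signed two's-complement values; out-of-range extents are rejected rather than silently truncated. All names are hypothetical.

```rust
// Hypothetical range packing: tag in bits 56..64, start (Y) in bits 28..56,
// end (Z) in bits 0..28, both signed 28-bit values.
const TAG_RANGE: u64 = 0x08;
const FIELD_BITS: u32 = 28;
const FIELD_MASK: u64 = (1 << FIELD_BITS) - 1;
const FIELD_MIN: i32 = -(1 << (FIELD_BITS - 1));
const FIELD_MAX: i32 = (1 << (FIELD_BITS - 1)) - 1;

pub fn encode_range(start: i32, end: i32) -> Option<u64> {
    let fits = |x: i32| (FIELD_MIN..=FIELD_MAX).contains(&x);
    if !fits(start) || !fits(end) {
        return None;
    }
    // Keep only the low 28 bits of each two's-complement value.
    let start_bits = start as u32 as u64 & FIELD_MASK;
    let end_bits = end as u32 as u64 & FIELD_MASK;
    Some((TAG_RANGE << 56) | (start_bits << FIELD_BITS) | end_bits)
}

/// Sign-extends a 28-bit field: shift it to the top of an i32, then
/// arithmetic-shift it back down.
fn sign_extend(bits: u64) -> i32 {
    ((bits as u32) << (32 - FIELD_BITS)) as i32 >> (32 - FIELD_BITS)
}

pub fn decode_range(v: u64) -> Option<(i32, i32)> {
    if v >> 56 != TAG_RANGE {
        return None;
    }
    let start = sign_extend((v >> FIELD_BITS) & FIELD_MASK);
    let end = sign_extend(v & FIELD_MASK);
    Some((start, end))
}
```

Each extent therefore spans -134,217,728 to 134,217,727.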

string:
0000_1001 XXXX_XXXX XXXX_XXXX XXXX_XXXX  XXXX_XXXX YYYY_YYYY YYYY_YYYY YYYY_YYYY
Data: Address of string
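With the tag in the top byte, an interpreter can dispatch on `v >> 56` alone. A sketch of a full decoder, assuming set, dict and list store their address in the low 24 bits the way ID and string do, and returning the range payload undecoded to keep the example short. `Decoded` and `decode` are hypothetical names.

```rust
// Hypothetical tag-dispatch decoder for a 64-bit ShimValue word.
const ADDR_MASK: u64 = (1 << 24) - 1;

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Tombstone,
    Id(u32),
    Null,
    Unit,
    Bool(bool),
    I32(i32),
    F32(f32),
    Set(u32),
    Dict(u32),
    List(u32),
    Range(u64), // raw 56-bit payload, left packed
    Str(u32),
}

pub fn decode(v: u64) -> Option<Decoded> {
    // Address is assumed to be in the low 24 bits for every pointer variant.
    let addr = (v & ADDR_MASK) as u32;
    Some(match v >> 56 {
        0x00 if v == 0 => Decoded::Tombstone,
        0x00 => Decoded::Id(addr),
        0xFF => Decoded::Null,
        0x01 => Decoded::Unit,
        0x02 => Decoded::Bool(v & 1 == 1),
        0x03 => Decoded::I32(v as u32 as i32),
        0x04 => Decoded::F32(f32::from_bits(v as u32)),
        0x05 => Decoded::Set(addr),
        0x06 => Decoded::Dict(addr),
        0x07 => Decoded::List(addr),
        0x08 => Decoded::Range(v & ((1u64 << 56) - 1)),
        0x09 => Decoded::Str(addr),
        _ => return None, // tag not assigned in these notes
    })
}
```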

Here’s how those data types are defined:

string: A 32-bit length followed by that number of bytes
TODO: String interning, more efficient slices
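A sketch of that string layout, written into a byte buffer for simplicity (the VM memory itself is word-addressed). The little-endian length and the assumption that string contents are UTF-8 are not specified in these notes, and `write_string`/`read_string` are hypothetical names.

```rust
/// Appends a length-prefixed string (32-bit little-endian length, then the
/// bytes) and returns the offset where it starts.
pub fn write_string(buf: &mut Vec<u8>, s: &str) -> usize {
    assert!(s.len() <= u32::MAX as usize, "string too long for a 32-bit length");
    let offset = buf.len();
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
    offset
}

/// Reads the string at `offset`, or None if the header or bytes run past the
/// end of the buffer or the bytes are not valid UTF-8 (assumed encoding).
pub fn read_string(buf: &[u8], offset: usize) -> Option<&str> {
    let header = buf.get(offset..offset + 4)?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let bytes = buf.get(offset + 4..offset + 4 + len)?;
    std::str::from_utf8(bytes).ok()
}
```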

range: Two 32-bit integers

dict: Uses the ordered dict design described here: https://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html
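The linked post describes a compact ordered dict: a sparse hash table of small indices pointing into a dense, insertion-ordered array of entries. Below is a minimal sketch of that idea, assuming `u64` keys and values, linear probing for simplicity, and a placeholder hash function. `OrderedDict` and its methods are hypothetical. Deletion marks the index slot `DELETED` and clears the entry, so later probes still walk past it, much like the tombstone described earlier.

```rust
const EMPTY: i64 = -1;
const DELETED: i64 = -2;

/// Hypothetical compact ordered dict with u64 keys and values.
pub struct OrderedDict {
    indices: Vec<i64>,                // sparse: EMPTY, DELETED, or an index into `entries`
    entries: Vec<Option<(u64, u64)>>, // dense, insertion order; None = removed entry
    len: usize,
}

impl OrderedDict {
    pub fn new() -> Self {
        OrderedDict { indices: vec![EMPTY; 8], entries: Vec::new(), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    fn hash(key: u64) -> usize {
        // Placeholder multiplicative hash; a real VM would hash ShimValues.
        (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize
    }

    /// Returns (slot holding `key`, true) or (first reusable slot on its probe path, false).
    fn find_slot(&self, key: u64) -> (usize, bool) {
        let mask = self.indices.len() - 1; // table size is always a power of two
        let mut slot = Self::hash(key) & mask;
        let mut first_free: Option<usize> = None;
        loop {
            match self.indices[slot] {
                EMPTY => return (first_free.unwrap_or(slot), false),
                DELETED => {
                    if first_free.is_none() {
                        first_free = Some(slot);
                    }
                }
                i => {
                    if let Some((k, _)) = self.entries[i as usize] {
                        if k == key {
                            return (slot, true);
                        }
                    }
                }
            }
            slot = (slot + 1) & mask;
        }
    }

    pub fn insert(&mut self, key: u64, value: u64) {
        let (slot, found) = self.find_slot(key);
        if found {
            // Updating an existing key keeps its original position.
            let i = self.indices[slot] as usize;
            self.entries[i] = Some((key, value));
            return;
        }
        self.indices[slot] = self.entries.len() as i64;
        self.entries.push(Some((key, value)));
        self.len += 1;
        // Keep the sparse table at most 2/3 used so probing always finds EMPTY.
        if self.entries.len() * 3 >= self.indices.len() * 2 {
            self.rebuild();
        }
    }

    pub fn get(&self, key: u64) -> Option<u64> {
        let (slot, found) = self.find_slot(key);
        if !found {
            return None;
        }
        self.entries[self.indices[slot] as usize].map(|(_, v)| v)
    }

    pub fn remove(&mut self, key: u64) -> Option<u64> {
        let (slot, found) = self.find_slot(key);
        if !found {
            return None;
        }
        let i = self.indices[slot] as usize;
        self.indices[slot] = DELETED;
        self.len -= 1;
        self.entries[i].take().map(|(_, v)| v)
    }

    /// Keys in insertion order, skipping removed entries.
    pub fn keys(&self) -> Vec<u64> {
        self.entries.iter().flatten().map(|(k, _)| *k).collect()
    }

    /// Drops removed entries (preserving order) and rehashes into a larger table.
    fn rebuild(&mut self) {
        let live: Vec<(u64, u64)> = self.entries.drain(..).flatten().collect();
        let mut size = 8;
        while live.len() * 3 >= size {
            size *= 2;
        }
        self.indices = vec![EMPTY; size];
        for (key, value) in live {
            let (slot, _) = self.find_slot(key);
            self.indices[slot] = self.entries.len() as i64;
            self.entries.push(Some((key, value)));
        }
    }
}
```

Iteration only touches the dense `entries` array, which is where the design gets both its insertion ordering and its memory savings.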
