Proposal for a Network of Ideas
Jonathan Riehl
The Wild Idea Preserve

        Using this format, UIDs are limited to the set of 32-bit unsigned integers.  The value of these integers is unambiguous for any given database using data files it created.  The primary issue with this format is how vectors of UIDs are stored.  The easiest solution would be to use the description column in the atoms table to hold arrays of 32-bit integers, but this would inflate each 7-bit ASCII character in a description from one byte to four.  To constrain memory consumption, UIDs are therefore further restricted to 31-bit integers, which frees the most significant bit to serve as a length flag.  The binary data held in the description column is then decoded according to the most significant bit of the first byte of each UID reference.  If that bit is zero (0), the UID reference is a one-byte, 7-bit ASCII reference.  If that bit is one (1), the UID reference is a four-byte, 31-bit unsigned integer stored most significant byte first, so that the flag bit falls in the first byte and the least significant byte is the last eight bits of the reference.  The flag bit, which is the most significant of all 32 bits, is ignored or masked off when the value is moved from 31-bit to 32-bit storage.  The following diagram illustrates the binary storage format of the description column:
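        As a rough illustration of the encoding described above, the following Python sketch packs and unpacks a vector of UIDs.  It is not taken from the proposal's implementation: the names encode_uids and decode_uids are hypothetical, and it rests on two assumptions, namely that a UID below 128 is written as the single byte of the same value, and that the four-byte form places its flag bit in the first byte by storing the most significant byte first.

```python
import struct

FLAG = 0x80000000     # most significant bit of the 32-bit word (length flag)
MAX_UID = 0x7FFFFFFF  # largest value that fits in 31 bits


def encode_uids(uids):
    """Pack UIDs into the mixed one-byte / four-byte form (assumed layout)."""
    out = bytearray()
    for uid in uids:
        if not 0 <= uid <= MAX_UID:
            raise ValueError(f"UID out of 31-bit range: {uid}")
        if uid < 0x80:
            # Assumption: small UIDs are stored as one 7-bit byte.
            out.append(uid)
        else:
            # Set the flag bit and store the most significant byte first.
            out += struct.pack(">I", uid | FLAG)
    return bytes(out)


def decode_uids(data):
    """Recover the list of UIDs from a description column value."""
    uids = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # Flag bit set: this reference occupies four bytes.
            if i + 4 > len(data):
                raise ValueError("truncated four-byte UID reference")
            (word,) = struct.unpack_from(">I", data, i)
            uids.append(word & MAX_UID)  # mask off the flag bit
            i += 4
        else:
            # Flag bit clear: a one-byte, 7-bit reference.
            uids.append(data[i])
            i += 1
    return uids
```

        Under these assumptions, the two characters "Hi" followed by UID 1000 occupy six bytes, where an array of plain 32-bit integers would need twelve.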