C# Tips and Tricks

Caro-Kann · Dec 16, 2021

This is more of a C/C++ trick than a C# one, but it works basically the same and is what I've been doing to read in data for the Biq files:

Let's say you have a byte array "myArr" containing file data. You know that at offset 123456, there's a section header called "XMPL". This is 4 chars long, so it takes 4 bytes. The next thing in the file is an integer saying how many instances of XMPL there are. In this case, we'll say 200. Next, the actual XMPL data starts, which is always of the format: integer, integer, integer, float, short, short, integer, double, integer (36 bytes per instance). The slow and clunky way of reading in the data would be like this:

struct XMPL {
    public int myInt1;
    public int myInt2;
    public int myInt3;
    public float myFloat;
    public short myShort1;
    public short myShort2;
    public int myInt4;
    public double myDouble;
    public int myInt5;
}

// readInt32, readSingle, readInt16 and readDouble stand for helpers that
// decode one value of that type from myArr at the given byte offset.
int offset = 123456;
int xmplCount = readInt32(myArr, offset + 4); // the count follows the 4-byte "XMPL" tag
offset += 8;                                  // skip the tag and the count
XMPL[] Xmpl = new XMPL[xmplCount];
for (int i = 0; i < xmplCount; i++) {
    Xmpl[i].myInt1 = readInt32(myArr, offset);
    Xmpl[i].myInt2 = readInt32(myArr, offset + 4);
    Xmpl[i].myInt3 = readInt32(myArr, offset + 8);
    Xmpl[i].myFloat = readSingle(myArr, offset + 12);
    Xmpl[i].myShort1 = readInt16(myArr, offset + 16);
    Xmpl[i].myShort2 = readInt16(myArr, offset + 18);
    Xmpl[i].myInt4 = readInt32(myArr, offset + 20);
    Xmpl[i].myDouble = readDouble(myArr, offset + 24);
    Xmpl[i].myInt5 = readInt32(myArr, offset + 32);
    offset += 36; // size of one XMPL instance in the file
}

This code would run fine and work, but it's repetitive. Each member of XMPL had to be referenced twice, and offset values had to be kept track of manually. The code also contains many more branch operations, array accesses, and function calls than necessary. None of these are terribly expensive on their own, but when you need to read in a lot of data (either because you have one big file or many smaller ones), data reading can start to get bogged down. In comparison, this is the basic idea behind what I've been doing for reading in the Biq data:

// Pack = 1 removes padding, so sizeof(XMPL) is 36 and matches one instance in the file.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct XMPL {
    public int myInt1;
    public int myInt2;
    public int myInt3;
    public float myFloat;
    public short myShort1;
    public short myShort2;
    public int myInt4;
    public double myDouble;
    public int myInt5;
}

// Must run in an unsafe context (project compiled with unsafe code allowed).
int offset = 123456;
int xmplCount = readInt32(myArr, offset + 4);
XMPL[] Xmpl = new XMPL[xmplCount];
fixed (void* dataPtr = myArr, xmplPtr = Xmpl) {
    byte* dataStart = (byte*)dataPtr;
    int dataLength = xmplCount * sizeof(XMPL);
    Buffer.MemoryCopy(dataStart + offset + 8, xmplPtr, dataLength, dataLength);
}

This code produces exactly the same output as the previous example: a populated Xmpl array. It first fixes the myArr and Xmpl arrays in place, so that the garbage collector can't move them while we hold raw pointers into their memory. This is the main difference from doing this in C/C++, where no fixing is required. It makes the code a little more verbose (along with using Buffer.MemoryCopy instead of memcpy), but it's just one of the tradeoffs between languages (the main benefit of the C# way is that unsafe pointer logic is isolated).

Next, instead of copying the data one field at a time, or even one struct at a time, we can copy the entire chunk of data (all 200 instances of XMPL) at once. This works because the memory layout of the data in the file already matches the memory layout of the structs: same field order, same field sizes, and no padding between fields. Buffer.MemoryCopy takes 4 arguments. The first is the location of the data to be copied from. We take the address of the start of the myArr array, add the offset to that, and then in this example add another 8 to skip over the "XMPL" header and the 4-byte count integer. The next argument is the memory-copying target, xmplPtr. Arguments 3 and 4 are the size of the destination and the number of bytes to copy. Here both are the length of the data, which is the number of XMPL instances times the memory length of one XMPL instance. The latter can either be worked out by hand or obtained by simply taking sizeof(XMPL).
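To make the bulk copy concrete, here is a minimal self-contained sketch of the same idea run against a small fake section built in memory. The names XmplDemo, BuildSection and ReadBulk are made up for illustration and are not from the Biq code; the sketch also assumes a little-endian machine (so the file's byte order matches memory) and a project compiled with unsafe code allowed.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Pack = 1 removes padding so one struct is exactly 36 bytes, like one record in the data.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct XMPL
{
    public int myInt1;
    public int myInt2;
    public int myInt3;
    public float myFloat;
    public short myShort1;
    public short myShort2;
    public int myInt4;
    public double myDouble;
    public int myInt5;
}

public static class XmplDemo
{
    // Hypothetical helper: writes `offset` filler bytes, the "XMPL" tag,
    // the count, then `count` records. BinaryWriter always writes little-endian.
    public static byte[] BuildSection(int count, int offset)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(new byte[offset]);
            w.Write(Encoding.ASCII.GetBytes("XMPL"));
            w.Write(count);
            for (int i = 0; i < count; i++)
            {
                w.Write(i);                // myInt1
                w.Write(i * 2);            // myInt2
                w.Write(i * 3);            // myInt3
                w.Write(i + 0.5f);         // myFloat
                w.Write((short)(i + 1));   // myShort1
                w.Write((short)(i + 2));   // myShort2
                w.Write(-i);               // myInt4
                w.Write(i + 0.25);         // myDouble
                w.Write(1000 + i);         // myInt5
            }
            w.Flush();
            return ms.ToArray();
        }
    }

    // Reads every record of the section starting at `offset` with a single copy.
    public static unsafe XMPL[] ReadBulk(byte[] myArr, int offset)
    {
        int count = BitConverter.ToInt32(myArr, offset + 4); // skip the 4-byte tag
        long byteLength = (long)count * sizeof(XMPL);
        // Check bounds first: MemoryCopy does not validate the source length.
        if (count < 0 || offset + 8 + byteLength > myArr.Length)
            throw new InvalidDataException("XMPL section runs past the end of the data.");

        XMPL[] result = new XMPL[count];
        if (count == 0) return result;

        fixed (byte* src = myArr)   // pin both arrays while raw pointers are in use
        fixed (XMPL* dst = result)
        {
            Buffer.MemoryCopy(src + offset + 8, dst, byteLength, byteLength);
        }
        return result;
    }
}
```

Calling XmplDemo.ReadBulk(XmplDemo.BuildSection(3, 16), 16) should give back three structs whose fields match the values written by BuildSection. The bounds check is an addition of this sketch; with raw pointers nothing else stops a bad count from reading past the end of the array.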

There are many more caveats and useful tidbits about how this is extended further, but this is the basic idea. It is the ideal way to read in fixed-length structured data in the parts of the project where that is relevant. In future posts I'll try to also go through explanations of how this works with string data, flag bits, dynamic-length data, what the [StructLayout(LayoutKind.Sequential, Pack=1)] interop attribute means, and how this technique can be made even more efficient by skipping the memory copy.
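As a quick illustration of why the StructLayout attribute matters for this trick, the sketch below compares the size of the same nine fields with and without Pack = 1. The type names XmplDefault, XmplPacked and LayoutCheck are made up for this example. The default size is typically 40 on 64-bit .NET, because the double field makes the runtime round the struct up to a multiple of 8, while the data stores 36 bytes per instance.

```csharp
using System;
using System.Runtime.InteropServices;

// Default sequential layout: the runtime may add padding for alignment.
[StructLayout(LayoutKind.Sequential)]
public struct XmplDefault
{
    public int myInt1;
    public int myInt2;
    public int myInt3;
    public float myFloat;
    public short myShort1;
    public short myShort2;
    public int myInt4;
    public double myDouble;
    public int myInt5;
}

// Pack = 1: fields are laid out back to back with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct XmplPacked
{
    public int myInt1;
    public int myInt2;
    public int myInt3;
    public float myFloat;
    public short myShort1;
    public short myShort2;
    public int myInt4;
    public double myDouble;
    public int myInt5;
}

public static class LayoutCheck
{
    // Typically 40 on 64-bit .NET (36 bytes of fields plus 4 bytes of trailing padding).
    public static int DefaultSize => Marshal.SizeOf<XmplDefault>();

    // 36: exactly the sum of the field sizes.
    public static int PackedSize => Marshal.SizeOf<XmplPacked>();
}
```

If the struct were 40 bytes while the data held 36 bytes per instance, a bulk copy would shift every instance after the first by 4 more bytes than the one before it, so checking the size once at startup is a cheap safeguard.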
