Just starting to take some notes on the file format; this is by no means complete, but I figure it's better to have this stuff written down somewhere.
The mod tools refer to the overall objects these files contain (the master objects that contain everything else and are basically equivalent to the files themselves) as TransferAgents.
The header is always the same, regardless of which version of the editor I use. Here are the opening 48 bytes, in hexadecimal:
27 00 00 00 44 69 76 65 72 73 69 6f 6e 73 20 45 6e 74 65 72 74 61 69 6e 6d 65 6e 74 20 4f 62 6a 65 63 74 73 20 46 69 6c 65 32 00 02 00 00 00 00
Here's my best guess for what it means:
|0x00-0x03||Appears to be a 4-byte little-endian integer that matches the length of the following string (0x27 = 39 characters, counting the null terminator).|
|0x04-0x2a||The null-terminated string “Diversions Entertainment Objects File2”.|
|0x2b||This byte's value is 0x02. It doesn't appear to be the length of anything; my best guess is a version marker, but since the header never changes…|
|0x2c-0x2f||Null. Since four null bytes (rather than three) follow the 0x02 at 0x2b, that byte is unlikely to start a 4-byte integer (although it could be one, followed by a single null byte as a separator?).|
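A minimal Python sketch of splitting the 48 header bytes into the fields guessed at above. It assumes the integers are little-endian (consistent with 27 00 00 00 meaning 39); the function and key names (`parse_header`, `marker`, `padding`) are hypothetical labels, not anything the mod tools use.

```python
import struct

# The opening 48 bytes quoted above.
HEADER_HEX = (
    "27 00 00 00 44 69 76 65 72 73 69 6f 6e 73 20 45 6e 74 65 72 74 61 69 6e "
    "6d 65 6e 74 20 4f 62 6a 65 63 74 73 20 46 69 6c 65 32 00 02 00 00 00 00"
)


def parse_header(data: bytes) -> dict:
    """Split the fixed 48-byte header into the guessed fields."""
    if len(data) < 0x30:
        raise ValueError("header is shorter than 48 bytes")
    # 0x00-0x03: length of the magic string, null terminator included (assumed little-endian).
    (magic_len,) = struct.unpack_from("<I", data, 0x00)
    end = 0x04 + magic_len  # 0x2b for the known magic string
    if end >= 0x30:
        raise ValueError("magic string length doesn't fit in the 48-byte header")
    magic = data[0x04:end]
    if not magic.endswith(b"\x00"):
        raise ValueError("magic string is not null-terminated")
    return {
        "magic": magic[:-1].decode("ascii"),
        "marker": data[end],            # 0x02 in every file seen so far
        "padding": data[end + 1:0x30],  # four null bytes
    }


header = parse_header(bytes.fromhex(HEADER_HEX))
```

On a real file, passing the first 0x30 bytes read in binary mode should behave the same way; any `ValueError` would mean the file doesn't match this guessed layout.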
After that, we get into data that doesn't stay constant, but the stuff immediately following tends to be pretty similar:
|0x30-0x33||Appears to be the length of the “object directory” at the top of the file. The DEObjects file first lists every object it contains and what type it is, then follows that up with the actual contents of those objects. The latter section seems to start exactly this number of bytes later (or this points to the end of the file, if it contains nothing).|
|0x34||Always seems to be 0x09; no idea what it means.|
|0x35||This is the first occurrence of a variable-sized length value. If the length is small enough to fit in one byte (254 or less), this is a single-byte integer. Otherwise, this is 0xFF and the following four bytes are a 4-byte integer. Whether it takes up 1 byte or 5, this appears to be identical to the value at 0x30, just minus the intervening bytes. The format appears to be full of redundancies like this.|
(if 0x35 was 0xFF)
|0x36-0x39||4-byte integer length value as described in the previous entry.|
|0x36/0x3a||Always seems to be 0x04.|
|0x37/0x3b||Probably another variable-size length value, but confirming this would require naming a TransferAgent something ridiculously long. Seems to measure the length of the name, plus the length of the TransferAgent's type identifier, plus some… extra data.|
|0x38-0x3b/0x3c-0x3f||4-byte length; measures the length of the TransferAgent's name (plus null character).|
|0x3c/0x40||First character of the TransferAgent name.|
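The 1/5-byte length values and the fields from 0x30 up to the end of the TransferAgent name can be read with a small cursor. This is a sketch under the assumptions above: little-endian integers, the 0x09 and 0x04 bytes treated as opaque markers, and names decoded as Latin-1 because the actual code page is unknown. `Reader` and `parse_preamble` are hypothetical names.

```python
import struct


class Reader:
    """Minimal cursor over little-endian binary data (hypothetical helper)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError(f"need {n} bytes at offset {self.pos:#x}")
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u32(self) -> int:
        return struct.unpack("<I", self.take(4))[0]

    def varint(self) -> int:
        # 1/5-byte length: one byte for 0-254, or 0xFF followed by a 4-byte integer.
        first = self.u8()
        return self.u32() if first == 0xFF else first

    def cstring(self) -> str:
        # 4-byte length (null terminator included), then the string itself.
        raw = self.take(self.u32())
        # Latin-1 is an assumed code page; it decodes any byte without failing.
        return raw.split(b"\x00", 1)[0].decode("latin-1")


def parse_preamble(data: bytes) -> dict:
    """Read from 0x30 up to the end of the TransferAgent name."""
    r = Reader(data, 0x30)
    fields = {}
    fields["directory_len"] = r.u32()       # 0x30-0x33
    fields["tag_09"] = r.u8()               # 0x34, always 0x09 so far
    fields["directory_len_2"] = r.varint()  # 0x35, redundant copy of 0x30
    fields["tag_04"] = r.u8()               # 0x36/0x3a, always 0x04 so far
    fields["agent_len"] = r.varint()        # 0x37/0x3b, name + type + extra data
    fields["name"] = r.cstring()            # length at 0x38/0x3c, text from 0x3c/0x40
    fields["end"] = r.pos
    return fields
```

With single-byte length values the name starts at 0x3c, and with a 5-byte value at 0x35 it starts at 0x40, matching the "/" addresses in the table.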
At this point, trying to list addresses becomes even more meaningless, due to variable lengths, so I'll just list things as I understand them.
|String||The rest of the TransferAgent name, including null character.|
|4-byte integer||The length of the TransferAgent's type identifier.|
|String||The TransferAgent's type identifier (again, null-terminated). This appears to be “TransferAgent\Win\File” prior to some version between v2180 and v2194, when it became “TransferAgent\File”. The v2172 executable seems to read the file either way, so this type doesn't seem to be particularly rigorously enforced.|
|16 bytes||The next 16 bytes appear to be a GUID for identifying this TransferAgent. For instance, a file I created (named “MK17Pilot”) has the following bytes in this section:
22 00 9A C8 CC DF DA 43 A2 C1 94 BC 3B AE 4B 1D
If I then look at the PhoenixRegistry entry governing whether or not the editor is allowed to save that file, it stores it in a bool named “MK17Pilot_22009ac8-ccdf-da43-a2c194bc3bae4b1d”, which, yes, matches that byte sequence.|
|9 bytes||According to the length value at 0x37/0x3b, the next 9 bytes are also related to the TransferAgent somehow. I'm unsure how; in my aforementioned “MK17Pilot” test file, they're all null. Likewise in some empty files I created in multiple different versions of the editor. In v2172's Pilots.DEObjects file, they go “00 04 CD CD CD 00 00 00 00”, and I have no idea what that's supposed to mean.|
|1 byte||In every file I've checked, the next byte is always 0x04.|
|4 bytes||The next four bytes serve an unknown purpose; in every editor-created file, they're null. In Pilots.DEObjects, they're “CD CD CD CD”.|
|1 byte||Always seems to be 0x01.|
|1/5-byte integer||Measures the length of the upcoming blob of data.|
|Blob of data||No idea what purpose it serves.|
|4 bytes||Seems to be a 4-byte integer, although it doesn't appear to be the length of anything. In my “MK17Pilot” test file, it's 0x04. In Pilots.DEObjects, it's 0x14A (330), which happens to be the address at which it's found. In Desert.DEObjects, it's 0x2D6 (726), while it's found at address 0x33E (830). Jumping ahead that many bytes lands in the middle of other data, so I have no idea what it's measuring.|
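The GUID-to-registry-key mapping and the fields after the type identifier can be sketched as follows. `registry_guid` reproduces the 8-4-4-16 grouping seen in the PhoenixRegistry key, with the bytes printed in file order. `parse_agent_tail` just walks the fields in the order listed above; the field names are hypothetical, and the unknown bytes are returned untouched rather than interpreted.

```python
import struct


def registry_guid(raw: bytes) -> str:
    """Format 16 GUID bytes like the PhoenixRegistry key: file order, 8-4-4-16 grouping."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    h = raw.hex()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:]}"


def read_varint(data: bytes, pos: int):
    """1/5-byte length: returns (value, position just after the field)."""
    first = data[pos]
    if first != 0xFF:
        return first, pos + 1
    return struct.unpack_from("<I", data, pos + 1)[0], pos + 5


def parse_agent_tail(data: bytes, pos: int) -> dict:
    """Parse from the GUID (right after the type identifier) through the mystery integer."""
    out = {"guid": registry_guid(data[pos:pos + 16])}
    pos += 16
    out["unknown_9"] = data[pos:pos + 9]   # null in editor-created files
    pos += 9
    out["tag_04"] = data[pos]              # 0x04 so far
    pos += 1
    out["unknown_4"] = data[pos:pos + 4]   # null, or CD CD CD CD in Pilots.DEObjects
    pos += 4
    out["tag_01"] = data[pos]              # 0x01 so far
    pos += 1
    blob_len, pos = read_varint(data, pos)
    out["blob"] = data[pos:pos + blob_len]
    pos += blob_len
    out["mystery"] = struct.unpack_from("<I", data, pos)[0]  # 0x04, 0x14A, 0x2D6 so far
    pos += 4
    out["end"] = pos
    return out
```

For comparison, Python's `uuid.UUID(bytes_le=...)`, which follows the byte-swapped Windows GUID layout, would render the first group of the MK17Pilot GUID as `c89a0022`, so the registry key doesn't appear to use that layout. One possibly relevant detail: 0xCD is the value the Microsoft Visual C++ debug heap uses to fill newly allocated, uninitialized memory. The “CD CD CD” and “CD CD CD CD” runs in Pilots.DEObjects may therefore be uninitialized bytes rather than meaningful values, though that's only a guess.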
At this point we seem to hit the directory listing for every object in the file. Each listing follows this format:
|1 byte||Seems to always be 0x04.|
|1/5-byte integer||At least, I'm assuming this is a variable-length integer; I have yet to encounter a directory entry long enough to need 255 bytes. Measures the length of this entry, not counting the content address.|
|4-byte integer||Length of the name of this object.|
|String||Null-terminated name of this object.|
|4-byte integer||Length of the type of this object.|
|String||Null-terminated type of this object.|
|16 bytes||GUID to identify this object.|
|1 byte||Seems to always be null.|
|4-byte integer||Not sure what this number means. Low values in my MK17Pilot test file (between 5 and 10), high values (in the millions) in DE-created files.|
|4 bytes||Seem to always be null.|
|1 byte||Seems to always be 0x04.|
|4 bytes||Absolute byte offset for locating the content of this object. Tested with several objects in both self-created and DE-created files; seems reliable.|
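A sketch of reading one directory listing in the field order above, again assuming little-endian integers and Latin-1 names. `Reader` and `parse_directory_entry` are hypothetical helpers. The entry length is only read, not checked, since it's not yet clear exactly which bytes it covers.

```python
import struct


class Reader:
    """Minimal cursor over little-endian binary data (hypothetical helper)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError(f"need {n} bytes at offset {self.pos:#x}")
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u32(self) -> int:
        return struct.unpack("<I", self.take(4))[0]

    def varint(self) -> int:
        # One byte for 0-254, or 0xFF followed by a 4-byte integer (assumed).
        first = self.u8()
        return self.u32() if first == 0xFF else first

    def cstring(self) -> str:
        # 4-byte length (null included), then the string; Latin-1 is an assumption.
        return self.take(self.u32()).split(b"\x00", 1)[0].decode("latin-1")


def parse_directory_entry(r: Reader) -> dict:
    """Read one directory listing, leaving the cursor at the next listing."""
    entry = {}
    entry["tag_04a"] = r.u8()          # always 0x04 so far
    entry["entry_len"] = r.varint()    # excludes the content address
    entry["name"] = r.cstring()
    entry["type"] = r.cstring()
    entry["guid"] = r.take(16)
    entry["null_1"] = r.u8()           # always null so far
    entry["unknown"] = r.u32()         # 5-10 in editor files, millions in DE files
    entry["null_4"] = r.take(4)        # always null so far
    entry["tag_04b"] = r.u8()          # always 0x04 so far
    entry["content_offset"] = r.u32()  # absolute offset from the start of the file
    return entry
```

Because the content offset is absolute, `data[entry["content_offset"]:]` should start at that object's content regardless of where the directory listing itself sits.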
The last piece of information I've successfully figured out is that, at some point, the editor started saving ValueStrings in a different format from the one v2172 expects (this is why the biographies of custom pilots won't show up in v2172, by the way). In the “content” section of the file, strings start out like this:
|4-byte integer||Length of the total content.|
|1 byte||The value 0x01.|
|Variable-length integer||A single byte if the value is between 0 and 254; otherwise, 0xFF followed by a 4-byte integer. This is the length of the string's name.|
|4-byte integer||The length of the string's name, again; it differs from the previous value only in that it doesn't count its own four bytes (this sort of redundancy seems to happen a lot).|
|String||Name of the ValueString (e.g. “PilotDescription” for a pilot biography), terminated by a null character.|
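A sketch of reading the start of a ValueString's content, using the redundancy described above as a sanity check: the variable-length value should equal the 4-byte name length plus 4. `Reader` and `parse_valuestring_header` are hypothetical names, and Latin-1 decoding of the name is an assumption.

```python
import struct


class Reader:
    """Minimal little-endian cursor (hypothetical helper)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError(f"need {n} bytes at offset {self.pos:#x}")
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u32(self) -> int:
        return struct.unpack("<I", self.take(4))[0]

    def varint(self) -> int:
        # One byte for 0-254, or 0xFF followed by a 4-byte integer.
        first = self.u8()
        return self.u32() if first == 0xFF else first


def parse_valuestring_header(r: Reader) -> dict:
    """Read a ValueString's content up to, but not including, the type byte."""
    total_len = r.u32()          # length of the total content
    tag = r.u8()                 # 0x01 so far
    name_len_outer = r.varint()  # counts the 4-byte field that follows
    name_len = r.u32()           # name length, null terminator included
    # A mismatch here would suggest the redundancy guess is wrong.
    if name_len_outer != name_len + 4:
        raise ValueError(f"name lengths disagree: {name_len_outer} vs {name_len} + 4")
    name = r.take(name_len).split(b"\x00", 1)[0].decode("latin-1")
    return {"total_len": total_len, "tag_01": tag, "name": name}
```

After this returns, the cursor sits on the byte where type 2 and type 3 strings diverge.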
This is where they diverge. In Pilots.DEObjects, the next byte reads 0x02. Anything I make in the editor instead reads 0x03 afterwards. As such, I tend to call them “type 2” versus “type 3” strings.
After that, we get another double-length (a variable-length integer for length followed by a 4-byte integer with the exact same value, minus 4 for not including itself), and then what happens next depends on the type of string. If it's a “type 2” string, you get a perfectly ordinary sequence of characters ending in a null terminator.
If it's a “type 3” string, you instead get yet another length value (measuring the string in characters instead of bytes), and each character is separated by a null byte for unknown reasons (maybe they're 16-bit characters, intended for Unicode support?). Additionally, the editor seems to allocate a larger buffer than it needs to hold the string: the null terminator (itself surrounded by null byte separators) is followed by random junk data that happened to be in memory (again separated with null bytes), up until the “actual” length of the object.
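A sketch of decoding the value part of both string types, starting at the 0x02/0x03 byte. Several details are assumptions, not confirmed:

- The 4-byte length covers the rest of the value, junk included.
- The type 3 character count is a 4-byte integer.
- Type 3 characters are UTF-16LE. An ASCII character followed by a null byte is exactly what UTF-16LE produces.
- Type 2 text is Latin-1.

Decoding stops at the first null character, so the trailing junk in type 3 strings is ignored. `parse_valuestring_value` is a hypothetical name.

```python
import struct


class Reader:
    """Minimal little-endian cursor (hypothetical helper)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError(f"need {n} bytes at offset {self.pos:#x}")
        self.pos += n
        return chunk

    def u8(self) -> int:
        return self.take(1)[0]

    def u32(self) -> int:
        return struct.unpack("<I", self.take(4))[0]

    def varint(self) -> int:
        # One byte for 0-254, or 0xFF followed by a 4-byte integer.
        first = self.u8()
        return self.u32() if first == 0xFF else first


def parse_valuestring_value(r: Reader) -> dict:
    """Decode a type 2 or type 3 string value, starting at the type byte."""
    kind = r.u8()      # 0x02 in Pilots.DEObjects, 0x03 from newer editors
    outer = r.varint()
    inner = r.u32()
    if outer != inner + 4:
        raise ValueError("redundant value lengths disagree")
    body = r.take(inner)  # assumption: covers the rest of the value
    if kind == 2:
        # Ordinary byte string; stop at the null terminator.
        text = body.split(b"\x00", 1)[0].decode("latin-1")
    elif kind == 3:
        # Assumed: 4-byte character count, then UTF-16LE code units.
        (count,) = struct.unpack_from("<I", body, 0)
        units = body[4:4 + 2 * count]
        end = 0
        # Advance one 2-byte unit at a time until the 0x0000 terminator.
        while end + 2 <= len(units) and units[end:end + 2] != b"\x00\x00":
            end += 2
        text = units[:end].decode("utf-16-le", errors="replace")
    else:
        raise ValueError(f"unknown string type {kind:#04x}")
    return {"type": kind, "text": text}
```

Stopping at the terminator rather than trusting the character count means the sketch works whether that count measures the real text or the oversized buffer.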
Both versions seem to have the same footer, which I have had no luck deciphering:
06 00 00 00 02 00 01 00 01 00 03 00 00 00 01 01 00 00 00 00 00
Note that this footer is not included in the object's length, not even in the overall length at the start of the object. And yet the footer appears to be present even if there is no object after the ValueString, and it seems to be associated specifically with ValueStrings, because other objects seem to have different footers.
While it appears, at first blush, to be a series of length-encoded values (treating “06 00 00 00” as a length of 6, skipping that many bytes, treating “03 00 00 00” as a length of 3, skipping that many bytes, and finally treating the last four null bytes as a length of 0 takes you to the end of the footer), this may just be a coincidence; other footers for other objects don't work out so neatly (although I don't want to give specific examples in case it turns out I haven't actually identified another object's footer correctly; I feel reasonably confident in this ValueString footer only because I've stared at so many of them recently).
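The length-prefixed reading of the footer can be checked mechanically. This sketch reads a 4-byte little-endian length, skips that many bytes, and repeats until it hits a zero length. `walk_length_prefixed` is a hypothetical name.

```python
import struct

# The ValueString footer quoted above.
FOOTER = bytes.fromhex("06 00 00 00 02 00 01 00 01 00 03 00 00 00 01 01 00 00 00 00 00")


def walk_length_prefixed(data: bytes):
    """Read a 4-byte length, skip that many bytes, repeat; stop after a zero length.

    Returns the skipped chunks and the offset where the walk stopped.
    """
    chunks, pos = [], 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        if length == 0:
            break
        if pos + length > len(data):
            raise ValueError(f"chunk length {length} at offset {pos - 4} overruns the data")
        chunks.append(data[pos:pos + length])
        pos += length
    return chunks, pos


chunks, stop = walk_length_prefixed(FOOTER)
```

On this footer the walk yields the chunks 02 00 01 00 01 00 and 01 01 00, then stops exactly at byte 21, the end of the footer. Running the same walk over other objects' suspected footers is a quick test of the coincidence theory. A `ValueError`, or a stop offset short of the footer's end, would count against it.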
It's possible that the footers are simply junk data; given that object content appears to be located via absolute byte offsets, anything past the object's content length but before the next object's content starts may not be relevant at all.
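If content really is located by absolute offsets, footers (or junk) can be pulled out in bulk by listing the bytes between each object's declared end and the next object's content offset. This sketch assumes the offsets have already been collected from the directory listings. It also assumes each object's content starts with a 4-byte total length that does not count its own four bytes. That second point is unconfirmed, so `length_counts_itself` is a hypothetical switch for the alternative.

```python
import struct


def content_gaps(data: bytes, offsets: list, length_counts_itself: bool = False) -> list:
    """List (start, end) byte ranges between each object's declared end and the next object.

    Assumes each object's content begins with a 4-byte little-endian total length.
    Whether that length includes its own 4 bytes is unknown, hence the flag.
    """
    gaps = []
    ordered = sorted(offsets)
    # Pair each object with the start of the next one (or the end of the file).
    for start, next_start in zip(ordered, ordered[1:] + [len(data)]):
        (declared,) = struct.unpack_from("<I", data, start)
        end = start + declared if length_counts_itself else start + 4 + declared
        if end < next_start:
            gaps.append((end, next_start))
    return gaps
```

Dumping `data[start:end]` for every gap would put all candidate footers side by side. That could help show whether they follow a pattern or look like leftover memory.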
The format seems wildly inefficient: lots of redundant length values, plus storing string lengths and null-terminating them. Additionally, at least some parts of files appear to be “junk data”, little bits of whatever was left in memory shoved into the gap between where the data actually ends and where the length value says it should end. This makes it very hard to reverse-engineer the format, since it's difficult to tell the difference between actual (albeit hard-to-decipher) data and pure junk. At some point, the only way to gather more data may be to try creating intentionally malformed DEObjects files and seeing which ones crash the engine and which don't.