This project was originally envisioned to be a C# version of SC2Gears. The goal was to provide an extremely detailed look at the SC2 Replay File format.
Serialized Data (most up to date):
```
0 (ArrayWithKeys) - Base Array
|--- 0 (BinaryData) - "StarCraft II replay\x1B11"
|--- 1 (ArrayWithKeys) - Data Array
|    |--- 0 (NumberInVLF) - Unknown
|    |--- 1 (NumberInVLF) - Version: Major
|    |--- 2 (NumberInVLF) - Version: Minor
|    |--- 3 (NumberInVLF) - Version: Patch
|    |--- 4 (NumberInVLF) - Version: Build
|    `--- 5 (NumberInVLF) - Unknown, might be Revision? Seems close to Build
|--- 2 (NumberInVLF) - Unknown
`--- 3 (NumberInVLF) - Game Length (in 1/16 seconds)
```
Extended information, decoded byte by byte. Offsets are decimal. The four header fields are little-endian UInt32s, and the 60 bytes of user data run from offset 16 to offset 75. Multi-byte VLF values are decoded as described in the VLF section.
Attribute | Offset (decimal) | Hex Value | Interpretation |
---|---|---|---|
Id | 0-3 | 4D 50 51 1B | MPQ\x1B |
UserDataMaxSize | 4-7 | 00 02 00 00 | 512 |
HeaderOffset | 8-11 | 00 04 00 00 | 1024 |
UserDataSize | 12-15 | 3C 00 00 00 | 60 |
DataType | 16 | 05 | Upcoming data is of type Array with keys |
NumberOfElements | 17 | 08 | 4 elements in the array (VLF) |
Index | 18 | 00 | Sets index to 0 |
DataType | 19 | 02 | Upcoming data is of type Binary Data |
NumberOfElements | 20 | 2C | 22 bytes follow (VLF) |
StarCraftII | 21-42 | 53 74 61 72 43 72 61 66 74 20 49 49 20 72 65 70 6C 61 79 1B 31 31 | ASCII bytes that spell “StarCraft II replay\x1B11” (\x1B is the ESC byte) |
Index | 43 | 02 | Sets index to 1 |
DataType | 44 | 05 | Upcoming data is of type Array with keys |
NumberOfElements | 45 | 0C | 6 elements in the array (VLF) |
Index | 46 | 00 | Sets index to 0 |
DataType | 47 | 09 | Upcoming data is of type VLF |
Version | 48 | 02 | Major Version (1) |
Index | 49 | 02 | Sets index to 1 |
DataType | 50 | 09 | Upcoming data is of type VLF |
Version | 51 | 02 | Minor Version (1) |
Index | 52 | 04 | Sets index to 2 |
DataType | 53 | 09 | Upcoming data is of type VLF |
Version | 54 | 00 | Patch Version (0) |
Index | 55 | 06 | Sets index to 3 |
DataType | 56 | 09 | Upcoming data is of type VLF |
Version | 57 | 00 | Revision Version (0) |
Index | 58 | 08 | Sets index to 4 |
DataType | 59 | 09 | Upcoming data is of type VLF |
Version | 60-62 | EA FB 01 | Build Version (16117). This is a three-byte VLF because EA and FB have their high bit set |
Index | 63 | 0A | Sets index to 5 |
DataType | 64 | 09 | Upcoming data is of type VLF |
Unknown | 65-67 | DA F0 01 | 15405. Unknown, but close to the Build value (might be a revision?) |
Index | 68 | 04 | Sets index to 2 (back in the base array) |
DataType | 69 | 09 | Upcoming data is of type VLF |
Unknown | 70 | 04 | 2 |
Index | 71 | 06 | Sets index to 3 |
DataType | 72 | 09 | Upcoming data is of type VLF |
GameLength | 73-75 | FE 9E 05 | 42943 sixteenths of a second (about 2683.9 s) |
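The byte walk above can be replayed with a small recursive parser. This is a minimal sketch, not SC2Inspector's actual parser. It only handles the three type codes the table confirms: 0x02 Binary Data, 0x05 Array with keys, and 0x09 VLF number. The other types in the replay.details tree (SimpleArray, NumberOfOneByte, NumberOfFourBytes) use codes that are not shown here, so the sketch rejects them. `ReadVlf` and `ReadValue` are hypothetical names. Arrays with keys come back as `Dictionary<long, object>`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Decodes one VLF number. Each byte carries 7 data bits, least significant group first.
// A set high bit means another byte follows. The lowest bit of the result is the sign.
static long ReadVlf(BinaryReader reader)
{
    long value = 0;
    int shift = 0;
    while (true)
    {
        byte b = reader.ReadByte();
        value |= (long)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;
        shift += 7;
    }
    long magnitude = value >> 1;
    return (value & 1) != 0 ? -magnitude : magnitude;
}

// Reads one serialized value. Only the type codes confirmed by the MPQ\x1B table are handled.
static object ReadValue(BinaryReader reader)
{
    byte type = reader.ReadByte();
    switch (type)
    {
        case 0x02: // Binary Data: a VLF byte count, then the raw bytes
            return reader.ReadBytes((int)ReadVlf(reader));
        case 0x05: // Array with keys: a VLF element count, then (VLF index, value) pairs
            var array = new Dictionary<long, object>();
            long count = ReadVlf(reader);
            for (long i = 0; i < count; i++)
            {
                long index = ReadVlf(reader);
                array[index] = ReadValue(reader);
            }
            return array;
        case 0x09: // Number in VLF
            return ReadVlf(reader);
        default:
            throw new InvalidDataException($"Unhandled data type 0x{type:X2}");
    }
}
```

If you feed it the 60 user-data bytes from offsets 16-75, it returns a four-entry dictionary. Key 1 holds the six version numbers, and key 3 holds the game length (42943).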
```
0 (ArrayWithKeys) - Base array
|--- 0 (SimpleArray) - Array of player structs
|    `--- x (ArrayWithKeys) - Player struct
|         |--- 0 (BinaryData) - Player name
|         |--- 1 (ArrayWithKeys) - Probably some further details
|         |    |--- 0 (NumberInVLF) - Unknown
|         |    |--- 1 (NumberOfFourBytes) - Unknown
|         |    |--- 2 (NumberInVLF) - Unknown
|         |    `--- 4 (NumberInVLF) - RealID
|         |--- 2 (BinaryData) - Localized race name
|         |--- 3 (ArrayWithKeys) - Array of player color values
|         |    |--- 0 (NumberInVLF) - Alpha
|         |    |--- 1 (NumberInVLF) - Red
|         |    |--- 2 (NumberInVLF) - Green
|         |    `--- 3 (NumberInVLF) - Blue
|         |--- 4 (NumberInVLF) - Unknown
|         |--- 5 (NumberInVLF) - Unknown
|         |--- 6 (NumberInVLF) - Handicap
|         |--- 7 (NumberInVLF) - Team
|         `--- 8 (NumberInVLF) - Unknown
|--- 1 (BinaryData) - Localized map name
|--- 2 (BinaryData) - Unknown
|--- 3 (ArrayWithKeys) - Array containing map preview file names
|    `--- 0 (BinaryData) - Map preview file name
|--- 4 (NumberOfOneByte) - Unknown
|--- 5 (NumberInVLF) - Save time of the replay
|--- 6 (NumberInVLF) - Unknown
|--- 7 (BinaryData) - Unknown
|--- 8 (BinaryData) - Unknown
|--- 9 (BinaryData) - Unknown
|--- 10 (SimpleArray) - Likely something about the map file
|    |--- 0 (BinaryData) - Unknown
|    `--- 1 (BinaryData) - Unknown
|--- 11 (NumberOfOneByte) - Unknown
|--- 12 (NumberInVLF) - Unknown
`--- 13 (NumberInVLF) - Unknown
```
Attribute Id | PlayerId 0x1 | PlayerId 0x2 | PlayerId 0x10 |
---|---|---|---|
0x0BBF | Part | Part | |
0x01F4 | Humn | Humn | |
0x0BB9 | Zerg | Terr | |
0x07DC | T3 | T1 | |
0x07E2 | T3 | T4 | |
0x07D7 | T1 | T2 | |
0x0BBB | 100 | 100 | |
0x07D6 | T3 | T4 | |
0x07D2 | T2 | T1 | |
0x0BBA | tc04 | tc07 | |
0x07D8 | T1 | T2 | |
0x07D4 | T1 | T2 | |
0x07D3 | T2 | T2 | |
0x0BBC | Medi | Medi | |
0x07DB | T1 | T2 | |
0x07D5 | T1 | T2 | |
0x03E8 | | | Dflt |
0x0BC0 | Obs | Obs | |
0x0BC2 | | | yes |
0x07E1 | T3 | T4 | |
0x07D0 | | | t2 |
Ok so I've been working on this most of the day. I've finished the InitData file as well as the AttributesEvents file! It looks like I'm 100% done with the game metadata about the players. Here's all the information I have (screenshot: SC2Inspector ReplayDetails Locals).
The attributes.events file has kind of an interesting format. This one bit of code pretty much does all the work:
```csharp
// Runs after the first four bytes of the file have been read.
uint NumAttribs = BinaryReader.ReadUInt32();
if (NumAttribs == 0)
{
    throw new Exception("Zero attributes.");
}

uint AttribHeader;
uint AttribId;
int PlayerId;
string AttribVal;
int NumSlots;

for (int i = 0; i < NumAttribs; i++)
{
    AttribHeader = BinaryReader.ReadUInt32();  // per-attribute header
    AttribId = BinaryReader.ReadUInt32();      // e.g. 0x0BB9
    PlayerId = BinaryReader.ReadByte();
    // The value is 4 bytes stored back to front and padded with \0, so reverse it and strip the padding.
    AttribVal = Conversion.ReverseString(Encoding.Default.GetString(BinaryReader.ReadBytes(4))).Replace("\0", String.Empty);

    // Group values by attribute id, then by player id.
    if (!AttribDict.ContainsKey(AttribId))
    {
        AttribDict.Add(AttribId, new Dictionary<int, string>());
    }
    AttribDict[AttribId].Add(PlayerId, AttribVal);
}
```
I run that code after I read four bytes from the beginning. It splits everything out into what can be used as a multidimensional associative array. Here's some sample data: replay.attributes.events Sample Data. There's some information in there which I'm not sure about either. Either way this segment is done. I FINALLY think that tomorrow I can go ahead and start on replay.events!
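To experiment outside the project, you can sketch the loop above against a plain byte array. This is a minimal sketch that assumes the layout the loop implies:

- a 4-byte preamble (the "four bytes from the beginning");
- a little-endian `UInt32` count;
- 13-byte records made of a header, an attribute id, a player byte and a reversed 4-byte value.

`ParseAttributes` is a hypothetical name. `Array.Reverse` stands in for the project's `Conversion.ReverseString` helper, and ASCII decoding is assumed in place of `Encoding.Default`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Returns attributeId -> (playerId -> value). Hypothetical helper, not SC2Inspector's API.
static Dictionary<uint, Dictionary<int, string>> ParseAttributes(byte[] data)
{
    var attributes = new Dictionary<uint, Dictionary<int, string>>();
    using var reader = new BinaryReader(new MemoryStream(data));
    reader.ReadBytes(4);                  // preamble read before the loop (meaning unknown)
    uint count = reader.ReadUInt32();     // BinaryReader reads little-endian
    if (count == 0) throw new InvalidDataException("Zero attributes.");
    for (uint i = 0; i < count; i++)
    {
        reader.ReadUInt32();              // per-record header (meaning unknown)
        uint attributeId = reader.ReadUInt32();
        int playerId = reader.ReadByte();
        byte[] raw = reader.ReadBytes(4);
        Array.Reverse(raw);               // stored back to front, e.g. "greZ" for "Zerg"
        string value = Encoding.ASCII.GetString(raw).Replace("\0", string.Empty);
        if (!attributes.TryGetValue(attributeId, out var perPlayer))
        {
            perPlayer = new Dictionary<int, string>();
            attributes.Add(attributeId, perPlayer);
        }
        perPlayer[playerId] = value;      // unlike Add(), a repeated player id overwrites
    }
    return attributes;
}
```

Looking up `ParseAttributes(bytes)[0x0BB9][1]` would then give player 1's value for attribute 0x0BB9 ("Zerg" in the sample data).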
Time to update this table:
Filename | Purpose |
---|---|
replay.details | Contains sometimes inaccurate (?) data about players including their name, RealId, Race, Map name, save time, etc. |
replay.initData | Contains information about who is playing (names), as well as the Realm, Map hash, and some sort of account identifier. |
replay.attributes.events | Contains detailed information about the players, their race, difficulty, color, team info, game speed, etc. |
replay.game.events | Actions |
replay.message.events | Chat, Ping |
replay.smartcam.events | Presumably player cameras |
replay.sync.events | Presumably consistency checks |
Committed r7.
Ok so I've been slacking on my documentation. I have fully parsed all of the documented fields in the replay.details file. Here is what the output looks like (screenshot: ReplayDetails Locals Window).
Wow ok, so after a 45-minute battle trying to get that file to upload with the new Dokuwiki install, we're back on track. I ran into an issue where ParseVLFNumber() wouldn't return (what I expected to be) a VERY large number representing the timestamp of the replay. I went around and around and decided that the issue was that ParseVLFNumber() (modified from sc2replay-csharp) was using ints for everything. The number was too big for an int, so I modified ParseVLFNumber() to use longs for everything. I don't expect to see a number bigger than a long, but who knows. That allowed me to pull out the date successfully. I then had some trouble determining whether the timestamp was in UTC or in the recorder's local time. It turns out the timestamp is UTC, and an additional timezone field stores the recorder's UTC offset.
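If the raw timestamp is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), .NET can convert it directly. The same applies if the timezone field is an offset in the same 100-nanosecond units. Both encodings are assumptions here, not something established in this log. A FILETIME for a 2010 date is roughly 1.29 × 10^17, which would also explain why the value overflowed an int. `ToReplayTimes` is a hypothetical helper.

```csharp
using System;

// Assumption: utcFileTime is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)
// and offsetTicks is the recorder's UTC offset in the same 100 ns units.
static (DateTime Utc, DateTime RecorderLocal) ToReplayTimes(long utcFileTime, long offsetTicks)
{
    DateTime utc = DateTime.FromFileTimeUtc(utcFileTime);
    // Shift by the stored offset to get the wall-clock time on the recording PC.
    DateTime local = DateTime.SpecifyKind(utc + TimeSpan.FromTicks(offsetTicks), DateTimeKind.Unspecified);
    return (utc, local);
}
```

Both values would come straight out of ParseVLFNumber() as longs. If the assumption about the units is wrong, the dates will be off by orders of magnitude, which is easy to spot.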
I also modified InspectorViewModel and ReplayViewModel to allow for multiple replays to be loaded. I'm very happy with how fast the program is (not that the files are that big, but it is complex) at the moment. I've finished replay.details. I think I need to move on to InitData next, but I'm not sure.
Because I went back and modified the MPQ\x1B UserData retrieval, I was able to extract the actual game length. The value is stored in 1/16ths of a second. I do not know where this value comes from. I do know that the current version of Sc2gears (8.8) displays incorrect game lengths: it seems to treat each unit as roughly 1/22 of a second. I verified this by viewing a replay in the StarCraft II client itself. The game time is 20:02, but Sc2gears reports it as 14:28. Weird.
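Turning the stored value into the clock shown in the client is plain integer arithmetic. For example, 20:02 is 1202 seconds, i.e. 19232 sixteenths. Below is a minimal sketch; `FormatGameLength` is a hypothetical name. It truncates the fractional second, which is an assumption about how the client rounds.

```csharp
using System;

// Converts a game length stored in 1/16ths of a second into "m:ss" (hypothetical helper).
static string FormatGameLength(long sixteenths)
{
    long totalSeconds = sixteenths / 16;   // integer division drops the fractional second
    long minutes = totalSeconds / 60;
    long seconds = totalSeconds % 60;
    return $"{minutes}:{seconds:D2}";
}
```

For example, `FormatGameLength(19232)` gives "20:02", matching the client.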
Committed r6.
I've taken some time to update the wiki page. I've added the references section and went back and converted the old MPQ\x1B information from raw byte “queries” into the serialized data parser. Everything is MUCH cleaner now.
I've been successful so far in decoding the replay.details file. I spent some time developing a serialized data parser, and with it I've been able to piece together the format of the replay.details file (see replay.details File Format).
I've finally gotten to the point where I've read and enumerated all of the files in the MPQ. I think I did a really good job laying out my classes. I have an MPQArchive class, which looks like this when displaying the listfile contents (screenshot: MPQArchive Locals Window).
Now I have to determine what exactly is in each replay.* file.
Filename | Purpose |
---|---|
replay.details | Basic metadata |
replay.initData | Unknown |
replay.attributes.events | Unknown |
replay.game.events | Actions |
replay.message.events | Chat, Ping |
replay.smartcam.events | Presumably player cameras |
replay.sync.events | Presumably consistency checks |
Ok well, I took a short break to hang out with my roommates, but I've finally been able to retrieve the raw uncompressed data for each file! I'm fairly sure this means the next step is to start parsing the actual SC2 data! Right here I'd like to give a big shout-out to Foole, the author of “mpqtool” (http://code.google.com/p/mpqtool/). This stuff is very complex, and I don't understand a lot about how to decrypt the hash tables and such. The code from his tool has helped me IMMENSELY. Judging by his copyright notice, most of his code is based on StormLib by Ladislav Zezula. Thanks to both of you!
I've spent some time getting my comments up to date. I've also begun using SharpZipLib (http://www.icsharpcode.net/opensource/sharpziplib/) to do the BZip2/GZip decompression. I went into the project wanting to do everything myself, and I think I've done well so far, but writing a decompressor for BZip2/GZip would be a project in itself. I think it's good that I've done the MPQ stuff myself: I can make changes should Blizzard change the format of the SC2Replay file.
Once again I'm looking over my code, and it doesn't seem TOO complicated. I've gotten very good at manipulating bytes and streams. Last night I discovered the hexadecimal display for Locals/Watches. This has been immensely helpful for looking over the variable values and comparing them to the actual file's data. I had been using “(listfile)” as the file to test my decryption and everything else on. The file is very short (only a few hundred bytes). I extracted the file with Ladik's MPQ Editor and looked at the data in a hex editor. Then I compared that data to the RawContents variable, which is supposed to hold the decompressed contents of the file. They matched! Next I changed the file I'd been retrieving to the one with all the data: “replay.game.events”. I was very excited to see that the length of the byte[] matched the size of the file, and that the first several bytes and the last several bytes matched as well. I think that's enough to call this portion of the project done. The only other MPQ work I expect is maybe dealing with “(listfile)” or “(attributes)”.
I'm going to take a short break, then see what I can decipher from the other replay.* files. Committed r5.
Well, I've been working on this for about an hour, and I decided to redo the block stuff. Now both the BlockTable and the HashTable are Hashtables with Hash and Block objects stored within. This makes it easier to resolve HashTable entries to their BlockTable entries. I've been checking all my numbers against “Ladik's MPQ Editor” (amazing tool), and everything is coming out perfectly. The next step is to read the actual file information. I'm tempted to make an MPQFile class, but I think I'll store all the data within MPQBlock and keep everything in one place (it already has file flags, compressed size, location, etc.). Onward! Committed r3.
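Each decrypted block-table entry is four little-endian `UInt32` values: file position, compressed size, file size and flags. These are the same 16-byte records listed in the pseudo-code. The flag constants below come from the StormLib MPQ documentation rather than from this project. `ReadBlockTable` and `DescribeFlags` are hypothetical helpers, meant for comparing entries against what Ladik's MPQ Editor shows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Reads decrypted block-table bytes into one tuple per 16-byte entry.
static List<(uint FilePos, uint CompressedSize, uint FileSize, uint Flags)> ReadBlockTable(byte[] decrypted)
{
    var entries = new List<(uint FilePos, uint CompressedSize, uint FileSize, uint Flags)>();
    using var reader = new BinaryReader(new MemoryStream(decrypted));
    for (int i = 0; i < decrypted.Length / 16; i++)
        entries.Add((reader.ReadUInt32(), reader.ReadUInt32(), reader.ReadUInt32(), reader.ReadUInt32()));
    return entries;
}

// Names the MPQ block flags that are set (bit values as documented by StormLib).
static string DescribeFlags(uint flags)
{
    var known = new (uint Bit, string Name)[]
    {
        (0x00000100, "Implode"),     // PKWARE implode
        (0x00000200, "Compress"),    // multi-method compression (zlib, bzip2, ...)
        (0x00010000, "Encrypted"),
        (0x00020000, "FixKey"),      // encryption key adjusted by the block offset
        (0x01000000, "SingleUnit"),  // stored as one unit instead of sectors
        (0x04000000, "SectorCrc"),
        (0x80000000, "Exists"),
    };
    var names = new List<string>();
    foreach (var (bit, name) in known)
        if ((flags & bit) != 0) names.Add(name);
    return string.Join(", ", names);
}
```

Checking the flags per file up front helps decide whether a block just needs to be "cut out", or also decrypted or decompressed.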
Finally making some promising progress! I believe I last left off where I had just finished parsing MPQ\x1B and the User Data. I've since made quite a bit of progress parsing MPQ\x1A (forever after known as “The MPQ File”). I was able to successfully parse the Header from the MPQ file, which told me where the Hash Table and Block Table are and how long they are. I was then able to decipher the Hash Table and Block Table. I was THEN able to take a filename (for instance “(listfile)”), hash it, compare the hash to the hash table and find its entry. I believe I can then take the BlockIndex from the Hash Table entry and look it up in the Block Table to find the file's location. The file's data can then be “cut out” and decrypted or decompressed.
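The header step can be sketched as follows. This minimal sketch uses the field order of the original 32-byte MPQ header as documented for StormLib. Newer header versions append extra fields, which it ignores. `ReadMpqHeader` is a hypothetical name. The table positions stored in the header are relative to the header itself, so the sketch adds the header offset to return absolute file positions. That turns the "cut out" step into a plain seek.

```csharp
using System;
using System.IO;

// Reads the MPQ\x1A header found at headerOffset (1024 in the replays examined here).
static (uint ArchiveSize, ushort FormatVersion, int SectorSize, long HashTablePos, long BlockTablePos,
        uint HashTableEntries, uint BlockTableEntries) ReadMpqHeader(BinaryReader reader, long headerOffset)
{
    reader.BaseStream.Seek(headerOffset, SeekOrigin.Begin);
    if (reader.ReadUInt32() != 0x1A51504D)           // "MPQ\x1A" read as a little-endian UInt32
        throw new InvalidDataException("MPQ\\x1A signature not found");
    reader.ReadUInt32();                             // header size
    uint archiveSize = reader.ReadUInt32();
    ushort formatVersion = reader.ReadUInt16();
    ushort sectorShift = reader.ReadUInt16();        // sector size = 512 << shift
    uint hashTablePos = reader.ReadUInt32();         // relative to the header
    uint blockTablePos = reader.ReadUInt32();        // relative to the header
    uint hashEntries = reader.ReadUInt32();          // 16 bytes per entry
    uint blockEntries = reader.ReadUInt32();         // 16 bytes per entry
    return (archiveSize, formatVersion, 512 << sectorShift,
            headerOffset + hashTablePos, headerOffset + blockTablePos, hashEntries, blockEntries);
}
```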
I've used more code from other people than I'd like, but most of it is bit-shifting magic that is beyond me, mainly the Hash Table encryption/decryption. I think I'm finally getting close to the actual SC2 stuff. I used MPQExtractor to take a look at some of the replay.whatever files, and they're in their own internal format, which will be a daunting task to figure out. I just added a bunch of useful resources in case I can't find them again.
I feel like I should be writing more, but when I look over the code it's pretty simple in hindsight. It's simply using BinaryReader to read a bunch of numbers from a binary file. The only complex parts are the decryption parts which I find very confusing. I think I'll write a BROAD bit of pseudo-code about how this thing works:
```
Open SC2Replay file
Read first 4 bytes
If first 4 bytes are MPQ\x1B Then
    Read User Data from the MPQ\x1B header (Version, etc.), most importantly the Header Offset (usually 1024)
End If
Advance byte buffer to Header Offset (1024)
Read next 4 bytes
If those 4 bytes are MPQ\x1A Then
    Read information from the MPQ\x1A header (ArchiveSize, MPQVersion, HashTable & BlockTable size & position, etc.)
End If
Advance byte buffer to beginning of HashTable
Read next (HashTableSize * 16) bytes
Decrypt HashTable
For each 16-byte entry of the decrypted stream:
    Read Name1, Name2, Locale, BlockIndex
    Store the above in a C# Hashtable so we can easily access it
Advance byte buffer to beginning of BlockTable
Read next (BlockTableSize * 16) bytes
Decrypt BlockTable
For each 16-byte entry of the decrypted stream:
    Read FileOffset (compute FilePos), CompressedSize, FileSize, Flags
    Store the above in a C# List so we can easily access it
```
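The two "Decrypt" steps in the pseudo-code use the standard MPQ scheme found in StormLib, which this sketch follows:

1. A 0x500-entry crypt table is generated once.
2. The table key is the hash of "(hash table)" or "(block table)" with hash type 0x300.
3. The key and a running seed are stirred after every 32-bit word.

This is a minimal sketch with hypothetical function names, not the mpqtool or SC2Inspector code itself.

```csharp
using System;

// Builds the 0x500-entry MPQ crypt table (the same generator StormLib uses).
static uint[] BuildCryptTable()
{
    var table = new uint[0x500];
    uint seed = 0x00100001;
    for (int index1 = 0; index1 < 0x100; index1++)
    {
        for (int i = 0, index2 = index1; i < 5; i++, index2 += 0x100)
        {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint high = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            table[index2] = high | (seed & 0xFFFF);
        }
    }
    return table;
}

// Hashes an ASCII name case-insensitively; hashType is 0x000, 0x100, 0x200 or 0x300.
static uint HashString(uint[] crypt, string name, int hashType)
{
    unchecked // the algorithm relies on 32-bit wrap-around
    {
        uint seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
        foreach (char c in name.ToUpperInvariant())
        {
            uint ch = c;
            seed1 = crypt[hashType + (int)ch] ^ (seed1 + seed2);
            seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
        }
        return seed1;
    }
}

// Decrypts table data in place, one little-endian UInt32 at a time.
static void DecryptBlock(uint[] crypt, uint[] data, uint key)
{
    unchecked
    {
        uint seed = 0xEEEEEEEE;
        for (int i = 0; i < data.Length; i++)
        {
            seed += crypt[0x400 + (int)(key & 0xFF)];
            uint plain = data[i] ^ (key + seed);
            key = ((~key << 21) + 0x11111111) | (key >> 11);
            seed = plain + seed + (seed << 5) + 3;
            data[i] = plain;
        }
    }
}
```

To decrypt the hash table, read HashTableSize * 4 UInt32 values starting at its position. Then call `DecryptBlock(crypt, words, HashString(crypt, "(hash table)", 0x300))`. The block table is done the same way with "(block table)".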
To make lookups fast, the filename is converted to a number (hash) instead of being stored as text. The hash locates an entry in the hash table, and that entry stores the block index of the file. The block index then tells us where the data for that filename is in the file. It's also important to note that there are apparently several different Hash Types.
From what I can determine:
Hash Type | Purpose |
---|---|
0x000 | Hashes an Index |
0x100 | Hashes Name1 |
0x200 | Hashes Name2 |
0x300 | Hashes Table Data |
I'm not 100% sure where these Hash Types factor into the encryption. There appear to be two seeds, and the Hash Type is then used as an offset for the hash. I don't know what this means. See SC2Inspector.MPQLogic.MPQUtilities.HashString(string input, int offset).
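In StormLib's scheme, which this sketch follows, the hash type is simply an offset into the crypt table. `HashString` mixes each upper-cased character through the 256-entry slice that starts at the hash type, so each type yields an unrelated hash of the same name. A lookup works in three steps:

1. Type 0x000 picks the starting slot.
2. Types 0x100 and 0x200 (Name1/Name2) confirm the match.
3. The search probes forward until it reaches a never-used slot.

Several details are assumptions taken from StormLib rather than from this project: 0xFFFFFFFF marking an unused slot, the locale being ignored, and the tuple layout. The function names are hypothetical.

```csharp
using System;

// Builds the 0x500-entry MPQ crypt table (the same generator StormLib uses).
static uint[] BuildCryptTable()
{
    var table = new uint[0x500];
    uint seed = 0x00100001;
    for (int index1 = 0; index1 < 0x100; index1++)
        for (int i = 0, index2 = index1; i < 5; i++, index2 += 0x100)
        {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            uint high = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            table[index2] = high | (seed & 0xFFFF);
        }
    return table;
}

// The hash type only selects which 256-entry slice of the crypt table the characters index.
static uint HashString(uint[] crypt, string name, int hashType)
{
    unchecked
    {
        uint seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
        foreach (char c in name.ToUpperInvariant())
        {
            uint ch = c;
            seed1 = crypt[hashType + (int)ch] ^ (seed1 + seed2);
            seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
        }
        return seed1;
    }
}

// Finds a file's BlockIndex in a decrypted hash table (locale ignored), or null if absent.
static uint? FindBlockIndex(uint[] crypt, (uint Name1, uint Name2, uint Locale, uint BlockIndex)[] hashTable, string fileName)
{
    uint size = (uint)hashTable.Length;
    uint start = HashString(crypt, fileName, 0x000) % size; // 0x000: starting slot
    uint name1 = HashString(crypt, fileName, 0x100);        // 0x100: Name1 check value
    uint name2 = HashString(crypt, fileName, 0x200);        // 0x200: Name2 check value
    for (uint n = 0; n < size; n++)
    {
        var entry = hashTable[(start + n) % size];
        if (entry.BlockIndex == 0xFFFFFFFF) return null;    // never-used slot ends the search
        if (entry.Name1 == name1 && entry.Name2 == name2) return entry.BlockIndex;
    }
    return null;
}
```

The returned BlockIndex is then used to index the block table. Type 0x300 is not used for lookups; it produces encryption keys such as the hash table's own key.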
It seems like I've done an incredible amount of work in just one day. It looks like I've been at this for over 12 hours. Oh well, I'm off to bed since it's 6am. I _REALLY_ hope I resume this project tomorrow. Not even going to close all my VS/Chrome windows.
Committed r2.
Ok, so I've gotten a decent amount done. I was looking at code for an MPQ parser, and it looks like SC2Replay files have a slightly different format. They start with MPQ\x1B, and 1024 bytes into the file there is a second header, MPQ\x1A, which starts the normal MPQ file. MPQ\x1B seems to be a StarCraft II-only header for exposing additional metadata without having to read the rest of the file.
Using data taken from a random SC2Replay, I was able to determine the MPQ\x1B format (see MPQ\x1B Format).
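Reading the fixed part of that header takes only a few BinaryReader calls, since the three size fields are little-endian UInt32s (00 04 00 00 is 1024). Below is a minimal sketch; `ReadUserDataHeader` is a hypothetical name. The user-data bytes it returns are the serialized block that starts at offset 16.

```csharp
using System;
using System.IO;

// Reads the 16-byte MPQ\x1B header at the start of an SC2Replay file, plus the user data after it.
static (uint UserDataMaxSize, uint HeaderOffset, uint UserDataSize, byte[] UserData) ReadUserDataHeader(BinaryReader reader)
{
    byte[] id = reader.ReadBytes(4);
    if (id.Length != 4 || id[0] != 'M' || id[1] != 'P' || id[2] != 'Q' || id[3] != 0x1B)
        throw new InvalidDataException("Not an MPQ\\x1B user-data header");
    uint maxSize = reader.ReadUInt32();        // 512 in the sample
    uint headerOffset = reader.ReadUInt32();   // where MPQ\x1A starts (1024 in the sample)
    uint userDataSize = reader.ReadUInt32();   // 60 in the sample
    byte[] userData = reader.ReadBytes((int)userDataSize);
    return (maxSize, headerOffset, userDataSize, userData);
}
```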
VLF represents a “Variable Length Format” integer.
Additionally, SC2Replay files have a quirk concerning the way integers are stored. An integer consists of a variable number of bytes, and each byte contributes its low 7 bits. The first byte holds the least significant 7 bits. The most significant bit of each byte indicates whether the following byte also belongs to the integer. After all bytes of a number have been assembled, the least significant bit of the result indicates the sign. Extract this bit and shift the number's value right by one. If the bit was set, change the sign to negative; otherwise leave it positive.
Source: http://trac.erichseifert.de/warp/wiki/SC2ReplayFormat#VariableLengthFormat
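As a worked example, take the bytes EA FB 01 at offsets 60-62 of the MPQ\x1B sample. EA and FB have the high bit set, so the number continues. 01 has it clear, so the number ends there. Reading the 7-bit groups least significant first, as ParseVLFNumber() does:

$$
\begin{aligned}
v &= (\mathrm{0xEA} \mathbin{\&} \mathrm{0x7F}) + (\mathrm{0xFB} \mathbin{\&} \mathrm{0x7F}) \cdot 2^{7} + (\mathrm{0x01} \mathbin{\&} \mathrm{0x7F}) \cdot 2^{14} \\
  &= 106 + 123 \cdot 128 + 1 \cdot 16384 = 32234 \\
v \bmod 2 &= 0 \;\Rightarrow\; \text{positive}, \qquad \text{value} = \lfloor v/2 \rfloor = 16117
\end{aligned}
$$

The same rule explains the one-byte cases in the header: 08 decodes to 4, 2C to 22 and 0C to 6.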
I've taken the following code from a C# SC2Replay client (sc2replay-csharp) to decode VLF numbers for me:
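```csharp
// Requires: using System.IO;
// Uses long instead of int so large values (e.g. the replay timestamp) don't overflow.
private static long ParseVLFNumber(BinaryReader reader)
{
    var bytes = 0;
    var first = true;
    long number = 0;
    long multiplier = 1;
    while (true)
    {
        var i = reader.ReadByte();
        // Each byte adds its low 7 bits; the first byte holds the least significant group.
        number += (long)(i & 0x7F) << (bytes * 7);
        if (first)
        {
            // The lowest bit of the first byte is the sign bit.
            if ((number & 1) != 0)
            {
                multiplier = -1;
                number--;
            }
            first = false;
        }
        // A clear high bit marks the last byte of the number.
        if ((i & 0x80) == 0)
        {
            break;
        }
        bytes++;
    }
    return (number / 2) * multiplier;
}
```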
This took almost five hours to decipher with a LOT of help from various sources around the internet. I can't find it anymore but I thought I read somewhere that the game length was supposed to be in the header, but this could be incorrect. I would think the game length would be with the game recording date, players, colors, etc.
Next step is to work on extracting the different files from the ACTUAL MPQ (MPQ\x1A) then parse through the details there.
Decided to start on this project. I've added the ViewModelBase and made some changes to App.xaml and App.xaml.cs. Committed r1.