SC2Inspector - C#
Project Log
12/29/2011 @ 05:57
Finally making some promising progress! I believe I last left off where I had just finished parsing MPQ\x1B and the User Data. I've since made quite a bit of progress parsing MPQ\x1A (forever after known as “The MPQ File”). I was able to parse the Header from the MPQ file, which told me where the Hash Table and Block Table are and how long each one is. I was then able to decipher both tables. After that I was able to take a filename (for instance “(listfile)”), hash it, and find the matching entry in the Hash Table. I believe I can then take the BlockIndex from that Hash Table entry and look it up in the Block Table to find the file's location, which can then be “cut out” and decrypted or decompressed.
I've used more code than I'd like from other people, but most of it is bit-shifting magic that is beyond me, mainly the Hash Table encryption/decryption. I think I'm finally getting close to the actual SC2 stuff. I used MPQExtractor to take a look at some of the replay.whatever files and they're in their own internal format, which should be a daunting task to figure out. I just added a bunch of useful resources in case I can't find them again.
I feel like I should be writing more, but when I look over the code it's pretty simple in hindsight. It's simply using BinaryReader to read a bunch of numbers from a binary file. The only complex parts are the decryption parts which I find very confusing. I think I'll write a BROAD bit of pseudo-code about how this thing works:
    Open SC2Replay file
    Read first 4 bytes
    If first 4 bytes are MPQ\x1B Then
        Read User Data from MPQ1B header (Version, etc), most importantly the Header Offset (usually 1024)
    End If
    Advance Byte Buffer to Header Offset (1024)
    Read next 4 bytes
    If next 4 bytes are MPQ\x1A Then
        Read information about MPQ1A header (ArchiveSize, MPQVersion, HashTable & BlockTable Size & Position, etc)
    End If
    Advance Byte Buffer to beginning of HashTable
    Read next (HashTableSize * 16) Bytes
    Decrypt HashTable
    Read the following over and over (16 bytes total) from the decrypted stream:
        Name1, Name2, Locale, BlockIndex
    Store the above in a C# HashTable so we can easily access it
    Advance Byte Buffer to beginning of BlockTable
    Read next (BlockTableSize * 16) Bytes
    Decrypt BlockTable
    Read the following over and over (16 bytes total) from the decrypted stream:
        FileOffset, (compute FilePos), Compressed Size, FileSize, Flags
    Store the above in a C# List so we can easily access it
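A minimal sketch of the first few steps, assuming the MPQ\x1B block is four little-endian 32-bit fields (Id, UserDataMaxSize, HeaderOffset, UserDataSize) as in the sample bytes from the 12/28 entry. `UserDataHeader` and `ReplayHeaderSketch` are hypothetical names for illustration, not the classes in SC2Inspector.

```csharp
using System;
using System.IO;

// Hypothetical container for the fixed fields that follow the MPQ\x1B id.
public sealed class UserDataHeader
{
    public uint UserDataMaxSize;
    public uint HeaderOffset;
    public uint UserDataSize;
}

public static class ReplayHeaderSketch
{
    // Reads the MPQ\x1B block at offset 0, then checks that the MPQ\x1A
    // signature sits at HeaderOffset. Assumes little-endian 32-bit fields.
    public static UserDataHeader Read(Stream stream)
    {
        var reader = new BinaryReader(stream);
        stream.Seek(0, SeekOrigin.Begin);

        if (!IsMagic(reader.ReadBytes(4), 0x1B))
            throw new InvalidDataException("Expected MPQ\\x1B at offset 0.");

        // Object initializers run in order, so the reads happen in file order.
        var header = new UserDataHeader
        {
            UserDataMaxSize = reader.ReadUInt32(),
            HeaderOffset = reader.ReadUInt32(),
            UserDataSize = reader.ReadUInt32()
        };

        stream.Seek(header.HeaderOffset, SeekOrigin.Begin);
        if (!IsMagic(reader.ReadBytes(4), 0x1A))
            throw new InvalidDataException("Expected MPQ\\x1A at HeaderOffset.");

        return header;
    }

    // True when the four bytes are 'M', 'P', 'Q' followed by the given byte.
    private static bool IsMagic(byte[] bytes, byte last)
    {
        return bytes.Length == 4
            && bytes[0] == (byte)'M'
            && bytes[1] == (byte)'P'
            && bytes[2] == (byte)'Q'
            && bytes[3] == last;
    }
}
```

Calling `ReplayHeaderSketch.Read(File.OpenRead(path))` leaves the stream positioned just past the MPQ\x1A id, which is where the rest of the MPQ header would be read from.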
So that Hash Table lookups are fast, the filename is converted to a number (hash), and that number is what gets stored. The Hash Table entry that matches the hash holds the BlockIndex, and the Block Table entry at that index tells us where the data for the filename is in the file. It's also important to note that there are apparently several different Hash Types.
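The lookup chain described here can be sketched as follows. This assumes both tables are already decrypted, that each entry is four little-endian 32-bit values in the order given in the pseudo-code (so Locale is read as a single 32-bit field), and it uses a plain linear scan instead of any slot-based probing. All type and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public struct HashEntry
{
    public uint Name1, Name2, Locale, BlockIndex;
}

public struct BlockEntry
{
    public uint FileOffset, CompressedSize, FileSize, Flags;
}

public static class TableLookupSketch
{
    // Splits a decrypted hash table into 16-byte entries.
    public static List<HashEntry> ParseHashTable(byte[] decrypted)
    {
        var entries = new List<HashEntry>();
        using (var reader = new BinaryReader(new MemoryStream(decrypted)))
        {
            for (int i = 0; i + 16 <= decrypted.Length; i += 16)
            {
                entries.Add(new HashEntry
                {
                    Name1 = reader.ReadUInt32(),
                    Name2 = reader.ReadUInt32(),
                    Locale = reader.ReadUInt32(),
                    BlockIndex = reader.ReadUInt32()
                });
            }
        }
        return entries;
    }

    // Splits a decrypted block table into 16-byte entries.
    public static List<BlockEntry> ParseBlockTable(byte[] decrypted)
    {
        var entries = new List<BlockEntry>();
        using (var reader = new BinaryReader(new MemoryStream(decrypted)))
        {
            for (int i = 0; i + 16 <= decrypted.Length; i += 16)
            {
                entries.Add(new BlockEntry
                {
                    FileOffset = reader.ReadUInt32(),
                    CompressedSize = reader.ReadUInt32(),
                    FileSize = reader.ReadUInt32(),
                    Flags = reader.ReadUInt32()
                });
            }
        }
        return entries;
    }

    // Finds the hash entry whose Name1/Name2 match, then follows its
    // BlockIndex into the block table. Returns null when nothing matches
    // or the index is out of range (for example an unused slot).
    public static BlockEntry? FindBlock(IList<HashEntry> hashes, IList<BlockEntry> blocks,
                                        uint name1, uint name2)
    {
        foreach (var entry in hashes)
        {
            if (entry.Name1 == name1 && entry.Name2 == name2 && entry.BlockIndex < blocks.Count)
                return blocks[(int)entry.BlockIndex];
        }
        return null;
    }
}
```

Here `name1` and `name2` would be the results of hashing the filename with the Name1 and Name2 hash types.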
From what I can determine:
Hash Type | Purpose |
---|---|
0x000 | Hashes an Index |
0x100 | Hashes Name1 |
0x200 | Hashes Name2 |
0x300 | Hashes Table Data |
I'm not 100% sure where these Hash Types factor into the encryption. There appear to be two seeds, and the Hash Type is then used as an offset for the hash. I don't know what this means yet. See SC2Inspector.MPQLogic.MPQUtilities.HashString(string input, int offset).
It seems like I've done an incredible amount of work in just one day. I've been at this for over 12 hours, it looks like. Oh well, I'm off to bed since it's 6am. I _REALLY_ hope I resume this project tomorrow. Not even going to close all my VS/Chrome windows.
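For reference, here is a sketch of the MPQ string hash as it is usually published in MPQ format write-ups. It is not the code in SC2Inspector.MPQLogic.MPQUtilities, and `MpqHashSketch` is a hypothetical name. It assumes ASCII filenames. In this version the two seeds are running state that gets mixed with every character, and the Hash Type only selects which 0x100-entry slice of a precomputed lookup table is used, which is why it looks like an offset.

```csharp
using System;

public static class MpqHashSketch
{
    // 0x500 pseudo-random values: five slices of 0x100 entries each.
    private static readonly uint[] CryptTable = BuildCryptTable();

    private static uint[] BuildCryptTable()
    {
        var table = new uint[0x500];
        uint seed = 0x00100001;
        for (uint i = 0; i < 0x100; i++)
        {
            uint index = i;
            for (int j = 0; j < 5; j++, index += 0x100)
            {
                seed = (seed * 125 + 3) % 0x2AAAAB;
                uint high = (seed & 0xFFFF) << 16;
                seed = (seed * 125 + 3) % 0x2AAAAB;
                uint low = seed & 0xFFFF;
                table[index] = high | low;
            }
        }
        return table;
    }

    // hashType is one of 0x000, 0x100, 0x200, 0x300 and picks the table slice.
    public static uint HashString(string input, uint hashType)
    {
        uint seed1 = 0x7FED7FED;
        uint seed2 = 0xEEEEEEEE;
        unchecked // the arithmetic is meant to wrap at 32 bits
        {
            foreach (char c in input.ToUpperInvariant())
            {
                uint ch = c;
                seed1 = CryptTable[hashType + ch] ^ (seed1 + seed2);
                seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
            }
        }
        return seed1;
    }
}
```

With this, `MpqHashSketch.HashString("(listfile)", 0x100)` and `MpqHashSketch.HashString("(listfile)", 0x200)` would give the Name1 and Name2 values to compare against a Hash Table entry.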
Committed r2.
12/28/2011 @ 23:46
Ok, so I've gotten a decent amount done. I was looking at code for an MPQ parser and it looks like the SC2Replay files have a slightly different format. They start with MPQ\x1B, and then 1024 bytes into the file there is an MPQ\x1A, which actually starts the normal MPQ file. MPQ1B seems to be a StarCraft II only option for exposing additional metadata without having to read the rest of the file.
Using the following data taken from a random SC2Replay:
I was able to determine the following:
Attribute | Location (d) | Hex Value | Interpretation |
---|---|---|---|
Id | x0-x3 | 4D 50 51 1B | MPQ\x1B |
UserDataMaxSize | x4-x7 | 00 02 00 00 | 512 |
HeaderOffset | x8-x11 | 00 04 00 00 | 1024 |
UserDataSize | x12-x15 | 3C 00 00 00 | 60 |
DataType | x16 | 05 | DataType indicates upcoming data is of type Array with keys |
NumberOfElements | x17 | 08 | Indicates 4 elements in array (VLF) |
Index | x18 | 00 | Sets index to 0 |
DataType | x19 | 02 | DataType indicates upcoming data is of type Binary Data |
NumberOfElements | x20 | 2C | Indicates 22 elements in the upcoming array |
StarCraftII | x21-x42 | 53 74 61 72 43 72 61 66 74 20 49 49 20 72 65 70 6C 61 79 1B 31 31 | A bunch of hex values which resolve to a byte array of ASCII values which end up as “StarCraft II replay”, followed by \x1B and “11” |
Index | x43 | 02 | Sets index to 1 |
DataType | x44 | 05 | DataType indicates upcoming data is of type Array with keys |
NumberOfElements | x45 | 0C | Indicates 6 elements in array (VLF) |
Index | x46 | 00 | Sets index to 0 |
DataType | x47 | 09 | DataType indicates upcoming data is of type VLF |
Version | x48 | 02 | Major Version |
Index | x49 | 02 | Sets index to 1 |
DataType | x50 | 09 | DataType indicates upcoming data is of type VLF |
Version | x51 | 02 | Minor Version |
Index | x52 | 04 | Sets index to 2 |
DataType | x53 | 09 | DataType indicates upcoming data is of type VLF |
Version | x54 | 00 | Patch Version |
Index | x55 | 06 | Sets index to 3 |
DataType | x56 | 09 | DataType indicates upcoming data is of type VLF |
Version | x57 | 00 | Revision Version |
Index | x58 | 08 | Sets index to 4 |
DataType | x59 | 09 | DataType indicates upcoming data is of type VLF |
Version | x60 | EA | Build Version |
Unknown | x61-x75 | FB 01 0A 09 DA F0 01 04 09 04 06 09 FE 9E 05 FB | I'm unsure after this part. Nothing seems to add up correctly. |
VLF represents a “Variable Length Format” integer.
Additionally, SC2Replay files have a quirk concerning the way integers are stored. An integer consists of a variable number of bytes in Big Endian order. When parsing an integer, the first i.e. most significant bit of a byte indicates that the succeeding byte is counted towards the integer's value. After parsing all bytes of a number, the least significant bit of the result indicates the sign. Extract this bit and shift the number's value to the right by one. If the bit is set, change the sign to negative, otherwise leave it positive.
Source: http://trac.erichseifert.de/warp/wiki/SC2ReplayFormat#VariableLengthFormat
I've taken the following code from a C# SC2Replay client to do this VLF for me:
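    // Requires: using System; using System.IO;
    private static int ParseVLFNumber(BinaryReader reader)
    {
        var bytes = 0;
        var first = true;
        var number = 0;
        var multiplier = 1;

        while (true)
        {
            var i = reader.ReadByte();

            // Low 7 bits carry data; each later byte is 7 bits more significant.
            number += (i & 0x7F) * (int)Math.Pow(2, bytes * 7);

            if (first)
            {
                // The lowest bit of the first byte is the sign flag.
                if ((number & 1) != 0)
                {
                    multiplier = -1;
                    number--;
                }
                first = false;
            }

            // A clear high bit marks the last byte of the number.
            if ((i & 0x80) == 0)
            {
                break;
            }

            bytes++;
        }

        return (number / 2) * multiplier;
    }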
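To check the VLF rule against the sample bytes in the table, here is a small worked example. `VlfExample`, `ReadVlf` and `Decode` are hypothetical names, and the decoder is a shift-based rewrite of the same logic as ParseVLFNumber, not the original client code. One thing it suggests, assuming the continuation-bit rule holds: the byte EA at x60 has its high bit set, so the Build value would continue into x61 and x62 instead of ending at x60, which may be why the bytes after that point don't seem to line up.

```csharp
using System;
using System.IO;

public static class VlfExample
{
    // Reads 7 bits per byte, least significant group first, until a byte
    // with a clear high bit. Bit 0 of the combined value is the sign flag.
    public static int ReadVlf(BinaryReader reader)
    {
        int raw = 0;
        int shift = 0;
        while (true)
        {
            byte b = reader.ReadByte();
            raw |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                break;
            }
            shift += 7;
        }

        int magnitude = raw >> 1;
        return (raw & 1) != 0 ? -magnitude : magnitude;
    }

    // Convenience wrapper for decoding a literal byte sequence.
    public static int Decode(params byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            return ReadVlf(reader);
        }
    }
}

// VlfExample.Decode(0x08)             -> 4      (NumberOfElements at x17)
// VlfExample.Decode(0x2C)             -> 22     (NumberOfElements at x20)
// VlfExample.Decode(0xEA, 0xFB, 0x01) -> 16117  (bytes at x60-x62 read as one value)
```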
This took almost five hours to decipher with a LOT of help from various sources around the internet. I can't find it anymore but I thought I read somewhere that the game length was supposed to be in the header, but this could be incorrect. I would think the game length would be with the game recording date, players, colors, etc.
Next step is to work on extracting the different files from the ACTUAL MPQ (MPQ\x1A) then parse through the details there.
12/28/2011 @ 18:31
Decided to start on this project. I've added the ViewModelBase and made some changes to App.xaml and App.xaml.cs. Committed r1.
Resources
- SC2ReplayFormat Google Code (More Up To Date Than Above) - http://code.google.com/p/starcraft2replay/w/list
- MoPaQ Archive Format - http://wiki.devklog.net/index.php?title=The_MoPaQ_Archive_Format
- Mangos-Zero MPQ Information - https://github.com/mangos-zero/server/wiki/MPQ-files
- PHPSC2Replay - http://code.google.com/p/phpsc2replay/
- SC2Replay-CSharp - https://github.com/ascendedguard/sc2replay-csharp