

SC2Inspector - C#

Reference

MPQ\x1B Format

| Attribute | Offset (decimal) | Hex Value | Interpretation |
| Id | 0-3 | 4D 50 51 1B | MPQ\x1B |
| UserDataMaxSize | 4-7 | 00 02 00 00 | 512 |
| HeaderOffset | 8-11 | 00 04 00 00 | 1024 |
| UserDataSize | 12-15 | 3C 00 00 00 | 60 |
| DataType | 16 | 05 | Upcoming data is an Array with keys |
| NumberOfElements | 17 | 08 | 4 elements in the array (VLF) |
| Index | 18 | 00 | Sets index to 0 |
| DataType | 19 | 02 | Upcoming data is Binary Data |
| NumberOfElements | 20 | 2C | 22 elements in the upcoming array (VLF) |
| StarCraftII | 21-42 | 53 74 61 72 43 72 61 66 74 20 49 49 20 72 65 70 6C 61 79 1B 31 31 | ASCII bytes for “StarCraft II replay”, then 0x1B, then “11” |
| Index | 43 | 02 | Sets index to 1 |
| DataType | 44 | 05 | Upcoming data is an Array with keys |
| NumberOfElements | 45 | 0C | 6 elements in the array (VLF) |
| Index | 46 | 00 | Sets index to 0 |
| DataType | 47 | 09 | Upcoming data is a VLF |
| Version | 48 | 02 | Major Version |
| Index | 49 | 02 | Sets index to 1 |
| DataType | 50 | 09 | Upcoming data is a VLF |
| Version | 51 | 02 | Minor Version |
| Index | 52 | 04 | Sets index to 2 |
| DataType | 53 | 09 | Upcoming data is a VLF |
| Version | 54 | 00 | Patch Version |
| Index | 55 | 06 | Sets index to 3 |
| DataType | 56 | 09 | Upcoming data is a VLF |
| Version | 57 | 00 | Revision Version |
| Index | 58 | 08 | Sets index to 4 |
| DataType | 59 | 09 | Upcoming data is a VLF |
| Version | 60-62 | EA FB 01 | Build Version (VLF: 0xEA has its high bit set, so the number continues through FB 01; decodes to 16117) |
| Index | 63 | 0A | Sets index to 5 |
| DataType | 64 | 09 | Upcoming data is a VLF |
| Unknown | 65-67 | DA F0 01 | VLF, decodes to 15405 |
| Index | 68 | 04 | Sets index to 2 (back in the base array) |
| DataType | 69 | 09 | Upcoming data is a VLF |
| Unknown | 70 | 04 | VLF, decodes to 2 |
| Index | 71 | 06 | Sets index to 3 |
| DataType | 72 | 09 | Upcoming data is a VLF |
| Unknown | 73-75 | FE 9E 05 | VLF, decodes to 42943; this is the last of the 60 bytes of user data (16 + 60 = 76) |
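Here is a minimal C# sketch of reading the fixed part of this header. It assumes the little-endian layout shown above. BinaryReader reads little-endian integers, so “00 04 00 00” becomes 1024. The `UserDataHeader` class and its `Read` method are hypothetical names, not SC2Inspector's actual classes; the field names follow the table. The sketch takes exactly UserDataSize bytes starting at offset 16, which in this replay covers offsets 16 through 75.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical holder for the fixed MPQ\x1B fields described in the table.
sealed class UserDataHeader
{
    public uint UserDataMaxSize;                  // offset 4-7, e.g. 512
    public uint HeaderOffset;                     // offset 8-11, e.g. 1024 (start of MPQ\x1A)
    public uint UserDataSize;                     // offset 12-15, e.g. 60
    public byte[] UserData = Array.Empty<byte>(); // serialized data starting at offset 16

    public static UserDataHeader Read(Stream stream)
    {
        // BinaryReader reads little-endian integers, so "00 04 00 00" becomes 1024.
        var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        byte[] id = reader.ReadBytes(4);
        if (id.Length != 4 || id[0] != 0x4D || id[1] != 0x50 || id[2] != 0x51 || id[3] != 0x1B)
            throw new InvalidDataException("Missing MPQ\\x1B signature.");

        var header = new UserDataHeader
        {
            UserDataMaxSize = reader.ReadUInt32(),
            HeaderOffset = reader.ReadUInt32(),
            UserDataSize = reader.ReadUInt32(),
        };
        // The serialized data (DataType 05 ...) immediately follows the 16-byte header.
        header.UserData = reader.ReadBytes(checked((int)header.UserDataSize));
        return header;
    }
}
```

The HeaderOffset value is what tells the reader where to seek for the MPQ\x1A header.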

replay.details Format

0 (ArrayWithKeys) - Base array
|--- 0 (SimpleArray) - Array of player structs
|    \--- x (ArrayWithKeys) - Player struct
|         |--- 0 (BinaryData) - Player name
|         |--- 1 (ArrayWithKeys) - Probably some further details
|         |    |--- 0 (NumberInVLF) - Unknown
|         |    |--- 1 (NumberOfFourBytes) - Unknown
|         |    |--- 2 (NumberInVLF) - Unknown
|         |    \--- 3 (BinaryData) - RealID
|         |--- 2 (BinaryData) - Localized race name
|         |--- 3 (ArrayWithKeys) - Array of player color values
|         |    |--- 0 (NumberInVLF) - Alpha
|         |    |--- 1 (NumberInVLF) - Red
|         |    |--- 2 (NumberInVLF) - Green
|         |    \--- 3 (NumberInVLF) - Blue
|         |--- 4 (NumberInVLF) - Unknown
|         |--- 5 (NumberInVLF) - Unknown
|         |--- 6 (NumberInVLF) - Handicap
|         |--- 7 (NumberInVLF) - Unknown
|         \--- 8 (NumberInVLF) - Team
|--- 1 (BinaryData) - Localized map name
|--- 2 (BinaryData) - Unknown
|--- 3 (ArrayWithKeys) - Array containing map preview file names
|    \--- 0 (BinaryData) - Map preview file name
|--- 4 (NumberOfOneByte) - Unknown
|--- 5 (NumberInVLF) - Save time of the replay
|--- 6 (NumberInVLF) - Unknown
|--- 7 (BinaryData) - Unknown
|--- 8 (BinaryData) - Unknown
|--- 9 (BinaryData) - Unknown
|--- 10 (SimpleArray) - Likely something about the map file
|    |--- 0 (BinaryData) - Unknown
|    \--- 1 (BinaryData) - Unknown
|--- 11 (NumberOfOneByte) - Unknown
|--- 12 (NumberInVLF) - Unknown
\--- 13 (NumberInVLF) - Unknown

Project Log

12/30/2011 @ 18:04

I've been successful so far in decoding the replay.details file. I spent some time developing a serialized data parser, and with it I've been able to piece together the format of the replay.details file.
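Here is a minimal sketch of such a parser in C#. It handles only the three type bytes identified in the MPQ\x1B table: 0x02 binary data, 0x05 array with keys, and 0x09 VLF. The SimpleArray, NumberOfOneByte and NumberOfFourBytes types seen in replay.details are left out because their type bytes aren't recorded here, so the sketch throws on them. `SerializedData` and `ReadVlf` are hypothetical names. An array with keys comes back as a dictionary keyed by the decoded index.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SerializedData
{
    // Reads one value: a type byte followed by type-specific data.
    public static object Read(BinaryReader reader)
    {
        byte type = reader.ReadByte();
        switch (type)
        {
            case 0x02: // Binary data: VLF length, then that many raw bytes
                return reader.ReadBytes((int)ReadVlf(reader));
            case 0x05: // Array with keys: VLF count, then (VLF index, value) pairs
            {
                long count = ReadVlf(reader);
                var map = new Dictionary<long, object>();
                for (long i = 0; i < count; i++)
                {
                    long key = ReadVlf(reader);
                    map[key] = Read(reader); // values can themselves be nested arrays
                }
                return map;
            }
            case 0x09: // Number in VLF
                return ReadVlf(reader);
            default:
                throw new NotSupportedException($"Type byte 0x{type:X2} is not handled in this sketch.");
        }
    }

    // VLF: 7 bits per byte, least significant group first, high bit = "another byte follows";
    // the lowest bit of the assembled value is the sign flag.
    static long ReadVlf(BinaryReader reader)
    {
        long value = 0;
        int shift = 0;
        byte b;
        do
        {
            b = reader.ReadByte();
            value |= (long)(b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        long magnitude = value >> 1;
        return (value & 1) != 0 ? -magnitude : magnitude;
    }
}
```

For example, the bytes `05 04 00 02 04 68 69 02 09 EA FB 01` decode to a two-entry array: index 0 holds the binary data “hi” and index 1 holds 16117.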

The replay.details file format is documented in the Reference section above.

12/30/2011 @ 18:03

I've finally gotten to the point where I've read and enumerated all of the files in the MPQ. I think I did a really good job laying out my classes. I have an MPQArchive class which looks like this (displaying the listfile contents):

[Screenshot: MPQArchive in the Locals window]

Now I have to determine what exactly is in each replay.* file.

| Filename | Purpose |
| replay.details | Basic metadata |
| replay.initData | Unknown |
| replay.attribute.events | Unknown |
| replay.game.events | Actions |
| replay.message.events | Chat, Ping |
| replay.smartcam.events | Presumably player cameras |
| replay.sync.events | Presumably consistency checks |

12/30/2011 @ 00:39

Ok, well, I took a short break to hang out with my roommates, but I've finally been able to retrieve the raw uncompressed data for each file! I'm fairly sure this means that the next step is to start parsing the actual SC2 data! Right here I'd like to give a big shout out to Foole, the author of “mpqtool” (http://code.google.com/p/mpqtool/). This stuff is very complex and I don't understand a lot about how to decrypt the hash tables and such. The code from his tool has helped me IMMENSELY. Judging by his copyright notice, most of his code is based on StormLib by Ladislav Zezula. Thanks to both of you!

I've spent some time getting my comments up to date. I've also begun using SharpZipLib (http://www.icsharpcode.net/opensource/sharpziplib/) to do the BZip2/GZip decompression. I went into the project wanting to do everything myself, and I think I've done well so far, but writing a decompressor for BZip2/GZip would be a project in itself. I think it's good that I've done the MPQ stuff myself, since I can make changes should Blizzard change the format of the SC2Replay file.
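In MPQ archives, each compressed sector starts with a one-byte compression mask. A mask of 0x02 means zlib (a deflate stream with a zlib wrapper, not a gzip file), and 0x10 means bzip2. The sketch below dispatches on that byte. To stay dependency-free, it handles zlib with .NET 6's `System.IO.Compression.ZLibStream` instead of SharpZipLib, and it leaves bzip2 as a stub where SharpZipLib would plug in. `SectorDecompressor` is a hypothetical name. The sketch does not handle masks that combine several methods. A sector whose compressed size equals its uncompressed size is stored raw, without a mask byte, so a caller should check for that first.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class SectorDecompressor
{
    const byte Zlib = 0x02;
    const byte BZip2 = 0x10;

    // Expects a compressed sector: one mask byte followed by the compressed payload.
    public static byte[] Decompress(byte[] sector)
    {
        byte method = sector[0];
        using var input = new MemoryStream(sector, 1, sector.Length - 1);
        using var output = new MemoryStream();
        switch (method)
        {
            case Zlib:
                using (var z = new ZLibStream(input, CompressionMode.Decompress))
                    z.CopyTo(output);
                return output.ToArray();
            case BZip2:
                // An external BZip2 decompressor (e.g. SharpZipLib) would go here.
                throw new NotSupportedException("BZip2 needs an external library in this sketch.");
            default:
                throw new NotSupportedException($"Compression mask 0x{method:X2} is not handled.");
        }
    }
}
```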

Once again, looking over my code, it doesn't seem TOO complicated. I've gotten very good at manipulating bytes and streams. Last night I discovered the hexadecimal display option for the Locals/Watch windows, which has been immensely helpful for comparing variable values against the actual file's data. I had been using “(listfile)” as the file to test my decryption and everything else on. The file is very short (only a few hundred bytes). I extracted it with Ladik's MPQ Editor, looked at the data in a hex editor, and compared it to the RawContents variable that is supposed to hold the decompressed contents of the file. They matched! I then switched to the file with all the data: “replay.game.events”. I was very excited to see that the length of the byte[] matched the size of the file and that the first and last several bytes matched as well. I think that's enough to call this portion of the project done. The only remaining MPQ work I expect is possibly handling “(listfile)” or “(attributes)”.

I'm going to take a short break then see what I can decipher from the other replay.game.* files. Committed r5.

12/29/2011 @ 20:44

Well, I've been working on this for about an hour and I decided to redo the block stuff. Now both the BlockTable and the HashTable are Hashtables with Hash and Block objects stored within. This makes it easier to resolve the HashTable entries to their BlockTable entries. I've been checking all my numbers against “Ladik's MPQ Editor” (amazing tool) and everything is coming out perfectly. The next step is to read the actual file information. I'm tempted to make an MPQFile class, but I think I'll store all the data within MPQBlock and keep everything in one place (it already has the file flags, compressed size, location, etc). Onward! Committed r3.
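Since MPQBlock already carries the flags, it can answer questions like “is this file compressed?” directly. The sketch below assumes the flag values that StormLib uses (its MPQ_FILE_* constants, renamed here). `MpqBlock`, `FromWords` and `FilePosition` are hypothetical names. The entry layout follows the block-table order in the 12/29 pseudo-code: offset, compressed size, file size, flags.

```csharp
using System;

// Block-table flag bits, values as in StormLib's MPQ_FILE_* constants.
[Flags]
enum MpqFileFlags : uint
{
    Implode    = 0x00000100, // PKWARE implode compression
    Compress   = 0x00000200, // multi-method compression (first byte of a sector = mask)
    Encrypted  = 0x00010000,
    FixKey     = 0x00020000, // encryption key is adjusted by the block offset
    SingleUnit = 0x01000000, // stored as one unit instead of sectors
    SectorCrc  = 0x04000000,
    Exists     = 0x80000000,
}

// Hypothetical shape of one decrypted block-table entry.
sealed class MpqBlock
{
    public uint FileOffset;      // relative to the MPQ\x1A header
    public uint CompressedSize;
    public uint FileSize;
    public MpqFileFlags Flags;

    // Absolute position in the .SC2Replay file (headerOffset is usually 1024).
    public long FilePosition(long headerOffset) => headerOffset + FileOffset;

    public bool Exists => (Flags & MpqFileFlags.Exists) != 0;
    public bool IsCompressed => (Flags & (MpqFileFlags.Compress | MpqFileFlags.Implode)) != 0;
    public bool IsEncrypted => (Flags & MpqFileFlags.Encrypted) != 0;

    // Each entry is four consecutive uint32 values in the decrypted table.
    public static MpqBlock FromWords(uint[] words, int entry) => new MpqBlock
    {
        FileOffset = words[entry * 4],
        CompressedSize = words[entry * 4 + 1],
        FileSize = words[entry * 4 + 2],
        Flags = (MpqFileFlags)words[entry * 4 + 3],
    };
}
```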

12/29/2011 @ 05:57

Finally making some promising progress! I believe I last left off where I had just finished parsing MPQ\x1B and the User Data. I've since made quite a bit of progress parsing MPQ\x1A (forever after known as “The MPQ File”). I was able to successfully parse the Header from the MPQ file, which told me where the Hash Table and Block Table are and how long they are. I was then able to decipher the Hash Table and Block Table. I was THEN successfully able to take a filename (for instance “(listfile)”), hash it, compare the hash to the hash table, and find its entry. I believe I can then use the BlockIndex from the Hash Table entry to look up the Block Table entry, which gives the file's location, so the data can be “cut out” and decrypted or decompressed.
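Here is a sketch of parsing that header. It assumes the 32-byte layout that every MPQ format version starts with; later versions append more fields, which are ignored here. The table positions are relative to the start of the MPQ\x1A header, so the absolute hash table position is HeaderOffset + HashTablePos. `MpqHeader` and `SectorSize` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

sealed class MpqHeader
{
    public uint HeaderSize, ArchiveSize;
    public ushort FormatVersion, SectorSizeShift;
    public uint HashTablePos, BlockTablePos;   // relative to the MPQ\x1A header
    public uint HashTableSize, BlockTableSize; // number of 16-byte entries

    // Reads the shared 32-byte header that starts at headerOffset (usually 1024).
    public static MpqHeader Read(Stream stream, long headerOffset)
    {
        stream.Seek(headerOffset, SeekOrigin.Begin);
        var r = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (r.ReadUInt32() != 0x1A51504D) // "MPQ\x1A" read as a little-endian uint32
            throw new InvalidDataException("Missing MPQ\\x1A signature.");
        // Object initializers run in textual order, matching the on-disk field order.
        return new MpqHeader
        {
            HeaderSize = r.ReadUInt32(),
            ArchiveSize = r.ReadUInt32(),
            FormatVersion = r.ReadUInt16(),
            SectorSizeShift = r.ReadUInt16(),
            HashTablePos = r.ReadUInt32(),
            BlockTablePos = r.ReadUInt32(),
            HashTableSize = r.ReadUInt32(),
            BlockTableSize = r.ReadUInt32(),
        };
    }

    // Sector size in bytes: 512 shifted left by SectorSizeShift (e.g. 3 -> 4096).
    public int SectorSize => 512 << SectorSizeShift;
}
```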

I've used more code than I'd like from other people, but most of it is bit-shifting magic that is beyond me, mostly the Hash Table encryption/decryption. I think I'm finally getting close to the actual SC2 stuff. I used MPQExtractor to take a look at some of the replay.whatever files and they're in their own internal format, which will be a daunting task to figure out. I just added a bunch of useful resources in case I can't find them again.

I feel like I should be writing more, but when I look over the code it's pretty simple in hindsight. It's simply using BinaryReader to read a bunch of numbers from a binary file. The only complex parts are the decryption parts which I find very confusing. I think I'll write a BROAD bit of pseudo-code about how this thing works:

Open SC2Replay file
Read first 4 bytes
If first 4 bytes are MPQ\x1B Then
	Read User Data from MPQ1B header (Version, etc), most importantly the Header Offset (usually 1024)
End If
Advance Byte Buffer to Header Offset (1024)
Read next 4 bytes
If these 4 bytes are MPQ\x1A Then
	Read information about MPQ1A header (ArchiveSize, MPQVersion, HashTable & BlockTable Size & Position, etc)
End If
Advance Byte Buffer to beginning of HashTable
Read next (HashTableSize * 16) Bytes
Decrypt HashTable
Read the following over and over (16 bytes total) from the decrypted stream: Name1, Name2, Locale, BlockIndex
Store the above in a C# HashTable so we can easily access it
Advance Byte Buffer to beginning of BlockTable
Read next (BlockTableSize * 16) Bytes
Decrypt BlockTable
Read the following over and over (16 bytes total) from the decrypted stream: FileOffset, (compute FilePos), Compressed Size, FileSize, Flags
Store the above in a C# List so we can easily access it
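The “Decrypt HashTable” and “Decrypt BlockTable” steps use the MPQ table cipher as implemented in StormLib. The sketch below assumes that cipher, together with StormLib's 0x500-entry crypt table and its fixed table keys: 0xC3AF3770 for the hash table and 0xEC83B3A3 for the block table. `MpqTableCrypto` is a hypothetical name. The data is treated as little-endian uint32 words, so a table of N entries is 4 * N words. An `Encrypt` counterpart is included only so the round trip can be checked.

```csharp
using System;

static class MpqTableCrypto
{
    public const uint HashTableKey = 0xC3AF3770;  // HashString("(hash table)", 0x300)
    public const uint BlockTableKey = 0xEC83B3A3; // HashString("(block table)", 0x300)

    // 0x500 entries: five 256-entry slices; decryption uses the slice at 0x400.
    static readonly uint[] CryptTable = BuildCryptTable();

    static uint[] BuildCryptTable()
    {
        var table = new uint[0x500];
        uint seed = 0x00100001;
        for (uint i = 0; i < 0x100; i++)
        {
            for (uint j = 0, index = i; j < 5; j++, index += 0x100)
            {
                seed = (seed * 125 + 3) % 0x2AAAAB;
                uint high = (seed & 0xFFFF) << 16;
                seed = (seed * 125 + 3) % 0x2AAAAB;
                uint low = seed & 0xFFFF;
                table[index] = high | low;
            }
        }
        return table;
    }

    // Decrypts the words in place; key and seed evolve after every word.
    public static void Decrypt(uint[] data, uint key)
    {
        uint seed = 0xEEEEEEEE;
        for (int n = 0; n < data.Length; n++)
        {
            seed += CryptTable[0x400 + (key & 0xFF)];
            uint plain = data[n] ^ (key + seed);
            key = ((~key << 0x15) + 0x11111111) | (key >> 0x0B);
            seed = plain + seed + (seed << 5) + 3;
            data[n] = plain;
        }
    }

    // Inverse of Decrypt (the seed update always uses the plaintext word).
    public static void Encrypt(uint[] data, uint key)
    {
        uint seed = 0xEEEEEEEE;
        for (int n = 0; n < data.Length; n++)
        {
            seed += CryptTable[0x400 + (key & 0xFF)];
            uint plain = data[n];
            data[n] = plain ^ (key + seed);
            key = ((~key << 0x15) + 0x11111111) | (key >> 0x0B);
            seed = plain + seed + (seed << 5) + 3;
        }
    }
}
```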

To make lookups fast, the Hashtable never stores the filename itself; the filename is converted to numbers (hashes) instead. One hash picks the slot where the search starts, and two more (Name1 and Name2) are stored in the entry to confirm it's the right file. The matching entry holds the BlockIndex, which tells us where the data for that file is in the archive. It's also important to note that there are apparently several different Hash Types.
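Here is a sketch of that lookup. It assumes the usual MPQ scheme: start at the slot given by the index hash masked by the table size (a power of two), compare Name1 and Name2, and on a mismatch step to the next slot, wrapping around. An empty slot (BlockIndex 0xFFFFFFFF) ends the search, and deleted slots (0xFFFFFFFE) are skipped. The hashes are passed in precomputed. `MpqHashEntry` and `MpqHashLookup` are hypothetical names, and the sketch ignores the Locale field by returning the first name match.

```csharp
using System;

// One 16-byte hash-table entry, in the order listed in the pseudo-code.
struct MpqHashEntry
{
    public uint Name1;      // HashString(name, 0x100)
    public uint Name2;      // HashString(name, 0x200)
    public uint Locale;     // locale and platform (two 16-bit fields)
    public uint BlockIndex; // 0xFFFFFFFF = never used, 0xFFFFFFFE = deleted
}

static class MpqHashLookup
{
    const uint EmptySlot = 0xFFFFFFFF;
    const uint DeletedSlot = 0xFFFFFFFE;

    // indexHash is HashString(name, 0x000). Returns the block index, or null if absent.
    public static uint? FindBlockIndex(MpqHashEntry[] table, uint indexHash, uint name1, uint name2)
    {
        uint mask = (uint)table.Length - 1; // table size is a power of two
        uint slot = indexHash & mask;
        for (int probe = 0; probe < table.Length; probe++)
        {
            MpqHashEntry entry = table[slot];
            if (entry.BlockIndex == EmptySlot)
                return null;                 // an unused slot ends the chain
            if (entry.BlockIndex != DeletedSlot && entry.Name1 == name1 && entry.Name2 == name2)
                return entry.BlockIndex;
            slot = (slot + 1) & mask;        // collision: try the next slot
        }
        return null;
    }
}
```

The returned BlockIndex is then used as an index into the Block Table.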

From what I can determine:

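| Hash Type | Purpose |
| 0x000 | Hashes an index (the slot where the hash table search starts) |
| 0x100 | Hashes Name1 |
| 0x200 | Hashes Name2 |
| 0x300 | Hashes a decryption key (e.g. the keys for the hash and block tables) |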

I'm not 100% sure where these Hash Types factor into the encryption. There appear to be two seeds, and the Hash Type is used as an offset for the hash. I don't know what this means. See SC2Inspector.MPQLogic.MPQUtilities.HashString(string input, int offset).
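Here is a standalone sketch of the standard MPQ string hash as implemented in StormLib. It shows what the offset does. The crypt table holds five 256-entry slices, and the hash type is added to each character code to pick an entry. So each type reads from its own slice, and the same filename produces unrelated values for 0x000, 0x100, 0x200 and 0x300. `MpqHash` and its `uint hashType` parameter are hypothetical; this is not the signature of SC2Inspector's own `HashString`. The sketch assumes ASCII filenames, uppercased before hashing.

```csharp
using System;

static class MpqHash
{
    // 0x500 entries = five 256-entry slices at 0x000, 0x100, 0x200, 0x300 and 0x400.
    static readonly uint[] CryptTable = Build();

    static uint[] Build()
    {
        var table = new uint[0x500];
        uint seed = 0x00100001;
        for (uint i = 0; i < 0x100; i++)
        {
            for (uint j = 0, index = i; j < 5; j++, index += 0x100)
            {
                seed = (seed * 125 + 3) % 0x2AAAAB;
                uint high = (seed & 0xFFFF) << 16;
                seed = (seed * 125 + 3) % 0x2AAAAB;
                table[index] = high | (seed & 0xFFFF);
            }
        }
        return table;
    }

    // hashType selects the slice: 0x000 index, 0x100 Name1, 0x200 Name2, 0x300 key.
    public static uint HashString(string input, uint hashType)
    {
        uint seed1 = 0x7FED7FED; // the two seeds
        uint seed2 = 0xEEEEEEEE;
        foreach (char c in input.ToUpperInvariant()) // ASCII names only
        {
            uint ch = c;
            seed1 = CryptTable[hashType + ch] ^ (seed1 + seed2);
            seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
        }
        return seed1;
    }
}
```

As a check, hashing “(hash table)” and “(block table)” with type 0x300 gives StormLib's well-known table keys, 0xC3AF3770 and 0xEC83B3A3.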

It seems like I've done an incredible amount of work in just one day. It looks like I've been at this for over 12 hours. Oh well, I'm off to bed since it's 6am. I _REALLY_ hope I resume this project tomorrow. Not even going to close all my VS/Chrome windows.

Committed r2.

12/28/2011 @ 23:46

Ok, so I've gotten a decent amount done. I was looking at code for an MPQ parser, and it looks like SC2Replay files have a slightly different format. They start with MPQ\x1B, and 1024 bytes into the file there is an MPQ\x1A header, which is where the normal MPQ archive actually starts. MPQ\x1B seems to be a StarCraft II-only option for exposing additional metadata without having to read the archive itself.

Using data taken from a random SC2Replay, I was able to determine the layout shown in the MPQ\x1B Format table under Reference.

VLF represents a “Variable Length Format” integer.

Additionally, SC2Replay files have a quirk concerning the way integers are stored. An integer consists of a variable number of bytes in Big Endian order. When parsing an integer, the first i.e. most significant bit of a byte indicates that the succeeding byte is counted towards the integer's value. After parsing all bytes of a number, the least significant bit of the result indicates the sign. Extract this bit and shift the number's value to the right by one. If the bit is set, change the sign to negative, otherwise leave it positive.
Source: http://trac.erichseifert.de/warp/wiki/SC2ReplayFormat#VariableLengthFormat

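I've taken the following code from a C# SC2Replay client to decode VLF numbers for me. Note that it treats the first byte as the least significant 7-bit group, which is what the replay data requires: EA FB 01, for example, decodes to 16117.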

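using System.IO;

static class VlfReader {
	// Reads a "Variable Length Format" (VLF) integer.
	// Each byte carries 7 bits of the value, least significant group first;
	// a set high bit (0x80) means another byte follows.
	// The lowest bit of the assembled value is the sign flag.
	public static int ParseVLFNumber(BinaryReader reader) {
		long value = 0; // accumulate in 64 bits so long inputs don't overflow
		int shift = 0;
		while (true) {
			byte b = reader.ReadByte();
			value |= (long)(b & 0x7F) << shift;
			if ((b & 0x80) == 0) {
				break;
			}
			shift += 7;
		}
		long magnitude = value >> 1; // drop the sign bit
		return (int)((value & 1) != 0 ? -magnitude : magnitude);
	}
}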

This took almost five hours to decipher with a LOT of help from various sources around the internet. I can't find it anymore but I thought I read somewhere that the game length was supposed to be in the header, but this could be incorrect. I would think the game length would be with the game recording date, players, colors, etc.

Next step is to work on extracting the different files from the ACTUAL MPQ (MPQ\x1A) then parse through the details there.

12/28/2011 @ 18:31

Decided to start on this project. I've added the ViewModelBase and made some changes to App.xaml and App.xaml.cs. Committed r1.

Resources

Last modified: 2011/12/31 05:44 by smark