The file header

classDiagram
    file_header_t : header = const "daeh"
    file_header_t : version = xbox | pc | mcc | custom_edition
    file_header_t : decompressed_len = u32
    file_header_t : trailing_space = u32
    file_header_t : tag_index_offset = u32
    file_header_t : tag_index_len = u32
    file_header_t : name = string, 32 bytes
    file_header_t : build_date = string, 32 bytes
    file_header_t : map_type = single_player | multiplayer | ui
    file_header_t : ... padding of 1944 bytes ...
    file_header_t : footer = const "toof"
    tag_index_t <-- file_header_t : Points to
    tag_index_t : tag_data_offset = u32
    tag_index_t : base_tag = u32
    tag_index_t : vertex_size = u32
    tag_index_t : tag_count = u32
    tag_index_t : vertex_objects = reference to vertex
    tag_index_t : index_objects = reference to u16
    reference : count = u32
    reference : offset = u32
    reference : padding = u32
    compact_reference : count = u32
    compact_reference : offset = u32

These are two of the fundamental structures of Halo: Combat Evolved’s (hereafter referred to as HCE) map files: the file header, which points to a tag index, and the tag index, which points to other resources in the file. The file header contains some barebones information, such as the version of the map (since there are multiple versions of Halo’s map files), the name, the map type, the build date, and how big the file would be if uncompressed (in the case of the Xbox version).
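As a rough sketch, the header from the diagram can be written down as a plain struct and sanity-checked before anything else is read. The widths of `version` and `map_type` are assumptions (the diagram does not state them), and `has_valid_magic` and `tag_index_in_bounds` are hypothetical helper names, so treat this as an illustration rather than a verified layout.

```cpp
#include <cstdint>
#include <cstring>

using u32 = std::uint32_t;

// Mirrors the file_header_t diagram. The widths of `version` and `map_type`
// are assumed to be 32 bits; the real on-disk size may therefore differ.
struct file_header_t
{
    char header[4];        // expected to be "daeh"
    u32  version;          // xbox | pc | mcc | custom_edition (assumed u32)
    u32  decompressed_len;
    u32  trailing_space;
    u32  tag_index_offset;
    u32  tag_index_len;
    char name[32];
    char build_date[32];
    u32  map_type;         // single_player | multiplayer | ui (assumed u32)
    char padding[1944];
    char footer[4];        // expected to be "toof"
};

// Returns true if both constant markers are present.
inline bool has_valid_magic(const file_header_t& h)
{
    return std::memcmp(h.header, "daeh", 4) == 0
        && std::memcmp(h.footer, "toof", 4) == 0;
}

// Returns true if the tag index lies entirely inside `file_size` bytes of
// (decompressed) map data. 64-bit arithmetic avoids overflow on bad input.
inline bool tag_index_in_bounds(const file_header_t& h, std::uint64_t file_size)
{
    return std::uint64_t{h.tag_index_offset} + h.tag_index_len <= file_size;
}
```

Checking the markers and the tag index bounds up front is cheap, and it catches truncated or mislabelled files before any pointer remapping happens.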

The different versions employ some different tricks; most significantly, the Xbox version compresses its maps using Zlib, decompressing them at runtime, and also employs compressed vertex formats where possible. The version differences are covered in more detail further down.

There is also a particular value in the tag index structure, the tag_data_offset, that has a lot of significance; this was most likely used in the original implementation for mapping the file to a certain location in memory such that most if not all structures could be found purely by casting values to pointers. This would have massive speed benefits, as there would be no calculation to find out where a specific structure could be found.

The approach of memory-mapping to a specific address, however, has been considered bad practice for the last decade or so, as techniques such as Address Space Layout Randomization (ASLR) have been put into use to prevent a wide range of exploits.

As such, a modern implementation often looks like:


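#include <cstdint>
using u32 = std::uint32_t;

// offset: pointer value as stored in the map; file_offset: the base those
// stored pointers are relative to, subtracted to get an offset from base_ptr.
template<typename T>
auto map_pointer_to_memory(const char* base_ptr, u32 offset, u32 file_offset)
{
    return reinterpret_cast<const T*>(base_ptr + (offset - file_offset));
}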

This incurs a small cost to each structure lookup, but is quite doable, and there is always some information needed from the tag index anyway in most cases.
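Since the pointer values come straight from the file, a loader may also want to reject values that fall outside the loaded data. The following is a minimal sketch of such a check; `checked_map_pointer`, `pointer` and `memory_base` are hypothetical names, and it assumes the map data sits in one contiguous buffer of `size` bytes.

```cpp
#include <cstddef>
#include <cstdint>

using u32 = std::uint32_t;

// Remaps a stored pointer value into `base_ptr`, returning nullptr when the
// resulting object would not fit inside the `size` bytes that were loaded.
// Note: the caller is still responsible for T being suitably aligned.
template<typename T>
const T* checked_map_pointer(const char* base_ptr, std::size_t size,
                             u32 pointer, u32 memory_base)
{
    if (pointer < memory_base)
        return nullptr; // points before the start of the mapped data

    const std::uint64_t offset = std::uint64_t{pointer} - memory_base;
    if (offset + sizeof(T) > size)
        return nullptr; // object would run past the end of the buffer

    return reinterpret_cast<const T*>(base_ptr + offset);
}
```

The extra comparisons are cheap next to the cost of reading the data itself, and they turn a corrupt or deliberately obfuscated map into a null result instead of an out-of-bounds read.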

This brings us to the topic of what a tag index is: a long array of descriptors, each with a name, the types of the object, and an offset to its data in the file (again, encoded as a pointer in memory that needs to be remapped). Each tag is also assigned a tag ID, which lets any other structure refer to any data by at least an ID. HCE, in fact, does not verify that the type of the tag being accessed is correct; this could be a release-mode optimization, with the check being something developers would only want while debugging. Custom map makers, however, have used this to their advantage to obfuscate their maps by removing all type information and labels from the tags.

A tag overall looks like:

classDiagram
    tag_t : tag_classes = tag_class_t[3]
    tag_t : tag_id = u32
    tag_t : name_offset = u32
    tag_t : data_offset = u32
    tag_t : tag_storage = internal | external
    tag_t : padding = 4 bytes

A tag may have up to three types assigned to it; one example is [vehi, unit, obje], which describes a type hierarchy where a vehicle inherits all properties of a unit, and a unit inherits all properties of an object (i.e. an object placed within the level).
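Because the game itself does not check tag types, a tool reading maps may want its own check. Here is a small sketch of an "is-a" test over the three class slots; it assumes `tag_class_t` is a four-character code packed into a `u32` and that `tag_storage` is 32 bits wide. The packing order in `make_tag_class` and the name `tag_is_a` are illustrative choices, not taken from the game.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using u32 = std::uint32_t;
using tag_class_t = u32; // assumed: four characters packed into 32 bits

// Packs four characters into one integer (byte order is an assumption).
constexpr tag_class_t make_tag_class(char a, char b, char c, char d)
{
    return (u32{static_cast<unsigned char>(a)} << 24)
         | (u32{static_cast<unsigned char>(b)} << 16)
         | (u32{static_cast<unsigned char>(c)} << 8)
         |  u32{static_cast<unsigned char>(d)};
}

// Mirrors the tag_t diagram; tag_storage is assumed to be 32 bits wide.
struct tag_t
{
    std::array<tag_class_t, 3> tag_classes;
    u32 tag_id;
    u32 name_offset;
    u32 data_offset;
    u32 tag_storage;
    u32 padding;
};

// True if `wanted` appears anywhere in the tag's class hierarchy, so a
// [vehi, unit, obje] tag matches vehi, unit and obje alike.
inline bool tag_is_a(const tag_t& tag, tag_class_t wanted)
{
    return std::find(tag.tag_classes.begin(), tag.tag_classes.end(), wanted)
        != tag.tag_classes.end();
}
```

Note that a check like this fails on obfuscated maps that have had their type information removed, so a tool may need a way to skip it.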

The tag_id is a bit more self-explanatory; it is simply an integer. Additionally, the tag index stores its tags in increasing tag_id order, meaning lookups can use binary search instead of a linear search.
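A lookup along those lines could be sketched with `std::lower_bound`. This assumes the tags have already been read into a `std::vector` and really are sorted by `tag_id`; `find_tag` is a hypothetical helper, and the `tag_t` field widths are assumed to be 32 bits each.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

// Mirrors the tag_t diagram with every field assumed to be 32 bits wide.
struct tag_t
{
    u32 tag_classes[3];
    u32 tag_id;
    u32 name_offset;
    u32 data_offset;
    u32 tag_storage;
    u32 padding;
};

// Binary search by tag_id; returns nullptr when no tag has that ID.
// Precondition: `tags` is sorted by increasing tag_id.
inline const tag_t* find_tag(const std::vector<tag_t>& tags, u32 tag_id)
{
    const auto it = std::lower_bound(
        tags.begin(), tags.end(), tag_id,
        [](const tag_t& tag, u32 id) { return tag.tag_id < id; });

    if (it == tags.end() || it->tag_id != tag_id)
        return nullptr;
    return &*it;
}
```

If the ordering does not hold for a particular map, a linear scan is the safe fallback.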

The name offset points to a string describing the tag, set by whoever created the map.

The data offset points to the data structure in the map file. It can also point to data in other files, however; some examples include bitmaps.map and sounds.map, which are shared across all maps on the PC version, Custom Edition and MCC. There is also a loc.map file used for localization, sharing strings between map files.

Version differences

There are four major versions of Halo: Combat Evolved, each with its own quirks; not listed here are the different patches released by Bungie, such as the monumental 1.0.10 patch for Combat Evolved/Custom Edition on PC, but they don’t change the map structure.

  • Halo: Combat Evolved (2001) for the original Xbox
  • Halo: Combat Evolved (2003) for PC, ported by Gearbox
  • Halo Custom Edition (2004), also Gearbox
  • Halo: Master Chief Collection (2014) for Xbox/PC, based on the Gearbox port

The Xbox version is different in a lot of ways:

  • The maps are compressed with Zlib beyond the map header
  • ALL assets that need to be loaded are located in the .map file
  • No uncompressed vertex data
    • Floating-point positions
    • 16-bit normalized integer texture coordinates for models
    • Floating-point texture coordinates for maps
    • Packed tangents/binormals/normals
    • Bone information uses a more compact format, 8-bit bone indices
      • 8-bit bone indices might sound small, but it still allows 255 bones per region of a model
  • The shaders are implemented correctly (!)

The PC port by Gearbox introduced:

  • Uncompressed map files
  • bitmaps.map, sounds.map and loc.map
    • bitmaps.map stores bitmap data; there are also data structures in the file which need to be dereferenced.
    • sounds.map contains purely sound data in the stock PC version, while it also contains some data structures in Custom Edition (this is a very confusing difference, at least to me)
    • loc.map only contains localization strings for the UI
  • Contains uncompressed vertex data
    • Floating-point positions, texture coordinates, tangents/binormals/normals
    • Floating-point bone weights, 16-bit bone indices
  • The shaders were implemented incorrectly :(

The MCC version from 2014 launched with all the graphical defects from the Gearbox port. (This section is incomplete ;) )

Vertex formats

There are primarily two vertex formats present in HCE map files: a compressed variant and an uncompressed variant. The uncompressed variant only appeared in the Halo: Combat Evolved port done by Gearbox for PC, while the original Xbox version employed only compressed vertices.

The compressed vertices mainly concern handling of texture coordinates for models and normals, binormals and tangents in all cases; texture coordinates were simply stored as normalized i16 values, while normals were stored in u11u11u10-formatted 32-bit integers. Additionally, bone weights work differently, with the uncompressed variant allowing two weight values (as of now, I don’t understand the significance, but maybe I’ll write a post about how the skeletal animation works in the future).
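To make the unpacking step concrete, here is a hedged sketch of decoding both kinds of compressed values on the CPU. The text above only gives the bit widths (11, 11 and 10), so the rest of the layout is assumed: x in the lowest bits, z in the highest, and each field treated as a two's-complement signed normalized value. All function names are hypothetical.

```cpp
#include <cstdint>

using i16 = std::int16_t;
using u32 = std::uint32_t;

struct vec3 { float x, y, z; };

// Normalized 16-bit integer -> float in [-1, 1].
inline float decode_snorm16(i16 v)
{
    const float f = static_cast<float>(v) / 32767.0f;
    return f < -1.0f ? -1.0f : f; // -32768 would land just below -1
}

// Takes the low `bits` bits of `v`, sign-extends them and scales to [-1, 1].
// Treating the field as signed is an assumption.
inline float decode_snorm_bits(u32 v, unsigned bits)
{
    const u32 mask = (1u << bits) - 1u;
    const u32 sign = 1u << (bits - 1u);
    v &= mask;
    const std::int32_t s = (v & sign)
        ? static_cast<std::int32_t>(v) - static_cast<std::int32_t>(mask) - 1
        : static_cast<std::int32_t>(v);
    const float f = static_cast<float>(s) / static_cast<float>(sign - 1u);
    return f < -1.0f ? -1.0f : f;
}

// Unpacks an 11/11/10 packed vector (assumed order: x low, y middle, z high).
inline vec3 decode_normal_11_11_10(u32 packed)
{
    return {
        decode_snorm_bits(packed, 11),
        decode_snorm_bits(packed >> 11, 11),
        decode_snorm_bits(packed >> 22, 10),
    };
}
```

The same arithmetic would normally live in the vertex shader. Doing it on the CPU like this is mostly useful for converting data or for checking a shader's output.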

The uncompressed vertices are way more applicable to PC as well as mobile GPUs; there is no extra processing required to unpack the normals here.

With these facts laid out, however, it’s clear that any program seeking to render this data will need to have separate vertex shaders for Xbox and PC maps, or use the compressed vertex data with the PC maps (which is technically possible, as it’s still included, I believe).
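In a renderer, that choice can be reduced to a small lookup made once per map. The sketch below assumes that Custom Edition and MCC maps use the same uncompressed layout as the PC port, since both derive from it; the enum and function names are made up for illustration.

```cpp
// Versions as listed in the file header diagram.
enum class map_version { xbox, pc, mcc, custom_edition };

// Which vertex layout (and hence which vertex shader variant) to use.
enum class vertex_format { compressed, uncompressed };

// Assumption: only Xbox maps need the compressed path; every version derived
// from the Gearbox port is treated as carrying uncompressed vertices.
constexpr vertex_format select_vertex_format(map_version version)
{
    return version == map_version::xbox ? vertex_format::compressed
                                        : vertex_format::uncompressed;
}
```

Deciding this once when the map loads keeps version checks out of the per-draw path.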