Wednesday, January 2, 2013

Fog of War: An obfuscated defect

One of the major technical contributions I made at the end of the Underground Railroad project was to fix the fog of war feature. The designer's intent was that a player would only see the current county and one county away; if the player was at a depot of the Underground Railroad, they could see one farther. The team had already implemented a map and an invisible overlay to handle movement. The overlay had the same topography as the map that the player sees, but it was color coded. When clicking on the map, we look up the corresponding coordinates in the overlay to determine the county, as well as whether the countryside or county seat was selected. The same system is used for the popup hints: look up the mouse position in the overlay, and if its a legal target, inform the player.

The theory behind the fog of war, then, was simple. We added an opaque layer on top of the map, which I will call the fog layer.1 When the player's position changes, first determine which counties should be shown. Then, look for those counties in the color-coded overlay. Because the invisible overlay is the same shape as the actual map, turn the corresponding pixels transparent in the fog layer. Presto! We've cut out exactly the right shape in the fog layer to see only the counties that should be shown. Because the map was so big, the team came up with some appropriate optimizations, restricting the search for matching colors to those areas on or near the edges of the current camera view.

Starting the game in Jackson County, KY. The player only sees one county away.
(Freedom is North, so we don't let the player go deeper into the South.)
The team created a very domain model of the game counties, assembled using the Builder pattern in a fluent internal DSL. A representative line looks like this:

Add(Make ()
  .WithID (CountyID.DelawareIN)
  .Coordinates (40.211893f, -85.396077f)
  .SetDepot (1)
  .IsOnRiver ()
  .NorthernIndiana ())
.WithColor (35, 50, 0);

This is adding a new county to the registry. In particular, this is Delaware County, whose county seat has particular GPS coordinates, which had one depot, is on a river, and is in the northern half of Indiana. The last part constructs the color key in the county registry. Note that the actual visual color is arbitrary, since the player never sees it: all we're doing here is saying that the pixels colored as the RGB triple <35,50,0> correspond to Delaware County.

Here's where things get interesting. Unity3D has two color classes: Color and Color32. The former represents colors as four-dimensional floating-point vectors in the range [0,1], while the latter uses integer vectors in the range [0,255]. Each class has a directly corresponding constructor:

static function Color (r : float, g : float, b : float, a : float) : Color

static function Color32 (r : byte, g : byte, b : byte, a : byte) : Color32

Also, both classes support implicit conversion to the other. This means that if you have an object of one type but need the other, it will automatically do the sensible conversion—or the most sensible conversion it can.

I remember the students who worked on the domain model had some trouble understanding these different classes very early in the semester, but then I assumed that everything was taken care of. However, in trying to trace down defects in the fog of war feature, I saw some very strange color processing code. As I poked around the code, I noticed that both Color and Color32 were being used in different places. Whenever there was to be an implicit conversion, the same piece of bit-fiddling code appeared. That certainly shouldn't be necessary, since there's no way the conversion API was broken. After some exploration, I tracked it all back to the county registry builder. Here's the original implementation of WithColor:

public Builder WithColor (float r, float g, float b)
{
    Color key = new Color (r, g, b, 0);
    _county.Color = key;
    _builder.map.Add (key, _county);
    return _builder;
}

The astute reader will note that the error can already be identified. Spoilers ahead.

The call to WithColor sends in a triple of integers—35, 50, and 0 in my original example. These int values are silently converted into float, since that's what the method expects. Then, these floating-point values are passed to Color, which also expects float arguments. Remember how Color is defined? It's a four-dimensional vector of floating-point values in the range [0,1]. Yet, the Color class happily accepts the 35, 50, and 0. If you inspect this Color object or ask it to print itself, you find out that it's <35.0,50.0,0>, as you might expect. Then, if you convert it to a Color32, that object is <255,255,0>.

This also makes sense, if you think about it for a while. The Color class only really cares about values in the range [0,1], and it interprets anything above that range as being equivalent to one. So, if you drew the Color that is <35.0,50.0,0.0>, you would get bright yellow, not dull reddish green. If my students ever fully understood this, they certainly didn't articulate it or move to fix it: instead, they developed a kludge to pull the Color values out as floats and pack those back into a Color32 manually.

There are several lessons to this story. Here's what I got out of it:
  1. Read the API docs. There's no good reason to send floating-point values like 35.0 to the Color constructor.
  2. Make sure your API produces appropriate warnings when being used in such a weird way. There could have been a warning either upon sending the out-of-range values to Color or when they were used in the automatic conversion.
  3. Don't use primitive types when they are not semantically appropriate. Color should not actually take floats as arguments when what it really wants is values in the range [0,1]. So, make a class to represent that concept, and use that. Before you complain that this will impact performance, remember that premature optimization is the root of all evil.
  4. Refactor. The bizarre color-handling kludge showed up in at least two places, clearly the result of copy-paste coding. Had the original developer stopped and refactored this away, he would have at least made a better abstraction for handling the problem. In the best case, he would have recognized the problem and fixed the root cause.
  5. Automatic conversion is awful. You might think you're improving readability of your code, but only if you and the reader share the same mental model and expectations. Better to make it explicit. To me, it's similar to the desire for using static factory methods over explicit constructor calls: good naming can reveal your intention.
By the way, The Underground Railroad in the Ohio River Valley is now open to the public. Enjoy!



I was surprised that the students didn't understand "fog of war" as a metaphor. The original opaque image  looked like literal fog, and at first I couldn't understand why they had chosen it. I had used a technical term from game design, thinking it was common vocabulary, and this led to unexpected team behavior. Usually when this happens to me, it's computing jargon, not game design jargon!

4 comments:

  1. I'm surprised your students didn't understand "fog of war." I guess no one plays strategy games anymore..

    ReplyDelete
    Replies
    1. Right! I expected it would be in common parlance just because these students had some experience with videogames---some quite a bit!

      Delete
  2. I would expect them to know the term from plan and simple history class in high school.

    ReplyDelete
    Replies
    1. I have been surprised at how little they seem to learn from history class in high school. In particular, those who claim not to care about history seemed to learn very little. This is true not just of history class, naturally and sadly.

      Delete