Tuesday, February 11, 2025

Notes from "No Estimates"

I recently finished reading Vasco Duarte's No Estimates after hearing about it on Dave Farley's Continuous Delivery YouTube channel and Allen Holub's #NoEstimates talk. I had been curious about the #NoEstimates movement for some time, reading an article here and there, but this was my first real attempt to understand it. The book itself is clear and direct, interleaving traditional content with an ongoing fictional narrative that motivates and reinforces the ideas. I found many connections to my research and teaching interests. In this blog post, I will share a few findings from my notes and reflections.

Estimates

One of the foundational principles of the book is fairly simple but not something I had considered before: an estimate communicates the peak of a probability distribution. For example, if I estimate a task to take two hours, I am saying that the most likely case is two hours; it could take as little as zero or negligible time, but there is also a small probability that it takes dramatically longer, with a tail stretching out toward infinity. The cumulative probability is the area under the curve, and because the distribution is skewed to the right, the area beyond the peak is much larger than the area before it. From this, we can conclude that the probability of being late is much higher than the probability of being early.
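
To convince myself of this, I wrote a little Dart sketch of my own (this is not from the book): assuming task durations follow a right-skewed distribution such as a lognormal, a quick Monte Carlo shows how much of the probability mass sits beyond the peak. All of the numbers here are made up for illustration.

    import 'dart:math';

    void main() {
      final random = Random(42);
      const mu = 0.7;    // log-scale location
      const sigma = 0.5; // log-scale spread
      final mode = exp(mu - sigma * sigma); // the peak: "the estimate"

      const trials = 100000;
      var late = 0;
      for (var i = 0; i < trials; i++) {
        // Box-Muller transform: two uniform samples -> one standard normal.
        final z = sqrt(-2 * log(1 - random.nextDouble())) *
            cos(2 * pi * random.nextDouble());
        final duration = exp(mu + sigma * z); // lognormal task duration
        if (duration > mode) late++;
      }

      print('Estimate (mode of the distribution): '
          '${mode.toStringAsFixed(2)} hours');
      print('Probability of finishing later than the estimate: '
          '${(100 * late / trials).toStringAsFixed(1)}%');
    }

With these made-up parameters, roughly two thirds of the simulated tasks come in later than the estimate.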

Duarte classifies estimation as waste in the Lean sense: doing more of it will not make the product better from a user's point of view. He clarifies that "no estimates" isn't a goal but a vision: estimates won't be axiomatically removed but minimized. The discussion of waste got me thinking about QA testing in games, a point I will return to below.

Managing time, scope, cost, and quality

I regularly talk to my students about how cost, quality, and scope are the three levers we can control in project management. Since our work is constrained by the semester's schedule, I point out that we cannot shift cost, which in our context is time: we cannot simply add more time to the end of the semester to get our projects done. I also argue that quality is non-negotiable: the point of undergraduate education is to learn how to work well, so sacrificing that is against the telos of the endeavor. Therefore, the only manipulable lever is scope. This perspective seems to help students understand why we focus on user story analysis, prioritizing the features that add value to users.

I wish I could remember where I encountered that heuristic, since it is distinct from a similar concept that dominates Web searches: the project management triangle, also known as the triple-constraint model or the iron triangle. This model explains how the constraints of scope, cost, and time are connected such that cutting one without changing the others will result in a loss of quality. Duarte uses this model in his book to draw a distinction between value-driven and scope-driven projects. Traditional management approaches are scope-driven: the scope is fixed, and so cost and time are unbounded. Value-driven projects instead fix time and cost, leaving scope flexible, which leads to the approach of delivering the most value first. This is a standard agile perspective, but I previously didn't have the nomenclature of "value-driven" and "scope-driven," perhaps in part because in my academic environment I rely on the alternative model described above.

Reducing variability in throughput

In Chapter 3, Duarte provides suggestions for techniques to reduce the variability in a development team's throughput. I have used many of them before in mentoring student teams. These include using stable, cross-functional teams; having clearly defined priorities; not passing defects down the line; standardizing and automating when possible; freezing scope within iterations; and protecting the team from outside interruptions. He also suggests reducing dependencies so that people can work on one thing at a time. This got me thinking about how often my teams end up with coupled user stories, such that completing one requires work on another. Creating independent user stories comes up more than once in the book, and it's something I can watch for opportunities to practice and teach.

Duarte points out that good requirements must allow measuring progress early and often. They must also be flexible enough to determine which aspects of a system need to be implemented now and which can be built later, after the system is better understood. 

This leads to Duarte's conclusion that the only real measurement of progress in software development is Running Tested Stories. Anything else is ambiguous or unreliable. Teams can be managed toward consistent throughput by ensuring that there are no large stories (none larger than half an iteration), that several independent stories can be completed in each iteration, and that the distribution of story sizes stays about the same throughout.

The book references a 2003 article by Bill Wake about the "INVEST" acronym, which I had not seen before. Wake describes how user stories need to be Independent, Negotiable, Valuable, Estimable, Small, and Testable. "Negotiable" here means that they deal with the essence and not the details: they are not contracts about technical details. Wake's definition of "Small" is between half a day's effort and a day's effort. Duarte adapts "Estimable" to be Essential, which is sensible given his specialization. He includes the term blink estimation, which he attributes to Angel Medinilla and which was new to me. The idea is that one makes a snap judgement about whether a story fits within two weeks or not, and that this blink estimation is usually all that is needed. Regardless of which expansion I use, INVEST may be a helpful heuristic to give to teams who are breaking down a big problem such as a game design into smaller, valuable pieces.

Planning the details just in time

I started using Scrum with multidisciplinary undergraduate game development teams many years ago, and it has been a valuable practice. I was usually the Product Owner, responsible for articulating the work as user stories and prioritizing the backlog. Teams pulled stories from the Product Backlog to the Sprint Backlog during our planning meetings, as per traditional Scrum. When my teams found that a one-dimensional Product Backlog made it hard to see the big picture, we adopted Story Maps, which ameliorated the problem. Although we tracked each Sprint's progress using burndown charts, I never bothered to compute velocity. Teams tended to get a good sense of how much they could do in two weeks by around week ten, and since I was in charge of the backlog, I could cut scope to fit into the time remaining.

My preference for agility caused some friction, then, when I tried to apply Richard Lemarchand's Playful Production Process with my last two cohorts of game development students. Although Lemarchand calls for concentric development, he has relatively little to say about how to implement it. More importantly, his approach for each phase of production is to start by enumerating all the work to be done, estimating how long it will take, and then moving toward that goal. A careful reader will recognize this as scope-driven management, and a cultural observer will note that the games industry is beset by death marches and crunch.

Duarte's alternative is rooted in agile principles: plan the details of the imminent iteration, getting them into user stories that can be completed in a day or two, and let the future work remain coarse-grained epic stories. He suggests not planning more than about two months' worth of work due to how much will be learned about the system in that time. 

This caused some stress for me since it was quite counter to one of my ongoing research projects. I have been thinking about how to combine some of Lemarchand's ideas with some ideas I took from Allen Holub's #NoEstimates talk. One of Holub's primary arguments is that we can simplify our planning, and get equivalent results, by counting each user story as a single unit of work. I have been investigating the differences between tracking work items as single units versus tracking estimated hours remaining. For example, consider these two perspectives from the end of a team's alpha phase of production.

These are two perspectives of the same period of time, an Alpha phase that lasted about three months. The first shows the number of stories in the backlog, and the second shows the total estimated number of hours. The top chart shows how the team cut a significant number of features around a third of the way through Alpha. For the next third, they added stories at about the same rate as they completed them, demonstrating how they were working to reshape the project based on the initial overestimate. We set up a nifty toolchain for tracking these data in real time using a combination of HacknPlan and Google Sheets. I even gave a workshop about this at GDEX a few months ago. But the whole thing hinges on having those planning data at the end of August for a milestone that's coming up in early December.

Duarte suggests a radically different model. Break down the problem so that the stories for the current sprint are independent and small (taking no more than half a sprint to complete). Track how many of those the team can get done in an iteration. Do that for a few iterations, and you have a good sense of how much work the team can accomplish in future iterations, which lets you control scope. More specifically, you can measure a team's User Story Velocity and its Feature Velocity, where "Feature" here is elsewhere called an epic story or an activity. 
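
To make that arithmetic concrete, here is a small sketch of my own (not Duarte's) in Dart, in the spirit of Holub's "count each story as one unit" argument: average the stories completed per iteration and project the remaining work. All of the numbers are invented.

    void main() {
      // Observed throughput: completed stories in each past iteration.
      const storiesCompletedPerIteration = [6, 5, 7, 6];
      const storiesRemaining = 40;

      final throughput =
          storiesCompletedPerIteration.reduce((a, b) => a + b) /
              storiesCompletedPerIteration.length;
      final iterationsLeft = (storiesRemaining / throughput).ceil();

      print('Average throughput: '
          '${throughput.toStringAsFixed(1)} stories per iteration');
      print('Projected iterations to finish the backlog: $iterationsLeft');
    }

No hours, no point poker: just counting what actually got done and extrapolating from it.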

I like the sound of that. It was clear from watching student teams try to estimate an entire Alpha phase that they knew their estimates were shoddy. Worse, for some of them, it planted a seed in their minds that they already knew enough about their projects to plan the whole phase when, in fact, they had yet to find the fun. Switching to the #NoEstimates approach would require me to supplement or replace Lemarchand's recommendations, including new ways of using project management tools.

The medium is the message

Incorporating #NoEstimates would also mean rethinking the relationship between the developers and the artists. When I was mentoring single-semester projects with a small art team, there was seldom any trouble: artists moved fluidly from concept art, sketches, and low-fidelity assets toward production-quality assets as the semester moved on. When artists struggled to match the iterative flow of the programmers, we adopted swimlanes as recommended by Clinton Keith.

It wasn't clear to me how to scale this up to teams where half the members may be artists. I reached out to Duarte himself, and he was kind enough to talk with me about my questions. We had a fruitful discussion, and he helped me see something that I didn't understand before: there is a whole category of practices that, fundamentally, are symptoms of a failure to regularly integrate. Swimlanes are one example, but there are countless more, as overt as separate physical locations and as mundane as job titles and org charts. If we consider that the running tested story is the only way to measure progress, then anything that does not support that is potentially distracting from it. It is the kind of observation that would make Marshall McLuhan smile: the presence of a swimlane says more about the team than anything in the swimlane itself.

Using social complexity to determine tactics

Buying the book also grants access to a keynote presentation that Duarte gave some years ago. It's an excellent talk and a good complement to the book. One particular element jumped off the screen and into my notebook, and that is Duarte's matrix for dealing with user stories. It deals with the problem of Social Complexity, which can be summarized as "the number of people in the organization you have to talk to about it." Here is a quick reproduction from his talk:


This captures something I have tried to express to many teams but never articulated so clearly. It relates to the four conclusions of his presentation:
  1. Predict progress with #NoEstimates
  2. Break things down by value, not effort
  3. Agree on meaning with social and technical complexity, reducing risk
  4. Use RIDICULOUSLY SHORT timeboxes
He points out that if you can only do one of these, do the fourth one, since it is the essential practice from which the others derive.

Closing thoughts

I taught the first two cohorts of the game production sequence, but I am stepping away for the third one. There are a few reasons behind this, but primarily it's so that my new colleague has an opportunity to try his hand at it. I expect to be back in the saddle with the fourth cohort, who will start in Spring 2026. Writing up these notes took much longer than I expected, especially as I began to reflect on the substantial differences between what I have done in the past, what I did following Lemarchand, and what I might like to do in the future. For now, I need to put down this line of inquiry, but formalizing these notes gives me a point of entry when I need to refresh myself on these topics.

In the meantime, if you have thoughts, feedback, stories, or reflections, please feel free to share them in the comments.

Saturday, February 1, 2025

The Goal of Higher Education: Remarks at the BSU College of Sciences and Humanities Dean's Honor Reception

By virtue of receiving the inaugural Teacher of the Year award from the College of Sciences and Humanities, I was invited to give some remarks at the Dean's Honor Reception. The reception is later this morning, and these are the remarks I intend to give. I will be working from notes, not this written form, so the precise delivery will certainly vary, but these are the main points. Also, the original title for the talk was "The End of Higher Education," which was a pun on the two meanings of "end," but it was softened to "The Goal of Higher Education."

EDIT (Feb 2): The talk was well received. I have edited my original post to contain a few turns of phrase that I used spontaneously at the time.

In preparation for this talk, I read through two of the notebooks I keep whenever I read a book. It was purely delightful: it's like someone wrote a book that contained only things that I find fascinating, inspiring, or challenging. I hope that you also keep notes when you read. It will bring you joy to read them later.

I searched my notebooks for an answer to the question, "What is the goal of higher education?" I am sure you have your own answers, perhaps dealing with careers or impact on the world. In the Republic, Plato asserts that the goal of higher education is to love what is beautiful. I think he's right. 

You might ask, "How do we come to recognize beauty?" I put it to you that beauty is already there, that it's a transcendental property of the world around us. The question then becomes, "How do we fail to recognize beauty?"

In The Lord of the Rings, Gollum was not free to recognize beauty. Gollum could only see the One Ring. Because he was distracted by this created thing, he was blind to the beauty around him. He elevated this created thing beyond its station, and this led to his downfall. Tolkien scholar Joseph Pearce talks about how we can all become Gollumized. We can become so distracted by things that we fail to see beauty.

German philosopher Martin Heidegger wrote that we are not free if we believe that technology is morally neutral. Heidegger recognized that humanity has always used technology, but he said that we are not free if we think of this technology as neutral. Take the humble hammer as an example. With a hammer, I can pound nails, but remember the old saying, "When all you have is a hammer, every problem looks like a nail." It's true. The hammer has moral agency. That is to say, the hammer affects the moral decision space that you are in. With a hammer in hand, striking things with it becomes an option. The hammer has moral agency. How much more so the smart phone?

Canvas is a technology. Canvas affects your moral decision space. Canvas color codes your grades so that you "know how well you are doing." Canvas gives you notifications, so that you stop what you are doing and pay attention to it. Canvas gives you confetti when you turn in work on time. Beware. Jerry Muller, in his brilliant book The Tyranny of Metrics, writes about how the calculative is opposed to the imaginative.

How is one to recognize beauty while tempering the calculative? I have time to share with you three stories.

My friend Dannie was an undergraduate architecture major here back in the 1990s. He asked one of his professors, "How am I doing in this course?" expecting a quantitative answer. The professor took out a piece of paper and, along a line, drew an egg, a tadpole, and then a frog. He pointed to a spot on the line and said, "You're about here." Now that is a midsemester grade!

Last weekend, I ran Global Game Jam here on campus. This is an event where people get together and, in 48 hours, create original videogames. We had over 40 people attend, mostly Ball State students but also students from other places as well as community members. There was no judging, there were no grades, there were no prizes, there was no competition, yet at the end, we had made eight original videogames, which are now available to the world for free. We made them for the joy of creating them and the pleasure of sharing them. This event is a global event, and all together, this community created over 11,000 games, just for the sake of beauty.

I regularly teach a required sophomore-level programming class. In this class, I use a system called "achievements," inspired by video games, that lets students earn course credit by doing things outside the normal course expectations. One of the options is called "Detox," and it requires a student to go 24 hours without looking at a screen. Every semester, a few students try it, and the beauty of their essay responses would make you weep. Students have told me how they walked across campus and really heard the birdsong for the first time. Others have written about reflecting on their lives, how they got to where they are, and their hopes and dreams. My favorite story is of a student who, instead of doomscrolling on Instagram, took her grandmother out for coffee. There is nothing better than that.

I put it to you that beauty is all around us—well, unless you work in the Robert Bell Building, like I do, but we're doing our best. 

Look at the people on stage here. They are beautiful. Look at the people sitting next to you or behind you. Really! Do it! They are beautiful.

Speaking for myself, I don't care what grades you get. I want you to find your One Ring—because we all have them—cast it into the fire, and then join me in life's great adventure of loving beauty.

Wednesday, January 1, 2025

The Games of 2024

It's time again for my annual reflection on the board games I played in 2024. This year, I logged 337 board game plays, which is 71 fewer than last year's 408. I played 56 different games this year, which is also less variety than last year. The year featured several campaign games, which tend to be longer anyway, but we also had fewer nights for games. The boys have gotten involved in more activities as they have gotten older. I am not sure what this year will bring, with my oldest son likely to move out to college in the Fall semester. He's the one I have logged the most plays with, clocking in at 2,870 plays since I began logging. I am eager for what the future holds, but I'm not sure we'll ever see the same number of plays as we had a few years ago.

Here are the games I played the most this year:

  • Bang! The Dice Game (19)
  • ISS Vanguard (19)
  • Ark Nova (18)
  • Dungeons & Dragons: Temple of Elemental Evil (18)
  • My Island (15)
  • Res Arcana (14)
  • The 7th Citadel (13)
  • Crokinole (13)
  • Thunderstone Quest (13)
  • Oathsworn: Into the Deepwood (12)
  • The Castles of Mad King Ludwig (11)
  • Colt Express (11)
  • Cat in the Box (10)
  • Everdell (9)
  • Heat: Pedal to the Metal (9)
I usually just report on the "dimes"—the games I played ten or more times during the year. However, I included those last two because they provide an interesting story. Many of those games of Everdell were with my youngest son, who seems to have the same kind of love of systems-based games as my oldest son and I do. He and I end up together on Monday nights when the other boys are at Scouts. As he grew into a more mature player, we started by exploring two-player Everdell. He's not yet ten and still misses some opportunities, but he's a chipper kid at the table, always happy to learn something and never bitter when things don't go his way. We played a bunch of Res Arcana together, which is light enough to do two or three rounds in an evening. Last Fall, we moved into Temple of Elemental Evil, whose campaign I have now played through twice. Seriously, I have played Temple of Elemental Evil 46 times, which is a lot more than anyone should play it. At least the miniatures I painted years ago still look great, and I think maybe I can finally put the game to rest. Just before Christmas, we moved on to a cooperative campaign in Imperial Assault, and I suspect we might move on to other similar games soon.

Heat: Pedal to the Metal was one we just got for Christmas after playing it once with my brother at a family get-together. It's proven to be a big hit with my family, easily accommodating six players without getting too slow.

It's a little sad for me to see Bang! The Dice Game in the number one slot because I don't really care for it. However, it travels well, so we brought it along on a family trip, and when it's all you have and the hotel has game-sized tables, it's what you play. Castles of Mad King Ludwig is in there in part because I splurged this year, replacing my original edition with the nice new second printing. There are a few aspects of the visual redesign that I do not care for, but it's worth it for the ease of setup and teardown made possible by the excellent packaging.

As I mentioned in the introduction, there were a lot of campaign games this year. My two oldest sons, my wife, and I just wrapped up The 7th Citadel before Christmas after having set it aside months earlier. It is a strong sequel to The 7th Continent, which the four of us also played, and I look forward to trying the other threat (campaign setting) that came with the core box. My two older boys and I played Oathsworn and ISS Vanguard together. Oathsworn was fun but had a few disappointing narrative beats; it's in my closet should we want to go back to it, but I have mixed feelings about how likely that will be. On the other hand, we are looking forward to getting back into ISS Vanguard once we paint the rest of the miniatures from its expansion.

I am shocked to see that we logged only eight plays of games in the Clank series this year. There were many nights where somebody suggested it and I turned us toward something else, feeling a bit Clanked out. It is the series that my family has played the most, by far. Perhaps it is the relative dearth of Clank that has me so excited to get into Clank Legacy 2, for which my boys and I have already painted our figures and are just waiting for the chance to get it to the table.

During 2024, my games h-index grew from 33 to 35, meaning that there are now 35 games that I have played 35 times or more. I would have to play a lot of games in order for another play of Temple of Elemental Evil to shift that. My player h-index remained at 19, which is not surprising since I have mostly continued to play with my family. Indeed, it's hard to imagine what would need to happen to change that number dramatically. 

Although this year was a record low for board games, it was also a record high for tabletop role-playing games. We played Microscope, Grok?!, and Blades in the Dark, and we played five sessions of a Scum and Villainy campaign. These were all a lot of fun, and I hope that we can do some more tabletop RPGs in the new year.

That's the summary of my tabletop games for 2024. Thanks for reading, and please feel free to share some of your favorite gaming memories of the year with me—either in the comments or over a cup of coffee.

Monday, December 30, 2024

Repo Deleter: A utility to batch-delete repositories from GitHub organizations

TL;DR: I created a tool to help batch-delete repositories from GitHub organizations. You can find the source repository at https://github.com/doctor-g/repo-deleter-flutter.

I have a few GitHub organizations that I re-use every semester. I set them up through an academic account years ago. At the start of the semester, I add all my current students, and they push their work to repositories within this organization. This way, not only can I easily access students' work, they can also help each other out. For example, we can do peer code reviews in class within the organization without requiring anyone to use public repositories. I can also share all my sample code for the semester in the organization, and only those in the organization can get to it.

The downside to this approach is that the organizations require significant cleanup after a semester ends. Although I always instruct students how to move their work from the class organization into their own accounts, there are inevitably dozens of repositories left unattended. Deleting repositories manually through GitHub's web interface is mindless and tedious. There are a few online tools that claim to support batch-deletion of repositories, but I never had great luck with them.

After selecting a repository, going to its settings, scrolling to the bottom, selecting the delete option, and confirming that you want to delete the repository, you also get to type in its name for super extra confirmation. Doing it once is not bad. Doing it fifty times is awful.

To make my life a little easier, two years ago, I created a little command-line tool to manage the process. I created it in Dart using the github package, which wraps GitHub's Web API. This little tool required you to go into the source code to modify the organization name and any special rules about which repositories to list. For example, I have had semesters where students had to name their projects in a pattern "PX-username" where X is the project number and username is a BSU username. The tool then had two different paths, which I would comment out alternately: the first printed the names of the repositories that it would delete, and the other would delete those repositories. It was not a great utility, and it needed manual cleanup for the repositories that didn't follow the patterns, but it did save me some manual work on GitHub's Web interface.

After a couple of semesters of dealing with that tool, I decided it was time to make something better, and so today I released Repo Deleter at https://github.com/doctor-g/repo-deleter-flutter. This new version includes a graphical user interface powered by Flutter. Like its predecessor, it requires a GitHub personal access token with the appropriate permissions; the details are given in the project README file. With the proper credentials in place, Repo Deleter allows you to select one of your GitHub organizations. It then shows you all of the organization's repositories, both public and private. You can select any number of these and, with the click of a button, delete them.

Repo Deleter screenshot (student names blurred out)
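
For the curious, the whole workflow boils down to two calls against GitHub's REST API: listing an organization's repositories and deleting one. Here is a rough sketch using package:http rather than the github package the app actually uses; the token placeholder and the function names are just for illustration.

    import 'dart:convert';
    import 'package:http/http.dart' as http;

    const _headers = {
      'Authorization': 'Bearer <personal-access-token>',
      'Accept': 'application/vnd.github+json',
    };

    /// Lists the full names (owner/repo) of an organization's repositories.
    Future<List<String>> listOrgRepos(String org) async {
      final response = await http.get(
        Uri.parse('https://api.github.com/orgs/$org/repos?per_page=100'),
        headers: _headers,
      );
      final repos = jsonDecode(response.body) as List<dynamic>;
      return [for (final repo in repos) repo['full_name'] as String];
    }

    /// Deletes a single repository; GitHub responds with 204 on success.
    Future<void> deleteRepo(String fullName) async {
      await http.delete(
        Uri.parse('https://api.github.com/repos/$fullName'),
        headers: _headers,
      );
    }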

In addition to solving a proximal problem, there are two technical aspects to this project that I found rewarding. The first and most important one is that this application uses the bloc pattern. I mentioned my experiments with bloc as part of my tinkering with Dart and Flutter for creating tabletop-inspired videogames. That work is hidden away in a handful of private repositories, and because nothing became of them, it was hard to assess my own understanding of the pattern. I used bloc for the Repo Deleter as well, and it felt quite comfortable. I wonder how a bloc expert would critique the particular states and events that I used, but as a proof of concept, it definitely works. I suppose the proof of the pudding may be in six months when I have to open the project again and inevitably want to add a feature or two. Will I be able to read and make sense out of the code? Time will tell.
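
As a rough illustration of the pattern's shape (these are not the actual states and events from Repo Deleter, and everything beyond the bloc package API is invented), a bloc for this kind of app might look something like this:

    import 'package:bloc/bloc.dart';

    sealed class RepoEvent {}

    class ReposRequested extends RepoEvent {}

    class RepoToggled extends RepoEvent {
      RepoToggled(this.fullName);
      final String fullName;
    }

    class RepoState {
      const RepoState({
        this.repos = const <String>[],     // all repositories in the organization
        this.selected = const <String>{},  // repositories marked for deletion
      });
      final List<String> repos;
      final Set<String> selected;
    }

    class RepoBloc extends Bloc<RepoEvent, RepoState> {
      RepoBloc(this._fetchRepos) : super(const RepoState()) {
        on<ReposRequested>((event, emit) async {
          emit(RepoState(repos: await _fetchRepos()));
        });
        on<RepoToggled>((event, emit) {
          // Toggle the repository's membership in the selection set.
          final selected = {...state.selected};
          if (!selected.remove(event.fullName)) {
            selected.add(event.fullName);
          }
          emit(RepoState(repos: state.repos, selected: selected));
        });
      }

      final Future<List<String>> Function() _fetchRepos;
    }

The widgets then only render whatever RepoState says and dispatch events; none of the GitHub plumbing leaks into the view.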

The less important but still interesting aspect of Repo Deleter is that it's the first place where I used a formal logging framework in Flutter. It is not fancy: it's just the stock logging package, and I'm only echoing logs to a print call. Still, it eliminates the analyzer warnings I had from the handful of print statements I had peppered in as ad hoc debugging aids.
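
The setup amounts to only a few lines, roughly the following, with an illustrative logger name:

    import 'package:logging/logging.dart';

    final _log = Logger('RepoDeleter');

    void main() {
      // Forward every log record to the console; this is the one sanctioned
      // print call, so the ad hoc debugging prints can go away.
      Logger.root.level = Level.ALL;
      Logger.root.onRecord.listen((record) {
        // ignore: avoid_print
        print('${record.level.name}: ${record.time}: ${record.message}');
      });

      _log.info('Fetching organization repositories...');
    }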

One thing I would like to have done, but did not, was develop it via TDD. My early prototypes used the github package libraries throughout the application, with no adapter layers. Isolating the data layer from the logic layer would have facilitated testing the logic without making actual network requests. I had idle hopes of using this as an example for my students, but in the end, I made the decision to just build on a working prototype rather than engineer something more robust.

Right now, the repository only has Linux platform support, but it's easy enough to add more using the Flutter tools. I simply run it from Android Studio because it's easy to set an environment variable for a specific run configuration.

Tuesday, December 24, 2024

Experimenting with software architectures for video games inspired by tabletop roleplaying games

I have been tinkering the last several months with a videogame prototype inspired by some of the tabletop roleplaying games that my boys and I have been playing. Similar to how my game The Endless Storm of Dagger Mountain explored PbtA mechanisms, I've wondered about the strengths and weaknesses of interpreting Forged in the Dark systems into a text-based videogame. Last week, once I put away most of the work of the Fall semester, I was able to dive more deeply into work on a prototype. I felt really good about it until a few days ago, when I came to doubt—not for the first time—some decisions I had made in the software architecture. So, in this, my sixth December blog post, I want to unpack some of the considerations that I have put into these efforts so that I might stop programming in circles.

I decided to use Dart and Flutter for the game. I teach with Dart and Flutter in CS222 because I legitimately enjoy the technology stack. I am competent with them but would not call myself an expert. I have only built two public systems with these tools: my Thunderstone Quest Randomizer and a little timer utility to help with Promotion and Tenure Committee meetings. The former is much larger than the latter, and if I were to build it again, I would do it differently, but I keep maintaining it for myself and other fans of the card game.

I appreciate Dart's static typing, named parameters, pattern matching, and sealed classes, and Flutter's declarative approach can simplify otherwise complex UI logic. Something else that draws me toward Dart and Flutter, besides the elegance of the language and framework, is the inspirational work of Filip Hracek. His Knights of San Francisco is similar to some of the experimentation I have been doing, and his writings about Flutter's performance and the ethics of software design are interesting and insightful. I spent most of a summer working through his open source egamebook repository, trying to understand how a serious Dart programmer uses the language to accomplish his game design goals.

However, the choice to use Dart and Flutter over Godot Engine is never fully settled in my heart of hearts. Whereas Dart is dreamy for game logic, Godot Engine makes it dead simple to create juicy bits of design. Its AnimationPlayer is brilliant for little effects, whereas setting up an AnimatedBuilder in Flutter takes a whole lot of typing. Godot's node-based approach means that individual parts of the program can easily be run in isolation and tested, and tool scripts allow customization of the editor itself. Unfortunately, GDScript has no refactoring support, and this is a significant impediment to a test-driven approach: changing my mind about a name or a design choice in GDScript has nasty rippling effects. Type hints in GDScript are invaluable, but they are no replacement for real static typing. Also, creating simple data structures in GDScript is much more arduous than in Dart. All this is to say that when I'm dealing with game logic in GDScript, I find myself thinking, "This would be easier in Dart," and when I'm working on simple UI tweaks in Flutter, I think, "This would be easier in Godot Engine." I know that there's no silver bullet, yet I cannot silence the little fear that maybe I chose the wrong environment for this project.

State management is at the heart of any game software. The official Flutter documentation explains the basics, and the list of advanced options makes it clear that there is not one right way. I have long been intrigued by Bloc and decided to try using it as a state management solution for my experimentation. I spent a lot of time the past several weeks reading the official tutorials, and I believe I have a good sense of the system now. Crucial to Bloc is a separation of concerns: Flutter widgets provide a humble view of the UI state, which is managed in a bloc (business logic component), and this is in turn separate from the domain layer. For Internet-connected apps, the domain layer involves a repository layer, but for my purposes, it was simple enough to roll these together. For my first Bloc-powered prototype, I followed the tutorials' approach and used equatable to cut down on some of the required boilerplate. Searching the Web reminded me of freezed, and once I understood how Bloc and equatable worked together, I happily switched to freezed for its excellent code generation support. Using the bloc and freezed snippets plugins for Android Studio is practically a necessity here. Once my experimental coding was done, I felt like I could move forward with a more rigorous TDD approach, since now I could think about the features separately from the underlying architecture. I was inspired as well by Dave Farley's commentary about how a layered approach to unit tests means that developers can change their minds about implementation strategies without breaking all of their tests. Knowing that I would continue to change my mind as I explored the design space, I moved forward.
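
To give a sense of what that code generation buys, here is a hypothetical freezed-style state in the spirit of the prototype; the real states are more involved, and the part file comes from build_runner.

    import 'package:freezed_annotation/freezed_annotation.dart';

    part 'game_state.freezed.dart';

    @freezed
    class GameState with _$GameState {
      /// Waiting for the player to start an action.
      const factory GameState.idle() = _Idle;

      /// Dice have been rolled and are being assigned to consequences.
      const factory GameState.assigningDice({
        required List<int> dice,
        required Map<String, int> assignments,
      }) = _AssigningDice;
    }

The generated code supplies value equality, copyWith, and the map/when combinators, which is exactly the boilerplate I was otherwise writing by hand with equatable.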

One of my early experiments explored whether I might just consider the whole game to be "business logic" that belongs in the bloc. That is, I considered cutting out the separate domain layer and putting all the game logic in the bloc. This was of limited viability as I quickly ran into two problems. One was that I found myself having to put game logic in the Flutter widgets since they could not simply read UI state from the bloc. This was clearly counter to the spirit of the architecture. The other problem came up when dealing with threat rolls. In the Deep Cuts rules expansion to Blades in the Dark, players roll dice and assign them to consequences, which are negative effects like taking damage or losing items. Assigning dice to consequences mitigates their impact. It struck me that assigning dice was purely UI state and not game state. That is, a player might experiment with different assignments of dice to consequences, but nothing in the game domain model actually changes until those arrangements are committed.

Armed with this realization, I extracted the game rules into their own module, and I gave this module its own immutable state. The state could be replaced by a few public methods that were called by either the bloc or my unit tests. For example, the method commitDice took the assignment of dice to consequences and computed the resulting change in the game world state. This also let me separate that state from the widgets entirely: whereas I had been sending the world state to the Flutter widgets, now I could add a layer of abstraction related to UI state. For example, rather than sending the game world state to the view from the bloc, I could send only those details that mattered for the state, such as which buttons were enabled or what text should be shown in a label. This meant I could have tests on the bloc and trust that a humble view would work as anticipated.
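
In rough outline, the module had a shape something like the following. commitDice is the method named above, but the types and the mitigation rule here are invented for the sake of illustration.

    class Consequence {
      const Consequence(this.name, this.severity);
      final String name;
      final int severity;
    }

    class WorldState {
      const WorldState({required this.stress, required this.harm});
      final int stress;
      final int harm;
    }

    class GameRules {
      GameRules(this._world);

      WorldState _world;
      WorldState get world => _world;

      /// Applies the committed dice assignments: a die assigned to a
      /// consequence reduces that consequence's impact before it lands
      /// on the game world.
      void commitDice(Map<Consequence, int> assignments) {
        var harm = _world.harm;
        for (final entry in assignments.entries) {
          var mitigated = entry.key.severity - entry.value;
          if (mitigated < 0) mitigated = 0;
          harm += mitigated;
        }
        _world = WorldState(stress: _world.stress, harm: harm);
      }
    }

Because the experimental dice assignments live in UI state, nothing here changes until the player commits, which is what made the separation feel right at the time.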

My pleasure at this transition made it even more disheartening when, earlier this week, I sat down to add a new feature and realized I had programmed myself into a corner. After assigning dice to consequences and before their effects are committed to the game world, a player can also opt to "push themselves" to mitigate consequences. This results in another dice roll whose outcome determines how much stress the pushing causes to the character. It means that between the committing of dice assignments and the final changes to the state is another step in which players might push themselves to alter the outcomes. However, this means that the changes to the world state might be coming from unmitigated consequences, dice-assignment-based mitigation, or pushing-based mitigation. The game world simply needs to change, but a good player experience in the UI should distinguish among these. 

A fair criticism at this point would be that I should have foreseen that pushing would require a more robust handling of actions and consequences. In fact, I was aware of this, but I was also trying to push the limits of narrow slicing and Farley-style TDD/BDD combined with emergent architecture. I wanted to complete a well-factored feature (in this case, dice assignment) before increasing the complexity by adding a new feature. Despite my efforts, I can see now that revising the core action resolution system will have significant ripple effects on my test layers.

Just before exploring the pushing mechanism, I had stubbed in an approach for dealing with the outcome of progress clock expiration. I needed a way to represent arbitrary game effects and attach them to a clock, and so I sketched in a Command pattern. In particular, I encapsulated the idea that the main clock would end the game by creating an EndGameEffect and attaching that to the clock. I used freezed for the Command objects to facilitate future serialization. With this design pattern fresh in my mind, as I faced the bigger problem of state management, I found myself thinking I should be queuing game state change events rather than just making world changes. This would work, but it also made me realize that all I really wanted was to give a command to the world like "mark two stress on the character and reduce the effect of this consequence." That sounds like a couple of method calls to me.
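
Here is the plain-Dart shape of that sketch. The real version used freezed for the effect classes; EndGameEffect is the name from the prototype, and the rest of the names are invented.

    sealed class GameEffect {
      void apply(World world);
    }

    class EndGameEffect implements GameEffect {
      @override
      void apply(World world) => world.endGame();
    }

    class Clock {
      Clock(this.segments, this.onExpired);
      int segments;
      final GameEffect onExpired;

      /// Fill one segment; when the clock expires, fire its attached effect.
      void tick(World world) {
        segments--;
        if (segments <= 0) onExpired.apply(world);
      }
    }

    class World {
      bool gameOver = false;
      void endGame() => gameOver = true;
    }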

Casey Yano of MegaCrit (Slay the Spire) reflected on his company's evaluation of Godot Engine following the colossal leadership failures at Unity. The sample code he shares uses a combination of a stateful model with asynchronous invocations: await FighterCmd.GainHp(owner, 2, owner). Clearly, he's going through a presentation layer that implements all the fundamental game verbs as asynchronous calls, giving these methods the responsibility to both change the model and display the state change to the user. By contrast, Flutter's declarative approach leans toward having the UI detect a change to the model and then animate the feedback. The latter gives a clear separation of layers that facilitates testing. In practice, though, the game's UI and the game's logic are tightly coupled, and now the code for a feature like "update the health bar when taking damage" is split into disparate places.

In The Endless Storm of Dagger Mountain, which was written in Godot Engine, I managed the state and UI as in Yano's example, writing code like await adventure.show_text('It was a dark and stormy night...'). The use of await makes the call a coroutine, but Godot has no other syntactic indication that a function should be called this way. This means that forgetting a single await will break a chain of intended asynchronous calls, and that's exactly what led to a post-jam patch for that project. I didn't have exhaustive test coverage, nor did I prioritize running through all paths of the game. The result was that a missing await call made at least one of the paths completely lock up for the players. To me, this reflects a weakness of the GDScript language design; by contrast, Dart's use of async, await, and futures makes it clear at compile time which invocations are asynchronous and which are not. (Incidentally, Yano is using C# instead of GDScript. I did experiment with Godot's C# bindings, but a few things held me back from using them: many of the strengths of GDScript, such as elegant signal management, are lost in C#; there is a lot more boilerplate required; there is no Web export for Godot 4.x when using C#; and Rider is so much better than the alternatives, but because it is justifiably commercial, using it would mean putting money as well as time into my experiments.)
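
As a small, invented illustration of the contrast: in Dart, the asynchrony is part of the function's type, and the unawaited_futures lint can flag a forgotten await rather than letting it silently break the chain. Neither of these functions is real code from either game.

    Future<void> showText(String text) async {
      // Stand-in for rendering the text and waiting for the player.
      await Future<void>.delayed(const Duration(milliseconds: 500));
      print(text);
    }

    Future<void> runScene() async {
      await showText('It was a dark and stormy night...');
      // Dropping the await above would still compile, but the Future<void>
      // return type and the unawaited_futures lint make the slip visible,
      // instead of silently locking up a path as happened in GDScript.
      await showText('The storm broke over the mountain.');
    }

    void main() async {
      await runScene();
    }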

I had hoped that by this point in my prototyping, I would have a minimal interactive system to which I could add content and visual flourish. Instead, I have several abandoned experimental architectures. This narrative has been my attempt to explain how I got here. I have learned more about some aspects of Flutter and Dart, but I am also holding two paradoxical ideas in my head: Fred Brooks' observation that you should build a system to throw away because you're going to anyway, and the knowledge that the last 10% of a project takes another 90% of the effort. That is, any understanding I claim to have is on a sandy foundation if the project itself has not shipped. Dagger Mountain may have had critical post-launch patches, but at least I understand exactly why. Whether any particular bloc-like or asynchrony-based approach would be better for this other project is still uncertain. I continue to second-guess myself, but I am also hopeful that having written this, I can return to prototyping after the Christmas break with a fresh perspective.


Monday, December 23, 2024

Happy Camper: A December 2024 FamJam Game

TL;DR: Check out our new game, Happy Camper.

On Saturday, my eldest son led the family in a one-day fam jam as part of fulfilling a Scouting merit badge. We had attended the Indy Indies 2024 Showcase the previous night and played the eight games featured there. Playing Hardcore Cottagecorre in particular got the boys talking about wanting to make a single-stick shooter. The older boys had played Vampire Survivors, but the others really just used Hardcore Cottagecorre as their genre example. We laid out responsibilities and got to work a little after 8:00 AM. We wrapped up just before 5:00 PM, giving us enough time to talk about our experience over dinner and then get out to see A Christmas Carol at Muncie Civic Theatre.

The result of our work is Happy Camper, which you can play in the browser as long as you have a keyboard and mouse. The game is free software and you can browse the source code on GitHub.

I told my friend a little about our experience yesterday, and he had some questions about what technology we used to create the game. To that end, here are some explanations and links. These are all no-cost, free, open source tools.

  • We used the Godot game engine to build the game, including its GDScript programming language.
  • The art was made using Piskel, which is perfect for the pixel art aesthetic.
  • The music was created using LMMS.
  • The sound effects were recorded using Audacity.
My wife commented on how much smoother these jams go now that everyone has more experience using these various tools. There are two things that I need to remember from the experience. One is that I need to talk to the younger boys about how to think like a musician when approaching LMMS. Both have a tendency to try to "make it work" rather than to model how a composer approaches the task, doing things like aligning notes to beats and measures. Admittedly, the piano roll interface in LMMS is less clear here than staff notation, so maybe I need to look into showing them something like MuseScore or Rosegarden, both of which give you access to a traditional notation editor.

The other observation I had was with respect to communication, internally and through the medium of game design. My second son took the task of designing a series of weapons that would work well together. He had a list that he considered finished, but I encouraged him to write the weapons up in a way that could be used as a specification, together with illustrations of how they would work. He did this, but he still described all seven weapons in half a page of a small sketchbook, cramming them together and including indecipherable drawings of the design intention. We talked briefly about how the task was not merely to inscribe his ideas onto a page, but to do so in a way that invited others to comment, edit, and learn from them. That is, there had to be more room for annotation, more space for people to read the diagrams together. He had succeeded at the "invention" part but was weak on the "communication" part (see Cockburn's argument that software development is a cooperative game of invention and communication). It's all part of the development of teamwork and game design, and I'm glad we had a chance to talk about it. I would like to have an opportunity soon to give him another similar task and see if he can apply our conversation.

This relates to a similar story from later in the day, when he and his elder brother were trying to figure out what to work on before we shipped the game. They seemed blind to the fact that, in its current state, the game was unlearnable and not fun for anyone. As makers of the game, they could play it for about ten seconds, but there was no scaffolding for anyone who didn't know all the implementation details. I pushed them on this point: unless we were making the game only for ourselves, we had to think about the perspective of new users—people who didn't know what the enemies or weapons looked like, where they would come from, or how these systems worked together. Giving them that charge, I left them for about an hour. When they pushed their changes, the game was much more enjoyable, with better balance and escalation without needing massive changes to the implementation. Of course, if we were not in a one-day jam, we could have done even more work here, but within our constraints, I think they did a great job, and I told them so. It was only later that I realized that this was the same class of feedback I had given my son earlier: to recognize that "done" needs to be considered from the perspective of the consumer, whether that is the reader of a design document or the player of a game.

It had been almost a year since our last Fam Jam. Some of us will certainly participate in Global Game Jam in January, but I hope it's not another year before we get the whole family involved. I'm not sure what will happen to our Fam Jam tradition once the boys start leaving the house. 

Thursday, December 12, 2024

Reflecting on CS315, Fall 2024 Edition

As described in my course revision post in June, the overall structure of CS315 Game Programming was unchanged from previous semesters: half the semester was spent on weekly projects designed to build skills and confidence, and half the semester was spent on larger projects. 

The most significant change was in how those weekly assignments were evaluated. For the past several years, I have used checklist-based evaluation, but I was hoping to find a fix for the problem of students doing the checklists wrong, which takes something simple and makes it into more work for me than a point-based rubric would be. Unfortunately, the strategy I used did not make things any simpler. Instead of checklists, I gave students a list of the criteria that needed to be met in order for the work to be satisfactory. Their work was then assessed as Satisfactory, Needs Minor Revision (fix within 48 hours), or New Attempt Required. New attempts could be made at the rate of one per week, as I've done for years in most of my non-studio courses. I ran into a bit of the same problem I wrote about yesterday, where Canvas' "Complete/Incomplete" assessment combined with no-credit assignments leads to a bad user experience, but it was not among the dominant frustrations. There were two of those: students not submitting satisfactory work, and students not submitting work at all.

The first of those is the most disconcerting. As with checklist-based grading, I gave the students the precise criteria on which a submission would be graded. All they had to do was meet those criteria, and most of them did. Sometimes it took minor revisions or a new attempt or two, but these were no big deal: handling and correcting misconceptions is exactly what the system is supposed to do. The real problem came from students who submitted things that were wrong multiple times after I had told them what was wrong. In a strict reading of the evaluation scheme, this means the work was still simply unsatisfactory, whereas in other schemes (including checklist-based) they might have gotten a D or C for the work. I am still torn on this issue: was the system unfair to students of lower ability, or was it the only fair thing to do with them? Put another way, is it better to give a student a C when they still have serious misunderstandings, or is it better to clearly tell them that they should not advance until they understand? I don't interpret any of the criteria I gave as strictly "A"-level. That is, it did not require excellence to meet those criteria. What it required was rigor.

The other problem, of students not resubmitting work that needed to be resubmitted, seems unrelated to the evaluation scheme chosen. Speaking with professors across campus and institutions, this seems to be part of a generational wave of challenges. I have a few hypotheses about root causes, but the point of this blog post is not to opine on that topic.

Some of my early-semester assignments take the form of multi-week projects. For example, one set of assignments involves creating an Angry Birds clone. It is submitted as a series of three assignments with increasing complexity, and the complexity is scaffolded so that someone who has never made a game before can follow along. I had a student in the class this semester who fell behind, and he wondered if he could just submit the final iteration of that three-week project as long as it showed mastery of each week's content. I ended up declining the request. One of my reasons is that the assignments double as a sort of participation credit. It makes me wonder, though, whether it's worth separating these things. For example, something I've done in other courses in the past is make it so that the final iteration's grade supersedes earlier ones if it is higher.

This was the first semester that a colleague offered a different section of CS315 during the same semester. Looking at his students' games, as well as some recent conversations in the game production studio, made me realize that I should probably emphasize the build process more in my section. Rather than simply running their games in the editor, I should ensure that they know how to create an executable or a web build. It's an important skill that's easy to miss, and there's a lot to be learned by seeing the differences between running in the editor and outside of it.

Now that we've grown the number of games-related faculty in my department, there's a chance I may not teach game programming again until 2026. I expect I will come back to these notes around that time. The biggest pedagogic design question I will need to consider is whether to return to checklist-based grading (with its concomitant frustrations) or move to something else, like a simple point distribution.