Tuesday, December 24, 2024

Experimenting with software architectures for video games inspired by tabletop roleplaying games

I have been tinkering the last several months with a videogame prototype inspired by some of the tabletop roleplaying games that my boys and I have been playing. Similar to how my game The Endless Storm of Dagger Mountain explored PbtA mechanisms, I've wondered about the strengths and weaknesses of interpreting Forged in the Dark systems into a text-based videogame. Last week, once I put away most of the work of the Fall semester, I was able to dive more deeply into work on a prototype. I felt really good about it until a few days ago, when I came to doubt—not for the first time—some decisions I had made in the software architecture. So, in this, my sixth December blog post, I want to unpack some of the considerations that I have put into these efforts so that I might stop programming in circles.

I decided to use Dart and Flutter for the game. I teach with Dart and Flutter in CS222 because I legitimately enjoy the technology stack. I am competent with them but would not call myself an expert. I have only built two public systems with these tools: my Thunderstone Quest Randomizer and a little timer utility to help with Promotion and Tenure Committee meetings. The former is much larger than the latter, and if I were to build it again, I would do it differently, but I keep maintaining it for myself and other fans of the card game.

I appreciate Dart's static typing, named parameters, pattern matching, and sealed classes, and Flutter's declarative approach can simplify otherwise complex UI logic. Something else that draws me toward Dart and Flutter, besides the elegance of the language and framework, is the inspirational work of Filip Hracek. His Knights of San Francisco is similar to some of the experimentation I have been doing, and his writings about Flutter's performance and the ethics of software design are interesting and insightful. I spent most of a summer working through his open source egamebook repository, trying to understand how a serious Dart programmer uses the language to accomplish his game design goals.
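
To make the appeal concrete, here is a minimal sketch, with invented names rather than anything from my prototype, of how sealed classes and pattern matching work together: the compiler knows every subtype of a sealed class, so a switch over an outcome type is checked for exhaustiveness without a default branch.

```dart
// A sketch with invented names (not from the prototype): because
// ActionOutcome is sealed, the compiler knows every subtype, so the
// switch expression below is exhaustiveness-checked with no default case.
sealed class ActionOutcome {}

class FullSuccess extends ActionOutcome {}

class PartialSuccess extends ActionOutcome {
  PartialSuccess(this.consequence);
  final String consequence;
}

class Failure extends ActionOutcome {
  Failure(this.consequence);
  final String consequence;
}

String describe(ActionOutcome outcome) => switch (outcome) {
      FullSuccess() => 'You pull it off cleanly.',
      PartialSuccess(:final consequence) => 'You do it, but $consequence.',
      Failure(:final consequence) => 'It goes wrong: $consequence.',
    };
```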

However, the choice to use Dart and Flutter over Godot Engine is never fully settled in my heart of hearts. Whereas Dart is dreamy for game logic, Godot Engine makes it dead simple to create juicy bits of design. Its AnimationPlayer is brilliant for little effects, whereas setting up an AnimatedBuilder in Flutter takes a whole lot of typing. Godot's node-based approach means that individual parts of the program can easily be run in isolation and tested, and tool scripts allow customization of the editor itself. Unfortunately, GDScript has no refactoring support, and this is a significant impediment to a test-driven approach: changing my mind about a name or a design choice in GDScript has nasty rippling effects. Type hints in GDScript are invaluable, but they are no replacement for real static typing. Also, creating simple data structures in GDScript is much more arduous than in Dart. All this is to say that when I'm dealing with game logic in GDScript, I find myself thinking, "This would be easier in Dart," and when I'm working on simple UI tweaks in Flutter, I think, "This would be easier in Godot Engine." I know that there's no silver bullet, yet I cannot silence the little fear that maybe I chose the wrong environment for this project.
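
For the sake of comparison, here is roughly what I mean about data structures, again with invented names: an immutable value class and a Dart 3 record, each just a few lines.

```dart
// Invented names, for illustration only: an immutable value type is a few
// lines of Dart...
class Consequence {
  const Consequence({required this.name, required this.severity});
  final String name;
  final int severity;
}

// ...and a Dart 3 record covers the throwaway aggregate (assumes the dice
// list is non-empty).
({int highest, int lowest}) spread(List<int> dice) => (
      highest: dice.reduce((a, b) => a > b ? a : b),
      lowest: dice.reduce((a, b) => a < b ? a : b),
    );
```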

State management is at the heart of any game software. The official Flutter documentation explains the basics, and the list of advanced options makes it clear that there is not one right way. I have long been intrigued by Bloc and decided to try using it as a state management solution for my experimentation. I spent a lot of time the past several weeks reading the official tutorials, and I believe I have a good sense of the system now. Crucial to Bloc is a separation of concerns: Flutter widgets provide a humble view of the UI state, which is managed in a bloc (business logic component), which in turn is separate from the domain layer. For Internet-connected apps, there is usually also a repository layer, but for my purposes, it was simple enough to roll these together. For my first Bloc-powered prototype, I followed the tutorials' approach and used equatable to cut down some of the boilerplate required. Searching the Web reminded me of freezed, and once I understood how Bloc and equatable worked together, I happily switched to freezed for its excellent code generation support. Using the bloc and freezed snippets plugins for Android Studio is practically a necessity here. Once my experimental coding was done, I felt like I could move forward with a more rigorous TDD approach, since now I could think about the features separately from the underlying architecture. I was inspired as well by Dave Farley's commentary about how a layered approach to unit tests means that developers can change their minds about implementation strategies without breaking all of their tests. Knowing that I would continue to change my mind as I explored the design space, I moved forward.
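
To show the shape of this architecture, here is a minimal sketch of a bloc and its immutable state. The names are invented for illustration, and I have hand-written the copyWith and equality-style boilerplate that freezed would otherwise generate.

```dart
import 'package:bloc/bloc.dart';

// Immutable UI state. In the real experiment, freezed generates the
// copyWith, ==, and hashCode boilerplate that is written by hand here.
class ThreatRollState {
  const ThreatRollState({this.dicePool = const [], this.committed = false});
  final List<int> dicePool;
  final bool committed;

  ThreatRollState copyWith({List<int>? dicePool, bool? committed}) =>
      ThreatRollState(
        dicePool: dicePool ?? this.dicePool,
        committed: committed ?? this.committed,
      );
}

// Events describe what happened in the UI.
sealed class ThreatRollEvent {}

class DiceRolled extends ThreatRollEvent {
  DiceRolled(this.dice);
  final List<int> dice;
}

class RollCommitted extends ThreatRollEvent {}

// The bloc maps events onto new UI states; the game rules themselves live
// in a separate domain module that the bloc calls into.
class ThreatRollBloc extends Bloc<ThreatRollEvent, ThreatRollState> {
  ThreatRollBloc() : super(const ThreatRollState()) {
    on<DiceRolled>((event, emit) =>
        emit(state.copyWith(dicePool: event.dice, committed: false)));
    on<RollCommitted>((event, emit) => emit(state.copyWith(committed: true)));
  }
}
```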

One of my early experiments explored whether I might just consider the whole game to be "business logic" that belongs in the bloc. That is, I considered cutting out the separate domain layer and putting all the game logic in the bloc. This was of limited viability as I quickly ran into two problems. One was that I found myself having to put game logic in the Flutter widgets since they could not simply read UI state from the bloc. This was clearly counter to the spirit of the architecture. The other problem came up when dealing with threat rolls. In the Deep Cuts rules expansion to Blades in the Dark, players roll dice and assign them to consequences, which are negative effects like taking damage or losing items. Assigning dice to consequences mitigates their impact. It struck me that assigning dice was purely UI state and not game state. That is, a player might experiment with different assignments of dice to consequences, but nothing in the game domain model actually changes until those arrangements are committed.
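
A rough sketch of that distinction, with hypothetical names: the provisional pairing of dice with consequences lives in the bloc's UI state, and nothing in the domain model changes until the player commits.

```dart
// Hypothetical names: the player's provisional pairing of dice with
// consequences is pure UI state. Nothing in the domain model changes
// until the arrangement is committed.
class DiceAssignmentState {
  const DiceAssignmentState({this.assignmentByConsequence = const {}});

  /// Consequence index -> value of the die currently assigned to it.
  final Map<int, int> assignmentByConsequence;

  /// Rearranging dice just produces a new UI state.
  DiceAssignmentState assign(int consequence, int die) => DiceAssignmentState(
        assignmentByConsequence: {...assignmentByConsequence, consequence: die},
      );
}
```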

Armed with this realization, I extracted the game rules into their own module, and I gave this module its own immutable state. The state could be modified by a few public methods that were called by either the bloc or my unit tests. For example, the method commitDice took the assignment of dice to consequences and computed the resulting change in the game world state. This also let me separate that state from the widgets entirely: whereas I had been sending the world state to the Flutter widgets, now I could add a layer of abstraction related to UI state. For example, rather than sending the game world state to the view from the bloc, I could send only those details that mattered for the state, such as which buttons were enabled, or what text should be shown in a label. This meant I could have tests on the bloc and trust that a humble view would work as anticipated.
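
Here is an approximate sketch of that layering, using invented names and a made-up mitigation rule rather than the prototype's actual rules: the game module owns an immutable world state, commitDice produces the next world, and the bloc hands the view only what it needs to draw.

```dart
// Invented names and a made-up mitigation rule, to show the layering.
class WorldState {
  const WorldState({required this.stress, required this.harm});
  final int stress;
  final int harm;
}

class GameRules {
  const GameRules(this.world);
  final WorldState world;

  /// Applies committed dice assignments (consequence index -> die value)
  /// and returns the resulting world. The severity rule here is purely
  /// illustrative.
  GameRules commitDice(Map<int, int> assignmentByConsequence) {
    var harm = world.harm;
    for (final die in assignmentByConsequence.values) {
      harm += die >= 4 ? 1 : 2;
    }
    return GameRules(WorldState(stress: world.stress, harm: harm));
  }
}

// The bloc projects the world into only what the humble view needs.
class ThreatViewState {
  const ThreatViewState({required this.commitEnabled, required this.harmLabel});
  final bool commitEnabled;
  final String harmLabel;
}
```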

My pleasure at this transition made it even more disheartening when, earlier this week, I sat down to add a new feature and realized I had programmed myself into a corner. After assigning dice to consequences and before their effects are committed to the game world, a player can also opt to "push themselves" to mitigate consequences. This results in another dice roll whose outcome determines how much stress the pushing causes to the character. It means that between the committing of dice assignments and the final changes to the state is another step in which players might push themselves to alter the outcomes. However, this means that the changes to the world state might be coming from unmitigated consequences, dice-assignment-based mitigation, or pushing-based mitigation. The game world simply needs to change, but a good player experience in the UI should distinguish among these. 

A fair criticism at this point would be that I should have foreseen that pushing would require a more robust handling of actions and consequences. In fact, I was aware of this, but I was also trying to push the limits of narrow slicing and Farley-style TDD/BDD combined with emergent architecture. I wanted to complete a well-factored feature (in this case, dice assignment) before increasing the complexity by adding a new feature. Despite my efforts, I can see now that revising the core action resolution system will have significant ripple effects on my test layers.

Just before exploring the pushing mechanism, I had stubbed in an approach for dealing with the outcome of progress clock expiration. I needed a way to represent arbitrary game effects and attach them to a clock, and so I sketched in a Command pattern. In particular, I encapsulated the idea that the main clock would end the game by creating an EndGameEffect and attaching that to the clock. I used freezed for the Command objects to facilitate future serialization. With this design pattern fresh in my mind, as I faced the bigger problem of state management, I found myself thinking I should be queuing game state change events rather than just making world changes. This would work, but it also made me realize that all I really wanted was to give a command to the world like "mark two stress on the character and reduce the effect of this consequence." That sounds like a couple of method calls to me.
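
A minimal sketch of the idea, with hand-written classes standing in for what are freezed classes in the prototype; only EndGameEffect comes from the description above, and the rest of the names are invented.

```dart
// Hand-written stand-ins for what are freezed classes in the prototype;
// only EndGameEffect comes from the text above, the rest is invented.
sealed class GameEffect {
  const GameEffect();
  void apply(Game game);
}

class EndGameEffect extends GameEffect {
  const EndGameEffect();
  @override
  void apply(Game game) => game.end();
}

// A progress clock carries the effect to trigger when it fills.
class ProgressClock {
  ProgressClock({required this.segments, required this.onExpired});
  final int segments;
  final GameEffect onExpired;
  int _filled = 0;

  void tick(Game game) {
    _filled++;
    if (_filled >= segments) onExpired.apply(game);
  }
}

class Game {
  bool over = false;
  void end() => over = true;
}
```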

Casey Yano of MegaCrit (Slay the Spire) reflected on his company's evaluation of Godot Engine following the colossal leadership failures at Unity. The sample code he shares combines a stateful model with asynchronous invocations: await FighterCmd.GainHp(owner, 2, owner). Clearly, he's going through a presentation layer that implements all the fundamental game verbs as asynchronous calls, giving these methods the responsibility to both change the model and display the state change to the user. By contrast, Flutter's declarative approach leans toward having the UI detect a change to the model and then animate the feedback. The latter gives a clear separation of layers that facilitates testing. In practice, though, the game's UI and the game's logic are tightly coupled, and now the code for a feature like "update the health bar when taking damage" is split into disparate places.
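
To make the contrast concrete, here is what that imperative, awaited-command style looks like translated into Dart; the names and signatures are my own guesses at the shape of the pattern, not MegaCrit's actual API.

```dart
// My guess at the shape of the pattern, not MegaCrit's actual API: each
// game verb is an async call that both mutates the model and shows the
// change before returning.
class Fighter {
  int hp = 10;
}

class FighterPresenter {
  FighterPresenter(this.model);
  final Fighter model;

  Future<void> gainHp(int amount) async {
    model.hp += amount;
    await _animateHealthBar(model.hp);
  }

  Future<void> _animateHealthBar(int hp) async {
    // Stand-in for the real tween or animation of the health bar.
    await Future<void>.delayed(const Duration(milliseconds: 300));
  }
}
```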

In The Endless Storm of Dagger Mountain, which was written in Godot Engine, I managed the state and UI as in Yano's example, writing code like await adventure.show_text('It was a dark and stormy night...'). The use of await makes the call a coroutine, but Godot has no other syntactic indication that a function should be called this way. This means that forgetting a single await will break a chain of intended asynchronous calls, and that's exactly what led to a post-jam patch for that project. I didn't have exhaustive test coverage, nor did I prioritize running through all paths of the game. The result was that a missing await call made at least one of the paths completely lock up for the players. To me, this reflects a weakness of the GDScript language design; by contrast, Dart's use of async, await, and futures makes it clear at compile time which invocations are asynchronous and which are not. (Incidentally, Yano is using C# instead of GDScript. I did experiment with Godot's C# bindings, but a few things held me back from using them: many of the strengths of GDScript, such as elegant signal management, are lost in C#; there is a lot more boilerplate required; there is no Web export for Godot 4.x when using C#; and Rider is so much better than the alternatives, but because it is justifiably commercial, it would mean losing money and time to my experiments.)
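
For comparison, here is a minimal Dart sketch of the same pattern: because showText returns a Future, a call that drops the await is visible in the types and can be flagged by the analyzer (for example, via the unawaited_futures lint) rather than silently locking up a scene.

```dart
// A Dart sketch of the same pattern: asynchrony shows up in the types, so
// a dropped await is detectable rather than a silent lockup.
Future<void> showText(String text) async {
  // Stand-in for revealing narration and waiting on the player.
  await Future<void>.delayed(const Duration(milliseconds: 500));
  print(text);
}

Future<void> playScene() async {
  await showText('It was a dark and stormy night...');
  // showText('And then...');  // the analyzer can flag this missing await
  await showText('And then...');
}

void main() async {
  await playScene();
}
```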

I had hoped that by this point in my prototyping, I would have a minimal interactive system to which I could simply add content and visual flourish. Instead, I have several abandoned experimental architectures. This narrative has been my attempt to explain how I got here. I have learned more about some aspects of Flutter and Dart, but I am also holding two paradoxical ideas in my head: Fred Brooks' observation that you should build a system to throw away because you're going to anyway, and the knowledge that the last 10% of a project takes another 90% of the effort. That is, any understanding I claim to have is on a sandy foundation if the project itself has not shipped. Dagger Mountain may have had critical post-launch patches, but at least I understand exactly why. Whether any particular bloc-like or asynchrony-based approach would be better for this other project is still uncertain. I continue to second-guess myself, but I am also hopeful that having written this, I can return to prototyping after the Christmas break with a fresh perspective.


Monday, December 23, 2024

Happy Camper: A December 2024 FamJam Game

TL;DR: Check out our new game, Happy Camper.

On Saturday, my eldest son led the family in a one-day fam jam as part of fulfilling a Scouting merit badge. We had attended the Indy Indies 2024 Showcase the previous night and played the eight games featured there. Playing Hardcore Cottagecore in particular got the boys talking about wanting to make a single-stick shooter. The older boys had played Vampire Survivors, but the others really just used Hardcore Cottagecore as their genre example. We laid out responsibilities and got to work a little after 8:00 AM. We wrapped up work just before 5:00 PM, giving us enough time to talk about our experience over dinner and then get out to see A Christmas Carol at Muncie Civic Theatre.

The result of our work is Happy Camper, which you can play in the browser as long as you have a keyboard and mouse. The game is free software and you can browse the source code on GitHub.

I told my friend a little about our experience yesterday, and he had some questions about what technology we used to create the game. To that end, here are some explanations and links. These are all no-cost, free, open source tools.

  • We used the Godot game engine to build the game, including its GDScript programming language.
  • The art was made using Piskel, which is perfect for the pixel art aesthetic.
  • The music was created using LMMS.
  • The sound effects were recorded using Audacity.
My wife commented on how much smoother these jams go now that everyone has more experience using these various tools. There are two things that I myself need to remember from the experience. One is that I need to talk to the younger boys about how to think like a musician when approaching LMMS. Both have a tendency to try to "make it work" rather than trying to model how composition is approached, doing things like aligning notes to beats and measures. Admittedly, the piano roll interface in LMMS is less clear here than staff notation, and so maybe I need to look into showing them something like MuseScore or Rosegarden, both of which give you access to a traditional notation editor.

The other observation I had was with respect to communication, internally and through the medium of game design. My second son took the task of designing a series of weapons that would work well together. He had a list that he considered finished, but I encouraged him to write them up in a way that they could be used as a specification, together with illustrations of how they would work. He did this, but he still described all seven weapons in half of a small sketchbook page, cramming them all together and including indecipherable drawings of the design intention. We talked briefly about how the task was not merely to inscribe his ideas onto a page, but to do so in a way that invited others to comment, edit, and learn from them. That is, there had to be more room for annotation, more space for people to read the diagrams together. He had succeeded at the "invention" part but was weak on the "communication" part (see Cockburn's argument that software development is a cooperative game of invention and communication). It's all part of the development of teamwork and game design, and I'm glad we had a chance to talk about it. I would like to have an opportunity soon to give him another similar task and see if he can apply our conversation.

This relates to a similar story from later in the day, when he and his elder brother were trying to figure out what to work on before we shipped the game. They seemed blind to the fact that, in its current state, the game was unlearnable and not fun to anyone. As makers of the game, they could play for about ten seconds, and there was no scaffolding for anyone who didn't know all the implementation details. I pushed them on this point, that unless we were making the game only for ourselves, we had to think about the perspective of new users—people who didn't know what the enemies or weapons looked like, where they would come from, or how these systems worked together. Giving them that charge, I left them for about an hour. When they pushed their changes, the game was much more enjoyable, with better balance and escalation without needing massive changes to the implementation. Of course, if we were not in a one-day jam, we could have done even more work here, but within our constraints, I think they did a great job, and I told them so. It was only later that I realized that this was in the same class of feedback as I had given my son earlier: to recognize that "done" needs to be considered from the perspective of the consumer, whether that is the reader of a design document or the player of a game.

It had been almost a year since our last Fam Jam. Some of us will certainly participate in Global Game Jam in January, but I hope it's not another year before we get the whole family involved. I'm not sure what will happen to our Fam Jam tradition once the boys start leaving the house. 

Thursday, December 12, 2024

Reflecting on CS315, Fall 2024 Edition

As described in my course revision post in June, the overall structure of CS315 Game Programming was unchanged from previous semesters: half the semester was spent on weekly projects designed to build skills and confidence, and half the semester was spent on larger projects. 

The most significant change was in how those weekly assignments were evaluated. The past several years, I have used checklist-based evaluation, but I was hoping to find a fix for the problem of students doing the checklists wrong. This takes something simple and makes it into more work for me than if it was just a point-based rubric. Unfortunately, the strategy I used did not make things any simpler. Instead of checklists, I gave students a list of the criteria that needed to be met in order to be satisfactory. Their work then was assessed as Satisfactory, Needs Minor Revision (fix within 48 hours), or New Attempt Required. New attempts could be made at the rate of one per week, as I've done for years in most of my non-studio courses. I ran into a bit of the same problem as I wrote about yesterday, where Canvas' "Complete/Incomplete" assessment combined with no-credit assignments leads to a bad user experience, but it was not among the dominant frustrations. Those frustrations were two: students not submitting satisfactory work, and students not submitting work.

The first of those is the more disconcerting. As with checklist-based grading, I gave the students the precise criteria on which a submission would be graded. All they had to do was to meet those, and most of them did. Sometimes it took minor revisions or a new attempt or two, but these were no big deal: handling and correcting misconceptions is exactly what the system is supposed to do. The real problem came from students who submitted things that were wrong multiple times after I had told them what was wrong. In a strict reading of the evaluation scheme, this means the work was still simply unsatisfactory, whereas in other schemes (including checklist-based) they might have gotten a D or C for the work. I am still torn on this issue: was the system unfair to students of lower ability, or was it the only fair thing to do with them? Put another way, is it better to give a student a C when they still have serious misunderstandings, or is it better to clearly tell them that they should not advance until they understand it? I don't interpret any of the criteria I gave as strictly "A"-level. That is, it did not require excellence to meet those criteria. What it required was rigor.

The other problem, of students not resubmitting work that needed to be resubmitted, seems unrelated to the evaluation scheme chosen. Speaking with professors across campus and institutions, this seems to be part of a generational wave of challenges. I have a few hypotheses about root causes, but the point of this blog post is not to opine on that topic.

Some of my early-semester assignments take the form of multi-week projects. For example, one set of assignments involves creating an Angry Birds clone. It is submitted as a series of three assignments with increasing complexity, and the complexity is scaffolded so that someone who has never made a game before can follow along. I had a student in the class this semester who fell behind, and then he wondered if he could just submit the final iteration of that three-week project as long as it showed mastery of each week's content. I ended up declining the request. One of my reasons is that the assignments double as a sort of participation credit. It makes me wonder, though, whether it's worth separating these things. For example, something I've done in other courses in the past is make it so that the final iteration's grade supersedes earlier ones if it is higher.

This was the first semester that a colleague offered a different section of CS315 during the same semester. Looking at his students' games, as well as some recent conversations in the game production studio, made me realize that I should probably emphasize the build process more in my section. Rather than simply running their games in the editor, I should ensure that they know how to create an executable or a web build. It's an important skill that's easy to miss, and there's a lot to be learned by seeing the differences between running in the editor and outside of it.

Now that we've grown the number of games-related faculty in my department, there's a chance I may not teach game programming again until 2026. I expect I will come back to these notes around that time. The biggest pedagogic design question I will need to consider is whether to return to checklist-based grading (with its concomitant frustrations) or move to something else, like a simple point distribution. 

Wednesday, December 11, 2024

Reflecting on CS222, Fall 2024 Edition

I had a little break from teaching CS222 last semester as I wrapped up work on STEM Career Paths. I have not blogged much about that project, but you can read all about it in my 2024 Meaningful Play paper, which I understand will be published soon. In any case, here I want to capture a few of the highlights and setbacks from the Fall 2024 class, and I promise, I'm trying not to rant about Canvas more than I have to.

Regular readers may recall that I tried a different evaluation scheme this semester, which I wrote about back in July. In September, I wrote a detailed post about some of my initial frustrations with the system as well as a shorter one about how I felt my attention being pecked away. I don't want to bury the lede, so I'll just mention here that to compute final grades, I went back to my 2022 approach, the tried and true, the elegant and clean system that I learned from Bill Rapaport at UB: triage grading. Between my failed experiment this semester and the similarly failed EMRF experiment from last year or so, I feel like I'm looking for a silver bullet that doesn't exist. It reinforces to me, yet again, that I should really be running some kind of workshops for local people here to learn about what makes triage grading superior.

I still want to track some of the specific problems of the semester, though, so that readers (including future self) won't walk into them. First, I tried to set up a simple labeling system in Canvas such that I could mark work as being satisfactory, needing a minor revision, or needing a new attempt. I made no headway here in part because of Canvas' intolerable insistence that courses are made up of points. I talked with a respected colleague, who is willing to toil over Canvas more than I am, about his approach, and he mentioned that he encodes this information into orders of magnitude, something like 10 points for satisfactory, 1 point for minor revisions, and 0.1 points for new attempt required. Combining these together, students get a weird combination of numeric and symbolic feedback. He acknowledged that it wasn't perfect.

What I tried to do instead was to use Canvas' built-in support for grading as "complete/incomplete." Because that was all I cared about, I set the assignments to be worth zero points. When I used SpeedGrader, sure enough, the work was labeled properly. It wasn't until midsemester that I downloaded all the grades as a spreadsheet and saw that it only gave me the zero points. That is, whether the work was complete or incomplete was stripped from the exported data set. There wasn't so much data that I couldn't eyeball it to give students midsemester grades, which was facilitated by my recent transition to only giving A, C, or D midsemester grades (which are epistemologically vacuous anyway). 

It wasn't until weeks later that it dawned on me that my students almost certainly had the same problem: Canvas was showing them zeroes instead of statuses. Of course, all my policies for the course were laid out in the course plan, and I do not have any qualms about considering those to be the responsibility of my students. However, when the university's mandated "learning management system" actively disrupts their ability to think about the course, it becomes more of a shared responsibility. About two weeks ago, I went in and re-graded all of the work to use triage grading instead, which allowed me to distinguish not only between complete and incomplete, but also between things that were submitted-but-incorrect and things that were not even attempted.

One positive change that I made this semester was counting achievements as regular assignments. This made processing them simpler for me, and I suspect it made thinking about them easier for the students too. While they have a different shape than the other assignments, they are "assigned" in the sense that I expect people to do them to demonstrate knowledge. I also set specific deadlines for them, spaced out through the semester. This reduced stress for the students by providing clear guidelines, since they could still miss one and resubmit it later by the usual one-resubmission-per-week policy. It also helped me communicate to them that the intention behind the achievements is to give them a little side quest during the project-oriented portion of the course.

I had a really fun group of students this semester, as I mentioned in yesterday's post. There were still some mysteries around participation, though. I had several students withdraw a few weeks into the semester without ever having talked to me. It is not clear to me if they decided the course was not for them or if they were simply scared. By contrast, I know I had at least one student who was likewise scared early on, but who stuck with it, and ended up learning a lot. It is not clear to me if there is more I can do to help the timid students lean toward that mindset. Also, despite excellent in-meeting participation, I had many students who just didn't do a lot of the assigned work. I have some glimmers of insight here, but it still puzzles me: how many times do I need to say, "Remember to resubmit incomplete work?" I hope that some of the simplifications I have made to the course will help streamline students' imagination about it, but more than that, I am thinking about the role of the creative imagination. I am sure that a lot of students come into this required sophomore-level class without a good sense of what it means to study, to work, or to learn. My friends in the Biology department recently took their required senior-level professionalism course, in which students do things like make resumes, and made it a sophomore-level course. I wonder if we can do something similar to help the many students we have who are not well formed.

Tuesday, December 10, 2024

What we learned in CS222, Fall 2024 edition

My students are currently typing away, writing their responses to the final exam questions for CS222. As per tradition, the first step was to set a 20-minute timer and ask them to list off anything they learned this semester that was related to the course. This was an enthusiastic group with hardly a quiet moment. They listed 130 items in 20 minutes. I gave them each six votes, and these were the top six:

  • TDD (9 votes)
  • SRP (8 votes)
  • Code cleanliness (6 votes)
  • DRY (6 votes)
  • Git (6 votes)
  • GitHub (6 votes)
Here are all the items they listed, together with the number of votes each earned, if any. There are some items here that point to interesting stories of personal growth. It was really a fun group of students to work with, even though several of them exhibited some behaviors I still cannot quite explain, such as a failure to take advantage of assignment resubmission opportunities.
  • Flutter (1)
  • Code cleanliness (6)
  • TDD (9)
  • A new sense of pain
  • How to set up Flutter (1)
  • DRY (6)
  • SRP (8)
  • Mob programming (2)
  • Pair programming (1)
  • Git (6)
  • Version control (2)
  • Future builder
  • Setting up your environment
  • Asynchronous programming (1)
  • UI design (3)
  • GitHub (6)
  • Code review (1)
  • Defensive programming
  • Working with APIs (1)
  • Model-View Layers (2)
  • Teamwork (4)
  • Better testing (1)
  • What "testing" is (2)
  • Explaining code with code instead of with comments (1)
  • Understandable and readable code
  • Agile development (1)
  • Naming conventions
  • Functional vs Nonfunctional Requirements
  • User stories (2)
  • Paper prototyping
  • CRC Cards
  • User acceptance testing
  • Programming paradigms
  • How to write a post-mortem
  • Resume writing
  • Knowing when something is done (3)
  • Debugger (1)
  • Time management (3)
  • Using breakpoints
  • Test coverage (1)
  • Modularization
  • Distribution of work (1)
  • Communication skills (1)
  • Discord
  • Dart
  • commits on git
  • pull using git
  • Flutter doctor
  • pub get
  • Configuring the dart SDK
  • Rolling back commits
  • Checking out commits
  • Going to office hours early
  • Commit conventions
  • CLI tools
  • Don't use strings for everything
  • Structuring essays
  • Enumerated types
  • Sealed classes
  • Better note-taking
  • Humans are creatures of habit
  • Parse JSON data
  • JSON
  • Refactoring (5)
  • How often wikipedia pages change
  • Data tables
  • OOP (2)
  • URL vs URI
  • One wrong letter can lead to the program not working
  • How data are handled in memory
  • FIXME comments (1)
  • Widgets
  • State management
  • Encapsulation (1)
  • Abstraction (2)
  • Presenting projects
  • Coming up with project ideas
  • Reflection (2)
  • pubspec management
  • .env files
  • Hiding files from GitHub
  • Serializing JSON
  • Personal strengths & weaknesses
  • Falling behind sucks
  • Software craftsmanship
  • Work fewer jobs
  • Finding internships
  • Remember to email about accommodations
  • Accepting criticism on resubmissions (1)
  • Procedural programming
  • You don't have to take three finals on one day
  • Painting miniatures
  • GitHub has a comic book
  • Being flexible
  • Dead code
  • Holding each other to standards
  • Bad and good comments
  • Aliasing
  • Reading a textbook thoroughly
  • Rereading
  • No nested loops (no multiple levels of abstraction)
  • Using classes is not the same as OOP (1)
  • SMART
  • A bit about the Gestwicki family
  • Places to eat in NY
  • Getting ink to the front of an Expo marker
  • How to clean a whiteboard properly
  • New York Politics
  • Data structures vs DTOs vs Objects (1)
  • Conditions of satisfaction
  • Setting up ShowAlertDialog
  • Handling network errors
  • Handling exceptions
  • Build context warnings
  • CORS errors
  • Semantic versioning
  • Dealing with Flutter error reporting
  • Test isolation (1)
  • Don't make multiple network calls when testing
  • Improving test speed
  • Always run all the tests
  • You can test a UI
  • Writing 'expect' statements
  • Running tests on commit
  • Autoformatting in Android Studio
  • Testing in clean environments
  • Creating dart files
  • Hard vs soft warnings
  • Functioning on 0-3 hours of sleep
  • Configuring git committer names

Top Five Videogames of 2024

Over on the Indiana Gamedevs Discord, one of the organizers encouraged members to share their Top 5 (or Top 10) games of 2024. I am fascinated by the fact that most of the other developers' top games are things I have never heard of. A friend pointed out that games were becoming like music, where each person has an individual taste that might be completely unknown to someone else. Trampoline Tales put their favorites on their blog, and I figured I'd go ahead and do the same.

It may be obvious, but these are video games. I don't pay much attention to how many or what kind of video games I play during the year except occasionally to wince at the hours spent on a particularly catchy title. For tabletop games, I log my plays on Board Game Geek and RPG Geek, which makes it easy to collect the data I need to write my annual retrospective. For this reflection on video games, I was pleased to see that Steam makes it easy to see which games I played by month over the past year. GOG's website and my Epic account show games in order of activity. All these data sets are somewhat polluted by a combination of judging for the IGF and acquiring (but not playing) freebies from Prime or Epic.

I ended up with seven games that were contenders for my favorite five of the year, but the ones I've chosen to list below really stood out from the others. These were not the only games I played, and in fact, they were not even the games I played most. There are some games I played this year that I found deeply disappointing, but I will probably keep those as internalized design lessons rather than writing a separate post about them.

Here are the five I listed for my fellow Indiana gamedevs, along with links and a very short blurb about them. 

  1. Dave the Diver
    I didn't know much about this game except that it was popular. I found the whole experience to be delightful.
  2. Tactical Breach Wizards
    Turn-based strategy, defenestration, and magic. One of the characters had an ability that I still think about, something I've never seen in a game before that is beautiful, elegant, thematic, and hilarious.
  3. SKALD: Against the Black Priory
    This is a wonderful homage to classic CRPG gameplay with just enough modern twists to feel fresh.
  4. Balatro
    This is a great example of a simple idea taken to a logical and beautiful end.
  5. SteamWorld Heist II
    A sequel to one of the most interesting takes on the turn-based tactics genre, combining a 2D camera and platform elements with robots and firearms. Fun battles and rewarding power escalation.

Tuesday, November 26, 2024

Bloom's Taxonomy, Teaching, and LLMs

Recent discussions of LLMs in the classroom have me reflecting on Bloom's Taxonomy of the Cognitive Domain. Here's a nice visual summary of its revised version.

Bloom's Taxonomy of the Cognitive Domain
(By Tidema - Own work, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=152872571)

Bloom's Taxonomy, as it is called, is a standard reference model among teachers. The idea behind it is that a learner starts from the bottom and works their way upward. As far as I know, it has not been empirically validated: it's more of a thought piece than science. This is reflected in the many, many variations I've seen in the poster sessions of games conferences, where some young scholar proposes a play-based inversion that moves some piece into a different position on the trajectory. All that is to say, take it with a grain of salt. The fact remains that this model has had arguably outsized influence on the teaching profession. (Incidentally, I prefer the SOLO taxonomy.)

There's been a constant refrain the past few decades among a significant number of educators and pundits that technology has made the remember stage obsolete. Why memorize this table of values when I can look them up? Why remember how this word is spelled? Spellcheck will fix it for me. My skepticism of this refrain has only increased as I have worked with more and more students who use digital technology as a crutch rather than a precision instrument.

LLM-generated code comes up in almost every conversation I have among teachers and practitioners in software development. There are ongoing studies into the short- and long-term implications of using these tools. My observations are more anecdotal, but it's no exaggeration to say that every professional developer and almost every educator has landed in the same place: LLMs can generate useful code, but knowing what to do with it requires prior knowledge. That is, the errors within the LLM-generated code are often subtle and require knowledge of both software engineering and the problem domain. 

From the perspective of Bloom's taxonomy, a developer with a code-generating LLM is evaluating its output. They come to their evaluation by building upon the richness of cognitive domain skills that undergird it. At the very fundamental level, they bring to bear a vast amount of facts about the praxis of software development that they have remembered and understood.

If Bloom is right, then among the worst things we could do in software development education is throw students at LLMs before they have the capacity for viable evaluation. Indeed, before LLMs, the discussion around the water cooler was often about how to stop students from just searching Stack Overflow for answers and submitting those. Before Stack Overflow, it was that students were searching the web for definitions rather than remembering them. My hypothesis for learning software development then is something like this:

  • Google search eliminates the affordance for learning to remember.
  • Stack Overflow eliminates the affordance for learning to understand.
  • LLMs eliminate the affordance for learning to apply.
This hypothesis frames the quip that I share when an interlocutor discovers that I am a professor and, inevitably, asks what I think about students using ChatGPT. My answer is that I'm considering banning spellcheck.