Saturday, October 25, 2025

Course revisions: CS390 Preproduction

I wanted to switch into writing mode this week, but I recently received my teaching assignment for Spring, and some relevant decisions were looming over my head. Of the three courses I'm teaching, I knew that my Game Studio Preproduction—the first in the three-semester game production sequence—needed the most attention. I have taught this course twice, both times closely following Lemarchand's Playful Production Process. It provided a useful framework for me and my students as we built our new curriculum.

The game production sequence has been on my mind as I have read other books over the last year, including Vasco Duarte's No Estimates and Tynan Sylvester's Designing Games, both of which I blogged about, as well as books like Mike Sellers' Advanced Game Design, which I did not. I have been particularly troubled by my agreement with Duarte about the distinction between scope-driven and value-driven software development. The former is based on a fixed scope, which unbinds budget and time, and the latter is based on fixed budget and time, which unbinds scope. I wrote a paper about this distinction that I will be presenting in two weeks at the Symposium on Games. To make a long story short, most of the published advice about games draws from AAA production patterns and takes a scope-driven approach, but my experience and learned intuition suggest that my teams will have much better luck with value-driven approaches. Sylvester was the one who really helped me pin down this idea when he wrote about how much of game production borrows concepts from other fields rather than embracing the idea that software is different—which, of course, it is. He also introduced me to the phrase "therapeutic planning," which he cites to Nassim Taleb and which I find terrifyingly evocative.

As a result, I have decided to forgo my previous approach to preproduction and, if it goes well, to the production sequence next year. The scope-driven methodology will be replaced by an agile, value-driven one. Preproduction has an agile bias no matter how you slice it, but where this change makes the most material impact on the course will be in the expected deliverables. In the past two offerings, I have followed Lemarchand's advice and had preproduction culminate in a macro document and production schedule. My experience is that these are always spectacularly wrong, but you cannot exactly blame the students for this: given their limited experience, of course these will be quite wrong. Laying them out feels like therapeutic planning to me. I will replace the macro document and schedule with the concept document format proposed by Sellers, peppering in a few of my own perspectives and additions. Sellers' format is a three-part document that includes a high-level description, a business-oriented product description, and a lengthy detailed design that captures core loops and interactions. I hope that the production teams will be able to use these documents as more appropriate starting points once production starts, compared to the compounded awkwardness I have witnessed when teams attempt to base production plans on poorly composed progenitors. By contrast, something from Lemarchand that I have seen teams make incredible use of is an articulation of experience goals and design goals, and that is something I want to weave into Sellers' recommendations for concept documentation.

My previous sections included an explicit element of learning how to learn from a book like Lemarchand's, and I have no regrets about that: the students who engaged with the reading did learn significantly from it. This included spending significant time at the beginning of preproduction talking about and practicing techniques of ideation. This was usually interesting, but I realized that the student projects rarely had any significant connection to this part of the class. It is more like pre-preproduction. In my plans for Spring, I have cut this down to two in-class exercises that are designed more to get people talking and thinking than to generate anything weighty.

This change allows me to move into a slightly more structured prototyping phase, for which I am borrowing ideas from my colleague Travis Faas. I am putting a little more structure into the prototyping weeks, including constraints that I hope will foster creative problem-solving. My sketch right now includes a short paper prototype, a short experience with greyboxing, and two two-week explorations. My plan is for students to get into teams and settle on a direction just before Spring Break.

I still have a few holes to fill in my plans. For example, I would like to incorporate a structured analysis of successful games, showing students how they might reverse-engineer a Sellers-style concept document from a game that they have played. I would like to have everyone play the same one or two games so that we can share our analyses. If you have ideas of freely available games that would fill the bill, let me know. I'm seriously thinking of Rogue, given the popularity of "roguelikes" and the dearth of knowledge about the original, but I want to make sure I'm not doing it only for my own nostalgia.


Monday, October 6, 2025

Experiences interacting with generative AI in my solo game development work

I have been reflecting on the role of generative AI in education. In order to better understand it, I decided to incorporate some generative AI systems into my software development work over the last two weeks. To be clear, I had no great intrinsic motivation nor external pressure to do so; I mention this because it seems that most of the stories I hear about using generative AI involve one or the other. Instead, I approached it from a position of curiosity while also trying not to think too hard about the immoral ways many models have been trained and the incredible amount of power required to drive the models.

For my context, I have been developing a game prototype using Dart and Flutter. Some of my earlier attempts at using Flutter for game development were hindered by my own floundering with software architectures: several times, I built prototypes only to have a constant desire to refactor due to code quality deficiencies. This was especially true when using TDD, with its emergent architectures and constant refactoring, which had me frequently doubting whether I was programming myself into a low-quality corner. I suppose this is the danger of doing TDD outside of Pair Programming.

I started by using ChatGPT as a sounding board to talk about some of my goals, including following the principles of Clean Code, good architectural separation, and robust tests. My first significant conundrum dealt with separation of layers while using a functional approach. That is, I had a World object that represented the immutable state of the world, and then I had an Adventure object that held a reference to the current World. Taking actions on the Adventure updated the World. The programming for this was made easy by using freezed. At the same time, interacting with the Adventure could generate events to which the UI needed to respond, so I took a page out of egamebook and had the Adventure create a stream of events that the UI layer could listen to. Dealing with streams is something else that Dart makes quite easy.
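To make the tension concrete, here is a rough sketch of the shape I had at this point. This is simplified plain Dart rather than my actual freezed-based code, and all the names are illustrative:

```dart
import 'dart:async';

// Immutable world state (in the real code, a freezed class).
class World {
  final int playerHealth;
  const World({required this.playerHealth});
}

// The Adventure both replaces its current World reference and pushes
// events onto a stream for the UI: two channels of truth for one action.
class Adventure {
  World current = const World(playerHealth: 10);
  final _events = StreamController<String>.broadcast();
  Stream<String> get events => _events.stream;

  void takeDamage(int amount) {
    current = World(playerHealth: current.playerHealth - amount);
    _events.add('player damaged: $amount'); // UI listens for this
  }
}
```

The trouble shows up when one action needs to update the state and emit several events in a particular order: the state change and the event stream are only loosely coupled.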

These two different systems created a conflict regarding how exactly an interaction should play out, especially when one interaction generated multiple events. I explained this situation to ChatGPT, and I was impressed by how it summarized the pros and cons of each of the different approaches. It concluded that I should just stop what I was doing and simplify to a purely functional Adventure object that only transforms World objects and, in doing so, returns a list of events. The generated text was unequivocal and, I daresay, correct. I think I needed someone to point out to me where I was trying to do two things at once, and that my own constraints (using an immutable data model) logically led to abandoning streaming events in favor of a purely functional Adventure object. It took some rebuilding, but once I was done, the separation between layers was much more clear.
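A minimal sketch of the purely functional shape, again in plain Dart with hypothetical names rather than my actual freezed code:

```dart
// Immutable world state.
class World {
  final int playerHealth;
  const World({required this.playerHealth});
}

// Events the UI reacts to, returned rather than streamed.
sealed class GameEvent {}

class PlayerDamaged extends GameEvent {
  final int amount;
  PlayerDamaged(this.amount);
}

// One action yields exactly one result: the next World plus every event
// that action produced, in order.
class ActionResult {
  final World world;
  final List<GameEvent> events;
  ActionResult(this.world, this.events);
}

// Adventure is now purely functional: no mutable state, no stream.
class Adventure {
  ActionResult takeDamage(World world, int amount) {
    final next = World(playerHealth: world.playerHealth - amount);
    return ActionResult(next, [PlayerDamaged(amount)]);
  }
}
```

The UI layer applies the returned World and plays through the returned events in sequence, so multi-event interactions no longer have an ordering ambiguity.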

Dealing with automated tests is another area where I was surprised by the high quality of ChatGPT's recommendations. I had a test file with a few hundred lines of code, and I had written some helper methods to try to keep the tests reading fluently. Yet, I felt unsatisfied with it. I pasted that code wholesale into ChatGPT and asked it to explain to me why I was feeling uneasy with that code, given my architecture. It observed that my code was mixing too many responsibilities together and that this could be improved by creating a separate test harness. This would move all the domain references out of my tests and into the harness so that changes to the domain would only change the harness, not the expression of the tests. Once again, it took some typing to get the system rebuilt this way, but the result was much nicer. In fact, a few days later, I made a significant change to the representation of actors in the world: I did not have to make any changes to my test code, and my harness only required a few tweaks to keep working.
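The harness idea, sketched with package:test and hypothetical stand-ins for my domain types:

```dart
import 'package:test/test.dart';

// Hypothetical stand-ins for the real domain types.
class World {
  final int playerHealth;
  const World({required this.playerHealth});
}

class Adventure {
  World takeDamage(World w, int amount) =>
      World(playerHealth: w.playerHealth - amount);
}

// The harness is the only code that touches domain types. Tests speak
// only in the harness's vocabulary, so a domain refactor changes the
// harness, not the tests themselves.
class AdventureHarness {
  World _world = const World(playerHealth: 10);
  final Adventure _adventure = Adventure();

  void playerTakesDamage(int amount) {
    _world = _adventure.takeDamage(_world, amount);
  }

  int get playerHealth => _world.playerHealth;
}

void main() {
  test('taking damage reduces health', () {
    final h = AdventureHarness();
    h.playerTakesDamage(3);
    expect(h.playerHealth, 7);
  });
}
```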

Some time later, I ran into another problem, the nature of which I don't exactly remember, and I found that ChatGPT had lost the entire history of our discussions around software architecture. I asked it what had happened, and it gave me the impression that yes, sometimes context is just lost. I briefly considered doing more research into figuring out what this meant and how to avoid it, but I decided I should explore a different approach: generative AI that is integrated into the IDE. Because I'm using Android Studio, the path of least resistance was to enable the Gemini integrations and to give it access to my project.

My experience using Gemini was more mixed. Using the chat interface, Gemini provided advice that got me out of another architectural blunder with its recommendation to separate my Adventure object into different services. For example, I created a CombatService, and then injected that into the Adventure. My first attempt at this was to make the CombatService modify World objects just like Adventure does, but after some frustration and further interaction, I received good advice to make it deal only with the logic of combat, not with the logic of state management. This meant that my unit tests for CombatService could make sure it was working correctly. I could also inject a mock CombatService into the Adventure when I needed to control the output for Adventure's unit tests. For example, an actor has a chance to hit its target in combat, but that kind of randomness needs to be factored out of a well-behaved unit test; making a mock CombatService that responds with a hit result meant that I could test that Adventure handled hits correctly without binding the test to a random outcome. Gemini framed this as a good place to practice separating unit and integration tests, which also gave me a useful perspective on this system I was building.
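The injection pattern, sketched with hypothetical names (my real CombatService and Adventure are more involved):

```dart
import 'dart:math';

// CombatService owns only combat logic, not state management.
abstract class CombatService {
  bool attackHits();
}

// The real implementation rolls dice.
class RandomCombatService implements CombatService {
  final _rng = Random();
  @override
  bool attackHits() => _rng.nextDouble() < 0.5;
}

// A deterministic stub for unit tests: always reports a hit.
class AlwaysHitCombatService implements CombatService {
  @override
  bool attackHits() => true;
}

// Adventure receives its CombatService by injection.
class Adventure {
  final CombatService combat;
  Adventure(this.combat);

  String attack() => combat.attackHits() ? 'hit' : 'miss';
}

void main() {
  // With the stub injected, the test never depends on a random outcome.
  final adventure = Adventure(AlwaysHitCombatService());
  assert(adventure.attack() == 'hit');
}
```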

Gemini was less useful in terms of practical code examples, in part because it seems to not have enough training on Dart code that uses freezed objects. More than once, it tried to explain to me how I should be changing my code to use different kinds of object construction that just don't work with freezed. The agent mode, which is clearly labeled as being in beta, was particularly unhelpful because of this. As a result, I have barely touched software agents despite having heard some promising things about them, such as in this interesting conversation between the hosts of the Mob Mentality Show.

My initial reaction to these systems is that they have been helpful for me because I can understand the implications of their recommendations. There were several times when the recommendations were clearly bad because they violated some particular principle, and when I pointed that out, the system was sycophantic about how smart I was for pointing out this problem, and then it changed direction, presenting the new advice with as much certitude as the last. This combination was troubling: confident answers and fawning reactions to criticism. It would be a terrible team member. Despite my overall positive experience, I am left uncertain about how much it was teaching me versus how much it was helping me stick to my guns and reminding me of how various principles arise in practice. Put another way, I retain a suspicion that I would be much more productive if I were pairing up with someone of equivalent experience and similar goals to mine.

Figuring out what any of this means for students is a subject for another time.