I have been reflecting on the role of generative AI in education. In order to better understand it, I decided to incorporate some generative AI systems into my software development work over the last two weeks. To be clear, I had no great intrinsic motivation nor external pressure to do so; I mention this because it seems that most of the stories I hear about using generative AI involve one or the other. Instead, I approached it from a position of curiosity while also trying not to think too hard about the immoral ways many models have been trained and the incredible amount of power required to drive them.
For my context, I have been developing a game prototype using Dart and Flutter. Some of my earlier attempts at using Flutter for game development were hindered by my own floundering with software architectures: several times, I built prototypes only to have a constant desire to refactor due to code quality deficiencies. This was especially true when using TDD, with its emergent architectures and constant refactoring, which had me frequently doubting whether I was programming myself into a low-quality corner. I suppose this is the danger of doing TDD outside of Pair Programming.
I started by using ChatGPT as a sounding board to talk about some of my goals, including following the principles of Clean Code, good architectural separation, and robust tests. My first significant conundrum dealt with separation of layers while using a functional approach. That is, I had a World object that represented the immutable state of the world, and then I had an Adventure object that held a reference to the current World. Taking actions on the Adventure updated the World. The programming for this was made easy by using freezed. At the same time, interacting with the Adventure could generate events to which the UI needed to respond, and so I took a page out of egamebook and had the Adventure create a stream of events that the UI layer could listen to. Dealing with streams is something else that Dart makes quite easy.
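To make that concrete, here is roughly the shape I had at the time. The World and Adventure names come from my project, but the fields, the event type, and the hit-point arithmetic are simplified stand-ins, and I am using a plain Dart class with a hand-written copyWith rather than freezed so the sketch stands alone.

```dart
import 'dart:async';

// Immutable snapshot of the game state. In the real project this is a
// freezed class; a hand-written copyWith keeps the sketch self-contained.
class World {
  final int playerHealth;
  const World({required this.playerHealth});

  World copyWith({int? playerHealth}) =>
      World(playerHealth: playerHealth ?? this.playerHealth);
}

// A hypothetical event the UI layer would respond to.
class AdventureEvent {
  final String description;
  const AdventureEvent(this.description);
}

// The conflicted version: Adventure both swaps out its reference to the
// current World and pushes events onto a stream for the UI to listen to.
class Adventure {
  World world;
  final _events = StreamController<AdventureEvent>.broadcast();

  Adventure(this.world);

  Stream<AdventureEvent> get events => _events.stream;

  void takeDamage(int amount) {
    world = world.copyWith(playerHealth: world.playerHealth - amount);
    _events.add(AdventureEvent('You take $amount damage.'));
  }
}
```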
These two different systems created a conflict regarding how exactly an interaction should play out, especially when one interaction generated multiple events. I explained this situation to ChatGPT, and I was impressed by how it summarized the pros and cons of each of the different approaches. It concluded that I should just stop what I was doing and simplify to a purely functional Adventure object that only transforms World objects and, in doing so, returns a list of events. The generated text was unequivocal and, I daresay, correct. I think I needed someone to point out to me where I was trying to do two things at once, and that my own constraint of using an immutable data model logically led to abandoning streaming events in favor of a purely functional Adventure object. It took some rebuilding, but once I was done, the separation between layers was much clearer.
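The reshaped design looks something like this, again with hypothetical fields and event text, and reusing the World and AdventureEvent classes from the sketch above: the Adventure keeps no state and exposes no stream, and each action takes a World and returns the next World along with the events it produced.

```dart
// World and AdventureEvent are as in the previous sketch.

// Pairs the new state with the events produced while computing it.
class ActionResult {
  final World world;
  final List<AdventureEvent> events;
  const ActionResult(this.world, this.events);
}

// Purely functional: no stored World, no stream. The UI layer keeps the
// returned World and renders the returned events however it likes.
class Adventure {
  ActionResult takeDamage(World world, int amount) {
    final next = world.copyWith(playerHealth: world.playerHealth - amount);
    return ActionResult(next, [AdventureEvent('You take $amount damage.')]);
  }
}
```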
Dealing with automated tests is another area where I was surprised by the high quality of ChatGPT's recommendations. I had a test file with a few hundred lines of code, and I had written some helper methods to keep the tests reading fluently. Yet, I felt unsatisfied with it. I pasted that code wholesale into ChatGPT and asked it to explain to me why I was feeling uneasy with that code, given my architecture. It observed that my code was mixing too many responsibilities together and that this could be improved by creating a separate test harness. This would move all the domain references out of my tests and into the harness, so that changes to the domain would only change the harness, not the expression of the tests. Once again, it took some typing to get the system rebuilt this way, but the result was much nicer. In fact, a few days later, I made a significant change to the representation of actors in the world: I did not have to make any changes to my test code, and my harness only required a few tweaks to keep it working.
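The gist of the harness is that the tests only ever talk to the harness, and only the harness talks to the domain. The AdventureHarness name and its methods below are my own illustration rather than the project's actual test code, and they build on the sketches above.

```dart
import 'package:test/test.dart';

// Owns all references to domain types (World, Adventure, and ActionResult
// from the earlier sketches), so tests read in terms of intent and survive
// changes to the domain model.
class AdventureHarness {
  World world = const World(playerHealth: 10);
  final Adventure adventure = Adventure();

  void takeDamage(int amount) {
    final result = adventure.takeDamage(world, amount);
    world = result.world;
  }

  int get playerHealth => world.playerHealth;
}

void main() {
  test('taking damage reduces health', () {
    final harness = AdventureHarness()..takeDamage(3);
    expect(harness.playerHealth, 7);
  });
}
```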
Some time later, I ran into another problem, the nature of which I don't exactly remember, and I found that ChatGPT had lost the entire history of our discussions around software architecture. I asked it what happened, and it gave me the impression that yes, sometimes context is just lost. I briefly considered doing more research to figure out what this meant and how to avoid it, but I decided I should explore a different approach: generative AI that is integrated into the IDE. Because I'm using Android Studio, the path of least resistance was to enable the Gemini integrations and give it access to my project.
My experience using Gemini was more mixed. Using the chat interface, Gemini provided advice that got me out of another architectural blunder with its recommendation to separate my Adventure object into different services. For example, I created a CombatService and then injected it into the Adventure. My first attempt at this was to make the CombatService modify World objects just like Adventure does, but after some frustrating interactions, I received good advice to make it deal only with the logic of combat, not with the logic of state management. This meant that my unit tests for CombatService could make sure it was working correctly. I could also inject a mock CombatService into the Adventure when I needed to control the output for Adventure's unit tests. For example, an actor has a chance to hit its target in combat, but that kind of randomness needs to be factored out of a well-behaved unit test; making a mock CombatService that always responds with a hit meant that I could test that Adventure handled hits correctly without binding the test to a random outcome. Gemini framed this as a good place to practice separating unit and integration tests, which also gave me a useful perspective on the system I was building.
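Here is a sketch of the shape this ended up taking. The CombatService name comes from my project, but the interface, the fake used in the test, and the numbers are guesses made up for illustration, and World, AdventureEvent, and ActionResult are as in the earlier sketches.

```dart
import 'dart:math';
import 'package:test/test.dart';

// Describes the outcome of one attack; a hypothetical shape.
class AttackResult {
  final bool hit;
  final int damage;
  const AttackResult({required this.hit, required this.damage});
}

// Combat logic only: dice rolls and rules, no state management.
class CombatService {
  final Random _random;
  CombatService([Random? random]) : _random = random ?? Random();

  AttackResult resolveAttack() {
    final hit = _random.nextDouble() < 0.5; // hypothetical hit chance
    return AttackResult(hit: hit, damage: hit ? 2 : 0);
  }
}

// A fake that always hits, making Adventure's unit tests deterministic.
class AlwaysHitCombatService extends CombatService {
  @override
  AttackResult resolveAttack() => const AttackResult(hit: true, damage: 2);
}

// Adventure delegates the combat rules to the injected service and only
// handles the state transformation and the resulting events.
class Adventure {
  final CombatService combat;
  Adventure(this.combat);

  ActionResult enemyAttacksPlayer(World world) {
    final result = combat.resolveAttack();
    final next = result.hit
        ? world.copyWith(playerHealth: world.playerHealth - result.damage)
        : world;
    return ActionResult(next, [
      AdventureEvent(result.hit ? 'Hit for ${result.damage}.' : 'A miss.'),
    ]);
  }
}

void main() {
  test('Adventure applies damage when the combat service reports a hit', () {
    final adventure = Adventure(AlwaysHitCombatService());
    final result = adventure.enemyAttacksPlayer(const World(playerHealth: 10));
    expect(result.world.playerHealth, 8);
  });
}
```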
Gemini was less useful in terms of practical code examples, in part because it seems to not have enough training on Dart code that uses freezed objects. More than once, it tried to explain to me how I should be changing my code to use different kinds of object construction that just don't work with freezed. The agent mode, which is clearly labeled as being in beta, was particularly unhelpful because of this. As a result, I have barely touched software agents despite having heard some promising things about them, such as in this interesting conversation between the hosts of the Mob Mentality Show.
My initial reaction to these systems is that they have been helpful for me because I can understand the implications of their recommendations. There were several times when the recommendations were clearly bad because they violated some particular principle, and when I pointed that out, the system was sycophantic about how smart I was for noticing the problem, and then it changed direction, presenting the new advice with as much certitude as the last. This combination was troubling: confident answers and fawning reactions to criticism. It would be a terrible team member. Despite my overall positive experience, I am left uncertain about how much it was teaching me versus how much it was helping me stick to my guns and reminding me of how various principles arise in practice. Put another way, I retain a suspicion that I would be much more productive if I were pairing up with someone of equivalent experience and similar goals to mine.
Figuring out what any of this means for students is a subject for another time.