Friday, May 30, 2014

Revising courses, Summer 2014: Serious Game Design

I recently finished revisions to my Serious Game Design course. These come in response to the changes I made last Summer, when I designed a rather complex system of achievement-oriented assessment. I was happy with how the system allowed students variability in how they approached the themes of the course. However, I may have been a bit too tricky in the design. I had designed grade incentives for students to follow a particular path, and of course, these bright Honors College students determined that path and followed it. That involved unnecessary indirectness, however, and a few students called me out on it in course evaluations.

Seeing it in that light, I plan to divide next year's course into two units. The first five weeks will be structured, with students doing common readings and activities. Last year, I encouraged students to play a variety of games and reflect on them, but I think this was a "you can lead a horse to water" problem. Yes, the students played some games they wouldn't have otherwise played, but often just once (which is insufficient for critical analysis) or sometimes incorrectly (which leads to inappropriate analysis). Hence, the activities of the first five weeks will involve more reading and discussion of common experiences—including games we will play together in class, assuredly by the rules—and fewer divergent experiences.

The second unit of the class will be the design workshop. Last year, we spent five weeks coming up with various game concepts, and the following five weeks were spent iteratively prototyping one of them. The variety of game concepts last year was excellent, and they did increase in quality over time; however, in the last five weeks, many students got burned out or stuck on prototyping. I think they kept working on the prototypes only for the grade and not because they thought it was the best use of their time. This Fall, the plan for the last ten weeks is to allow students to present either concepts or prototypes each week. I have put a threshold on the number of artifacts so that, for students to earn "good" grades, they need to iterate on a prototype several times. However, I am hoping that opening up these ten weeks this way will let students dabble a bit more, rather than feeling trapped into one prototype in which they have lost confidence.

I have kept a pared-down version of the achievements system for the ten-week design workshop. For a student to get an 'A', the minimum number of design artifacts would be two game concept documents and four generations of a prototype. At one presentation per week, that leaves four weeks for other legitimate activity—that is, to earn achievements. Most of these are repeated from last Fall's list, but there are two new ones. Networker is earned by going to a formal meeting of game designers, of which there are a few in Indianapolis, and Organizer is earned by organizing a playtesting session for the whole class. I am pleased with this last one, since setting up playtesting is an important part of the process, but there's no reason an interested student couldn't do it rather than me. This gives a student a chance to make a big impact in an authentic way. There is also an achievement that can be earned if students want to just keep on developing prototypes, since this is surely authentic class activity. I thought about adding an achievement for bringing snacks to a team meeting, but as much as I love snacks, this is probably too easy an out.

For several years, I have included an option in my game design class for students to read and present on A Theory of Fun for Game Design, the book that sparked my current scholarly endeavors. I have held off on requiring the book, however, for fear of trying to force students to follow "my path." Yet, every year, one or two students read the book, were amazed by it, and recommended that I make everyone read it. This year, I have decided to require it, and we will read and discuss it in the first five weeks. It was meaningful to me when I read it ten years ago and to my VBC students back in 2012, and I hope that it will inspire this Fall's group as well. I also hope that it will help us develop some common vocabulary.

Once again, we will be working with my friends at The Children's Museum of Indianapolis. They will provide themes and feedback on the designs, and I know that this was an important aspect of the course for the students last year. Last year's Fall game design course was intentionally scattershot: students explored a wide range of themes and ideas. This year, we are going to be a bit more focused. I am also planning to design a game in the Fall, to use as a running case study. I look forward to the creative challenge as an opportunity to model some of my favorite practices, as well as to show students that nobody gets it right the first time.

Wednesday, May 21, 2014

An open letter to the Spring 2014 Game Studio

Late is better than never, and our team shirts are finally ready. I think they look fantastic.


I've assembled a packet for each team member, and in the packet I included a letter, which I am happy to share here.

Thank you for your participation in the Spring 2014 Game Studio. During the semester, it is easy to get caught up in what Frederick Brooks calls the "joys and the woes of the craft." Even the final meeting was just one step in an incomplete journey. 
Now that some time has passed, I think it is easier to reflect on the real impact of the semester. Most of us started out not knowing each other, and certainly, no one knew everyone. By the end, we were a team, each contributing to the creation of a novel and original educational game. This point is worthy of reflection: if we had not come together this semester, The Bone Wars game would not exist. We made a game unlike any other, a worthy software product, from nothing but our imagination and perseverance. 
You will always be a member of the team who made The Bone Wars. I hope that you will continue to reflect on this experience and that it has positive value for years to come. Indeed, from my past immersive learning projects, I know that it can take the span of years to come to understand an experience. 

This brings me to another important point. In addition to being a member of The Bone Wars team, you are a member of a small and exclusive club: Ball State University students who have been a part of serious games scholarship. These students have gone on to industry, to graduate school, and to found new ventures. Each group is unique and unrepeatable, and I cherish the opportunity to learn something from each one. To this end, I thank you for your feedback: your honesty and candor contribute to the improvement of all future projects. A special thanks to those of you who participated in the study: I expect this work to have a significant scholarly impact and, hopefully, to cause ripples that improve higher education in an even broader sense. 
Wear your shirt with pride.

Tuesday, May 13, 2014

The Bone Wars Project Retrospective

Edit: It seems many of my pictures were eaten by a grue. I'm not quite sure what happened. Maybe someday I will find them again. Sorry.

This past semester, I was the mentor for a six-credit-hour immersive learning course in which my students developed an original educational video game, The Bone Wars, based on the historic feud of 19th-century paleontologists O. C. Marsh and E. D. Cope. The team consisted of eleven students (ten undergraduate and one graduate), and like last year, we had a dedicated studio space in which to work. We were working with colleagues at The Children's Museum of Indianapolis.

I did not write much publicly about the project during the semester, and this rather lengthy post is my project retrospective. In this post, I start by giving a little background, and then I go into more detail on some of the themes of the semester. These reflect concepts that arose throughout the semester, in my own reflection, in conversations, in essays, and in formal team retrospectives. My primary goal in writing this essay is to better understand the past semester so that I can design better experiences in the future. Like any team, we had successes and we had failures. I may dwell more on the failures because these are places where I may be able to do things better in the future. I will also pepper in some pictures, so that if you don't want to read the whole thing (and I don't blame you), you can at least enjoy the pictures.

The original game logo, now a banner on the team's blog.

Background

This project was internally funded by the university's initiative for immersive learning. I have led many immersive learning experiences, but the most ambitious and most successful one was undoubtedly my semester at the Virginia Ball Center for Creative Inquiry, where I worked with fourteen students for an entire semester on a dedicated project. That was in Spring 2012, and since then, I have had two "Spring Studio" courses—six-credit immersive experiences designed explicitly to bring ideas from the VBC to the main campus. I have recently received approval to offer another in Spring 2015, and hence my desire to deeply reflect on the past semester's experience.

Immersive learning is, by definition, "student-centered and faculty-mentored." This creates a tension in my participation: I want plans, ideas, and goals to emerge from the students with my guidance. Yet college students generally have had very little leadership experience or real authority. Indeed, most of them have spent the last fifteen years in an educational system that minimizes agency, one designed to produce obedient workers rather than ambitious creatives. This can result in a leadership vacuum on immersive learning projects, which then manifests as a stressful pedagogic question: "Do I step in and fix this for them, or do I let them fail and hope they learn from it?" I mention this here because this tension will emerge as a theme in other parts of the essay. Despite having led many immersive learning projects, I still struggle with this.
The illustration from the game's loading screen subtly establishes that Marsh is brown and on the left, while Cope is blue and on the right.

Prototyping

We started the semester with a few themes from The Children's Museum. After the first week's introduction to educational game design, each of us wrote a concept document for a game, choosing from among the set of themes. The choice to go with The Bone Wars was unanimous.

Once we decided on the theme, I asked each student to choose a concept document and create a physical prototype of the game. I made one also, in part so that I could model the process. The students had the option of presenting theirs to the team as contenders for the final game, and about half chose to do so. However, none of those presented were actually prototypes at all: they were ideas or sketches, but they were not playable, and hence they could not be evaluated on their own merits. It seemed that, despite the team having read a few articles on rapid prototyping (including the classic How to Prototype a Game in Under 7 Days) and game design, they had not really understood what it meant—or, at least, those who presented their ideas as contenders did not.

This opens up a puzzle that I have not been able to solve. Half of the team did not present their designs as contenders. Were theirs actually prototypes? Did the other half of the class understand the readings, make playable prototypes, evaluate them, and then throw them away because they were not great? I hope that this is true, because then it was simply my failure to make them show me what they had, so I could explain to them that the first prototype is always thrown away. If not, however, then maybe the whole team did not understand, from the beginning, that game design is hard.


In any case, my prototype—which was created just to show them how it could be done—was the only one that was really a prototype at all. Half of the team members wanted to do another round of open prototyping, while the other half argued for moving forward into production with my prototype. My recollection is that the people in the former group were the ones who presented non-prototypes as contenders, but this could be wrong. Many in the latter group were coming from my game programming class in the Fall, where they had barely gotten games working in a three-credit-hour experience. I think it was on the strength of their argument that the team voted to move forward with my prototype.

A prototype in the studio, task board in the back.
This ended pre-production, and moving into our first three-week sprint, the team divided into two squads: one would iterate on the paper prototype and the other would build a digital version. However, at the end of the sprint, the game was fundamentally unchanged, and the digital prototype was nowhere near playable. This was not wholly unexpected, as the first sprint is always a struggle. To be clear, success was possible, but failure was fine as long as the team learned from it. The team agreed that, moving into the second sprint, they would create divergent prototypes based on the original one, while a few people took a more disciplined approach to the digital version as well. However, about two weeks in, I looked a bit closer and saw trouble.

As they had in the first iteration, the team had continued to report progress and claim late-night design sessions. I had been distracted by the digital prototyping team, helping with difficult software architecture issues, and so I had taken it on faith that they had made progress. However, when I finally sat down with them and asked to see what they had, there was practically nothing. There had been a little incremental refinement, but nothing near the divergent prototyping that was required. Checking the clock, I gave them a fifteen-minute design challenge: take an idea that had been discussed, modify the existing prototype accordingly, and have it ready to test. The educational goal behind this intervention was not to produce quality design artifacts, of course, but to have them get a feel for rapid prototyping, to understand what it means to fail fast. My intention was to work alongside them, to show them how in fifteen minutes I could make something, evaluate it, and determine what to do with it. I think I had two words written on an index card when I was called away by the digital prototyping squad with a legitimate software design problem. When I returned to the other squad almost twenty minutes later and asked to see what they had, they had nothing. They had chatted about game design ideas—again—rather than making anything that could be evaluated.

Two thoughts blazed through my mind. The first was, these students simply do not understand prototyping. The second was, if we don't fix this game design, we're up a creek. I gave them the challenge to make divergent prototypes and have them ready to evaluate within the next 24 hours. I got out my trusty index cards and spent an hour designing a different set of rules around some of the same game ideas—in part, again, to model the process, but also with the thought that we could have something decent if nobody else had anything. At the next meeting, I presented my new prototype. I explained how I had taken the ideas from the first prototype, as well as the playtesting results, research notes, and team ideas, and turned these into something new. The team seemed to like it, and I had hoped this would inspire them and create a spark of understanding for their own prototypes. However, they liked my revision so much that it became "the divergent prototype," and that's basically what you see in our final product. Note that their use of "divergent" here indicates that there was still a fundamental misunderstanding of the rapid prototyping process, but at this point, I internally declared it a lost cause: there was about a semester's worth of work to do and about half a semester to do it.

The game we built over the course of the next six weeks was essentially unchanged from the prototype I whipped up on some index cards: a two-player worker-placement game. It could have used some refinement, but there was no time for it.
An interesting shot of our whiteboard, with UI design ideas sitting on top of sprint retrospective notes.

Integration

One of the most important successes of the semester was getting our artists onto the version control system. With previous teams, we had awkward manual workflows for getting art and music assets into the game. This semester, however, we were able to teach one of the artists how to use TortoiseHg to mediate the pull-update-commit-merge-push process. Now, when a request came in to clean up an asset, or when he saw something that needed to change, he could just make it happen. There was one morning session where he and I worked together on the fossil widgets, and it was smooth as silk to have me laying down code while he cleaned up and produced assets. As great a success as this was, I cannot help but wonder: if we had had this kind of workflow earlier, could we have showcased more of the artists' work?
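For readers who have not used Mercurial: TortoiseHg is a GUI over a handful of commands, and the cycle the artist learned looks roughly like this at the command line (the commit messages here are made up):

    hg pull                  # fetch new changesets from the team repository
    hg update                # bring the working directory up to date
    (edit or add asset files)
    hg commit -m "Clean up fossil widget art"
    hg merge                 # only needed if someone else pushed meanwhile
    hg commit -m "Merge"     # a merge must itself be committed
    hg push                  # share the changesets with the team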
The artists weren't just good at drawing, they were both quite witty. Sadly, the final game only offers hints of both.
Unfortunately, the audio assets did not have the same success. The composer had something wrong with his computer that prevented us from getting a Mercurial client to work (something systemic—it wasn't the only thing broken, and I suspect he needed a complete wipe). The quality of the music is undeniable, but up until the week before we shipped, there was only one song and a handful of sound effects in the game. Right before the end of the project, the composer worked with a programmer, and they put all of his songs into the game: about 30 minutes of music for a ten-minute game. This is the main reason the game takes so long to download: some of the pop-up dialog boxes have three-minute soundtracks! The root problem was the failure to integrate and test rapidly. If the music had been integrated even a few days before shipping, we would have noticed the spike in project size, remixed the short-lived songs, and re-recorded some of the inarticulate voicework. While the visual assets underwent significant change and improvement during the last six weeks of production, the audio missed this opportunity.

Our software architecture involved model-view separation, with the model written using Test-Driven Development. TDD has been a boon to some of my other projects for dealing with the intricacies of game logic. After the first two sprints, however, we had a mess on our hands: the testing code was cumbersome, and bits of logic had leaked into the view. After the second sprint, when we completely overhauled the core gameplay, we had an opportunity to throw away all that we had and start again—and that's what we did. The revised architecture made more prudent use of the functional-reactive paradigm via the React library, and I took the reins on the UI code, laying the groundwork for that layer.

However, the team continued to develop both the model and the view in separate, parallel layers, rather than rapidly integrating across the two. Integration was difficult and therefore not done. I paired up with some students to demonstrate a more effective approach: picking a feature, then writing just enough of the model that I could add a piece to the UI—a vertical slice through the system, rather than parallel development of separate layers. I wanted this to be a major learning outcome for the students, particularly the Computer Science majors, but I doubt that most of them understood it. Even at the time, I recognized that I did a lot of the typing, when I could have stepped back and let them build an understanding by taking small steps. However, I also felt like I knew how much more work there was to be done, and they did not, and so I chose the path that built momentum toward completion.
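To make the vertical-slice idea concrete, here is a minimal sketch of writing just enough model to drive one visible piece of UI. It is in Java rather than the JavaScript we actually used, and the names (Player, FundsLabel) are hypothetical:

    // Just enough model for one feature: a player's funds.
    public class Player {
        private int funds;

        public Player(int startingFunds) {
            this.funds = startingFunds;
        }

        public int getFunds() {
            return funds;
        }

        public void spend(int amount) {
            if (amount > funds) {
                throw new IllegalArgumentException("Insufficient funds");
            }
            funds -= amount;
        }
    }

    // The thinnest possible view for the slice: render the funds as text.
    // In the real game this would be a UI component; a string is enough
    // to integrate and test end to end.
    class FundsLabel {
        static String render(Player player) {
            return "Funds: $" + player.getFunds();
        }
    }

The point is that the model and the view grow together, one feature at a time, so there is always something integrated to show and to test.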

I believe that these decisions were justified. Part of my design for the semester was inspired by research on situated learning and communities of practice, which shows that apprentices learn best in a community when they are centered on the practices of a master. By working with the students on some of these challenging design and development problems, I wanted to model not just a solution, but a mindset. However, I observed that the students seemed to split into two groups: those who watched intently, asked questions, and copied what I did; and those who daydreamed, wandered away, and ended up contributing almost nothing to the project.
When a student was trying to fill up the poster, he asked what else to include. I suggested, "Game development is hard." It was meant as a joke, but it looks pretty good there.
One place where I saw a great success of integration was in the addition of cheat codes. I had explained to the team that these "cheat codes" were really tools of quality assurance, allowing testers to create specific game scenarios and verify expected behavior. The idea was bandied about for a few days without any activity, either because they didn't feel the need for it yet or because they didn't know how to move forward. After I added a cheat-code listener to manipulate a player's funds, I showed a few students how it worked. Fairly quickly, they became adept at adding new codes to test features. At its best, this showed that they understood how to break a feature down into vertical slices, embracing the behavior that I had modeled for them.
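Here is a rough sketch of what such a listener can look like. The actual game was in JavaScript, so this Java version is only illustrative, and all the names are hypothetical:

    import java.util.HashMap;
    import java.util.Map;

    // Accumulates typed characters and runs an action when the buffer
    // ends with a registered cheat code.
    public class CheatCodeListener {
        private final Map<String, Runnable> codes = new HashMap<>();
        private final StringBuilder buffer = new StringBuilder();

        public void register(String code, Runnable action) {
            codes.put(code, action);
        }

        // Feed each typed character here, e.g., from a key event handler.
        public void onKeyTyped(char c) {
            buffer.append(c);
            // Keep no more characters than the longest registered code.
            int maxLength = codes.keySet().stream()
                    .mapToInt(String::length).max().orElse(0);
            if (buffer.length() > maxLength) {
                buffer.delete(0, buffer.length() - maxLength);
            }
            // Run any action whose code matches the end of the buffer.
            codes.forEach((code, action) -> {
                if (buffer.toString().endsWith(code)) {
                    action.run();
                }
            });
        }
    }

A tester might register a code that sets a player's funds high enough to reach late-game scenarios instantly, which is exactly the kind of scenario-creation I had in mind.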

A final perspective on the theme of integration. One of the lessons learned from the Spring 2013 Studio was that I needed to schedule formal meeting times. Hence, this semester, everyone had to be available MWF at 9AM for stand-up meetings. This got around one of the problems from last year, where we had "the morning group" and "the afternoon group," which of course led to communication problems and trust breakdowns. Our three collocated hours a week were still not enough, and I never expected them to be. The plan was to use these hours as a seed and then find additional collocated times to meet; however, what happened was that those three hours became the only time each week that everyone was together.

I think the team formed bad habits in the first week of the semester, when we were doing reading and design exercises together. I asked them to do these in the studio, so they could talk about them, share results, compare analyses, and playtest each other's designs. However, I don't think this happened: they scattered to the winds, as if this were any other course rather than a collaborative studio. Then, when we were in production, I reminded them of the need to prioritize collocated studio time over extracurricular obligations. However, these mandates (which indeed they were, from the ground rules to which everyone agreed) were treated as casual requests. It was worse than falling on deaf ears—it was insubordination. This damaged not just the team's communication and collaboration patterns, but also my trust in them.

I hosted a few social events for the team, but there was very little participation. Some team members brought up the need for more teambuilding and social events during retrospectives, but no one else seemed to act on these ideas. As a result, I think the most significant failure was that the team itself did not integrate.
A partial wireframe for the final game UI.

Usability

The game, as delivered, is almost completely unlearnable: if you downloaded it, it is very unlikely that you would figure out how to play, much less how to play well. What fascinates me is that the team didn't seem to recognize this. I understand that when you are close to a project, it can be difficult to step outside yourself and see it critically, as an outsider would. Perhaps I took it for granted that my students generally understood that some games have good experience design and some have bad, and that feedback in particular is a hallmark of good design.

Here is an anecdote to demonstrate this lesson. For weeks, I pointed out to the team that there was no indication of whose turn it is. The team knew the rules for whose turn it was: Marsh goes first on odd rounds, and players alternate turns. However, there was literally nothing in the game to reflect this. The inaction regarding this defect made me wonder whether the team was ignoring it or disagreed that it was a problem. Then, I was in the studio one day when a student turned to me excitedly and told me how he was testing a feature and had to look away from his screen. When he looked back, he could not tell whose turn it was. It was just as I had said, he confirmed, and we should do something about it. When I pressed him to reflect on why it took him so long to realize that this was a problem, he responded that he was too close to the project: he wasn't in the habit of looking at it from the player's point of view.

This defect was addressed, albeit inelegantly, but the game still suffers from a woeful lack of both feedback and feedforward. Before taking actions, it's not clear what they will do; after taking actions, it's not always clear they did. This leads us to the next theme...
The "Fossil Hunter" dialog box, which I designed, is one of the most often misunderstood screens in the game.

Finishing

In our final meeting, it was clear that the students were proud of themselves for having completed a potentially shippable product. However, this had been the goal every sprint: produce a potentially shippable product that we can show to stakeholders and playtest with the target audience. Yes, the team deserved some pride for having finished one, but it's not clear to me that the team, as a whole, recognized that the game itself was essentially unfinished. The ideal situation was that they would have completed a potentially shippable product each sprint and learned to critically analyze it, and the next sprint would be spent improving it. The real situation is that they felt a bit of joy, and relief, at having completed a big-bang integration after fifteen weeks of effort.

I am sad about this because it is another educational opportunity lost. I am not confident that the students can look at the product they completed and clearly articulate where it is strong and where it is weak, or develop any kind of coherent plan for finishing it. Perhaps in another sprint, this could have been accomplished, and there are definitely at least three weeks of work that could be done polishing the game. However, that's not a sprint we have. I am hopeful that some team members recognize this, but it's only hope.

Community

For the first time, I had a team member dedicated to community outreach. This is something I had wanted ever since I began working on immersive learning projects, but I had not previously been able to recruit someone for the role. His work resulted in the team blog as well as the Twitter handle. He also put together a series of podcasts for the blog that tell some of the history of Marsh and Cope.

His contributions were excellent, and they give me some ideas for how I might work with such a student in the future. It would be worth investing more effort in getting the attention of serious games networks, because this could result in both dissemination and free expert evaluation. Despite our having the blog and Twitter accounts, the game was really developed behind closed doors. Much of this is due to the failure of integration, but if we got over that hurdle in a future semester, we could be much bolder in promoting the potentially shippable products themselves. Similarly, although we ostensibly worked with The Children's Museum, this didn't manifest in our outreach efforts.

This student—like many others—also pigeonholed himself into his comfort zone, which is contrary to the values of agile software development. The team saw him as an "other," and he didn't seem at home in the studio space after the physical prototyping ended. He produced the podcasts in part because the game did not capture the historical narrative of The Bone Wars well, and so they were designed as ancillary artifacts. What if, instead, he had worked more closely with the team to explore these narratives in still screens or animatics? As mentioned above, it was very common for students to wall themselves in by their disciplinary background, but when engaged in interdisciplinary activity, I believe it is necessary to break down these walls. I had never worked with a student with these communication production skills before, and so I did not recognize what his walls would look like. I am hopeful that, given his positive experience and contributions, I can continue to recruit one or two similar team members for future projects.

Our visit from Muncie Mayor Dennis Tyler was due to the efforts of our social media manager.

Accountability

In the Spring 2013 Studio, I used a student assessment model borrowed from industry: a mid-semester performance evaluation and an end-of-semester performance evaluation. Basically, the model was, "Do honest work, and trust me to give you a fair grade." While this worked well in most cases, it was not perfect. One student had contributed almost nothing all semester, but because of the collaborative nature of the studio, I felt that I could not give him as low a grade as he deserved. If he had appealed it, I would have had very little evidence. To guard against this, and to encourage academic reflection, I used more formal evaluations this semester. At the end of each sprint, I was to assign each student a participation score (0-3 points) based on how well they kept their commitments. Each student also wrote an individual reflection essay (0-3 points). However, this system may have proved to be worse—for me, anyway.

In the pre-production sprint, I gave everybody full credit and some formative feedback because I thought that they had all kept their commitments. Now, I am not so sure. It seems to me that if the team had really read and thought about the readings assigned that first week, some of those ideas would have shown up in the rest of the semester, but they didn't—not in reflections, not in essays, not in conversations, and not in practice. I noted this in the first production sprint, and so I changed the format slightly, from my assigning participation scores to a survey question: did you keep your time commitment to the project? This is where things really broke down for me. Many students claimed to have kept all their commitments, putting their 18 hours per week of meaningful effort into the project—and I did not believe them. I think they lied to me, willingly and knowingly, to get a good grade. The survey was conducted via Blackboard, and in my responses, I said as much to them. I told them what I thought their participation was, and what grade I was recording for them... and nobody pushed back. Nobody justified their original response; nobody provided evidence. This silence has to be taken as an admission of guilt.

And it didn't just happen the first sprint.

This experience left a very bad taste in my mouth. That metaphor is not quite right... it left a weight on my heart. It's hard to listen to a stand-up report, knowing that several members of the team are content to lie to you as long as they don't get caught. I don't know if these students recognized what a burden it was on me. I don't know if they have any experience, any guilt, that would indicate to them how hard it is to regain trust. I suspect not, because as far as I could tell, no one seemed to try. Some did ramp up their efforts, but only to the minimum level to which they were originally committed, and some not even that much.

This leaves me with a difficult design problem: do I return to ad hoc assessment, knowing that some people may sneak by me, or do I use more rigorous assessment, and open myself to the heartache of having students lie to me for something as inconsequential as a grade?

Incidentally, I laid some of this on the table for the students in our final meeting, which was the last day of finals week. I'm afraid I may have been a bit of a downer, but I wanted them to understand that betraying my trust was not just a professional blunder, but that it actually hurt me personally. No one really responded to this, and there weren't any more reflection essays. I wonder what they think, and I hope they learned from it. The team's risks are academic, not economic. I realize that my risks tend to be emotional.

One of our few historical event cards.

Closing Thoughts

I am faced with a conundrum: what happens to The Bone Wars next? I feel a great sense of ownership over the project: it is my design, and I wrote a large portion of the code. As it is, however, it's almost unplayable because of the lack of user experience design. It would only take a few days to polish it, and it would not require any new assets. This would make it ready for presentation at conferences and competitions, and it would make it a piece that I personally can be proud of. If I did this, however, would it still be the result of a "student-centered and faculty-mentored" experience? Does that matter, once the semester is over and credit is assigned? I am honestly not sure, but the truth is that I have already made some modifications. Version 0.4, released by the team, contained some embarrassing typographic errors, again due to a failure to integrate early. I quickly patched it and re-released it as 0.5. I also have a version 0.6 that I've been tinkering with, a bit with a student but mostly by myself. This version is already much nicer than 0.5, but I am not certain what its future is.

The team had very little interaction with our community partner. We had one meeting with them, toward the end of the semester. It was extremely useful and provided a lot of direction, but only three team members were able to attend. I had positioned myself as a liaison between the team and The Children's Museum for two reasons: first, I knew that they were dealing with serious production stress, so I did not want to bother them unnecessarily; second, I have seen students flub these community relationships, and I didn't want to damage ours. I need to consider whether this was a mistake. The easier fix, however, would be to arrange meetings ahead of time, far enough in advance that the team feels the pressure of having something playable to show them.

I know I need to do something more intentional with the scheduling. The past two years, I have given preference to getting the best people even if we don't have the best schedule. Now, I think this was a mistake. If the best people are getting together at odd times—or not getting together at all—then their efforts will be fruitless. If I were dealing only with Computer Science majors, it would be easy, since I know and can even influence our course schedule. However, every other major added to the mix increases the complexity. The most difficult to schedule are the artists, who have significant studio obligations, but they are also the most critical for the kind of games that interest me. Right now, it's still an open problem, although I certainly need to have something more formal in place for next Spring.

The game design and UI design issues are tricky ones, but again, I think the past two years point me in the right direction. I have tried to model these six-credit experiences after my 15-credit VBC experience, but even there, the students barely finished their project on time. Prefacing the production with a "crash course in game design" has not been fruitful: game design is too difficult for the students to do well with such a tight timetable. I think that next year, I will have to have the critical design elements done ahead of time. Right now, I am leaning toward doing the design myself, working with my partners at the Children's Museum, and using that as a case study in Fall's Serious Game Design colloquium and as a starter specification in the Spring. This would allow the next crop of Spring Studio students to focus more on the problems of production and, hopefully, get a better idea of what it means to make something really shippable.

In a hallway conversation with one of the students, I articulated a taxonomy that I used for personal, informal assessment of learning experience design. It looks something like this:
  • What succeeded because of my designs?
  • What succeeded despite my designs?
  • What succeeded regardless of my designs?
  • What failed because of my action or inaction?
  • What failed despite my action or inaction?
  • What failed regardless of my action or inaction?
I still don't have clear answers to many of these questions. The good news, however, is that this immersive learning experience was also the subject of a rigorous study by a doctoral candidate in the English department. He has been interviewing students and gathering evidence to understand the students' lived experience, focusing primarily on writing, activity theory, and genre research. I hope that the study will shed some light on the areas that were hidden from me, and that I can use this to design even better experiences.

I know that many of my Spring 2014 students had a rewarding learning experience, and I hope they carry the lessons with them for a long time. There is a subset of the team of whom I am very proud—students who learned what it means to take initiative, to make meaningful contributions, to learn from mistakes, to recognize weaknesses, and to see the value of collaboration. To those of you who read this, I hope that this essay helps you understand my perspective a bit more, and of course I welcome your comments. As I said in the introduction, the intended outcome of this essay is that I can design even better experiences in the future.

Monday, May 12, 2014

A year of achievement-oriented assessment in CS222

I conducted a formal research study as part of my experiments with achievement-oriented assessment in CS222. Although the semester is over, I have not yet opened my consent forms to determine whose data I can use in the formal study. First, I wanted to take some time to write out my informal thoughts about the past academic year.

My intention was to use the same format in the Fall and Spring, to get more consistent data for the study, but there was one piece that I felt I had to change between semesters: the tight binding of achievements and reflections. In the Fall, students' weekly writing assignments had to tie directly to their achievements. Late in the Fall, I realized that this was an unnecessary restriction on the topic of their writing, and that what I really wanted was for them to reflect on any course experience, whether it involved the achievements or not. The Spring reflection essays were more varied and much less formulaic. In fact, some delved into personal stories that were quite moving and inspiring. Of course, I cannot say more about that publicly, except to point out that it was the kind of moment that makes me proud to be a positive influence in a person's life.

There were still some hiccups with the reflective writing. There were three parts to each essay in the Spring—one fewer than in the Fall—but students still submitted essays that missed parts. I can probably mitigate some of this by spending more time with Blackboard, copying the requirements from the course description into each assignment rather than referencing the course description. The programmer in me wants to simply reference the authoritative document, but experience has shown that students fail to check the reference.

I am also concerned about students' balance of attention between course activities. The achievements are mostly programming-related and are designed to integrate with the final projects. The best students saw this and leveraged it, but the capacity to recognize it seemed to fall off sharply with students' effort and attention. Reflections, similarly, were supposed to tie all these experiences together, and the best students got this; many other students wrote trite, predictable, platitudinous, or probably-false reflections just as a way to earn course credit. I believe that anyone can learn from reflective writing, and I suspect many of my students did, but I also suspect that many more did not learn significantly from this activity, since they treated it as a box to be checked. This makes me wonder if the whole system of reflections should, itself, be an achievement, rather than a requirement. More on this in a moment.

The achievement system led to students engaging in a wider variety of learning activities than I had seen before. Furthermore, much of this was self-directed: I gave criteria whereby achievements were earned, and students were responsible for finding the resources to earn them. Of course, I myself was a resource, but one could almost draw a line between high and low grades as to who recognized this. This is a common problem, not related to achievements I think: "strong" students will talk to professors, ask questions, and seek guidance, while "weak" students suffer from culturally-imposed fears that prevent this (fear of showing weakness, fear of authority figures, etc.).

All of the achievements were posted to a shared folder on Google Docs where students could see everything their peers had posted. All of my evaluations were posted as comments on these shared documents. It is not clear to me to what extent students read each other's posts and learned from them, but I know that it did happen at least sometimes. I am uncertain whether students read each other's posts for content or for form, but it seems to be a bit of each: from what I gather, ambitious students would scan others' posts, see where I left comments, and then try to avoid those pitfalls. As above, however, we see the weaker students making the same mistakes that their peers made just a few lines above them.

Several achievements dealt with what I called "production code," meaning that it had to be the students' final project code or part of Open Source software. This definition, which seemed clear to me, caused students trouble in both semesters. Regularly, students would submit achievements that used code that did not fit this definition, sometimes the same students week after week. The confusion between current project code and old project code seems like a lack of attention to detail; however, some students confused public open source code with snippets on StackOverflow or tutorial sites. I need to clean up this definition in the future, making it clear that the point is to look at code that is part of a system, not code that is didactic or decontextualized.

This leads to another unexpected oddity: in the final round of reflections and in the final exam, students used terms like "production code" as if they were universally meaningful, as opposed to being meaningful only in our class. I did not make it clear that this was a definition for us, and so I worry that students may go into interviews or other discussions and mistakenly present it as standard vocabulary. This was much worse with the achievements, many of which I tried to name in clever ways. For example, one achievement deals with Java's toString, equals, and hashCode methods, and I called the achievement "Master of the Big Three." I didn't know what else to call it when I was writing it up, and that seemed like fun, so I called it that. However, I then overheard students talking about "The Big Three" as if that meant something in an absolute sense. Similarly, I had several achievements dealing with design patterns, and since I see patterns as a way-of-programming, I called these "Pattern Disciple" achievements. Then I heard students talking about "pattern discipline"—they didn't even get the word right, but they were still using these corny terms as if they were accepted jargon. I am not sure how to fix this aside from coming up with much drier names for the achievements, but the punster in me will likely find this impossible.
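For readers outside the class: the "Big Three" are just the Object methods that well-behaved Java value classes override together. A minimal sketch, with a made-up class:

    import java.util.Objects;

    // Overriding the "Big Three": toString, equals, and hashCode.
    // Overriding equals without hashCode (or vice versa) breaks
    // hash-based collections, which is why they travel together.
    public final class Fossil {
        private final String species;
        private final int year;

        public Fossil(String species, int year) {
            this.species = species;
            this.year = year;
        }

        @Override
        public String toString() {
            return "Fossil{species=" + species + ", year=" + year + "}";
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Fossil)) return false;
            Fossil that = (Fossil) other;
            return year == that.year && Objects.equals(species, that.species);
        }

        @Override
        public int hashCode() {
            return Objects.hash(species, year);
        }
    }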

To get an 'A' in the class, students needed to earn "meta-achievements," which were coherent sets of related achievements. For example, the "Engineer" meta-achievement required using UML and design patterns. (In retrospect, it may have made sense to call these "quests" instead, to better show the metaphor of an 'A' being at the end of the journey.) The members of one of the Spring teams did earn this one, although because of how I articulated the achievements, they had to incorporate the Decorator pattern where it really had no place. When I saw this, I was disheartened: what's the point of a pattern if it doesn't actually solve a problem in the domain, and would the students recognize the difference? When I expressed this doubt to the team, one of them actually defended my policies, saying that he was happy to have an excuse to study the pattern and use it, regardless of whether it "fit" the project. At least this student saw the balance of the pragmatic and the academic here, and the result was positive for us both; however, I think I can avoid this in the future by articulating quests in a way that ensures the activity is still motivated by the project context. Most students have difficulty finding the kind of intrinsic motivation that this one had!
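For context, the Decorator pattern wraps an object behind its own interface in order to layer on behavior. A minimal sketch with hypothetical names, just to show the shape of what was being shoehorned in:

    // A decorator wraps a component behind the same interface,
    // adding behavior without modifying the wrapped class.
    interface Scorer {
        int score();
    }

    class BaseScorer implements Scorer {
        public int score() {
            return 10;
        }
    }

    class BonusDecorator implements Scorer {
        private final Scorer wrapped;

        BonusDecorator(Scorer wrapped) {
            this.wrapped = wrapped;
        }

        public int score() {
            return wrapped.score() + 5; // layer a bonus onto the wrapped scorer
        }
    }

    // Decorators compose freely:
    // Scorer s = new BonusDecorator(new BonusDecorator(new BaseScorer()));
    // s.score() == 20

When a decorator genuinely fits, each wrapper solves a problem in the domain; when it does not, as happened here, the wrapping is just ceremony.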

Right at the end of the semester, I had an enlightening conversation with a student regarding the entire metaphor of achievements. I asked why he had not been more active in pursuing achievements when they were clearly tied to the course grade. His answer may have been an excuse, but it's interesting nonetheless. As a gamer, he conceptualized "achievements" as they are often treated in game design: as ancillary rewards, secondary to the primary gameplay. The discourse around the semester focused on the nine-week final project, and I can certainly see how students would see that as "the point of the course." From that perspective, these weren't really "achievements" as the term is used in contemporary game design: they were essential, more like missions or quests, waypoints or progress markers. Another student put it another way, saying that they weren't achievements at all; they were just a set of optional assignments. This is valuable feedback, and I see two ways out of this situation: rename the achievements to reflect their essentially-mandatory nature, or actually make them optional, perhaps as gatekeepers to tiers of grades ('B' and 'A', for instance).

Achievement submissions were organized by topic, so all the "Master of the Big Three" achievements were on one page, for example. This made it convenient for students to see what their peers had been working on and what kind of feedback I had given them. The resulting course document structure was like a wiki, then, with content organized by topic. I wonder how different the learner's experience would be if we took a portfolio approach, organizing the content by contributor. This, I think, would make it easier for individuals to see and think about what they did all semester. On the other hand, it would work against the goal of helping students see themselves as part of a community, and I suspect it would make students less likely to look at each other's work, since one would not know whether a particular student had earned a particular achievement or not. Also, it might lead many mid-level students to set their sights on the portfolios of a small number of strong students; this would be problematic since, from my reading of the literature, students learn better from people at similar skill levels. This leads me to think that the wiki-structure approach is superior to the portfolio-structure approach, and now that it's on my blog, maybe I will remember this point as I consider what ideas to incorporate into my summer course revisions.

When I first imagined the achievement system, I envisioned the contribution of high-quality content on which I could provide meaningful, thoughtful responses. In fact, in my original design, I would not evaluate the achievement submissions at all, since each one had crisply-defined requirements, but I added a professor review before the start of the Fall semester. In practice, there were many submissions that were not up to snuff. On one hand, this is predictable: if students will often submit assignments without actually fulfilling their requirements, why wouldn't they do the same with achievement submissions? On the other hand, why would they do this? The achievements were not graded on some archaic percentage scale. There were only three grades: incorrect, acceptable, and correct-with-distinction (coded as 1, 2, and 3). The best I can figure is that students uploaded incorrect submissions because schooling culture encourages "submit and hope for points" as the dominant idiom, rather than "consider whether or not this is correct." (This is just one of many soapbox issues for me, where I point my finger squarely at... well, pretty much all the rest of formal education, which is difficult because I have pretty small fingers. I could go on about how the dominant schooling culture discourages reflection, accountability, creativity, and deep learning while encouraging risk-aversion, but this parenthetical phrase is long enough.) I wonder, perhaps, if the numeric coding itself brought to mind students' conventional experience with points. Indeed, one student told me that he really wanted to get those '3's, even though, from a grading perspective, they were equal to '2's: the only difference was one of pride. If I keep this aspect of achievements, I will try turning these into symbols, perhaps thumbs-down, thumbs-up, and... I don't know, maybe Chuck Norris.

As a preliminary conclusion, I call the achievement-based grading system a success for this course. Students were able to focus on their nine-week projects and engage in meaningful content-oriented and reflective activity, with freedom to choose activities that aligned with their interests. At a minimum, I want to redesign some achievements to align more clearly and obviously with the projects, but I will hold off on major redesigns until I take some time to read my study data.