Wednesday, May 8, 2013

CS222: A Year in Review

Introduction

Three years ago, my department introduced a new required sophomore-level course, CS222: Advanced Programming. I was on the committee that designed the course, which was created to help students bridge from the introductory programming and data structures courses to project-oriented, upper-division courses. I have taught the course several times, and this post is a reflection on my experience with two sections this past academic year.

Course Structure

Students come into this course having taken a course on discrete structures and two introductory programming courses—one on the fundamentals of programming and one on basic data structures. I also convinced the department to add freshman composition as a prerequisite, in order to stave off the worst of the bad writing. This was the first year that the writing prerequisite was in effect, and I don't remember many times when I wanted to throw my monitor out the window while reading students' writing, so I suppose that was a success.

I used the same course structure in both the Fall and Spring semesters. We began with seven weeks of structured daily exercises, generally emphasizing reflective practice. For example, several assignments involved reading an article or expert programming tip and then comparing and contrasting programs that follow and don't follow the tip. Frequently, I asked students to look at their own programs from the first two programming courses and evaluate them for phenomena such as naming convention violations or comment rot. These seven weeks were followed by a two-week project undertaken in pairs, with a formative evaluation after the first week. In the Fall, I used Mad Libs, and in the Spring, an RSS analyzer that counted word frequencies. The latter was inspired in part by my experience working on WikiBeat.
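
To give a flavor of the "comment rot" exercises, here is a hypothetical snippet of the sort students might be asked to critique (the class and method names are my own, not from any student's submission):

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Scanner;

    public class WordReader {

        /** Returns the number of words in the file. */
        // The Javadoc above has rotted: the method now returns the words themselves,
        // and the names (doIt, f, x) hide the intent.
        public static List<String> doIt(String f) throws FileNotFoundException {
            List<String> x = new ArrayList<String>();
            Scanner s = new Scanner(new File(f));
            while (s.hasNext()) {
                x.add(s.next());
            }
            s.close();
            return x;
        }
    }

A typical exercise would ask students to spot the stale comment and the vague names, then compare this against a version that says what it means.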

The last six weeks of the semester were devoted to the final project, which I described as a six-week final exam. Teams of four pitched project ideas, which I vetted before approval. They had to set objectives for themselves at the three- and six-week milestones. Once again, the mid-project milestone was a formative evaluation (that is, it did not count in the final grade). Each team had to produce a technical whitepaper in addition to the working software. I provided a rubric before the pitches indicating how the project would be evaluated; it contained the following categories: in the whitepaper, requirements analysis, acceptance testing, and architecture articulation; in the software implementation, coding conventions, distributed version control, object orientation, and logging.

The logging requirement is unique in the list. I purposefully do not design interventions around logging: neither what it is nor how to do it. Instead, I put it in the requirements as a litmus test of sorts, to see whether the student teams can collectively identify an unknown, research it, and integrate it. If someone asks me for help, I will certainly point them in a fruitful direction.
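
For the curious, most of what teams need ends up being only a few lines. A minimal sketch using java.util.logging, which ships with the JDK (the class and messages here are hypothetical, not taken from any team's project), might look like:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class GameLauncher {

        private static final Logger LOGGER = Logger.getLogger(GameLauncher.class.getName());

        public static void main(String[] args) {
            LOGGER.info("Starting up");
            try {
                loadConfiguration();
            } catch (RuntimeException e) {
                // Record the stack trace rather than swallowing it or printing to System.out.
                LOGGER.log(Level.SEVERE, "Could not load configuration", e);
            }
        }

        private static void loadConfiguration() {
            LOGGER.fine("Loading default configuration");
            // Project-specific setup would go here.
        }
    }

The point of the litmus test is not the particular framework but whether a team can research and integrate one on its own.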

A Year of Clean Code

When we designed CS222, I chose Joshua Bloch's Effective Java as a recommended text for instructors who wanted to use Java. One of the goals of the class is to teach students how to learn advanced programming concepts themselves, and Effective Java is a treasure trove. The book is full of general advice (such as preferring immutable objects) along with tips on how to implement that advice in Java (such as details on using final fields and preventing subclassing). In my three semesters teaching with this book, I used assignments to scaffold the process, starting with assignments in which I directly told students how to read and apply a section and moving toward assignments in which the student had to find, understand, and integrate new tips. I was happy with this approach, but each semester, there was a vocal, significant minority who complained that the book was impenetrable. There are many reasons that sophomore undergraduates may have had trouble with these exercises, but I didn't have the luxury of addressing the root causes: I had to deal with the fact that some students were having real trouble.
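
As an illustration of the kind of tip I mean, consider Bloch's advice to minimize mutability. A minimal sketch of an immutable value class (my own example, not one from the book or from an assignment):

    // The class is final so it cannot be subclassed, its fields are private and final,
    // and there are no mutators, so instances can be shared and reused freely.
    public final class Money {

        private final String currency;
        private final long cents;

        public Money(String currency, long cents) {
            this.currency = currency;
            this.cents = cents;
        }

        public String getCurrency() {
            return currency;
        }

        public long getCents() {
            return cents;
        }

        // "Changing" a Money produces a new instance rather than modifying this one.
        public Money plus(Money other) {
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException("Mismatched currencies");
            }
            return new Money(currency, cents + other.cents);
        }
    }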

Around this time, I read Robert C. Martin's Clean Code and began using it in my projects. I adopted Clean Code this past academic year, using it in the Fall and Spring offerings of CS222. My hope was that it would be more accessible to the students because it is easily readable; my fear was that they would not be able to apply the principles because it was separated from the specifics of a programming language. I was also inspired by the success of using Clean Code in my VBC project last Spring: for all of the complex and difficult work that team did, Clean Code still came up as one of the most important things they learned.

Different Semesters, Different Populations

I had the smaller of the two sections in the Fall, only around a dozen students. Their abilities generally met my expectations, and we had an excellent semester. There were four project teams, and each decided to create a game; among them were an infinite side-scroller, a tower defense game, and a platform game. I had told them that they could make games but that, even though I also teach game programming, I discouraged them from trying to make a game in six weeks. Once they made their pitches, though, it became a very comfortable teaching and learning experience, since I am intimately familiar with game programming and, specifically, with the kinds of problems novices have as they start making their own games. I was quite proud of the students' work, and so I allowed them to vote on whether the final presentations should be open to the public. They voted unanimously for it, and it was great to see how proud they were of their accomplishments and how impressed the audience was with their products.

Unfortunately, the story does not end there. When I evaluated their three-week milestones, I noticed several significant violations of coding conventions (Clean Code in particular), problems of object-orientation, and a failure to do appropriate logging. The final projects had more features, more polish, but still contained significant flaws. We had a good discussion about this after the final presentations, about how the teams had gotten caught up with features—as I had warned them might happen—and had neglected the actual purpose of the project, which was to incorporate the semester's lessons into an interesting and personally meaningful project.

In the Spring, I had the only section, about two dozen students. I started the semester by showing two of the games from the Fall, explaining to the new group that they would be able to make their own projects too—but that it was important that they learn how to do it right. In both semesters, I used FizzBuzz early on as an introduction to test-driven development and object orientation; I noticed that the Spring group had trouble getting even basic FizzBuzz to work, but at the time I dismissed it as a fluke of having to meet at 8AM.
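
For readers unfamiliar with the exercise, FizzBuzz asks for a program that says "Fizz" for multiples of three, "Buzz" for multiples of five, and "FizzBuzz" for multiples of both. A test-first sketch of the shape I have in mind, using JUnit 4 (the names are my own):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class FizzBuzzTest {

        @Test
        public void multiplesOfThreeAreFizz() {
            assertEquals("Fizz", FizzBuzz.say(3));
        }

        @Test
        public void multiplesOfFiveAreBuzz() {
            assertEquals("Buzz", FizzBuzz.say(5));
        }

        @Test
        public void multiplesOfFifteenAreFizzBuzz() {
            assertEquals("FizzBuzz", FizzBuzz.say(15));
        }

        @Test
        public void otherNumbersAreThemselves() {
            assertEquals("7", FizzBuzz.say(7));
        }
    }

    class FizzBuzz {
        static String say(int n) {
            if (n % 15 == 0) return "FizzBuzz";
            if (n % 3 == 0) return "Fizz";
            if (n % 5 == 0) return "Buzz";
            return Integer.toString(n);
        }
    }

Writing the tests first is the point of the exercise; the implementation itself is only a few lines.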

Turns out, it was not a fluke. The Spring class's technical abilities were far below the Fall's, a contrast between semesters unlike any I had seen before, even in teaching "off-sequence" introductory programming courses. The first several weeks of the class involve reading, reflecting, and analyzing previously written code; the students don't start writing significant programs until several weeks into the semester. As a result, it took me almost half the semester to recognize that many of the students in my class—even among those left after the drop deadline—could neither read nor write rudimentary Java programs. This means that they didn't really get much from the first several weeks either: while they could identify naming convention violations, many didn't have any mental model of what the program was actually doing.

I did a bit of detective work in an attempt to root out the problem, and I found two contributing factors. First, several of my students had taken their two introductory programming courses in the 2012 Summer sessions, and had no prior programming experience. This means that they did programming for only ten weeks of their lives, then didn't touch it for half a year, and then showed up in my advanced programming class. These students simply lacked the experience required to have any sense of the craft (although, I should note, one particular student recognized this failing and, laudably, demonstrated significant growth by undertaking out-of-class projects and befriending a community of practitioners). The second factor is that, according to the word on the street, some of those teaching the introductory sequence use only fill-in-the-blank programming assignments. I have heard this rumor intermittently, and one must always take such rumors with a grain of salt. This semester, however, a trusted student in my game studio told me how he had never written an application before CS222.

Consider the implications! A student with no prior programming experience decides she wants to take introductory programming, spends fifteen weeks struggling with concepts like iteration, sequencing, and selection, and after fifteen weeks... still has no idea how to write a program to actually do something she wants to do! I am sure that the professors are doing what they think is best, but they are wrong. Although I cannot do anything about this in CS222, I am a perennial member of our foundations curriculum committee, and so I have some agency in how this can be addressed as a curricular issue.

By the time I had put all the pieces together, it was too late. I did design some in-class exercises to help catch people up on nomenclature they consistently got wrong—in the last three weeks of the semester, for example, a student pointed to a parameter and called it a "constructor." However, at that point, the people who needed help most had stopped attending class. Indeed, attendance plummeted in the penultimate week of the semester, although I never got a good reason for this besides students' being overwhelmed by undergraduate life.

In any case, there were five project teams in the Spring, and they developed: a Bomberman clone; a 2D maze; a financial calculator for expense-sharing flatmates (originally built with Google Apps Script, then transitioned to GWT); a Reddit client for iPhone; and a point-and-click adventure game. All were able to get their applications running, although with less elegance than the Fall teams. Several of the teams incorporated appropriate logging frameworks, even if their use of these was less than ideal—but I recognize that it's hard to understand logging before you've had to debug an application that's too big to fit in your head, so they got credit for their efforts. Unfortunately, these groups fell into the same trap as the Fall's: they added a lot of features in the second iteration, but they failed to address the comments in my three-week evaluation, resulting in many significant methodology violations. Some of the teams dropped the ball on the whitepaper, which I suspect is also because they focused on features (such as running without crashing!) rather than stepping back to ensure that requirements were met.

Tardiness and Donuts

There was one other significant difference between the Fall and Spring semesters. I have always been offended by students who come to class late, and I decided to add a new policy to the Spring course: no late admittance without donuts for the whole class. I borrowed this from my game development studios, where each meeting begins with a stand-up; putting the policy in place there ensures that team members show up on time. My intention for the Spring, which I explained, was that we are a learning community, a team of learners, and that coming in late is unprofessional and disrespectful. Some students immediately pushed back against this, falling back on predictable consumerist notions that they have paid to learn and will soak up what they can no matter when they show up. Indeed, I think that the group as a whole misunderstood the donuts as punitive rather than apologetic. This was exacerbated by the fact that there were some big personalities in the class who tended to drive the discussion toward dissension rather than critical analysis.

It bears repeating that the Spring class met at 8AM. It is not a time I would choose, but because of this, there was no excuse for coming in late: no previous professor could hold a student back, no class across campus could give them an impossible commute. All one has to do to show up on time to an 8AM class is wake up early enough. In fact, I explained to the students my belief that if you are not there five minutes early, you are already late: we begin at 8AM, which means everyone should already be seated, laptops and pens ready.

I don't think they got it. Several students complained that my no-late-entry policy was preventing them from learning, instead of recognizing that the policy was a learning opportunity in itself. I still struggle with this: how do I encourage students to act professionally when the unprepared ones—the ones who really need the lessons—are the ones least ready to accept them?

What We Learned

As part of the final exam, both groups generated a list of about a hundred things that they learned. As I have described before, this is done in two phases: first, anyone can contribute anything that they learned, and it is recorded; then, everyone is given a small number of votes to select those items that are most important. Taking the top ten percent or so gives the items that are most important to the collective learning community.

The top items for Fall were:
  • Clean code
  • Teamwork
  • Single-responsibility principle
  • Class design
The top items for Spring were:
  • Don't Repeat Yourself (DRY)
  • Clean Code
  • Build one to throw away because you're going to anyway
  • Refactoring
  • Naming conventions
  • Research before acting
Keeping in mind that we followed essentially the same course structure, it's interesting to me how these two lists compare. Clean Code shows up in both, which is to be expected since it was a major theme of the class. Indeed, many elements on both lists are principles upheld in Clean Code, and so it's no surprise that students would put their votes into the bigger category to capture them all. "Teamwork" only shows up in the Fall, where the teams worked very well and fruitfully together; it does not show up in the Spring, where many teams experienced difficulty coordinating their efforts. By contrast, the Spring list shows explicit reflection on process, recognizing that the first try is almost always wrong: several Spring teams had to throw one away in order to succeed, whereas the Fall teams kept patching their existing code. The Spring group also recognized what I usually articulate as the human failure mode, "inventing rather than researching." I think this group became more aware that they had to learn in order to succeed, whereas the Fall group—with their generally higher aptitude—perhaps had already experienced this.

Perhaps, in comparing the two semesters, I can hope that CS222 was able to fulfill its role in the curriculum, in that both groups are prepared to do larger project-oriented work in upper-division courses. The course cannot magically grant months or years of additional experience to those with little, but it can provide them with the intellectual tools necessary to proceed. I will be eager to see how these students turn out in their senior capstone teams.

One of the students in the Spring mentioned something that I have never heard before during this final exam exercise. He tentatively raised his hand and asked whether he could contribute anything at all that he had learned this semester, regardless of whether it came from the course. I confirmed that he could, and he said, with some exasperation, "I need more practice!" This is a great insight, and although it did not receive many votes, I think it is one of the keenest observations a student has made in my class.

Closing Thoughts

CS222 is an enjoyable but sometimes frustrating course to teach. It is designed to build a level of competency in the students, but it suffers from highly variable inputs, as illustrated above. I am considering adding a programming assignment to the first week of class, an entrance exam of sorts that would allow me to better gauge the collective abilities of the students: if I cannot influence the prerequisite courses, at least I can get a better idea of where the students are, so I can meet them there.

I have written before about my idea to restructure the course more explicitly around essential questions. Reflecting on this past year, I think this is a good idea. In the Spring, one of the learning articulations from the final was, "A working program is not necessarily a good program." It only got three votes, but it cuts to the heart of what I want the course to teach. Perhaps by articulating this as an essential question and giving it privileged status in the course description, on the Web site, and in exercises, more students will think more deeply about it. It would be a success if students could go beyond valuing Clean Code in itself to building a deeper understanding of what it means in practice and why it exists.

Speaking of Clean Code, I am generally happy with the transition to Martin's book. However, when I made the switch, I kept my teaching approach from Effective Java: carve out bite-size pieces and put them into daily reading and analysis problems. Evidence from the six-week project shows that even very strong teams have trouble remembering these lessons and contextualizing them in their work. This may be because Clean Code tells a bigger story than Effective Java: it's not a bundle of tips so much as a way of thinking. I am considering, then, a pedagogic change: if I have the students read the whole book very early in the semester, will this give them a mental map of its contents sufficient that we can then both draw upon the holistic sense of Clean Code and reinforce the specific items I find important? For example, if we read through the book in the first four weeks, could we then spend the next three weeks focusing on critical issues such as DRY, immutability, and SRP?
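
To make "critical issues" concrete, here is the sort of before-and-after I would want students to internalize (a contrived example of my own, not from the book):

    // Before: the same formatting logic is repeated (a DRY violation), and the class
    // both formats scores and decides where they are displayed (an SRP violation).
    class ResultsScreen {
        void show(int score, int best) {
            System.out.println("Score: " + score + " points");
            System.out.println("Best: " + best + " points");
        }
    }

    // After: the formatting rule lives in exactly one place, and the screen's only
    // job is deciding where the text goes.
    class ScoreFormatter {
        String format(String label, int points) {
            return label + ": " + points + " points";
        }
    }

    class ResultsScreen2 {
        private final ScoreFormatter formatter = new ScoreFormatter();

        void show(int score, int best) {
            System.out.println(formatter.format("Score", score));
            System.out.println(formatter.format("Best", best));
        }
    }

Whether students see this as one isolated trick or as part of a whole way of thinking is exactly the question the reading schedule needs to answer.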

Regardless of the book being used, I have used essentially the same six-week project format since the first time I taught CS222. However, students struggle with the requirements analysis and with making realistic pitches. This is partially because we do it on such a tight timeline that there is almost no time for me to reject proposals. I am considering the implications of turning the six-week project into a nine-week project, where we would spend the first iteration on concepts such as user stories, risk management, and prototyping. We already cover these, but in a separate context from the students' projects. An alternative approach would be for me to specify more rigorously what must be delivered at each milestone. In my current approach, I evaluate Milestone 1 as if it were final, but the grade does not "count" for the students; it is designed to presage the evaluation I will give on Milestone 2. I suspect that one result is that students simply don't prioritize the things that sound difficult, such as code review and a whitepaper. I could mandate that these be done and use the old carrot-and-stick of grades to enforce it. I don't like leaning on extrinsic motivation in this way, but perhaps it would help scaffold students more effectively toward strong Milestone 2 submissions.

I enjoy teaching this course. I think that it suits my teaching style and my emphasis on reflective practice, even though the course's structure, format, and even its content are still works in progress. The content of this course is still not well referenced by the rest of the curriculum, even in those classes that have it as a prerequisite. The students, without prompting from their instructors, fall back into bad habits. For example, I saw a graduate student give a presentation on a project that, from a cursory glance at Eclipse, was rife with compiler warnings, object-oriented design flaws, and violations of coding conventions. I would like to see students emerge from CS222 not just empowered to program well, but passionate about it to the point that they see the value in practicing good programming in all of their work—whether it is graded that way or not.

Tuesday, May 7, 2013

Reflection on The Spring Game Studio

Introduction

In the Spring semester, I was the mentor for an immersive learning project through which a multidisciplinary team of students developed an original educational video game. This post is a reflection on that semester: it begins with a description of the studio's charter and its performance. With this background, I describe some specifics of my mentoring, focusing on the formats of the iteration and summative (end-of-semester) retrospectives. I close with some personal reflections and a few acknowledgements.

I have peppered some pictures throughout. Some are related to the surrounding text, and some are not. However, I have found that when I need to prepare a presentation about my work, it's often easiest for me to search my own blog posts, since this gives a good curated collection.

An Overview of The Spring Game Studio

The Spring Game Studio—having officially been named Purple Maize Productions—had two formal meetings during finals week. On Monday, the team delivered Children of the Sun, their educational video game based on the Middle Mississippians, to our community partners at the Indiana State Museum. This was to mark the end of the team's production focus, and on Friday, we held our last formal meeting: a semester retrospective focused on what the team learned during their fifteen weeks together.

The team was able to overcome significant human, technical, and design barriers in order to create this game. On the human side, the team was working in a shared space but without a shared time. I had to decide during recruitment whether my priority was to pick the best student composition or those who could work at the best times. I chose the former, and so my individually talented students had to work around the lack of consistent, whole-team synchronous communication. We used Google Groups and Google Drive for digital communication and, in physical space, several shared whiteboards. On the technical side, we were developing a networked iPad game without being able to have a persistent server: we ended up using Bonjour to discover local machines, with one device acting as both server and client (a sketch of this discovery pattern appears below). None of the students had network game development experience, and most had no game development experience at all—much less educational game development. Finally, regarding design, the team was working within a rare combination of constraints: an educational game, about Middle Mississippian culture, for upper-elementary school students, that they would play only once, on iPads, together, in a formal educational program about archaeology.
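
The team's code was Objective-C, but for readers who want a sense of the discovery pattern, here is a rough sketch of the same idea in Java using the JmDNS library (the service type and names are hypothetical; this is not the team's actual code):

    import java.net.InetAddress;

    import javax.jmdns.JmDNS;
    import javax.jmdns.ServiceEvent;
    import javax.jmdns.ServiceInfo;
    import javax.jmdns.ServiceListener;

    public class LobbyDiscovery {

        private static final String SERVICE_TYPE = "_cotsgame._tcp.local.";

        public static void main(String[] args) throws Exception {
            final JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());

            // The hosting device advertises a game lobby. Because it also joins that
            // lobby, one machine ends up acting as both server and client.
            ServiceInfo lobby = ServiceInfo.create(SERVICE_TYPE, "ChildrenOfTheSunLobby", 4321, "lobby");
            jmdns.registerService(lobby);

            // Every device, including the host, listens for lobbies on the local network.
            jmdns.addServiceListener(SERVICE_TYPE, new ServiceListener() {
                public void serviceAdded(ServiceEvent event) {
                    jmdns.requestServiceInfo(event.getType(), event.getName());
                }

                public void serviceResolved(ServiceEvent event) {
                    ServiceInfo resolved = event.getInfo();
                    System.out.println("Found lobby " + resolved.getName() + " on port " + resolved.getPort());
                }

                public void serviceRemoved(ServiceEvent event) {
                    System.out.println("Lobby went away: " + event.getName());
                }
            });
            // A real program would keep the process alive while discovery runs.
        }
    }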

Start screen
In the village, giving orders to 300 villagers.
The layout is based on archaeological evidence from Angel Mounds.

Planning, Reviews, and Retrospectives

In the past, I have always scheduled the three Scrum-style meetings of Planning, Review, and Retrospective to align with weekdays: Planning on a Monday, Review and Retrospective two Fridays later, for a two-week sprint. Unfortunately, the only time the Spring Studio reliably had to meet as a whole was Tuesday mornings, so modifications were in order. We held Planning meetings at the beginning of a two-week sprint, and at the end, we held short Review meetings with whoever could be available on the Monday afternoon ending the sprint. The Retrospectives were not aligned with the ends of sprints, instead taking place on Tuesdays that were not planning meetings. We did not have a retrospective in the middle of the first sprint, and another was cancelled due to a planning error on my part; hence, we had only three iteration retrospectives during the semester.

I articulated and prioritized user stories based on my understanding of the team, the project, and the community partner. The Sprint Planning meetings went in two phases: planning poker to assign story points, then the team breaking the stories down into tasks. I often skip planning poker in a three-credit-hour studio course, but since these students were earning six, we had time to include it. I'm glad we did, since the conversations that arose from this activity helped the team see what needed to be done; this was especially true later in the semester as the team developed a better vocabulary and better understanding of the game, the tools, and our processes.

Promotional art for Children of the Sun Infinite.
I have been in the habit of using a Start-Stop-Continue format for iteration retrospectives, and it has served my teams well. However, I read Kua's The Retrospective Handbook: A Guide for Agile Teams in preparation for the semester, and I decided to deploy some ideas from it. For this semester's iteration retrospectives, I wrote four categories on the board:
  • What did we do well?
  • What did we learn?
  • What should we do differently?
  • What still puzzles us?
At the beginning of each retrospective, I read the prime directive and then asked the team to write down contributions on sticky notes for each of these categories while I did the same. After ten or fifteen minutes, we came to the board and posted them all, and I encouraged the team to read each other's notes to find clusters. Then, I led the team in a left-to-right reading and discussion of the notes, looking in particular for potential action items, which were recorded on a separate board. After going through all of the items, we culled the list of potential action items to just those the team agreed were important. Individuals committed to these, and the meeting was adjourned with another reading of the prime directive.
We were rarely all in the same place at the same time, but this artist's rendering shows what it might have looked like.
One thing I missed in the early retrospectives—honestly, because I forgot—was to triage troublesome notes into a separate column. There were a few cases where we had too deep a discussion about individual notes, and this broke our stride: we should have postponed those discussions, since the rest of the exercise was going to reveal more details anyway.

There's a good story here about community learning. In the early sprint planning meetings, I served my usual role of directing both the planning poker and task articulation processes. However, it was revealed in a retrospective meeting that my articulation of a particular UX design process did not make sense to the team. (Their reasons for not seeking clarification during the planning meeting, or when the miscommunication was recognized, are not known to me.) One of the action items from that retrospective was that one team member would be responsible for ensuring that when I wrote a task, the team understood what it meant. However, the more I thought about this, the more I realized that a better approach would be to hand the articulation directly to the students. Surely, I had modeled the expected behavior well enough for them to do no worse than I did! At the next planning meeting, after planning poker, I told them it was up to them to take charge, and I sat down. To my great delight, two of them jumped up, ready to grab the sharpie and sticky notes. They did a great job, and I don't think the team had any more of these task-articulation defects.

Sprint Planning: Student-Directed Task Articulation

The End of Production

Our original production schedule called for the game to be feature-complete four weeks before the end of the semester, allowing ample time for polish, quality assurance, evaluation of efficacy, and consideration for distribution on the App Store. The team missed this goal, as well as the revised goal of being feature-complete two weeks before the end of the semester. I set a firm production deadline of 4PM the Friday before our final meeting, which the team also missed. Work continued through the weekend, though by only one or two developers who claimed it was all clean-up. This "clean-up" required additional clean-up Monday morning before the ISM meeting. The ostensibly final version that was delivered at the meeting was revealed, in the demonstration, to have a defect that prevented the game from starting. The team was duly embarrassed and, to my embarrassment, pulled out a laptop and tried to fix the game during the meeting. As another part of the meeting continued, a small group was able to put in a kludge to get the demo working, at least partially.

One of the team members agreed to drive down to the ISM on Thursday to deliver the fixed version of the code. I became concerned about the radio silence after Monday's meeting, and on Wednesday, I posted a message to the team stating, "I trust that the changes made between Saturday and Monday have been subjected to rigorous quality assurance." This was a manipulative lie, of course. One developer (or perhaps I should say, only one developer) came in and spent several hours in the studio Wednesday, where he rooted out the actual defects and fixed them—despite his being nearly overwhelmed by his senior capstone project, which was due at the same time. I have to wonder, what would have happened if he had not done this?

Final presentation to the ISM. Notice the nifty team polo shirts!

Semester Retrospective

In the past, I have used a two-phase, divergent-then-convergent approach to semester retrospectives: the team brainstorms a list of things they learned during the semester, and then votes on the top collectively-important items. Recently, however, I read "Groupthink: The Brainstorming Myth," which reminded me of the dangers of purely-divergent, criticism-free brainstorming. I decided to ask the students, several days before our final meeting, to reflect personally on what they learned. In particular, I asked them to produce an artifact such as an essay or list based on this reflection, arguing that articulating their ideas would help their understanding—a classic encouragement of writing for metacognition. In hindsight, I should have asked them to send a copy to me: one student did, of his own volition, and looking at it now, I can trace his personal learning into the collective learning.

For the retrospective meeting itself, I was strongly inspired by Alistair Cockburn's recent technical report on knowledge acquisition during product development. In this report, Cockburn identifies four areas in which teams build knowledge, quoting:
  1. How to work together, and indeed, whether these people can get this job done at all.
  2. Where their technical ideas are flawed.
  3. How much it will cost to develop.
  4. Whether they are even building the right thing.
I paraphrased these four categories on one whiteboard and gave each a stack of colored sticky notes. Comparing my shorthand to Cockburn's articulations, I realize I should have used his nomenclature: it is more verbose but more elegant, and I had the room on the board:
Four categories of knowledge acquisition.
I drew a timeline across the adjacent whiteboard, highlighting significant events from the semester, including: the three week orientation and design workshop; the six subsequent two-week sprints; our presentation at the Building Better Communities Showcase, the Butler Undergraduate Research Conference, the Ball State University Student Symposium, and Science Day at the Fairgrounds; and our meetings with the ISM. The team reviewed the timeline and agreed that I had all the major events in the right places.

After explaining the four categories, I invited the team members to translate their individual learning (from their personal reflections) into collective learning, to choose a category and write down something the team learned, and to mark it on the timeline. They seemed a bit hesitant, and when I asked if they wanted me to start, they agreed. I picked up a blue sticky note and wrote, "Dr. G cannot be here in the studio all the time," explaining that I learned that I could not be there with the team as often as I wanted to. Although I knew this, cognitively, from the start of the semester, its implications didn't really hit me until about a third of the way through production. They agreed that this was a good estimate of when the team recognized that my interventions would be necessarily periodic, and so I stuck the note there and sat down. (In hindsight, I should have written one of the implications of this, namely, that I wouldn't know what people were working on.) They still seemed hesitant, so I stood back up and grabbed an orange note, marking on it that we were making a game about the Middle Mississippians; I actually knew this before the semester started, but it was announced to the team at the end of the first week of the semester, so that's where it went.

After this, team members began coming forward and adding to the timeline. However, to my dismay, one of the first people to approach the board grabbed a pad and then sat down in a chair by the board. I held out hope that they would write quickly and get back up, but sadly, this became the pattern: people grabbed the pad they wanted, sat down to write, and then put the note on the board. In fact, after a time, people just wheeled their chairs forward and grabbed a pad. I suspect that this was a significant loss of momentum, and I regret not having moved that chair away. Several times, it was clear the student didn't really know what to write until he or she had grabbed a pad, sat down, and started to talk it through. Despite this, I am glad about what the students contributed, although I had hoped for more items.

Annotated timeline from the semester retrospective
The image above demonstrates that a majority of the articulated learning outcomes dealt with how the team should work together (blue). While there is a good spread throughout the semester, there is a cluster at the rightmost side, which is marked in the timeline as "today." These are items that came out of the personal reflections and collective retrospective, and they include the idea that the studio space is important, that technical debt needs to be paid off regularly, that we had a particular focus on "making" (in contrast to, say, "understanding") in part because that is what the team was explicitly recruited to do, and that it's easy to fall into the human failure mode of inventing rather than researching.

Here are a few of my personal highlights from the timeline. Clustered at the middle and end of the penultimate sprint are "Test Early and Often," "Playtests Reveal Defects," "SRP," and "The importance of Clean Code Principles." This is really where the team came together and recognized the value of what I had been preaching from the beginning: it is important to have an iterative design process that incorporates real end users, and a solid technical foundation is necessary for being able to adapt to the changes such a process reveals. Unfortunately, by this point, the team was locked into their game design and software design, so there were not many significant design changes even where they would have been beneficial. However, it's important to remember that ours are academic risks, and the fact that the team learned these ideas before the end of the semester is a great win for studio-based immersive learning.

The team had a booth at Science Day at the Fairgrounds
I was disappointed that the team misconstrued the cutting of features as a team failure. Early in the semester, as I described in an earlier post, the team defined an ambitious set of features for the game. Of course, many of these were not included in the final game; more specifically, the features that were included were those with the highest priority based on our shared understanding of technical feasibility, the potential players, and the community partners. During the retrospective, however, the team described this outcome as if cutting these features were inherently bad. It is good that the students recognized that some of these features revealed important cultural aspects of the Middle Mississippians, and that cutting them meant a lost opportunity; however, I think they failed to understand that this is normal in game development. All those features we described at the beginning of the semester were risks—unknowns. We never knew whether they would actually teach what we wanted them to teach. By having a smaller, tighter game, I would argue that we have a better game. For the retrospective, though, I was more interested in seeing what the students had learned than in using the moment to teach more about game development processes, so I left it alone.

Four team members at the 2013 Ball State University Student Symposium
I particularly wanted to push the students to consider the nature of commitment in a team where sprints were not completed and there was very little accountability for following the methodology. (As examples, the team "committed" to holding code reviews, but none were held; the team "committed" to Clean Code, but a majority of the code violates those standards.) When I brought this up, I was surprised at the dissension. Some of them clearly understood my question: the team had made commitments to each other and to me, had not fulfilled them, and this pattern became the status quo—so what was learned? Others, however, argued that there was no problem, because the team was composed of amateurs, did not know how to estimate, and set their own goals: that is, in my nomenclature, they argued that these were not commitments at all despite being called that. I have as much trouble accepting this now as I did when it came up, in part because when I told them they were committing to user stories, I meant "committing" in the strict sense, not in a relativist sense. It became clear that this was something of a religious issue for some team members, and the best I could do to end the debate was to suggest that the team learned, that day, that they still didn't agree on what the word "commitment" means. They accepted this, but when I reviewed the board later, I was upset at how one of the students had articulated the outcome: the card reads, "Commitment is a fuzzy word." I suggest that it is not, and rather that the students have a fuzzy understanding of commitment, since they have so little experience with it. This raises a puzzling question for mentoring future projects: how will I identify and deal with future team members who don't share my definition?

Post-Retrospective

After this group exercise, I asked the students to write personal essays in response to a prompt: if they were put in charge of a similarly skilled team charged with making an educational game for the ISM, how would they do it? I was surprised that most of the essays assumed that the team would be undergraduates working for credit during a semester, since this was neither in the prompt nor my intention. The essays revealed a consensus that iterative development would be beneficial, and that getting digital prototypes into the hands of players and sponsors earlier is better.

Testing on the iPads
Several pushed for a longer production schedule, at least two semesters, so there would be more time to incorporate feedback from playtesting and the community partner, as well as to recover from design mistakes. It is notable that several of the students had been involved in two-semester Computer Science senior capstone projects, so they were not just guessing that more time would help—they had seen it for themselves. The Spring Studio had been designed as a six-credit experience, in part so that the students would have time to revise their designs. However, after these fifteen weeks together, I think the students are right: the issue may have less to do with hours per day and more to do with calendar time. It reminds me of my frustrations with teaching five-week summer courses in programming: one may have the same number of contact hours as in a fifteen-week semester, but there just aren't enough calendar days to develop a sense of the craft.

One of the students pointed out that the team had agreed to use pair programming, at my encouragement, even though they had little experience with it and some didn't believe in it. Despite this consensus, most of them did not actually practice pair programming for the first several weeks. After significant frustration with the project, some began pair programming—and lo and behold, they saw that they were more productive in pairs than individually! He asked me, then, how a leader can get the team to see the benefits of pair programming earlier, or whether a team must fail first. I laughed and pointed out that I have been dealing with that exact problem for several years! It was good to see that several of the students' responses mentioned pair programming as a particular practice they would adopt in a follow-up project, which suggests to me that they collectively felt its value.

The poster shows the evolution of a UX design from sticky notes to digital prototype.
The laptop shows the final version.
The team members were novices in almost all areas of serious game design and development, and they struggled considerably with two related concepts: prototyping and iterative development. Though they recognized that physical prototyping was faster and cheaper than digital development, the team as a whole had trouble seeing the connection between physical prototypes and digital prototypes. When they finally did produce a digital prototype, there was some confusion about how to evaluate it effectively and how to integrate playtesting results into development. This was undoubtedly related to the team's troubles with iterative development, which has been challenging for every student team I have mentored. Student experience generally comes from trivial individual projects or very small-scale team projects—projects with no clients besides the students and a professor. This particular team did not successfully complete any sprint: there were no sprints in which all the user stories were completed and a potentially shippable product was created. Notably, the team rejected sprint renegotiation, preferring an optimism that resulted in sprint failure. This is unique among my student teams, as far as I recall. However, I also stopped recommending renegotiation after they rejected it in the early sprints, and I wonder if I should have come back to it again.

Testing an early paper prototype

By university regulation, I must assign grades to the students, and so I try to do so in the fairest and most unobtrusive way that I can. Each student had a midsemester progress report meeting with me, during which I tried to give useful, private, individualized feedback. At the end of the project, I gave the whole team a grade based on their collective achievement, although individual grades varied based on peer commendations. Such grading is rather stressful for me, and it is also divorced from the day-to-day activity that I want to encourage. Yet, I tend to be conservative in my interventions as Grade-Giver, preferring to offer a mentor's advice that the team can choose to follow or not. Indeed, the team chose not to follow my advice to either complete all the user stories or renegotiate, and this did not doom the project: the game is still acceptable to the client, despite their collective dismissal of my suggestions. I talked informally with one of the team members who happened to also be in my Fall game design colloquium—my first experiment in badge-based grading—about whether a badge-based approach would work here. What if there were a badge such as "Completed all stories in a sprint" or even "Renegotiated a failing sprint"? He agreed that it would have to be very carefully designed. The best outcome would be that students see more clearly what kinds of behaviors I value and promote, in the way a performance bonus might in industry; the worst outcome would be that students lose autotelic focus on the project and start to game the system for points instead of project success.

Looking Forward

I am proud of the work of Purple Maize Productions. They overcame significant obstacles to produce an original educational game, using tools, technologies, and processes that were new to all of them. They emerged from the semester with powerful, significant reflections—evidence that the learning outcomes of the studio were met. This was also the first time that one of my student teams attempted a multiplayer game. Should future teams do the same, I have some great stories to share with them, and I will know to push the team much earlier and more forcefully toward an end-to-end prototype. Indeed, given the complexities of testing and the tight fifteen-week time limit, I may simply advise students against any networked multiplayer, sticking with hot-seat or single-player games.

To experiment with potential villager animations, the team projected the village view onto the board and traced it. Brilliant!
I intend to reuse both of the retrospective formats from the Spring Studio experience. I like how the iteration retrospectives are oriented toward concrete action items and will likely keep the format as-is. I am considering two changes to the semester retrospective format. First, I would have the students submit their personal reflections to me before the meeting. This would serve two purposes: it would demonstrate their importance to the students, since I asked to see them, and I would have more data with which to triangulate actual team learning. The other change is that, after modeling how to write a team learning statement, I would give the students a few minutes to write out their own before posting them. This should avoid the problem of individuals coming to the board without really knowing what to say, since it would already be written.

There are still some puzzles regarding Purple Maize Productions. For example, why was the team hesitant to seek my advice in my areas of expertise, or the advice of Ronald Morris, my collaborator on this project? Why was the team not fulfilling its commitments to each other regarding retrospective action items? Why did they continue to underestimate tasks sprint after sprint? Fortunately, this studio was also used to conduct research on immersive learning, and one of my colleagues from the English department collected field notes and two interviews from the students. I look forward to digging into these data with him and seeing what stories they tell.

In reviewing the semester, including the students' final essays, I find myself wishing I could have been more proactive in helping team formation earlier in the semester. With several past teams, I have hosted game nights and pizza parties, which help people get to know each other. At the start of the Spring—the critical time for breaking the ice—I had a newborn who was not sleeping through the night, an 8AM class that had me ready for bed when some of my students were just hitting stride, and a case of the flu that had me in my bed for almost a week. I hope that I never again have to deal with these factors along with a team that does not have adequate face-to-face meeting times!

I don't know what this is, but I love coming into the studio and finding things like this.
The ideal relationship between me and teams like this one merits further reflection. How do I balance the roles of professor, mentor, coach, designer, developer, and researcher? This semester, I opted to emphasize mentor, in part because I knew I could not work with all the students due to scheduling conflicts. Children of the Sun is a fine game, but it's very much the students' design. I can't help but look at the fascinating constraints my student team was given and be a little jealous: I wish I had fifteen weeks to work on such a project! Would it be better if I ran the team more like a research group, using student labor to realize my own designs? It would be different, and I would argue that it would no longer be "immersive learning," which enshrines student-direction and faculty-mentoring in its definition. This gets quickly into issues of community partner versus client as well as teaching versus consulting. There is a significant difference between what I can do and what I can mentor students to do, but I worry sometimes that the immersive learning model is presented as if these are the same. Immersive learning is not about providing free or cheap consulting services: it is about welcoming external partners to become co-teachers and co-learners with us.

Writing this post helped me recognize a critical difference between this project and Morgan's Raid, another project for which I had separate design and production courses. In Morgan's Raid, the initial prototyping was done in the Spring, and I was able to refine those prototypes over the Summer. This also led to my selection of an appropriate software architecture, my choice of a stable and appropriate toolchain, and my design of interventions to help students learn these. I was able to start Fall's production course by saying, "Here's what we're building, and here's how we will do it!" By contrast, the Spring Studio inherited an unrefined (indeed, unplayable) prototype from Fall's design colloquium, and they had to create the game design, choose the toolchain, and design the software architecture. I provided recommendations, of course, but I delegated the decision-making authority and process to them. I do not doubt that the Spring Studio learned much more about these items than the Morgan's Raid team did, but I am unconvinced that it was worth the time and attention. Given that it is logistically impossible for me to recruit a two-semester team at this point, I will take a more hands-on approach to the transition between the design course and the production studio next time.

Speaking of which, I am pleased to report that I have received internal funding through the Provost's Immersive Learning Initiative to teach another series on serious game design and development next academic year. I will be teaching a game design colloquium in the Honors College in the Fall and leading another six-credit multidisciplinary development studio in Spring. We will be working with my friends at the Children's Museum of Indianapolis to develop an educational video game. This upcoming collaboration is one of the reasons it was worth more than a day's effort to write this post: I want to ensure that I emerge from this past academic year with a good understanding of what went well and what went poorly so that next year, we can be even more successful.

Almost everyone was there for the team photo. This is a photo of that photo.

Acknowledgements

I would be remiss if I did not acknowledge some of the people, offices, and agencies who made this project possible. Thanks go out to: Ronald Morris, who was co-mentor of the project; Ball State University's Computer Science Department, History Department, Honors College, Building Better Communities office, Provost's Immersive Learning Initiative, and Digital Corps; Motivate Our Minds; College Mentors; the Indiana State Museum; and Angel Mounds.

Monday, May 6, 2013

Classical Mechanics and Hot Wheels

My family went to the Children's Museum of Indianapolis last week to see the Super Heroes exhibit before it closed May 5. While there, I was also able to see the Gecko exhibit and of course, we stopped by the DinoSphere. We ended the day at the Hot Wheels exhibit, where there were numerous cars and ramps available. It reminded me of many happy days in my grandmother's basement, setting up orange plastic tracks and running cars down ramps, through loops, and into little green army men.

This particular portion had four ramps in a row, each with slightly different features, the closest one with a fantastic gap to be cleared (and an optional plastic shark to jump over or ram into).


As I was playing with my boys, we noticed that some cars made the jump consistently and others did not. Why would that be, I asked myself. Because some are heavier and will fall faster, I answered, and then very nearly walked on to the next exhibit.

Turns out, there's one little problem with this...


Once in the air, the cars all fall at the same rate, no matter their weight. It's the faster cars, not the heavier ones, that will make the jump. My brain had defaulted to the intuitive but wrong explanation of the physics. I had to wonder: how many kids came to the wrong conclusion, and how many parents explained it incorrectly to their kids, as I almost did? What was it that made me stop and recognize that I had fallen into the same misconception of gravity that I had as a child in my grandmother's basement?
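
A quick back-of-the-envelope check (my own sketch, treating the car as a point mass that leaves the ramp horizontally with speed v from height h, ignoring air resistance):

    \[
    h = \tfrac{1}{2} g t^2 \;\Rightarrow\; t = \sqrt{2h/g},
    \qquad
    d = v\,t = v\sqrt{2h/g}.
    \]

The mass never appears: every car is in the air for the same time t, so the distance d it covers before landing is determined entirely by its speed v at the lip of the ramp.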

It strikes me that this context—Hot Wheels cars racing down ramps—would be a great opportunity for a physics education intervention. Imagine, for example, a station where one could drop a heavy car and a light car at the same time and see which hits the ground first, and then tandem ramps where the same two cars could be raced.

I don't know if that would be an effective museum-based intervention for teaching physics or not, but it made me reflect on the misconceptions that people harbor toward computing. Whereas physics has almost thirty years of research on the Force Concept Inventory, we have nothing so well established in computer science. If we don't know what misconceptions people carry, it's hard to imagine designing interventions to overcome them!