Tuesday, April 19, 2022

Playing the Mob Programming RPG in CS222

I did something new in Monday's meetings of CS222 Advanced Programming: I had my students play Willem Larsen's Mob Programming RPG. A colleague in the games industry turned me on to the idea of mob programming by way of the Mob Mentality Show. More specifically, the RPG came up in this interview with Joe Justice about mobbing at Tesla. It was the first episode of the Mob Mentality Show that I listened to, and it was interesting to jump in with a discussion of how mobbing, which is usually associated with software development, was used for hardware development around electric cars.

Willem Larsen was inspired to make a free PbtA-style RPG to teach Mob Programming. Now that scratches a lot of my itches! I encountered a few concerns while reading the rules, including the fact that the introductory script is very long: the number of ideas presented orally to the team far exceeds what they can remember, especially when you consider that they are also handed three different role descriptions to process. There are certain parts of the instructions that appear incomplete or ambiguous, but rather than start by copy-editing and opening a pull request, I figured I should present the game as-is. Shuhari and all that.

I printed off all the playbooks, reviewed the rules, grabbed the handful of other required supplies, and set off to CS222 Section 1. I figured it was not feasible to plan on everyone sitting in a line, which is recommended in the rules, since I never know how many students are going to show up. I ended up using mobti.me, the first web-based mob timer I found when searching online. The rules recommend that the people on the left and right of the Driver and Navigator help those two manage their XP acquisition, and so to adapt this to haphazard seating, I added the roles "Driver-Helper" and "Navigator-Helper" to the rotated roles.

We used FizzBuzz as our example. Before both classes and in accordance with the game recommendations, I set up separate starter projects. In both cases, I gave a failing unit test as follows:

@Test
public void testFizzBuzz() {
  FizzBuzz fizzBuzz = new FizzBuzz();
  String actual = fizzBuzz.valueOf(1);
  Assertions.assertEquals("1", actual);
}

Of course, without any implementation of FizzBuzz, the test does not compile and hence is "Red." The mobs then started with a failing unit test.
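For readers who have not seen the step from Red to Green spelled out, the following is an illustrative sketch of the smallest class that would let the starter test compile and pass. It is not code from either section; it assumes only what the test implies, namely that valueOf takes an int and returns a String.

```java
// Illustrative sketch only: just enough FizzBuzz to turn the starter test green.
class FizzBuzz {
    // Returns the text for n. Only the plain-number case is handled so far;
    // the "Fizz" and "Buzz" cases would each be driven by a new failing test.
    String valueOf(int n) {
        return Integer.toString(n);
    }
}
```

The point of stopping here is that the next behavior should arrive only after a new failing test asks for it.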

Section 1 proved to be a trial run. I presented the rules as written, answered all the questions, and let them go. Unfortunately, three problems quickly showed up. One was that I had entered the roles in an unhelpful order, which meant that players did not move as intended from Driver to Navigator. After seeing this go poorly, I understood why this sequence is preferable. Another problem was that I did not have the sound configured properly for the online timer. I assumed the problem was that the room's speakers were off (they were) or that my laptop volume was too low (it was). After fixing both, though, we still didn't get sound, so I ended up running a parallel timer on my phone. The fact is I had forgotten that the audio buzzer is a configurable option in mobti.me, not a default. Finally, I also noticed that people started putting their badges up, but no one had called out the fact that they had earned XP. Yes, I had announced that when someone earns XP, they should say why; no, the students had no memory of this. It actually took me a while to recognize this error mode, and that put me in a sticky situation: should I interrupt and re-explain the rules? And if I interrupted, should I invalidate or keep the existing badges? I ended up taking the laissez-faire approach, although in retrospect, I wish I hadn't.

By the end of one rotation, they had about six or eight badges on the wall, but none of them were authentically earned since no one called out their XP gains; crucially, this means that no student knew what valuable thing any other student had just done. Also, throughout the exercise, very few people in the mob participated. The Navigators never really did anything except, occasionally, ask for ideas. They mostly watched the Driver type in reaction to one or two voices in the mob as they talked through a proposed solution at the code level. When these people became Drivers, they just implemented their visions without any Navigator intervention, and as Navigators, they never interacted with the mob.

After the rotation, we had about fifteen minutes left in the class period, so we switched into discussion mode. I explained how I had come across the game, and that I had never played it, nor had I any personal experience in mob programming. I also admitted up front that I noticed them missing that important rule about claiming XP out loud, and I had held back on interrupting their flow. They seemed happy with my candor, and one student even thanked me afterward for running the game. When I asked for their comments, a student pointed out that it didn't feel right, that they didn't think there was a single point of failure, but that at the same time, the ideas weren't going smoothly from the mob to the driver. I pointed out that, if this was what was going wrong, doesn't that mean the Navigator was the single point of failure consistently? After a moment's hesitation, I think the class agreed, although the unwillingness to hold anyone accountable was palpable.

This led to a good but brief discussion of the role of the Driver and the Navigator. I pointed out how what they learned in CS120 was, very likely, an antipattern: the Driver works and the Navigator watches. (Maybe we should call this the "Napigator Antipattern".) I pointed out how different the form is when the Navigator has to speak their ideas out loud in such a way that the Driver has to listen and translate them into code. Someone suggested that changing Navigators means that there is not a unified direction for the mob; I made the counterpoint that only by changing the Navigator can we be sure that everyone understands the direction of the mob. Again, I got the sense that they understood it was more complex than it seemed at first blush, but it's hard to say if these particular ideas stuck with the students or not. (In writing this blog post, I did a little research, and it seems some people call it The Strong Technique of the Driver/Navigator pattern when the Driver can only do what the Navigator suggests. This gives me some nomenclature and resources for talking about how I expect my teams, and the teams in prerequisite courses, to work.)

I was more confident starting the game with Section 2. Eight students showed up, the same number as in Section 1, although more of them were late for this section. (Tardiness always irritates me, but cases like this are the most egregious, when someone comes in and, while I'm speaking, loudly asks a friend, "What are we doing?") This time, when explaining the game, I made it clear and explicit, repeating the point that you can only mark the XP if you tell the mob that you earned it. During the pre-game Q&A, students twice asked questions about how they know if they have earned the XP, for cases such as "Listening on the edge of your seat." The answer was easy, and I repeated it verbatim from the instructions: it's up to you to decide.

This section launched right into action, and they clearly were enjoying themselves. Occasionally, someone would say with a smile, "I am listening on the edge of my seat!" There was much more clarity about the roles, as the Drivers acknowledged getting XP for ignoring directions from the mob that didn't come through the Navigator, typing things they disagreed with, and asking clarifying questions. Indeed, I think it was in large part the diligent driving that held the Navigators to high standards. Of course, having the right sequence in place helped too, with the Drivers moving into the Navigator roles before returning to the Mob. For example, a Driver who thought they were typing toward a dead end could, as Navigator, direct the next Driver how to fix it.

There was a lot more excitement and energy in the second section, with people moving around, taping up badges, jotting ideas on the whiteboards, and laughing as they called out their XP earnings. One emergent problem, though, was that as roles changed, the new people would wait for things to settle down before starting the next round. That is, a round would end, conversation would stop, and folks would be moving around, cutting out badges, looking at higher level roles, and so on. Even though I encouraged them to just get started, stating that the mob was simply mobbing, the students consistently wanted to wait until everyone was settled and facing front before starting the timer. I wonder if this was because they wanted it to look like a class discussion rather than, well, a mob. This had the effect of significantly interrupting the flow and killing the energy. It also had the practical effect of using up all our time: so much time was wasted during role rotation that we only had about two minutes for discussion at the end. Fortunately, the students stayed about two minutes after to wrap up the discussion. They also were very helpful in cleaning up the room, for which I am grateful, since we had a colloquium getting set up in the same room for the next hour.

Still, Section 2 did quite well, finishing with a final score of 31.


During our brief discussion, the students commented that one of the biggest problems with the game is that they spent most of their time looking at their role sheets to figure out how to get points rather than solving the problem before them. This is a serious game design problem, by which I mean it is a serious problem and it is a problem with a serious game. The mechanics of the game here are working against, rather than with, the learning objectives. Indeed, it tempts me to mold what Larsen has so generously provided from his intuition into something that is more rooted in theories of teaching and learning. 

Another student made a good observation about a hesitancy to interrupt someone else's flow. They said that in their final project team, they often felt like others knew what they wanted to do, and so the inclination was to not interrupt even if the direction was not understood. "I don't want to interrupt, and I'll figure it out later," is how the reasoning goes. Spoiler alert: later never came. This allowed us, as the previous section had done, to discuss the Navigator and the fascinating wisdom of rotating that role through the whole team. If everyone on the team can take a turn explaining where the team is going, then everyone understands it, and that means everyone is rowing in the same direction.

This leads me to the most startling observation of the exercise: my students did not come near to solving the problem. I've heard reports and studies claiming that FizzBuzz is still an effective filter since most people who apply for a programming job cannot solve it. Surely, I thought in my naïveté, my students would be able to deploy TDD to come up with at least most of a solution. It was not the case.

Again, we must begin with a disclaimer that they were working under the distractions of the game. They were told, specifically, to try to get as many points as possible. In theory, the points should be earned while moving toward a solution to FizzBuzz, but especially in Section 2, many points were earned by doing the corresponding actions in a vacuous or even antiproductive manner. It's possible that their working memories were completely overloaded in thinking about the game rather than the problem. 

That said, I don't think this explains away all the different error modes that I observed, which suggested more fundamental problems. Both sections showed an alarming lack of rigor around TDD, a practice that is supposed to be a focal point of our work this semester. They are all supposed to be doing Beck-style Test-Driven Development throughout their nine-week projects. Judging from their submitted artifacts, many appear to be doing so. During the game, though, nobody wrote out a plan of test cases that would lead to success: they all started in with coding without any real plan. Indeed, despite the instructions specifically saying that Mob Programming should feel like a bulldozer rather than a race car, the game fought against this: the only way to get points fast is to work fast, so my students threw discipline out the window. I only noted one instance in which a student called out for a failing test to be written before production code. Otherwise, it was a kind of erratic hopping between writing tests and writing production code, with refactoring never really touched. Indeed, in one of the sections, a student said, out loud, in front of me, "Let's just skip refactoring and do it later." Spoiler alert: later never came. However, the horror of saying such a thing in front of the professor who has been proclaiming the importance of refactoring for 13 weeks also points to something subtle and important: I believe I was witnessing authentic student practice. That is, even though I was sitting in the back row, saying nothing, I got the sense that they were not performing any show for my benefit. I was seeing how they really work.

Common definitions of the Driver/Navigator pattern specify that the Navigator should be describing an approach, and the driver turns that into code. In the RPG, it is presented more bluntly: "... [T]he Driver’s job is to type what the Navigator instructs them to type. The Navigator’s job is to sift the ideas of the mob and instruct the Driver what to type." My students took this very literally: the Navigators told the Drivers what to type, keystroke by keystroke. Indeed, one student typed in exactly what the Navigator said, even though it was clearly a syntax error, and then claimed an XP for typing something they disagreed with. The clear problem with this literal interpretation is that it keeps all of the discourse at the level of keystrokes rather than ideas. No wonder they could not solve the problem if they were trying to do so at the level of parentheses! In other cases, however, it wasn't clear if students were being overly literal or if they really just couldn't translate an implementation idea to code. For example, one student said, "We need an 'if' statement: if 'i' is 3." The Driver then keyed in "if (i is 3)". It took over ten seconds for the Navigator to explain that what they meant was "equals equals." Notice here that they did not mean "equals equals" at all. They meant "i is 3," but that turns into "i==3" in languages like Java. This was a breakdown caused by the inability of the students to have discourse about the code at a reasonable level. Not knowing the names for punctuation symbols used throughout programming languages didn't help them either. It seems like a case where the masters have been lecturing them about the Parthenon, but they should have been lecturing about the optative.
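To make the gap between idea and keystrokes concrete, here is a small sketch of how the spoken idea "if i is 3" might land in Java. The class and method names are hypothetical, and returning "Fizz" inside the branch is only an assumption about where that Navigator was headed.

```java
// Hypothetical illustration of translating "if i is 3" into Java.
class IsThreeExample {
    // The Navigator states the idea ("i is 3"); the Driver chooses the syntax.
    static String describe(int i) {
        if (i == 3) { // "==" compares two ints; a single "=" would be an assignment
            return "Fizz";
        }
        return Integer.toString(i);
    }
}
```

The Navigator never has to say "equals equals" if both people share the vocabulary: "compare i to 3" or "when i equals 3" carries the intent, and the punctuation is the Driver's job.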

I had hoped that my initial failing unit test would push the students to see a clear model-view separation in the problem, but that didn't happen either. I thought it would be obvious from our various class discussions that there is a model that computes the answer and a view that prints it out. The first group, after getting the initial test to pass, agreed to change the test case itself so that FizzBuzz's valueOf method should return an integer. I am still uncertain how they thought this would help. Truly, I cannot get my mind into a state where I can see this as useful, neither the specific change nor the idea that the test I would give them is somehow flawed at the level of static typing. Their implementation of valueOf just ended up being to return the passed-in value. They then added a second method, called isDivisible, that took an int and returned the String that should be printed for it. If you have any experience programming, this should boggle your mind, yet my students happily moved forward with this. Suffice it to say that the rest of their code made no real progress toward a solution, not in any deliberate, structured, or sensible way. If they were ever to get to a solution, it would have been essentially by chance.
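For contrast, here is a rough sketch of the kind of model-view separation the starter test was meant to suggest. It is not what either section wrote. The FizzBuzz rules used are the conventional ones for the kata (multiples of 3, of 5, and of both), and FizzBuzzPrinter, render, and print are hypothetical names introduced only for illustration.

```java
// Model: computes the answer for one number and knows nothing about output.
class FizzBuzz {
    String valueOf(int n) {
        if (n % 15 == 0) return "FizzBuzz"; // divisible by both 3 and 5
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return Integer.toString(n);
    }
}

// View (hypothetical): formats and prints what the model computes.
class FizzBuzzPrinter {
    // Builds one line per number in the inclusive range [from, to].
    static String render(FizzBuzz model, int from, int to) {
        StringBuilder out = new StringBuilder();
        for (int n = from; n <= to; n++) {
            out.append(model.valueOf(n)).append('\n');
        }
        return out.toString();
    }

    // Writes the rendered lines to standard output.
    static void print(FizzBuzz model, int from, int to) {
        System.out.print(render(model, from, to));
    }
}
```

With this split, valueOf keeps returning a String, so the original test stands as written, and every rule can be checked with a unit test that never touches the console.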

In the second section, they started more sensibly, getting the first case to pass in the obvious way and then adding a test for the third word being "Fizz." In doing so, they (seemingly unknowingly) removed the code that solved the first test. However, when they went to run the unit tests, they only ran the new one, the case for three, which passed. I saw a clear regression defect, but they did not, as they immediately went on to writing the code for the "Buzz" case—without having run all the tests, without refactoring, and without writing a failing test first. At some point, several minutes later, someone in the mob saw that only one test was being run, and they instructed the Navigator to ensure all the tests were run. This, then, caught the regression. I suspect that the student who noted the oversight would have noticed it earlier had they not been face-down in role sheets. Of course, in a semester where we learned "Always run all the tests" on the second day of class, I hoped that this knowledge would be more widely distributed, but this seems not to be the case.
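The mechanics of that regression can be sketched in a few lines. What follows is an illustration under assumptions, not the students' code: the regressed method is a made-up implementation that satisfies the new "Fizz" test while losing the "1" case, and runAll is a hand-rolled stand-in for telling the IDE to run every test rather than just the newest one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical demonstration of why running only the newest test hides a regression.
class RegressionDemo {
    // Made-up regressed implementation: passes the "Fizz" case, breaks the "1" case.
    static String regressed(int n) {
        return "Fizz";
    }

    // Runs every known case against the given implementation and
    // returns a description of each failure (empty list means all green).
    static List<String> runAll(IntFunction<String> valueOf) {
        int[] inputs = {1, 3};
        String[] expected = {"1", "Fizz"};
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < inputs.length; i++) {
            String actual = valueOf.apply(inputs[i]);
            if (!expected[i].equals(actual)) {
                failures.add("valueOf(" + inputs[i] + ") expected "
                        + expected[i] + " but was " + actual);
            }
        }
        return failures;
    }
}
```

Calling RegressionDemo.runAll(RegressionDemo::regressed) reports the broken first case immediately, whereas checking only the input 3 reports nothing wrong.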

Perhaps the most important thing I gained from the Mob Programming RPG was the insight into how my students think and work. Techniques that I've shown them and required on their projects for months are clearly not part of their normal operating mode. That suggests that there really are a few anchor people who are ensuring the right things get submitted on a team, but that this is not being done in a way that disseminates the ideas. During the game, I saw how some strong voices could derail the whole team, moving in directions that were not just contrary to our principles, but not even syntactically valid. It makes me want to pull Mob Programming earlier into the semester, although I recognize a logistical problem with that: we're at a point now where about half the class does not show up and a few students have withdrawn. What started as 20-some students in the room has turned into about eight who show up regularly. We could not possibly finish a whole mob cycle with 20 in one class session, which means I would have to split it over multiple days and then deal with absences from the rotation. Mobti.me would not support this, but of course, I'm already imagining whipping up a more robust replacement in Flutter. I need to think about how to deal with this since the benefits of the exercise certainly seem worth the time, even if just for the discussion and my observation of real student practice.

Making my own Mob Programming RPG sounds like a great scholarly endeavor. I think using PbtA playbooks is a clever idea, but the mental load on my students was clearly too much. I am sure that professionals would approach the game differently than undergraduates. Anyone with a few years under their belts would be more adept at programming, communication, and coordination. Crucially, they would also be more familiar with holding each other accountable. My students are trained, well before they get to me, that they should never criticize their peers for anything they do or say. That is completely self-destructive for collaborative software development, of course, but my pointing that out does not change a cultural phenomenon. Students are also swimming in a sea of points, and so if they see that a particular action earns points, they will pursue it regardless of prudence. A good example is in Larsen's Rear Admiral role, in which a player can earn XP for speaking quietly in the Navigator's ear. Twice in the second section, students walked up to the Navigator, whispered something, and walked away, claiming XP. Yet, although what was whispered was actually good advice at the time, they did not regard whether or not the Navigator was in a position to hear or understand, and what the Rear Admiral had to say ought to have just been said aloud as part of the mob. My own hypothetical version of a Mob Programming RPG could capture not just particular learning outcomes of my curriculum, but it could also account for the spaces, headcounts, and population that I deal with regularly.

Looking to the immediate future, I would like both sections of the class to have the opportunity to deploy mobbing to actually solve the FizzBuzz problem. I plan to use another class period this week to repeat the exercise but without the game. That is, we can just do "regular mobbing", using the mob timer with a Driver and Navigator, and see if we can make progress. We will have to start again from scratch since the code from the end of each session is not worth saving. I am concerned that Section 1 may not have seen enough of the practical execution of mobbing, but I think I will try to keep the two exercises in sync rather than allow the two sections to diverge. I will need to start by pointing out the difference between the Navigator telling the Driver what to type vs. describing an approach that the Driver turns into code. Also, I will tighten up role transitions by ensuring that everyone knows the sequence of roles, either by finding a wireless keyboard to pass down a literal line or by copying the name sequence to the board so it's not hidden in a browser window. I am tempted to remind them about Red-Green-Refactor and model-view separation; my hope that they simply remember these may be in vain.

I know that we are all creatures of habit, and I do not expect that any team will make radical changes to their practices in the last two weeks of a nine-week project. However, I hope that these exercises will get them thinking critically about their team experience, give them something concrete to compare it to, and give them a vision for what might be possible. The role of this course in the curriculum is to prepare students for team- and project-oriented upper-division courses, and so I am eager to see how I can leverage this exercise myself as I start thinking about Fall.

This story continues in the next post.
