Tuesday, December 10, 2019

Reflecting on the Fall 2019 CS315 Game Programming course

My students are currently taking their final exam, so this seems like a good time to start my end-of-semester blog post about CS315 Game Programming. This was my third Fall semester teaching this upper-level elective course using Unreal Engine 4. The semester ended up consisting of four "mini-projects", each about two weeks, and one six-week final project, which was completed in two iterations. By and large, I am happy with how the semester went: students learned how to work with some contemporary tools, including Perforce Helix for centralized version control, and they made interesting final projects. What I want to document here are some of the struggles, because it will do me more good when planning next year's class than focusing on the successes.

It turns out that almost all of my frustrations from the semester stem from my decision to use Specifications Grading again. For people who are not familiar, you can easily hop over to the course plan's projects page to see what the specifications look like. Briefly, I laid out ahead of time all the criteria against which student work would be evaluated, and like last year, I asked students to submit self-evaluations in which they graded their own work.

This leads quickly into the first problem: students did not seem to understand how to use checklists. It feels so strange to even type that, but it's true. As part of their submission, the students had to complete a checklist, and then, based on which criteria were satisfied, they could know—and had to say—what their grade would be. However, more often than not, I would read through the student's submission and have to point out that they didn't actually satisfy some of the criteria. I designed in a little leniency for students who legitimately did not understand a criterion or two, but what I didn't expect was that several students made the same mistakes again and again and again. I forced the students to rotate partners during the Mini-Projects, thinking that this would ensure that mistakes would be caught by the partner; instead, what I saw was that the misunderstanding (not the understanding!) spread to new partners.

I suspect that a major reason for the checklist problem is that students are so deeply brainwashed into the "turn this in and hope for points" model that they cannot conceive of an alternative. Certainly, in my years of teaching, I've had plenty of push-back on unconventional things I do. (I continue to do unconventional things, partially because I want students to learn to question conventions.) I can work on clarifying the language around the specifications themselves of course, but I feel like this is treating a symptom rather than a cause.

There is one place where my instantiation of specifications grading contrasts, as I recall, with the presentation in Nilson's well-known work. She describes making a choice between more hurdles and higher hurdles, but my version of specifications grading has both more hurdles and higher hurdles: students have to do more and better work to earn higher grades. This is sensible to me, but I wanted to mention it here because it is a lever that I could pull in an experimental assignment or section.

Another problem I encountered with specifications grading this semester was that a minority of students were able to follow the specifications I provided and earn relatively high marks but, in my professional opinion, without really meeting the learning objectives. For example, I had a B-level criterion which was, basically, that the project should have all the parts of a conventional video game: a title screen, gameplay, an ending, and the ability to play again. An alarming number of teams did not handle mouse input properly, so that once you click in the game, the mouse is captured and the cursor made invisible. This means that technically you can still navigate a UI menu, but without being able to see the cursor, it's awfully difficult. Their usual workaround seemed to be to press Ctrl-F1 to release the cursor so they could see it again: an editor kludge for a runtime problem. Did such teams satisfy the criterion? Well, yes, but also no. I liberally allowed it, leaving notes in my review that they should fix this, which almost nobody did. I could, of course, add text to the already-wordy criterion to say, "If you are developing a desktop application, and you are using mouse navigation, make sure etc." That's just one special case of a particular environment, though. What I think I'm really running into is the problem of specifications grading in the face of creative, wide-open projects.
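
For what it's worth, the runtime fix is small. Here is a minimal sketch of what I mean, assuming a custom player controller (AMyPlayerController is a hypothetical name here, not code from any team's project) whose header declares OpenMenu and CloseMenu, called whenever a menu is shown or dismissed:

    #include "MyPlayerController.h"

    // Show the OS cursor and route input to the UI so the menu can be clicked.
    void AMyPlayerController::OpenMenu()
    {
        bShowMouseCursor = true;
        SetInputMode(FInputModeUIOnly());
    }

    // Hide the cursor again and recapture the mouse for gameplay.
    void AMyPlayerController::CloseMenu()
    {
        bShowMouseCursor = false;
        SetInputMode(FInputModeGameOnly());
    }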

Several students took my examples wholesale, brought them into their projects, and then submitted them as satisfying the relevant criteria. For example, I showed C++ code to count how many shots a character fired; student teams put this into their game and then checked the box saying that they included C++ code. Technically yes, but without any semblance of understanding. Another student took my dynamic material instance example wholesale and put it into his final project. Again, no indication of understanding the pieces, just copying and pasting it into his project and claiming that he included dynamic material instances. Yes, they're in there; no, there's no evidence of understanding. Some of this could, in theory, be cleaned up by changing the specifications, but then it gets into the same kind of problem as measuring productivity in programming. Exactly how different from my example does a student's work have to be to demonstrate that they understand the concepts? "Exactly" is the key word here if the specifications are going to be objective.
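
To give a concrete sense of scale, here is roughly the shape of such a dynamic material instance example (a sketch with hypothetical names like AFlashingCube and "Glow", not my actual handout); it is short enough to paste into any actor without necessarily understanding what a material parameter even is:

    // Assumes an actor whose header declares a UStaticMeshComponent* Mesh
    // and a UMaterialInstanceDynamic* DynamicMaterial.
    #include "FlashingCube.h"
    #include "Components/StaticMeshComponent.h"
    #include "Materials/MaterialInstanceDynamic.h"

    void AFlashingCube::BeginPlay()
    {
        Super::BeginPlay();
        // Replace the mesh's first material with a runtime-editable instance.
        DynamicMaterial = Mesh->CreateAndSetMaterialInstanceDynamic(0);
    }

    void AFlashingCube::Tick(float DeltaTime)
    {
        Super::Tick(DeltaTime);
        if (DynamicMaterial)
        {
            // "Glow" is a scalar parameter assumed to exist in the base material.
            const float Glow = 0.5f + 0.5f * FMath::Sin(GetGameTimeSinceCreation());
            DynamicMaterial->SetScalarParameterValue(TEXT("Glow"), Glow);
        }
    }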

I'm left with this sinking feeling that specifications grading is not worth the effort and that I should return to my tried-and-true minority opinion on grading: use triage grading for everything. This gives me the freedom to say something like this: "Use dynamic material instances in your project in a way that shows you understand them." Then, I can fall back on saying that a student's work either clearly shows this (3/3 points), clearly doesn't (1/3 points), or is somewhere in between (2/3 points). This coarse-grained numeric feedback can be combined with precise and crystal-clear written feedback to show students where they need more work, which appeals to me much more than my grimacing at the student's submitted checklist, then at the student's code, and then saying, "Yeah, I guess."

5 comments:

  1. Paul Graham just published an essay about how students often end up approaching formal education incorrectly ( http://www.paulgraham.com/lesson.html ):

    “I knew of course when I was a student that studying for a test is far from identical with actual learning. At the very least, you don't retain knowledge you cram into your head the night before an exam. But the problem is worse than that. The real problem is that most tests don't come close to measuring what they're supposed to. ...

    “Suppose you're taking a class on medieval history. ... The final exam is supposed to be a test of your knowledge of medieval history. ... So if you have a couple days between now and the exam, surely the best way to spend the time ... is to read the best books you can find about medieval history. Then you'll know a lot about it, and do well on the exam.

    “No, no, no, experienced students are saying to themselves. If you merely read good books on medieval history, most of the stuff you learned wouldn't be on the test. It's not good books you want to read, but the lecture notes and assigned reading in this class. And even most of that you can ignore, because you only have to worry about the sort of thing that could turn up as a test question. You're looking for sharply-defined chunks of information. ...

    “Getting a good grade in a class on x is so different from learning a lot about x that you have to choose one or the other, and you can't blame students if they choose grades. Everyone judges them by their grades — graduate programs, employers, scholarships, even their own parents.”

    I’m sure that as a professor you already share many of the same complaints. I’m also sure that your unconventional approaches are meant to address them. I’m not sure what can actually be done, but I suspect you would enjoy the essay.

    1. Yeah, Paul Graham hits the nail on the head. On one hand, I like to imagine a medieval history class that rewards learning medieval history, but on the other hand, I know it would be swimming upstream. Students are woefully unprepared in the art of learning and expressing knowledge. Graham is also right that it's hard to blame them; they are something like victims of the system. It's possible I'm just getting old and grumpy, but I swear it's worse now than when I started teaching, and I think it's related to even more emphasis on standardized and high-stakes testing.

      Thanks for reading, and thanks for sharing that essay!

  2. The timing just happened to be perfect.

    I hadn’t thought of it before: I’m a programmer at a large bank, and while I’m happy we have a decent library of self-directed courses, their assessments are worse than what Graham complained about. I recently tested out of three Agile training courses, and the tests were full of “sharply-defined chunks of [irrelevant] information.” Instead of asking about, say, what would be off-topic for a daily standup, they asked things like “which came first: scrum or kanban?”

    I know it’s a low bar, but you are doing better than corporate training programs. And you, especially, are actually making an effort to conduct the class as a real game development project.

  3. While I understand a strict definition of specifications grading to be a binary choice (complete/incomplete), I believe there is provision for adding a middle ground between the binary options (complete/almost complete/incomplete). Maybe that is what is needed on (some aspects of) projects.

    1. I had thought about adding "stars" as I've seen done with digital badges. For example, one star for copying my example code, two stars for coming up with something original. Then, grades can be determined by the number of stars achieved. At that point, though, it seems like you may as well just use a traditional rubric.
