Paul Gestwicki's Blog: xml


Tuesday, October 1, 2013

Two-Week Project Showcase

The focus of my sophomore-level Advanced Programming course (CS222) is a large project at the end of the semester. In the past, this has been a six-week project, delivered in two three-week iterations. It is a project of the students' own invention: they have to pitch a project to satisfy a set of constraints I provide. This semester, I decided to expand the project to nine weeks with three iterations. This goes hand-in-glove with the revised, achievement-oriented assessment model that I am using this semester.

Most students have never worked in a programming team prior to CS222, much less defined their own projects. To warm them up, I give a two-week programming project before starting the big project. The two-week project is done in pairs, and I provide the requirements analysis as user stories. This provides a model for them to follow when they pitch their own projects using user stories.

This semester, I gave a new two-week project that was inspired by my 2013 Global Game Jam project. The students were challenged to write an application that takes a word from the user and then hits Wikipedia's API to determine the last modified date of the corresponding page. The curriculum provides no prior exposure to Web programming or networking, and so I provided a very short sample program that demonstrated how to open a URL in Java and read the stream. This project touches on some very important ideas, including Web services and structured data.
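For readers who want something concrete, here is a minimal sketch of what "open a URL in Java and read the stream" can look like. It is not the sample program handed out in class. The class name, the method names, the User-Agent string, and the exact query parameters are all assumptions made for illustration, so check the URL format against the MediaWiki API documentation before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;

final class WikipediaFetchSketch {

    /** Builds a query URL. The parameter set is an assumption, not the course's actual URL. */
    static String buildQueryUrl(String title) {
        try {
            return "https://en.wikipedia.org/w/api.php?action=query&prop=revisions"
                    + "&rvprop=timestamp&format=xml&titles="
                    + URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Reads an entire stream into a single string, decoding it as UTF-8. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the URL and returns the raw response body (requires network access). */
    static String fetch(String title) {
        try {
            URLConnection connection = URI.create(buildQueryUrl(title)).toURL().openConnection();
            // Hypothetical User-Agent value; a descriptive one is polite for public APIs.
            connection.setRequestProperty("User-Agent", "TwoWeekProjectSketch/0.1 (hypothetical)");
            try (InputStream in = connection.getInputStream()) {
                return readAll(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `WikipediaFetchSketch.fetch("soup")` would return the response as one string. Encoding the title before putting it in the URL keeps user input such as `timestamp=` from altering the query itself.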

In the past, I have evaluated the two-week project in a rather conventional way: I provide a grading rubric at the start of the project, then I check out students' projects from the department's Mercurial server and give each pair a grade. I wanted to do it differently this semester, in part because of the achievement-oriented assessment. The two-week project provides a vehicle for students to earn achievements and write reflections: I'm not evaluating the project itself but rather how students use it to explore course concepts as articulated through the achievements and essential questions.

I decided to devote a class day at the end of the two-week project to a showcase. Each pair had to have a machine running a demo alongside a summary poster. We rearranged the classroom, moving all the tables to the perimeter and clustering the extras in the center. In order to foster peer interaction, I printed some sheets whereby students could vote on which team had the best UI, the best poster design, the cleanest code, and the most robust input handling.

The students enjoyed this format. There was an energy in the room as the students explored each other's solutions, talking about implementation strategies and UI decisions. A few students had trouble getting their projects working at all, and I heard one student say how disappointed he was, because it left his team unable to fully participate in the activity. This represents positive peer pressure and project orientation, which can be starkly contrasted against instructor pressure and grade orientation.

I had recommended two strategies in class: using Joda Time to handle time formatting and using Java's DOM parser to deal with Wikipedia's output. I was surprised to see that almost every team used Joda Time (and used it to earn the Third Party Librarian achievement) but only one team used DOM. Every other team read the output stream as a single string and then searched it using Java's String class. This provided an excellent opportunity to teach about input validation. My sample Wikipedia URL queried the "soup" page for its last modified time, and the result looks like this:

<?xml version="1.0"?>
<api>
  <query-continue>
    <revisions rvcontinue="574699285" />
  </query-continue>
  <warnings>
    <revisions xml:space="preserve">Action 'rollback' is not allowed for the current user</revisions>
  </warnings>
  <query>
    <normalized>
      <n from="soup" to="Soup" />
    </normalized>
    <pages>
      <page pageid="19651298" ns="0" title="Soup">
        <revisions>
          <rev timestamp="2013-09-27T05:20:10Z" />
        </revisions>
      </page>
    </pages>
  </query>
</api>

Keep in mind that this is coming in as one continuous stream without linebreaks. Aside from the one group that did an appropriate and simple DOM lookup, students used String#indexOf(String) to search for "timestamp=", and then did manual string parsing using that reference point. This approach works for most cases, but it opens the application up to an attack that I'll explain in the next paragraph, giving the reflective reader a moment to consider it.

If you ask the application for the last modified info of the Wikipedia page "timestamp=", you get a phenomenon similar to an SQL injection attack: the indexOf operation picks up an unintended location, and the manual string manipulations fail. I had seen this when meeting with a pair the previous week who were working on their Advice Seeker achievement. They had thought their solution to be rock solid, and they were appropriately excited when I showed them how to crash it. They became my covert hitmen during the showcase, crashing solution after solution by finding holes in string parsing logic. So, while few students took the opportunity to learn XML parsing in the two-week project, maybe they learned something even better: the embarrassment of doing it the lazy way and seeing your system go down in public!
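The difference between the two approaches can be sketched in a few lines of Java. This is an illustration under stated assumptions, not any team's actual code. The class and method names are hypothetical, `SOUP` is a condensed copy of the response shown earlier, and `TRICKY` is a guess at what the API returns for the title "timestamp=" (a normalization entry plus a missing page), not a captured response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class TimestampLookup {

    // Condensed version of the "soup" response, as one continuous string.
    static final String SOUP = "<?xml version=\"1.0\"?><api><query><normalized>"
            + "<n from=\"soup\" to=\"Soup\" /></normalized><pages>"
            + "<page pageid=\"19651298\" ns=\"0\" title=\"Soup\"><revisions>"
            + "<rev timestamp=\"2013-09-27T05:20:10Z\" /></revisions></page>"
            + "</pages></query></api>";

    // Hypothetical response for the title "timestamp=": an assumption, not captured output.
    static final String TRICKY = "<?xml version=\"1.0\"?><api><query><normalized>"
            + "<n from=\"timestamp=\" to=\"Timestamp=\" /></normalized><pages>"
            + "<page ns=\"0\" title=\"Timestamp=\" missing=\"\" /></pages></query></api>";

    /** Fragile approach: find the first "timestamp=" and read the quoted value after it. */
    static String viaIndexOf(String xml) {
        String marker = "timestamp=";
        int start = xml.indexOf(marker) + marker.length() + 1; // skip the opening quote
        int end = xml.indexOf('"', start);
        return xml.substring(start, end);
    }

    /** DOM approach: read the timestamp attribute of the first rev element, if there is one. */
    static Optional<String> viaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList revs = doc.getElementsByTagName("rev");
            if (revs.getLength() == 0) {
                return Optional.empty();
            }
            Element rev = (Element) revs.item(0);
            return rev.hasAttribute("timestamp")
                    ? Optional.of(rev.getAttribute("timestamp"))
                    : Optional.empty();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }
}
```

On `SOUP`, both methods find the timestamp. On `TRICKY`, `viaIndexOf` latches onto the user's own text inside the `from` attribute and returns a fragment of markup that no date parser will accept, while `viaDom` reports that there is no revision to read. The DOM version only looks at element and attribute structure, so text supplied by the user cannot be mistaken for it.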

When I explained to the students what we were doing at the showcase, I had expected teams to show up at their stations when I came by. However, it seems they were so excited to see each other's work that they didn't think about this. Since not every team had a person at their station when I came around, I was not able to give my expert evaluation to each group, nor was I able to model for the students how to give critical feedback. On the other hand, the students got a lot of peer feedback and I was able to meld into the group, becoming just one of many people interested in seeing demonstrations and code. I am not yet sure if I would do this part differently next time or not.

One aspect that is still unclear to me is the extent to which students were working for intrinsic motivation versus extrinsic reward. I was approached after class by a student from one of the teams whose solution was not working. During the hour, they had talked to other students and realized what they did wrong, which is an activity I certainly want to foster. The student asked, clearly in a state of agitation, if his group could fix their application even though it was due to be completed the previous night. I confirmed that this would be fine, and the student went away expressing joy and thanksgiving. I suspect his perspective was that he had just been given an opportunity to save his grade. What I really gave him was an opportunity to make his project work and feel good about getting it done, even if a bit late. I don't think he realized that there's no entry in the course grading policy for the two-week project, that the whole thing was just a fun context for us all to play and learn together. I hope that when he figures this out, he sees this as a reflective learning opportunity and not simply smoke and mirrors.

In conclusion, I am very happy with the showcase format. It was definitely worth using a class meeting for this event. I think this two-week project was particularly well-suited to the showcase format since it's fairly small, permits multiple solutions, helps students build better understandings of the modern computing ecosystem, and can have interesting failure states. Perhaps next time around I need to add an achievement related to XML parsing, since this seemed to promote students' use of Joda Time quite well.

(I have some nice pictures of the showcase that I took while standing on top of the teaching station, but I feel like I can't post them here without my students' permission. Sorry.)

Thursday, May 26, 2011

JSON vs XML for data representation in GWT

(This is Part 1 of a series. See also Part 2.)

In a few weeks, I will be working with my Morgan's Raid collaborator Ron Morris (History), Mark Groover from Anthropology, and about ten undergraduate students to develop a prototypical digital archaeology simulation. The intent of the project is to create a technological tool to teach 4th-graders about historical archaeology, that it is a scientific process and more than just a dig. I will not dwell on the project design here, but rather I would like to share some of my experimentations with JSON, XML, and GWT.

We want to have the widest possible adoption, and since the expected interactions are fairly simple, making this a Web application seemed best. GWT stands out as an excellent candidate technology since it handles cross-browser issues better than any of our team can. Flash would be a contender, of course, but I ruled it out early since I don't know enough Flash/Actionscript to be confident in leading the team in an appropriate software architecture.

We will only have five weeks to develop the prototype, and the team will include a technical team (mostly Computer Science majors and minors) and a content team (mostly Anthropology and History majors). Unlike Morgan's Raid, in which we could create content directly in Java since the designers were also developers, this project would benefit from an intermediary domain-specific language that the designers can use. An architecture I had kicked around in my head involved using Ruby to define an internal DSL that would be processed in GWT. However, JRuby cannot go into a GWT project, since the GWT compiler would try to transform the whole kit and caboodle into Javascript, which is neither possible nor sensible. I'll just have to find another project in which to experiment with Ruby DSLs; for now it's off the table.

Javascript

Considering a data representation layer, I started by considering how Javascript could be used directly for configuration-based programming. That is, the technical team could write an interpreter on top of Javascript data created by the content team. I played with JSON last Summer when working with the Wave API, and it seemed like a nice way to represent domain objects. Overlay types provide a well-documented approach for using JSON to represent domain objects in GWT. As a proof of concept, I whipped up a simple message wrapper. The interface looks like this:


@SingleJsoImpl(LittleMessageImpl.class)
public interface LittleMessage {

    /**
     * Get the text of this message.
     * 
     * @return message text
     */
    public String text();
}

The annotation references another class that is shown below. Note that Google's documentation, as of this writing, does not make this relationship clear, but a little tinkering with the annotation value revealed the proper approach.


public class LittleMessageImpl extends JavaScriptObject { 

    /**
     * Create a {@link LittleMessageImpl} from JSON text.
     * @param json the JSON string
     * @return new message object
     */
    public static final native LittleMessageImpl buildMessageFromJSON(String json) /*-{
        return eval('(' + json + ')');
    }-*/;
    
    // Required for the GWT compilation process
    protected LittleMessageImpl() {}
    
    /**
     * @return the text of this message
     */
    public final native String text() /*-{
        return $wnd.checkNotNull(this.text);
    }-*/;

}

The two important bits are the factory method, which uses native Javascript to evaluate a JSON string into an object, and the Javascript native implementation of the text() method.

In debugging this program, I encountered a problem where I had called the field by two different names, and of course because Javascript is dynamically typed, I had no compiler support to detect this. To ease debugging, I wrote a nigh-trivial Javascript method, checkNotNull, that behaves like Preconditions.checkNotNull from Guava: if the argument is not null, it is returned, and if it is null, it bombs out. I put this method into a file called preconditions.js that lives in the "public" folder of my project. Making this work took some trial and error, but the crux of it is this: if your application's root package is com.example, then you can put a folder under com/example/public and put resources there, and these resources will be loaded prior to your GWT application's execution. This is defined in the documentation, although I had a hard time extracting the previous sentence's meaning from what was given.

Coming back to LittleMessageImpl, the JSON itself sits right in the application: it is not the result of a request to a server, which is how JSON is frequently used. I made a file message.json that sits in my com.example.client package, and it is referenced as a text resource in my JsonResources class:

public interface JsonResources extends ClientBundle {
    
    static final JsonResources INSTANCE = GWT.create(JsonResources.class);
    
    @Source("message.json")
    TextResource message();

}

Now, to create my LittleMessage, I need only do this:

LittleMessageImpl message = LittleMessageImpl
                .buildMessageFromJSON(JsonResources.INSTANCE.message()
                        .getText());

Like many things, it's easy once you know how.

This was a helpful process for learning how GWT works, but I encountered a problem soon afterwards that perhaps I should have foreseen. One of the actual problems I want to solve with GWT is to have an image that has "hotspots" where, when they are moused over, something happens. This is trivially done in GWT using Image and MouseMoveHandler. The problem, however, is granularity of representation. I would like to be able to represent the image as a composite of hotspots that define geometric regions and actions. I could structure the whole thing as a JSON object, but in Java, I want it broken down into pieces, an InteractiveImage holding zero or more Hotspots, for example. This is where my knowledge of Javascript breaks down. It's not clear to me how I could take a JSON string and, with the same kind of elegance as simply evaling it, end up with a beautiful composite object. Two-pass parsing is of course an option, where the first pass makes a "dumb" Java representation that is then converted into a better domain model, but then you lose the elegance of the overlay types.

From here, I considered diving into Javascript, JSON, and GWT a bit more deeply, but I decided instead to go on a tangent. It's all experimentation, after all. Why not try that nasty old de facto standard for Web application data representation, XML?

XML

GWT has good XML support. It has to. Plus, the XML parser looks exactly like Java's XML parser that I've used a good many times. I had built some confidence from my Javascript experiments and decided to try to solve a real problem this time. My data representation in XML looks like this:

<place>
    <img src="Jellyfish.jpg" />
    <hotspot>
        <rect x1="10" x2="50" y1="10" y2="50" />
        <text>You found it!</text>
    </hotspot>
</place>

I happened to be on Windows when writing this, so I opened up my sample pictures folder and found a nice jellyfish image to use. (Why was I in Windows? To minimize time between experiments and Witcher 2.) Like message.json, I put this file—demo.xml—right in my com.example.client folder, and I load it as a text resource analogously.

Without overlay types, I need to do my own XML parsing. I took what I consider to be a standard approach, making a domain object called MouseablePlace that extends Composite and gave it a public static parseXML(Document) method. This then does rather mundane building of a MouseablePlace from the configuration data, but the result is a beautiful MouseablePlace which contains a series of Hotspot objects, each of which contains a Rect, and these are pulled from the XML via ad hoc recursive descent parsing. To the MouseablePlace is attached a MouseMoveHandler that checks whether the mouse is in any of the Hotspots, and if so, the Hotspot's message is shown in a status label on the screen.
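A rough sketch of that parsing approach follows. It is written against the standard `org.w3c.dom` API so that it runs as plain Java outside a browser. The GWT version would use the similar types in `com.google.gwt.xml.client` and extend Composite, and some calls may differ there. `PlaceSketch`, its fields, and `messageAt` are hypothetical names invented for this illustration; only `parseXML(Document)`, Hotspot, and Rect come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class PlaceSketch {

    static final String DEMO_XML = "<place><img src=\"Jellyfish.jpg\" /><hotspot>"
            + "<rect x1=\"10\" x2=\"50\" y1=\"10\" y2=\"50\" />"
            + "<text>You found it!</text></hotspot></place>";

    static final class Rect {
        final int x1, y1, x2, y2;
        Rect(int x1, int y1, int x2, int y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        boolean contains(int x, int y) {
            return x >= x1 && x <= x2 && y >= y1 && y <= y2;
        }
    }

    static final class Hotspot {
        final Rect rect;
        final String text;
        Hotspot(Rect rect, String text) { this.rect = rect; this.text = text; }
    }

    final String imageSource;
    final List<Hotspot> hotspots;

    private PlaceSketch(String imageSource, List<Hotspot> hotspots) {
        this.imageSource = imageSource;
        this.hotspots = hotspots;
    }

    /** Builds the domain object by descending place -> hotspot -> rect/text. */
    static PlaceSketch parseXML(Document doc) {
        Element place = doc.getDocumentElement();
        Element img = (Element) place.getElementsByTagName("img").item(0);
        List<Hotspot> hotspots = new ArrayList<>();
        NodeList nodes = place.getElementsByTagName("hotspot");
        for (int i = 0; i < nodes.getLength(); i++) {
            hotspots.add(parseHotspot((Element) nodes.item(i)));
        }
        return new PlaceSketch(img.getAttribute("src"), hotspots);
    }

    private static Hotspot parseHotspot(Element hotspot) {
        Element rect = (Element) hotspot.getElementsByTagName("rect").item(0);
        String text = hotspot.getElementsByTagName("text").item(0).getTextContent();
        return new Hotspot(parseRect(rect), text);
    }

    private static Rect parseRect(Element rect) {
        return new Rect(intAttr(rect, "x1"), intAttr(rect, "y1"),
                intAttr(rect, "x2"), intAttr(rect, "y2"));
    }

    private static int intAttr(Element element, String name) {
        return Integer.parseInt(element.getAttribute(name));
    }

    /** What a mouse-move handler would ask: which message, if any, applies at (x, y)? */
    Optional<String> messageAt(int x, int y) {
        for (Hotspot hotspot : hotspots) {
            if (hotspot.rect.contains(x, y)) {
                return Optional.of(hotspot.text);
            }
        }
        return Optional.empty();
    }

    /** Helper for running the sketch outside GWT: turns XML text into a DOM document. */
    static Document parseDocument(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }
}
```

With this in place, `PlaceSketch.parseXML(PlaceSketch.parseDocument(PlaceSketch.DEMO_XML)).messageAt(20, 20)` yields the hotspot's message, and a point outside the rectangle yields an empty result. Keeping the hit test separate from the widget code is a design choice of this sketch, made so the parsing and geometry can be exercised without a browser.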

Next steps

Both XML and JSON are contenders to represent the domain objects in this application. For that matter, we could also use an internal DSL in Java, but I would rather keep the learning curve as low as possible for the content team. Whether XML or JSON is more sensible to non-programmers, I have no idea. If we had more time, we would wrap the whole thing in an editor, but we do not have that luxury.

If you have any experience or suggestions, please feel free to share them in the comments.

[ADDENDUM]
Go to Part 2, which covers my experimentation with AutoBeans.