Tuesday, March 31, 2009

REST in Django

So I'm now officially a Google Summer of Code mentor for the Django open source group! w00t! Now all I have to do is pull off all those cool ideas I came up with earlier, as well as have a group of open source developers all agree that they are as cool as I think they are (which could be easier said than done).

While brushing my teeth this morning, I was thinking: "I wonder what sort of empirical study I could pull out of this situation? What sort of research is there to be done in the field of REST? What can I learn from this experience for academic purposes, not just my own?"

Ideas? I'm going to mull it over at the gym.

Tuesday, March 24, 2009

Google and Space and Time


Has Google finally solved that whole space-time-bendy problem? Observe:

Sunday, March 22, 2009

Reading Last Week

Seaman: Qualitative Methods
  • Description of ways in which qualitative methods can be used in conjunction with a positivist stance. Tools include observational studies and interviews.
  • Interesting tradeoff: amount of data collected in an interview vs. amount of interviewee's time used vs. amount of direction in interview. Often, the importance/implication of the data collected isn't known for a long time after the interview.
  • When conducting an interview, stress that it is not an evaluation. There are no 'right' or 'wrong' answers.
Cohen: Statistical Power Analysis
  • Examines the importance of analyzing and reporting statistical power in empirical research.
  • Author proposes 0.80 as a sound target power, since it yields feasible sample sizes for given values of alpha and effect size (ES).
  • Not a very understandable piece of literature, at least from my perspective.
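To make the 0.80 target concrete, here is a minimal sketch of the usual normal-approximation formula for the per-group sample size of a two-sided, two-sample comparison of means: n ≈ 2 × ((z(1 − α/2) + z(power)) / d)², where d is the standardized effect size (ES). This is the textbook approximation, not Cohen's own tables (which use the t distribution and come out slightly higher); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test of means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    rounded up to the next whole subject.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Cohen's conventional small, medium, and large effect sizes
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

With alpha = 0.05 and power = 0.80 this gives roughly 393, 63, and 25 subjects per group, which shows why a higher power target quickly becomes infeasible for small effects.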

Rosenthal and DiMatteo: Meta-Analysis: Recent Developments in Quantitative Methods for Literature Reviews
  • A good introduction to meta-analysis (that is, analyzing the results of many studies/experiments to test a hypothesis, instead of testing subjects directly).
  • Interesting points about making sure the studies in your meta-analysis are independent (if meta-analyzing multiple studies from the same research group, subjects and/or data may be reused, so the results may overlap).
  • Also an interesting discussion of the bias inherent in meta-analysis, arising from how the meta-analysts choose which studies to include in or exclude from their sample.
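One common way to combine study results is a fixed-effect, inverse-variance weighted mean of the effect sizes. The sketch below is that generic technique, not necessarily the exact procedure the paper recommends, and the study numbers are hypothetical. It also shows why independence matters: each study's weight assumes its variance reflects its own, unshared data.

```python
import math
from typing import Sequence, Tuple

def fixed_effect_pool(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool independent effect sizes with inverse-variance weights.

    Returns (pooled_effect, standard_error). Assumes the studies are independent;
    overlapping subjects or reused data would make the weights overstate precision.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]  # more precise studies count for more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical effect sizes (d) and sampling variances from three studies
pooled, se = fixed_effect_pool([0.30, 0.45, 0.10], [0.02, 0.05, 0.04])
print(f"pooled d = {pooled:.3f}, 95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")
```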

Card: Ender's Game
  • Enjoyable novel about a young boy who is called upon to train as a military commander to protect the earth from the threat of alien invasion.
  • Good character development, although the author lays on the bloodlust and homoeroticism a bit thick.
  • As a consequence, the sci-fi aspects of the story seemed secondary to the character plot, even tacked on in some places.
  • Generally, I liked it, but probably wouldn't invest the time to read the 5 or 6 sequels.

Modeling Topic

After a couple of months of soul searching, the students enrolled in John Mylopoulos' Conceptual Modeling course are narrowing down what to model. The project's only requirement (to 'model something') left it open, perhaps too open to be immediately tractable. However, Michalis, Alicia, and I are closing the gap. I'm leaning toward modeling certain aspects of the REST web service framework I've been working on for the last little while in CSC 2125.

Obviously, system descriptions and class diagrams are uninteresting, as John pointed out earlier in the term, but modeling the requirements of what such a system should do is not. I'm looking into Tropos, but I'm not sure it applies. Still, we can probably build a goal model illustrating the motivation and principles of REST, contrasted with those of WS-*. A use case diagram describing the interface to a general ROA (resource-oriented architecture) service could also be created. Finally, we could model certain pieces of important logic, such as a delayed GET, using a description logic syntax.

Reading Last Week

Sim et al.: Using Benchmarking to Advance Research: A Challenge to Software Engineering
  • Argues the merits of creating benchmarks in software engineering as an exercise to strengthen the community and promote advancement, using the reverse engineering community as an example.

Lau: Towards a framework for action research in information systems studies
  • Proposes a framework with which action research efforts can be categorized and evaluated.
  • Describes Action Research as an iterative process, in which a researcher introduces a small change, observes its effect, and uses it as input to the next small change.
  • Reminded me of Agile. I wonder if there are any other lessons from Agile that we can apply to Action Research?

Taipale and Smolander: Improving Software Testing by Observing Practice
  • Case study conducted to find ways of improving software testing where it is deemed to be lacking. Methods used include subject interviews and grounded theory.
  • Authors found that testing practices were most strongly correlated with business processes.
  • Thought this could lend some insight into how to observe testers at work (i.e., as they write tests). No such luck, though. All recommendations for improving testing had to do with business process alterations/improvements, not hands-on testing.

Also, while scanning my bookshelf, I came across a couple of old undergrad texts that I would like to look through. Judging by the spines, I don't think I've ever opened them:

Logic in Computer Science by Huth and Ryan. This was the text for my computational logic course in 3rd year. The course notes and instructor were good enough that I never needed to read it, but my propositional logic has become so rusty that I think I need it as a refresher.

Specifying Software by Tennent. Text for a formal methods course. Turing machines, model checking & verification, etc.

Also, my Amazon shipment arrived a couple of days ago, bringing with it a copy of The Annotated Turing, by Charles Petzold, and Joe Armstrong's Programming Erlang (this one is for Aran, but when we're both done with our purchases, we'll likely swap).

Thursday, March 19, 2009

Testing Tools

UTest
Implements a reverse test oracle (you submit tests, which are run against a black-box piece of code). Unsure of its level of functionality. Also has an Eclipse plugin. Mentions sandboxing of the code being run, which makes it a candidate for the virtualization efforts I've been looking at.
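A reverse test oracle flips the usual grading arrangement: instead of checking a student's code against the instructor's tests, it checks the student's tests against the instructor's code. A minimal sketch, assuming tests are plain Python callables that receive the implementation under test; the reference implementation, the seeded-bug "mutants", and all function names are hypothetical and not UTest's actual design.

```python
from typing import Callable, Dict, List

Impl = Callable[[List[int]], int]
Check = Callable[[Impl], None]  # a test receives the implementation it should exercise

def reference_max(xs: List[int]) -> int:
    """The instructor's trusted implementation (hypothetical example)."""
    return max(xs)

# Hypothetical seeded-bug variants that a good test suite should reject
MUTANTS: Dict[str, Impl] = {
    "returns_first": lambda xs: xs[0],
    "ignores_last": lambda xs: max(xs[:-1]) if len(xs) > 1 else xs[0],
}

def passes(check: Check, impl: Impl) -> bool:
    """Run one test against one implementation; any exception counts as a failure."""
    try:
        check(impl)
        return True
    except Exception:
        return False

def evaluate_suite(checks: List[Check]) -> Dict[str, object]:
    """Confirm the suite accepts the reference, and report which mutants it catches."""
    valid = all(passes(c, reference_max) for c in checks)
    killed = [name for name, impl in MUTANTS.items()
              if any(not passes(c, impl) for c in checks)]
    missed = [name for name in MUTANTS if name not in killed]
    return {"valid": valid, "killed": killed, "missed": missed}

# A student's submitted tests
def check_ascending(impl: Impl) -> None:
    assert impl([1, 2, 3]) == 3

def check_max_first(impl: Impl) -> None:
    assert impl([9, 2, 3]) == 9

print(evaluate_suite([check_max_first]))
print(evaluate_suite([check_max_first, check_ascending]))
```

The first suite is valid but catches neither bug; adding the ascending-order check catches both. That "your tests are correct but weak" feedback is what makes the idea interesting for teaching testing.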

Web-CAT
An online grading system developed at Virginia Tech, in which students submit assignments and have the instructor's test suite run against them. Use of this system was found to encourage test-first development practices among students, as well as early assignment submission (thanks to a hint system). Impossible to install, however, unless you are Stephen Edwards, and even then only on alternating weeks.

Marmoset
System for snapshot collection and automated testing. Using Marmoset, researchers can easily gather detailed information about students' development patterns, since an Eclipse plugin checks every code change into a central version control repository, which can then be mined. Marmoset also gives students automatic test feedback while they develop an assignment, with the goal of improving their experience while learning to program. It is unclear whether these tests are also used for (semi-)automatic grading.

JUnit
Although I haven't found any evidence of it yet, I'm pretty sure some combination of JUnit and the Java remote debugger can be used to create a quick and cheap reverse test oracle. More digging required.

Tuesday, March 17, 2009

ORM-REST Code Sprint - Day 2

5:00 pm, day two of the code sprint is almost wrapped up. Not as much amazing progress today as I would have liked (I was minus one team member for some reason). That aside, here's what we've got:

Mohammad blocked out and implemented the pseudocode for the URL reverser described in our blog here. I took a further look at it and filled in the magic that inverts the Django URL list and gives us a URL from a view name and primary key value. Yay! Plugged it into the XML serializer, and voilà! A REST-like XML representation, with hyperlinks! The only thing missing from this bit is the 'http://hostname:port' part. From past experience, I've found this to be trickier than you might think (it gets hairy if you've got one web server feeding into another, or a proxy/load balancer in the way). I think we'll just use relative URIs for now.
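The core of the reverser idea can be sketched in a few lines, assuming URL patterns are simple Django-style regexes with named groups and no nested parentheses or escaped characters. `URLPATTERNS`, the view names, and `reverse_url` are hypothetical illustrations, not our actual code; Django itself ships a `reverse()` helper (today `django.urls.reverse`) that handles the general case.

```python
import re
from typing import List, Tuple

# Hypothetical URL configuration: (regex, view name), in the style of a Django urls.py
URLPATTERNS: List[Tuple[str, str]] = [
    (r"^books/(?P<pk>\d+)/$", "book_detail"),
    (r"^authors/(?P<pk>\d+)/$", "author_detail"),
]

# Matches a named group such as (?P<pk>\d+); assumes no nested parentheses inside it
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>[^)]*\)")

def reverse_url(view_name: str, **kwargs: object) -> str:
    """Build a relative URL for view_name by substituting kwargs into its pattern."""
    for pattern, name in URLPATTERNS:
        if name != view_name:
            continue
        # Replace each named group with the caller-supplied value (KeyError if missing)
        path = NAMED_GROUP.sub(lambda m: str(kwargs[m.group(1)]), pattern)
        path = path.lstrip("^").rstrip("$")
        # Sanity check: the original pattern must match the URL we just built
        if not re.match(pattern, path):
            raise ValueError(f"{kwargs!r} do not fit pattern {pattern!r}")
        return "/" + path
    raise LookupError(f"no URL pattern for view {view_name!r}")

print(reverse_url("book_detail", pk=42))
```

The result is a relative path, in line with the relative-URI decision above; if an absolute URI is needed later, `urllib.parse.urljoin(base, path)` can resolve it against whatever host and port the request actually came in on.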

After a couple more feature points are implemented, this thing needs a huge refactoring pass to clean it up and encapsulate it. Also, it uses some classes from the Django REST Interface, but that library has some pathological faults that I don't want carried over into the tool. Yay for open licensing.

Also, discussions with Aran produced some new feature proposals. Lots of useful, tiny, easy-to-implement things that will improve the overall RESTability of the library. Further yay!

  • Automatic URL localization
  • Automatic anything localization
  • Model introspection and URL pattern creation
  • Computed resources instead of data resources: make the HTTP interface automatic
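The model introspection idea could look roughly like this: derive a collection pattern and a member pattern from each model's class name. The plain classes stand in for Django models, and the naming scheme (hyphenated lowercase segments, `_list`/`_detail` view names) is a hypothetical choice, not something we have settled on.

```python
import re
from typing import List, Tuple, Type

# Hypothetical stand-ins for Django model classes
class Book:
    pass

class BookAuthor:
    pass

def resource_segment(model: Type) -> str:
    """Turn a CamelCase class name into a lowercase, hyphenated URL segment."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", model.__name__).lower()

def url_patterns_for(models: List[Type]) -> List[Tuple[str, str]]:
    """Generate (regex, view name) pairs: a collection and a member resource per model."""
    patterns: List[Tuple[str, str]] = []
    for model in models:
        seg = resource_segment(model)
        view = seg.replace("-", "_")
        patterns.append((rf"^{seg}/$", f"{view}_list"))
        patterns.append((rf"^{seg}/(?P<pk>\d+)/$", f"{view}_detail"))
    return patterns

for regex, view in url_patterns_for([Book, BookAuthor]):
    print(f"{regex:32} -> {view}")
```

Generating the patterns and the reverse mapping from the same source would keep them from drifting apart.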

Monday, March 16, 2009

ORM-REST Code Sprint - Day 1

At quarter to 5:00, the first day of the ORM REST code sprint is winding down. Mo' and I hacked from 2:00, and this is what we've got to show:

Rory finished one direction of proper XML serialization of Django models. That is, given a [list of] model instance[s], we get either a nice XML document (unlike the object name="" pk="" garbage we had before) or a concise list of objects with names and placeholder URIs, which will be changed to live URLs when Mo' gets his piece working.
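For illustration, here is a minimal sketch of the "nice" direction using the standard library's `xml.etree.ElementTree`: one child element per field, and related objects rendered as links rather than bare keys. The element layout, the `href` attribute, and the example URIs are assumptions for the sketch, not the format we actually settled on.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def to_xml(resource: str, href: str, fields: Dict[str, object],
           links: Optional[Dict[str, str]] = None) -> str:
    """Serialize one model instance as an element with a child element per field."""
    root = ET.Element(resource, {"href": href})  # the resource identifies itself by URI
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    for rel, uri in (links or {}).items():
        # Related objects become hyperlinks instead of raw primary keys
        ET.SubElement(root, rel, {"href": uri})
    return ET.tostring(root, encoding="unicode")

print(to_xml("book", "/books/42/", {"title": "Ender's Game"}, {"author": "/authors/7/"}))
```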

Mohammad synced with the SVN repository, set up a Django development environment, and familiarized himself with the code I had written. He then began work on the reverse URL mapper. Given a model class name and a primary key value, he pulls a live instance from the Django ORM and uses it to query the Django URL dispatcher. This gives us the regular expression that matches URLs for the specified object. Now he's got to turn the whole thing on its head! Good luck, Mo! You can do it!

So, if you're keeping score, we are 1/2 + 1/2 = 1 feature point finished, out of 4. Might just make it by end of term :)

CSC2125 Live

My first live-ish blog. Hope it turns out well.

Class in the GSU pub

In this class, each group gave a quick elevator pitch as a way to practice their presentation skills. The first few groups were pretty good; subsequent groups had to take two or three tries at the pitch.

Mohammad and I both had to present. I got to stand on a chair because I'm too short and Greg likes to pick on me:)

The bar will not make Irish coffees.

The last week of classes is in three weeks. Demo day is supposed to be the Monday after that, but that's Easter Monday, so demo day will be moved later: either some other time that week, or the following Monday.

Greg's token plot twist: do another lap of elevator pitches, but this time for thesis work instead of 2125 project. Grad students have to explain their topic, undergrads need to come up with a thesis idea on the spot. 2 minutes prep time.

My topic: studying the effects of integrating unit testing into standard undergrad CS programs. Questions: can we determine a measure of unit test effectiveness, can we track testing improvement over the course of a career, and is a reverse test oracle (RTO) useful?

I got cut off after my 20 seconds :( Only had like 3 words left.

Everyone's job outside of class is to come up with a thesis topic for Nick. The winner gets ice cream.

I'm willing to bet we're going to go around again to try to pare things down. I don't know how to modify my speech, though.

Yup. This time I was too short.

Now we're talking about consulting fees. Greg charges $150/hr plus expenses. Clearly, the undergrads have no idea how much to charge; they've never seen the Entrepreneurship 101 talk about this. The rule of thumb: take the yearly salary of an employee doing the same job that's being shipped out, discount it, and normalize it over the length of the project.

If you get some consulting work on the side, U of T Legal Services can look over any contract you may have. It's free, but not speedy (couple of weeks). See Jason Betcham.

Combined degree CS + Law = name your price. Too bad law is dreadfully dull.

Next round of interrogation: what special skills do you have that cannot be easily picked up by other CS grads? I don't think I have one. Maybe experience in ECM?
Speaking another language is a big one!
Also, having a well connected professional network is important.

Last question: if you don't have something special, what are you doing to fix that?

Go get Garth Gibson's PhD thesis, on how RAID works. Really good communication.

Pay attention to email for info on last class.

Friday, March 13, 2009

Zak is the worst person! The worst!

Muller and Pfahl: Simulation Methods
  • Chapter describing the way in which simulation can be used to project the outcome of a software project.
  • Most readers found this method to be too clunky, or simply inappropriate for software development estimation. The counterexample of embedded or safety-critical systems seemed to sway a few minds, however.
  • Interesting discussion about whether this actually qualifies as an empirical method. Also, everyone seemed to agree that what the Hadley Centre is doing is valid science, even though it is simulation.
Atkins et al.: Using version control data to evaluate the impact of software tools
  • Paper evaluating possibly the worst version control system ever! At a more meta level, it was an example of how you can run an empirical study whose sole input is data mined from a past project (similar to what Samira did for her master's).
  • Nick mentioned that, despite its archaic premise, a versioned editor like this one would have been helpful at EA.
  • Discussion ensued as to whether this type of validation was actually required for this tool. It seems almost anything would be better than the existing 'version control'. In fact, there are some in the field who feel that expert intuition is ultimately more useful than empirical experimentation.
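To make the simulation idea from the Muller and Pfahl chapter concrete, here is a toy Monte Carlo sketch of project duration. It is not a method from the chapter: it assumes sequential, independent tasks with hypothetical three-point estimates, each sampled from a triangular distribution, and reports percentiles of the simulated totals.

```python
import random
from typing import List, Tuple

# Hypothetical task estimates in days: (optimistic, most likely, pessimistic)
TASKS: List[Tuple[float, float, float]] = [(3, 5, 10), (2, 4, 9), (5, 8, 20)]

def simulate_durations(tasks: List[Tuple[float, float, float]],
                       runs: int = 10_000, seed: int = 1) -> List[float]:
    """Sample total project duration, assuming tasks run sequentially and independently."""
    rng = random.Random(seed)  # seeded so results are reproducible
    totals = [sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks)
              for _ in range(runs)]
    return sorted(totals)

def percentile(sorted_values: List[float], p: float) -> float:
    """Simple rank-based percentile of an already-sorted list."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]

durations = simulate_durations(TASKS)
for p in (50, 80, 95):
    print(f"P{p}: {percentile(durations, p):.1f} days")
```

Even this toy version shows the appeal and the clunkiness raised in discussion: the output is only as good as the input estimates and the independence assumption.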

Sharp & Robinson: An Ethnographic Study of XP Practice
  • Ethnographic study of an extremely well-oiled XP team in England
  • Study found that, in this case, XP was the style best suited for maximal performance of the team
  • Threats to validity include not spending enough time (one iteration?) with the subjects
Kitchenham & Pfleeger: Personal Opinion Surveys
  • Chapter describing the process of creating and administering personal opinion surveys (questionnaires and the like)
  • Primary message is: making a questionnaire isn't easy! There's lots of confounding effects/sources of bias to worry about.
  • Interesting discussion ensued concerning the reuse of standard instruments from psychology, and whether or not SE should have similar standard instruments.
Cherubini et al: Let's go to the whiteboard: how and why software developers use drawings
  • Interesting case study conducted by Microsoft Research to see how their developers use graphical representations of code
  • Researchers were able to categorize their uses into Understanding, Design, and Communication, and the amount of investment into Transient, Reiterated, Rendered, and Archival.
  • Pretty good

Flyvbjerg: Five Misunderstandings about Case Study Research
  • This paper attempts to disprove several common misconceptions about case study research, primarily things like "case study results cannot be generalized to a larger population", "case studies cannot be used to test hypotheses".
  • A fairly good piece of advocacy. It certainly makes me feel better about considering a case study as a direction for my research.

Edwards: Using software testing to move students from trial-and-error to reflection-in-action and related papers
  • Details findings of the Web-CAT system, an online assignment submission and automatic grading system created at Virginia Tech.
  • Edwards found that the system was useful and well received by both instructors and students. The primary objective, encouraging students to do test-first development, was achieved.
  • Interesting effects of introducing hints into the automatic test cases to discourage last-minute submissions.
Juristo et al: Reviewing 25 Years of Testing Technique Experiments
  • A taxonomy/summary of the various means of divining test cases that have been invented over the last quarter century.
  • Focuses mainly on machine-derived cases (random input-output samples, etc.) and, unfortunately, not much on human-created unit tests.

Thursday, March 12, 2009

RESTful Efforts in Django

My CSC2125 team appears to have a concrete direction for what we're going to build by the end of term. We've been building prototype REST web services backed by ORM data for the last few weeks, most recently in Django. Since Django is so closely integrated into the community here, we're going to use it as our target platform.

It seems that both Bill Conrad and I started with the Django REST Interface, a Google Summer of Code project from 2007. This interface takes your Django models and throws a quick and dirty XML interface on top of them. Functional, yes, but not really complete (or even good REST). Those problems are just aching for me to fix them!

First, the XML/JSON/YAML returned from the interface is in the standard Django serialization format, which isn't very pretty for REST purposes. It can be cleaned up.

Most importantly, I think, inter-object relations are expressed as sets of primary keys instead of URLs to related objects. This flies in the face of REST. Fixing it will require something akin to reversing the urls.py map to get a URL from a model and primary key: non-trivial, interesting, and crucial to the correctness of the resulting service.
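Django's built-in JSON serializer emits records shaped like `{"model": ..., "pk": ..., "fields": {...}}`, with foreign keys as bare primary keys and many-to-many relations as lists of them. A minimal sketch of post-processing that output so relations become URLs; the `library.book` model, the `SELF_URLS`/`RELATIONS` maps, and the URL templates are hypothetical, and a real version would get URLs from the reverse mapping rather than from format strings.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical URL templates for each model and for its relation fields
SELF_URLS: Dict[str, str] = {"library.book": "/books/{pk}/"}
RELATIONS: Dict[Tuple[str, str], str] = {
    ("library.book", "author"): "/authors/{pk}/",
    ("library.book", "tags"): "/tags/{pk}/",
}

def linkify(serialized: str) -> List[dict]:
    """Rewrite Django-style serializer output so relations are URLs, not primary keys."""
    resources = []
    for record in json.loads(serialized):
        model = record["model"]
        resource = {"href": SELF_URLS[model].format(pk=record["pk"])}
        for name, value in record["fields"].items():
            template = RELATIONS.get((model, name))
            if template is None or value is None:
                resource[name] = value  # plain field, or an empty relation
            elif isinstance(value, list):
                resource[name] = [template.format(pk=v) for v in value]  # many-to-many
            else:
                resource[name] = template.format(pk=value)  # foreign key
        resources.append(resource)
    return resources

sample = json.dumps([{"model": "library.book", "pk": 1,
                      "fields": {"title": "Ender's Game", "author": 7, "tags": [2, 5]}}])
print(json.dumps(linkify(sample), indent=2))
```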

A few other nice-to-haves include implicit delayed GET, algorithmic(query) resources, and dynamic representations.

I spoke with Bill today about his work on the Basie REST API. He seemed convinced that he solved most of what I'm after already, with the notable exception of reverse URL mapping. The REST blog post on the Basie Blog mentions the following features:

Generic Models - looks like Django models will be turned into REST resources automagically, a la the Django REST Interface. Also, the Basie team needed deep synchronization of objects, and so added this to their REST API. While I'm sure this fulfills their requirements, according to Robert Brewer via Greg's blog, this is a RESTful no-no.

Intuitive Data Access - discusses URL structure. This section mentions algorithmic (Bill calls them filtered) resources. If these are done, then that's one item off my list!
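To pin down what an algorithmic (filtered) resource might mean, here is a minimal sketch in which the collection at a URL like `/books/?author=7` is computed by filtering on query parameters. The in-memory data, field names, and matching rules (exact string match; repeated parameters act as OR) are assumptions for illustration; this is not how Basie implements it.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory collection standing in for ORM rows
BOOKS: List[Dict[str, str]] = [
    {"id": "1", "title": "Alpha", "author": "7"},
    {"id": "2", "title": "Beta", "author": "7"},
    {"id": "3", "title": "Gamma", "author": "9"},
]

def filtered_resource(url: str, items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the items whose fields match every query parameter in url."""
    params = parse_qs(urlsplit(url).query)  # e.g. {"author": ["7"]}
    return [item for item in items
            if all(item.get(key) in values for key, values in params.items())]

print([b["title"] for b in filtered_resource("/books/?author=7", BOOKS)])
```

Because the filter lives in the URL, each filtered collection is itself an addressable resource that can be linked to and cached, which is what makes this RESTful rather than an RPC-style search call.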

AJAX Friendly - not really interested in this, but good on ya.

The impression I get after going through this (and I'll admit, I haven't looked at the Basie codebase yet, but it's next on my list) is that there has been effort in the area I want to pursue in 2125, but not to the extent or in the specific details I intended. Yay! The project is still valid!