Monday, June 29, 2009

Creating Defects

I'm in the process of inserting defects into three pieces of software I've written - for the purpose of creating testing samples for the study. This process is a lot more painful than I would have anticipated. I expect it is due to being trained for so long at removing defects, not deliberately creating them. Every time I break something, I realize the cool test case that will fail because of it, and then I feel bad. Also, trying to create non-obvious errors, or defects that are a little more human than those created by my Java mutator, is challenging. For example, consider a class constructor like the following:

public MyAccount(int account, int initialBalance)

The mutator would do something like this

public MyAccount(int account, int initialBalance)

following its set of one-line operator rules. I think a more 'human' error would be something like this:

public MyAccount(int account, int initialBalance)

or this:

public MyAccount(int account, int initialBalance)

All still valid Java programs, but definitely erroneous, and certainly slips that an overworked developer could make. What sets them apart from some of the mutation bugs is that many of the mutation rules require more effort on the part of the developer, rather than less (i.e. a ++ at the end of a variable manipulation command is more likely to be omitted than included by accident).
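
To make the distinction concrete, here is a minimal sketch. The fields and constructor bodies are assumptions made up for illustration; only the constructor signature comes from the examples above. One variant shows the kind of change a one-line operator rule might make, the other a copy-paste style slip.

```java
// Hypothetical sketch: field names and constructor bodies are assumptions.
class MyAccountDefects {

    /** Intended behaviour: each parameter is stored in its own field. */
    static class MyAccount {
        final int account;
        final int balance;

        MyAccount(int account, int initialBalance) {
            this.account = account;
            this.balance = initialBalance;
        }
    }

    /** Mutator-style defect: an operator rule inserts a unary minus. */
    static class MutatedAccount {
        final int account;
        final int balance;

        MutatedAccount(int account, int initialBalance) {
            this.account = account;
            this.balance = -initialBalance; // operator inserted by a mutation rule
        }
    }

    /** Human-style slip: the wrong parameter is assigned to the field. */
    static class SlippedAccount {
        final int account;
        final int balance;

        SlippedAccount(int account, int initialBalance) {
            this.account = account;
            this.balance = account; // should have been initialBalance
        }
    }

    public static void main(String[] args) {
        System.out.println("correct: " + new MyAccount(42, 100).balance);
        System.out.println("mutated: " + new MutatedAccount(42, 100).balance);
        System.out.println("slipped: " + new SlippedAccount(42, 100).balance);
    }
}
```

All three variants compile, which is the point: neither defect is caught until a test checks the stored balance.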

Friday, June 19, 2009

Think Aloud and Coding Analysis

Chris pointed me at a paper by von Mayrhauser and Vans, "Identification of Dynamic Comprehension Processes During Large Scale Maintenance", that seems fairly relevant, in that they are using methods that align with mine so far. They've used a Think Aloud process and recorded participant actions while performing a maintenance change request. The activity took approx. 2 hours per subject (11 subjects. I think I can do better). Video and audio recordings were transcribed and coded. The authors posit that a) coding should be based on categories defined a priori (before the video/audio is recorded), and that b) Think Aloud does not work out of phase with the change action (thinking aloud after doing the task). This concerns me as a) I don't have a set of codes yet (I could certainly come up with some rather quickly, but they would be without significant justification), and b) I kind of liked the idea of the post-task interview.
These concerns aside, the data analysis in this paper is excellent. The authors code all the transcripts, and derive a set of patterns that the participants take while performing the tasks. These are formulated as finite state machines, in which each state represents a code. This, to me, validates their choice of codes. This may be a good model to follow for at least part of my analysis procedure.
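
As a rough sketch of how a coded transcript could be turned into the transitions of such a state machine, the snippet below counts how often one code is immediately followed by another. This is only one possible approach, not the authors' procedure, and the code names (READ_CODE, HYPOTHESIZE, EDIT) are invented placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the code labels below are placeholders, not taken from the paper.
class CodeTransitions {

    /** Counts how often each code is immediately followed by each other code. */
    static Map<String, Map<String, Integer>> count(List<String> codes) {
        Map<String, Map<String, Integer>> transitions = new TreeMap<>();
        for (int i = 0; i + 1 < codes.size(); i++) {
            transitions
                .computeIfAbsent(codes.get(i), k -> new TreeMap<>())
                .merge(codes.get(i + 1), 1, Integer::sum);
        }
        return transitions;
    }

    public static void main(String[] args) {
        // One participant's session, already coded in chronological order.
        List<String> session = Arrays.asList(
            "READ_CODE", "HYPOTHESIZE", "READ_CODE", "HYPOTHESIZE", "EDIT");
        System.out.println(count(session));
    }
}
```

Each outer key is a state and each inner entry is an outgoing edge with its frequency, so recurring patterns across participants show up as heavily weighted edges.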

Wednesday, June 17, 2009

Nielsen's Heuristics for Software Testing

I had a quick conversation today with Dustin, of DGP and MSR fame, and he asked me if there was anything similar to Nielsen's heuristics for usability that might be used when looking for errors in code. There wasn't anything that jumped to my mind, but that certainly doesn't mean that nothing in fact exists. However, the first 10 hits for "software testing heuristics", "nielsen heuristics code", "nielsen heuristics software testing errors" didn't contain what I was looking for, either. It makes me think that the imagined output from my research study could be a valid contribution to knowledge. I think the list would probably contain things like:

  • always try negative numbered parameters
  • always try null values
  • how well does the .equals() method work?
  • add-remove-add to/from the collection, is it the same semantically?
  • always check date-based roll-overs
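
A few of these heuristics can be sketched as small reusable checks. The helper names below are hypothetical and the checks are deliberately simplistic; they only illustrate how such a list might translate into code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical helpers illustrating some of the heuristics; names are assumptions.
class HeuristicChecks {

    /** add-remove-add: the contents should match those after the first add. */
    static <T> boolean addRemoveAddIsStable(Collection<T> collection, T item) {
        collection.add(item);
        List<T> afterFirstAdd = new ArrayList<>(collection);
        collection.remove(item);
        collection.add(item);
        return afterFirstAdd.equals(new ArrayList<>(collection));
    }

    /** equals() basics: reflexive, symmetric, and never equal to null. */
    static boolean equalsLooksSane(Object a, Object b) {
        return a.equals(a) && a.equals(b) == b.equals(a) && !a.equals(null);
    }

    /** Date roll-over: the day after December 31 must be January 1 of the next year. */
    static boolean yearRollsOver(int year) {
        return LocalDate.of(year, 12, 31).plusDays(1).equals(LocalDate.of(year + 1, 1, 1));
    }

    public static void main(String[] args) {
        System.out.println(addRemoveAddIsStable(new HashSet<String>(), "x"));
        System.out.println(equalsLooksSane("a", "b"));
        System.out.println(yearRollsOver(2009));
    }
}
```

In a real study these would be run against the system under test rather than against standard library classes, which are used here only to keep the sketch self-contained.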

Monday, June 15, 2009

Simple Web Services are Too Hard to Find

So I'm building some software to use as a System Under Test for my thesis experiment, and found several seminal examples in Paul Jorgensen's "Software Testing". Since I had already implemented the Simple ATM problem (well, a variant of it, without any GUI), and the Triangle problem, I decided that the Currency Exchange problem might fit well in between these two, in terms of complexity and size of code. Basically, the program takes in a value and source currency, and converts it to a destination currency of your choice (4 options in Jorgensen's text). This seemed a bit outdated to me, as it relies on the programmer hard-coding the exchange rates by hand. Boo-urns, I say! So, I figured I'd use some benevolent, free web service to pull down live exchange rates and use them in my program. This should be simple, I imagine, because a) this is the type of service presented in every tutorial on making web services, and b) it provides exactly the type of functionality for which web services are suited: some mysterious online entity that has a little glob of information that I want to use in my application. Google searching for such a web service turned up a disappointing lack of results. There are several online currency converters, of which I'm sure the reader is aware, but none of those are particularly machine friendly - I would have to hack up and throw away a bunch of HTML to get a single number out of the page. I encountered one service that was designed to be machine readable, but it used the bloated WS* XML web service stack, requiring me to auto-generate hundreds of lines of code, all so that I can read a single number (this is not an option for me, because I don't want my subjects attempting to write tests for a bunch of machine created JAX-WS goop). If that wasn't enough, I also had to apply for a Trial API Key which would allow me to access the single number I needed for a period of two weeks, after which I would need to purchase a commercial API license. 
Grrrrrrr! A subsequent search of "REST currency exchange web service" turned up bupkis. Why is this so hard? There should be dozens of services like this. Maybe I'm just not looking in the right place.

Thursday, June 4, 2009

Idea for a Meta-Study

Idea for a meta-study: lots of papers have been published in which a controlled experiment is performed to examine the potential benefits of TDD. These are mostly all of the flavor: have control group implement some spec, using code-first-test-last, have experimental group implement same spec using TDD, measure time to complete and defect count of both groups. It seems (at least in my experience) that there are two possible outcomes from this: either the results are inconclusive, or they tend slightly towards the author's own feelings on the subject, either for or against TDD. I think an interesting, and probably easy to conduct, meta-study would be to pull down copies of all papers that are performing studies like these and see what the trend is, as well as analyzing things like geographic region of the authors or other relevant, although maybe not immediately obvious, correlations.

Wednesday, June 3, 2009

Iterative Thesis Development

Yesterday I decided to try a new way of organizing my time. In 'the real world', summer time always correlated with reduced productivity of a development team, in part due to developers taking vacation time, but also due to supervisors being absent and leaving the team with a lack of direction. I have proactively given myself this direction by dividing my summer into 3 iterations, which coincide with the three remaining months of the season. Each iteration has four phases: in the first three I will examine and refine the methodology, data acquisition, and analysis of this thesis I'm pursuing, and in the fourth I will write up my results. I figure that by doing three iterations in this way, the pilot study this summer should give me a really good idea on how to run a successful study in the fall, as well as a head start on some of the write-up.

A User Study for Mutation-based Testing Analysis

I recently read some material by Andreas Zeller in which he discusses the merits of using mutation testing as a method of verifying the quality of a test suite for a piece of software. These methods are meant to expose tests which perform an action, and assume that it was performed properly so long as an error is not thrown by the system under test - they do not verify that the action resulted in the desired program state. Code mutation (either on source or directly on the bytecode), when combined with accurate code coverage data, can identify these deficient tests by mutating the code they cover and observing tests that do not fail.
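
A toy sketch of the idea, under heavy simplification: the "system under test" is a single deposit operation, the mutant swaps + for -, and the two "tests" are plain methods rather than real test cases. All names are assumptions for illustration, and no actual mutation tool works this way internally.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: a hand-written mutant instead of a real mutation tool.
class MutationDemo {

    /** Original code: deposit adds the amount to the balance. */
    static final IntBinaryOperator ORIGINAL = (balance, amount) -> balance + amount;

    /** Mutant: the + operator has been replaced with -. */
    static final IntBinaryOperator MUTANT = (balance, amount) -> balance - amount;

    /** Deficient test: passes as long as no exception is thrown. */
    static boolean weakTest(IntBinaryOperator deposit) {
        try {
            deposit.applyAsInt(100, 50);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    /** Better test: verifies the resulting program state. */
    static boolean strongTest(IntBinaryOperator deposit) {
        return deposit.applyAsInt(100, 50) == 150;
    }

    public static void main(String[] args) {
        // A test "kills" a mutant when it passes on the original but fails on the mutant.
        System.out.println("weak test kills mutant:   " + (weakTest(ORIGINAL) && !weakTest(MUTANT)));
        System.out.println("strong test kills mutant: " + (strongTest(ORIGINAL) && !strongTest(MUTANT)));
    }
}
```

The weak test covers the mutated line yet still passes, which is exactly the combination of coverage and surviving mutant that flags it as deficient.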

I believe that there is value in using this data, if it can be presented in the appropriate manner. If a developer has to spend an hour to generate the mutation report and cross-reference it with code coverage, then the investment likely outweighs the benefit. However, if I can arrive at my desk at 9am, and have in my inbox a build report, test report with coverage, and a mutation report for every branch, all of which are properly hyperlinked to each other and backed up on a central storage, then I would certainly use it. The problem here seems to be that this degree of automation and integration is hard to set up, and often delicate once in place. It seems that some sort of standard platform for use by build engineers for integrating all of their reports, packaging operations, tests, etc., is called for. However, I have yet to see anything more sophisticated than Ant or Perl in widespread use by engineers. Maybe I just have an unrepresentative sample, though.