Friday, February 26, 2010

Migrating

Hi all.

I'm migrating my blog from here over to http://rorytulk.wordpress.com, soon to become www.rorytulk.com. Please adjust accordingly.

Sunday, February 7, 2010

Software Testing Techniques, an Empirical Approach

Proper software testing regimes are a cornerstone of effective software engineering. Progress has been made in teaching students sound testing techniques, but there is still room for improvement. My master's research supervisor and I conducted a study designed to empirically determine the difference in ability between student and professional software testers, and to elicit from the experts behaviours or techniques which may be used to enhance the undergraduate curriculum.

Our experimental setup consisted of in-lab observational sessions where subjects wrote thorough suites of JUnit tests for sample software we'd created. Subjects were drawn from the University of Toronto’s undergraduate computer science student body and professional developers from the Greater Toronto Area. The test code and video logs created during these sessions were examined for trends present in the student and professional groups.

Our intuition going into the study was that professionals would find more defects with their test suites, an advantage stemming from some metric such as number of tests written, lines per test, code coverage per test, etc. Analysis of these metrics did not confirm these hypotheses: students and professionals performed equally well in terms of number of bugs found. However, student code contained more defects and, more importantly, the types of bugs found differed strikingly between the two groups.

Bugs in the sample code were broken down into two categories: stateless and stateful. A stateless bug is uncovered by inputting invalid values into a method invocation, and the method returns invalid results or throws an exception. A stateful bug occurs when a method call corrupts the object's state, and so subsequent calls perform incorrectly. Students found a mix of stateless and stateful bugs in the code, with a strong majority being stateless. The professionals sampled found strictly stateful defects. There are several possible explanations for this effect, although no evidence to support one over the others is immediately apparent.

The full text can be found here.

Monday, February 1, 2010

Left-Fold for Bash

I'd like to share a recent bash programming experience I've had. It began while processing the reams of data generated in my M.Sc. research study. I was producing long lists of frequency data in text files, and had the need to sum up all the lines in these tables. This is of course a trivial problem in many languages, and I had a wide array of options available to me:
  • I could write another ant task to do the summing (this required more work than I was willing to invest, as ant isn't really suited for computational tasks)
  • I could write a python script that took the contents of the specified file and returned the sum. I didn't really like this approach because it involved yet another file in my build process, invoked from the ant task. I always sort of thought that if you were forced to use , you were performing a task beyond the scope of your tool.
  • I could skip the generation of the table and go straight to the sum. A few of the tables were created with XSLT, so this was a valid option. However, my XSLT programming ability is very much trial-and-error based, so I thought this might take some time. Also, some of the other tables, created with grep, would not be affected.
  • Write it in shell. I liked this idea. I really liked the feel of being able to just pipe something to 'sum' and have the sum returned. So this is what I chose.
My first version looked like this:
#!/bin/bash
# Sum the integers read from stdin, one per line.

sum=0
while read -r line
do
    sum=$((sum + line))
done

echo "$sum"
exit 0
It did the trick quite well. I had suggestions from office mates for the following alternatives:
Using python (courtesy Aran Donohue):

python -c "import sys; print(sum(float(x) for x in sys.stdin.read().split()))"

Using tr (courtesy Zak Kincaid):

cat numbers | tr '\n' '+'|head --bytes='-1'|bc
(Note that this version doesn't quite work: bc throws a syntax error, and I'm not exactly sure why.)
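
A plausible but unverified explanation for the syntax error: once head strips the final '+', the expression reaches bc with no trailing newline, and some bc versions reject input that is not newline-terminated. A blank line at the end of the file would also leave a doubled '+'. The sketch below builds the same "a+b+c" expression with paste, which ends its output with a newline. The function name join_with_plus is hypothetical, and piping the result into bc assumes bc is installed.

```shell
#!/bin/bash
# Hypothetical helper: join the lines on stdin into one "a+b+c" expression.
join_with_plus() {
    # -s puts all input lines on a single output line, -d+ separates them
    # with "+", and "-" tells paste to read stdin. The output line ends
    # with a newline.
    paste -sd+ -
}

# Show the expression that would be handed to bc.
printf '1\n2\n3\n' | join_with_plus
```

To get the sum, pipe the expression on: join_with_plus < numbers | bc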


Using my original design, I realized that if I abstracted out the operator, I could use this script to perform any 2-operand function I wished on the list, essentially creating a basic left fold:

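#!/bin/bash
# Fold the integers on stdin with the operator given as $1.
op=""
if [ -z "$1" ]; then
    op="+"    # no operator supplied: default to addition
else
    op=$1
fi

sum=0
while read -r line
do
    sum=$((sum $op line))
done

echo "$sum"
exit 0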

I've done a little bit of error checking, to see if the parameter supplied is blank, and if so replace it with + by default.

I've only seen this work for '+' and '-'. If I use '*', it breaks because the shell replaces the unquoted wildcard/multiplication character with a listing of the current directory before the script ever sees it.
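
Two things seem to get in the way of multiplication, sketched here under assumptions rather than as a definitive fix. First, the operator has to be quoted at the call site so the shell passes a literal * instead of a file list. Second, a product that starts from 0 stays 0, so the starting value needs to be adjustable. The function name lfold and its second parameter (the initial value) are illustrative additions, not part of the script above.

```shell
#!/bin/bash
# lfold OP INIT: left-fold the integers on stdin with OP, starting from INIT.
# Both the name and the INIT parameter are hypothetical, for illustration only.
lfold() {
    local op="${1:-+}"     # operator; defaults to + when blank or missing
    local acc="${2:-0}"    # starting accumulator; pass 1 for multiplication
    local line
    while read -r line
    do
        # $op is substituted as text first, then the arithmetic is evaluated.
        # No filename expansion happens inside $(( )).
        acc=$((acc $op line))
    done
    echo "$acc"
}

# The operator is quoted so the calling shell does not expand it.
printf '2\n3\n4\n' | lfold '*' 1    # prints 24
printf '2\n3\n4\n' | lfold '-'      # prints -9 (0 - 2 - 3 - 4)
```

Taking the initial value as a parameter keeps this a plain left fold: the caller picks the identity that suits the operator (0 for +, 1 for *).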