Andreas Zeller, professor at Saarland University in Germany, apart from being on of the nicest people I've met so far this week, also had some very positive feedback on my research idea. He agreed with it's principle, and that research in this area would be valid. One of his first questions after I finished presenting my idea was about incentive for participation. I thought he was asking about how we would convince students and pros to give up their time to participate in our study, but this wasn't the case. he recounted results found at Saarland when teaching students to test their code. Traditional methods obviously didn't work effectively, so they switched tracks. Student assignments were assigned with a spec and a suite of unit tests that the students could use to validate their work as they progressed. Once nightly a second set of tests were run by the instructor, the source of which was kept secret, and the students were informed in the morning of the number of tests they passed and failed. To grade the assignment, a third set of (secret) tests were run. Students begin with a mark of 100%. The first failing test reduces their mark to 80%. The second, to 70%. The third to 60%. The grading continues in a similarly harsh fashion. The first assignment in the course is almost universally failed, as students drastically underestimate how thorough they need to be with their tests. Subsequent assignments are much better, and the students focus strongly on testing, and indeed collaborate by sharing test cases and testing each others assignments to eliminate as many errors as possible before the submission date. Once the students have the proper motivation to test, they eagerly consume any instruction in effective testing techniques. Andreas found that simple instruction in JUnit and maybe TDD was all that was required, and the rest the students figured out for themselves. This type of self-directed learning is encouraging, but the whole situation makes me think that these students may be working harder, but not smarter, to test their software. It may be possible that by providing instruction not only in the simple operation of an XUnit framework, but also in things like effective test case selection, they may be able to reach similar test coverage or defect discovery rates, while expending less frantic effort.
Thought: drop the first assignment from the course's final mark, as it is not used for evaluation as much as it is a learning experience (of course, don't tell the students this, otherwise the important motivational lesson will be lost).
In addition to this, Andreas, just as other folks I've spoken to so far, emphasized the need to keep my study narrowly focused on a single issue, as well as controlling the situation in a way that enables precise measurements to be made both before and after implementing the curriculum changes we hope to elicit from the study, to accurately determine the improvements (if any exist). I am beginning to see a loose correlation between a researcher's viewpoint regarding empiricism in SE research (positivist on the right and constructivist on the left) and their amount of emphasis on narrowness of focus. Often times, those people who advocate exploratory qualitative studies also recommend wider bands of observation while conducting these studies. This is likely an over generalization, and I apologize in advance to anyone who may disagree with this.
Monday, May 18, 2009
ICSE 09 - Day Two
Day Three of ICSE 09 was another great success! It began with a rousing keynote by Tom Ball of Microsoft Research fame. He described the early beginnings of version control and the emergence of mining techniques for these version systems, and continued the narrative through to the latest version of visual studio and the tool Microsoft calls CRANE, which suggests to developers area of the code which should be examined given that they are changing some other area. Pretty cool. I got to talk to him one-on-one about it over lunch.
On the previously blogged march through stanley park, I had a talk with Tom Ostrand from AT&T Labs in New Jersey. One thing I wanted to talk about (besides pitching my research idea) was leveraging the version history and reports generated from a test suite to predict bugs in target code using MSR techniques to augment the existing tools, such as mining deltas and bug trackers. This topic came up in the morning's MSR session, but aparently Tom missed it (making me look much smarter than can be measured in reality). He seemed interested in the possibility, but of course there are barriers to getting it going. Primarily, assuming that the dev team for the software we're analyzing is writing tests and putting them into the vcs, the error and coverage reports almost certainly aren't, making it difficult to to versioned history analysis over them. I thought that maybe, given the source code of the target software and the test code (both of which at some version), the MSR tool could build the code and execute the tests to create the needed reports. This will likely be extremely difficult to get going in the field, however, and make MSR mining, which is already an extremely expensive, long running process, even slower.
On the previously blogged march through stanley park, I had a talk with Tom Ostrand from AT&T Labs in New Jersey. One thing I wanted to talk about (besides pitching my research idea) was leveraging the version history and reports generated from a test suite to predict bugs in target code using MSR techniques to augment the existing tools, such as mining deltas and bug trackers. This topic came up in the morning's MSR session, but aparently Tom missed it (making me look much smarter than can be measured in reality). He seemed interested in the possibility, but of course there are barriers to getting it going. Primarily, assuming that the dev team for the software we're analyzing is writing tests and putting them into the vcs, the error and coverage reports almost certainly aren't, making it difficult to to versioned history analysis over them. I thought that maybe, given the source code of the target software and the test code (both of which at some version), the MSR tool could build the code and execute the tests to create the needed reports. This will likely be extremely difficult to get going in the field, however, and make MSR mining, which is already an extremely expensive, long running process, even slower.
Sunday, May 17, 2009
Farewell to the Borkenstocks
The time has finally come to retire the Canadian Tire brand cork bottomed sandals, so lovingly referred to as the 'Borkenstocks'. After a season in the closet and a plane ride to Vancouver, the sandals got their first taste of summer today, as they accompanied me on a walk through Stanley Park. The results of this were as follows:
blisters and uncomfortableness.
Feedback on my Research Proposal
Over the last 24 hours I've had the chance to talk to a few individuals about my research idea. It seems to go as follows: I give my 30 second pitch, the marks stares at me blankly. Then, I elaborate. The mark gets it. They tell me first a problem they see, and then what they'd like to see out of the results. In a couple of cases, the mark got particularly excited about the outcome. I've summarized below.
Chris Bird
Chris has a strong background in empirical software engineering, with particular emphasis on qualitative exploratory studies. As such, the quantitative analysis aspects of my pitch fell on deaf ears, but he was extremely interested in the in-lab observation sessions I was proposing. He felt that 5 or 6 actionable recommendations that might come out of observing professionals would be invaluable. He suggested that I find an older Microsoft Research study in which they trained new hires by having them monitor a screen shared by a senior developer for some amount of time (probably a few days), and had the new hire ask questions with the senior at a later date, by reviewing the video logs. I like this idea because it allows us to elicit the information from the developer directly, instead of trying to infer it ourselves, but since we're not bothering them during the initial testing session, we wouldn't be affecting their performance. This isn't without problems, though. Primarily, there is the risk that the subject isn't always sure why they do the things they do, and so are likely to invent reasons, or invent foresight where none necessarily exists. Interesting idea, though. Also, should we interview students in the same way? On one hand, they students likely don't have any special insights that we can leverage (assuming they are less effective at testing than pros). On the other hand, it may illuminate any areas of misconception or misunderstanding which we could address in future curriculum changes.
Jim Cordy
Jim is a professor at Queens, and was my instructor in my 4th year compilers course. After he heard my pitch, he had a warning about an affect he had seen in his industrial work, and it comes from a generational difference in the training of developers. Developers who were trained more than 15 or 20 years ago had a delay between changing the source code and the results of program execution on the order of hours; new developers are used to delays on the order of minutes. Also, the current state of the art in debugging utilizes interactive debuggers, which were either unavailable or unreliable in earlier days. This has lead to 'old-school' programmers to a) rely heavily on source code inspection and b) insert enormous amounts of instrumentation (debugging statements) when running the program becomes required. In comparison, new generation developers often use smaller amounts of instrumentation, relying on quick turn around times to find the cause of errors. In Jim's experience, the old school programmers were orders of magnitude more effective (in terms of bugs found or solved per hour) than younger programmers. If this is in fact the case, it should be an effect we can see if we recruit subjects trained during this era of computer science research.
Chris Bird
Chris has a strong background in empirical software engineering, with particular emphasis on qualitative exploratory studies. As such, the quantitative analysis aspects of my pitch fell on deaf ears, but he was extremely interested in the in-lab observation sessions I was proposing. He felt that 5 or 6 actionable recommendations that might come out of observing professionals would be invaluable. He suggested that I find an older Microsoft Research study in which they trained new hires by having them monitor a screen shared by a senior developer for some amount of time (probably a few days), and had the new hire ask questions with the senior at a later date, by reviewing the video logs. I like this idea because it allows us to elicit the information from the developer directly, instead of trying to infer it ourselves, but since we're not bothering them during the initial testing session, we wouldn't be affecting their performance. This isn't without problems, though. Primarily, there is the risk that the subject isn't always sure why they do the things they do, and so are likely to invent reasons, or invent foresight where none necessarily exists. Interesting idea, though. Also, should we interview students in the same way? On one hand, they students likely don't have any special insights that we can leverage (assuming they are less effective at testing than pros). On the other hand, it may illuminate any areas of misconception or misunderstanding which we could address in future curriculum changes.
Jim Cordy
Jim is a professor at Queens, and was my instructor in my 4th year compilers course. After he heard my pitch, he had a warning about an affect he had seen in his industrial work, and it comes from a generational difference in the training of developers. Developers who were trained more than 15 or 20 years ago had a delay between changing the source code and the results of program execution on the order of hours; new developers are used to delays on the order of minutes. Also, the current state of the art in debugging utilizes interactive debuggers, which were either unavailable or unreliable in earlier days. This has lead to 'old-school' programmers to a) rely heavily on source code inspection and b) insert enormous amounts of instrumentation (debugging statements) when running the program becomes required. In comparison, new generation developers often use smaller amounts of instrumentation, relying on quick turn around times to find the cause of errors. In Jim's experience, the old school programmers were orders of magnitude more effective (in terms of bugs found or solved per hour) than younger programmers. If this is in fact the case, it should be an effect we can see if we recruit subjects trained during this era of computer science research.
Saturday, May 16, 2009
ICSE 09 - Day One
We're wrapping up the first day of talks here at ICSE 2009. I've talked my way into the Mining Software Repositories (MSR) workshop. Here's a quick breakdown of some noteworthy points:
Keynote: Dr. Michael McAllister, Director of Academic Research Centers for SAP Business Objects
An hour and a half long talk in which he sells BI to the masses. Spoke a lot about integrating data silos, and providing an integrated, unified view of the data to business level decision makers. Also interesting anicdotes on how BI helped cure SARS. Forgot what OLAP stands for. Kind of concerning. This talk made me (after spending 2+ years working for a BI company) want a running example of what BI is and how it is used in the context of an expanding organization - from the point before any computational logistics assistance is required, and progressing forward to a Wallmart sized operation. Most examples I've seen start with a huge complex organization, complete with established silos, and then installs things like supply chain management, repository abstraction, customer relations management, document management, etc.
Mining GIT Repositories - presented the difficulties in mining data out of GIT (or DVCS systems in general) as opposed to traditional centralized systems like svn. Noteworthy items include high degree of branching and the lack of a 'mainline' of development.
Universal VCS - by looking for identical files in the repos of different projects, a single unified version control view is established for nearly all available software. Developed by creating a spider program which crawled the repos of numerous projects, downloading metadata and inferring links where appropriate.
Map Reduce - blah blah blah use idle PCs for quick pluggable clustering to chuck away on map reduce problems. Look @ Google MapReduce and Hardoop.
Alitheei Core - A software engineering research platform. Plugin framework for performing operations on heterogeneous repositories. Can define a new Metric by implementing an interface, and then evaluate the metric against all repositories in the framework. Look @ SQO OSS.
Many of these previous talks created tools for mining repositories, but with no greater purpose than that. When asked about this ('mining for the sake of mining'), none of the authors seemed to have a problem. The conclusion of the discussion was that this lack of purpose was a problem, and that the professional community should be surveyed to find out what needs they have for mining repos.
Research extensions/ideas:
The third MSR session today focused heavily on defect prediction. After showing off 3 or 4 methods of mining vcs systems to predict buggy code that improved prediction probability by 4% or 5%, the discussion boiled down to this, "What do managers/developers want in these reports to help them do their jobs?" Obviously, the room full of academics didn't have a definitive answer. One gentleman asked the question I had written down, which was "has anyone used the history and coverage of a software's test suite in combination with data from the VCS as a defect predictor (in theory, heavily tested areas are less likely to contain bugs)?" I found this particularly interesting. Also, I began to wonder, if we had one of these defect prediction reports, does it improve a developer's ability to find bugs, and if so, to what extent? Would it be measurable in the same way as I intend to measure testing ability with students and professionals?
Stay tuned for more info (and pictures of beautiful vancouver)!
Keynote: Dr. Michael McAllister, Director of Academic Research Centers for SAP Business Objects
An hour and a half long talk in which he sells BI to the masses. Spoke a lot about integrating data silos, and providing an integrated, unified view of the data to business level decision makers. Also interesting anicdotes on how BI helped cure SARS. Forgot what OLAP stands for. Kind of concerning. This talk made me (after spending 2+ years working for a BI company) want a running example of what BI is and how it is used in the context of an expanding organization - from the point before any computational logistics assistance is required, and progressing forward to a Wallmart sized operation. Most examples I've seen start with a huge complex organization, complete with established silos, and then installs things like supply chain management, repository abstraction, customer relations management, document management, etc.
Mining GIT Repositories - presented the difficulties in mining data out of GIT (or DVCS systems in general) as opposed to traditional centralized systems like svn. Noteworthy items include high degree of branching and the lack of a 'mainline' of development.
Universal VCS - by looking for identical files in the repos of different projects, a single unified version control view is established for nearly all available software. Developed by creating a spider program which crawled the repos of numerous projects, downloading metadata and inferring links where appropriate.
Map Reduce - blah blah blah use idle PCs for quick pluggable clustering to chuck away on map reduce problems. Look @ Google MapReduce and Hardoop.
Alitheei Core - A software engineering research platform. Plugin framework for performing operations on heterogeneous repositories. Can define a new Metric by implementing an interface, and then evaluate the metric against all repositories in the framework. Look @ SQO OSS.
Many of these previous talks created tools for mining repositories, but with no greater purpose than that. When asked about this ('mining for the sake of mining'), none of the authors seemed to have a problem. The conclusion of the discussion was that this lack of purpose was a problem, and that the professional community should be surveyed to find out what needs they have for mining repos.
Research extensions/ideas:
The third MSR session today focused heavily on defect prediction. After showing off 3 or 4 methods of mining vcs systems to predict buggy code that improved prediction probability by 4% or 5%, the discussion boiled down to this, "What do managers/developers want in these reports to help them do their jobs?" Obviously, the room full of academics didn't have a definitive answer. One gentleman asked the question I had written down, which was "has anyone used the history and coverage of a software's test suite in combination with data from the VCS as a defect predictor (in theory, heavily tested areas are less likely to contain bugs)?" I found this particularly interesting. Also, I began to wonder, if we had one of these defect prediction reports, does it improve a developer's ability to find bugs, and if so, to what extent? Would it be measurable in the same way as I intend to measure testing ability with students and professionals?
Stay tuned for more info (and pictures of beautiful vancouver)!
Monday, April 20, 2009
AeroPress and Number Theory
A warning to all who may use the coffee grinder in the SE lounge to make fine-grained coffee for use in an AeroPress: shaking the coffee grinder will blow the circuit breaker! It seems some of us don't realize that by applying rotating the axis of a spinning body, orthogonal to the plane of rotation, we are in fact applying a force against the angular momentum of said body. If this rotating device is powered by an electric motor, this causes the motor to draw more current to maintain its current speed. In short, don't shake the coffee grinder.
Also, I picked up the Annotated Turing again over the weekend, and read the first two chapters on number theory. I found this to be absolutely fascinating! Now, if you're already well versed in number theory, then these overviews may be redundant for you, but I thoroughly enjoyed it. Looking forward to what else is in this book.
Also, I picked up the Annotated Turing again over the weekend, and read the first two chapters on number theory. I found this to be absolutely fascinating! Now, if you're already well versed in number theory, then these overviews may be redundant for you, but I thoroughly enjoyed it. Looking forward to what else is in this book.
Sunday, April 19, 2009
Things on my Todo list
My current list of things I want/need to do:
Reading:
Petzold, "The Annotated Turing"
Homer, "The Odyssey"
Huth and Ryan, "Logic in Computer Science"
Tennant, "Specifying Software"
"Software Architecture, A Primer"
Gorton, "Essential Software Architecture"
Dickens, "Oliver Twist"
Writing:
2125 end of term summary paper
2130 empirical study paper
conceptual model of REST process & motivation
ERB paperwork
Coding:
Instrumented JUnit and PyUnit
Diplomacy relationship analyzer
Qualitative Coding Application
iPhone app for navigation and aviation
Custom Braid mod - or just re-implement the engine with Java 2D and OpenGL
Misc:
Taxes
Health Insurance Claim
Personal/professional website, portfolio, and cards. Would like these to have a unifying visual theme.
Bribe people on craigslist for sold out Johathan Coulton tickets
Plan Carmen's bachelor party, 3 camping trips, and a houseboat rental scheme.
Paperwork for windsurfing class.
Compare & choose sailing clubs for the summer
Finish Bluenose
Obviously, this list is waayyy to long to be reasonable. I think you can see where the priorities should go, however (finishing ERB paperwork > reading Oliver Twist :P ).
Reading:
Petzold, "The Annotated Turing"
Homer, "The Odyssey"
Huth and Ryan, "Logic in Computer Science"
Tennant, "Specifying Software"
"Software Architecture, A Primer"
Gorton, "Essential Software Architecture"
Dickens, "Oliver Twist"
Writing:
2125 end of term summary paper
2130 empirical study paper
conceptual model of REST process & motivation
ERB paperwork
Coding:
Instrumented JUnit and PyUnit
Diplomacy relationship analyzer
Qualitative Coding Application
iPhone app for navigation and aviation
Custom Braid mod - or just re-implement the engine with Java 2D and OpenGL
Misc:
Taxes
Health Insurance Claim
Personal/professional website, portfolio, and cards. Would like these to have a unifying visual theme.
Bribe people on craigslist for sold out Johathan Coulton tickets
Plan Carmen's bachelor party, 3 camping trips, and a houseboat rental scheme.
Paperwork for windsurfing class.
Compare & choose sailing clubs for the summer
Finish Bluenose
Obviously, this list is waayyy to long to be reasonable. I think you can see where the priorities should go, however (finishing ERB paperwork > reading Oliver Twist :P ).
Subscribe to:
Posts (Atom)