Saturday, May 16, 2009

ICSE 09 - Day One

We're wrapping up the first day of talks here at ICSE 2009. I've talked my way into the Mining Software Repositories (MSR) workshop. Here's a quick breakdown of some noteworthy points:

Keynote: Dr. Michael McAllister, Director of Academic Research Centers for SAP Business Objects
An hour-and-a-half-long talk in which he sells BI to the masses. Spoke a lot about integrating data silos and providing a unified view of the data to business-level decision makers. Also some interesting anecdotes on how BI helped cure SARS. Forgot what OLAP stands for. Kind of concerning. This talk made me (after spending 2+ years working for a BI company) want a running example of what BI is and how it is used in the context of an expanding organization: starting from the point before any computational logistics assistance is required, and progressing forward to a Walmart-sized operation. Most examples I've seen start with a huge, complex organization, complete with established silos, and then install things like supply chain management, repository abstraction, customer relations management, document management, etc.


Mining Git Repositories - presented the difficulties in mining data out of Git (or distributed version control systems in general) as opposed to traditional centralized systems like SVN. Noteworthy items include the high degree of branching and the lack of a 'mainline' of development.

Universal VCS - by looking for identical files in the repos of different projects, a single unified version control view is established for nearly all available software. Developed by creating a spider program which crawled the repos of numerous projects, downloading metadata and inferring links where appropriate.
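
To make the idea concrete, here is a rough sketch of how links between repos could be inferred from identical files. This is my own guess at the general shape, not the authors' tool: the `repos` layout (repo name mapped to path mapped to file contents) and the function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def content_hash(data):
    """Fingerprint a file by its contents, ignoring its name and path."""
    return hashlib.sha1(data).hexdigest()


def infer_links(repos):
    """Count identical files shared by each pair of repos.

    `repos` maps a repo name to {path: file contents as bytes}. This
    layout is hypothetical; a real crawler would work from metadata.
    """
    owners = defaultdict(set)  # content hash -> repos containing that content
    for repo, files in repos.items():
        for data in files.values():
            owners[content_hash(data)].add(repo)

    links = defaultdict(int)  # (repo_a, repo_b) -> number of shared files
    for shared_by in owners.values():
        for pair in combinations(sorted(shared_by), 2):
            links[pair] += 1
    return dict(links)
```

Matching on content rather than path means a file that was copied and renamed between projects still produces a link.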

Map Reduce - blah blah blah use idle PCs for quick pluggable clustering to chug away on map reduce problems. Look @ Google MapReduce and Hadoop.
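
The map/shuffle/reduce shape itself is small enough to sketch on a single machine. The version below runs sequentially and is not how Hadoop or the talk's system does it; the commit record shape `(author, message)` and all function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run(records, mapper, reducer):
    """Chain the three phases; a real cluster would spread these over nodes."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def commits_by_author(commit):
    # Hypothetical record shape: (author, message)
    author, _message = commit
    yield author, 1


def total(_key, values):
    return sum(values)
```

Calling `run(commits, commits_by_author, total)` on a list of such records gives a commit count per author. Because each mapper call only sees one record, the map phase is the part that can be farmed out to idle machines.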

Alitheia Core - A software engineering research platform. Plugin framework for performing operations on heterogeneous repositories. Can define a new metric by implementing an interface, and then evaluate the metric against all repositories in the framework. Look @ SQO-OSS.
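
The "implement an interface, run it everywhere" pattern looks roughly like this. To be clear, this is not Alitheia Core's actual API, just a minimal sketch of the plugin idea: the `Metric` class, the `evaluate` method, and the repo dictionaries are all hypothetical.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Hypothetical plugin interface; not Alitheia Core's real API."""

    name = "unnamed"

    @abstractmethod
    def evaluate(self, repo):
        """Return a number describing one repository."""


class CommitCount(Metric):
    name = "commit_count"

    def evaluate(self, repo):
        return len(repo["commits"])


class MeanMessageLength(Metric):
    name = "mean_message_length"

    def evaluate(self, repo):
        commits = repo["commits"]
        if not commits:
            return 0.0
        return sum(len(message) for message in commits) / len(commits)


def evaluate_all(metrics, repos):
    """Run every metric against every repository: {repo: {metric: value}}."""
    return {
        repo_name: {metric.name: metric.evaluate(repo) for metric in metrics}
        for repo_name, repo in repos.items()
    }
```

Adding a new metric means writing one more subclass; `evaluate_all` never has to change.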

Many of these talks created tools for mining repositories, but with no greater purpose than that. When asked about this ('mining for the sake of mining'), none of the authors seemed to see a problem with it. The conclusion of the discussion, however, was that this lack of purpose is a problem, and that the professional community should be surveyed to find out what needs they have for mining repos.

Research extensions/ideas:
The third MSR session today focused heavily on defect prediction. After showing off 3 or 4 methods of mining version control systems to predict buggy code that improved prediction probability by 4% or 5%, the discussion boiled down to this: "What do managers/developers want in these reports to help them do their jobs?" Obviously, the room full of academics didn't have a definitive answer. One gentleman asked the question I had written down, which was "has anyone used the history and coverage of a project's test suite in combination with data from the VCS as a defect predictor (in theory, heavily tested areas are less likely to contain bugs)?" I found this particularly interesting. Also, I began to wonder: if we had one of these defect prediction reports, would it improve a developer's ability to find bugs, and if so, to what extent? Would it be measurable in the same way as I intend to measure testing ability with students and professionals?

Stay tuned for more info (and pictures of beautiful Vancouver)!

Monday, April 20, 2009

AeroPress and Number Theory

A warning to all who may use the coffee grinder in the SE lounge to make fine-grained coffee for use in an AeroPress: shaking the coffee grinder will blow the circuit breaker! It seems some of us don't realize that by tilting the axis of a spinning body (the axis orthogonal to its plane of rotation), we are in fact applying a torque against the angular momentum of said body. If this rotating device is powered by an electric motor, this causes the motor to draw more current to maintain its speed. In short, don't shake the coffee grinder.

Also, I picked up the Annotated Turing again over the weekend, and read the first two chapters on number theory. I found this to be absolutely fascinating! Now, if you're already well versed in number theory, then these overviews may be redundant for you, but I thoroughly enjoyed it. Looking forward to what else is in this book.

Sunday, April 19, 2009

Things on my Todo list

My current list of things I want/need to do:

Reading:
Petzold, "The Annotated Turing"
Homer, "The Odyssey"
Huth and Ryan, "Logic in Computer Science"
Tennant, "Specifying Software"
"Software Architecture, A Primer"
Gorton, "Essential Software Architecture"
Dickens, "Oliver Twist"

Writing:
2125 end of term summary paper
2130 empirical study paper
conceptual model of REST process & motivation
ERB paperwork

Coding:
Instrumented JUnit and PyUnit
Diplomacy relationship analyzer
Qualitative Coding Application
iPhone app for navigation and aviation
Custom Braid mod - or just re-implement the engine with Java 2D and OpenGL

Misc:
Taxes
Health Insurance Claim
Personal/professional website, portfolio, and cards. Would like these to have a unifying visual theme.
Bribe people on craigslist for sold out Jonathan Coulton tickets
Plan Carmen's bachelor party, 3 camping trips, and a houseboat rental scheme.
Paperwork for windsurfing class.
Compare & choose sailing clubs for the summer
Finish Bluenose

Obviously, this list is waayyy too long to be reasonable. I think you can see where the priorities should go, however (finishing ERB paperwork > reading Oliver Twist :P ).

Tuesday, March 31, 2009

REST in Django

So I'm now officially a Google Summer of Code mentor for the Django open source group! w00t! Now all I have to do is pull off all those cool ideas I came up with earlier, as well as have a group of open source developers all agree that they are as cool as I think they are (which could be easier said than done).

While brushing my teeth this morning, I was thinking "I wonder what sort of empirical study I could pull out of this situation? What sort of research is there to be done in the field of REST? What can I learn from this experience for academic purposes, not just my own?"

Ideas? I'm going to mull it over at the gym.

Tuesday, March 24, 2009

Google and Space and Time


Has Google finally solved that whole space-time-bendy problem? Observe:

Sunday, March 22, 2009

Reading Last Week

Seaman: Qualitative Methods
  • Description of ways in which qualitative methods can be used in conjunction with a positivist stance. Tools include observational studies and interviews.
  • Interesting tradeoff: amount of data collected in an interview vs. amount of interviewee's time used vs. amount of direction in interview. Often, the importance/implication of the data collected isn't known for a long time after the interview.
  • When conducting an interview, stress that it is not an evaluation. There are no 'right' or 'wrong' answers.

Cohen: Statistical Power Analysis
  • Examines the importance of analyzing and reporting the power of a statistical relation in empirical research.
  • The author proposes 0.80 as a sound target power, as it yields feasible sample sizes for given values of alpha and effect size (ES).
  • Not a very understandable piece of literature, at least from my perspective.
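
Working through the arithmetic helped me more than the prose did. Below is a small sketch of the relationship between alpha, effect size, power, and sample size, using the textbook normal approximation for a two-sided, two-sample comparison of means. It is not taken from Cohen's paper, and the function name `n_per_group` is my own.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample test of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size. Exact t-based tables give
    slightly larger values, so treat this as a lower-end estimate.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

With the defaults (alpha = 0.05, power = 0.80), a medium effect of d = 0.5 comes out at 63 subjects per group under this approximation, while a small effect of d = 0.2 needs 393, which shows how quickly the required sample grows as the effect shrinks.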

Rosenthal and DiMatteo: Meta-Analysis: Recent Developments in Quantitative Methods for Literature Reviews
  • A good introduction to meta-analysis (that is, analyzing the results of many studies/experiments to test a hypothesis, instead of directly testing subjects yourself).
  • Interesting points about making sure the studies in your meta-analysis are independent (if meta-analyzing multiple studies from the same research group, subjects and/or data may be reused, and so the results may overlap).
  • Also interesting discussion of inherent bias in meta-analysis, arising in the form in which the experimentalists choose to include/exclude studies from their sample space.
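
As a reminder to myself of what "combining studies" means mechanically, here is a sketch of fixed-effect, inverse-variance pooling. This is one common technique and not necessarily the one the paper emphasizes; the function name and the `(effect, standard_error)` input format are assumptions.

```python
from math import sqrt


def fixed_effect_pool(studies):
    """Pool (effect, standard_error) pairs using inverse-variance weights.

    Assumes the studies are independent and estimate one common effect.
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for _effect, se in studies]
    total_weight = sum(weights)
    pooled = sum(
        weight * effect for weight, (effect, _se) in zip(weights, studies)
    ) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

The independence assumption is doing real work here: if two studies reuse the same subjects, their weights are effectively counted twice and the pooled standard error comes out too small.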

Card: Ender's Game
  • Enjoyable novel about a young boy who is called upon to train as a military commander to protect the earth from the threat of alien invasion.
  • Good character development, although the author lays on the bloodlust and homo-eroticism a bit thick.
  • As a consequence, the sci-fi aspects of the story seemed secondary to the character plot, even tacked on in some places.
  • Generally, I liked it, but probably wouldn't invest the time to read the 5 or 6 sequels.

Modeling Topic

After a couple months of soul searching, the students enrolled in John Mylopoulos' Conceptual Modeling course are narrowing down ideas on what to model. The requirements of the project (to 'model something') left it fairly open, but perhaps too open to be immediately tractable. However, Michalis, Alicia, and I are closing the gap. I'm leaning toward modeling certain aspects of the REST web service framework I've been working on for the last little while in CSC 2125.

Obviously, system descriptions and class diagrams are uninteresting, as John pointed out earlier in the term, but modeling the requirements of what such a system should do is not. I'm looking into Tropos, but I'm not really sure it applies. However, we can probably do a goal model illustrating the motivation and principles of REST, contrasted with those of WS-*. Also, a use case diagram describing the interface to a general ROA service could be created. Finally, we could model certain pieces of important logic, such as a delayed GET, using a description logic syntax.