Rory Tulk's Blog: 2008

Monday, December 29, 2008

Relaxing Christmas Break

My relaxing Christmas break is coming to an end, and not a moment too soon. Just to recap:

Spent the first few days adjusting to the noise level in my house, which I had apparently forgotten about
Had 5 (five) wisdom teeth removed. Important note: always get your pain medication before the Novocaine wears off. If the wait time at the pharmacy is longer than an hour, get the doctor to call in the order ahead of time. Most unpleasant two hours ever.
Acquired 1 (one) head cold from either family members at the Christmas Eve party, or mobs on Boxing Day. Just in case an aching, pounding pain in your jaw bone isn't enough, here are two more aching, pounding pains: one in your head, and one in your throat!
Sub-par productivity levels, which I'm blaming on the codeine.
Modem feels the need to reset/crash about once every 5 minutes

Anyway, it will be nice to be back in Toronto again.

Sunday, December 21, 2008

Spent some quality procrastination time this afternoon migrating all of my disparate Google services to be owned by my shiny new Gmail account. Simply switching the owning account wasn't an option, so I had to create a new calendar, reader, and blog with my Gmail account, export everything into XML files, then import them into the new services. Everything looks pretty much seamless, with the exception of this blog, which now has two authors, both named Rory Tulk. Oh well, can't win 'em all, I guess.

Friday, November 7, 2008

Power Laws in Software

A good paper explaining power laws and the 80-20 observation. Some of the correlations between the distributions and the actual data are a bit loose, but the authors provide goodness-of-fit measures (r squared) for all datasets presented, and in some cases point out poor fits themselves. Overall, very helpful.
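To make the r squared figures concrete, here is a minimal Python sketch of the usual quick check for a power law: fit log(value) against log(rank) by least squares, read the exponent off the slope, and report r squared on the log-log scale. This is my own illustration, not necessarily the method used in the paper (log-log least squares is known to be a fairly crude estimator), and `fit_power_law` and `top_share` are names I made up.

```python
import math

def fit_power_law(ranks, values):
    """Least-squares fit of log(value) = log(C) - alpha * log(rank).

    Returns (C, alpha, r_squared); r_squared is measured on the log-log scale.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(intercept), -slope, 1.0 - ss_res / ss_tot

def top_share(values, fraction=0.2):
    """Share of the total held by the largest `fraction` of items (the 80-20 check)."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)
```

With synthetic data such as `values = [1000 * r ** -1.5 for r in range(1, 101)]`, the fit recovers C = 1000 and alpha = 1.5 with r squared essentially 1; real software data will, as the paper admits, fit less cleanly.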

Couple of tangent thoughts while reading this:

Static dependency analysis with Java reflection sounds like fun :)

"... only a few reusable components can be reused profitably" [Glass 1998]. Look into this further: does this correlate with the previous paper on code reuse?

Jogging Over a Distance – Supporting a “Jogging Together” Experience Although Being Apart

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/09-mueller.pdf

The authors have created a simple audio communication system for use by joggers. As reported in a survey they conducted, joggers enjoy the company of others while jogging, for conversation and encouragement. However, finding a jogging partner of similar ability and availability is not always possible. To this end, the Jogging Over a Distance system links two joggers who are not physically co-located. Each jogger is equipped with a cell phone, wireless headset, and small computing unit. Joggers hear their partner's voice through the headset, processed so that it seems localized in space around them based on their relative speed (i.e. if their partner is faster, they will sound farther ahead).

Jogging Over a Distance insightfully targets a market of users who seem very receptive to this kind of device. As the study points out, 54% of surveyed participants said they engage in conversation while jogging. The proposed design also retains all the mentioned benefits of conversation while jogging (socializing, motivation, fun, encouragement) while lowering the barriers presented by running in pairs (differing pace, differing geographic location). In addition, the 3D audio positioning is an extremely creative, novel way to indicate the relative speed of the user's jogging partner. This clearly seems to be influenced by research in the area of ambient/passive displays.

I feel there are several areas in which this system could be improved, or concerns addressed. The device's form factor seems overly cumbersome, especially given the wide availability of mobile phones with adequate computing power for managing both calls and audio positioning. The authors point this out as an issue to be addressed, but I wonder why it wasn't a hard requirement from the start. Also, in relation to the audio positioning, I was surprised to read that off-the-shelf quadraphonic headphones were unable to give users proper discrimination between sounds in front of and behind them. The proposed solution, tilting the forward axis to a 1:30-7:30 orientation, seems undesirable. It would be interesting if the authors could study this orientation with respect to the routes runners take, to see if it leads them on a series of gradual right-hand turns (as one runner follows the virtual runner in front of them). Also, if the audio location features were omitted, how would this technology differ from a simple cell phone and headset? In that case, the merit of the authors' work is not the technology, but the sociological study they can perform with it.
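To picture how the relative-speed mapping and the tilted axis might combine, here is a hypothetical Python sketch. The paper's actual mapping is not described in this post, so the linear `gain`, the clamping distance and the function name are all assumptions; the only ideas taken from the text are "a faster partner sounds farther ahead" and the 1:30-7:30 axis.

```python
import math

def partner_position(own_speed, partner_speed, gain=10.0, max_distance=20.0,
                     axis_clock=1.5):
    """Place the partner's voice on a virtual axis around the listener (hypothetical).

    own_speed, partner_speed: metres per second.
    gain: metres of virtual offset per m/s of speed difference (assumed value).
    max_distance: clamp so a much faster partner doesn't vanish into the distance.
    axis_clock: clock position of the 'ahead' direction
                (12.0 = straight ahead, 1.5 = the 1:30 orientation).
    Returns (x, y) in metres: x to the listener's right, y straight ahead.
    """
    # A faster partner sounds farther ahead; a slower one falls behind.
    offset = gain * (partner_speed - own_speed)
    offset = max(-max_distance, min(max_distance, offset))
    # Each clock hour is 30 degrees, measured clockwise from straight ahead.
    angle = math.radians((axis_clock % 12) * 30.0)
    return offset * math.sin(angle), offset * math.cos(angle)
```

With `axis_clock=1.5`, a faster partner is placed ahead and to the right, and a slower one behind and to the left, which is exactly why I wonder whether runners would drift into right-hand turns.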

This brings me to my next line of thought, which concerns the marketability of this 'device'. If a jogger typically jogs for an hour once a week, and uses Jogging Over a Distance to effectively make a one-hour phone call to someone who is too far away to run with them physically, will this not present a large associated cost for the jogger? For research purposes this is not an issue, but if it were to become a consumer product, this would have to be solved.

Monday, October 27, 2008

The Familiar Stranger: Anxiety, Comfort, and Play in Public Places

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/08-paulos.pdf

The authors present the results of two studies conducted to determine the extent to which the Familiar Stranger concept still exists in modern urban settings, and whether or not a ubiquitous social networking solution could leverage this effect in a useful way. The first study faithfully recreates Stanley Milgram's 1972 Familiar Stranger study, in which photos of a light-rail station at morning rush hour are distributed to individuals at the station, and familiar strangers are identified by having people label the individuals in the photo. The second study consisted of an urban walking tour, in which participants indicated their level of familiarity and comfort in certain locations based on four dimensions: the number of familiar people in the area, the degree of familiarity with those people, whether familiar people have visited this place before, and whether the people currently here visit the same places I do. Using these results, the authors propose Jabberwocky, a device used to tag familiar items, locations, and individuals. In this system, Bluetooth-connected devices (base stations, cell phones, and iMotes) communicate to give the user a measure of the familiarity of their current location.
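The post does not go into how Jabberwocky actually computes familiarity, so the following Python sketch is purely hypothetical: it remembers on which days each Bluetooth device ID was seen and scores a scan by the share of visible devices that were seen on at least `min_days` earlier days. The class name, the day-based bookkeeping and the threshold are my own assumptions.

```python
from collections import defaultdict

class FamiliarityTracker:
    """Hypothetical Jabberwocky-style familiarity score (not the paper's algorithm)."""

    def __init__(self, min_days=3):
        self.min_days = min_days
        self.days_seen = defaultdict(set)  # device ID -> set of day numbers

    def record_scan(self, day, device_ids):
        """Remember that each device in this Bluetooth scan was seen on `day`."""
        for device_id in device_ids:
            self.days_seen[device_id].add(day)

    def familiarity(self, day, visible_ids):
        """Fraction of visible devices seen on at least `min_days` other days."""
        visible = set(visible_ids)
        if not visible:
            return 0.0
        familiar = sum(
            1 for d in visible
            if len(self.days_seen.get(d, set()) - {day}) >= self.min_days
        )
        return familiar / len(visible)
```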

I found this paper particularly informative from a purely sociological standpoint. Although the concepts presented cannot be wholly attributed to the authors (Milgram's study was novel in 1972, but not in 2008), they are nonetheless insightful. One noteworthy design constraint imposed by the authors is that any ubiquitous device which functions based on familiar strangers must not encourage explicit interaction with said strangers. The authors argue that the existence of familiar strangers is an indicator of a healthy urban community, not a negative or anti-social aspect. Further observations on people's behavior, such as frequently checking one's cell phone in unfamiliar settings, add depth to the paper.

Most of the limitations I found in this study relate to the technical aspects of the proposed device. The Jabberwocky platform seems applicable to most of the situations described in the paper, but using the iMotes to tag static objects or locations seems somewhat impractical. For example, leaving these devices attached to public structures raises questions about the cost of the device and the possibility of theft. Perhaps a less physically obtrusive solution for tagging familiar locations would be to submit the GPS coordinates of the location to a central server, which could be queried at the same time and with the same frequency as the Bluetooth polling.
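A rough Python sketch of the GPS alternative suggested above, assuming an in-memory stand-in for the central server (`LocationTagServer` is a hypothetical name) and the standard haversine great-circle distance. The polling loop that would call `nearby` alongside each Bluetooth scan is left out.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

class LocationTagServer:
    """Hypothetical central store of user-tagged familiar places."""

    def __init__(self):
        self.tags = []  # list of (name, lat, lon)

    def tag(self, name, lat, lon):
        self.tags.append((name, lat, lon))

    def nearby(self, lat, lon, radius_m=100.0):
        """Names of tagged places within radius_m metres of the given position."""
        return [name for name, t_lat, t_lon in self.tags
                if haversine_m(lat, lon, t_lat, t_lon) <= radius_m]
```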

Augmenting the Social Space of an Academic Conference

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/08-mccarthy.pdf

The authors present the results of deploying two proactive displays at an academic conference: AutoSpeakerID (ASID) and Ticket2Talk (T2T). Both systems rely on RFID tags physically embedded in conference attendees' badges, each paired with a profile containing personal affiliation information and a photograph. The ASID display consists of an RFID reader embedded in a microphone stand and an accompanying large display. When an attendee approaches the microphone to ask a question, their information is rendered on the display, providing context for their question. The T2T system has a similar configuration, in that it has a display which renders an attendee's profile when they come within range of it. However, T2T is installed at refreshment stations to promote personal interactions between attendees.

The novel element presented in this paper, as the authors point out, is the close focus on the evaluation of these devices. The researchers examine each system thoroughly, gathering qualitative observational and questionnaire data. These results were used to gauge the systems' performance in the areas of Enhancing the Feeling of Community, Mesh with Established Practices, and Privacy Concerns. Some unexpected, yet somewhat beneficial, results were produced when users attempted to 'game' the system by providing falsified, comical profiles (e.g. the Bill Gates profile).

Although the authors focused on qualitative analysis of their displays, I think further investigation is required before drawing definitive conclusions about the systems' utility. This is clearly a difficult domain to measure, but it is widely recognized that user surveys/questionnaires can skew results. For example, the survey for the Ticket2Talk system reported 41% positive feedback and 3% negative feedback, with 66% of the attendees unaccounted for. If we consider a variation of self-selecting respondents, where participants who found the system very useful were motivated to fill out the questionnaire and those who strongly disliked it were motivated to distance themselves from it, it would not be a stretch to propose that most of the unaccounted attendees had a negative view of the system, in which case the results of the questionnaire would be invalid.
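A quick best-case and worst-case bound shows how much room the unaccounted attendees leave. The Python sketch below uses hypothetical counts (not the paper's data) and simply assumes every non-respondent went one way or the other; `response_bounds` is a name I made up.

```python
def response_bounds(positive, negative, total):
    """Bounds on the true share of positive opinions when only some people respond.

    positive, negative: counts of respondents with each opinion.
    total: number of people who could have responded.
    Returns (observed_share, lower_bound, upper_bound) as fractions.
    """
    missing = total - positive - negative
    if missing < 0:
        raise ValueError("more responses than people")
    observed = positive / (positive + negative)  # share among respondents only
    lower = positive / total                     # every non-respondent was negative
    upper = (positive + missing) / total         # every non-respondent was positive
    return observed, lower, upper
```

With 40 positive and 4 negative responses out of 100 people, respondents look about 91% positive, yet the true positive share could be anywhere from 40% to 96%.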

A Taxonomy of Ambient Information Systems: Four Patterns of Design

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/07-a-pousman.pdf

This paper presents the current state of the art in ambient or peripheral information displays. The authors propose four dimensions on which the currently available ambient systems can be measured: information capacity, notification level, representation fidelity, and aesthetic emphasis. Information capacity measures the number of information sources a device can display. Notification level indicates the degree to which the system will interrupt the user, or demand their attention. Representation fidelity measures the level of abstraction in the data representation. Finally, aesthetic emphasis measures how important aesthetics are to the device's designers. Based on these four dimensions, the authors propose four patterns of design in this domain: Symbolic Sculptural Displays, Multiple Information Consolidators, Information Monitor Displays, and High Throughput Textual Displays.

The authors do a very nice job of describing the current areas of research and development in ambient and peripheral displays. The exercise of classifying current projects on their four selected dimensions is quite insightful, and serves greatly to provide organization and structure to the field.

This paper is lacking in a tangible contribution to knowledge, however. The paper's introduction mentions at least one other existing method of categorizing ambient and peripheral displays, and I can see nothing indicating that the newly proposed classification system has any advantage over existing methods. Also, these patterns of design have been used to classify existing projects, but how could they be used to facilitate the creation of new products, in much the same way that OOP design patterns are? Finally, it could be suggested that the four patterns proposed are insufficient to categorize all possible ambient displays, since they cannot be applied to every device in the paper's own sample. Perhaps more than four patterns can be extracted from the four dimensions of classification proposed by the authors, and perhaps there are even anti-patterns to be found within these four dimensions that would result in poor ambient displays.
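To make the coverage question concrete, here is a hypothetical Python sketch that treats each pattern as a prototype point in the four-dimensional space and assigns a display to the nearest one, refusing when nothing is close enough. The 1-5 scale, the prototype scores and the distance threshold are invented for illustration and are not taken from the paper.

```python
import math

# Order of coordinates: (information capacity, notification level,
#                        representation fidelity, aesthetic emphasis).
# Hypothetical prototype scores on an assumed 1-5 scale, NOT from the paper.
PATTERNS = {
    "Symbolic Sculptural Display": (1, 1, 1, 5),
    "Multiple Information Consolidator": (5, 3, 4, 2),
    "Information Monitor Display": (3, 3, 3, 3),
    "High Throughput Textual Display": (2, 1, 5, 1),
}

def classify(scores, max_distance=2.0):
    """Return the nearest pattern name, or None if no prototype is within max_distance."""
    best_name, best_dist = None, math.inf
    for name, proto in PATTERNS.items():
        dist = math.dist(scores, proto)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None
```

A display scored (1, 5, 1, 1), i.e. abstract, highly interruptive and unattractive, lands at least 4 units from every prototype and comes back unclassified, which is the kind of region where an anti-pattern might live.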

Heuristic Evaluation of Ambient Displays

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/07-mankoff.pdf

In this paper, the authors present a method for evaluating ambient displays using a technique similar to Nielsen's Heuristic Evaluation. A set of heuristics specifically tuned for ambient displays, hereafter referred to as the Ambient Heuristics, is created by modifying the Nielsen heuristics. Two evaluation groups are then formed to apply heuristic evaluation to two novel ambient displays, both developed by the authors: busMobile and daylight display. BusMobile is a simple ambient display which shows the locations of campus buses relative to the building in which the display is placed. Daylight display uses a lamp to convey the brightness level outside to users in a lab with no windows. The Ambient Heuristics proved better than the Nielsen heuristics at finding severe usability issues.

The authors have presented an insightful tool for evaluating the usability of ambient displays. This revised set of heuristics provides a cheap, effective way for researchers to evaluate their ambient display products.

I feel that this study would be more valid if the Ambient Heuristics were applied to more than just the two displays created by the authors. There are a number of existing products/projects that could have been used as samples in this study. This would both fulfill one avenue of future work and reduce the authors' costs, because they would not have to develop their own displays simply for the purpose of testing the evaluation scheme (unless, that is, the display devices were existing projects). In addition, it would be interesting to see whether the margin between the number of issues found with the Ambient Heuristics and with the Nielsen heuristics holds for larger sets of data (i.e. more than 30 possible issues). If the gap were to widen, it would imply the Ambient Heuristics have an advantage for finding issues in this domain. However, it then seems counterintuitive that the Nielsen heuristics would be capable of finding issues that the Ambient Heuristics are not. Perhaps the heuristics need to be adjusted so that they find a superset of all issues found with the Nielsen set.
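The superset question boils down to set arithmetic over the issues each group found. A minimal Python sketch with made-up issue IDs; the function name is my own.

```python
def compare_heuristic_sets(found_by_ambient, found_by_nielsen):
    """Compare the issue IDs uncovered by each set of heuristics."""
    ambient, nielsen = set(found_by_ambient), set(found_by_nielsen)
    return {
        "both": ambient & nielsen,
        "ambient_only": ambient - nielsen,
        # Issues the Ambient Heuristics would need to catch to form a superset.
        "nielsen_only": nielsen - ambient,
        "is_superset": ambient >= nielsen,
    }
```

Anything left in `nielsen_only` is a candidate for adjusting the Ambient Heuristics.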

Thursday, October 16, 2008

Hardcore SE Papers

Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity and Zipf's Law
http://www.cs.toronto.edu/~gvwilson/reading/veldhuizen-libraries-reuse.pdf

This paper presents an interesting argument, stating that the entropy of a given problem domain can be measured, and in so doing we can predict the amount of library reuse that is appropriate, or indeed possible, for programs in that domain. Also, a handy proof states that the only complete library for a given domain is of infinite size, effectively securing jobs for library writers in the future.
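For intuition about what the entropy of a domain means, here is the textbook Shannon entropy of a sequence of tokens (say, the library calls made by a program) in Python. This is only the basic formula, not Veldhuizen's full argument, which also brings in Kolmogorov complexity; the function name is my own.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Entropy, in bits per token, of the empirical distribution of `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A program that keeps calling the same few routines scores low, which matches the intuition that a small library can cover a low-entropy domain.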

Unfolding Abstract Datatypes
http://www.cs.toronto.edu/~gvwilson/reading/gibbons-unfolding-adt.pdf

This paper should have been called "Unfolding Abstract Datatypes in Functional Programming", so that I would have had fair warning before I started reading it, and by reading I mean staring at the pages looking for something I understood. Here's what I got out of it: in functional languages, ADTs are possible, but less commonly implemented than in OOP languages. The primary reason is that most user-defined types expose their data structures so that you can do pattern matching on them. The paper argues that this is bad, and that proper information hiding can be obtained without losing the ability to match. Also, ADTs represent codata (whatever that is).
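For my own benefit, here is the rough idea in Python rather than the paper's Haskell. Codata, as far as I can tell, is the dual of ordinary data: structures that may be infinite and are defined by how you observe them rather than by constructors you can pattern match on. An unfold builds such a structure from a hidden seed. The names `unfold`, `Naturals` and `take` are my own, and this only gestures at what the paper formalizes.

```python
from itertools import islice
from typing import Callable, Iterator, Optional, Tuple, TypeVar

S = TypeVar("S")  # hidden seed/state type
A = TypeVar("A")  # element type

def unfold(step: Callable[[S], Optional[Tuple[A, S]]], seed: S) -> Iterator[A]:
    """Produce elements from a seed until `step` returns None (possibly never)."""
    state = seed
    while True:
        result = step(state)
        if result is None:
            return
        value, state = result
        yield value

class Naturals:
    """An abstract infinite stream: callers can observe elements, not the representation."""

    def __init__(self, start: int = 0):
        self._start = start  # hidden; there is no constructor to pattern match on

    def __iter__(self) -> Iterator[int]:
        return unfold(lambda n: (n, n + 1), self._start)

def take(n: int, iterable) -> list:
    """Observe the first n elements of a possibly infinite stream."""
    return list(islice(iterable, n))
```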

Tuesday, October 14, 2008

Asking and Answering Questions during a Programming Change Task

http://www.cs.toronto.edu/~gvwilson/reading/sillito-questions-program-change.pdf

This paper presents a study in which programmers (students and professionals) worked while thinking out loud; the questions they asked themselves (and their debugger) were then categorized. These fall into 44 distinct categories, under 4 main groups. Each category is then analyzed to see whether existing tools can directly answer the question posed.

I found my mind wandering around while I was 'reading' this paper. Not that the subject matter is uninteresting, far from it. I found myself coming up with ideas for new tools all throughout this read, which eventually started to detract from the primary text itself. Anyway, the following are some thoughts:

There is mention of how programmers divide their workspace to show, for example, code and the executing program, or two code files, by using Emacs screen splitting, multiple windows, or multiple monitors. I wonder what the results would be in a study where we a) measure a programmer's productivity with one monitor, then b) add a second monitor (I'm pretty sure this has been done before), allow them to get used to it (productivity should plateau), then remove the second monitor. I predict that productivity will drop below that measured in a) for a while, then gradually come back to a nominal level.

Of the two groups studied (students and professionals), both had a single category of question (of 44 possible) that was asked vastly more often than all others. For students, this was "Where is this method called or type referenced?". For professionals, this was "What will be (or has been) the direct impact of this change?". A couple of things come out of this.

First off, students seem more concerned with direct program behavior or structure, while professionals are concerned with the impacts of code changes. This seems like much more organization-oriented behavior. I'm having trouble expressing my exact idea here, so I'll come back to it. The bottom line is that professionals are less hack-and-slash than students.

Secondly, there are tools for addressing the students' question. Why aren't they using them? The tools for the professionals' questions, however, are lacking. Can we make them better?

Exemplar-driven documentation. There is discussion in this paper about finding examples of the type of operation one is trying to create or modify within the subject code base, and using them as templates for the new feature/modification. I wonder if this could be applied not only to the target code base, but to any code base (or indeed every code base). Let's say, for example, I want to implement a convolution matrix to do a Gaussian blur over a Java BufferedImage. Imagine I had a search engine that would search the code of a vast number of open source code bases with a natural language query, and return code snippets of convolution matrices over BufferedImages. Useful? I dunno, just had to write it down before I forgot it.
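As a stand-in for the kind of snippet such a search engine might return, here is a small plain-Python Gaussian blur over a 2-D list of grey levels. It is an illustrative sketch, not BufferedImage code; in Java, the standard library's `java.awt.image.ConvolveOp` with a `java.awt.image.Kernel` would be the usual route. The radius, sigma and clamp-to-edge policy are my own choices.

```python
import math

def gaussian_kernel(radius, sigma):
    """(2*radius+1) x (2*radius+1) Gaussian weights, normalised to sum to 1."""
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-radius, radius + 1)]
           for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

def convolve(image, kernel):
    """Convolve a 2-D list of grey levels; pixels outside the image are clamped to the edge."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    sy = min(max(y + ky, 0), h - 1)
                    sx = min(max(x + kx, 0), w - 1)
                    acc += image[sy][sx] * kernel[ky + r][kx + r]
            out[y][x] = acc
    return out
```

Because the Gaussian kernel is symmetric, flipping it (true convolution) or not (correlation) gives the same result, which keeps the inner loop simple.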

This leads into another idea that popped up. A few months ago, when I had a job and spare time, I was playing around with the jMonkeyEngine, a handy little open source scene graph for Java, based on JOGL. Its documentation is in the form of a wiki, which unfortunately has a bunch of holes in it. However, I found that downloading the source trunk and looking at the extensive unit tests was a much better learning tool. I simply loaded the unit test hierarchy into the IDE, looked for a test for the feature I wanted to use, ran it to see it work, then looked at the test code, which is usually short and concise. I propose a study where we take two groups of developers and one large API, and task them with implementing a given application on top of this API. One group will have standard documentation, and the other will have a complete set of unit tests. Let them go and check the results. If the unit-test group turns out better, this would be a strong argument in favor of TDD.

Two more quick ideas, and then I'm done. This one relates back to finding usages of methods/classes, which was one of the prime questions asked by students in the paper's study. An IDE's 'Find Usages' feature can answer this, but it is not the most efficient tool when looking for loose relationships between two or more elements. What if I had a tool for "Find Usages of these TWO methods", or three, or four, etc.? Basically, find the class, method, block, or statement which uses all of the given input elements. I think this would be handy.

Lastly, the paper used ArgoUML as the code base for the student tests. The authors had the students fix bugs submitted via the ArgoUML tracker. I wonder if there's a market for farming out bug-fixing time to ethnographic research subjects?

Monday, October 13, 2008

Conceptual Modeling Extravaganza

Three papers concerned with conceptual modeling:

First off is Ross' seminal 1977 paper Structured Analysis (SA): A Language for Communicating Ideas. I don't think there's much I can say about this paper that hasn't been said already, so I'll keep it short. This paper presents the argument that "SA is the best thing since sliced bread", and to illustrate this point it presents pretty much the entire meta-model for SA and goes through, in great detail, all the primitive constructs in the SA vocabulary.

Prof. Mylopoulos wrote an interesting opinion article, Desert Island Column: A Trip to Carthea, praising the previous paper's insight. One thing that Prof. Mylopoulos brings up is that "the world consists of more than things and happenings", which seems to be something that Ross argues strongly against in his paper. One of the cornerstones of the Ross paper was that anything worth talking about consists of 6 or fewer things and happenings.

Lastly, I took a look at Jennifer's paper Reflective Analysis of the Syntax and Semantics of the i* Framework. Now, to be honest, I would have gotten a lot more out of this if I were more familiar with the i* syntax, but the idea of reflective analysis that this paper presents could be applied to any modeling tool. A study conducted by Horkoff et al. looked at assignments and research papers in the community, recorded the most frequent deviations from the U of T i* syntax, and proposed that these deviations were made due to non-optimal design choices in the language's syntax.
A couple of useful results came out of this investigation. First, the authors propose modifications to the i* syntax to address these common mistakes; second, some conclusions are drawn about how users are learning i* (i.e. not enough focus on areas where syntax mistakes occur). I wonder if a similar study has been done for SA or UML, and if so what its conclusions were (e.g. UML activity diagrams are never used?).

Also, I was surprised to hear so many positive opinions about SADT. In my undergrad, it was touched on briefly in a Requirements Engineering class as something that was 'old, and not used by anyone anymore'.

Tuesday, October 7, 2008

Evaluating Effectiveness and Efficiency of TDD

http://www.cs.toronto.edu/~gvwilson/reading/gupta-effectiveness-tdd.pdf

Kind of a hokey paper that describes an ethnographic study done in India comparing an up-front design-then-code process to test-driven development. Related work shows that a) TDD is more efficient, b) TDD is less efficient, and c) there is no difference, so there is clearly still debate over this issue. This paper presents evidence (not vast amounts, just more ammunition for the debate) in support of TDD, ultimately concluding that developers will probably prefer a modified version of TDD, in which more design is done up front, but still using the test-code-refactor waltz.

12 Steps to Better Code

I read this one before it showed up on the reading list :) Very good article. If more dev shops followed Joel's 12, our jobs would be much more enjoyable. Restated here for completeness, every successful software team should pass the following 12 tests:

The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?

If you answered 'no' to two or more of these (Joel's own cutoff is that a score of 10 or lower means serious problems), you're not doing things properly. It pokes holes in some common patterns seen in software houses these days, like low-walled cubicle bullpens and pushing bugs off to the end of the iteration. I think this is one that every developer and manager should commit to memory, but I certainly wouldn't call it science.
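Scoring the test is simple arithmetic; this Python sketch encodes the questions listed above and Joel's own reading of the score (12 is perfect, 11 is tolerable, 10 or lower means serious problems). The function names are my own.

```python
JOEL_TEST = (
    "Do you use source control?",
    "Can you make a build in one step?",
    "Do you make daily builds?",
    "Do you have a bug database?",
    "Do you fix bugs before writing new code?",
    "Do you have an up-to-date schedule?",
    "Do you have a spec?",
    "Do programmers have quiet working conditions?",
    "Do you use the best tools money can buy?",
    "Do you have testers?",
    "Do new candidates write code during their interview?",
    "Do you do hallway usability testing?",
)

def joel_score(answers):
    """Count 'yes' answers; `answers` holds one boolean per question, in order."""
    if len(answers) != len(JOEL_TEST):
        raise ValueError("expected one answer per question")
    return sum(bool(a) for a in answers)

def verdict(score):
    """Joel's reading: 12 is perfect, 11 is tolerable, 10 or lower is serious trouble."""
    if score == 12:
        return "perfect"
    if score == 11:
        return "tolerable"
    return "serious problems"
```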

Monday, October 6, 2008

Learning TDD by Counting Lines

IEEE Software May/June 2007

Interesting little paper describing Nokia Networks' migration from waterfall to agile, and the tools used by the group training Nokia's developers in test-driven development. The exercise in question describes the first step the new developers took: creating a program to count non-commented lines in a source file, using a TDD approach. Start off with the low-hanging fruit (a test with a one-line input program), and scale it up until hitting a code wall. The third movement in TDD, refactoring, is apparently often overlooked by new TDD developers (I'm guilty of this, too). This includes not only refactoring commonalities in the test cases, but also in the production code, and even refactoring the design of the production code. Emergent design, in this case, means more than just making the most logical step at each point and hoping that the best design will result.
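Here is a guess at where such an exercise might end up, in Python. This is my own sketch, not the article's code: it assumes C-style `//` and `/* */` comments and ignores the complication of comment markers inside string literals. The tests are listed in the order one might write them, starting from the one-line case.

```python
def count_code_lines(source: str) -> int:
    """Count lines containing code, ignoring blank lines and // and /* */ comments.

    Simplification: comment markers inside string literals are not handled.
    """
    count = 0
    in_block = False  # are we inside a /* ... */ comment?
    for line in source.splitlines():
        code = []
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)          # the whole rest of the line is comment
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break                      # line comment: ignore the rest
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        if "".join(code).strip():
            count += 1
    return count

# TDD-style tests, smallest case first; each one would have driven a change above.
def test_single_code_line():
    assert count_code_lines("int x = 1;") == 1

def test_blank_and_line_comment_ignored():
    assert count_code_lines("\n// comment\nx++;\n") == 1

def test_block_comment_spanning_lines():
    assert count_code_lines("/* a\n b */\ny = 2; /* tail */") == 1

def test_code_around_block_comment_counts():
    assert count_code_lines("z = 3; /* starts\n ends */ w = 4;") == 2
```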

Anyway, positive results, though from a number of development teams (12) too small to be statistically significant.

Sunday, October 5, 2008

Replication in Research

Just read a couple of papers concerned with replicability in research, that is, authors of academic papers publishing, along with the body of their paper, the data and code used to generate their figures and reach their conclusions. The primary moral is this: not enough structure is in place to make authors prove that their methods actually work, and although the technology to distribute the materials required to double-check their findings is widely available, it is not widely used.

I wonder if the previous article (size confounding metrics) has been replicated?

Saturday, October 4, 2008

The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics

A very good paper, with significant results. Research into metric-based analysis of object-oriented programs shows that the most significant metric for inferring the fragility of a class is the class's size (LOC), and that most other existing metrics are simply layers of indirection on top of class size (e.g. large classes have higher coupling). I'm not sure what impact this paper has actually had on the community, but I would think it should be fairly revolutionary.
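A small worked example of the confounding effect, using made-up numbers and a first-order partial correlation. This is a simpler tool than, and not a reproduction of, the paper's own analysis. Coupling and fault counts both grow with class size, so they correlate strongly, but once size is controlled for, the relationship disappears.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: coupling and faults both grow with size, plus wobble that is
# unrelated to size and to each other (the two wobble vectors are orthogonal).
SIZES = [1, 2, 3, 4, 5, 6, 7, 8]
COUPLING = [2 * s + e for s, e in zip(SIZES, [1, -1, -1, 1, 1, -1, -1, 1])]
FAULTS = [3 * s + e for s, e in zip(SIZES, [1, -1, 1, -1, -1, 1, -1, 1])]
```

Here `pearson(COUPLING, FAULTS)` is about 0.97, while `partial_correlation(COUPLING, FAULTS, SIZES)` is zero up to rounding: the apparent link between coupling and faults is carried entirely by size.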

One thing that rubbed me the wrong way was that in the paper's introductory sections, the authors discussed procedural vs. object-oriented programming and implied (with some references) that OO programs are in general harder to maintain. I wonder about this statement.

Tuesday, September 30, 2008

SenseCam: A Retrospective Memory Aid

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/04-hodges.pdf

The authors of this paper present the findings of the initial clinical trial of SenseCam, a small camera which is worn around the neck and records still images both at regular intervals and when the device's sensors are stimulated. This creates a pictorial record of events that happen in the wearer's immediate proximity. The motivation for this record is to aid in rehabilitating memory loss sufferers. The trial presented in this paper has shown a marked improvement in the cognitive ability of the test subject. However, further clinical trials are required before a definitive statement about its merits can be made.
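SenseCam's capture rule (a regular timer plus sensor triggers) is simple to sketch. This hypothetical Python version uses a single light reading to stand in for the device's sensors; the 30-second interval and 50% light-change threshold are assumptions, not figures from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CapturePolicy:
    """Hypothetical trigger rule: photograph on a timer or when the light level jumps."""
    interval_s: float = 30.0   # assumed timer period
    light_jump: float = 0.5    # assumed relative change in light that counts as an event
    last_capture_s: Optional[float] = None
    last_light: Optional[float] = None

    def should_capture(self, now_s: float, light: float) -> bool:
        """Decide whether to take a picture at time now_s given the current light reading."""
        timer_due = (self.last_capture_s is None
                     or now_s - self.last_capture_s >= self.interval_s)
        sensor_fired = (self.last_light is not None and self.last_light > 0
                        and abs(light - self.last_light) / self.last_light >= self.light_jump)
        self.last_light = light
        if timer_due or sensor_fired:
            self.last_capture_s = now_s
            return True
        return False
```

For example, walking from a dim corridor into daylight would fire the sensor trigger immediately, without waiting for the timer.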

I think that one of the most significant insights the creators of SenseCam had was that the success of the device relied critically on a small, compact form factor. Earlier incarnations, which used mobile PCs carried in a backpack, would have been too cumbersome to offer a net benefit to a patient. Ease of use is of vital importance. Another important point in the discussion of the clinical results is that the authors distinguish between the patient remembering the events recorded by the SenseCam and the patient remembering having seen the pictures it recorded in previous sessions. Although the patient claims to be remembering the actual events, I believe the experimental method could be altered to verify this claim more concretely.

The SenseCam is a simple enough product, and it seems obvious that it could benefit a patient suffering from memory dysfunction. However, before it can claim to outperform other methods, I believe further clinical study under more controlled circumstances needs to be carried out (it should be noted that the authors freely admit this; it is just restated here for completeness). Primarily, more patients need to be examined; a single case study is not sufficient. In addition to increasing the number of subjects, the number of 'important events' recalled by each subject should be increased as well, preferably to a level that supports statistical significance.

In addition to the small sample size, there is a strong possibility that the results for the single given sample, recorded by Mr. B, may have been skewed, perhaps even unintentionally, due to the nature of Mr. B's relationship with the subject. A less biased result would be obtained by having an impartial third party administer the tests to Mrs. B.

Designing Capture Applications to Support the Education of Children with Autism

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/04-a-hayes.pdf

The authors present three prototype devices for assisting caregivers dealing with children with autism (CWA). The first, called Waldon Monitor (WM), consists of a wearable video camera and Tablet PC for observing and recording the behavior of CWA. Secondly, the Arabis system replaces traditional pencil and paper based recording with a Tablet PC, and is used by caregivers to record a subject's performance in one-on-one testing and diagnosis. Finally, CareLog is a distributed, ubiquitous system for recording data about a child from any wireless accessible device, including a cell phone, PDA, or PC. A therapist can record and access data using any of these devices, which is stored on a mobile storage unit that is co-located with the subject.

The most insightful finding of these studies is the importance of properly planning for complicated, multi-user interaction with the devices. The proposed systems seem trivial (video recording unit, software for tabulating test scores, etc.) until the use cases are presented, which span multiple users at different times, with very different goals for the data, potentially at different stages in the recurring care cycle. For example, a therapist may use CareLog to record data about a CWA in an intermediate phase of the care cycle, and an analyst will later use the collected data, aggregated with past results, to make a diagnosis and set goals in the early stages of the next iteration of the cycle.

One area in which the proposed devices can be improved is CareLog's portable storage unit. Although this is a novel approach, I believe the same functionality could be achieved at lower cost if the data were stored on a remote, web-accessible server instead of in a device that needs to be carried around by the subject. This way the physical hardware cost is reduced, and the subject can't lose or destroy their own data. Also, using an existing commercial product to perform the data analysis required by these prototypes could reduce the upfront cost, making them easier for caregivers and therapists to adopt.

Monday, September 29, 2008

Ideas

I'm taking this opportunity to write down some ideas I had during today's talk, so that I don't forget them :)

Non-standard programming interfaces - what could we do if we wrote programs in a WYSIWYG editor? Or an AutoCAD-type editor? Or any domain application? What would that be like? Also, could you bootstrap such a system (ie. implement the non-standard language using a non-standard language)? Intuitively not, but maybe that's why we haven't figured it out yet.

Program diff - diff a program not as a text file, but as a syntactically (and semantically?) correct piece of code.

Program representation - how can we represent a program other than a bunch of lines of text? Does this apply to the previous point?
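The program diff idea can be roughly approximated with Python's standard `ast` module (my choice of language and tool, not anything from the talk). This sketch parses both versions and diffs their syntax-tree dumps instead of the raw text, so formatting and comments drop out. It only compares syntax, not semantics, and `structural_diff` is a hypothetical helper name.

```python
import ast
import difflib


def normalized_ast(source: str) -> list[str]:
    """Parse source and return its AST dump, split into lines for diffing."""
    tree = ast.parse(source)
    # indent= makes the dump multi-line; line/column info is excluded by default
    return ast.dump(tree, indent=1).splitlines()


def structural_diff(old_src: str, new_src: str) -> list[str]:
    """Diff two programs by syntax tree rather than by raw text."""
    old, new = normalized_ast(old_src), normalized_ast(new_src)
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


a = "def f(x):\n    return x + 1\n"
b = "def f( x ):  # add one\n    return (x+1)\n"  # only formatting/comments differ
c = "def f(x):\n    return x + 2\n"               # a real change

print(structural_diff(a, b))          # []
print(len(structural_diff(a, c)) > 0)  # True
```

Treating the tree, rather than the text, as "the program" is also one crude answer to the representation question.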

BRAINSTORM!!!
Gedanken problems!! = design patterns. According to Greg's definition (going to double check this against Wikipedia in a minute), a Gedanken problem is one that results in the same solution when solved independently by different domain experts. Now, as I understand it, when the Gang of Four first wrote Design Patterns, they analyzed large amounts of code, created independently by programmers in different organizations, and noticed that certain problems resulted in similar structures in the code. Therefore, I propose that design patterns are the common solutions to a set of fundamental Gedanken problems.

Tuesday, September 23, 2008

The CHAOS Boondoggle

So the Standish Group's CHAOS report was released in 1994, claiming that software projects went over budget by an average of 189%. The presentation of the report was questionable at best, and the data contained in it seemed inconsistent, with little more than hand-waving to back it up.

Enter the second paper, Jorgensen and Molokken, actively calling out Standish on the numbers it presents. Good for you, Jorgensen and Molokken!

In the interview, the interviewer does a pretty weak job of grilling Standish about its numbers, and both parties leave having answered very little.

I'm pretty sure Standish blundered this number, and covered its tracks by a) not disclosing research methods and b) significantly altering the following year's CHAOS report.

But this is all just my opinion. I've been wrong in the past.

Why Line for Java Names

WTF-J = Whyline Toolkit For Java

WTMF-J = Whyline Toolkit (Multi-threaded) For Java

Monday, September 22, 2008

Automatic Bug Triage Using Execution Trace and Ownership Vector Space

Came up with this idea by smashing together two papers from Greg's reading list. Using this for my NSERC & OGS applications, so please don't steal it :)

Previous work on automatic bug triage (directing bug reports to the appropriate member of the development team) showed very bright prospects, but lacked the accuracy needed for a usable product [1]. This entry point for bug information is also an excellent place to apply any number of other filtering heuristics, such as the duplicate detection algorithm proposed in [2]. The method proposed in [2] requires, in addition to a natural language description of the problem, an execution trace that can be used to measure the similarity of two or more bugs more quantitatively. I propose that the same execution trace could be used to assist in the triage functionality described in [1]. Since the vector space Wang et al. use to measure bug similarity is based on function calls, a similar approach, requiring the same input execution trace, could be used to determine bug ownership. A master vector space could be created at build time from all possible called functions in a source code repository, with ownership of these functions assigned to developers based either on activity in a revision control system or on some static assignment. This would create volumes of ownership within the function vector space. In theory, a bug report, assuming it is not a duplicate, should be assigned to the developer in whose volume the bug's vector terminates. This may also have interesting applications in visualizing code ownership for management purposes.
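A rough Python sketch of the ownership idea, under some loud assumptions: the function list, the function-to-developer ownership map and the trace are all hypothetical inputs, and instead of a true geometric "volume" test it scores each developer by how much of the trace's call weight lands on functions they own (a dot product with an ownership indicator vector). This is my own simplification, not the method from [1] or [2].

```python
def build_vocabulary(functions: list[str]) -> dict[str, int]:
    """Map every function in the repository to one dimension of the vector space."""
    return {name: i for i, name in enumerate(sorted(set(functions)))}


def trace_vector(trace: list[str], vocab: dict[str, int]) -> list[float]:
    """Turn an execution trace (sequence of called functions) into call counts."""
    vec = [0.0] * len(vocab)
    for fn in trace:
        if fn in vocab:  # ignore calls into code outside the repository
            vec[vocab[fn]] += 1.0
    return vec


def ownership_vectors(owner_of: dict[str, str], vocab: dict[str, int]) -> dict[str, list[float]]:
    """One indicator vector per developer: 1.0 on every function they own."""
    vectors: dict[str, list[float]] = {}
    for fn, dev in owner_of.items():
        vectors.setdefault(dev, [0.0] * len(vocab))[vocab[fn]] = 1.0
    return vectors


def triage(trace: list[str], vocab: dict[str, int], owners: dict[str, list[float]]) -> str:
    """Assign the bug to the developer whose owned functions carry most of the trace."""
    bug = trace_vector(trace, vocab)

    def score(dev: str) -> float:
        return sum(b * o for b, o in zip(bug, owners[dev]))

    return max(owners, key=score)


# Hypothetical repository and ownership data (e.g. mined from commit history).
vocab = build_vocabulary(["parse", "render", "save", "load"])
owners = ownership_vectors(
    {"parse": "alice", "load": "alice", "render": "bob", "save": "carol"}, vocab
)
print(triage(["load", "parse", "render", "parse", "os.read"], vocab, owners))  # alice
```

Here alice owns three of the four in-repository calls in the trace, so the bug lands in her "volume".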

[1] D. Cubranic and G. C. Murphy, "Automatic bug triage using text categorization," in Proceedings of the Sixteenth International Conference on Software Engineering & Knowledge Engineering, F. Maurer and G. Ruhe, Eds., June 2004, pp. 92-97. [Online]. Available: http://www.cs.ubc.ca/labs/spl/papers/2004/seke04-bugzilla.pdf

[2] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, "An approach to detecting duplicate bug reports using natural language and execution information," in Proceedings of the Thirtieth International Conference on Software Engineering, 2008.

Lazy Delete for Email

So I'm going through my inbox this morning (procrastinating on my NSERC application, btw), and I notice I have a lot of messages that take the form "Don't forget about [event] on [date]". After reading one, I don't want to delete it, because having it in my inbox serves as a nice reminder about whatever it is I'm not supposed to forget about. However, after a week my inbox is clogged with messages about events that have passed. What I would like to be able to do is, when initially reading the message, click a button next to the Delete button, let's call it Delete On ..., and then specify a date, so that the message will be deleted once it has become invalid.

Maybe run some kind of computational linguistics method to read the message first and propose a deletion date?
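A minimal Python sketch of Delete On, assuming a made-up `Message` type rather than any real mail client's API. The date proposal is a deliberately crude stand-in for the computational linguistics step: it only looks for ISO-style dates (YYYY-MM-DD) and suggests deleting the message the day after the latest one.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Message:
    subject: str
    body: str
    delete_on: Optional[date] = None  # set via the hypothetical "Delete On ..." button


ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def propose_delete_on(body: str) -> Optional[date]:
    """Guess an expiry date: the day after the latest ISO date in the body."""
    dates = []
    for y, m, d in ISO_DATE.findall(body):
        try:
            dates.append(date(int(y), int(m), int(d)))
        except ValueError:  # e.g. 2008-13-45 matches the pattern but isn't a date
            continue
    return max(dates) + timedelta(days=1) if dates else None


def purge(inbox: list[Message], today: date) -> list[Message]:
    """Keep messages with no expiry date, or whose expiry date is still in the future."""
    return [m for m in inbox if m.delete_on is None or m.delete_on > today]


inbox = [
    Message("Seminar", "Don't forget about the talk on 2008-09-26."),
    Message("Lab notes", "No date in this one."),
]
for msg in inbox:
    msg.delete_on = propose_delete_on(msg.body)

print([m.subject for m in purge(inbox, date(2008, 9, 27))])  # ['Lab notes']
```

The user would still confirm or edit the proposed date; the guess just saves typing.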

Saturday, September 20, 2008

Context Aware Communications

This paper presents research activities in the field of Context-Aware Communication, which is considered a subset of context-aware computing applications. The authors structure their presentation into five main categories: routing, addressing, messaging, caller awareness, and screening. Routing involves directing communication (phone calls, text messages) to physical devices in close proximity to the callee, and has been successfully implemented by combining Xerox PARC's Etherphone and Olivetti's ActiveBadge system. Addressing uses context information (“is this user in the building?”) to dynamically adjust traditional email mailing lists. Messaging is similar to context-aware call routing, but instead routes text messages to any nearby device capable of displaying text. Caller awareness provides callers with information on their contact's context, so that they can actively choose not to call at inappropriate times. Screening works in contrast to caller awareness: it filters out incoming calls based on the callee's context.

The authors presented several insightful technologies which were eventually adopted by modern-day ubiquitous systems. The first of note was context-dependent customizable phone ringers. In the example, these ringers were used to identify the callee, whereas current applications have successfully applied them to identifying the caller. MIT's Active Messenger bears a striking resemblance to modern cell phone SMS capabilities. Also, AwareNex's context feature has been replicated in countless instant messaging systems. A possible extension of this idea is to incorporate automatic context sensing, via ActiveBadges or some other comparable technology, into current applications that would benefit from context information, such as an instant messaging system. That way, instead of manually setting one's IM status to 'on the phone', all one would have to do is pick up the phone, and leaving one's workstation would change the status to 'away'.
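The automatic IM status idea boils down to a small mapping from sensed context to a status string. A Python sketch, assuming a hypothetical `Context` record fed by badge-style sensors; the rule that being on the phone takes priority over being away is my own assumption.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical sensed context, e.g. from ActiveBadge-style location sensors."""
    at_workstation: bool
    phone_off_hook: bool


def im_status(ctx: Context) -> str:
    """Derive the IM status from context instead of asking the user to set it."""
    if ctx.phone_off_hook:
        return "on the phone"  # picking up the phone is enough
    if not ctx.at_workstation:
        return "away"  # leaving the desk is enough
    return "available"


print(im_status(Context(at_workstation=True, phone_off_hook=True)))    # on the phone
print(im_status(Context(at_workstation=False, phone_off_hook=False)))  # away
```

A real client would call this whenever the sensors report a change, and only push the new status if it differs from the old one.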

The primary limiting factor in the applications presented in this paper is the technology that was available at the time it was written. Although the devices developed successfully demonstrated the intended concepts, further effort is needed to make these devices more marketable before they will be widely adopted and form the ubiquitous network required for the proposed applications. It is entirely possible, however, that these advances have been made between the time this paper was published and the present day.

Also, the authors mentioned situations where context-aware communication applications would, for example, hold or screen incoming traffic because the callee is at the movies or eating dinner. This was only speculative, because at the time of publication the context-sensing network wasn't pervasive enough to determine a user's location outside of the office (with the exception of GPS, which doesn't work indoors). I propose that future systems which can determine a user's context outside of the workplace should deliberately keep this limitation, to add a measure of privacy to the system.

Context Aware Computing Applications

This paper presents an interesting summary of current (1994) activities in the field of context-aware applications. This consists primarily of applications developed for the workplace which leverage the user's physical location, as well as the locations of coworkers and resources within said workplace. The authors present four areas or application features that rely on context, implemented with Xerox PARC tabs and boards: proximate selection, automatic contextual reconfiguration, contextual information and commands, and context-triggered actions. Proximate selection is a UI technique that visually makes objects closer to the user's physical location easier to select. Automatic contextual reconfiguration refers to a process through which ubiquitous devices (boards, tabs, etc.) can be accessed by users simply by being in their immediate vicinity. Contextual information and commands can be used to display appropriate default information, or alter the standard set of commands (or parameters to these commands) based on the user's location. Lastly, context-triggered actions are actions (in this paper, Unix shell commands) that are executed automatically when a predefined context state occurs.
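Context-triggered actions are essentially if-then rules keyed on context events. A Python sketch, assuming a made-up rule format of (badge, location, action) and using Python callables in place of the Unix shell commands the paper used; the '*' wildcard meaning "anyone" is also my own invention.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """When `badge` is sighted at `location`, run `action` ('*' matches any badge)."""
    badge: str
    location: str
    action: Callable[[], str]


def on_sighting(rules: list[Rule], badge: str, location: str) -> list[str]:
    """Fire every rule whose predefined context state matches this sighting."""
    return [
        r.action()
        for r in rules
        if r.badge in (badge, "*") and r.location == location
    ]


rules = [
    Rule("rory", "kitchen", lambda: "start the coffee maker"),
    Rule("*", "meeting room", lambda: "mute phone"),
]
print(on_sighting(rules, "rory", "kitchen"))       # ['start the coffee maker']
print(on_sighting(rules, "greg", "meeting room"))  # ['mute phone']
print(on_sighting(rules, "greg", "kitchen"))       # []
```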

The application features that Schilit et al. present in this paper have proven insightful, in that they have found their way into many modern ubiquitous devices. The proximate selection UI outlined bears a striking resemblance to fisheye menus, most notably found in Apple's OS X. Automatic contextual reconfiguration performs a similar function to modern Bluetooth networks, except that Bluetooth doesn't rely on a centralized network to control all devices within a building.

One issue that I believe the authors may have overlooked, especially with respect to proximate selection and contextual reconfiguration, is that of permissions. In the example of a user printing a document, the closest printer may not be the optimal choice if it is a restricted resource. Aside from this, the only element of context that the authors seemed to use was the location of individuals and resources. Other environmental variables could be used to augment the function of a ubiquitous device to better suit the needs of its user. For example, by monitoring the ambient noise level around the user, a phone could decide to change its ringer volume to match, or simply vibrate if the room is too noisy for the ringer to compete.
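The permissions gap in proximate selection is easy to close in principle: filter out resources the user may not use, then pick the nearest of what remains. A Python sketch, assuming a hypothetical `Printer` record with floor-plan coordinates and an allow-list (empty meaning open to everyone); none of this comes from the paper.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Printer:
    name: str
    x: float  # position on a hypothetical floor plan, in metres
    y: float
    allowed: frozenset[str] = frozenset()  # empty: anyone may print


def nearest_usable(printers: list[Printer], user: str, ux: float, uy: float) -> Optional[Printer]:
    """Closest printer the user is permitted to use, or None if there is none."""
    usable = [p for p in printers if not p.allowed or user in p.allowed]
    if not usable:
        return None
    return min(usable, key=lambda p: math.hypot(p.x - ux, p.y - uy))


printers = [
    Printer("exec-suite", 1, 0, frozenset({"ceo"})),
    Printer("lab", 5, 0),
    Printer("lobby", 20, 0),
]
print(nearest_usable(printers, "rory", 0, 0).name)  # lab
print(nearest_usable(printers, "ceo", 0, 0).name)   # exec-suite
```

The restricted printer is physically closest to both users, but only the one on its allow-list gets it.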

Thursday, September 18, 2008

Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior

http://www.cs.toronto.edu/~gvwilson/reading/ko-debugging-reinvented.pdf

This paper is an interesting follow-up to an earlier paper I read about Why Lines. If you're not familiar, a Why Line is a debugging tool that instruments a piece of code, lets you execute the code (for a brief time, ~ a minute), then lets you use a custom UI to ask questions about the output (ie. Why is this circle red?). Experimental results (on an admittedly small sample size) showed a dramatic improvement in debugging time.

There was a paragraph in this paper that I feel the authors glossed over. It concerned translating user-submitted bug reports into Why Line questions. By applying some computational linguistics techniques, I bet it would be possible to automatically generate one or many Why Line questions from a bug report. Combine that with the previously presented technique on automatic bug triage, and you have a system which will (in theory) automatically assign bugs to particular developers, and present them with a Why Line question that will allow them to quickly assess the problem.

Also, I'm pretty sure there's something that can be gained in an organization by archiving/databasing their Why Line traces, but I'm not sure what yet.

Wednesday, September 17, 2008

Fantastic Contraption

This is the sole reason it took me all afternoon to read Jorge's paper:

http://fantasticcontraption.com/

Anchoring and Adjustment in Software Estimation

http://www.cs.toronto.edu/~jaranda/pubs/AnchoringAdjustment.pdf

This paper presented Jorge's results on anchoring and adjustment when estimating how long software projects will take. Looking back at my notes on this paper, everything seems to be presented fairly well, and all the numerical results seem sound. A couple of quick things:

The Anchoring and Adjustment phenomenon is clearly observable when the problem can be expressed as a number in a range. Can it be applied to other classes of estimations?

Why is the description of COCOMO so verbose? I don't see how it contributes to the paper, other than to demonstrate that current software estimation techniques don't work.

What is a null hypothesis?

Now, I want to take this opportunity to discuss an estimation technique that a former colleague of mine once described to me:

Estimating the time required to complete a software project is inherently a random activity, and it is generally accepted that estimating smaller, individual tasks within a project is more accurate than trying to estimate the project as a whole.

Let's divide our hypothetical project P into n subprojects, P0 ... Pn-1. For each of these, instead of guessing a completion time, we provide a low-ball and a high-ball estimate (ie. P0 will definitely take more than one week, but less than four).

We take the sum of all the low-ball estimates and the sum of all the high-ball estimates, giving two figures: one for the earliest possible completion time for P and one for the latest. While this may be sufficient for some, one further refinement is to fit a Gaussian distribution between these two figures, so that you could say, with some degree of justification, that project P has an 80% chance of finishing within X weeks.
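Here's a minimal Python sketch of the technique as I've described it, using the standard library's `statistics.NormalDist`. How exactly to fit the Gaussian between the two sums isn't pinned down above, so I'm assuming the mean sits halfway between them and the summed range spans plus or minus two standard deviations. The task estimates are made up (the first one is P0 from the example).

```python
from statistics import NormalDist


def completion_estimate(tasks: list[tuple[float, float]], confidence: float = 0.8) -> float:
    """Weeks within which project P should finish, with the given probability.

    tasks: one (low_ball, high_ball) pair per subproject, in weeks.
    Assumption: total duration is Normal, centred between the summed estimates,
    with the summed range covering +/- 2 standard deviations (needs high > low).
    """
    earliest = sum(low for low, _ in tasks)
    latest = sum(high for _, high in tasks)
    dist = NormalDist(mu=(earliest + latest) / 2, sigma=(latest - earliest) / 4)
    return dist.inv_cdf(confidence)


tasks = [(1, 4), (2, 6), (0.5, 2)]  # P0, P1, P2
print(f"80% chance of finishing within {completion_estimate(tasks):.1f} weeks")
# 80% chance of finishing within 9.5 weeks
```

Summing per-task distributions instead (adding their variances, assuming the subprojects are independent) would give a noticeably narrower band than spreading one Gaussian across the summed extremes.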

Ferenc, I apologize if I've gotten any of this wrong.

Tuesday, September 16, 2008

Google Integration and Tools

A couple of interesting Google tools that I have come across in the last couple of days, probably old hat to most of you.

Google Reader (http://www.google.com/reader). This is an online, customizable RSS/Atom aggregator. Pretty handy for keeping track of slashdot, zdnet, ieee, and, for example, 15 graduate student blogs (like this one).

This last one doesn't exist yet, but I want my Facebook events to be synchronized onto my Google Calendar. Sounds like a rainy Friday afternoon project.

A Field Guide to Computational Biology

http://www.cs.toronto.edu/~lilien/CSC2431F08/readings/CompBioGuide.pdf

This magazine article presents the opinion that computational biology will, without a doubt, be the cause of future miraculous advances in biology, disease, and gene research. It also emphasizes that traditional biologists will have to get used to higher-level mathematics than they are accustomed to, and to much more interdisciplinary interaction with others.

Good, not great.

Can a Biologist Fix a Radio?

http://www.cs.toronto.edu/~lilien/CSC2431F08/readings/CanBiologistRadio.pdf

Explores the idea of having a formal language for expressing biological processes and structure, using the analogy of trying to fix a radio. An engineer would use a 'formal language' to describe the internal structure of the radio (amplifier, 10k ohm resistor, etc.), and deduce the problem that way. A biologist would likely spend years of comparative research on other working radios, classifying parts based on phenotype, etc. This system quickly becomes too complicated for any one person to understand, primarily due to conflicting definitions of parts originating from different researchers. The author proposes that formal methods and languages for biology will make cellular analysis much easier and vastly different, in the same way PowerPoint revolutionized slide-based presentations, and that biologists must catch up or be left by the wayside.

Monday, September 15, 2008

NaviCam

In Ubicomp today, while discussing a paper previously presented in this blog (see The Human Experience), we got to see a video of Sony's NaviCam system in action (see link http://www.sonycsl.co.jp/person/rekimoto/navi.html). Seeing this system actually working was pretty cool, and a handful of applications extending this functionality immediately jumped to mind, all of which could be potential thesis topics, or business plans.

Combining the augmented reality capabilities of the camera & display setup with a wearable, glasses-based display could allow application developers (like me :P) to create real-time navigation software, meta-information pop-ups, and all kinds of cool stuff!

One immediate thought that jumped to mind as a detractor was the image I had seen of some geek in the wearable computing field with a webserver in a backpack that he lugged around everywhere. No consumer would buy that, but then I thought that if this backpack could be shrunk down to the size of an iPhone, which it almost certainly could, then this would be a viable market opportunity.

One target audience for such applications would be the military. Personnel in the field could have information on waypoints, location of friendly/hostile persons, radar & network coverage, etc., overlaid on top of real-world vision, eliminating the need for secondary maps & GPS devices. And the military's level of network connectivity is legendary, so gaining access to this information is essentially a solved problem.

Iunno, just an idea. Sounds like it would be fun to tool around in one of these sets, seeing maps and stuff overlaid on regular vision.

Sunday, September 14, 2008

Gregory D. Abowd et al. The Human Experience

Gregory D. Abowd et al. present a follow-up to Weiser's The Computer for the 21st Century. In this paper, the authors closely examine some of the ideas proposed by Weiser, and explore the changes in traditional design and development patterns needed to adopt ubiquitous applications. This is loosely broken down into three categories: defining physical interaction models to and from ubiquitous computing devices, discovering ubiquitous computing application features, and evolving the methods for evaluating human experiences with ubicomp. The physical interaction problem examines new ways to gather input from a user, beyond simple keyboard/mouse combinations. This includes gesture-based and implicit input. Non-standard output methods, different from the traditional video display (e.g., ambient output), are also explored. Utilizing these non-standard IO methods to create a 'killer app' is the next challenge the authors discuss. Applications which use the user's context (location, identity, etc.) as input for providing useful features, as well as relative changes in context, are discussed. The use of changes in context leads to the problem of continuous input, whereby applications must respond to constant subtle input from users over extended, possibly infinite, time frames. This is in sharp contrast to current application mentalities, which are meant for discrete usage sessions (e.g., a word processor). Lastly, the authors propose that traditional HCI evaluation techniques will be at a disadvantage when used with ubiquitous computing applications, and so they introduce three cognition models: Activity Theory, Situated Activity, and Distributed Cognition.

This paper provided several new examples of ubiquitous computing devices and applications, and served to 'pin down' some of the specific details that Weiser left for further research. The showcase of new devices and technologies clearly illustrates the path of development between the time the Weiser paper was published (1991) and the 'present' (2002). One noteworthy insight presented by Abowd et al. is that of the physical means of interaction with ubiquitous devices, drawing particular attention to 'implicit input'. It seems apparent that the embodied virtuality of the future will not be interfaced with a keyboard and mouse, and the devices presented in The Human Experience demonstrate subtle input and output methods (e.g., a network traffic monitor) that have clearly been successful.

Although this paper provided excellent insight into the concepts previously proposed by Weiser, I feel that it didn't introduce as much original work as it could have. It can be argued that this was the purpose of the paper, in which case it has succeeded. However, I feel it would have contributed more value to the scientific community had it contained more unique ideas. Apart from this, the paper presented a discussion of using ubiquitous computing applications to perform one of the fundamental activities humans perform on a daily basis: capture and access of data. That is, recording information presented by a colleague and summarizing it for later retrieval. It is my opinion (and this clearly is not, nor should it be, shared by all) that automating a fundamental process such as this will contribute to a strong dependence on said application, reducing an individual's ability to be self-reliant. In addition, it is not beyond the realm of possibility that regularly exercising the intellect by absorbing and recording information in this way is beneficial, and removing the need for this exercise could have negative impacts on cognitive ability. This, however, is just my opinion. It represents an area for further research, which should be pursued with as much importance as the technical developments in the field of ubicomp (that is, the implications of ubiquitous devices).

Mark Weiser's The Computer for the 21st Century

The paper by Mark Weiser presents the current (as of the time of publishing, 1991) efforts of Xerox PARC in the field of ubiquitous computing devices and applications, and follows this with speculation about possible future scenarios. At the time of publishing, these devices were divided into three classes: tabs, pads, and boards. A tab is similar in size and function to a Post-it note, except that it contains a dynamic display. A pad can be thought of as a scrap piece of paper, but again with a dynamic display and a stylus-based interface. A board functions like a dynamic whiteboard, or any large display. The intermixed use of these three classes of devices, the author argues, will form the basis of the future of ubiquitous computing in the “embodied virtuality”. Extrapolating trends in technology evolution suggests that it would be possible to implement Weiser's embodied virtuality in the not-too-distant future.

I found the discussion of 'current' technology and research into ubiquitous computing devices quite interesting. I was previously unaware of the efforts of Xerox PARC toward these ends. These devices were, at the time, major innovations that greatly enhanced the field of computing. Also, the author presents a very insightful discussion on how reading became a ubiquitous technology, and how this can be used to define computing (ie. anywhere you currently read, you could potentially compute as well). This idea, coming from the father of ubicomp, is of enormous significance to the scientific community. The paper mentions that this idea of ubiquity, in the same way as reading, means a drastic change not only in application features, but in methods for measuring terse human actions, which will eventually define the features of ubiquitous applications.

While the majority of Weiser's paper was interesting and beneficial to the scientific community, there were some holes which needed to be filled in before ubiquitous computing could fully take hold. The issue of privacy and security is one which I feel requires further investigation. The proposal to have relatively loose security seems like a bad idea to me. For example, a current system is considered impossible to 'break in' to, until someone finds a way never thought of by its developers. By the same logic, Weiser's argument that someone can 'break in' but cannot do so unnoticed is invalid, since an intruder could equally find an unforeseen way to avoid being noticed. This presents a possible area of further research: how to guarantee that an unwanted intruder leaves 'fingerprints' which can readily be discovered. Also, if embodied virtuality were to be implemented, I believe much more emphasis should be placed on an individual's privacy than is proposed in The Computer for the 21st Century. Simply having a system that knows where an individual is located at all times represents a serious invasion of privacy. I would propose a system where individuals can disconnect themselves from the ubiquitous network, should they so desire, and possibly 'black-out' zones, in which no ubiquitous devices will operate.

A Reference Architecture for Web Servers

Don't let this paper's name fool you: this one is (at least in my opinion) presenting a process for semi-automatically generating reference architectures for any established domain, not just web servers. Having said that, a lot of really interesting, useful information is presented about three popular servers: Apache, AOL, and Jigsaw. Not particularly deep, however. The methods for automatically generating the architecture were glossed over, as the authors were using an existing tool; they then simply refined the architecture until it fit the three example architectures. I think it could use more meat. Still, a very interesting (and, dare I say, entertaining) read, especially if you're familiar with the web domain.

Oh, and there were no 'results', per se. I wonder if there should have been some.

Applying Complexity Metrics to Measure Programmer Productivity in HPC

This paper is based around a fairly ambitious task: measuring programmer productivity. As far as I understand it, measuring the productivity of a worker outside of an assembly line scenario has been an open topic for as long as there have been assembly lines. Despite that, the authors discuss the set of tools they used to instrument programmers' workstations, and compare and contrast two methods of performing the same task: from a command line interface and from a GUI. Much to everyone's surprise, the GUI was quicker and easier to use!! (insert sarcasm) The two methods are measured using a metric devised by the authors, which seems to me to suffer from the same problems as the paper on Measuring Configuration Complexity I read last week: the outcome depends wholly on the heuristics used to define and quantify a single step in work (or configuration), which is still a very open research problem. Also, one of the fundamental assumptions used to build the paper's heuristics, that more steps equals more complexity, is disproved by example in the paper's final section! Overall, a valiant attempt, but I don't think I would endorse this one.

Agile Software Testing in a Large Scale Project

A fantastic article! This paper presents the findings of a team of developers employed by the Israeli Air Force. This team began the project with a very strong focus on Agile processes, particularly TDD and short iteration lengths (at least these are the ones that stuck with me). The team has, self-admittedly (at the time of publication), one of the most complete sets of data recording an Agile project from inception to delivery. The results show a project which appears to have run rather smoothly; the up-front testing influenced the number of defects (compared to what previous result, though?)

More opinions, to be continued ...

In Praise of Tweaking

This paper presents the results of the second annual Matlab tweaking competition. In this contest, users are tasked with solving an algorithmic problem while minimizing a 'score' composed of various metrics about the algorithm's performance, such as speed and resources consumed. All submitted algorithms are immediately available to all other contestants, so a new solution can be (and is encouraged to be) created by 'tweaking' someone else's work. In this way, programmers are encouraged to adopt a more community-centered process.

Although the idea presented was intriguing, this is a magazine article and contributes little to the scientific community.

Storytest Driven Development

Started off promising, but didn't really deliver. Who writes these story tests? I don't think a customer alone will go through the effort of formalizing the requirements using FitLibrary, but the developer shouldn't be writing them alone either, because he shouldn't be deciding the requirements of the system. It became harder to follow as it progressed (lots of flipping between figures and paragraphs of text). Also, where are the results? The idea of story tests is presented well enough, but it's not novel, and there is no indication in this paper of who actually uses this approach. Maybe I'm being too picky, or it is too late at night. Maybe I'll read this one again later.

Automatic Bug Triage Using Text Categorization

An extremely fascinating topic concerned with improving development process, not only in open-source projects, but in any project with a reasonably sized team and automatic bug tracking. A real money saver! A well presented paper (even though I got sort of lost in the details of the Bayesian algorithm). Results were presented clearly, even if they weren't as accurate as the authors would have liked. No effort is made to disguise the inaccuracy of the method. I think this idea is worth looking at, just need to swap out the Bayesian algorithm for one that actually works.
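For reference, here's what a bare-bones Bayesian triage classifier looks like: a from-scratch multinomial naive Bayes with add-one smoothing, in Python. This is a generic textbook version for illustration, not the authors' implementation, and the training reports and developer names are invented.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; a real system would also stem and drop stop words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTriage:
    """Multinomial naive Bayes: learn P(word | developer) from already-fixed reports."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.report_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, report: str, developer: str) -> None:
        words = tokens(report)
        self.word_counts[developer].update(words)
        self.report_counts[developer] += 1
        self.vocab.update(words)

    def predict(self, report: str) -> Optional[str]:
        total_reports = sum(self.report_counts.values())
        words = [w for w in tokens(report) if w in self.vocab]  # skip unseen words
        best, best_score = None, -math.inf
        for dev, counts in self.word_counts.items():
            # log prior + sum of log likelihoods, with add-one (Laplace) smoothing
            score = math.log(self.report_counts[dev] / total_reports)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = dev, score
        return best


triager = NaiveBayesTriage()
triager.train("crash when saving file to disk", "alice")
triager.train("save dialog freezes on large file", "alice")
triager.train("toolbar icons render blurry", "bob")
triager.train("font rendering wrong in toolbar", "bob")
print(triager.predict("app crashes saving a large file"))  # alice
print(triager.predict("blurry toolbar font"))              # bob
```

Swapping in a different classifier would only mean replacing `predict`; the train-on-fixed-bugs loop stays the same.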

Presentations by Programmers for Programmers

Sounds like a very useful tool (it gives the ability to cue up a live IDE session, with the UI tuned for a presentation, alongside other presentation materials such as PowerPoint slides). I'm a bit skeptical as to whether the up-front cost of setting up this sequence is worth the added visual appeal during the presentation (it would have to be more of a formal programmer-to-programmer presentation, like at a conference). The authors combined some existing products to create this. However, where are the results? There is no study or empirical evidence to support that this product is actually useful. I would classify this as a report on a new product, not necessarily a research paper.

An Approach to Benchmarking Configuration Complexity

An interesting idea; this paper attempts to quantitatively measure the effort required to properly configure a new piece of software. In their example, they use a web container. This could provide a way to measure, and presumably, compare the configuration effort required between different products. The problem I found with this method was that the fundamental step in the process, determining the atomic configuration actions and assigning probability of failure to them, is an open research problem, and so the results reported in this paper are speculation based on hand-tuned data. Seems like they tried to slip this one past me.
Also, really bad graphs and mislabeled figures.
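To see why the hand-tuned failure probabilities carry so much weight, consider a toy model (my own simplification, not the paper's actual metric) in which configuration succeeds only if every atomic action succeeds, and actions fail independently.

```python
import math


def success_probability(failure_probs: list[float]) -> float:
    """P(all atomic configuration actions succeed), assuming independent failures."""
    return math.prod(1.0 - p for p in failure_probs)


# Same 20-step procedure, two guesses at the per-step failure probability.
print(round(success_probability([0.01] * 20), 2))  # 0.82
print(round(success_probability([0.05] * 20), 2))  # 0.36
```

Nudging the per-step guess from 1% to 5% drops the estimated success rate from roughly 82% to 36%, which is exactly why hand-tuned inputs worry me.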

Inaugural Post

So this is the first post on my shiny new blog. I tried blogging once before, but it was pretty much just an exercise in setting up a persistent web container on my own hardware, and seeing how long it took for me to get bored of it. Two weeks. Now, seeing as I have neither the time nor the inclination to run dedicated web hardware, I'm using blogger.

What will follow will be a series of short posts containing my opinions of papers I'm currently reading. Enjoy :)