Monday, October 27, 2008

The Familiar Stranger: Anxiety, Comfort, and Play in Public Places

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/08-paulos.pdf

The authors present the results of two studies conducted to determine the extent to which the Familiar Stranger phenomenon still exists in modern urban settings, and whether a ubiquitous social networking solution could leverage this effect in a useful way. The first study faithfully recreates Stanley Milgram's 1972 Familiar Stranger study, in which photos of a light-rail station at morning rush hour are distributed to individuals at the station, and familiar strangers are identified by having participants label the people in the photo. The second study consisted of an urban walking tour, in which participants indicated their level of familiarity and comfort at certain locations along four dimensions: the number of familiar people in the area, the degree of familiarity with those people, whether familiar people have visited the place before, and whether the people currently there visit the same places the participant does. Using these results, the authors propose Jabberwocky, a device used to tag familiar items, locations, and individuals. In this system, Bluetooth-connected devices (base stations, cell phones, and iMotes) communicate to provide the user with a measure of familiarity about their current location.
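This is not the authors' implementation, but the core idea is simple enough to sketch: keep a log of the Bluetooth device IDs seen in previous scans and report what fraction of the devices currently nearby have been encountered before. The class and method names below are my own.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy familiarity index: counts how often nearby device IDs have been seen before. */
public class FamiliarityIndex {
    // Hypothetical sighting log: device ID -> number of prior encounters.
    private final Map<String, Integer> sightings = new HashMap<>();

    /** Record the device IDs observed in one Bluetooth scan. */
    public void recordScan(Set<String> nearbyDeviceIds) {
        for (String id : nearbyDeviceIds) {
            sightings.merge(id, 1, Integer::sum);
        }
    }

    /** Fraction of currently nearby devices that have been encountered before. */
    public double familiarity(Set<String> nearbyDeviceIds) {
        if (nearbyDeviceIds.isEmpty()) return 0.0;
        long familiar = nearbyDeviceIds.stream()
                .filter(id -> sightings.getOrDefault(id, 0) > 0)
                .count();
        return (double) familiar / nearbyDeviceIds.size();
    }

    public static void main(String[] args) {
        FamiliarityIndex index = new FamiliarityIndex();
        index.recordScan(Set.of("phone-A", "phone-B"));                          // yesterday's commute
        System.out.println(index.familiarity(Set.of("phone-A", "phone-C")));     // 0.5 today
    }
}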

I found this paper particularly informative from a purely sociological standpoint. Although the concepts presented cannot be wholly attributed to the authors (Milgram's study was novel in 1972, but not in 2008), they are nonetheless insightful. One noteworthy design constraint imposed by the authors is that any ubiquitous device built around familiar strangers must not encourage explicit interaction with said strangers. The authors argue that the existence of familiar strangers is an indicator of a healthy urban community, not a negative or anti-social aspect of city life. Further observations on people's behavior, such as frequently checking one's cell phone in unfamiliar settings, add depth to the paper.

The majority of the limitations I found with this study concern the technical merits of the proposed device. The Jabberwocky platform seems applicable to most of the situations described in the paper, but using the Motes to tag static objects or locations seems a bit infeasible. For example, leaving these devices attached to public structures raises questions about their cost and the possibility of theft. Perhaps a less physically obtrusive way to tag familiar locations would be to submit the GPS coordinates of the location to a central server, which could be queried at the same time and with the same frequency as the Bluetooth polling.
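A rough sketch of that alternative, with a completely hypothetical server URL and query format, polled on the same schedule as the Bluetooth scan:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/** Toy client: asks a (hypothetical) familiarity server about the current GPS fix. */
public class PlaceFamiliarityClient {
    private final String baseUrl; // e.g. "http://example.org/familiarity" -- placeholder only

    public PlaceFamiliarityClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Query the server for a familiarity score at (lat, lon). */
    public double queryFamiliarity(double lat, double lon) throws Exception {
        URL url = new URL(baseUrl + "?lat=" + lat + "&lon=" + lon);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line = in.readLine(); // assume the server replies with a single number
            return line == null ? 0.0 : Double.parseDouble(line.trim());
        } finally {
            conn.disconnect();
        }
    }
}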

Augmenting the Social Space of an Academic Conference

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/08-mccarthy.pdf

The authors present the results of deploying two proactive displays in an academic conference setting: AutoSpeakerID (ASID) and Ticket2Talk (T2T). Both systems leverage RFID tags physically installed in conference attendees' badges, each paired with a profile containing personal affiliation information and a photograph. The ASID installation consists of an RFID reader embedded in a microphone stand and an accompanying large display. When an attendee approaches the microphone to ask a question, their information is rendered on the display, providing context for the question. The T2T system has a similar configuration, in that it has a display which renders an attendee's profile when they come into proximate range of it. However, T2T is installed at refreshment stations to promote personal interactions between attendees.
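The flow is simple enough to sketch. This is not the ASID/T2T code, just a toy version with made-up names, where a detected badge ID is mapped to a stored profile and rendered as text:

import java.util.HashMap;
import java.util.Map;

/** Toy proactive-display flow: RFID tag ID -> attendee profile -> rendered text. */
public class ProactiveDisplay {
    record Profile(String name, String affiliation, String photoUrl) {}

    // Hypothetical registration data keyed by the badge's RFID tag ID.
    private final Map<String, Profile> profiles = new HashMap<>();

    public void register(String tagId, Profile profile) {
        profiles.put(tagId, profile);
    }

    /** Called when the reader (e.g. in the microphone stand) reports a tag in range. */
    public String onTagDetected(String tagId) {
        Profile p = profiles.get(tagId);
        if (p == null) return "Unknown attendee";
        return p.name() + " (" + p.affiliation() + ") " + p.photoUrl();
    }

    public static void main(String[] args) {
        ProactiveDisplay asid = new ProactiveDisplay();
        asid.register("tag-0042", new Profile("A. Attendee", "Some University", "http://example.org/photo.jpg"));
        System.out.println(asid.onTagDetected("tag-0042"));
    }
}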

The novel element of this paper, as the authors point out, is its close focus on evaluation of these devices. The researchers examine each system thoroughly, gathering qualitative observational and questionnaire data. These results were used to gauge the systems' performance in the areas of Enhancing the Feeling of Community, Mesh with Established Practices, and Privacy Concerns. Some unexpected, yet somewhat beneficial, results were produced when users attempted to 'game' the system by providing falsified, comical profiles (e.g., the Bill Gates profile).

Although the authors focused on qualitative analysis of their displays, I think further investigation is required before coming to definitive conclusions about the systems' utility. This is clearly a difficult domain to measure, but user surveys/questionnaires are generally acknowledged to skew results. For example, the survey for the Ticket2Talk system reported 41% positive feedback and 3% negative feedback, with 66% of the attendees unaccounted for. Consider a variation of self-selecting respondents: participants who found the system very useful were motivated to fill out the questionnaire, while those who strongly disliked it were motivated to distance themselves from the system entirely. Under that assumption, it would not be a stretch to propose that the majority of the unaccounted 66% held a negative view of the system, which would invalidate the questionnaire results.

A Taxonomy of Ambient Information Systems: Four Patterns of Design

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/07-a-pousman.pdf

This paper presents the current state of the art in ambient or peripheral information displays. The authors propose four dimensions on which the currently available ambient systems can be measured: information capacity, notification level, representation fidelity, and aesthetic emphasis. Information capacity measures the number of information sources a device can display. Notification level indicates the degree to which the system will interrupt the user, or demand their attention. Representation fidelity measures the level of abstraction in the data representation. Finally, aesthetic emphasis measures how important aesthetics are to the device's designers. Based on these four dimensions, the authors propose four patterns of design in this domain: Symbolic Sculptural Displays, Multiple Information Consolidators, Information Monitor Displays, and High Throughput Textual Displays.
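To make the taxonomy concrete, here is one way the four dimensions might be encoded so example systems can be compared side by side. The three-level scale and the example ratings are my own simplification, not values from the paper:

/** Sketch: encoding the paper's four dimensions for side-by-side comparison. */
public class AmbientTaxonomy {
    enum Level { LOW, MEDIUM, HIGH } // simplified scale, for illustration only

    record Rating(String system,
                  Level informationCapacity,
                  Level notificationLevel,
                  Level representationFidelity,
                  Level aestheticEmphasis) {}

    public static void main(String[] args) {
        // Hypothetical ratings, not taken from the paper.
        Rating sculpture = new Rating("abstract sculptural display",
                Level.LOW, Level.LOW, Level.LOW, Level.HIGH);
        Rating ticker = new Rating("scrolling text ticker",
                Level.HIGH, Level.MEDIUM, Level.HIGH, Level.LOW);
        System.out.println(sculpture);
        System.out.println(ticker);
    }
}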

The authors do a very nice job of describing the current areas of research and development in ambient and peripheral displays. The exercise of classifying current projects along their four chosen dimensions is quite insightful, and does much to bring organization and structure to the field.

This paper is lacking in a tangible contribution to knowledge, however. The introduction mentions at least one other existing method of categorizing ambient and peripheral displays, and I can see no measure indicating that the classification system proposed here has any advantage over those existing methods. Also, these patterns of design have been used to classify existing projects, but how could they be used to facilitate the creation of new products, in much the same way that OOP design patterns are used? Finally, it could be argued that the four proposed patterns are insufficient to categorize all possible ambient displays, since they could not be applied to every device in the paper's own sample. Perhaps there are more than four patterns to be extracted from the four classification dimensions, and perhaps there are even anti-patterns lurking within those dimensions that would result in poor ambient displays.

Heuristic Evaluation of Ambient Displays

http://www.cs.toronto.edu/~khai/classes/csc2526-fall2008/readings/07-mankoff.pdf

In this paper, the authors present a method for evaluating ambient displays using a technique similar to Nielsen's heuristic evaluation. A set of heuristics specifically tuned for ambient displays, hereafter referred to as the Ambient Heuristics, is created by modifying the Nielsen heuristics. Two evaluation groups are then formed to apply heuristic evaluation to two novel ambient displays, both developed by the authors: busMobile and the daylight display. BusMobile is a simple ambient display that shows the locations of campus buses relative to the building in which the display is placed; the daylight display uses a lamp to convey the outdoor brightness level to users in a lab with no windows. The Ambient Heuristics showed an increased ability to find severe usability issues over the Nielsen heuristics.

The authors have presented an insightful tool for evaluating the usability issues of ambient displays. This revised set of heuristics provides a cheap, effective way for researchers to evaluate the usability of their ambient display products.

I feel that this study would be more valid if the Ambient Heuristics were applied to more than just the two displays created by the authors. A number of existing products/projects could have been used as samples in this study. This would both fulfill one avenue of future work and reduce the cost to the authors, since they would not have to develop their own displays simply for the purpose of testing the evaluation scheme (that is, unless the display devices were existing projects). In addition, it would be interesting to see whether the margin between the number of issues found with the Ambient Heuristics and the Nielsen Heuristics scales to larger sets of data (i.e., more than 30 possible issues). If the gap were to widen, it would imply the Ambient Heuristics have an advantage for finding issues in this domain. However, it then seems counterintuitive that the Nielsen Heuristics are capable of finding issues that the Ambient Heuristics are not. Perhaps the heuristics need to be adjusted so that they find a superset of all issues found with the Nielsen set.

Thursday, October 16, 2008

Hardcore SE Papers

Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity and Zipf's Law
http://www.cs.toronto.edu/~gvwilson/reading/veldhuizen-libraries-reuse.pdf

This paper presents an interesting argument: that the entropy of a given problem domain can be measured, and that from it we can predict the amount of library reuse that is appropriate, or indeed possible, for programs in that domain. Also, a handy proof states that the only complete library for a given domain is of infinite size, effectively securing jobs for library writers in the future.
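The paper's formalism is in terms of Kolmogorov complexity, but to make the "measure the entropy of a domain" idea concrete, here is a toy Shannon-entropy calculation over a made-up distribution of routine usage in some domain (my own illustration, not the paper's method):

/** Toy Shannon entropy of a frequency distribution -- a stand-in for "measuring" a domain. */
public class DomainEntropy {
    static double entropyBits(double[] probabilities) {
        double h = 0.0;
        for (double p : probabilities) {
            if (p > 0) h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical usage frequencies of four library routines in some domain.
        System.out.println(entropyBits(new double[] {0.5, 0.25, 0.125, 0.125})); // 1.75 bits
    }
}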


Unfolding Abstract Datatypes
http://www.cs.toronto.edu/~gvwilson/reading/gibbons-unfolding-adt.pdf

This paper should have been called "Unfolding Abstract Datatypes in Functional Programming", so that I had fair warning before I started reading it, and by reading I mean staring at the pages looking for something I understood. Here's what I got out of it: in functional languages, ADTs are possible, but less commonly implemented than in OOP languages. The primary reason is that most user-defined types expose their data structures so that you can do pattern matching on them. The paper argues that this is bad, and that proper information hiding can be obtained without breaking pattern matching. Also, ADTs represent codata (whatever that is).

Tuesday, October 14, 2008

Asking and Answering Questions during a Programming Change Task

http://www.cs.toronto.edu/~gvwilson/reading/sillito-questions-program-change.pdf

This paper presents a study conducted by getting programmers (students and professionals) to work while thinking out loud, then categorizing the questions they ask themselves (and their debugger). These fall into 44 distinct categories under 4 main groups. Following this, each category is analyzed to see whether existing tools are able to directly answer the question posed.

I found my mind wandering while I was 'reading' this paper. Not that the subject matter is uninteresting; far from it. I found myself coming up with ideas for new tools all throughout the read, which eventually started to detract from the primary text itself. Anyway, the following are some thoughts:

There is mention of how programmers divide their workspace to show, for example, code and the executing program, or two code files, using emacs screen splitting, multiple windows, or multiple monitors. I wonder what the results would be in a study where we a) measure a programmer's productivity with one monitor, then b) add a second monitor (I'm pretty sure this has been done before), allow them to get used to it (productivity should plateau), then remove the second monitor. I predict that productivity will drop below the level measured in a) for a while, then gradually come back to a nominal level.

Of the two groups studied (students and professionals), each had a single category of question (of 44 possible) that was asked vastly more than all others. For students, this was "Where is this method called or type referenced?". For professionals, this was "What will be (or has been) the direct impact of this change?". A couple of things come out of this.

First off, students seem more concerned with direct program behavior or structure, while professionals are concerned with the impacts of a code change, which seems like a much more organization-oriented behavior. I'm having trouble expressing my exact idea here, so I'll come back to it. Bottom line: professionals are less hack-and-slash than students.

Secondly, there are tools for addressing the students' question. Why aren't they using them? The tools for the developers' questions, however, are lacking. Can we make them better?

Exemplar-driven documentation. There is discussion in this paper about finding examples of the type of operation one is trying to create or modify within the subject code base, and using that as a template for the new feature or modification. I wonder if this could be applied not only to the target code base, but to any code base (or indeed every code base). Let's say, for example, I want to implement a convolution matrix to do a Gaussian blur over a Java BufferedImage. Imagine I had a search engine that would search the code of a vast number of open source code bases with some natural language query, and return code snippets of convolution matrices over BufferedImages. Useful? I dunno, just had to write it down before I forgot it.
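To make the example concrete, here is roughly the snippet I would want such a search to return: a minimal 3x3 Gaussian blur over a BufferedImage using the standard ConvolveOp class (the file paths are just placeholders):

import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.io.File;
import javax.imageio.ImageIO;

/** Minimal Gaussian blur over a BufferedImage via a 3x3 convolution kernel. */
public class GaussianBlurExample {
    public static BufferedImage blur(BufferedImage src) {
        // 3x3 Gaussian kernel with weights summing to 1.
        float[] weights = {
            1f / 16, 2f / 16, 1f / 16,
            2f / 16, 4f / 16, 2f / 16,
            1f / 16, 2f / 16, 1f / 16
        };
        ConvolveOp op = new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null);
        return op.filter(src, null);
    }

    public static void main(String[] args) throws Exception {
        BufferedImage input = ImageIO.read(new File(args[0])); // input path supplied by the caller
        ImageIO.write(blur(input), "png", new File(args[1]));  // output path supplied by the caller
    }
}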

This leads into another idea that popped up. A few months ago, when I had a job and spare time, I was playing around with the jMonkeyEngine, a handy little open source scene graph for Java, based on JOGL. Its documentation is in the form of a wiki, which unfortunately has a bunch of holes in it. However, I found that downloading the source trunk and looking at the extensive unit tests was a much better learning tool. I simply loaded the unit test hierarchy into the IDE, looked for a test for the feature I wanted to use, ran it to see it work, then looked at the test code, which by definition is short and concise. I propose a study where we take two groups of developers and one large API, and task them with implementing a given application on top of this API. One group gets standard documentation, and the other gets a complete set of unit tests. Let them go, and check the results. If the unit tests turn out to be better, this would be a huge boost to the motivation for TDD.
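A sketch of what "unit test as documentation" looks like in practice; the Scene/Box classes below are made-up stand-ins, not jMonkeyEngine's actual API, and the test uses JUnit 5:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

/** Test-as-documentation sketch: the usage pattern and expected behavior in one short method. */
public class SceneGraphDocTest {
    // Hypothetical minimal scene-graph classes, standing in for the library under study.
    static class Node { int children = 0; void attach(Node child) { children++; } }
    static class Box extends Node {}
    static class Scene extends Node {}

    @Test
    public void attachingABoxToTheSceneAddsOneChild() {
        Scene scene = new Scene();
        scene.attach(new Box());         // the usage pattern a newcomer wants to see
        assertEquals(1, scene.children); // and the behavior they can expect
    }
}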

Two more quick ideas, and then I'm done. This one relates back to finding usages of methods and classes, which was one of the prime questions asked by students in the paper's study. A 'Find Usages' feature in an IDE can solve this, but it is not the most efficient approach when looking for loose relationships between two or more elements. What if I wanted a tool that was "Find Usages of these TWO methods", or three, or four, and so on? Basically, find the class, method, block, or statement which uses all of the given input elements. I think this would be handy.
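A crude approximation of that tool, assuming plain-text matching over a source tree is good enough for a sketch (a real version would work on the call graph, as the IDE's single-method Find Usages does):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Crude stand-in for "Find Usages of these TWO methods": report .java files mentioning both names. */
public class FindJointUsages {
    public static void main(String[] args) throws IOException {
        Path root = Path.of(args[0]);             // source tree to scan
        String methodA = args[1], methodB = args[2];
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .filter(p -> {
                     try {
                         String text = Files.readString(p);
                         return text.contains(methodA + "(") && text.contains(methodB + "(");
                     } catch (IOException e) {
                         return false; // skip unreadable files
                     }
                 })
                 .forEach(System.out::println);
        }
    }
}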

Lastly, the paper used ArgoUML as its code base for the student tests. The authors had the students fix bugs submitted via the ArgoUML tracker. I wonder if there's a market for shopping out bug fixing time to ethnographic research subjects?

Monday, October 13, 2008

Conceptual Modeling Extravaganza

Three papers concerned with conceptual modeling:

First off is Ross' seminal 1977 paper Structured Analysis (SA): A Language for Communicating Ideas. I don't think there's much I can say about this paper that hasn't been said already, so I'll keep it short. The paper presents the argument that "SA is the best thing since sliced bread", and to illustrate the point it presents pretty much the entire meta-model for SA, going through, in great detail, all the primitive constructs in the SA vocabulary.

Prof. Mylopoulos wrote an interesting opinion article, Desert Island Column: A Trip to Carthea, praising the previous paper's insight. One thing Prof. Mylopoulos brings up is that "the world consists of more than things and happenings", which seems to be something Ross argues strongly against in his paper. One of the cornerstones of the Ross paper was that anything worth talking about consists of 6 or fewer things and happenings.

Lastly, I took a look at Jennifer's paper Reflective Analysis of the Syntax and Semantics of the i* Framework. Now, to be honest, I would have gotten a lot more out of this if I were more familiar with the i* syntax, but the idea of reflective analysis that this paper presents could be applied to any modeling tool. The study, conducted by Horkoff et al., looked at assignments and research papers in the community, recorded the most frequent deviations from the U of T i* syntax, and proposed that these deviations were made due to non-optimal design choices in the language's syntax.
A couple of useful results came out of this investigation. First, the authors propose modifications to the i* syntax to address these common mistakes; second, some conclusions are drawn about how users are learning i* (e.g., not enough focus on areas where syntax mistakes occur). I wonder if a similar study has been done for SA or UML, and if so, what its conclusions were (e.g., that UML activity diagrams are never used?).

Also, I was surprised to hear so many positive opinions about SADT. In my undergrad, it was touched on briefly in a Requirements Engineering class as something that was 'old, and not used by anyone anymore'.

Tuesday, October 7, 2008

Evaluating Effectiveness and Efficiency of TDD

http://www.cs.toronto.edu/~gvwilson/reading/gupta-effectiveness-tdd.pdf

Kind of a hokey paper describing a study done in India comparing an up-front design-then-code process to test-driven development. Related work shows that a) TDD is more efficient, b) TDD is less efficient, and c) there is no difference, so there is clearly still debate over this issue. This paper presents evidence (not vast amounts, just more ammunition for the debate) in support of TDD, ultimately coming to the conclusion that developers will probably prefer a modified version of TDD, in which more design is done up front while still using the test-code-refactor waltz.

12 Steps to Better Code

I read this one before it showed up on the reading list :) Very good article. If more dev shops followed Joel's 12, our job would be much more enjoyable. Restated here for completeness, every successful software team should pass the following 12 tests:

The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?

If you answered 'no' to more than 2 of these, you're not doing things properly. The article pokes holes in some common patterns seen in software houses these days, like low-walled cubicle bullpens and pushing bugs off to the end of the iteration. I think this is one that every developer and manager should commit to memory, but I certainly wouldn't call it science.

Monday, October 6, 2008

Learning TDD by Counting Lines

IEEE Software May/June 2007

Interesting little paper describing Nokia Networks' migration from waterfall to agile, and the tools used by the group training Nokia's developers in test-driven development. The exercise in question walks new developers through their first steps in creating a program to count non-commented lines in a source file using a TDD approach: start with the low-hanging fruit (a test with an input program of one line) and scale up until hitting a code wall. The third movement of TDD, refactoring, is apparently often overlooked by new TDD developers (I'm guilty of this too). It includes not only factoring out commonalities in the test cases, but also refactoring the production code and even the design of the production code. Emergent design, in this case, means more than just making the most logical step at each point and hoping that the best design will result.
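My own reconstruction of what that first step might look like (this is not Nokia's exercise code): one test, then just enough production code to make it pass, with the comment-handling test added once the first one is green. JUnit 5 assumed:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

/** First TDD steps for the line-counting exercise, as I imagine them. */
public class LineCounterTest {
    // Minimal production code so far: count non-blank lines that aren't "//" comments.
    static int countCodeLines(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) count++;
        }
        return count;
    }

    @Test
    public void singleLineProgramCountsAsOne() {
        assertEquals(1, countCodeLines("int x = 0;"));
    }

    @Test
    public void commentLinesAreNotCounted() {
        assertEquals(1, countCodeLines("// setup\nint x = 0;"));
    }
}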

Anyway, positive results on a statistically insignificant number (12) of development teams.

Sunday, October 5, 2008

Replication in Research

Just read a couple of papers concerned with replicability in research, that is, with authors of academic papers publishing, alongside the body of the paper, the data and code used to generate their figures and reach their conclusions. The primary moral is this: not enough structure is in place to make authors prove that their methods actually work, yet the technology to distribute the materials required to double-check their findings is widely available, just not widely used.

I wonder if the previous article (size confounding metrics) has been replicated?

Saturday, October 4, 2008

The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics

A very good paper, with significant results. Research into metric-based code analysis of object-oriented programs shows that the most significant metric for inferring the fragility of a class is the class's size (LOC), and that most other existing metrics are simply layers of indirection on top of class size (e.g., large classes have higher coupling). I'm not sure what actual impact this paper has had on the community, but I would think it should be fairly revolutionary.

One thing that sort of rubbed me the wrong way: in the paper's introductory sections, the authors discuss procedural vs. object-oriented programming and imply (with some references) that OO programs are in general harder to maintain. I wonder about that statement.