Rory Tulk's Blog: February 2009

Wednesday, February 18, 2009

Code Monkeys and the Recipe for Happiness

Hanging out in my office today. Being moderately productive, but in an inexplicably good mood. The apparent recipe for happiness is as follows:

Purchase a ridiculously expensive bagel
Coffee
Install shiny new operating system on laptop
Meet with supervisor
Coffee
Create UI mockups for hypothetical testing tool
Listen to 'Code Monkey' by Jonathan Coulton a few hundred times.
Coffee

I like this song. It gives me a mental image of developers as knuckle-dragging primates, helped along by the way Coulton removes all articles from the lyrics, e.g. "Code monkey get up, get coffee." And after all, if you can't laugh at yourself, at whom can you laugh? (Sentences also can't end in prepositions.) A couple of excerpts:

"Rob says Code Monkey very diligent, but his output stink. Code Monkey's code not functional or elegant. What do Code Monkey think? Code Monkey think maybe manager want to write god damn login page himself? Code Monkey not say it out loud. Code Monkey not crazy, just proud!"

"Much rather wake up, eat coffee cake. Take bath. Take nap. This job fulfilling in creative ways. What a load of crap."

Monday, February 16, 2009

Saxophone

This is for the numerous saxophone players I know:

http://www.youtube.com/watch?v=RXyC7S-4LR8

Tuesday, February 10, 2009

Security

After a conversation with Aran about the U of T library site possibly being vulnerable to SQL injection attacks, we came up with the idea that a really, really cool name for a book on internet security would be

"); DROP TABLE Books;

Not only is it funny, but its very existence would serve to test the security measures of countless online bookstores and libraries across the globe. The most valuable internet security book you'll never have to read.
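For anyone who hasn't met SQL injection: the joke works because a site that builds SQL by pasting user input into a query string will run whatever SQL the input smuggles in. Below is a minimal sketch using Python's built-in sqlite3 module and a hypothetical Books table (nothing to do with the real U of T library site). The unsafe search uses executescript(), which runs several statements, to stand in for a database driver that accepts them; the exact quote characters an attacker needs (the title above uses a double quote) depend on how the vulnerable query is written. The safe search passes the title through a ? placeholder, so it is bound as data and never parsed as SQL.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical Books table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Books (title TEXT)")
    conn.execute("INSERT INTO Books (title) VALUES (?)", ("Ender's Game",))
    return conn


def search_unsafe(conn, title):
    # BAD: the user's text is spliced directly into the SQL source, and
    # executescript() will run every statement it finds there.
    conn.executescript(f"SELECT * FROM Books WHERE title = '{title}';")


def search_safe(conn, title):
    # GOOD: the ? placeholder binds the title as a value, never as SQL.
    return conn.execute(
        "SELECT * FROM Books WHERE title = ?", (title,)
    ).fetchall()
```

With the payload `x'; DROP TABLE Books; --`, search_safe() simply finds no matching title, while search_unsafe() turns the payload into a second statement and the Books table is gone afterwards.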

Thursday, February 5, 2009

A Long Train of Thought with One (1) Cool Idea and Several Tangents

So I've been complaining for months now that I'd like an operating system, or just a desktop manager, similar to the kind of thing you see in movies like Swordfish and Hackers. Something eclectic that looks really frigging cool and lets me do the kinds of operations I want to do, quickly and easily. (Note to you UX experts out there: I am well aware that interfaces that look cool aren't usable. I'm not going for market appeal; this is a totally custom job.) It seemed obvious that the only way to get this was to build it myself. So I started small, thinking "What kind of operations/features would I want on a small, portable device, like a netbook or Eee PC?" (Note: I decided I wanted an Eee PC when I saw Richard Stallman speak earlier this week. It was the only thing I really liked about his talk.) A simple window manager (with really flashy graphics), file system navigator, browser, and IDE. That's pretty much it. Oh, and the whole thing should be built on top of some flavor of Unix, so that I can still use 3rd-party apps, etc. Ambitious project, I know, but I'm just fantasizing here. Also, while I was daydreaming about this, I decided to restructure my personal computing setup, but that's another story.

Anyway, I figured I'd start with the IDE, since I had always wanted to make one. This specific idea came up over the summer while I was at work. I really liked the coding features in IntelliJ, but wished it fit better with our continuous integration infrastructure. I came to the conclusion that the best IDE would have the most notable coding features from IntelliJ, but be transparent enough to let you plug in whatever tools you might already be using: SVN/Perforce, any JDK, any Ant, etc. Before you start saying "But Rory, Eclipse does ..." or "But Rory, NetBeans blah blah blah...", one of the strongest motivating factors for this idea was that it sounded like a lot of fun, regardless of whether the requirements have already been fulfilled by something else. Also, these heavyweight IDEs are just that: too heavy. For me, their feature set can at times be too full, and having a single application that consumes > 1 GB of memory seems a bit silly when all it really has to do is edit text files and invoke some commands from the system prompt. Also, keep in mind that I enlisted (is that the right word?) in grad school to do just this: redesign developer tools.

I mentioned this to Zak, to which he replied "It sounds like you just need to learn to use Emacs properly." That doesn't sound like fun either. I mentioned this to Aran, to which he replied "Oh my god, me too!" Yay! Someone else wants to make an IDE. Except, Aran's IDE is a JavaScript IDE. Written in JavaScript. That runs in a browser.

My first instinct was that this idea sounds rather boring. Then I thought more carefully about it. Google has browser-based versions of most of the other applications you might use on a daily basis: mail, word processor, spreadsheet, image editor, etc. Why not a browser-based IDE? Are the UI controls available in a browser less expressive than those on a native client? Probably not.

Now, I had originally thought of this IDE as running in the browser and operating on local files, but what if it were more closely integrated with the web? What if it were a component of an online software portal? It would automatically know which source code repository you're using. It could automatically update documentation (wikis). It would have strong integration with the portal's bug tracker. Imagine it: your favorite portal would have a "Code" tab in addition to "Wiki", "Docs", "Browse", "Mail", etc. at the top of the page, and when you clicked it, everything would be configured and ready to go.

What do we gain if we use the cloud for some of the processing, instead of relying on the browser to be the engine of this fantastic IDE? First off, my IDE no longer consumes > 1 GB of local memory. What else? Rendering of the controls would still be done on the client, but can the cloud be used for more interesting problems, like static analysis? Anything that could benefit from a bit of parallelization is a good candidate for migration.

Could this include running unit tests? Should the compiled code be run on the server, or in the browser? Security concerns say that it should be run on the client, but if it were run on the server, it could possibly be done faster, and in multiple browsers in multiple environments instead of just the ones installed on the client. To protect security, we could run the build and execute the code in a virtual machine, like a SnowFlock VM. As for unit tests, they are extremely parallelizable: we could fork a VM for every test and run them all at once. Huzzah!
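The one-VM-per-test idea can be sketched on a single machine, assuming a separate Python interpreter process is an acceptable stand-in for a forked VM. This is not SnowFlock and uses none of its API, and the test bodies in TESTS are hypothetical. Each test runs in its own `python -c` subprocess, so a crashing test can't take the others down, and a thread pool launches them all at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical test suite: test name -> Python source for that test.
TESTS = {
    "test_addition": "assert 1 + 1 == 2",
    "test_upper": "assert 'a'.upper() == 'A'",
    "test_broken": "assert [] == [0]",
}


def run_isolated(name, source, timeout=30):
    """Run one test in a fresh interpreter (standing in for a fresh VM)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # A failed assert makes the child exit with a non-zero status.
    return name, proc.returncode == 0


def run_all(tests):
    """Start every test at once and collect a name -> passed mapping."""
    with ThreadPoolExecutor(max_workers=max(1, len(tests))) as pool:
        futures = [pool.submit(run_isolated, n, s) for n, s in tests.items()]
        return dict(f.result() for f in futures)
```

Calling `run_all(TESTS)` gives `{'test_addition': True, 'test_upper': True, 'test_broken': False}`. Threads are enough here because the real work happens in the child processes; with actual VMs, the pool would hand each test to a freshly forked machine instead.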

I may have to rewrite this in a more concise form :P

Also, Aran mentions Heroku and AppJet, which are similar to this.

Wednesday, February 4, 2009

BitWhat?

I've noticed that a lot of the tech blogs I've been reading have titles that make creative use of the word 'bit', e.g. The Third Bit, BitWorking, Bitgistics, etc. Now, don't get me wrong, the authors and opinions presented are insightful, but how often does a Python programmer, for example, worry about bits? Shouldn't these catchy names use higher-level concepts, like "Objection", "DataDemon", or "Quine 'n Cheese"? Bleh, just my opinion?

Tuesday, February 3, 2009

Reading Week

While thinking about which papers/books I should consume over reading week, I came to the conclusion that what I would really rather do is finish the planking on my scratch-built Bluenose. This was a project that I started a little over a year ago and boxed up when I moved to Toronto. My plan was to finish it before Christmas, what with all the spare time I'd have as a lazy grad student. Turns out, things are a bit different than I estimated, and the Bluenose has stayed in the closet for the last 4 or 5 months.

The hull was carved from a single 6"x6"x48" basswood timber, with additional pieces carved/shaped/cut from a mix of pine, balsa, and basswood. The false underdeck is pretty much done and just needs a bit of sanding along the rails; then I get to start planking the deck. I estimate I'll spend most of the first day figuring out the scales & sizes I chose for everything. In hindsight, I wish I had written some of this stuff down, but I think I'll manage.

RESTful Questions

While working on my 2125 project, my partner and I created a quick little RESTful web service using CherryPy, with SQLAlchemy for persistence. SQLAlchemy worked wonderfully. CherryPy did a great job of making data-driven web pages, and its MethodDispatcher made it easy to invoke a particular method of a class when an HTTP request comes in, based on the HTTP method. This seemed almost ideal for REST, but some clunkiness in the design keeps it from being quite what we're after.
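For readers who haven't used it, the idea behind MethodDispatcher is that the HTTP verb picks the method: a GET on a resource calls its GET() method, a PUT calls PUT(), and so on. In CherryPy this is switched on through the `request.dispatch` config entry. The toy below imitates only that idea using the standard library; it is not CherryPy's implementation, and the Songs resource is hypothetical.

```python
ALLOWED_VERBS = {"GET", "POST", "PUT", "DELETE"}


class VerbDispatcher:
    """Route a request to the resource method named after its HTTP verb."""

    def __init__(self, resource):
        self.resource = resource

    def dispatch(self, verb, *args):
        verb = verb.upper()
        # Only look up real HTTP verbs, so a request can't call arbitrary methods.
        handler = getattr(self.resource, verb, None) if verb in ALLOWED_VERBS else None
        if handler is None:
            return 405, "Method Not Allowed"
        return 200, handler(*args)


class Songs:
    """Hypothetical resource: song titles keyed by id."""

    def __init__(self):
        self.items = {}

    def GET(self, song_id):
        return self.items[song_id]

    def PUT(self, song_id, title):
        self.items[song_id] = title
        return title
```

With `d = VerbDispatcher(Songs())`, `d.dispatch("PUT", "1", "Code Monkey")` returns `(200, 'Code Monkey')`, and `d.dispatch("DELETE", "1")` returns `(405, 'Method Not Allowed')` because Songs defines no DELETE method.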

What are we after, exactly? We're trying to find ways to avoid duplicating effort when using both Object Relational Mappers and RESTful Web Services. In their book "RESTful Web Services", Richardson and Ruby point out that translating objects into REST resources is very similar to translating the same objects into tables in a relational database. So if a web service stores objects in a database and exposes them via a REST API, it ends up doing the same sort of mapping twice.
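One way to avoid doing the mapping twice is to describe each class once and derive both the table schema and the REST representation from that single description. A minimal sketch, assuming Python dataclasses as the shared description; the Release class, the SQL_TYPES table, and the URI layout are all hypothetical, and a real ORM like SQLAlchemy does far more than this.

```python
from dataclasses import asdict, dataclass, fields

# Hypothetical mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT"}


@dataclass
class Release:
    id: int
    name: str


def create_table_sql(cls):
    """Derive a CREATE TABLE statement from the dataclass fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"


def to_resource(obj, base="/"):
    """Derive a REST representation (a plain dict) from the same fields."""
    body = asdict(obj)
    body["href"] = f"{base}{type(obj).__name__.lower()}s/{obj.id}"
    return body
```

Here `create_table_sql(Release)` gives `CREATE TABLE release (id INTEGER, name TEXT)` and `to_resource(Release(7, "2_05"))` gives `{'id': 7, 'name': '2_05', 'href': '/releases/7'}`. Adding a field to Release changes both outputs at once, which is the point.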

My partner (in crime?) and I had a chat with Greg about this, and came up with some questions to investigate. Below are the questions and their answers:
How do REST APIs represent foreign-key relationships (i.e. object aggregation)? Specifically, are references to the other objects stored/returned, or is the entire object returned on each request?

It is common REST practice to return hyperlinks to the other objects/resources aggregated by a given resource. This requires an additional HTTP request for each referenced resource the client wants to read.
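The cost of the hyperlink approach is easy to see in a sketch. Below, a hypothetical in-memory store plays the server, and each Client.get() call stands in for one HTTP GET. The project representation lists its members as URIs rather than embedding them, so reading the members' names costs one request for the project plus one per member.

```python
import json

# Hypothetical server state: URI -> stored representation.
STORE = {
    "/projects/1": {"name": "2125", "members": ["/people/1", "/people/2"]},
    "/people/1": {"name": "Alice"},
    "/people/2": {"name": "Bob"},
}


class Client:
    def __init__(self, store):
        self.store = store
        self.requests = 0

    def get(self, uri):
        self.requests += 1  # one call == one simulated HTTP GET
        # Round-trip through JSON, as a real response body would be.
        return json.loads(json.dumps(self.store[uri]))


def member_names(client, project_uri):
    project = client.get(project_uri)  # 1 request for the project...
    # ...plus 1 request per linked member resource.
    return [client.get(link)["name"] for link in project["members"]]
```

Calling `member_names(Client(STORE), "/projects/1")` returns `['Alice', 'Bob']` after 3 requests; with N members it is N + 1, which is where client-side caching starts to matter.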

Can we uniquely identify object instances (REST resources) based on some identifier?

Yes. The resource's URI is its identifier. Every resource has one that identifies it. However, it is possible for one resource to have many URIs that point to it (ex. /releases/2_05 and /releases/latest could be the same thing).

Can we cache REST objects on the client side, based on their identifier (whatever that may happen to be)?

Yes. It would be silly not to. However, the multi-identifier problem stated in the last answer might make this less efficient.
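A minimal sketch of such a cache and of the multi-URI wrinkle, assuming (hypothetically) that the server tells the client which URI it actually served, roughly the role HTTP's Content-Location header can play. The Server and CachingClient classes are made up for illustration. The client keeps one copy per canonical URI, but the first request through a new alias such as /releases/latest still costs a round trip, because the client can't know in advance that the alias points at something it already has.

```python
class Server:
    """Hypothetical server where /releases/latest aliases /releases/2_05."""

    def __init__(self):
        self.resources = {"/releases/2_05": {"version": "2.05"}}
        self.aliases = {"/releases/latest": "/releases/2_05"}
        self.requests = 0

    def get(self, uri):
        self.requests += 1
        canonical = self.aliases.get(uri, uri)
        # Report the canonical URI along with the body.
        return canonical, dict(self.resources[canonical])


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.by_uri = {}    # requested URI -> canonical URI
        self.objects = {}   # canonical URI -> cached representation

    def get(self, uri):
        if uri in self.by_uri:
            return self.objects[self.by_uri[uri]]  # served from cache
        canonical, body = self.server.get(uri)
        self.by_uri[uri] = canonical
        # Reuse the existing copy if another URI already fetched this resource.
        return self.objects.setdefault(canonical, body)
```

Fetching /releases/2_05, then /releases/latest twice, costs 2 server requests and leaves a single cached object. A real client would also have to expire the alias entry, since "latest" will eventually point at a newer release.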

If we assume that the meat of the service is some object graph (probably a DAG), can we reconstruct the graph on the client side, out of stubs instead of actual objects, given identifiers and caching?

I think so.
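Here is a minimal sketch of why I think the answer is yes, assuming links are represented as `{"href": uri}` objects and that `fetch` is any function that GETs a URI and returns its fields as a dict (both are assumptions, as are the Stub and Graph names). Each URI gets exactly one stub (an identity map), a stub only fetches when one of its fields is first read, and because links become shared stubs, a node reached along two paths of the DAG is still a single object on the client.

```python
class Stub:
    """Placeholder for a remote object; fetches its fields on first access."""

    def __init__(self, uri, loader):
        self._uri = uri
        self._loader = loader
        self._data = None

    def __getattr__(self, name):
        # Only called for attributes not set in __init__, i.e. remote fields.
        if self._data is None:
            self._data = self._loader(self._uri)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


class Graph:
    def __init__(self, fetch):
        self.fetch = fetch   # URI -> dict of fields
        self.stubs = {}      # identity map: exactly one stub per URI

    def stub(self, uri):
        if uri not in self.stubs:
            self.stubs[uri] = Stub(uri, self._load)
        return self.stubs[uri]

    def _load(self, uri):
        raw = self.fetch(uri)
        # Turn each link into a shared stub instead of fetching it now.
        return {
            k: self.stub(v["href"]) if isinstance(v, dict) and "href" in v else v
            for k, v in raw.items()
        }
```

Lists of links and cache invalidation are left out to keep it short.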

That's all for now. Check here or on the project wiki for more information coming soon!

What I've Read This Week

Singer, J.; Vinson, N.G., "Ethical issues in empirical studies of software engineering," IEEE Transactions on Software Engineering, vol. 28, no. 12, pp. 1171-1180, Dec. 2002
URL: http://www.ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1158289&isnumber=25950

This paper presents a handful of ethical dilemmas that researchers who conduct empirical studies can get themselves into, along with advice on getting out of them or avoiding them altogether.

What kinds of studies could be created that contain no human subjects, but in which individuals can be identified (i.e. from their source code)?
When can an employee's participation in an empirical study threaten their employment?
Is it possible to conduct a field study in which management doesn't know which of their employees are participating?
Should remuneration rates be adjusted to compete with a standard software engineer's salary?
Are raffles or draws valid replacements for remuneration? Does the exclusivity of the compensation (i.e. only one subject wins the iPod) affect the data collected by the study? Will subjects 'try harder' on the assigned task if they think they may win a prize? Can prizes affect working relationships/situations after the researcher has left?
Does ACM Article 1.7 eliminate deceptive studies?
Regarding written consent/participation forms, does listing a large number of anticipated uses of the data detract from a study's credibility, and thereby make subjects less likely to participate?

John P. A. Ioannidis, "Why Most Published Research Findings Are False"
URL: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1182327

This paper presents a detailed statistical argument (proof?) that the majority of research findings published today are likely to be false.

What is the 'power' the author is referring to?
Is Corollary 5 (corporations sponsoring research suppress findings that they deem unfavorable for business reasons) just plain evil or misleading?
Null fields sound interesting. How do I tell if I'm stuck in a null field?
How do we determine R for a given field?
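For my own reference on the 'power' and R questions: as I read the paper, power is the usual statistical power 1 - β (the chance that a study detects a relationship that really exists), R is the ratio of true relationships to no relationships among those tested in a field, and α is the significance level. In its simplest form, before bias and multiple competing teams are added, the probability that a claimed finding is true (the positive predictive value) works out as below; the numbers in the second line are illustrative values I picked, not figures for any particular field.

```latex
\mathrm{PPV} = \frac{(1-\beta)R}{R - \beta R + \alpha},
\qquad
\mathrm{PPV} > \tfrac{1}{2} \iff (1-\beta)R > \alpha

\alpha = 0.05,\; 1-\beta = 0.8,\; R = 0.1:\quad
\mathrm{PPV} = \frac{0.8 \times 0.1}{0.1 - 0.02 + 0.05} = \frac{0.08}{0.13} \approx 0.62
```

So estimating R amounts to estimating what fraction of the hypotheses a field tests are actually true, which is exactly the hard part.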

M. Jørgensen, and D. I. K. Sjøberg (2004) "Generalization and Theory Building in Software Engineering Research"
URL: http://simula.no/research/engineering/publications/SE.5.Joergensen.2004.c

Null hypotheses are a telltale sign of (sometimes misused) statistical hypothesis testing. Should we as readers be concerned when we see clearly stated null hypotheses?
In their recommendations, the authors suggest that purely exploratory studies hold little or no value, given that vast amounts of knowledge concerning software engineering have already been accumulated in other, older fields such as psychology. Although I agree that cross-disciplinary research is useful for SE, and many old ideas can be successfully applied in SE, I'm not sure I agree that there is no use in exploratory studies.
Proper definition of populations and subject sampling is important.
It is difficult to transfer results from one population to another. The most common example is performing a study on CS grad/undergrad students and expecting the results to transfer to professionals. Is there any way we as CS grad students can perform studies that are relevant to professionals, then?

Still working my way through RESTful Web Services. Just wrapped up the authors' definition of ROA (resource-oriented architecture). Very interesting. Hopefully this answers some questions brought up by my 2125 project.

Also on the stack are this paper about the SnowFlock VM system and A Software Architecture Primer.

And, if there's time, I'll try to finish Ender's Game.
