Rory Tulk's Blog: SnowFlock


Thursday, February 5, 2009

A Long Train of Thought with One (1) Cool Idea and Several Tangents

So I've been complaining for months now that I'd like an operating system, or just a desktop manager, similar to the kind of thing you've seen in movies like Swordfish and Hackers. Something eclectic that looks really frigging cool and lets me do the kinds of operations I want to do, quickly and easily. (Note to you UX experts out there: I am well aware of the fact that interfaces that look cool aren't usable. I'm not going for market appeal, this is a totally custom job.) It seemed obvious that the only way to get this was to build it myself. So, I started small, thinking "What kind of operations/features would I want on a small, portable device, like a Netbook or EEE PC?" (Note: I decided I wanted an EEE PC when I saw Richard Stallman speak earlier this week. It was the only thing I really liked about his talk.) A simple window manager (with really flashy graphics), file system navigator, browser, and IDE. That's pretty much it. Oh, and the whole thing should be built on top of some flavor of Unix, so that I can still use 3rd party apps, etc. Ambitious project, I know, but I'm just fantasizing here. Also, while I was daydreaming about this, I decided to restructure my personal computing setup, but that's another story.

Anyway, I figured I'd start with the IDE, since I had always wanted to make one. This specific idea came up over the summer while I was at work. I really liked the coding features in IntelliJ, but wished it fit better with our continuous integration infrastructure. I came to the conclusion that the best IDE would have the most notable coding features from IntelliJ, but be transparent enough to allow you to plug in whatever tools you might already be using: SVN/Perforce, any JDK, any Ant, etc. Before you start saying "But Rory, Eclipse does ..." or "But Rory, NetBeans blah blah blah...", one of the strongest motivating factors for this idea was that it sounded like a lot of fun, regardless of whether the requirements have been fulfilled by something else. Also, these heavyweight IDEs are just that: too heavy. For me, their feature set can at times be too full, and having a single application that consumes > 1 GB of memory seems a bit silly when all it really has to do is edit text files and invoke some commands from the system prompt. Also, keep in mind that I enlisted (is that the right word?) in grad school to do just this: redesign developer tools.

I mentioned this to Zak, to which he replied "It sounds like you just need to learn to use Emacs properly." That also doesn't sound like fun. I mentioned this to Aran, to which he replied "Oh my god, me too!" Yay! Someone else wants to make an IDE. Except, Aran's IDE is a JavaScript IDE. Written in JavaScript. That runs in a browser.

My first instinct was that this idea sounds rather boring. Then I thought more carefully about it. Google has browser based versions of all the other applications you could use on a daily basis: mail, word processor, spreadsheet, image editor, etc. Why not a browser based IDE? Are the UI controls available in a browser less expressive than those on a native client? Probably not.

Now, I had originally thought of this IDE as running in the browser, operating on local files, etc., but what if it were more closely integrated with the web? What if it were a component of an online software portal? It would automatically know which source code repository you're using. It could automatically update documentation (wikis). It would have strong integration with the portal's bug tracker. Imagine it: your favorite portal would have a "Code" tab in addition to "Wiki", "Docs", "Browse", "Mail", etc., at the top of the page, and when you clicked it, everything would be configured and ready to go.

What can we get if we utilize the cloud for some of the processing, instead of relying on the browser to be the engine of this fantastic IDE? First off, my IDE no longer consumes > 1 GB of memory. What else? Rendering of the controls would still be done on the client, but can the cloud be used for more interesting problems, like static analysis? Anything that could benefit from a bit of parallelization is a good candidate for migration.

Could this include running unit tests? Should the compiled code be run on the server, or in the browser? Security concerns say that it should be run on the client, but if it were run on the server it could possibly be done faster, and run in multiple browsers in multiple environments instead of just the ones installed on the client. To protect the server, we could run the build and execute the code in a virtual machine, like a SnowFlock VM for example. Now, as for unit tests, running them is extremely parallelizable. We could fork a VM for every test and run them all at once. Huzzah!

I may have to rewrite this in a more concise form :P

Also, Aran mentions Heroku and AppJet, which are similar to this.

Tuesday, February 3, 2009

What I've Read This Week

Singer, J.; Vinson, N.G., "Ethical issues in empirical studies of software engineering," Software Engineering, IEEE Transactions on , vol.28, no.12, pp. 1171-1180, Dec 2002
URL: http://www.ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1158289&isnumber=25950

This paper presents a handful of ethical dilemmas that researchers who conduct empirical studies can get themselves into, along with advice on getting out of or avoiding the situation altogether.

What kinds of studies could we create which contain no human subjects, but in which individuals can be identified (i.e., from their source code)?
When can an employee's participation in an empirical study threaten their employment?
Is it possible to conduct a field study in which management doesn't know which of their employees are participating?
Should remuneration rates be adjusted to compete with a standard software engineer's salary?
Are raffles or draws valid replacements for remuneration? Does the exclusivity of the compensation (i.e., only one subject wins the iPod) affect the data collected by the study? Will subjects 'try harder' in the task assigned if they think they may win a prize? Can prizes affect working relationships/situations after the researcher has left?
Does ACM Article 1.7 eliminate deceptive studies?
Regarding written consent/participation forms, does having a large number of anticipated uses of the data detract from a study's credibility, and thereby make subjects less likely to participate?

John P. A. Ioannidis, "Why Most Published Research Findings Are False"
URL: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1182327

This paper describes a detailed statistical method (proof?) illustrating evidence that the majority of research papers published in this day and age go on to be refuted in the near future.

What is the 'power' the authors are referring to?
Is corollary 5 (corporations sponsoring research suppress findings that they deem unfavorable for business reasons) just plain evil or misleading?
Null fields sound interesting. How do I tell if I'm stuck in a null field?
How do we determine R for a given field?

M. Jørgensen, and D. I. K. Sjøberg (2004) "Generalization and Theory Building in Software Engineering Research"
URL: http://simula.no/research/engineering/publications/SE.5.Joergensen.2004.c

Null hypotheses are a telltale sign of (sometimes misused) statistical hypothesis testing. Should we as readers be concerned when we see clearly stated null hypotheses?
In their recommendations, the authors suggest that purely exploratory studies hold little or no value, given that vast amounts of knowledge concerning software engineering have been accumulated in other, older fields such as psychology. Although I agree that cross-disciplinary research is useful for SE, and many old ideas can be successfully applied in SE, I'm not sure I agree that there is no use in exploratory studies.
Proper definition of populations and subject sampling is important.
It is difficult to transfer the results in one population to another. The most common example of this is performing a study on CS grad/undergrad students and expecting it to transfer to professionals. Is there any way we as CS grad students can perform studies that will be relevant to professionals, then?

Still working my way through RESTful Web Services. Just wrapped up the authors' definition of ROA (resource-oriented architecture). Very interesting. Hopefully this answers some questions brought up by my 2125 project.

Also on the stack are this paper about the SnowFlock VM System and A Software Architecture Primer.

And, if there's time, I'll try to finish Ender's Game.

Thursday, January 22, 2009

Safe Server-Side Unit Testing

I like build systems :) My first experience with integrating a VCS, bug tracker, and Ant was very fulfilling, and it only got better when we added things like EMMA to give developers a feel for how their project was progressing. So, you can understand why my ears perked up when, during a conversation about the SVN setup in Dr. Project/Basie, Greg mentioned that they had tried to incorporate a continuous integration routine into Dr. Project, but failed, citing complexities and difficulty with the administration. Now, being the cynical, cold-hearted person that I am, my first thought was, "You clearly need better administrators", but then I remembered trying to do something similar with VMware: it was ridiculously hard to get working, and once it was, keeping it there was almost impossible. So I held my tongue.

The basic premise here is to have the server which runs the Dr. Project/Basie installation also manage a system of virtual machines. When code is checked into the SVN repository for a given project, a virtual machine is spawned. Inside this VM, we download a copy of the latest revision from SVN, build it, run the unit tests, generate the reports, publish them, then kill the VM. Obviously, we can't do the build and test by just forking a process, without the VM, because that would allow the project groups to run arbitrary code on the Dr. Project web server, which is just about the biggest security hole I can think of. So, the goal here is to use the virtual machines to completely isolate the code from the web server, so that the tests run in a completely safe environment, while also getting benefits like strictly reproducible execution environments (every unit test starts from the same VM snapshot).
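As a rough sketch of that spawn, checkout, build, test, tear-down flow, something like the following Python could drive it. Everything named here is an assumption for illustration: `disposable_vm` is a hypothetical stand-in that only creates a scratch directory (it gives no isolation at all), and the step commands are placeholders where a real setup would call svn, ant, and a test runner inside a cloned VM.

```python
import subprocess
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def disposable_vm():
    """Hypothetical stand-in for a VM clone.

    Yields a scratch directory that is deleted on exit. It provides NO
    real isolation; a real setup would clone a VM from the master image
    here and destroy it afterwards.
    """
    with tempfile.TemporaryDirectory() as workdir:
        yield workdir


def run_pipeline(steps):
    """Run named steps in order inside one disposable environment.

    Stops at the first failing step and returns a list of
    (step name, exit code) pairs, which is what would get published.
    """
    report = []
    with disposable_vm() as workdir:
        for name, command in steps:
            proc = subprocess.run(command, cwd=workdir,
                                  capture_output=True, text=True)
            report.append((name, proc.returncode))
            if proc.returncode != 0:
                break
    return report


# Placeholder commands only: "checkout" creates a file, "build" checks
# that it exists, and "test" fails on purpose to show the early stop.
EXAMPLE_STEPS = [
    ("checkout", [sys.executable, "-c", "open('Main.java', 'w').close()"]),
    ("build", [sys.executable, "-c",
               "import os, sys; sys.exit(0 if os.path.exists('Main.java') else 1)"]),
    ("test", [sys.executable, "-c", "raise SystemExit(1)"]),
]
```

Calling `run_pipeline(EXAMPLE_STEPS)` runs the three placeholder steps in a fresh directory and throws the directory away afterwards, which mirrors the "every run starts from the same snapshot" property in miniature.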

To accomplish these goals, we're looking at using the SnowFlock system. All VMs start from a master image, clones are quick to create (~100 ms), we can instantiate many, many clones at the same time, and the whole thing is wrapped up in a nice little Python API.

It will be interesting to see if this works for Dr. Project/Basie's needs, and if it does, I'd like to see if it could be extended to do cluster testing for larger distributed systems projects. The ease and speed of creating a new clone VM means that for each test, a small cluster of machines could be created, the test run, and the cluster torn down. I'm not sure whether a tool like this exists already. It sounds like a fairly straightforward idea, but it should be fun to investigate either way.
