Thursday, January 22, 2009
Calling all Build Engineers
During my recent CSC301 lecture, I was surprised to find out how few students were aware that they could actually get a job as a full-time build engineer. It makes me think that there's probably a study to be done in this area:
- To what extent are build processes and project maintenance discussed in CS education?
- What is the market value of a professional build engineer?
- How do build engineers perform their job? Is there a way it can be improved?
Safe Server-Side Unit Testing
I like build systems :) My first experience with integrating a VCS, a bug tracker, and Ant was a very fulfilling one, and it only got better when we added things like EMMA to give developers a feel for how their project was progressing. So, you can understand why my ears perked up when, during a conversation about the SVN setup in Dr. Project/Basie, Greg mentioned that they had tried to incorporate a continuous integration routine into Dr. Project, but failed, citing complexities and difficulty with the administration. Now, being the cynical, cold-hearted person that I am, my first thought was, "You clearly need better administrators". But then I remembered trying to do something similar with VMware: it was ridiculously hard to get working, and once it was working, keeping it that way was almost impossible. So I held my tongue.
The basic premise here is to have the server which runs the Dr. Project/Basie installation also manage a system of virtual machines. When code is checked into the SVN repository for a given project, a virtual machine is spawned. Inside this VM, we download a copy of the latest revision from the SVN repository, build it, run the unit tests, generate the reports, publish them, then kill the VM. Obviously, we can't do the build and test by just forking a process, without the VM, because that would allow the project groups to run arbitrary code on the Dr. Project web server, which is just about the biggest security hole I can think of. So, the goal here is to use the virtual machines to completely isolate the code from the web server, so that the tests run in a completely safe environment, while also providing benefits like strictly reproducible execution environments (every unit test starts from the same VM snapshot).
To accomplish these goals, we're looking at using the SnowFlock system. All VMs start from a master image, clones are quick to create (~100 ms), we can instantiate many, many clones at the same time, and the whole thing is wrapped up in a nice little Python API.
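Here's a rough sketch of the check-in workflow described above, just to make the control flow concrete. Everything in it is an assumption on my part: the `Sandbox` interface, the `FakeSandbox` stand-in, and the step names are hypothetical placeholders and are not SnowFlock's actual API. The one point it is meant to show is that the clone gets destroyed no matter how the build turns out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class Sandbox(Protocol):
    """Hypothetical interface to an isolated VM clone (not SnowFlock's real API)."""

    def run(self, command: str) -> int: ...
    def destroy(self) -> None: ...


@dataclass
class FakeSandbox:
    """In-memory stand-in that records commands instead of executing them."""

    failing: frozenset = frozenset()
    log: List[str] = field(default_factory=list)
    destroyed: bool = False

    def run(self, command: str) -> int:
        self.log.append(command)
        return 1 if command in self.failing else 0

    def destroy(self) -> None:
        self.destroyed = True


# Placeholder names for the steps listed in the post.
STEPS = ["checkout", "build", "test", "report", "publish"]


def run_pipeline(spawn: Callable[[], Sandbox], steps=STEPS) -> bool:
    """Run each step inside a fresh sandbox and always destroy it afterwards."""
    vm = spawn()
    try:
        for step in steps:
            if vm.run(step) != 0:
                return False  # stop at the first failing step
        return True
    finally:
        vm.destroy()  # the clone is killed whether or not the build passed
```

A real setup would probably still want to publish a report when the tests fail; stopping at the first failure just keeps the sketch short.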
It will be interesting to see if this works for Dr. Project/Basie's needs, and if it does, I'd like to see if it could be extended to do cluster testing for larger distributed systems projects. The ease and speed of creating a new clone VM means that, for each test, a small cluster of machines could be created, the test run, and the cluster torn down. I'm not sure if a tool like this exists already. It sounds like a fairly straightforward idea, but it should be fun to investigate either way.
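The per-test cluster idea could look something like the sketch below. It is only an illustration: `Node` and `cluster` are names I made up, and the default `clone` callable just creates plain Python objects where a real version would ask the VM system for clones.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class Node:
    """Hypothetical handle for one cloned VM; stands in for a real clone."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.alive = True

    def destroy(self) -> None:
        self.alive = False


@contextmanager
def cluster(size: int, clone: Callable[[int], Node] = Node) -> Iterator[List[Node]]:
    """Create `size` clones for one test and tear all of them down afterwards."""
    nodes: List[Node] = []
    try:
        for i in range(size):
            nodes.append(clone(i))
        yield nodes
    finally:
        # Runs on success, on test failure, and on a partial start-up.
        for node in nodes:
            node.destroy()
```

A test would then wrap its body in `with cluster(3) as nodes:` and never have to think about cleanup.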
Labels: Basie, Continuous Integration, Dr. Project, SnowFlock, VM, Xen
ORM Mapping for Web Service Definition
This post is an experiment with the Blogger/Google Docs interoperability functionality. I'm not terribly impressed with the quality of the translation between doc and blog post. If you're as disgusted with the layout as I am, feel free to read the Google Doc here. This is a document describing an idea Greg proposed to me and another student in his CSC2125 class. In one sentence, the problem is: can we use the object mapping definition from an Object-Relational Mapping tool to describe objects/resources in a RESTful web API, and in so doing leverage some of the benefits ORMs have lent to persistence, reduce redundancy, and generally make people happier? Unfortunately, the more I look into it, the more I think the answer is 'no'. However, we haven't quite finished the investigation yet.
ORM Mapping for Web Service Descriptors
Traditional Situation
Figure 1 illustrates the traditional deployment situation for a client/server application, in which the client communicates with the server via an exposed web service, and the server persists data into a database using an object-relational mapper. The server-side process consists of a layered architecture, in which the business logic interfaces with the database via an ORM mapping layer. The application logic stores its data in the form of objects (hence the ORM), and a mapping is defined by the application programmer between the class definitions for these objects and a relational database schema.
The client-side application code performs some useful operation with the data or service exposed by the server-side business logic. To access this information or service, the client application uses stub objects. These stubs expose the same interface as the live objects on the server (possibly a subset of methods for security/feasibility reasons), but the implementation of the object lives on the server; the client-side methods all contain logic for making calls to the server and returning the response as if the method were implemented on the client. These stub objects are created automatically at build time by a tool which reads a description of the web service and translates it into source code which can be compiled and used by the application. In traditional web services, this description is a WSDL file. <what is this for a REST web service??>
Figure 1: Traditional Client-Server Web Service Deployment, with ORM Mapping Definition and a Web Service Descriptor
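To make the stub idea a bit more concrete, here is a minimal sketch of a stub whose methods do nothing but forward the call. Two caveats: the tools described above generate stub source code at build time from the service description, whereas this builds the stub class at runtime because that is shorter to show; and `make_stub`, `fake_server`, and the `Ticket` example are all hypothetical names, with a local function standing in for the network.

```python
from typing import Any, Callable, Dict, Iterable

# A transport takes an operation name plus arguments and returns the reply.
Transport = Callable[[str, Dict[str, Any]], Any]


def make_stub(name: str, methods: Iterable[str], transport: Transport) -> Any:
    """Build a stub object whose methods forward every call to `transport`."""

    def forwarder(method: str) -> Callable[..., Any]:
        def call(self, **kwargs: Any) -> Any:
            # No local implementation: the call is shipped to the server.
            return transport(f"{name}.{method}", kwargs)

        return call

    stub_class = type(name, (), {m: forwarder(m) for m in methods})
    return stub_class()


def fake_server(operation: str, args: Dict[str, Any]) -> Any:
    """Stand-in for the remote side; a real transport would make a web request."""
    if operation == "Ticket.get_title":
        return f"ticket #{args['id']}"
    raise LookupError(operation)


ticket = make_stub("Ticket", ["get_title"], fake_server)
print(ticket.get_title(id=7))  # prints "ticket #7"
```

From the caller's point of view `ticket.get_title(id=7)` looks like a local method call, which is exactly the illusion the stubs are there to provide.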
Problem
The problem with this deployment is that there are redundancies in the way the shared objects and the web service are defined, which could be streamlined to the benefit of both server-side programmers and clients who wish to interface with the server. In the event that the class definition for one of the shared objects changes (e.g. adding a new public member), the server-side application programmer must update both the ORM Mapping Definition file/logic and the Web Service Descriptor, and the client-side programmer may be required to at least rebuild their application to update the stubs (this is addressed in Versioning).
Single Mapping Definition
It has been proposed that the ORM Mapping Definition and Web Service Descriptor can be combined into one artifact, as the two separate documents both essentially describe how to serialize instances of a given class. With hopefully only a small amount of modification, an ORM Mapping Definition could be used to serve both these purposes. Also, if the ORM side of the interface to this artifact is properly preserved, it is hoped that it can still be used for many of the other functions the ORM layer uses it for, like relational database schema migration/exporting.
Note: it may be interesting to investigate this further, as the ORM Mapping Definition serializes an object's state, but a Web Service Descriptor would likely only describe an object's behavior/interface!
Figure 2: Client-Server Web Service Deployment, with single Shared Object Descriptor
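As a toy illustration of the single-artifact idea, the sketch below derives both a database schema and a description of the web resources from one descriptor. The descriptor format, the type tables, and the `/tickets` URL convention are all invented for this example; no real ORM's mapping format is being shown.

```python
from typing import Dict, List

# Hypothetical shared descriptor: class name -> field name -> abstract type.
DESCRIPTOR: Dict[str, Dict[str, str]] = {
    "Ticket": {"id": "int", "title": "str", "open": "bool"},
}

SQL_TYPES = {"int": "INTEGER", "str": "TEXT", "bool": "BOOLEAN"}
WIRE_TYPES = {"int": "integer", "str": "string", "bool": "boolean"}


def to_ddl(descriptor: Dict[str, Dict[str, str]]) -> List[str]:
    """Persistence view: one CREATE TABLE statement per class."""
    statements = []
    for cls, fields in descriptor.items():
        columns = ", ".join(f"{name} {SQL_TYPES[t]}" for name, t in fields.items())
        statements.append(f"CREATE TABLE {cls.lower()} ({columns})")
    return statements


def to_resource_description(
    descriptor: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Web-service view: wire-level field types for each resource."""
    return {
        f"/{cls.lower()}s": {name: WIRE_TYPES[t] for name, t in fields.items()}
        for cls, fields in descriptor.items()
    }
```

Notice that both views only cover an object's state. Nothing in the descriptor says which operations the service offers, which is the gap the note above points at.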
Versioning
Automatic schema migration/updating raises the question of whether similar functionality could be used to span the client-server gap, not just the server-database gap. Obviously, client-side stub classes can't be updated completely automatically, as this would require rebuilding the application. However, the server could expose several concurrent versions of the same service, multiplexed based on a version field in incoming requests. As part of the schema update process, in addition to modifying the database, the ORM layer (or some other piece of code) could generate the infrastructure required to support backward-compatible calls to the web service API.
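A minimal sketch of the version multiplexing, assuming requests arrive as dictionaries carrying a `version` field. The handler names, the `priority` field, and the choice to serve the newest version when no version is given are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def ticket_v1(record: Dict[str, Any]) -> Dict[str, Any]:
    """Original shape of the resource."""
    return {"id": record["id"], "title": record["title"]}


def ticket_v2(record: Dict[str, Any]) -> Dict[str, Any]:
    """Newer shape: adds a field introduced by a later schema change."""
    return {**ticket_v1(record), "priority": record.get("priority", "normal")}


# One entry per concurrently supported version of the service.
HANDLERS: Dict[int, Handler] = {1: ticket_v1, 2: ticket_v2}


def dispatch(request: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the handler matching the request's version field."""
    version = request.get("version", max(HANDLERS))  # assumed default: newest
    if version not in HANDLERS:
        raise ValueError(f"unsupported API version: {version}")
    return HANDLERS[version](record)
```

In the scheme described above, a table like `HANDLERS` is the kind of infrastructure the schema update process would generate, so that old clients keep getting the old shape without being rebuilt.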
My First Lecture
So, I've been neglecting this old blog for the last few weeks, so I figure it's high time I let my captive audience in on what I've been doing at grad school.
Once again, I'm TAing CSC301. This term, in addition to my standard duties of marking, critiquing assignment questions, and coming up with exam problems, Greg asked each of his TAs to give a lecture to the class. My topic was unit testing with JavaScript. My first reaction was "Yay, I already know about unit testing, this will be a breeze", but then I remembered that my knowledge of JavaScript extended to image rollovers, and no further. So, I spent about 10 or 15 hours over the next week or so learning proper OO JavaScript, as well as how to use (or not use) JsUnit and some of its competitors, JsCoverage, and Selenium, and trying to beat CruiseControl into a state that fits together with these tools (no such luck, unfortunately).
The lecture went pretty much as I expected. I had prepared a loose agenda of items, a timeline for discussion and demonstration, and a few canned questions designed to get the undergrads to turn their brains on - all of which I forgot as soon as Greg introduced me. Luckily, I had my laptop and a big desk to hide behind. After a few moments of awkwardness, things got back on track, however.
Lessons learned:
- leave your pen at your desk (apparently I click-click-click it as a nervous twitch).
- always bring a glass of water to a lecture, so that a) you can keep your throat lubricated and b) you can take short pauses to think/fabricate answers to questions without looking like you're thinking/fabricating.