Mar 31 2010

Beware Of Giving Developers Too Many Choices

Category: Architecture | Best Practice | AlexRobson @ 08:26

I’ve been working on an open source project that’s kind of like an aggregator and abstraction framework for incorporating several different open source projects. It has been a lot of fun so far and some of the developers I work with are trying to use it on our newer development efforts.

One of the issues I’ve recently run into is that one of the APIs I wrote gives the user four possible dependencies. Each dependency provides the same functionality but with varying degrees of customization to the object model they’re working with. Initially I thought this was the best way to let users completely customize how they work with the underlying system. But recently I’ve discovered that in most cases, developers see the simplest interface and just assume that it will work with their object model regardless of how customized it is. Syntax errors are the only thing they get. It’s not like the build tells them “Hey, use a different interface if you’ve done this, this and this”. It just barfs on them. It looks like my API has a bug…

I remember Brad Abrams recounting his experience in designing the first version of the .Net Framework; he talks about watching developers new to the framework trying to work with the IO namespace and how developer after developer kept stumbling over the API when trying to do something as simple as open a file and read the contents. He experienced denial, shock, shame, and finally settled into a determination to make the code better and easier to use. (Framework Design Guidelines, 2nd Edition)

It wasn’t that they had bugs in the IO namespace in .Net 1.0. It was that the API wasn’t “discoverable”. How often do we as developers try to learn new APIs through IntelliSense alone? Especially in open source projects where documentation is scarce, you hope that the API is designed clearly enough and the assemblies are structured well enough that you can go off one or two examples and discover what you need.

I think that one of the mistakes I made in the design was giving the user too many choices. Not only that, but more importantly my API didn’t even have the decency to make the choices apparent to the end user.

I’m currently working on a fairly extensive rewrite that will reduce the API to a single dependency which will inform the user of the constraints put in place on their object model. I’ll provide some classes that help them meet those constraints and go from there. In the end, while it reduces the flexibility of the overall API, I think it results in a much better design because it removes the likelihood that a developer gets stuck with runtime errors that give them no clue as to what’s going on and leaves them to assume the library is broken.
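To make the idea of a single dependency that states its constraints a little more concrete, here is a minimal sketch. Everything in it (IEntity, EntityBase, Repository, Customer) is a hypothetical name invented for illustration, not the actual API of the project, and it assumes the constraint can be expressed as an interface so that the compiler reports it by name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical constraint: every model type must expose an identifier.
public interface IEntity
{
    Guid Id { get; }
}

// Hypothetical helper class that meets the constraint on the user's behalf.
public abstract class EntityBase : IEntity
{
    protected EntityBase() { Id = Guid.NewGuid(); }
    public Guid Id { get; private set; }
}

// The single entry point. The generic constraint is checked by the compiler,
// so a model type that doesn't fit fails with an error that names IEntity.
public class Repository<T> where T : class, IEntity
{
    private readonly Dictionary<Guid, T> _items = new Dictionary<Guid, T>();

    public void Save(T item) { _items[item.Id] = item; }

    public T Find(Guid id)
    {
        T item;
        return _items.TryGetValue(id, out item) ? item : null;
    }
}

// A user's model class that opts in by deriving from the helper.
public class Customer : EntityBase
{
    public string Name { get; set; }
}
```

With something like this, new Repository<string>() would not compile, and the error message points at the IEntity constraint instead of leaving the developer to guess.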


Feb 11 2010

Chasing The CI Grail - Trigger CruiseControl Builds From Git

Category: Tools | Best Practice | AlexRobson @ 01:04

In the last post I wrote up the steps required to set up a gitosis server. Now that I have a solution for source control, it’s time to start thinking about the build server. CruiseControl.Net is a nice, easy to use build server and there’s already a lot of support and documentation for it in the community.

One thing that I found a little sub-optimal about using CruiseControl and git together is that I had to build my own build trigger. The downside to the trigger is that it polls the git repository on a timer and when there’s a new commit, it tells CruiseControl, “hey pull the code and build it”. While that works, I’m not really thrilled with a chatty trigger…

A while back a fellow git user, Calvin Bottoms, told me about the hook scripts that come with git and how he was using them. The great thing about the hooks is that they’re basically just shell scripts which git calls in response to changes in the repository. This is exactly what I want; a way to “push” the event to the build server instead of polling. There are several hooks, but the hook I’m particularly interested in is the post-update script. (You can read more about the hooks in the official documentation.)

The only catch is that CruiseControl exposes integration via .Net remoting endpoints. Since my git server runs on Linux, there aren’t a lot of easy ways to call out to a .Net remoting endpoint. The simplest thing I came up with was using curl against a RESTful API. While CruiseControl doesn’t have a RESTful API, it just so happens that ASP.Net MVC comes with a wonderfully simple routing engine that I can use to build a utilitarian one.

So all I need in my MVC “application” is a controller with an action, an empty view (although there are other useful things one could do instead) and the route. The only action I’ll put in my controller is the Build action. It will take the name of the CruiseControl server and the name of the project I want to build. Here’s the code:

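// Needs System.Runtime.Remoting (RemotingServices), System.Web.Mvc
// and a reference to the CruiseControl.Net remoting assembly.
public ActionResult Build(string cruiseControlServer, string projectName)
{
    // Connect to the CruiseControl.Net remoting endpoint on the given server
    var uri = string.Format(@"tcp://{0}:21234/CruiseServerClient.rem", cruiseControlServer);
    var client = RemotingServices.Connect(typeof(ICruiseServerClient), uri) as ICruiseServerClient;

    // Ask the server to force a build of the named project
    var request = new ProjectRequest("build", projectName);
    var response = client.ForceBuild(request);

    ViewData["server"] = cruiseControlServer;
    ViewData["project"] = projectName;
    ViewData.Model = response;

    // Render the (empty) Build view with the ViewData populated above
    return View();
}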

Dead simple. Honestly, it’s too simple. To use this, you really need to recover gracefully from garbage input, or CruiseControl being down or the project name missing, etc. However, this is enough to get started and demonstrate what’s required. If you wanted to, you could actually get away with this being the only code you need to write. The downside is that without a custom route defined, the curl command to call this controller method looks like this:
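As a starting point for handling garbage input, here is a minimal sketch of a validation helper the action could call before it touches the remoting endpoint. The class name BuildRequestValidator and the allowed character sets are my assumptions for illustration; adjust the patterns to whatever your server and project names actually look like:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rejects obviously bad input before we call CruiseControl.
public static class BuildRequestValidator
{
    // Assumption: server names are host names or IP addresses.
    private static readonly Regex ServerPattern =
        new Regex(@"\A[A-Za-z0-9.\-]{1,255}\z");

    // Assumption: project names use letters, digits, spaces, dots, underscores and dashes.
    private static readonly Regex ProjectPattern =
        new Regex(@"\A[A-Za-z0-9 ._\-]{1,100}\z");

    public static bool TryValidate(string server, string project, out string error)
    {
        if (string.IsNullOrEmpty(server) || !ServerPattern.IsMatch(server))
        {
            error = "Invalid or missing CruiseControl server name.";
            return false;
        }
        if (string.IsNullOrEmpty(project) || !ProjectPattern.IsMatch(project))
        {
            error = "Invalid or missing project name.";
            return false;
        }
        error = null;
        return true;
    }
}
```

The action could call TryValidate first and return an error response when it fails, then wrap the remoting call in a try/catch so that a CruiseControl server that is down produces a readable message instead of a stack trace.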

curl "http://localhost/cruisecontrol?cruiseControlServer=localhost&projectName=project"

Yuck city. To me, that kind of URL is just begging for typos (and it has to be quoted, or the shell treats the & as “run this in the background”). Thanks to the Routing engine in MVC, we can remove the existing routes from the Global.asax.cs file and replace them with this:

routes.MapRoute(
   "Build",                                       
   "{cruiseControlServer}/{projectName}",
   new
       {
           controller = "CruiseControl", 
           action = "Build", 
           cruiseControlServer = "", 
           projectName = ""
       });

 

This lets me change my git hook script to:

curl http://localhost/cruisecontrol/localhost/project

Ah, much better! Adding the above call to the post-update hook in a git repository will cause it to call out to our new RESTful endpoint, which will trigger the build in CruiseControl. It’s not as robust as it could be, but it’s enough to provide continuous integration builds on code pushes to the central/build repository.

Like I said before, I really need to add other things to this and should be able to easily extend functionality from here. What I’ve shown in this post is really just a nice starting point for better integration between CruiseControl and git.

Other posts in this series:

Introduction
ESXi, Debian and Git
Gitosis From Scratch


Jan 22 2010

My Crash Course In High Performance NHibernate

It’s never good when your boss appears in your office unexpectedly to tell you that the deadline you thought was a few days out is actually tomorrow. It’s also not good when it happens right after your analyst informs you that the system you thought was producing valid output was actually built on an oversimplification that was only just discovered. It’s especially bad when the model you’re working against is supposed to be crawling a payroll system with insufficient metadata to support the business rules. This particular model is very complex. So complex that there are professionals who dedicate their entire career just to understanding this single facet of their industry.

Welcome to my hell, circa yesterday morning. The problem is that the process I wrote to handle all this in the first place was already written under a relatively aggressive deadline. This is my preface for telling you that I wrote a crappy console app to “get-er done!”. The issue is that the sheer volume of data, coupled with the awful schema we inherited and the complex business rules and model, made for a very slow load of the better part of the database into memory, so that my code could handle all the calculations and the creation of new structures, which would then be saved back to a newer (still fairly complex) schema in the database. This wondrous and unnatural process took anywhere from 1.5 to 2 hours to complete. Still, as of last Friday, we thought we were in great shape…

The real issue with a long running process like this is that when a problem is identified, you have to identify the root cause, adapt the model/logic, test, then complete a full run. When there’s a 2 hour overhead in that process, it gets really, really painful. Now I wasn’t just on the hook for this one thing, so it’s not like I’d been able to give this my full attention. I ignorantly thought “this is good enough for now…”

I’m always saying what a good team we have here. Evan Hoff and Jim Cowart really helped me a lot. In one 18 hour day we managed to turn this slow crappy process into a fast crappy process (about 4 to 5 times faster). I also have to give credit to Oren Eini for making the wonderful NHibernate Profiler, a tool no dev wishing to remain sane should be without. Anyway, here’s what I learned:

The NHibernate.Linq Library Is Dangerous
You should only use it for fun time. The eager loading does not work correctly. In situations where you don’t care about lazy loading additional child collections, it’s worked just fine for me. I actually still use it for those cases because it’s type-safe and compile time checked for typos : )

You Can Die From Lazy Loading
Lazy loading ain’t free. It doesn’t seem like it would be a huge deal but when you have a model that’s more than 2 levels deep with more than just one or two nodes off each aggregate root, lazy loading will kill you dead.

Use The Future Query API To Eager Load
This is awesome. Fortunately Evan had just read Oren’s latest blog entry on this. With some HQL experimentation we figured out how incredibly powerful this is. Sadly, HQL is just a flipping string so it’s easy to mess up. The NH error messages were good enough to point me in the right direction. Read Oren’s post on future queries and the HQL chapter of the NHibernate documentation.
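For anyone who hasn’t seen the Future API, here is a rough sketch of the pattern as I understand it. The Employee, PayPeriod and Deduction classes are hypothetical stand-ins (not our real model) and are assumed to be mapped elsewhere. Each query fetches one child collection, and both are queued until the first result is enumerated:

```csharp
using System.Collections.Generic;
using System.Linq;
using NHibernate;

// Hypothetical model: one aggregate root with two child collections.
public class Employee
{
    public virtual int Id { get; set; }
    public virtual IList<PayPeriod> PayPeriods { get; set; }
    public virtual IList<Deduction> Deductions { get; set; }
}

public class PayPeriod { public virtual int Id { get; set; } }
public class Deduction { public virtual int Id { get; set; } }

public static class EmployeeQueries
{
    public static Employee LoadWithChildren(ISession session, int employeeId)
    {
        // Queue the first query; nothing is sent to the database yet.
        var employees = session
            .CreateQuery("from Employee e left join fetch e.PayPeriods where e.Id = :id")
            .SetParameter("id", employeeId)
            .Future<Employee>();

        // Queue a second query that fetches the other collection for the same root.
        session
            .CreateQuery("from Employee e left join fetch e.Deductions where e.Id = :id")
            .SetParameter("id", employeeId)
            .Future<Employee>();

        // Enumerating the first result runs the queued queries together
        // (in a single round trip when the database driver supports batching).
        return employees.FirstOrDefault();
    }
}
```

Fetching each collection in its own query avoids the cartesian product you get from joining both collections at once, while the session still hands back a single Employee instance with both collections populated.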

Second Level Caching Is Not Your Friend For High Volume
This wasn’t what I expected but sure enough, turning off the second level cache made the writes back to the database go much, much faster. Calling flush on the session was taking seconds just for a few persists until we eliminated the second level cache.

Use The Reflection Optimizer For High Volume
There is some up-front penalty here but it did help performance. If you’re using Fluent NH like I am, it’s a simple .UseReflectionOptimizer() call during the fluent database configuration step.

You Need One Session Per Thread And Objects Cannot Be Shared Across Sessions
To get this monstrosity running faster we needed to make all this processing happen concurrently. Unfortunately, this process was very complex in how the new object model was created. Certain objects needed to be created and shared across models on an as-needed basis. Before parallelizing it, I was able to store these shared objects in a hash and wrap access to them in a nice little function call that abstracted away the fact that I was creating them if they didn’t exist and retrieving them if they did.

This does not work when you’re spinning up threads with a session per thread (this is required) because as soon as you try to associate the shared instance across more than one session, NH breaks. Here’s how we got around this limitation:

Implement a double-checked lock pattern so that you have a dictionary of locks per shared object id and a lock that protects access to that dictionary. When the consumer asks for a specific shared object by id, you check the database first. If the object isn’t there, you lock on the outer dictionary lock and check whether a lock exists for that shared object id. If it doesn’t, you create a lock and store it by the requested object id. After that, you lock on the lock object for that id and check the database again. If there is still no record, you create the object, save it and exit the lock. If it was in the database, you simply return it. Here’s some demo code to reinforce that messy explanation:

private readonly object _dictionaryLock = new object();
private readonly Dictionary<int, object> _sharedObjectLock = new Dictionary<int, object>();

// Queries the database for the shared object; returns true if it was found.
public bool GetSharedInstanceFromDB(ISession session, int id, out SharedObject instance)
{
    instance = session.Linq<SharedObject>().FirstOrDefault(x => x.Id == id);
    return instance != null;
}

public SharedObject GetSharedInstance(ISession session, int id)
{
    SharedObject instance = null;
    if (!GetSharedInstanceFromDB(session, id, out instance))
    {
        object idLock;
        // Find or create the per-id lock while holding the dictionary lock,
        // because Dictionary is not safe for concurrent reads and writes.
        lock (_dictionaryLock)
        {
            if (!_sharedObjectLock.TryGetValue(id, out idLock))
            {
                idLock = new object();
                _sharedObjectLock.Add(id, idLock);
            }
        }
        lock (idLock)
        {
            // Check again: another thread may have saved it while we waited.
            if (!GetSharedInstanceFromDB(session, id, out instance))
            {
                // code to create instance
                session.Save(instance);
                session.Flush();
            }
        }
    }
    return instance;
}

Far from simple, but for us, unfortunately, it was necessary. The nice thing about this is that it gives you a way to multi-thread session access and still share a common object between threads without causing session collisions.

DO NOT USE IDENTITY COLUMNS! AHHHHHHHHH
We used identity columns : \ I’ve pretty much always been against them because I don’t like the idea of my database telling me what the identifiers for my records are. I like to have control (does that make me crazy?). NH pros will tell you to use Hi-Lo or something like that which allows your clients to create unique, yet arbitrary ids for your tables. Why does it matter?

Well, unlike my now dead ORM, NHibernate does not attempt to write your FK values from parent objects in one go. Instead it will do a follow-up Update to all the child rows to provide the database-specified parent Id when you’re using identity columns. This can get very expensive and chatty, very, very quickly. On the other hand, if you’re specifying the id in your client code, it’s already available to the child FK rows. IGNORE THIS ADVICE AT YOUR OWN FLIPPING PERIL. Sadly, we can’t just change all the schema and models at the last minute, but it’s definitely something I will take with me moving forward.
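If you are starting fresh and using Fluent NH, switching the generator is a one-line change in the mapping. This is only a sketch: the Parent class is a hypothetical placeholder and the max-lo value of 1000 is an arbitrary choice, not a recommendation.

```csharp
using FluentNHibernate.Mapping;

// Hypothetical entity used only to show the mapping.
public class Parent
{
    public virtual int Id { get; set; }
}

public class ParentMap : ClassMap<Parent>
{
    public ParentMap()
    {
        // Hi-Lo lets the client assign ids before the insert, so child rows
        // can carry the parent's id without a follow-up update.
        Id(x => x.Id).GeneratedBy.HiLo("1000");
    }
}
```

Keep in mind that Hi-Lo needs somewhere to store the next "hi" value, so the schema has to include that table before the first save.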

 

And that’s all I have to say about that. Hope it’s helpful : )

 

 


Nov 1 2009

Chasing The CI Grail - Introduction

Category: Tools | Best Practice | AlexRobson @ 16:15

It’s not exactly a secret that I have technology ADD. It’s not because I don’t have enough to do. It’s not because the technologies I use on a regular basis are boring. It’s especially not because I need more challenges. Like most technology professionals, I have the opposite problems. I’m over-stimulated. There’s too much I don’t know but I need to know already and now I’m drowning… *glub glub glub*

Only pointing this out because my posts tend to vary from open source projects I dabble in, to jQuery, and now continuous integration. I’ve only really worked on one team where we had a passable CI strategy. I’ve never been on or heard details about a team that had a great setup.

And Now For Some Whining
For the past 3 years I’ve been in TFS land. It’s not the worst thing I’ve ever used for source control. It is the worst I’ve used for just about everything else. You can’t really integrate with TFS unless you have lots of time, no immediate need to do so and lots of patience. As of TFS 2008, I don’t really see the solution I’m looking for.

What I Want
I want source control to integrate with planning and management activities related to the development process. I want all of that to tie in nicely with continuous integration builds and tools to deploy those builds. I’d also like the system to have the appropriate amount of chattiness which lets the user decide how they get notified of what’s happening. Yeah. I know. It’s not really out there. At least not yet. I don’t mind building parts, but only if the tool provides clean and appropriate integration points and opportunities for extensibility. Yes, I am chasing the holy grail of continuous integration. I’m not sure it exists but that won’t stop me from searching.

Why Should You Care?
Because I assume that if my team faces the same challenges yours does, we have similar wants and needs. So as I go through this series describing for you the steps I took in trying to get there, you can learn from the stuff that works well and laugh at my mistakes (and maybe learn from those too).

Where I’m Starting
Well, I don’t want to give it all away, but my next post will go into detail about my experience so far with ESXi, Debian and Git.


Jul 16 2008

Eliminating Nulls

Category: .Net Framework | Best Practice | AlexRobson @ 14:51

I present for your reading enjoyment, my newest soap-box: eliminating the use of null in databases and code as much as possible. I realize the implications this has. I realize that Microsoft, under pressure from a lot of developers who wanted to further bastardize null, introduced nullable value types in the 2.0 version of the framework. But I think I can make a rock-solid argument for why 1) nulls are not intended to drive business or programmatic logic and 2) they should only be allowed in code or the database on the rarest of occasions.

Null Isn't A Value
In .Net, null isn't a value. Null wasn't put into the framework for your uses. In .Net you have two kinds of types: value types and reference types. A value type is defined by a struct or enum and stores its data inline, which for a local variable typically means directly on the stack. A reference type is defined by a class and is called a reference because any instance of this type is actually stored out in a blob of dynamically allocated and managed memory called the heap. The variable which represents this instance is placed on the stack and is actually a pointer to the spot in the heap where the actual instance resides.

Null is what happens when a reference variable doesn't point to a memory address on the heap. Null is not an actual value, which is why (traditionally) value types cannot be null: they're not an address pointer but a value stored directly in the variable. The sole reason for the null keyword (or Nothing for you VB folks) is so that you can programmatically test for and handle the unfortunate case when your reference type variable isn't pointing to anything you can use. The reason you get a NullReferenceException in the framework isn't because Microsoft loves to annoy you, it's because you're trying to work with an object instance that isn't there.

Testing For Null vs. Testing For Constants
A common argument against the "no more nulls" campaign I'm on comes up when discussing UI binding and validation. You may not want to display underlying values in the UI to the user if the values weren't populated from something like a database. As I understand it, the code would look something like:

 if (item.Property != null)
 {
      textBox.Text = item.Property;
 }


Well, what I don't get is how that's any different from:

 if (item.Property != EMPTY_CONSTANT)
 {
      textBox.Text = item.Property;
 }
 

Where EMPTY_CONSTANT would be a constant for whatever type you were testing, whether that's EMPTY_DATETIME or EMPTY_STRING (though that's really unnecessary). As you can see, the point is pretty moot. You can just as easily test for a constant, it's still readable and you protect everyone from NullReferenceExceptions.
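Here is a small sketch of what those constants might look like in practice. The Defaults, Employee and Display classes are hypothetical names made up for this example. One wrinkle worth knowing: C# doesn't allow a const DateTime, so a static readonly field stands in for it.

```csharp
using System;
using System.Globalization;

// Hypothetical home for the "empty" markers used instead of null.
public static class Defaults
{
    // DateTime can't be declared const, so use static readonly.
    public static readonly DateTime EMPTY_DATETIME = DateTime.MinValue;
    public const string EMPTY_STRING = "";
}

// Hypothetical model whose properties always start out with a usable value.
public class Employee
{
    public Employee()
    {
        Name = Defaults.EMPTY_STRING;
        HireDate = Defaults.EMPTY_DATETIME;
    }

    public string Name { get; set; }
    public DateTime HireDate { get; set; }
}

public static class Display
{
    // Returns text for the UI; an unpopulated date shows as an empty string.
    public static string HireDateText(Employee employee)
    {
        return employee.HireDate != Defaults.EMPTY_DATETIME
            ? employee.HireDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            : Defaults.EMPTY_STRING;
    }
}
```

Because the constructor assigns the defaults, calling employee.Name.Length or formatting the hire date never throws, even when nothing was loaded from the database.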

Avoiding Null In The Database
This one is harder for me to argue, other than to say that I believe default values and NOT NULL exist in RDBMSs for a great reason! If you allow nulls in database columns, you're going to end up having to write additional logic to code around them everywhere. The one great exception I'm aware of is foreign key columns. A column marked as FK has to have a value which ties to a row in the parent table. This means if you don't have a row for the FK, it has to be null. The workaround for this would involve putting a nonsense row in the related table which would need to be filtered out all over the place (bad).
