Monday, July 25, 2011

Roo In Action Corner - Testing Entity Validations With A Mock Entity

This article references the Manning book Spring Roo in Action, which Ken Rimple is currently authoring.  It is available as a Manning MEAP.

I've been working hard on the more advanced chapters of Roo in Action, and as part of this I've amassed some background material that didn't make it into the book.  Using Spring's Mock Entity support is one of those topics that didn't get a lot of attention.
In Spring Roo in Action, Chapter 3, I discuss how Roo automatically executes the Bean Validators when persisting a live entity. However, when running unit tests, we don't have a live entity at all, nor do we have a Spring container - so how can we exercise the validation without actually hitting our Roo application and the database?

Monday, July 18, 2011

The Ultimate Power of Spring Batch



Chariot architect Anatoly Polinsky has created an interactive Prezi presentation about the power of Spring Batch. It is an animated blog post because Anatoly is not just a software developer, he's an artist. It is a cool way to get familiar with what Spring Batch does for you and a lot of fun to watch.

Sunday, July 10, 2011

Growing Up with Jenkins/Hudson, Nexus, and Sonar, Part 1

In my previous post I explained why I think you should use Jenkins (or his twin Hudson), Nexus, and Sonar to super-charge your Maven builds. To summarize, Jenkins is a continuous integration server that runs your builds, Nexus is an artifact repository that versions and stores your jars/wars/zips/etc, and Sonar is a metrics server that gathers code metrics and produces nice reports to help you improve code quality. All 3 products are free OSS and really useful. But scaling anything is hard. In this post I'll talk about some of the challenges that you might face when you scale up a Jenkins infrastructure from a few builds a day to thousands of builds a day, and some tips to help overcome those challenges. In the following post, I'll cover Nexus and Sonar tips.

The Demo Went Well, But Now The Honeymoon Is Over

When you first introduce any new piece of infrastructure, your biggest challenge is generally just getting it installed and working at all. That is surprisingly easy with the Jenkins/Nexus/Sonar stack. They install easily. They have good-looking, intuitive UIs and demo really well. They play nicely together. You figure, "Setting up this CI (continuous integration) thing is a slam dunk. I'll be done by lunch." And then you introduce your beautiful new system to the users. Uhhgg. The users. Everything worked fine until they showed up and started breaking it. In this case, the users are the various development teams within your organization that want to #1 build their code with Jenkins, #2 store their artifacts in Nexus, and #3 gather code metrics with Sonar.

Jenkins runs all of your builds so obviously it requires a lot of CPU for compilation, running tests, and static code analysis. You will quickly need multiple boxes to handle the daily load of CI builds. Fortunately Jenkins has support for building a server farm quite easily by running a "master" instance that distributes builds to many "slave" instances. In practice this works very well. Some tips:

Tip 1: Partition your master/slave clusters by something logical like development organization. Don't put all of the builds on a single cluster unless you work for a very small organization. This is important for 3 reasons:



  1. It isolates development organizations from each other so you can, for instance, restart or upgrade one cluster without affecting the others, and it keeps "problematic" development organizations contained (you know who you are...the kind of developer who puts infinite loops in your unit tests and stuff like that).
  2. It allows you to configure security differently for each cluster, which you may need to do if the groups within your company don't like each other. Hey, you're a DevOps, not a psychiatrist so just go with it.
  3. The Jenkins UI will get very messy very quickly with hundreds of jobs to wade through. It does have filtering features, but it is still slow to render in browsers. IE, I'm looking at you.


Tip 2: Early on, come up with a strategy to automate the creation of new master and slave instances. Two good options are using a provisioning tool like Puppet or Chef, or cloning a VM. One bad option is setting things up manually from memory.

This kind of automation is important because if you are scaling up (adding more and more development teams to your infrastructure) most likely you'll end up: #1 adding more master/slave clusters, and #2 making global changes across your master/slave instances. For example, Jenkins has an awesome plugin community so it is likely you'll be finding new and useful plugins often. Say you have 5 Jenkins clusters partitioned by development organization (a good choice for partitioning). You'll have to install the plugin on all 5 master instances manually. And say you need to change something in the environment. Assume each of your 5 master instances also has 2 slaves. Now you've got to make a change in 15 places (5 masters plus 10 slaves). Your chances of fat-fingering a change go way up, plus who wants to do all of that typing? So get on board with the DevOps movement, and automate so your infrastructure becomes code.

Problem 1: You configure Jenkins' security permissions to allow the development teams to create / manage their own jobs. This is a problem because if you want any uniformity at all in your builds, then 500 developers all changing their jobs willy-nilly creates a mess in Jenkins. This makes any kind of global change to Jenkins jobs using a script very difficult. It also doesn't allow you to use Jenkins to enforce standards or to provide a software "chain-of-custody" from source repository to production, which can be a very big deal in a big company. Just having standards for Jenkins job names is actually very useful, and in a "free-for-all" model no standards can be enforced.

Problem 2: The reverse. You configure Jenkins' security permissions to restrict the development teams from creating / managing their own jobs. Instead only a select number of Jenkins admins can do that task. The Jenkins admins now have a new full-time job, which is very un-fun: manually creating jobs all day.

Solution: The crux of the issue is that you want developers to be able to change some fields in a job, like the source code URL to their project, but not other things like mandatory build steps such as quality gates or auditing steps. You also probably want to prevent developers from using Jenkins' cool-but-dangerous feature of allowing a job to run arbitrary script code on the server, which obviously could do all kinds of mischief. Jenkins' security permissions allow you to either create / manage a job in its entirety or not at all. What you really need is to set permissions on a field-by-field basis.

I don't have the perfect solution for this problem. For some of you out there, the whole "Jenkins job management" problem isn't a problem at all: just let the developers own their jobs and be done with it. I was on that side of the argument for a while, but experience has beaten me down to the realization that some controls are actually a good thing.

There are 2 solutions I can think of. One is to create a Jenkins plugin that creates a new job type that is customized to your needs. I don't really like that one.

My suggested solution is to create a simple "Jenkins job management" web application in your favorite rapid application framework (Rails, Grails, etc.) that is used by developers to create jobs. This application only allows them to set the fields that are "safe" and behind the scenes does the job creation / maintenance via Jenkins' easy-to-use REST API. This is the best of both worlds: self-service creation of jobs but with a measure of control.
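The "safe fields only" check at the heart of such an application might look roughly like the Java sketch below. The class name `SafeJobRequest` and the field names (`scmUrl`, `branch`, `description`) are hypothetical and not part of Jenkins. A real application would go on to render the accepted values into a job configuration and submit it through the REST API, which is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter that accepts only the job fields developers may set. */
final class SafeJobRequest {
    // Assumed whitelist; pick whatever counts as "safe" in your organization.
    private static final Set<String> SAFE_FIELDS =
            new HashSet<>(Arrays.asList("scmUrl", "branch", "description"));

    /** Returns a copy of the requested fields, or throws if any field is not whitelisted. */
    static Map<String, String> validate(Map<String, String> requested) {
        Map<String, String> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : requested.entrySet()) {
            if (!SAFE_FIELDS.contains(field.getKey())) {
                throw new IllegalArgumentException("Field not allowed: " + field.getKey());
            }
            accepted.put(field.getKey(), field.getValue());
        }
        return accepted;
    }
}
```

Rejecting the whole request, rather than silently dropping the unsafe fields, makes it obvious to the developer that something they asked for was not applied.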

Problem 3: The builds run really slow

Solution: There are many, many reasons why this would be true, but there are 3 things I've found helpful aside from just buying bigger hardware.

The 1st thing is to profile your build. I wrote a simple AspectJ aspect (using load time weaving) to profile Maven builds and give timings for each Maven plugin that ran. That helped break down a 45 minute build into the different steps and helped explain why it was taking so long.

The 2nd thing is to take all of the build steps that can be deferred until later and process them asynchronously. A CI job needs to run compilation and it needs to run tests to provide immediate feedback. You can't defer those steps. But there are many others that you potentially can defer. For example, Maven site generation is slow. Running Sonar metrics can also be slow. So instead of running the Maven site and Sonar stuff during your build, run them asynchronously. This takes a little engineering but you are a software engineer, right? You could write a simple Jenkins plugin that puts a message in a queue after each successful build. Then have a process outside of Jenkins -- potentially on a different server -- read the queue, and run things like Maven site and Sonar. You can potentially make your builds much faster using this technique, and I've used it successfully.
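The hand-off can be sketched in Java with an in-process `BlockingQueue` standing in for a real message queue. The class `DeferredStepQueue` and its method names are made up for illustration. In practice the publisher would live in a Jenkins plugin and the consumer in a separate process, connected by whatever queue you already run.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for the queue between CI builds and the slow, deferrable steps. */
final class DeferredStepQueue {
    private final BlockingQueue<String> successfulBuilds = new LinkedBlockingQueue<>();

    /** Called at the end of a successful build; returns immediately. */
    void publish(String jobName) {
        successfulBuilds.add(jobName);
    }

    /**
     * Worker side: waits up to waitMillis for a finished build, then runs the
     * slow steps (site generation, metrics) for it. Returns false if nothing arrived.
     */
    boolean processNext(Consumer<String> slowSteps, long waitMillis) {
        try {
            String jobName = successfulBuilds.poll(waitMillis, TimeUnit.MILLISECONDS);
            if (jobName == null) {
                return false;
            }
            slowSteps.accept(jobName);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

A worker thread would simply call `processNext` in a loop. The build itself only pays for the cost of `publish`.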

The 3rd thing is to pay attention to the build time trend information that Jenkins provides, and automatically email developers if their build takes X% more than it used to. I've often seen that the root cause of a suddenly slower build is a newly introduced slow unit test. Fixing the test improves the build times, and it is much easier to find the offending test if you notice the slowdown right away. You can get the build times via Jenkins' REST API, so you can write a little script (Groovy anyone?) that is scheduled to run every night and checks the build times to provide rapid feedback if a job suddenly gets much slower.
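The "X% slower" check itself is only a few lines. The Java sketch below assumes the build durations have already been fetched (oldest first, in milliseconds). It uses the average of the earlier builds as the baseline, which is one reasonable choice rather than anything Jenkins prescribes. `BuildTimeCheck` is a hypothetical name.

```java
/** Hypothetical nightly check: did the latest build get noticeably slower? */
final class BuildTimeCheck {
    /**
     * Returns true when the last duration exceeds the average of all earlier
     * durations by more than thresholdPercent. Needs at least two builds.
     */
    static boolean isSlower(long[] durationsMillis, double thresholdPercent) {
        if (durationsMillis.length < 2) {
            return false; // nothing to compare against yet
        }
        long sum = 0;
        for (int i = 0; i < durationsMillis.length - 1; i++) {
            sum += durationsMillis[i];
        }
        double baseline = (double) sum / (durationsMillis.length - 1);
        long latest = durationsMillis[durationsMillis.length - 1];
        return latest > baseline * (1 + thresholdPercent / 100.0);
    }
}
```

For example, with earlier builds averaging about 10 minutes and a 20% threshold, a 15 minute build is flagged and an 11 minute build is not.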

Problem 4: No one pays attention to failing Jenkins builds

Solution: This is a real problem. Developers face a barrage of emails daily, and sometimes the job failure email is just one more to ignore. Obviously peer pressure is the main way to make people care about failing CI jobs. There are many Jenkins plugins that provide build notification, but my personal favorite way to get people to care is to bridge the virtual world into the physical world by building a CI orb. A CI orb has both a visual representation (some kind of light -- traffic light, lava lamps, glowing orb) and an audio output ("Joel has broken the build"). Instructions for a cool one that I've personally seen can be found here. The CI orb is not just a gimmick -- it really does work. It is easy to ignore 1 or 2 failing Jenkins jobs out of 50. But it is much harder to ignore a large pulsating red orb next to your boss's cube, or your name being read on a loudspeaker. A CI orb helps your team realize the purpose of CI which is to respond to failures quickly.





In the next post, I'll give you some tips for Nexus and Sonar.

Friday, July 1, 2011

Learnings from Actor Development

I spent a fair amount of time developing actor-based systems recently, specifically with the Scala Actor library. Regardless of whether you are implementing actors with the Scala library, Akka, Lift or Scalaz, some basic gotchas can present themselves until you get a feel for what you're doing. Here are some of them that I've learned the hard way.

Never Refer Directly to Other Actors
Actors are fragile and can die easily. While you typically create a supervisor with a strategy for how to recreate that actor, any other class with a direct reference to that actor that died now has an invalid reference. If you absolutely must have actors with references to others that do not have a supervisory relationship, use a proxy reference instead - if the actor behind the proxy dies, you only have to replace it in the proxy, not in every actor with that reference. Akka solves this problem nicely with ActorRef, where the reference behind it can be recreated without updating anyone holding the reference.
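The proxy idea can be shown in a few lines of Java. This is a simplified sketch that models an actor as a plain `Consumer` of messages, which is an assumption made for brevity and not how the Scala library or Akka represent actors. `ActorProxy` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical proxy: callers hold this object, never the actor behind it. */
final class ActorProxy<M> {
    private final AtomicReference<Consumer<M>> target;

    ActorProxy(Consumer<M> initialActor) {
        this.target = new AtomicReference<>(initialActor);
    }

    /** Forwards the message to whichever actor instance is currently live. */
    void send(M message) {
        target.get().accept(message);
    }

    /** Called by the supervisor after it recreates a dead actor. */
    void replace(Consumer<M> restartedActor) {
        target.set(restartedActor);
    }
}
```

Everyone who holds the proxy keeps a valid reference across restarts, because only the supervisor ever touches the instance behind it.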

If You Do Have an Actor's Reference, Avoid Synchronous Method Calls Between Actors
Regardless of whether your actors are event- (shared thread pool) or thread-based (each actor has its own dedicated thread), avoid having actors make direct method calls on another actor. It introduces concurrency into classes that are designed to avoid that very situation - the receiving actor can be operating on a thread handling a mailbox message at the same time it is dealing with your call. Use blocking or future-based message sends instead, which allows the receiving actor to handle the request through its mailbox on its own thread. Not to harp on the virtues of Akka too much, but the ActorRef type also prevents this kind of behavior.
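To make the difference concrete, here is a small Java sketch of a future-based send. It is not the Scala Actor API. A single-threaded executor plays the role of the mailbox, and `CounterActor` is a hypothetical class. The point is that the actor's state is only ever touched on the mailbox thread, and callers get a future instead of reaching into the actor directly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical actor whose state is only ever touched on its mailbox thread. */
final class CounterActor {
    // A single-threaded executor acts as the mailbox: one message at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int total; // never read or written from a caller's thread

    /** Future-based send: enqueues the message and returns the eventual reply. */
    CompletableFuture<Integer> add(int amount) {
        return CompletableFuture.supplyAsync(() -> {
            total += amount;
            return total;
        }, mailbox);
    }

    /** Stops the mailbox thread once already queued messages are handled. */
    void stop() {
        mailbox.shutdown();
    }
}
```

A caller that needs the answer right away can block on the returned future, and one that does not can attach a callback. Either way the work happens on the actor's own thread.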

Write Business Logic in External Idempotent Functions
Testing actors is difficult, particularly those with side effects. If you are in a supervisor hierarchy, the receipt of a message may lead to the creation of child actors that have their own side effects which may be difficult to account for in a test environment. The goal of unit tests is not to test whether actor interaction works, but that the business logic that the actor performs is sound. Externalize your business logic into functions and partial functions that can be tested outside of actors, and use integration tests to prove only that the actors executing that logic behave as expected as part of an end-to-end functional test.

Beware the Thundering Herd
When you start creating structures of actors such as supervisor hierarchies, it can seem simplest to send generic messages that are passed through the tree. However, as actors react and send their own messages, this can lead to event "storms". This can be addressed using two strategies - 1) use granular messages that target specific events for specific actor instances, and 2) ignore messages of the same type with the same parameter data for a given time period. You can even implement a common trait for all of your actors that gives them the ability to not handle the same message for an externally-configurable period of time. Be judicious in how you use this, though - tune it for the loads of your system.
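The second strategy, ignoring repeats for a while, can be sketched in Java as a small helper that an actor consults before handling a message. The name `MessageThrottle` is hypothetical. The sketch treats two messages as "the same" when they are `equals`, so it assumes message classes implement `equals` and `hashCode` over their type and parameters. It is not thread-safe, because it is meant to be owned by a single actor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: suppresses equal messages seen within a quiet period. */
final class MessageThrottle {
    private final long quietPeriodMillis; // externally configurable
    private final Map<Object, Long> lastHandled = new HashMap<>();

    MessageThrottle(long quietPeriodMillis) {
        this.quietPeriodMillis = quietPeriodMillis;
    }

    /**
     * Returns true if the message should be handled now, false if an equal
     * message was already handled less than quietPeriodMillis ago.
     */
    boolean shouldHandle(Object message, long nowMillis) {
        Long last = lastHandled.get(message);
        if (last != null && nowMillis - last < quietPeriodMillis) {
            return false;
        }
        lastHandled.put(message, nowMillis);
        return true;
    }
}
```

Passing the current time in as a parameter keeps the helper easy to test. A long-running system would also need to evict old entries from the map, which this sketch leaves out.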

Garbage Collection
In the case of a supervisor hierarchy that is responsible for configuring servers in a cluster, you may want to implement garbage collecting actors that ensure that each server is pruned of configuration that it still has but that is no longer relevant. An actor in the supervisor hierarchy will take care of that if one was created to represent that particular configuration item. If no actor exists to represent that state, though, only a garbage collector whose specific role is to clean up a dirty environment can clear the stale data from the target server.

Always Pass Copies in Immutable Messages
Copy any object instance that will be passed in a message, so as to avoid accidentally sharing any state. In almost all cases, you should ensure that your messages themselves are immutable. Dean Wampler and Alex Payne make this point specifically in their book, Programming Scala. This, combined with very granular messages, can seem expensive in terms of resources. But it is worth the cost in memory and performance to ensure that your actor behavior is what you expect at design time.
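In Java terms, an immutable message that copies what it is given might look like the sketch below. Scala case classes with immutable collections give you most of this for free. The message name `ConfigureServers` and its field are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable message carrying its own private copy of the data. */
final class ConfigureServers {
    private final List<String> hosts;

    ConfigureServers(List<String> hosts) {
        // Defensive copy: later changes to the sender's list cannot leak into the message.
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    /** Read-only view; attempts to modify it throw UnsupportedOperationException. */
    List<String> hosts() {
        return hosts;
    }
}
```

The copy is made once, at construction, so the receiver can read the list as often as it likes without worrying about who else holds the original.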

Semantic Logging
Debugging actors isn't easy. Typically, you have multiple instances of the same class with asynchronous behavior, so it is difficult to discern flow. Create trace level log output for each actor type that displays specific information about it in a clearly-visible manner. Use line breaks and tabbed indentation to make it readable, but note that doing so can make your log files even larger than they already are. This has an unfortunate side effect of forcing you to be very granular in your log configuration as to what logging level is used - package-level logging may be too much information. It may help to put a timestamp into a message, so you can grep the log for specific messages as they flow through actors. Also, log the timestamp of when the actor received and handled it.

Deteriorating Retry
If your actors have side effects where a required resource (network connection, database access) may not be available or may fail, use deteriorating retry logic to allow the actor to send itself a message to try again at increasingly longer intervals. For a good example of this, go to Gmail, disconnect from all networks and watch as it tries to reconnect in longer and longer timeframes.
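The interval calculation can be sketched in Java as follows. Doubling the delay on every attempt and capping it at a maximum is an assumption on my part, since any increasing schedule would do. `DeterioratingRetry` is a hypothetical name. The actor would use the result as the delay before sending itself the "try again" message.

```java
/** Hypothetical delay schedule: doubles on each failed attempt, up to a cap. */
final class DeterioratingRetry {
    /**
     * attempt 0 waits baseMillis, attempt 1 waits twice that, and so on,
     * never exceeding maxMillis.
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        // Stop doubling as soon as the cap is reached.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }
}
```

With a base of one second and a cap of one minute, the waits run 1s, 2s, 4s, 8s and so on until they level off at 60s.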

Instrument via JMX for Runtime Clarity
Register every actor instance with the JVM's MBeanServer, and have their supervisors clean up the instrumentation when they die. Yes, this comes at a performance cost, but you can make the registration asynchronous through a future while you perform other tasks in initialization and startup. While you'll still need to profile the threads involved to find threading issues, having the ability to view actor existence and state in JConsole or VisualVM is a wonderful help in knowing what is happening in your system in production.

Prepare for Race Conditions
As with any asynchronous programming, the timing of actor interactions can be unpredictable. Make your actor interactions recheck state they depend on so that they can reflect an appropriate state of their own. If Actor A needs Actor B to have a specific value for its own state to be appropriate, it should not send only one message to Actor B and assume that the value returned is correct that one time. Keep checking the value (again, possibly with deteriorating retry) until you can be certain you have a correct representation of Actor B's state.