November 16, 2009

Cannon Mountain

Owen and Katy atop Cannon Mountain

For several weeks now, Katy and I have been hoping to knock off another 4000-foot mountain in our goal of hiking all of New Hampshire's 4000 footers. With winter closing in, there aren't too many weekends left before we would likely be hiking on a snow-covered trail. With the recent unseasonably warm temperatures here in the Northeast, though, we planned a quick day hike up Cannon Mountain for this past weekend. Cannon Mountain is "across the street" from Mounts Lafayette and Lincoln, our first conquest, and is home to what was once the iconic Old Man of the Mountain. There are several routes up it, but we chose to start our ascent from the Lafayette Campground on the southeast side of the mountain.

The quality of the weather for this trip was in doubt right up until we took our first step on the trail. A significant rain event was forecast for all day Saturday and into Sunday morning, but it was unclear when things would clear out on Sunday. We were feeling optimistic, though, and were up before sunrise on Sunday and began the two hour ride to Franconia Notch in the dark and in the rain. By the time we reached the trailhead, the rain had stopped and patches of blue sky were trying to break through the low clouds and the fog. Things were looking up and we were on the trail at 8:40AM.

Katy navigating through some serious boulders

Even after deciding to start our hike from Lafayette Campground, there were still a number of interconnecting trails that could get us to the top of the mountain. On the way up, we hiked all the way out past Lonesome Lake via the Lonesome Lake Trail and continued our ascent up this trail to its northern terminus, where it met the Kinsman Ridge Trail. The Lonesome Lake Trail was well maintained with a reasonably moderate grade as we ascended about 1700 feet of elevation in 2.3 miles. From there, we followed the Kinsman Ridge Trail to the summit. The most technically challenging and steepest part of the climb was definitely the section between the end of the Lonesome Lake Trail and the junction with the Hi-Cannon Trail (about 0.4 miles from the summit). In this section we went up about 500 feet of elevation in less than half a mile, requiring quite a bit of scrambling over large boulders and roots. We reached the top a little after 11:00AM.

Katy doing a Matrix-style pose at the summit

There's a tramway and ski area on the opposite side of the mountain from the one we hiked, so the summit is fairly well developed, with an observation tower at the peak. It was warm, even at the top of the mountain, with temperatures in the 50's, and we were able to enjoy at least some partial views, with clearing skies and mountain tops peeking out of the clouds to our north and west. It had quite a different feel from our early October hike, when the fall foliage was nearing peak. This time around the trees were bare, with evergreens peppering the mountainsides. There wasn't much to see to the south and east, though, as some low, stubborn clouds had settled in, obscuring the views over to Lafayette and the Franconia Ridge. We hung out at the top for a while and ate our lunch before heading back down.

Owen overlooking Lonesome Lake on Hi-Cannon Trail

On the hike down, we decided to take a slightly different route. Instead of going down the steepest section of the Kinsman Ridge Trail to the Lonesome Lake Trail, we took a left turn onto the Hi-Cannon Trail. This trail was narrower and seemed a little less traveled. We had to deal with some quite steep sections of long, slippery rock, including one ledge so steep that a ladder has been installed to assist hikers. Some trail descriptions peg the middle portion of this trail as the most difficult on Cannon Mountain. There were several neat lookout ledges along the route overlooking Lonesome Lake, with views down the notch. The Hi-Cannon Trail met up with the Lonesome Lake Trail less than a half mile from the trailhead, and we arrived back at the car at 1:40PM. Total time was about five hours round trip for six miles of hiking, including our extended break at the summit. We drove back home, stopping for some food along the way, and were back before 6PM from a quick but satisfying and enjoyable trip to the Whites with Katy.

For those keeping track, that's 3 down with 45 to go!

November 12, 2009

Rule 1 of Programming: It's Always Your Fault (Almost)

Over the past week or so, I've been working on putting together some micro benchmarks for Project Darkstar. There has been a significant uptick in forum activity lately relating to stress testing and performance issues. In particular, we've seen many questions along the lines of: "I can only connect X users to my Darkstar server, what's wrong?" First of all, this is great news. It means that people are making significant progress with their Darkstar-based games and applications and are working to push the limits of the technology. However, I also think that this is completely the wrong question to ask. As I've demonstrated before, Project Darkstar has a pretty high ceiling for raw capacity in terms of number of users. A properly tuned app with a light load can easily handle tens of thousands of users per node. However, connecting mostly idle users to a mostly idle server is not very interesting. These capacity numbers naturally decrease as the number of messages between the clients and server and the amount of processing per message increase. This seems obvious, but people still ask the capacity question as though all games developed with Darkstar are going to have identical limitations. This is simply not the case.

With that said, though, we can still strive to identify upper bounds on Project Darkstar's performance at a more fine-grained level. Project Darkstar is an event-driven transactional system, so no operation is without cost. With these micro benchmarks, I'm hoping that I can establish a relative cost for each of the operations using the DataManager, the ChannelManager, and the TaskManager. For example, how expensive is it to retrieve an object using DataManager.getBinding() vs. ManagedReference.get()? How much overhead is involved with each transaction? How expensive is it to create a Channel or send a message on a Channel? With more or fewer users? While the cost of retrieving data from Darkstar's data store should be an order of magnitude faster than using J2EE and an RDBMS, it is also likely an order of magnitude slower than retrieving data from a data structure that is already in memory and uses no synchronization. This is information that users really need to be aware of and keep in perspective when designing their game, structuring their tasks, and establishing their own expectations of what the performance should be like.

So over the past couple of days, I've been debugging a problem in these benchmarks. In one particular test, I was attempting to measure the raw execution time per call to DataManager.getBinding() from the Project Darkstar API. The test was pretty simple: I set a large number of name bindings in a set of setup transactions. Then I would time the execution of another set of transactions that made some subset of calls to getBinding() on the names I had just set up. Taking into account previously measured transaction overhead, I could then come up with a reasonable estimate of the cost per operation. Seems easy, right? Well, it turns out that I hit a snag. In running this test, I was repeatedly getting a situation where a seemingly random name binding was not being set properly during setup. Most of the calls to getBinding() would work fine, but a couple were throwing NameNotBoundException. What? This didn't make much sense. I went back and looked over my code many times; I tried a myriad of variations, logging output, and print lines, but still no luck. I was still getting NameNotBoundException for what seemed like a random name in the sequence. Hmmph.
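The estimate described above (time a batch of transactions, subtract the previously measured transaction overhead, divide by the operation count) boils down to a few lines of arithmetic. Here's a rough sketch of the idea in plain Java; it uses a HashMap as a stand-in for the data store, since the point here is the measurement method, not the Darkstar API:

```java
import java.util.HashMap;
import java.util.Map;

public class MicroBench {

    /** Total nanoseconds to run `body` the given number of times. */
    static long timeNanos(Runnable body, int times) {
        long start = System.nanoTime();
        for (int i = 0; i < times; i++) {
            body.run();
        }
        return System.nanoTime() - start;
    }

    /** Per-operation cost: subtract the baseline overhead, divide by op count. */
    static double perOpNanos(long withOps, long baseline, long opCount) {
        return (double) (withOps - baseline) / opCount;
    }

    public static void main(String[] args) {
        // Stand-in for the data store: a plain map instead of the DataManager.
        Map<String, Integer> store = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            store.put("name" + i, i);
        }

        int txns = 1000, opsPerTxn = 100;
        long baseline = timeNanos(() -> { }, txns);   // "empty transaction" overhead
        long withOps = timeNanos(() -> {              // "transaction" with lookups
            for (int i = 0; i < opsPerTxn; i++) {
                store.get("name" + i);
            }
        }, txns);

        System.out.printf("~%.1f ns per lookup%n",
                perOpNanos(withOps, baseline, (long) txns * opsPerTxn));
    }
}
```

The real benchmark would of course run the timed calls inside Darkstar transactions against the actual DataManager; this only illustrates the subtract-and-divide bookkeeping.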

At this point, I went through a whole series of exercises, all centered around one assumption: that my code was right. I tested the native edition vs. the Java edition of BDB, suspecting maybe there was a weird bug in one of them: same result. I tried longer transactions, more operations, larger serialized data objects: same result. I tried running my benchmarks in different orders: same result. I even started writing test cases for DataManager.setBinding() that simulated transaction rollback and retry, large numbers of consecutive calls to setBinding(), and binding and rebinding of the same name. I thought I was going to uncover some weird corner-case bug. But those tests were passing! I was at a loss. After probably two days of sporadic attempts at debugging this, I finally went back and looked really hard at my own test code. And... I found a bug (doh!). It turns out that I was being too cute with my setup transactions and was modifying a non-local counter variable inside of my anonymous nested transaction class. In random situations, this class would abort and retry (a normal Darkstar operation), but since it was modifying a variable that lived outside of the task itself, that value was not being rolled back. The result was that a name binding would be skipped periodically (exactly the behavior that I was seeing).
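The buggy pattern is easy to reproduce outside of Darkstar. In this sketch (plain Java, with a HashMap standing in for the data store and a hand-rolled abort-and-retry in place of Darkstar's transaction machinery), the counter lives outside the task, so an abort rolls back the bindings but not the counter, and the retried task resumes numbering where the aborted attempt left off:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySkipsBindings {

    /**
     * Run `task` once, abort it (rolling back the store, as a transactional
     * data service would), then retry it. Anything captured from *outside*
     * the store -- like a counter variable -- is not rolled back.
     */
    static void runWithOneAbortAndRetry(Map<String, Integer> store, Runnable task) {
        Map<String, Integer> snapshot = new HashMap<>(store);
        task.run();                 // first attempt...
        store.clear();
        store.putAll(snapshot);     // ...aborted: managed state rolled back
        task.run();                 // retry
    }

    /** Returns the name bindings that actually survive the setup "transaction". */
    static Set<String> demo() {
        Map<String, Integer> store = new HashMap<>();
        AtomicInteger counter = new AtomicInteger();   // BUG: lives outside the task

        runWithOneAbortAndRetry(store, () -> {
            for (int i = 0; i < 3; i++) {
                int n = counter.getAndIncrement();     // survives the abort!
                store.put("name" + n, n);
            }
        });
        return store.keySet();
    }

    public static void main(String[] args) {
        // The aborted attempt consumed name0..name2, so only name3..name5
        // exist; a later getBinding("name0") would throw NameNotBoundException.
        System.out.println(demo());
    }
}
```

Whether a given run skips bindings depends on whether that particular task happened to abort and retry, which is why the missing name looked random.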

So here's my question. Why did I assume that code that I wrote in less than a day was more likely to be correct than Berkeley DB itself, a project that's been developed and tested for a couple of decades? Why did I assume that code that I wrote in less than a day was more likely to be correct than Project Darkstar's Data Service code which has been developed and tested for years? I mean, I knew better than to think that Tim's code is the likely culprit, but I still started writing test cases thinking I was going to heroically find some obscure bug. This, my friends, is a violation of the number 1 rule of programming: If you're having problems, It's Always Your Fault (almost). I mean, don't get me wrong, I've found (and reported) bugs in well established open source projects before, but those situations are actually few and far between. I also don't mean to suggest that Project Darkstar is bug free. I do think, though, that sometimes it's too tempting to conclude that there's a bug in that library you're using, or there's a performance limitation in that technology that is fundamentally impossible to overcome. Maybe that's true, but 99% of the time, it's your fault.

And with regard to those micro benchmarks, I'm hoping to publish some results soon (assuming I don't get hung up on any more boneheaded mistakes!).

Ultimate Ongoing

A quick update on the ultimate front. I've been playing regularly the past few weeks, but not without some discomfort. My ankle is still not 100%, but it's faded into a nagging annoyance. It always feels worse in the morning, then loosens up and doesn't bother me too much throughout the day. I've just played the past three days in a row, though, so I feel it a bit more today. In any case, hopefully this injury will soon be just a bitter memory.

Ultimate Statistics (since January 2008):
Total Games Played: 252
Total Hours Played: 296

October 12, 2009

Ultimate Update

It's been over five weeks since I sprained my ankle during an ultimate game. Unfortunately, it still hasn't completely healed. It's likely that I didn't give it enough time, as I was back on the field only a week after the incident. After a couple weeks of abusing it during hat league and pickup games, I shut it back down over two weeks ago and haven't done anything on it since (umm, except hike 9 miles with 4000 feet of elevation gain, oops!). It's not really unstable or loose, and I don't have any trouble walking on it, but there is still some residual swelling and a bit of a dull soreness.

The good news is that I feel like it's (very slowly) healing. I've been doing some no-impact strengthening exercises over the past few days, and the swelling seems to be ever so slowly going down. I've picked Wednesday in my mind as my return date to the field, but I may push it back if it doesn't feel ready. It's very frustrating to have an injury like this. No matter how many times I deal with them, it always takes a physical and mental toll to not be able to be active and out on the field. Regardless, it's what I always tell others and myself: injuries are a part of every competitive sport. Having the ability to respond and deal with them in a positive way is just as important as your skills in the activity itself. Hoping to be back out there soon...

Ultimate Statistics (since January 2008):
Total Games Played: 240
Total Hours Played: 284

October 7, 2009

Lafayette Lincoln Loop

Owen and Katy on top of Mt. Lafayette

This past weekend, Katy and I went on the first of hopefully many adventures hiking in the White Mountains of New Hampshire. It's been quite some time since I've done a lot of real, solid hiking. When I was much younger, my dad, brother, uncle, and I used to make semi-frequent camping and hiking trips up north. Those slowly died out, though, as college, track, ultimate, and just life in general began filling up my time. Aside from a couple backpacking trips with Brian and a few other tiny day hikes here and there, I'm realizing that I haven't really done much hiking at all in most of this decade. So with that said, I've decided to get back into hiking with a goal. How about hiking all 48 of the 4000+ foot peaks in the White Mountains?

I mentioned this idea to Katy, and, never one to back down from a challenge, she was in. Now, technically I've already done probably about 10 or so peaks on this list (in fact, if I remember correctly, I've already hiked 6 of the top 7 peaks, Adams being the lone 5000+ footer that I've never scaled). However, I figured we could do it as a team and restart the list from scratch. So we planned a trip for this past weekend and stayed at the Lafayette Place Campground in Franconia Notch, where several of the 4000 footers flank both sides...

Owen and Katy on top of Mount Lincoln

We arrived Saturday afternoon under gray skies with showers spitting at us the whole drive up. At first it seemed like we would luck out and have just spotty showers in the evening. Not so. As soon as we pulled into our campsite and began setting up, the skies opened up. It poured as we set up our tent. It poured as we scrambled to rig some makeshift shelter from the rain with a tarp tied to some trees. It poured as we raced to get our sleeping bags and pads into the tent. By the time we finished setting up camp, we were both cold, soaked, and shivering, standing under a small patch of tarp that was keeping us out of the rain. With no hope of building a fire in the constant rain, we eventually retired to the tent early with some dry clothes and board games. Not a good start to our mission!

Fortunately, Sunday was a completely different story. We woke up to clearing skies, quickly grabbed some breakfast, and began getting organized for our hike. Our plan was to attack the classic day hike in the Franconia Range, the Franconia Ridge Loop. This loop combines three trails into a 9 mile trek that brings you over the top of Mount Lafayette and Mount Lincoln from the 4000 footers list (technically it also brings you over Little Haystack Mountain, but that peak does not satisfy the criteria to be on the list). After collecting some warmer gear, food, water, and Gatorade in our packs, we were on the trail at about 8:40AM.

We started towards the summit of Mount Lafayette via the Old Bridle Path. This trail is fairly steep pretty much the whole way, but is very well maintained and easy to navigate. There were a few other groups on the trail, but it was fairly quiet for a near peak foliage weekend. Saturday's weather likely had something to do with that. By 11:00AM we had made it to the Greenleaf Hut, which sits right at the treeline about 3 miles from the trailhead and 1 mile from the summit. After a quick snack break and topping off our water bottles, we continued on to the summit and were there at about noon. Wahoo! One peak down! We had lunch at the top and about 3 or 4 other groups were up there doing the same.

Traveling across the ridge along the Franconia Ridge Trail was pretty awesome. Weather above the tree line was cool, but mostly clear and calm, a generally rare event. We had nice views of the Pemigewasset Wilderness to our east, and Cannon and the Kinsman Mountains to our west. Foliage was getting close to peak, and the trail was busy but did not have a train of people like I've seen in the past. By about 12:30PM we had bagged the Mount Lincoln summit, and by 1:00ish we had reached the trail junction for Falling Waters Trail at Little Haystack Mountain to head back down.

Katy gladly obeying the sign after the hike

I had expected our trip down to be fairly quick, but there was quite a bit of running water on and near the trail, making it slippery and tougher to navigate. We passed several fantastic waterfalls on the way down, but also had to deal with several tricky river crossings as a result. In the end we hit the bottom at about 3:40PM, almost exactly 7 hours, which is the estimated book time.

When we got back to the campground, just one passing shower disturbed our dinner; otherwise we had a dry night under an almost full moon in front of the fire. It being a Sunday night, the campground was nearly empty, so we had the place almost completely to ourselves. On Monday, we briefly contemplated zipping up and tagging the summit of Cannon Mountain, but decided not to overreach on our first trip and instead got back early enough to unpack and get organized. Overall, it was a fun trip. So that's 2 summits down, and it may take us a few years, but 46 to go!

P.S. Additional select photos of the trip are available on Facebook.

September 25, 2009

Austin GDC 2009

Last week, a number of us from the Project Darkstar team were in Austin for the Austin Game Developers Conference. Like last year, we had a large booth on the expo floor. However, while last year we were largely focused on demonstrating Project Darkstar's capabilities to scale and distribute load across multiple cores and processors on a single node, this year our focus was on showcasing the team's progress on multi-node capabilities. Here's a recap of the week's events:

If you've been following along with Project Darkstar's progress over the years, you know that transparent multi-node scaling is one of its main attractions as a platform. On the other hand, you should also know that, with Project Darkstar being a still-maturing research project in Sun Labs, these features are not yet done. In fact, we have still not proven whether what we are attempting can even be done. Our fearless lead architect, Jim Waldo, has put together an excellent series of posts on his blog outlining why and how we hope to achieve this multi-node scaling. He calls it the "Four Miracles."

Our goal for Austin was to put together a compelling visual demonstration of our current progress towards achieving these "Four Miracles." Specifically, we hoped to show how Ann's miracle of transparently moving clients from one node to another in a cluster worked in tandem with Keith's miracle of monitoring a node's health and intelligently deciding which clients to relocate and when. Also, while Jane's miracle of detecting and organizing clients into affinity groups based on social networking algorithms is not complete, we hoped to portray how it should work by rigging up an app that formed affinity groups based on a player's location in the game (specifically, which chat room they were in).

In the weeks leading up to the conference, Keith and I came up with two demos that, for the most part, seemed to hit the mark. Using the JMX facilities already built into Darkstar, I hacked up a monitoring GUI that tapped into a running Darkstar cluster and displayed each of the nodes as a vertical bar. The height of the bar represented the total number of clients connected to that node, and the color optionally represented either the node's health or, as fragments of color, the affinity group of each connected client. For the first demo, we had Project Snowman running in a multi-node cluster and showed how the new node health monitoring features of Darkstar caused client traffic to spill over and be intelligently distributed between the nodes. Here's a quick video of Keith talking through it on the expo floor in Austin:
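The monitoring side of that GUI is essentially a thin JMX client: connect to each node's MBean server, read a numeric gauge, and set the bar's height from it. The sketch below demonstrates the pattern against the local platform MBean server and a standard JVM MBean; the real GUI connected to each remote node (via JMXConnectorFactory) and read Darkstar's own MBeans, whose names are not reproduced here:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class NodeMonitor {

    /** Read one numeric MBean attribute -- a per-node "bar height" gauge. */
    static long readGauge(MBeanServerConnection conn, String mbean, String attr)
            throws Exception {
        return ((Number) conn.getAttribute(new ObjectName(mbean), attr)).longValue();
    }

    public static void main(String[] args) throws Exception {
        // Demo only: poll the local JVM's thread count. A real monitor would
        // connect to each Darkstar node remotely and read its session-count
        // and node-health attributes instead.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        long height = readGauge(conn, "java.lang:type=Threading", "ThreadCount");
        System.out.println("bar height = " + height);
    }
}
```

Polling a handful of such gauges on a timer and redrawing the bars is all the "cluster dashboard" plumbing the demo needed.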

The second demo had Darkchat running on a multi-node cluster and was designed to show how clients would relocate between nodes depending on which rooms they were connected to in the application. Here's another quick video given by yours truly on the floor at Austin:

We had both of these demos running throughout the week on the floor, and the response from people who came by was generally positive. I think one of the main differences I noticed between this year and last year was the amount of quality traffic that came through the booth. I talked to a lot of people at Austin last year, but a large percentage of them had never heard of Project Darkstar or were only vaguely familiar with it. This year many people came by who were already committed to a project using Darkstar, or were very interested in our progress, or were familiar with the technology and had a strong desire to learn more. I think the best piece of anecdotal evidence was one person from the Intel booth who came by and said, "Oh, so this is for real now?" He was referring, of course, to the significant headway that we are finally making and showing on the multi-node scaling capabilities of the Project Darkstar platform.

In a tough year for everyone in just about every industry, I think going to this conference and putting together these demos injected some energy into both the Project Darkstar community and the team. A few more personal observations:
  • The expo floor seemed smaller this year and overall attendance appeared to be down. Not too surprising but hopefully a sign of things past and not things to come.
  • A little tidbit on some of the pains of getting the demos set up. Leading up to the conference, we had everything running just fine in Burlington and the demos packaged up and ready to go. After a few hours of pulling cables, moving pods around, and booting up systems, we fired up the node health demo on Tuesday, the day before the expo floor opened. Except it didn't work. At least not like it did in Burlington. When simulating an overloaded node, instead of offloading clients onto the other node right away, there was a seemingly random and arbitrary delay of 20-30 seconds before any clients would be moved. Huh?
  • After some hours of debugging, and pulling Seth into the mix for help, we finally tracked it down. The (still unfinished) node health code offloads identities from a node when it gets overloaded. However, it doesn't just move client identities; it also moves the identities of robots and other NPCs in the system. Since each snowman game has a number of robots, it was choosing to move the robot identities before moving the client identities. The question, of course, is why we never saw this behavior in Burlington. Well, it turns out that the order in which identities are chosen to be moved is deterministic and seemingly alphabetical. In Burlington, our simulated client players were generating identity names according to the hostname of the client machine (dstar1, dstar2, dstarX, ...). The hostnames of the machines used in Austin? x2250-1, x2250-2, etc. So in our Burlington deployment, the client identities were always chosen before the robot identities since they started with a "d"; in our Austin deployment, the client identities were always chosen after the robot identities since they started with an "x". Unbelievable.
  • Keith gave a talk during a one hour session which was awesome. He went through a number of obstacles and challenges he faced when building Darkchat for JavaOne and I think it came across as very real and genuine.
  • One final note, it appears as though Chuck Norris still doesn't need scalable server technology. All of his CPU's run faster to get away from him. Also, any code that he writes cannot be optimized. For anyone else, though, Project Darkstar could be a solution.
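The alphabetical-ordering quirk from that debugging story is easy to reproduce. Assuming the selection really does pick the alphabetically first identity (and using "robot1" as a hypothetical robot identity name, since the real ones aren't shown here):

```java
import java.util.Collections;
import java.util.List;

public class IdentityOrder {

    /** Deterministic, alphabetical choice of which identity to move first. */
    static String firstToMove(List<String> identities) {
        return Collections.min(identities);
    }

    public static void main(String[] args) {
        // Burlington: client names started with 'd', so clients moved first.
        System.out.println(firstToMove(List.of("dstar1", "robot1")));   // dstar1
        // Austin: client names started with 'x', so robots moved first.
        System.out.println(firstToMove(List.of("x2250-1", "robot1")));  // robot1
    }
}
```

Two deployments, identical code, opposite behavior, purely because of a hostname naming scheme.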

September 6, 2009

Ultimate Injury

I haven't written an ultimate post in a while, but needless to say, it's not because I haven't been playing! However, due to recent events, I may not be playing for at least a week or so. During Friday's lunchtime pickup game, I came down hard on an uneven section of ground and rolled my ankle inwards. I thought I could walk it off and actually kept playing for the remainder of the game. That was probably not wise, though, as it didn't take long after the game for my ankle to start swelling up. Diagnosis? Classic sprained ankle (although of the rarer form: a sprain of the ligaments on the inside of the ankle rather than the outside). I've been icing and ibuprofening all weekend, and the swelling is starting to go down, but I have a feeling it's the 15-day DL for me.

Ultimate Statistics (since January 2008):
Total Games Played: 235
Total Hours Played: 277