For several weeks, now, Katy and I have been hoping to knock off another 4000 foot mountain in our goal of hiking all of New Hampshire's 4000 footers. With winter closing in, there aren't too many weekends left before we would likely be hiking on a snow covered trail. With the recent unseasonably warm temperatures here in the Northeast, though, we planned a quick day hike up Cannon Mountain for this past weekend. Cannon Mountain is "across the street" from Mounts Lafayette and Lincoln, our first conquest, and is home to what once was the iconic Old Man of the Mountain. There are several routes up it, but we chose to start our ascent from the Lafayette Campground on the southeast side of the mountain.
The quality of the weather for this trip was in doubt right up until we took our first step on the trail. A significant rain event was forecast for all day Saturday and into Sunday morning, but it was unclear when things would clear out on Sunday. We were feeling optimistic, though, and were up before sunrise on Sunday and began the two hour ride to Franconia Notch in the dark and in the rain. By the time we reached the trailhead, the rain had stopped and patches of blue sky were trying to break through the low clouds and the fog. Things were looking up and we were on the trail at 8:40AM.
Even after deciding to start our hike from Lafayette Campground, there were still a number of interconnecting trails that could get us to the top of the mountain. On the way up, we hiked all the way out past Lonesome Lake via the Lonesome Lake Trail and continued our ascent up this trail to its northern terminus where it met the Kinsman Ridge Trail. The Lonesome Lake Trail was well maintained with a reasonably moderate grade as we ascended about 1700 feet of elevation in 2.3 miles. From there, we followed the Kinsman Ridge Trail to the summit. The most technically challenging and steepest part of the climb was definitely the section between the end of the Lonesome Lake Trail and the junction with the Hi-Cannon trail (about 0.4 miles from the summit). In this section we went up about 500 feet of elevation in less than half a mile, requiring quite a bit of scrambling over large boulders and roots. We reached the top a little bit after 11:00AM.
There's a tramway and ski area on the opposite side of the mountain from where we hiked, so the summit is fairly well developed with an observation tower at the peak. It was warm, even at the top of the mountain, with temperatures in the 50s, and we were able to enjoy at least some partial views with clearing skies and mountain tops peeking out of the clouds to our north and west. It was quite a different feel than our early October hike with fall foliage nearing peak season. This time around the trees were bare with evergreens peppering the mountain sides. There wasn't much to see to the south and east, though, as some low, stubborn clouds had settled in, obscuring the views over to Lafayette and the Franconia Ridge. We hung out at the top for a while and ate our lunch before heading back down.
On the hike down, we decided to take a slightly different route. Instead of going down the steepest section of the Kinsman Ridge Trail to Lonesome Lake Trail, we decided to take a left turn at the Hi-Cannon trail. This trail was narrower and seemed a little less traveled. We had to deal with some quite steep sections of long, slippery rocks, including one ledge that was so impossibly steep that a ladder was constructed to assist hikers. Some trail descriptions peg the middle portion of this trail as the most difficult trail on Cannon Mountain. There were several neat lookout ledges along the route overlooking Lonesome Lake with views down the notch. The Hi-Cannon Trail met up with the Lonesome Lake Trail less than a half mile from the trailhead and we arrived back at the car at 1:40PM. Total time was about five hours round trip for six miles of hiking including our extended break at the summit. We drove back home, stopping for some food along the way and were back home before 6PM from a quick, but satisfying and enjoyable trip to the Whites with Katy.
For those keeping track, that's 3 down with 45 to go!
November 16, 2009
November 12, 2009
Rule 1 of Programming: It's Always Your Fault (Almost)
Over the past week or so, I've been working on putting together some micro benchmarks for Project Darkstar. There has been a significant uptick in forum activity lately relating to stress testing and performance issues. In particular, we've seen many questions along the lines of: "I can only connect X users to my darkstar server, what's wrong?" First of all, this is great news. It means that people are making significant progress with their darkstar based games/applications and are working to push the limits of the technology. However, I also think that this is completely the wrong question to ask. As I've demonstrated before, Project Darkstar has a pretty high ceiling for raw capacity in terms of number of users. A properly tuned app with a light load can easily handle tens of thousands of users per node. However, connecting mostly idle users to a mostly idle server is not very interesting. These capacity numbers naturally decrease as the number of messages between the clients and server and the amount of processing per message increase. This seems obvious, but people still ask the capacity question as though all games developed with darkstar are going to have identical limitations. This is simply not the case.
With this said, though, we can still strive to identify upper bounds on Project Darkstar's performance at a more fine-grained level. Project Darkstar is an event driven transactional system, so no operation is without cost. With these micro benchmarks, I'm hoping that I can establish a relative cost for each of the operations using the DataManager, the ChannelManager, and the TaskManager. For example, how expensive is it to retrieve an object using DataManager.getBinding() vs ManagedReference.get()? How much overhead is involved with each transaction? How expensive is it to create a Channel or send a message on a Channel? With more or fewer users? While retrieving data from Darkstar's data store should be an order of magnitude faster than using J2EE and an RDBMS, it is also likely an order of magnitude slower than retrieving data from a data structure that is already in memory and using no synchronization. This is information that users really need to be aware of and be able to put into perspective when designing their game, structuring their tasks, and establishing their own expectations of what the performance should be like.
So over the past couple of days, I've been debugging a problem in these benchmarks. In one particular test, I was attempting to measure the raw execution time per call to DataManager.getBinding() from the Project Darkstar API. The test was pretty simple: I just set a large number of bindings in a single set of setup transactions. Then I would time the execution of another set of transactions that would make some subset of calls to getBinding() on the names that I had just set up. Taking into account previously measured transaction overhead, I could then come up with a reasonable estimate of the cost per operation. Seems easy, right? Well, it turns out that I hit a snag. In running this test, I was repeatedly getting a situation where a seemingly random name binding was not being set properly during setup. Most of the calls to getBinding() would work fine, but a couple were throwing NameNotBoundException. What? This didn't make much sense. I went back and looked over my code many times and tried a myriad of variations, logging output, and print lines, but still no luck. I was still getting NameNotBoundException for what seemed like a random name in the sequence. Hmmph.
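To make the arithmetic behind that estimate concrete, here is a minimal sketch in Java. It is not the actual benchmark code: the class name, the method name, and every number in it are made up for illustration. It assumes the estimate is simply the total measured time, minus the previously measured per-transaction overhead, divided by the number of getBinding() calls.

```java
public class BindingCostEstimate {

    /**
     * Hypothetical helper: estimates the cost of one call by subtracting the
     * previously measured per-transaction overhead from the total elapsed
     * time and dividing what is left by the number of calls made.
     */
    static double perCallNanos(long totalNanos, int txnCount,
                               long overheadPerTxnNanos, long calls) {
        long overhead = (long) txnCount * overheadPerTxnNanos;
        return (double) (totalNanos - overhead) / calls;
    }

    public static void main(String[] args) {
        // Made-up numbers: 10 transactions took 5 ms in total, each carries
        // 0.2 ms of overhead, and together they made 1000 calls.
        double estimate = perCallNanos(5_000_000L, 10, 200_000L, 1_000L);
        System.out.println(estimate); // prints 3000.0
    }
}
```

An estimate like this is only as good as the overhead measurement it subtracts, which is why the overhead has to be measured separately first.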
At this point, I went through a whole series of exercises, all centered around one assumption, that my code was right. I tested the native edition vs. the Java edition of BDB, suspecting maybe there was a weird bug in one of them: same result. I tried longer transactions, more operations, larger serialized data objects: same result. I tried running my benchmarks in different orders: same result. I even started writing test cases for DataManager.setBinding() that simulated transaction rollback and retry, large numbers of consecutive calls to setBinding() and binding and rebinding of the same name. I thought I was going to uncover some weird corner case bug. But those tests were passing! I was at a loss. After probably two days of sporadic attempts at debugging this, I finally went back and looked really hard at my own test code. And... I found a bug (doh!). It turns out that I was being too cute with my setup transactions, and was modifying a non-local counter variable inside of my anonymous nested transaction class. In random situations, this class would abort and retry (a normal darkstar operation), but since it was modifying a variable that lived outside of the task itself, this value was not being rolled back. The result was that a name binding would be skipped periodically (exactly the behavior that I was seeing).
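For anyone curious what this kind of bug looks like, here is a rough, self-contained Java sketch. It does not use the real Project Darkstar API and is not my actual test code: the Task interface, the execute() runner, and the store map are hypothetical stand-ins that only simulate "discard the first attempt, then retry". The buggy setup increments a counter that lives outside the task, so the aborted attempt still advances it; the fixed setup gives each task its own index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RetryCounterDemo {

    /** Hypothetical stand-in for a transactional task (not the Darkstar API). */
    interface Task { void run(Map<String, Integer> txn); }

    static final Map<String, Integer> store = new HashMap<>();
    static int counter; // lives outside the task, so an abort does not roll it back

    /** Runs a task on staged writes; if abortFirst, the first attempt is discarded and retried. */
    static void execute(Task task, boolean abortFirst) {
        if (abortFirst) {
            task.run(new HashMap<>()); // attempt 1: its staged writes are thrown away
        }
        Map<String, Integer> staged = new HashMap<>();
        task.run(staged);              // the attempt that actually commits
        store.putAll(staged);
    }

    /** Buggy setup: the task reads and bumps the shared counter. */
    static List<String> missingAfterBuggySetup(int n) {
        store.clear();
        counter = 0;
        for (int i = 0; i < n; i++) {
            execute(txn -> { txn.put("name" + counter, counter); counter++; }, i == 2);
        }
        return missing(n);
    }

    /** Fixed setup: each task captures its own index, so a retry binds the same name. */
    static List<String> missingAfterFixedSetup(int n) {
        store.clear();
        for (int i = 0; i < n; i++) {
            final int index = i;
            execute(txn -> txn.put("name" + index, index), i == 2);
        }
        return missing(n);
    }

    /** Lists the names in name0..name(n-1) that never made it into the store. */
    static List<String> missing(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!store.containsKey("name" + i)) out.add("name" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(missingAfterBuggySetup(5)); // prints [name2]
        System.out.println(missingAfterFixedSetup(5)); // prints []
    }
}
```

In the sketch the abort is forced on the third task so the result is repeatable. In a real system the retries happen whenever there is contention, which is why the skipped name looked random to me.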
So here's my question. Why did I assume that code that I wrote in less than a day was more likely to be correct than Berkeley DB itself, a project that's been developed and tested for a couple of decades? Why did I assume that code that I wrote in less than a day was more likely to be correct than Project Darkstar's Data Service code, which has been developed and tested for years? I mean, I knew better than to think that Tim's code was the likely culprit, but I still started writing test cases thinking I was going to heroically find some obscure bug. This, my friends, is a violation of the number 1 rule of programming: If you're having problems, It's Always Your Fault (almost). I mean, don't get me wrong, I've found (and reported) bugs in well-established open source projects before, but those situations are actually few and far between. I also don't mean to suggest that Project Darkstar is bug-free. I do think, though, that sometimes it's too tempting to conclude that there's a bug in that library you're using, or that there's a performance limitation in that technology that is fundamentally impossible to overcome. Maybe that's true, but 99% of the time, it's your fault.
And with regard to those micro benchmarks, I'm hoping to publish some results soon (assuming I don't get hung up with any more boneheaded mistakes!)
Labels: Project Darkstar, Tech
Ultimate Ongoing
A quick update on the ultimate front. I've been playing regularly the past few weeks, but not without some discomfort. My ankle is still not 100% but it's faded into a nagging annoyance. It appears that it always feels worse in the morning and then loosens up and doesn't bother me too much throughout the day. I've just played the past three days in a row, though, so I feel it a bit more today. In any case, hopefully this injury will be just a bitter memory soon.
Ultimate Statistics (since January 2008):
Total Games Played: 252
Total Hours Played: 296
Labels: Ultimate