owenkellett.com: Project Snowman: Lessons Learned

It was about two months ago that I was pulled into the Project Snowman effort for Austin GDC. The goal? Put together a complete, playable 3D action demo game that we could showcase on the expo floor at the conference as a demonstration of Project Darkstar's capabilities. The good news? We pulled it off. Thanks to the herculean effort by Keith, Josh, and Yi (plus the work that I did), we found ourselves with a networked, multiplayer "first person snowballer," as David described it to most of the people he encountered on the floor in Austin. [It's actually a third person snowballer with the main objective being to capture the flag, but those are just details :)]. It really did work out well: the game itself drew a lot of people into the booth, and it made for a compelling demo, with thousands of clients hammering away against a single server that barely even broke a sweat.

Now that the fun is over, though, it's time to take a step back and make some observations. What did we notice when developing this game? What did we do right? What did we do wrong? What advice can we give to help others trying to build a game with Project Darkstar? Here's my best shot:

HOLY CRAP! Developing a game with Project Darkstar is easy! OK, well, maybe that's a little bit over the top. The truth is, though, Project Darkstar does a lot of things under the hood that a developer would normally have to think about and implement on her own (see the slides from my presentation at Austin GDC). Before I joined the effort in late July/early August, most of the work had been done by Yi on the client side. Animations, graphics, and building the client around JMonkeyEngine are where he had spent most of his time since starting as an intern at the beginning of the summer. Very little work had been done on the server side. A couple of weeks later? We had a nearly feature-complete and mostly unit-tested server-side Project Snowman application. This is no hyperbole or exaggeration either; that's really how the timeline worked out.
Chuck Norris doesn't depend on external libraries, external libraries depend on him. This is actually true.
You should develop a client simulator early on. From the beginning, this project was meant to serve as a demo for Austin GDC. Therefore, we knew right off the bat that we would need some way to simulate a large amount of load on the server. We built a headless client that simulated typical client actions and not only did it prove valuable for testing, but it's also something that should be written for basically any Project Darkstar based game.
Chuck Norris can simulate maximum load on a server using only his fists and without boiling his blood. I've seen it done.
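To make the client simulator idea concrete, here is a minimal sketch of a headless load generator. It is not the Project Snowman simulator: ServerLink, SimulatedClient, LoadSimulator, and the action names are all hypothetical, and the "server" is just a counter so the example runs on a plain JDK. In a real simulator, the ServerLink implementation would presumably wrap an actual client connection to the server.

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for whatever actually sends a message to the server. */
interface ServerLink {
    void send(String message);
}

/** One headless client: no rendering, just a loop that emits plausible actions. */
class SimulatedClient implements Runnable {
    private static final String[] ACTIONS = {"MOVE", "THROW", "GRAB_FLAG"};

    private final ServerLink link;
    private final Random random;
    private final int actionCount;

    SimulatedClient(ServerLink link, long seed, int actionCount) {
        this.link = link;
        this.random = new Random(seed); // seeded so a run can be repeated
        this.actionCount = actionCount;
    }

    @Override
    public void run() {
        for (int i = 0; i < actionCount; i++) {
            String action = ACTIONS[random.nextInt(ACTIONS.length)];
            // Each message is an action name plus a made-up x/y coordinate.
            link.send(action + " " + random.nextInt(100) + " " + random.nextInt(100));
        }
    }
}

class LoadSimulator {
    /** Runs the simulated clients on a thread pool and returns the number of messages sent. */
    static long run(int clients, int actionsPerClient, int threads) {
        AtomicLong sent = new AtomicLong();
        ServerLink countingLink = message -> sent.incrementAndGet();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < clients; i++) {
            pool.execute(new SimulatedClient(countingLink, i, actionsPerClient));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated clients did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for clients", e);
        }
        return sent.get();
    }
}
```

Calling LoadSimulator.run(50, 20, 4) drives 50 simulated clients, each sending 20 actions, across 4 threads. Keeping the transport behind a small interface means the same client logic can be pointed at a counter, a log, or a live server.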
Plan to spend a lot of time tracking down performance bottlenecks and contention problems. While we were able to get a working game running in relatively short order, we quickly began noticing scalability problems, and almost all of them were related to contention in some way. For example, one of the first contention issues we ran into was related to the AI snowmen that we introduced into the game. When the game started, the first thing that each of these snowmen did was loop through all of the game information (including all of the other snowmen and the flags) to determine who it should attack and where its opponent's flag was. On the surface, this sounds reasonable, as it's just acquiring all of this information for reading. However, since the other AI snowmen were doing the same thing at the same time, each of them was also attempting to acquire a write lock on itself. When the number of snowmen in a game was increased significantly, we saw pathological deadlock scenarios during game startup.
Code written by Chuck Norris never has any performance bottlenecks. Little known fact.
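As an illustration of the general principle behind the snowman problem (not the actual Project Snowman fix), the sketch below scans other objects with shared read access only and takes exclusive write access once, on the single object that really changes. It uses a plain JDK ReentrantReadWriteLock as an analogy for Project Darkstar's per-object locking; AiSnowman and TargetScan are hypothetical names, and the one-dimensional position is a simplification.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical game entity; the lock stands in for per-object locking in the server. */
class AiSnowman {
    final int id;
    final int team;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private double x;
    private int targetId = -1;

    AiSnowman(int id, int team, double x) {
        this.id = id;
        this.team = team;
        this.x = x;
    }

    /** Shared (read) access: many scanners can hold this at the same time. */
    double readX() {
        lock.readLock().lock();
        try {
            return x;
        } finally {
            lock.readLock().unlock();
        }
    }

    int readTarget() {
        lock.readLock().lock();
        try {
            return targetId;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Exclusive (write) access, taken only when this snowman really changes. */
    void writeTarget(int newTargetId) {
        lock.writeLock().lock();
        try {
            targetId = newTargetId;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class TargetScan {
    /** Picks the nearest opponent using read access only, then writes once to self. */
    static void pickTarget(AiSnowman self, List<AiSnowman> everyone) {
        double myX = self.readX();
        int bestId = -1;
        double bestDistance = Double.MAX_VALUE;
        for (AiSnowman other : everyone) {
            if (other.team == self.team) {
                continue; // teammates are never targets
            }
            double distance = Math.abs(other.readX() - myX);
            if (distance < bestDistance) {
                bestDistance = distance;
                bestId = other.id;
            }
        }
        // No lock is held here, so this single write cannot deadlock against another scan.
        self.writeTarget(bestId);
    }
}
```

The design point is that no lock is held across the scan, and the write touches exactly one object. In Project Darkstar terms, the rough equivalent would be to read other managed objects without marking them for update and to mark only the object being modified.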
Built-in Project Darkstar profiling tools can prove to be extremely valuable. One of the most difficult things when working with a complex system like this is establishing clear ways to quantify performance. Fortunately, Project Darkstar has built-in profiling capabilities that give you real-time feedback on what the system is doing and how well it is handling the load. Seth has written a good blog post which can help you get started working with these profilers. In our experience, the most useful numbers were given by the SnapshotProfileOpListener, which periodically output the number of attempted tasks, the number of successful tasks, and the average task queue size over 10-second intervals. This gave us a simple metric for quickly determining whether the system was keeping up (the queue size remains small) and whether there was a lot of contention in the system (a high task failure rate is indicative of high contention). Another useful tool was the SnapshotTaskListener. Using it, we could quickly determine which tasks were failing, giving us better insight into where contention was happening in the system.
Chuck Norris doesn't need profiling tools. He stares down the server until the profiling data comes to him. There's a rumor that he took on ten servers in a multi-node deployment simultaneously.
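The way those three snapshot numbers can be read is easy to sketch. The class below is purely illustrative: TaskSnapshot is a hypothetical holder, not part of the Project Darkstar profiling API, and the thresholds passed to it are arbitrary values a team would have to choose for its own game.

```java
/** Hypothetical holder for the three numbers reported per snapshot interval. */
class TaskSnapshot {
    final long attempted;
    final long successful;
    final double averageQueueSize;

    TaskSnapshot(long attempted, long successful, double averageQueueSize) {
        this.attempted = attempted;
        this.successful = successful;
        this.averageQueueSize = averageQueueSize;
    }

    /** Fraction of attempted tasks that did not succeed in this interval. */
    double failureRate() {
        if (attempted == 0) {
            return 0.0; // nothing ran, so nothing failed
        }
        return (double) (attempted - successful) / attempted;
    }

    /** A small queue suggests the server is keeping up with the offered load. */
    boolean keepingUp(double maxQueueSize) {
        return averageQueueSize <= maxQueueSize;
    }

    /** A high failure rate suggests tasks are contending for the same objects. */
    boolean highContention(double maxFailureRate) {
        return failureRate() > maxFailureRate;
    }
}
```

For example, a snapshot of 1000 attempted and 900 successful tasks has a failure rate of about 0.1, which would count as high contention against an assumed limit of 0.05.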
There is clearly a need for some type of Project Darkstar application test rig. Although we were able to track down a lot of performance problems and contention issues using the built-in profilers, it was clear that a lot of the work required to set up and run these tests was highly mechanical and error-prone. Not only that, but without very careful record keeping, it was often difficult to keep track of which results came out of which conditions and whether certain changes helped or hindered the performance of the system. Most of our tests were set up in a very ad hoc way, and a framework that could consistently and automatically repeat our tests and give definitive results would prove monumentally useful. (Fortunately, this is on my to-do list.)
Chuck Norris doesn't test his code. It always works because he tells it to. This would also make things easier for us.
Scalable data structures will likely be useful in almost any Project Darkstar game/application. Another problem that we faced was an issue with logins. In order to introduce a considerable load into the system, we needed to log in a large number of clients in a short amount of time. However, this quickly became a problem. Our original implementation simply added incoming players to a waiting queue to be asynchronously processed and matched into a game later. Since there was a single queue, simultaneously adding a large number of players to the back while also removing players from the front created a massive amount of contention on the one queue object. How did we solve this? With David's ScalableDeque, available in the com.sun.sgs.app.util package. The ScalableDeque allows for concurrent modification by allowing simultaneous writers on both the front and the back of the deque. We provided virtual support for multiple writers on each end by using an array of ScalableDeques. See the code for more insight on what we actually did. (Clearly this is something that could be generalized as a standalone utility. Add one to the to-do list.)
Chuck Norris can concurrently modify any data structure with no contention. Convenient.
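The array-of-deques idea can be sketched in a few lines. This is not the Project Snowman code: StripedWaitingQueue is a hypothetical name, and the JDK's ConcurrentLinkedDeque stands in for ScalableDeque so the example runs without a Project Darkstar server. The point is only that spreading adds and removes across several independent deques means writers rarely touch the same object.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical waiting queue that spreads players across several independent deques. */
class StripedWaitingQueue<T> {
    private final ConcurrentLinkedDeque<T>[] stripes;

    @SuppressWarnings("unchecked")
    StripedWaitingQueue(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ConcurrentLinkedDeque[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ConcurrentLinkedDeque<>();
        }
    }

    /** Adds to the back of a randomly chosen stripe, so concurrent adds rarely collide. */
    void add(T item) {
        int index = ThreadLocalRandom.current().nextInt(stripes.length);
        stripes[index].addLast(item);
    }

    /**
     * Removes one waiting item, starting at a random stripe and checking each in turn.
     * Returns null if every stripe looked empty. Ordering is FIFO within a stripe only,
     * not across the whole queue.
     */
    T poll() {
        int start = ThreadLocalRandom.current().nextInt(stripes.length);
        for (int i = 0; i < stripes.length; i++) {
            T item = stripes[(start + i) % stripes.length].pollFirst();
            if (item != null) {
                return item;
            }
        }
        return null;
    }
}
```

The trade-off is that global arrival order is lost. For matching waiting players into games that seems acceptable, but it would not suit a queue where strict first-come, first-served ordering matters.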
Be careful of the AppContext temptation. If you're familiar with the Project Darkstar API, you know that access to the core Project Darkstar services is given through Manager objects. These Managers are acquired directly from the Project Darkstar stack by making static method calls against the com.sun.sgs.app.AppContext class. Why is this important? Well, in my experience with code written against this API, I've noticed that there is a strong tendency to litter your application with direct calls to AppContext.getDataManager() or AppContext.getTaskManager(), etc. Why is this a problem? It tightly couples just about all of your classes with the static, unchangeable AppContext class of the core Project Darkstar API. This makes it extremely difficult to isolate your individual classes from the rest of the system for unit testing purposes. Now this can be worked around by making judicious use of the AppContext method calls and by being explicit in defining each class's dependencies. However, I would like to see this taken one step further and provide an alternative means of acquiring Managers from the Project Darkstar stack without relying on so many static method calls (another one for the to-do list).
Chuck Norris doesn't need Project Darkstar, he can roundhouse kick a piece of Java code into a complete MMORPG in 2.4 seconds. Just wait until version 1.0 though. By then even Chuck Norris will be using Project Darkstar.
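One way to be explicit about a class's dependencies is to hand them in through the constructor instead of reaching for AppContext inside the class. The sketch below shows that pattern under stated assumptions: BindingStore, InMemoryBindingStore, and ScoreKeeper are hypothetical names, and the interface is deliberately narrower than the real DataManager. It is not how Project Snowman is written.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical narrow interface covering only what ScoreKeeper needs from the data service. */
interface BindingStore {
    void put(String name, Object value);

    /** Returns the bound value, or null when nothing is bound (an assumption of this sketch). */
    Object get(String name);
}

/**
 * In-memory implementation for unit tests. A production implementation would
 * delegate to the manager obtained from AppContext.getDataManager(), keeping
 * the static call in exactly one place.
 */
class InMemoryBindingStore implements BindingStore {
    private final Map<String, Object> bindings = new HashMap<>();

    @Override
    public void put(String name, Object value) {
        bindings.put(name, value);
    }

    @Override
    public Object get(String name) {
        return bindings.get(name);
    }
}

/** Game logic that never mentions AppContext, so it can be tested in isolation. */
class ScoreKeeper {
    private final BindingStore store;

    ScoreKeeper(BindingStore store) {
        this.store = store; // the dependency is handed in, not looked up statically
    }

    /** Increments the named player's score and returns the new value. */
    int recordHit(String player) {
        String key = "score." + player;
        Object current = store.get(key);
        int updated = (current == null ? 0 : (Integer) current) + 1;
        store.put(key, updated);
        return updated;
    }
}
```

A unit test can then construct new ScoreKeeper(new InMemoryBindingStore()) and exercise recordHit without starting a server.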

That's about all the insight I can offer for one blog post. Don't forget that Project Snowman is an open source project itself. We do hope that members of the community take an interest in helping move its development forward.

2 comments :

Jackal von ÖRF, September 23, 2008 at 1:28 PM
"However, I would like to see this taken one step further and provide an alternative means of acquiring Managers from the Project Darkstar stack without relying on so many static method calls (another one for the to-do list)."

I have already done that with Dimdwarf (http://dimdwarf.sourceforge.net/). Specifically, the net.orfjackal.dimdwarf.serial.InjectObjectsOnDeserialization class uses Guice to inject dependencies into the objects when they are deserialized by net.orfjackal.dimdwarf.serial.ObjectSerializerImpl. This supports method and field injection (constructor injection is later in my todo list). Dimdwarf uses that to inject EntityLoader (~DataManager) into EntityReferenceImpl (~ManagedReference), but the same thing will work also for application classes.

Porting it and other things to Darkstar is in my todo list.
Owen, September 23, 2008 at 2:08 PM
Hi Jackal,

Yes, I've seen some posts about Dimdwarf on the Project Darkstar forums but haven't gotten a chance to check it out yet. It seems as though we have similar ideas, as some type of dependency injection mechanism is exactly what I had in mind to alleviate this problem. Ultimately, I'd like to see Project Darkstar act more like a true application container and less like an external library dependency, and this would be a big step towards achieving that goal.

September 23, 2008
