May 31, 2009

JavaOne Prep

I'm back in San Francisco this week, in the familiar downtown area near the Moscone Center where JavaOne is being held. It wasn't that long ago that I was here at the same spot for GDC, and now I'm back for another first-time experience. I've never been to JavaOne before, and what better way to gain my first exposure to it than as a first-class citizen (speaker).

If you haven't already seen it, John already gave an excellent rundown of the Project Darkstar-related activities in the Project Darkstar team blog. At the risk of sounding repetitive, I'll be participating in three of those events this week:
  • On Tuesday, I'll be running a hands-on lab on Project Darkstar. The lab will essentially be a step-by-step tutorial on coding up the Project Snowman game. I just zipped through it on the actual lab machines earlier this afternoon, and I think it's going to turn out to be a pretty fun lab.
  • On Wednesday, I'll be giving a newly tweaked version of my Project Darkstar technical talk. Taking a cue from Brian (award-winning high school history teacher), I've worked in some verbal sci-fi references to go along with the visuals. This should make it awesome.
  • On Thursday, I've been tagged to do a quickie podcast in the JavaOne Community Corner. I think it will be a pretty informal chat about Project Darkstar.
So those are my responsibilities this week. Other than that, I've scoped out and signed up for some other sessions that piqued my interest, including one on Hudson, a few on unit testing, and a couple given by Java veteran Josh Bloch. Then there are the keynotes, general sessions, the Java Pavilion, and probably too many other events for me to keep track of. And of course I'll try (and most certainly fail) to keep up with the host of parties and bashes going on in the evenings.

It will be a busy week ahead, but I'm pretty excited about it. To be honest, a part of me was dreading this event, mostly because I'm not a very outspoken person and often like to keep to the shadows; being put front and center for these sessions is way outside my comfort zone. I was also talking to Katy last night about how traveling can be draining for me. Between Austin, GDC, and now JavaOne, this is the third time I've traveled in a year. Small potatoes for some, but the most ever for me. Now that I'm here, though, I'm ready for the challenge and am looking forward to a successful (and fun!) week at JavaOne. Maybe I'll see you there...

Ultimate Blessing

The events of this weekend and the upcoming week are likely a blessing in disguise. You see, I'm on a BUDA spring hat league team, and the end-of-season tournament is today. However, I missed it, because I'm currently in San Francisco preparing for JavaOne (more on that in a later post). Disappointing, yes, but there was an easy and obvious solution to this problem: participating in yesterday's tournament instead! BUDA's hat play is actually split into two leagues, each with its own tournament. I could have jumped in with a team and played as a pickup player in yesterday's tournament. I even went so far as to submit my name to BUDA as a potential pickup player so they could contact me if they needed players.

The problem is that I submitted my name kind of late (Friday), so they never got back to me. Still, I easily could have just shown up and jumped in with a team (I did this for last year's summer BUDA hat tournament and it was a blast). Why didn't I, though? My hamstring. It's still bothering me, and I'm sure that if I had played in either yesterday's or today's tournament I would have seriously damaged my chances of getting through this summer's super busy ultimate schedule (read: five days a week) without an injury incident. So I sat it out yesterday on an absolutely perfect day, and spent today traveling on a plane.

Sigh. Like I said, it's a blessing in disguise. I'm at JavaOne all this week, so I'll be giving my hammy a much-needed continuous period of rest. Then next week, summer club games start up, along with the regularly scheduled Burlington lunchtime pickup. Hopefully by then I'll be back to 100%!

Ultimate Statistics (since January 2008):
Total Games Played: 188
Total Hours Played: 219

May 22, 2009

Ultimate Week

In case you were wondering, I'm still playing ultimate. I've decided to compress my ultimate posts into periodic summaries. This way, they won't seem so monotonous.

We've had some warm weather over the last couple of weeks, with temperatures pushing back into the 90s for a couple of games. I made it to the Burlington pickup games, but this past weekend's BUDA spring hat league game was rained out.

Ultimate Statistics (since January 2008):
Total Games Played: 187
Total Hours Played: 218

May 12, 2009

Capacity Testing

There's one question that we get asked a lot about Project Darkstar: "How many users can you connect to one server?" This is a difficult question to answer, mainly because the answer is so sensitive to context: the game type, the game's behavior, and the hardware specifications can all have an enormous effect.

Today I decided to see if I could establish an upper bound for this question. My goal was to put together an ad hoc test to see how many idle clients I could log into a server. I used Tim's request app, which is basically a little performance-testing widget that accepts commands from clients (such as "JOIN_CHANNEL", "LEAVE_CHANNEL", etc.). It doesn't do anything when a client logs in, though, and will happily sit idle if the client never sends a command, which makes it a perfect candidate for this test. I wrote a simple client that does nothing but log in a configurable number of users; a sketch of it follows.
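
For the curious, here's roughly what that client looked like. Fair warning: this is a from-memory sketch against the 0.9-era sgs-client API, so listener signatures may differ slightly between releases, and the class name, "guest" password, and hostname are placeholders of mine. The real thing also throttled its login rate and reported running counts.

import java.net.PasswordAuthentication;
import java.nio.ByteBuffer;
import java.util.Properties;

import com.sun.sgs.client.ClientChannel;
import com.sun.sgs.client.ClientChannelListener;
import com.sun.sgs.client.simple.SimpleClient;
import com.sun.sgs.client.simple.SimpleClientListener;

// Logs in a configurable number of users that then sit completely idle.
public class IdleSwarm {
    public static void main(String[] args) throws Exception {
        int count = Integer.parseInt(args[0]);  // number of idle users
        for (int i = 0; i < count; i++) {
            final String user = "idle" + i;
            SimpleClientListener listener = new SimpleClientListener() {
                public PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication(user, "guest".toCharArray());
                }
                public void loggedIn() { /* connected; now do nothing, forever */ }
                public void loginFailed(String reason) {
                    System.err.println(user + " failed: " + reason);
                }
                public ClientChannelListener joinedChannel(ClientChannel channel) {
                    return null;  // the request app won't join us to channels unasked
                }
                public void receivedMessage(ByteBuffer message) { }
                public void reconnecting() { }
                public void reconnected() { }
                public void disconnected(boolean graceful, String reason) { }
            };
            Properties props = new Properties();
            props.setProperty("host", "darkstar-server");  // placeholder hostname
            props.setProperty("port", "11469");            // port from the logs below
            new SimpleClient(listener).login(props);
        }
        Thread.sleep(Long.MAX_VALUE);  // hold the sessions open while counting
    }
}

Here's what I found: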

Machine configurations (1 used as server, 4 as clients):
Sun Blade 6220
2 dual-core AMD 2200 CPUs @ 2.8GHz
16GB RAM
Solaris 10u6

Maximum connected clients:
32-bit JVM, 128MB max heap : ~800
32-bit JVM, 1GB max heap : ~6000
32-bit JVM, 2GB max heap : ~12000

Each time a limit was reached, the server threw an exception that looked something like this:

[INFO] SEVERE: acceptor error on 0.0.0.0/0.0.0.0:11469
[INFO] java.lang.OutOfMemoryError: Direct buffer memory
[INFO]  at java.nio.Bits.reserveMemory(Bits.java:633)
[INFO]  at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:95)
[INFO]  at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
[INFO]  at com.sun.sgs.impl.protocol.simple.AsynchronousMessageChannel.&lt;init&gt;(AsynchronousMessageChannel.java:86)
[INFO]  at com.sun.sgs.impl.protocol.simple.SimpleSgsProtocolImpl.&lt;init&gt;(SimpleSgsProtocolImpl.java:167)
[INFO]  at com.sun.sgs.impl.protocol.simple.SimpleSgsProtocolImpl.&lt;init&gt;(SimpleSgsProtocolImpl.java:139)
[INFO]  at com.sun.sgs.impl.protocol.simple.SimpleSgsProtocolAcceptor$ConnectionHandlerImpl.newConnection(SimpleSgsProtocolAcceptor.java:316)
[INFO]  at com.sun.sgs.impl.transport.tcp.TcpTransport$AcceptorListener.completed(TcpTransport.java:331)
[INFO]  at com.sun.sgs.impl.nio.AsyncGroupImpl$CompletionRunner.run(AsyncGroupImpl.java:161)
[INFO]  at com.sun.sgs.impl.nio.Reactor$ReactiveAsyncKey.runCompletion(Reactor.java:858)
[INFO]  at com.sun.sgs.impl.nio.Reactor$PendingOperation$1.done(Reactor.java:630)
[INFO]  at java.util.concurrent.FutureTask$Sync.innerSet(FutureTask.java:251)
[INFO]  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[INFO]  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[INFO]  at com.sun.sgs.impl.nio.Reactor$PendingOperation.selected(Reactor.java:563)
[INFO]  at com.sun.sgs.impl.nio.Reactor$ReactiveAsyncKey.selected(Reactor.java:803)
[INFO]  at com.sun.sgs.impl.nio.Reactor.performWork(Reactor.java:323)
[INFO]  at com.sun.sgs.impl.nio.ReactiveChannelGroup$Worker.run(ReactiveChannelGroup.java:268)
[INFO]  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[INFO]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[INFO]  at java.lang.Thread.run(Thread.java:619)

From these numbers and the exception above, it looks like the server's maximum capacity closely tracks the configured maximum heap size, which makes sense. However, a couple of things are odd:
  • Why are the numbers so low? The clients aren't doing anything once they log in, and yet they seem to eat up memory very quickly.
  • Even though increasing the heap size helps, connecting to the server JVM with JConsole shows that heap usage never comes close to the configured maximum.
After digging through the stack trace as well as the Darkstar I/O code, I discovered that the culprit is our use of direct byte buffers. First, for each client that connects, a DirectByteBuffer of 128K is allocated to serve as buffer space for incoming packets. Second, memory allocated for direct buffers is not counted against the Java heap (even though the heap limit seemingly does have an effect), which makes the JVM confusing to monitor.
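
You can see this behavior outside of Darkstar with a tiny standalone program (my own sketch, nothing to do with the Darkstar code base). Run it with a small heap, say -Xmx128m, and it dies with the same "Direct buffer memory" error while JConsole's heap graph barely moves. The arithmetic fits the ceilings above, too: at 128K per client, a 2GB direct memory budget covers at most 16384 clients before overhead, which lines up with topping out around 12000. As I understand it, the JVM's direct memory budget defaults to roughly the max heap size unless overridden with -XX:MaxDirectMemorySize, which would explain why raising -Xmx raises the client ceiling even though the heap itself stays nearly empty.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Allocate 128K direct buffers (the size Darkstar reserves per client)
// until the JVM runs out of direct buffer memory. The Java heap stays
// nearly empty the whole time, so heap monitoring never sees it coming.
public class DirectBufferDemo {
    public static void main(String[] args) {
        List<ByteBuffer> buffers = new ArrayList<ByteBuffer>();
        try {
            while (true) {
                buffers.add(ByteBuffer.allocateDirect(128 * 1024));
            }
        } catch (OutOfMemoryError e) {
            // e.g. java.lang.OutOfMemoryError: Direct buffer memory
            System.out.println("Gave up after " + buffers.size() + " buffers ("
                    + (buffers.size() * 128 / 1024) + "MB): " + e.getMessage());
        }
    }
}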

Fortunately, there are a couple of things I can do with this information to improve my numbers. First, Project Darkstar provides a configuration property (com.sun.sgs.impl.protocol.simple.read.buffer.size) that changes the read buffer size. Instead of 128K, I switched it to 8K, its specified minimum. In most games, packets should be much, much smaller than 128K, so lowering this limit may be an acceptable solution in many cases. Second, and more of a big-hammer approach, is to switch to a 64-bit JVM, which allows a heap limit greater than 2GB.
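
For reference, the buffer tweak is just a one-line change in the server's properties file (the comment is mine; at 8K per client, even 64000 clients need only about 512MB of direct buffer space, versus roughly 8GB at the 128K default):

# shrink the per-client read buffer from the 128K default to the 8K minimum
com.sun.sgs.impl.protocol.simple.read.buffer.size=8192

Here's what I observed with these changes: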

Maximum connected clients:
32-bit JVM, 2GB max heap, com.sun.sgs.impl.protocol.simple.read.buffer.size=8192 : ~64000
64-bit JVM, 16GB max heap, com.sun.sgs.impl.protocol.simple.read.buffer.size=131072 : ~64000

In both cases, the limit announced itself with a different exception this time:

[INFO] SEVERE: acceptor error on 0.0.0.0/0.0.0.0:11469
[INFO] java.io.IOException: Too many open files
[INFO]  at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
[INFO]  at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
[INFO]  at com.sun.sgs.impl.nio.AsyncServerSocketChannelImpl$1.call(AsyncServerSocketChannelImpl.java:254)
[INFO]  at com.sun.sgs.impl.nio.AsyncServerSocketChannelImpl$1.call(AsyncServerSocketChannelImpl.java:251)
[INFO]  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[INFO]  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[INFO]  at com.sun.sgs.impl.nio.Reactor$PendingOperation.selected(Reactor.java:563)
[INFO]  at com.sun.sgs.impl.nio.Reactor$ReactiveAsyncKey.selected(Reactor.java:803)
[INFO]  at com.sun.sgs.impl.nio.Reactor.performWork(Reactor.java:323)
[INFO]  at com.sun.sgs.impl.nio.ReactiveChannelGroup$Worker.run(ReactiveChannelGroup.java:268)
[INFO]  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[INFO]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[INFO]  at java.lang.Thread.run(Thread.java:619)

This is a much better-looking number, and the exception suggests that we're now running into a different problem: most likely the OS's limit on open file descriptors, since each connected socket consumes one. That limit is likely configurable as well, but I haven't tried to increase it. A few closing thoughts:
  • The server's current default implementation allocates a fixed, perhaps overly conservative, 128K buffer for each connected client. This can be tweaked with the com.sun.sgs.impl.protocol.simple.read.buffer.size property to reduce memory usage.
  • Properly tweaking this property gives us an upper bound of approximately 64000 connected clients (on Solaris 10, without making an effort to increase the max file descriptor setting for the OS).
  • It should be noted that the server handled login storms with ease. In the final tests I bombarded the server with about 20000 logins at a time. Under these circumstances, round trips (login initiation to login completion) ranged anywhere from 500 milliseconds to 15 seconds per client (a timing sketch follows this list).
  • The clients were overloaded long before the server. I was unable to spin up more than 2000 clients per JVM before hitting out-of-memory errors, so I ended up juggling 20 to 30 client JVMs spread across 4 machines for these tests. A bit of a pain, which suggests both that the client could stand some optimization and that some test automation would be helpful.
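
In case you're wondering how those round-trip times were measured: nothing fancy, just wall-clock timestamps around the login handshake. A hypothetical sketch of the idea, slotted into the IdleSwarm client from earlier (same API caveats apply):

// Capture the initiation time before building the listener so it's in
// scope for the callback:
final long start = System.currentTimeMillis();
// ... build the listener as before, but report the delta on completion:
public void loggedIn() {
    System.out.println(user + " round trip: "
            + (System.currentTimeMillis() - start) + "ms");
}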

May 11, 2009

Ultimate Recap

Over the past month or so I've been battling a nagging, tight hamstring, and I've sat out the last few Burlington pickup games as well as yesterday's BUDA hat league game (in which we apparently squashed the other team, making it four wins out of four). Today was too nice for me to resist, though, and I went out to play pickup at lunch. The tweaked hamstring is still a bit of a problem. I hope it heals up soon, but it seems like playing and healing may be mutually exclusive in this case...

In other news, I'm officially part of the "Spawning Alewives" BUDA summer club league team for this summer. I'm looking forward to it, but I somehow need to get my hammy back to 100% while simultaneously finishing up the spring hat season and continuing Burlington pickup!

Ultimate Statistics (since January 2008):
Total Games Played: 183
Total Hours Played: 214