A friend pointed me to an interesting ACM Queue article about Darkstar, Sun's open source Java-based MMOG server technology. It's a few months old, but hey, I just saw it!
One immediate reaction is to pooh-pooh Java for games, or high-concurrency servers; but the truth is, both have been done in Java before (the Star Wars Galaxies server tech was reportedly a mix of Java and C++, with most of the game logic - I think - being in Java). And anyway, as evidenced in the big debate from late '07 about scripting languages, we've been arguing for quite some time over whether to use pure compiled C++ (or C, in the case of Cryptic) or a lower-performing, easier-to-develop hybrid of C++ and Python/Lua/whatever - with different developers taking different stances.
So "Java is too slow! Nobody uses Java for games!" is a tempting, but not entirely accurate, response.[1]
Another reaction is one of indignation: I think linking MMOGs to stand-alone, single player games (or regular multi-player games like Unreal Tournament) ignores the strong historical connection from MUDs to MMOGs. Meridian 59, Ultima Online, EverQuest, the early games that helped define this genre are all MUD-inspired, featuring people who started as MUD players and wizards doing everything on the server. I don't particularly care for being referred to as "people who grew up on a single (PC) machine, running a single thread," for whom being asked "to master the intricacies of concurrent programming or distributed systems takes them away from their concentration on the game or world experience itself."
How Dr. Waldo describes us is, still, not a very compelling argument against Darkstar. Take a look at how some game developers talk about their players... the games still stand on their own merit, good or bad. One of the biggest problems I see, actually, is that a Java server and a C++ client (i.e., even a moderately demanding 3D client) will require two separate code bases. There will be a lot of duplicated code for game logic, network protocol, and so on, with a natural inclination to make the two look very similar (or possibly be generated from some master source, a significant engineering project on its own). I have no idea how SWG approached this problem, but they at least had the benefit of a hybrid C++/Java system where presumably the shared code stayed in C++.
I'm also very negatively inclined toward the data access layer. Everything is atomic? Collision detection and aggro calculation are just two of the things that come to mind that could lock the data for every player in a vicinity while they're accessed, and make the game stutter. Mix in the argument against geographic decomposition, and things get even worse - much, much faster than your ability to buy additional hardware.
I am pretty skeptical of Darkstar. At the same time, I expect the trend of which it's a part - making MMOG development more accessible to more people - to continue. Through a combination of middleware, new languages, and more web-centric development, MMOG clients and servers will continue to cover more of the spectrum between "EVE Online" and "Facebook."
Bleh. I should go to sleep. :-)
1. Part of me still resists: in a lot of ways I don't see Java as being particularly better than C++, especially as I've been exploring Boost further. In some ways, Java seems like the worst of two worlds: very little of the dynamic flexibility of proper scripting languages like Python or Lua or Ruby, and very little of the performance from C++ (never mind C). There are other perspectives, of course, like the fact that the learning curve on C++ is much steeper than any of the other languages; or that Java supports a view of good software engineering that the others - including C++, with its operator overloading and template specialization and so on - simply can't support the same way.
Monday, January 19. 2009
Darkstar
Posted at 15:18
Defined tags for this entry: c++, you're doing it wrong
Current mood: tired
Monday, December 15. 2008
My LotRO
I am quite impressed with what Turbine did on My LotRO, and I wanted to take a minute here to make a note of it.
They do a few things I consider odd, like using WordPress as (what looks like) the starting point of the whole thing, but overall it's impressive in its polish and thoroughness. I can't say I'm surprised they have so much data about their players - Darius Kazemi got his start in MMOG metrics at Turbine - but that's just one more thing to be jealous of. Kinship event calendars, event logs going back at least to March of '07 (that's the earliest I found in a brief perusal), lots of AJAXy interactive bits all over the page and (by-and-large) good clean URLs[1].
Initially they chose (mistakenly, in my opinion) to show all the characters that were tied into a single account, but it looks like they've done a quick hack to hide that while they go back and reconsider it. Hopefully, they'll take it out of the site design completely. We often have a lot of different virtual identities, and it feels like an invasion of privacy to show the links between them. What if I'm avoiding letting an annoying friend know about the character I usually play as, but I keep around another character so we can play when I'm up to it? Busted. It may have also been a security hole, if the player names being displayed were the actual logins.
The journals are great, and it's clear they got a lot of benefit out of WordPress here; it looks like the event calendars are a WordPress plug-in as well, so that probably saved them quite a bit of effort on the web development side. Using WordPress also meant that the journals are easily exposed in RSS format, something I had to hand-code on Dungeon Runners for the character blogs[2].
Still, I think it would be more awesome if players could add to their journal from in-game - I'm pretty sure this is not the case, but can someone correct me if I'm wrong? "Balrog has killed me three times in a row! Getting frustrated" would be a lot more likely to show up - and less likely to disengage the player - if they didn't have to log out to type it. Also, I'm curious about whether links are allowed in the journals? I haven't seen any links yet, but perhaps no one has had the urge yet. The ability to link to other players' journals, and comment on specific journal entries, might be enough to render official forums obsolete[3].
And my final question... where's the raw data? It looks like character profiles, guild rosters, and so on wouldn't be too hard to screenscrape, so someone is probably already starting on it. Is Turbine working on providing this data?
1. Should something go terribly wrong with WordPress (and it's happened before), "replace it" is a reasonably valid option. Implementation independence is a big win!
2. As far as I know, this still isn't actually live in Dungeon Runners. I hope it is one day, but it's out of my hands now.
3. Maaaaaybe. I haven't really given it much thought, but it seems plausible and is worth discussion, I think.
Tuesday, December 2. 2008
Metrics: Planning
I haven't said much about metrics here for about eight months, but that doesn't mean I haven't had more to say. I've just been sorting it out. :)
As I mentioned last time, there are different kinds of metrics to meet different needs. Operational metrics can often be gathered into 5-minute averages as they come out, those can be aggregated into hour-long averages a few hours after they're generated, and a week or two after that they can be rolled into 24-hour data points - the amount of data you need to actually store is tiny. It's trivial to store this data in MySQL, never mind every other relational database on the planet. Most of the work is even already done; just feed it to things like Cacti or MRTG and you're done.
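The roll-up described above is simple enough to sketch in a few lines. This is only an illustration, not how Cacti or MRTG do it internally; the `roll_up` helper and the "players online" sample data are made up for the example, and timestamps are assumed to be plain seconds.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, average) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw data: one "players online" reading per minute for two hours.
raw = [(minute * 60, 100 + minute) for minute in range(120)]

five_minute = roll_up(raw, 5 * 60)        # 120 readings -> 24 points
hourly = roll_up(five_minute, 60 * 60)    # 24 points -> 2 points
daily = roll_up(hourly, 24 * 60 * 60)     # 2 points -> 1 point

print(len(raw), len(five_minute), len(hourly), len(daily))
```

One caveat: averaging averages is only exact when every bucket holds the same number of samples, as it does here. If samples can go missing, carrying a count alongside each average keeps the later roll-ups honest.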
The data that is ultimately visible to players on the web is also a relatively small amount of data: keeping the last 1,000 character events around for a blog system, a full character sheet, PvP rankings, etc. could all be stored in a handful of GB. Any relational database you care to name can probably handle it, on a cheap server from whatever vendor you like, on any storage solution you like - the whole thing can even be backed up onto a $20 flash drive if you want backups. Again, this is not a big deal.
...And then there are the meaty metrics, the things that let you generate pretty graphs of "death by level and zone" for the past month of game activity. Turns out, that's quite a bit of data. Darius Kazemi identified storage as one of the big hurdles in dealing with gameplay metrics, in his talk with Larry Mellon at Austin GDC: Wake Up and Smell the Metrics! From the informal survey I've performed, a terabyte of data per month is easy to generate with decent coverage and a decent number of players; the field of suitable databases suddenly narrows.
To handle a terabyte of data (assuming you can condense most of the data down to reports once it's a month old), you have to invest a bit more in your database server: a RAID of high end disks is the tip of the iceberg, 8GB RAM is a good starting point, and 4 cores is a minimum (CPU is almost definitely not your first bottleneck). You're probably shelling out big money for your database license now, and adding capacity is far from a linear exercise.
In game development, we're often inclined to point to these kinds of problems and proudly declare that we're working on problems no one else is, so we probably can't use anyone else's expertise. It's quite often true in game AI programming, graphics, and so on, but in this case the opposite is true: we've got options on who to emulate. The approach I outlined above is close to a low-end enterprise data warehousing model; they probably use more Java and have bigger budgets, but I think it's recognizable. MMOG servers can resemble enterprise software so thoroughly at times that even Sun Microsystems has started getting involved!
...But it's not the only model to follow. As it turns out, web developers deal with very similar problems: the access log on web servers gathers usage data at the same kind of granularity as "every time a character swings their axe," and moderately popular sites probably see data quantities on par with most MMOGs. Facebook claims its logging system handles tens of billions[1] of messages per day, which is hundreds of thousands of messages per second.
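The per-day to per-second conversion is easy to check. "Tens of billions" isn't an exact figure, so the 10 billion and 50 billion endpoints below are just illustrative assumptions, not numbers from Facebook.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(messages_per_day):
    """Convert a daily message count into an average per-second rate."""
    return messages_per_day / SECONDS_PER_DAY


# Assumed endpoints for "tens of billions of messages per day".
low = per_second(10_000_000_000)
high = per_second(50_000_000_000)

print(f"{low:,.0f} to {high:,.0f} messages per second on average")
```

These are averages over a full day; peak rates would presumably be higher.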
And you know what? These guys aren't logging all data to a central server immediately, because it doesn't scale. They aren't even trying to parse and load into databases. They aren't buying really expensive machines to choke down all of that data. Instead, they're going distributed, and utilizing something like Google's MapReduce - often in the form of Hadoop.
For comparison to the aforementioned "baseline beefy database server," I took a look at what a small cluster of decent Dell machines could provide. $10,000 - what could be, by itself, the price of a database license - can build a small cluster with 8TB of triply redundant storage[2]. With a system like MapReduce, you can actually utilize all that distributed horsepower to chew through data analysis. What's more, storage and processing capability scales somewhat linearly with cost![3]
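The 8TB figure falls out of the disk count in footnote 2 and the replication factor. A quick sketch of that arithmetic, with the terabyte-per-month rate from earlier in the post plugged in as an assumption (`usable_tb` is just a name for the example):

```python
def usable_tb(disks, tb_per_disk, replication):
    """Usable capacity when every block is stored `replication` times."""
    return disks * tb_per_disk / replication


capacity = usable_tb(disks=24, tb_per_disk=1, replication=3)

# Assumed ingest rate: roughly one terabyte of gameplay logs per month.
tb_per_month = 1.0
months_of_history = capacity / tb_per_month

print(f"{capacity:.0f} TB usable, about {months_of_history:.0f} months of raw logs")
```

This ignores filesystem overhead and any scratch space the analysis jobs need, so the real number would be somewhat lower.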
I haven't been able to put this talk to any serious test just yet, but I have some ideas. Time-to-level data, deaths per level, and even the relative popularity of different zones should all be straightforward to generate. It's a very different paradigm from relational databases, but then SQL presents its own unique challenges too. It's not a full solution yet by any means, but on the other hand the ability to store that much data, and process it however you like... it has a lot of potential.
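To give a feel for the paradigm, here is a rough single-process sketch of a "deaths by level and zone" job in the map/reduce style. Nothing here is Hadoop's API: the tab-separated log format, the field order, and the `map_deaths`, `reduce_counts`, and `run_job` names are all assumptions made up for the example, and `run_job` only stands in for the grouping a real cluster does between the two steps.

```python
from collections import defaultdict


def map_deaths(line):
    """Map step: emit ((level, zone), 1) for each death event in a log line.

    Assumed line format: timestamp, event, character, level, zone (tab-separated).
    """
    timestamp, event, character, level, zone = line.rstrip("\n").split("\t")
    if event == "death":
        yield (int(level), zone), 1


def reduce_counts(key, values):
    """Reduce step: sum all the counts collected for one key."""
    return key, sum(values)


def run_job(lines, mapper, reducer):
    """Group mapper output by key, then reduce each group (local stand-in)."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


# Hypothetical log lines.
log = [
    "1228200000\tdeath\talice\t12\tbarrens",
    "1228200007\tlevelup\tbob\t13\tbarrens",
    "1228200019\tdeath\tbob\t13\tbarrens",
    "1228200031\tdeath\tcarol\t12\tbarrens",
]

deaths = run_job(log, map_deaths, reduce_counts)
for (level, zone), count in deaths.items():
    print(level, zone, count)
```

The appeal is that the mapper only ever looks at one line and the reducer only ever looks at one key, so the same two functions can in principle be spread across as many machines as you have log files.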
1. High Scalability has the article where I first heard the number, although their math is off by an order of magnitude. Some of my arguments here are echoes of the conclusions over there.
2. You can easily hit 16GB RAM total, 32 cores, and 24 1TB disks spread out across 4 machines. Each datum being stored exists on three of the four machines, so whatever kind of hardware failure you have - up to and including one of the machines completely failing - data isn't lost. You also have higher total I/O bandwidth.
3. Some things, like network bandwidth or data center footprint or heat output, cause plateaus where the next step up requires only a modicum of computer hardware cost, but significant other expenses. So it's not completely linear, but compare it to scaling up a database server :-)
EDIT: added a footnote, and re-worded a few things - nothing big. :-)
Sunday, November 23. 2008
in the office
As of Friday, I'm not working from home anymore! I actually get the best of both worlds: my commute is very short, but it's not "go into the other room" - I'm really excited to finally get to work shoulder to shoulder with these guys. The veil of secrecy still hasn't lifted, unfortunately...
Monday, November 17. 2008
Can they just pick a name?
I just noticed the LOGIN '09 Areas of Interest section - I looked before and either missed them, or they weren't up yet. Ever since the beginning of last year I've had a collection of slowly-evolving session ideas, and it looks like a lot of them fall into the areas of interest this time around. I must be ahead of the curve. :-P
I read from Joe Ludwig that they were accepting session proposals starting Friday, but apparently that's changed since the main site now says November 24.