Via GIGAOM
-----
MongoDB might be a popular choice in NoSQL databases, but it’s not perfect, at least out of the box. At last week’s MongoSV conference
in Santa Clara, Calif., a number of users, including speakers from Disney,
Foursquare and Wordnik, shared their experiences with the product. The
common theme: NoSQL is necessary for a lot of use cases, but it’s not
for companies afraid of hard work.
If you’re in the cloud, avoid the disk
According to Wordnik
technical co-founder and vice president of engineering Tony Tam, unless
you’re willing to spend beaucoup dollars on buying and operating
physical infrastructure, cloud computing is probably necessary to match
the scalability of NoSQL databases.
Wordnik actually launched on Amazon Web Services with MySQL, Tam
explained, but the database hit a wall at around a billion records.
So, Wordnik switched to MongoDB,
which solved the scaling problem but caused its own disk I/O problems
that resulted in a major performance slowdown. So, Wordnik ported
everything back onto some big physical servers, which drastically
improved performance.
And then came the scalability problem again, only this time it was in
terms of infrastructure. So, it was back to the cloud. But this time,
Wordnik got smart and tuned the application to account for the strengths
and weaknesses of MongoDB (“Your app should be smarter than your
database,” he said), and MongoDB to account for the strengths and
weaknesses of the cloud.
Among his observations was that in the cloud, virtual disks have virtual
performance, “meaning it’s not really there.” Luckily, he said, you can
design to take advantage of virtual RAM. It will fill up fast if you
let it, though, and there’s trouble brewing if requests start hitting
the disk. “If you hit indexes on disk,” he warned, “mute your pager.”
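
Tam's warning about indexes spilling to disk can be turned into a simple capacity check: compare total index size with the RAM available to the database. What follows is only a minimal sketch in Python, not anything shown in the talk. It assumes you already have one `collStats`-style dictionary per collection, and it reads only the `totalIndexSize` field (bytes) that MongoDB's `collStats` command reports. The helper name `indexes_fit_in_ram`, the 80 percent headroom factor and the sample sizes are all illustrative assumptions.

```python
def indexes_fit_in_ram(coll_stats, ram_bytes, headroom=0.8):
    """Return True if the combined index size stays within a fraction of RAM.

    coll_stats: iterable of dicts shaped like MongoDB collStats output;
                only the 'totalIndexSize' field (bytes) is read.
    ram_bytes:  physical memory available to the database process.
    headroom:   hypothetical safety factor; 0.8 leaves 20% for data pages.
    """
    total_index_bytes = sum(s["totalIndexSize"] for s in coll_stats)
    return total_index_bytes <= ram_bytes * headroom


GIB = 1024 ** 3
# Made-up sizes for illustration only.
stats = [{"totalIndexSize": 3 * GIB}, {"totalIndexSize": 2 * GIB}]
print(indexes_fit_in_ram(stats, ram_bytes=8 * GIB))  # 5 GiB of indexes vs. 6.4 GiB budget
print(indexes_fit_in_ram(stats, ram_bytes=6 * GIB))  # 5 GiB of indexes vs. 4.8 GiB budget
```

A check like this only says whether indexes could fit. It says nothing about the working set of documents, which competes for the same memory.
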
Foursquare’s Cooper Bethea echoed much of Tam’s sentiment, noting that “for us, paging the disk is really bad.” Because Foursquare
works its servers so hard, he said, high latency and error counts start
occurring as soon as the disk is invoked. Foursquare does use disk in
the form of Amazon Elastic Block Store (EBS), but it’s only for backup.
EBS also brings along issues of its own. At least once a day, Bethea
said, queued reads and writes to EBS start backing up excessively, and
the only solution is to “kill it with fire.” What that means changes
depending on the problem, but it generally means stopping the MongoDB
process and rebuilding the affected replica set from scratch.
Monitor everything
Curt Stevens of the Disney Interactive Media Group
explained how his team monitors the large MongoDB deployment that
underpins Disney’s online games. MongoDB actually has its own tool,
the MongoDB Monitoring Service (MMS), that Stevens said he swears by, but
it isn’t always enough. It shows traffic and performance patterns over
time, which is helpful but only a starting point.
Once a problem is discovered, “it’s like CSI on your data” to figure
out what the underlying cause is. Sometimes,
an instance just needs to be sharded, he explained. Other times, the
code could be buggy. One time, Stevens added, they found out a
poor-performing app didn’t have database issues at all, but was actually
split across two data centers that were experiencing WAN issues.
Oh, and just monitoring everything isn’t enough when you’re talking
about a large-scale system, Stevens said. You have to have alerts in
place to tell you when something’s wrong, and you have to monitor the
monitors. If MMS or any other monitoring tools go down, you might think
everything is just fine while the kids trying to have a magical Disney
experience online are paying the price.
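
One common way to "monitor the monitors" is a heartbeat check: each monitoring tool records when it last reported in, and a separate watchdog flags any tool that has gone quiet. The Python sketch below illustrates that idea under stated assumptions and is not Disney's actual setup. The function name `stale_monitors`, the monitor names and the timestamps are all hypothetical.

```python
import time


def stale_monitors(last_heartbeat, max_age_seconds, now=None):
    """Return the names of monitors whose last heartbeat is too old.

    last_heartbeat:  dict mapping a monitor name to the Unix timestamp
                     of its most recent check-in.
    max_age_seconds: how long a monitor may stay silent before it is
                     considered down.
    now:             current Unix time; defaults to time.time().
    """
    if now is None:
        now = time.time()
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > max_age_seconds
    )


# Made-up names and timestamps for illustration only.
heartbeats = {"mms-agent": 1000.0, "alerting": 1290.0}
for name in stale_monitors(heartbeats, max_age_seconds=60, now=1300.0):
    print(f"ALERT: monitor '{name}' has gone quiet")
```

The watchdog itself should run somewhere independent of the tools it watches, otherwise a single outage can silence both.
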
By the numbers
If you’re wondering what kind of performance and scalability
requirements pushed these companies to MongoDB, and then to customize it
so heavily, here are some statistics:
- Foursquare: 15 million users; 8
production MongoDB clusters; 8 shards of user data; 12 shards of
check-in data; ~250 updates per second on user database, with maximum
output of 46 MBps; ~80 check-ins per second on check-in database, with
maximum output of 45 MBps; up to 2,500 HTTP queries per second.
- Wordnik: Tens of billions of documents with more
always being added; more than 20 million REST API calls per day; mapping
layer supports 35,000 records per second.
- Disney: More than 1,400
MongoDB instances (although “your eyes start watering after 30,” Stevens
said); adding new instances every day, via a custom-built self-service
portal, to test, stage and host new games.
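
The figures above mix per-day and per-second rates. To put Wordnik's daily API volume on the same scale as Foursquare's per-second numbers, a quick conversion helps. This is a rough sketch that assumes traffic is spread evenly across the day, which real traffic is not, so it gives an average rather than a peak. The helper name `per_second` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(calls_per_day):
    """Convert a daily call count into an average per-second rate."""
    return calls_per_day / SECONDS_PER_DAY


# "More than 20 million REST API calls per day", so this is a lower bound.
wordnik_calls_per_day = 20_000_000
print(round(per_second(wordnik_calls_per_day)))  # about 231 calls per second on average
```
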
For more technical details about their trials and tribulations with MongoDB, all three presentations are available online, along with the rest of the conference’s talks.
Personal Comments:
Here are some basics and information on NoSQL: Wiki, NoSQL Databases, MongoDB