Via GIGAOM
 
 -----
 
MongoDB might be a popular choice among NoSQL databases, but it’s not perfect, at least out of the box. At last week’s MongoSV conference in Santa Clara, Calif., a number of users, including engineers from Disney, Foursquare and Wordnik, shared their experiences with the product. The common theme: NoSQL is necessary for a lot of use cases, but it’s not for companies afraid of hard work.
 
If you’re in the cloud, avoid the disk
 
According to Wordnik technical co-founder and vice president of engineering Tony Tam, unless you’re willing to spend beaucoup dollars on buying and operating physical infrastructure, cloud computing is probably necessary to match the scalability of NoSQL databases.
 
As he explained, Wordnik launched on Amazon Web Services using MySQL, but the database hit a wall at around a billion records. Wordnik then switched to MongoDB, which solved the scaling problem but caused disk I/O problems of its own, resulting in a major performance slowdown. So Wordnik ported everything back onto some big physical servers, which drastically improved performance.
 
Then the scalability problem returned, only this time in terms of infrastructure, so it was back to the cloud. This time, though, Wordnik got smart: it tuned the application to account for the strengths and weaknesses of MongoDB (“Your app should be smarter than your database,” he said), and tuned MongoDB to account for the strengths and weaknesses of the cloud.
 
Among his observations: in the cloud, virtual disks have virtual performance, “meaning it’s not really there.” Luckily, he said, you can design your system to take advantage of virtual RAM instead. RAM will fill up fast if you let it, though, and trouble is brewing once requests start hitting the disk. “If you hit indexes on disk,” he warned, “mute your pager.”
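
Tam’s warning amounts to a sizing rule: the indexes plus the frequently read documents (the working set) should fit in memory. A rough check might look like the sketch below. The function name, the 80% headroom figure and the example instance size are illustrative assumptions, not Wordnik’s numbers; in practice the inputs could come from figures such as `indexSize` in MongoDB’s `db.stats()` output plus your own estimate of how much data is hot.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def working_set_fits(index_bytes: int, hot_data_bytes: int,
                     ram_bytes: int, headroom: float = 0.8) -> bool:
    """Return True if indexes plus hot data fit in the usable share of RAM.

    `headroom` is an assumed safety margin: the fraction of RAM we expect
    to be available for cached data, leaving the rest for the OS,
    connections and other processes.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_bytes = ram_bytes * headroom
    return index_bytes + hot_data_bytes <= usable_bytes


if __name__ == "__main__":
    ram = 68 * GIB  # hypothetical large cloud instance
    # 20 GiB of indexes + 30 GiB of hot data vs. 54.4 GiB usable: fits
    print(working_set_fits(20 * GIB, 30 * GIB, ram))  # True
    # 30 GiB + 30 GiB = 60 GiB exceeds 54.4 GiB: some reads likely hit disk
    print(working_set_fits(30 * GIB, 30 * GIB, ram))  # False
```

A check like this only flags the risk; whether a given workload actually pages depends on its access pattern.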
 
Foursquare’s Cooper Bethea echoed much of Tam’s sentiment, noting that “for us, paging the disk is really bad.” Because Foursquare works its servers so hard, he said, high latency and rising error counts appear as soon as the disk is touched. Foursquare does use disk in the form of Amazon Elastic Block Store, but only for backup.
 
EBS also brings its own issues. At least once a day, Bethea said, queued reads and writes to EBS back up excessively, and the only solution is to “kill it with fire.” What that entails varies with the problem, but it generally means stopping the MongoDB process and rebuilding the affected replica set from scratch.
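
Deciding when a backlog is bad enough to “kill it with fire” is easier with a rule that ignores brief spikes. The sketch below is a hypothetical helper, not anything Foursquare described: it flags a volume only after several consecutive queue-depth samples exceed a threshold. The samples might come from a source such as CloudWatch’s `VolumeQueueLength` metric for EBS; the threshold and window values are placeholders you would tune.

```python
from typing import Iterable


def sustained_backlog(queue_depths: Iterable[float], threshold: float,
                      window: int) -> bool:
    """Return True once `window` consecutive samples exceed `threshold`.

    A normal sample resets the count, so isolated spikes never trip the
    check. Threshold and window are tuning assumptions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    run = 0
    for depth in queue_depths:
        # Extend the run on a high sample, reset it on a normal one.
        run = run + 1 if depth > threshold else 0
        if run >= window:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical one-minute samples of a volume's queue depth.
    spiky = [2, 40, 1, 45, 3]           # isolated spikes only
    backed_up = [2, 3, 35, 38, 42, 50]  # three high samples in a row
    print(sustained_backlog(spiky, threshold=32, window=3))      # False
    print(sustained_backlog(backed_up, threshold=32, window=3))  # True
```

What happens after the check fires (per Bethea, typically stopping the MongoDB process and rebuilding the replica set) is deliberately left out of the sketch.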
 
Monitor everything
 
Curt Stevens of the Disney Interactive Media Group explained how his team monitors the large MongoDB deployment that underpins Disney’s online games. MongoDB has its own monitoring tool, the MongoDB Monitoring Service (MMS), which Stevens said he swears by, but it isn’t always enough. MMS shows traffic and performance patterns over time, which is helpful, but it is only the starting point.
 
Once a problem is discovered, “it’s like CSI on your data” to figure out what the underlying problem is. Sometimes an instance just needs to be sharded, he explained. Other times, the code is buggy. One time, Stevens added, they found that a poorly performing app didn’t have database issues at all; it was actually split across two data centers that were experiencing WAN issues.
 
Oh, and simply monitoring everything isn’t enough when you’re talking about a large-scale system, Stevens said. You need alerts in place to tell you when something’s wrong, and you have to monitor the monitors. If MMS or any other monitoring tool goes down, you might think everything is fine while the kids trying to have a magical Disney experience online are paying the price.
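
Monitoring the monitors can be as simple as a dead man’s switch: each monitor reports a heartbeat, and a separate checker treats silence as failure. This is a minimal sketch under assumed names (the monitor names, timestamps and the 120-second limit are hypothetical), not how Disney or MMS actually implement it.

```python
import time
from typing import Dict, Iterable, List, Optional


def stale_monitors(last_heartbeat: Dict[str, float],
                   expected: Iterable[str],
                   max_age_s: float,
                   now: Optional[float] = None) -> List[str]:
    """Return the expected monitors that are silent or overdue, sorted by name.

    `last_heartbeat` maps a monitor name to the Unix time of its last report.
    """
    if now is None:
        now = time.time()
    stale = []
    for name in expected:
        ts = last_heartbeat.get(name)
        # A monitor that has never reported counts as down, not as healthy.
        if ts is None or now - ts > max_age_s:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical monitor names and heartbeat timestamps.
    heartbeats = {"mms-agent": 1000.0, "alerting": 700.0, "disk-check": 950.0}
    expected = ["mms-agent", "alerting", "disk-check", "backup-check"]
    print(stale_monitors(heartbeats, expected, max_age_s=120, now=1000.0))
    # ['alerting', 'backup-check']
```

The checker itself should run somewhere independent of the monitors it watches; otherwise a shared outage silences both at once.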
 
By the numbers
 
If you’re wondering what kind of performance and scalability requirements drove these companies to MongoDB, and then to customize it so heavily, here are some statistics:
 
 
- Foursquare: 15 million users; 8 production MongoDB clusters; 8 shards of user data; 12 shards of check-in data; ~250 updates per second on the user database, with maximum output of 46 MBps; ~80 check-ins per second on the check-in database, with maximum output of 45 MBps; up to 2,500 HTTP queries per second.
- Wordnik: Tens of billions of documents with more always being added; more than 20 million REST API calls per day; mapping layer supports 35,000 records per second.
- Disney: More than 1,400 MongoDB instances (although “your eyes start watering after 30,” Stevens said); adding new instances every day, via a custom-built self-service portal, to test, stage and host new games.
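
To put the per-day and per-second figures above on a common footing, here is the plain arithmetic. These are averages that assume traffic is spread evenly over 24 hours, which real traffic is not, so actual peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day_count: float) -> float:
    """Average rate per second for a daily total, assuming uniform traffic."""
    return per_day_count / SECONDS_PER_DAY


def per_day(per_second_rate: float) -> float:
    """Daily total implied by a steady per-second rate."""
    return per_second_rate * SECONDS_PER_DAY


if __name__ == "__main__":
    # Wordnik: "more than 20 million REST API calls per day"
    print(f"Wordnik API calls: ~{per_second(20_000_000):.0f}/s")  # ~231/s
    # Foursquare: ~80 check-ins/s and ~250 user-database updates/s
    print(f"Foursquare check-ins: ~{per_day(80):,.0f}/day")  # ~6,912,000/day
    print(f"Foursquare user updates: ~{per_day(250):,.0f}/day")  # ~21,600,000/day
```

Since Wordnik reported more than 20 million calls, roughly 231 per second is a lower bound on its average rate.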

For more technical details about their trials and tribulations with MongoDB, all three presentations are available online, along with the rest of the conference’s talks.
 
Personal Comments:
 
Here is some basic background on NoSQL: Wiki, NoSQL Databases, MongoDB