Computed·Blg

Entries tagged as database

Wednesday, February 13. 2013

The Database Deluge… Who’s Who

-----

These are the top NoSQL Solutions in the market today that are open source, readily available, with a strong and active community, and actively making forward progress in development and innovations in the technology. I’ve provided them here, in no order, with basic descriptions, links to their main website presence, and with short lists of some of their top users of each database. Toward the end I’ve provided a short summary of the database and the respective history of the movement around No SQL and the direction it’s heading today.

Cassandra

http://cassandra.apache.org/

Cassandra is a distributed databases that offers high availability and scalability. Cassandra supports a host of features around replicating data across multiple datacenters, high availability, horizontal scaling for massive linear scaling, fault tolerance and a focus, like many NoSQL solutions around commodity hardware.

Cassandra is a hybrid key-value & row based database, setup on top of a configuration focused architecture. Cassandra is fairly easy to setup on a single machine or a cluster, but is intended for use on a cluster of machines. To insure the availability of features around fault tolerance, scaling, et al you will need to setup a minimal cluster, I’d suggest at least 5 nodes (5 nodes being my personal minimum clustered database setup, this always seems to be a solid and safe minimum).

Cassandra also has a query language called CQL or Cassandra Query Langauge. Cassandra also support Apache Projects Hive, Pig with Hadoop integration for map reduce.

Who uses Cassandra?

IBM
HP
Netflix
…many others…

HBase

http://hbase.apache.org/

In the book, Seven Databases in Seven Weeks, the Apache HBase Project is described as a nail gun. You would not use HBase to catalog your sales list just like you wouldn’t use a nail gun to build a dollhouse. This is an apt description of HBase.

HBase is a column-oriented database. It’s very good at scaling out. The origins of HBase are rooted in BigTable by Google. The proprietary database is described in in the 2006 white paper, “Bigtable: A Distributed Storage System for Structured Data.”

HBase stores data in buckets called tables, the tables contain cells that are at the intersection of rows and columns. Because of this HBase has a lot of similar characteristics to a relational database. However the similarities are only in name.

HBase also has several features that aren’t available in other databases, such as; versioning, compression, garbage collection and in memory tables. One other feature that is usually only available in relational databases is strong consistency guarantees.

The place where HBase really shines however is in queries against enormous datasets.

HBase is designed architecturally to be fault tolerate. It does this through write-ahead logging and distributed configuration. At the core of the architecture HBase is built on Hadoop. Hadoop is a sturdy, scalable computing platform that provides a distribute file system and mapreduce capabilities.

Who is using it?

Facebook uses HBase for its messaging infrastructure.
Stumpleupon uses it for real-time data storage and analytics.
Twitter uses HBase for data generation around people search & storing logging & monitoring data.
Meetup uses it for site data.
There are many others including Yahoo!, eBay, etc.

Mongo

http://www.mongodb.org/

MongoDB is built and maintained by a company called 10gen. MongoDB was released in 2009 and has been rising in popularity quickly and steadily since then. The name, contrary to the word mongo, comes from the word humongous. The key goals behind MongoDB are performance and easy data access.

The architecture of MongoDB is around document database principles. The data can be queried in an ad-hoc way, with the data persisted in a nested way. This database also, like most NoSQL databases enforces no schema, however can have specific document fields that can be queried off of.

Who is using it?

Foursquare
bit.ly
CERN for collecting data from the large Hadron Collider
…others…

Redis

http://redis.io/

Redis stands for Remote Dictionary Service. The most common capability Redis is known for, is blindingly fast speed. This speed comes from trading durability. At a base level Redis is a key-value store, however sometimes classifying it isn’t straight forward.

Redis is a key-value store, and often referred to as a data structure server with keys that can be string, hashes, lists, sets and sorted sets. Redis is also, stepping away from only being a key-value store, into the realm of being a publish-subscribe and queue stack. This makes Redis one very flexible tool in the tool chest.

Who is using it?

Blizzard (You know, that World of Warcraft game maker) ;)
Craigslist
flickr
…others…

Couch

http://couchdb.apache.org/

Another Apache Project, CouchDB is the idealized JSON and REST document database. It works as a document database full of key-value pairs with the values a set number of types including nested with other key-value objects.

The primary mode of querying CouchDB is to use incremental mapreduce to produce indexed views.

One other interesting characteristic about CouchDB is that it’s built with the idea of a multitude of deployment scenarios. CouchDB might be deployed to some big servers or may be a mere service running on your Android Phone or Mac OS-X Desktop.

Like many NoSQL options CouchDB is RESTful in operation and uses JSON to send data to and from clients.

The Node.js Community also has an affinity for Couch since NPM and a lot of the capabilities of Couch seem like they’re just native to JavaScript. From the server aspect of the database to the JSON format usage to other capabilities.

Who uses it?

NPM – Node Package Manager site and NPM uses CouchDB for storing and providing the packages for Node.js.

Couchbase (UPDATED January 18th)

Ok, I realized I’d neglected to add Couchbase (thus the Jan 18th update), which is an open source and interesting solution built off of Membase and Couch. Membase isn’t particularly a distributed database, or database, but between it and couch joining to form Couchbase they’ve turned it into a distributed database like couch except with some specific feature set differences.

A lot of the core architecture features of Couch are available, but the combination now adds auto-sharding clusters, live/hot swappable upgrades and changes, memchaced APIs, and built in data caching.

Who uses it?

Linkedin
Orbitz
Concur
…and others…

Neo4j

http://www.neo4j.org/

Neo4j steps away from many of the existing NoSQL databases with its use of a graph database model. It stored data as a graph, mathematically speaking, that relates to the other data in the database. This database, of all the databases among the NoSQL and SQL world, is very whiteboard friendly.

Neo4j also has a varied deployment model, being able to deploy to a small or large device or system. It has the ability to store dozens of billions of edges and nodes.

Who is using it?

Accenture
Adobe
Lufthansa
Mozilla
…others…

Riak

Riak is a key-value, distributed, fault tolerant, resilient database written in Erlang. It uses the Riak Core project as a codebase for the distributed core of the system. I further explained Riak, since yes, I work for Basho who are the makers of Riak, in a separate blog entry “Riak is… A Big List of Things“. So for a description of the features around Riak check that out.

Who is using Riak?

Best Buy
Github
Voxer
Yammer
Comcast
…many others…

In Summary

One of the things you’ll notice with a lot of these databases and the NoSQL movement in general is that it originated from companies needing to go “web scale” and RDBMSs just couldn’t handle or didn’t meet the specific requirements these companies had for the data. NoSQL is in no way a replacement to relational or SQL databases except in these specific cases where need is outside of the capability or scope of SQL & Relational Databases and RDBMSs.

Almost every NoSQL database has origins that go pretty far back, but the real impetus and push forward with the technology came about with key efforts at Google and Amazon Web Services. At Google it was with BigTable Paper and at Amazon Web Services it was with the Dynamo Paper. As time moved forward with the open source community taking over as the main innovator and development model around big data and the NoSQL database movement. Today the Apache Project has many of the projects under its guidance along with other companies like Basho and 10gen.

In the last few years, many of the larger mainstays of the existing database industry have leapt onto the bandwagon. Companies like Microsoft, Dell, HP and Oracle have made many strategic and tactical moves to stay relevant with this move toward big data and nosql databases solutions. However, the leadership is still outside of these stalwarts and in the hands of the open source community. The related companies and organizations that are focused on that community such as 10gen, Basho and the Apache Organization still hold much of the future of this technology in the strategic and tactical actions that they take since they’re born from and significant parts of the community itself.

For an even larger list of almost every known NoSQL Database in existence check out NoSQL Database .org.

Posted by Christian Babski in Software at 10:57

Defined tags for this entry: database, software

4058 hits

Tuesday, October 02. 2012

Questions abound as malicious phpMyAdmin backdoor found on SourceForge site

Via ars technica

-----

Developers of phpMyAdmin warned users they may be running a malicious version of the open-source software package after discovering backdoor code was snuck into a package being distributed over the widely used SourceForge repository.

The backdoor contains code that allows remote attackers to take control of the underlying server running the modified phpMyAdmin, which is a Web-based tool for managing MySQL databases. The PHP script is found in a file named server_sync.php, and it reads PHP code embedded in standard POST Web requests and then executes it. That allows anyone who knows the backdoor is present to execute code of his choice. HD Moore, CSO of Rapid7 and chief architect of the Metasploit exploit package for penetration testers and hackers, told Ars a module has already been added that tests for the vulnerability.

The backdoor is concerning because it was distributed on one of the official mirrors for SourceForge, which hosts more than 324,000 open-source projects, serves more than 46 million consumers, and handles more than four million downloads each day. SourceForge officials are still investigating the breach, so crucial questions remain unanswered. It's still unclear, for instance, if the compromised server hosted other maliciously modified software packages, if other official SourceForge mirror sites were also affected, and if the central repository that feeds these mirror sites might also have been attacked.

"If that one mirror was compromised, nearly every SourceForge package on that mirror could have been backdoored, too," Moore said. "So you're looking at not just phpMyAdmin, but 12,000 other projects. If that one mirror was compromised and other projects were modified this isn't just 1,000 people. This is up to a couple hundred thousand."

An advisory posted Tuesday on phpMyAdmin said: "One of the SourceForge.net mirrors, namely cdnetworks-kr-1, was being used to distribute a modified archive of phpMyAdmin, which includes a backdoor. This backdoor is located in file server_sync.php and allows an attacker to remotely execute PHP code. Another file, js/cross_framing_protection.js, has also been modified." phpMyAdmin officials didn't respond to e-mails seeking to learn how long the backdoored version had been available and how many people have downloaded it.

Update: In a blog post, SourceForge officials said they believe only the affected phpMyAdmin-3.5.2.2-all-languages.zip package was the only modified file on the cdnetworks mirror site, but they are continuing to investigate to make sure. Logs indicate that about 400 people downloaded the malicious package. The provider of the Korea-based mirror has confirmed the breach, which is believe to have happened around September 22, and indicated it was limited to that single mirror site. The machine has been taken out of rotation.

"Downloaders are at risk only if a corrupt copy of this software was obtained, installed on a server, and serving was enabled," the SourceForge post said. "Examination of web logs and other server data should help confirm whether this backdoor was accessed."

It's not the first time a widely used open-source project has been hit by a breach affecting the security of its many downstream users. In June of last year, WordPress required all account holders on WordPress.org to change their passwords following the discovery that hackers contaminated it with malicious software. Three months earlier, maintainers of the PHP programming language spent several days scouring their source code for malicious modifications after discovering the security of one of their servers had been breached.

A three-day security breach in 2010 on ProFTP caused users who downloaded the package during that time to be infected with a malicious backdoor. The main source-code repository for the Free Software Foundation was briefly shuttered that same year following the discovery of an attack that compromised some of the website's account passwords and may have allowed unfettered administrative access. And last August, multiple servers used to maintain and distribute the Linux operating system were infected with malware that gained root system access, although maintainers said the repository was unaffected.

Posted by Christian Babski in Software at 10:58

Defined tags for this entry: database, php, software, web

2999 hits

(Page 1 of 1, totaling 2 entries)

Quicksearch

Popular Entries

Show tagged entries

3d printing
ai
android
cloud
data visualisation
google
hardware
innovation&society
interface
mobile
network
os
privacy
programming
security
software
tablet
technology
web
wifi

Syndicate This Blog

Calendar

Blog Administration

Open login screen

Entries tagged as database

Entries tagged as database

Wednesday, February 13. 2013

The Database Deluge… Who’s Who

Cassandra

Mongo

Redis

Couch

Couchbase (UPDATED January 18th)

Neo4j

Riak

In Summary

Tuesday, October 02. 2012

Questions abound as malicious phpMyAdmin backdoor found on SourceForge site

Quicksearch

Popular Entries

Categories

Show tagged entries

Syndicate This Blog

Calendar

Blog Administration