Here are the slides from my talk about MongoDB at LinuxAlt. I’ll update the post as soon as the video is available.
Myngo is a web administration interface for MongoDB. It is written in
Python, runs on Tornado and uses jQuery on the front-end. It is a
fresh, new project, so there’s no package yet. If you want to try it
out, follow the instructions.
and some server info. You can also do some actions with the DBs and
collections. Check the screenshots for more details.
There are a lot of features planned for Myngo, the most significant being:
* querying (or an interactive console)
* user auth and permission system
* slick UI
* some kind of test suite
requests to the project’s bug tracker or directly to me. I’ll also
gladly appreciate any kind of help (especially with design and layout)
so feel free to fork and hack away. If you find Myngo valuable, please consider donating so I can spend
more time improving it.
Tornado, while otherwise great, lacks session support, which makes building even slightly complex websites a hurdle. I created a fork which adds session support to Tornado. So far it enables 6 different storage engines for sessions. I was curious how fast they are and how they would affect Tornado’s overall performance (requests per second served).
I carried out a simple benchmark. All configuration options were left at their defaults. I used two servers, both from Rackspace and both located in the same datacenter. The first was used to run Apache Bench to simulate server load. It was run with 300 concurrent requests, up to 10 000 requests total.
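For reference, here is a minimal sketch of how the load described above could be expressed as an Apache Bench invocation. The `-n` (total requests) and `-c` (concurrency) flags are standard ab options; the target URL and the `data` GET parameter name are placeholders I made up for illustration, not the ones used in the benchmark.

```python
import shlex


def ab_command(url, total=10000, concurrency=300):
    """Build an Apache Bench command line.

    -n is the total number of requests, -c the number of concurrent requests.
    """
    return ["ab", "-n", str(total), "-c", str(concurrency), url]


# Hypothetical target: the real host and query string are not given in the post.
TARGET = "http://tornado-host.example/?data=hello"

cmd = ab_command(TARGET)
print(shlex.join(cmd))

# To actually run it (requires ab to be installed on the load-generating box):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing `total=1000, concurrency=10` gives the reduced load used later for the file based sessions.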
The second server was used to run the sample Tornado app. Nginx served as a load balancer to four different Tornado processes (one for each core). This is the recommended way of deploying a Tornado app. The server hardware was a 2.2 GHz quad core AMD Opteron with 2GB of RAM.
I always ran ab to prepopulate the storage with data before doing the measured reading. This simulated older, stored sessions. The Tornado process was a simple request handler, which stored data from a GET parameter to the session.
No session performance
To have a baseline of Tornado’s performance on the machine, I checked out the source code and installed it. The handler script was slightly altered, because, obviously, no sessions were available. Tornado scored 1626 req/s.
File based sessions
I didn’t expect much of file based sessions. It’s a naive implementation where all sessions are stored in a single file. It’s not suitable for production, but it’s ok when developing and testing. Due to poor performance I had to change the ab parameters to 10 concurrent requests and 1000 total to get at least some results. The first run ended with approx. 160 req/s, but the next batch dropped to 32 req/s and the third didn’t even finish. As I said, this way of storing sessions is good for testing purposes, at most.
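To illustrate why a single-file store degrades as it fills up, here is a minimal sketch of the idea. It is not the code from the fork: the class name, the JSON format and the method names are my own assumptions. The point is only that every write has to read and rewrite all stored sessions.

```python
import json
import os
import tempfile


class SingleFileSessions:
    """Hypothetical store that keeps every session in one JSON file."""

    def __init__(self, path):
        self.path = path

    def _load_all(self):
        # Read the whole file, no matter which session is wanted.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def save(self, session_id, data):
        sessions = self._load_all()        # read every session...
        sessions[session_id] = data
        with open(self.path, "w") as f:    # ...and write every session back
            json.dump(sessions, f)

    def load(self, session_id):
        return self._load_all().get(session_id)


store = SingleFileSessions(os.path.join(tempfile.mkdtemp(), "sessions.json"))
store.save("abc", {"data": "hello"})
store.save("def", {"data": "world"})
print(store.load("abc"))
```

In a sketch like this the cost of each request grows with the number of stored sessions, and several processes writing the same file would also need locking, which is one plausible reason the later runs were so much slower than the first.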
Directory based sessions
If you want your sessions to be stored in files, use directory based sessions. This solution offers the traditional approach of having one file per session. Of course, with a lot of users there will be a lot of sessions (and a lot of files), but then again, modern filesystems can handle a large number of small files easily. Plus, with hundreds of thousands of users you probably wouldn’t be using this solution.
Directory based sessions performed reasonably well. The best run was 869 req/s, or 53.4% of the original performance. However, over time, as the number of session files in the directory rose, it fell to 608 req/s. The filesystem was ext3. I suspect reiserfs or JFS would score better.
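The one-file-per-session approach can be sketched roughly like this. Again, this is an illustration under my own assumptions (class name, JSON files, `.json` suffix), not the implementation from the fork. Each request touches only its own small file, so the cost no longer depends on how many sessions exist, apart from whatever the filesystem charges for a crowded directory.

```python
import json
import os
import tempfile
import uuid


class DirectorySessions:
    """Hypothetical store that keeps one JSON file per session."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        return os.path.join(self.directory, session_id + ".json")

    def save(self, session_id, data):
        # Write to a temporary file first, then rename it into place so a
        # reader never sees a half-written session file.
        tmp = self._path(session_id) + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self._path(session_id))

    def load(self, session_id):
        try:
            with open(self._path(session_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None


store = DirectorySessions(tempfile.mkdtemp())
sid = uuid.uuid4().hex  # random hex id, safe to use as a file name
store.save(sid, {"data": "hello"})
print(store.load(sid))
```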
MySQL based sessions
Tornado ships with a simple MySQL layer on top of the MySQLdb module and because MySQL is a popular choice among many web developers, I implemented support for MySQL session storage. Furthermore, it is nice to see how it compares to its NoSQL cousins.
I used MySQL server v5.1.37 with default configuration. The results were a bit unstable. Apache Bench reported 1171, 1216 and 1353 req/s in three consecutive runs. That’s 83 % when counting the best run. I didn’t investigate the root of the inconsistent performance. Test runs showed something between 1200 and 1300 req/s, with the mysqld process often consuming the whole capacity of one core.
Memcached based sessions
Being a non-persistent key-value store, Memcached has an obvious advantage over MySQL. At least on paper. I used Memcached 1.4.4 built from source. The best result was 1473 req/s (90.6 %), but the other two measured runs clocked at 1106 and 1202 req/s. Again, I don’t know why the big difference occurred. The code uses pylibmc which is the fastest Python lib for interacting with memcached.
Redis based sessions
If you want persistence with session data, you can use Redis instead of Memcached. Redis is a simple, fast key-value store with advanced features.
I used v1.2.1 built from source. Redis scored very well with three consistent runs at around 1410 req/s; the best one showed 1418 req/s (87.2 %).
MongoDB based sessions
MongoDB is the last supported storage engine and it was a shock. I don’t know how the 10gen gal and guys do it, but MongoDB is FAST. All measured runs returned over 1500 req/s (1520, 1577 and 1582 req/s) which is a) supersonic and b) stable. That is 97.4 % of the original, no-sessions Tornado performance.
To be honest, MongoDB scored 960 req/s in one of the test runs. That was because of the way it works: it allocates hard disk space by creating zero-filled files, starting from 2 MB, continuously up to 2 GB per file if the database needs more space. This one time the allocation happened during the test run (it was recorded in the mongod output log so it wasn’t hard to find out the reason), hence the bad performance. However, the space allocation is infrequent and in the real world it would rarely be a problem.
Graph and data
I put the benchmark data along with the Tornado handler up on GitHub. The graph shows worst and best runs for easy comparison.
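If you want to redo the comparison yourself, the percentages are simply each engine’s best run divided by the no-session baseline of 1626 req/s. A small sketch using only the best-run figures quoted in this post (the single-file store is left out because it was measured with different ab parameters; the dictionary keys are just labels I picked):

```python
BASELINE = 1626  # req/s of plain Tornado without sessions

# Best measured run per storage engine, in req/s, as quoted in the post.
BEST_RUNS = {
    "directory": 869,
    "mysql": 1353,
    "memcached": 1473,
    "redis": 1418,
    "mongodb": 1582,
}


def percent_of_baseline(req_per_sec, baseline=BASELINE):
    """Return throughput as a percentage of the no-session baseline."""
    return round(100.0 * req_per_sec / baseline, 1)


# Print the engines from fastest to slowest.
for engine, rps in sorted(BEST_RUNS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{engine:<10} {rps:>5} req/s  {percent_of_baseline(rps):>5.1f} %")
```

Because the req/s values in the text are rounded, the result can differ from the quoted percentages in the last digit; MongoDB, for instance, comes out a hair below the 97.4 % mentioned above.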
Assuming you want to store sessions along with your app’s other data, I would recommend using Redis or MongoDB; it depends on your use case. Redis is fast, easy to set up and work with, and offers persistence, so it wins over Memcached. If you’re building something more complex, MongoDB is the way to go. It’s fast, fun and addictive with great support from the authors. For developers seeking the traditional, SQL approach, only MySQL is available at the moment. I may add PostgreSQL and SQLite support in the future. Get in touch if you need it or watch the repo to be aware of the latest changes.
Twitter accounts of folks related to MongoDB:
* @mongohq
* @mongodb (official account)
* @nosql_topsy
Follow me on Twitter.