Spreading Knol: May 2015

Enhanced Query Language and Tools

Key MongoDB tools mongoimport, mongoexport, mongodump, mongorestore, mongostat, mongotop and mongooplog have been re-written as multi-threaded processes in Go, allowing faster operations and smaller binaries.

mongodump and mongorestore now execute parallelized backup and recovery for small MongoDB instances. Dumps created by earlier releases can be restored to instances running MongoDB 3.0.

mongoimport can parallelize loads across multiple collections with multi-threaded bulk inserts allowing for significantly faster imports of CSV, TSV and JSON data exported from other databases or applications. Ensuring data quality, mongoimport now also supports input validation of field names during the import process.

Improved DBA Productivity: Enhanced Query Engine Introspection The MongoDB "explain() method is an invaluable tool for DBAs in optimizing performance. Using explain() output, DBAs can review query plans, ensuring common queries are serviced by well-defined indexes, as well as eliminating any unnecessary indexes that can increase write latency and add overhead during query planning and optimization.

In the latest MongoDB 3.0 release explain() has been significantly enhanced:

The query plan can now be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion.
DBAs can run explain() to generate detailed statistics on all query plans considered by the optimizer. Execution statistics are available for every evaluated plan, down to the granularity of execution stage. Now, for example, it is possible for the DBA to distinguish the amount of time a query plan spent sorting the result set from the amount of time spent reading index keys.
The explain() method exposes query introspection to a wider range of operations, including find, count, update, remove, group, and aggregate, enabling DBAs to optimize for a wider range of query types.

Richer Geospatial Apps: Big Polygon Support for Multi-Hemisphere Queries

MongoDB’s geospatial indexes and queries are widely used by developers building modern location-aware applications across industries as diverse as high technology, retail, telecommunications and government. MongoDB 3.0 adds big polygon geospatial support with$intersects and $within operators, allowing execution of queries over geographic areas extending across multiple hemispheres and areas that exceed 50% of the earth’s surface. As an example, an airline can now run queries to identify all its aircraft that have traveled across multiple hemispheres in the past 24 hours.

Enhanced Data Type Support: Easier Time-Series Analytics & Reporting
The MongoDB 3.0 aggregation pipeline offers a new $dateToString operator that simplifies report generation and grouping data by time interval. The operator formats the ISO Date type as a string with a user-supplied format, allowing developers to construct rich queries with less code.

Faster Issue Resolution: Enhanced Logging
Log analysis is a critical part of identifying issues and determining root cause. Now in MongoDB 3.0 developers, QA and operations staff have much greater control over the granularity of log messages and specific functional areas of the server to more precisely investigate issues.

Deploying Geo-Distributed, Datacenter-Aware Applications
Delivering a low latency experience to customers wherever they are located is a key design consideration for distributed systems. Using MongoDB’s native replica sets, copies (replicas) of the database can be deployed to sites physically closer to users, thereby reducing the effects of network latency. Reads can be issued with the nearestread preference, ensuring the query is served from the replica closest to the user, based on ping distance.

Pluggable Storage Engines: Extending MongoDB to New Applications

Multiple storage engines can co-exist within a single MongoDB replica set, making it easy to evaluate and migrate engines. Running multiple storage engines within a replica set can also simplify the process of managing the data lifecycle. For example as different storage engines for MongoDB are developed, it would be possible to create a mixed replica set configured in such a way that:

Operational data requiring low latency and high throughput performance is managed by replica set members using the WiredTiger or in-memory storage engine (currently experimental).

Replica set members configured with an HDFS storage engine expose the operational data to analytical processes running in a Hadoop cluster, which is executing interactive or batch operations rather than real time queries.

MongoDB replication automatically migrates data between primary and secondary replica set members, independent of their underlying storage format. This eliminates complex ETL tools that have traditionally been used to manage data movement.

MongoDB 3.0 ships with two supported storage engines:

The default MMAPv1 engine, an improved version of the engine used in prior MongoDB releases, now enhanced with collection level concurrency control.

The new WiredTiger storage engine. For many applications, WiredTiger's more granular concurrency control and native compression will provide significant benefits in the areas of lower storage costs, greater hardware utilization, higher throughput, and more predictable performance.

Higher Performance & Efficiency

Between 7x and 10x Greater Write Performance
MongoDB 3.0 provides more granular document-level concurrency control, delivering between 7x and 10x greater throughput for most write-intensive applications, while maintaining predictable low latency.

Compression: Up to 80% Reduction in Storage Costs
Despite data storage costs declining 30% to 40% per annum, overall storage expenses continue to escalate as data volumes double every 12 to 18 months. To make matters worse, improvements to storage bandwidth and latency are not keeping pace with data growth, making disk I/O a common bottleneck to scaling overall database performance.

Administrators have the flexibility to configure specific compression algorithms for collections, indexes and the journal, choosing between:

Snappy (the default library for documents and the journal), providing a good balance between high compression ratio – typically around 70%, depending on document data types – and low CPU overhead.

zlib, providing higher document and journal compression ratios for storage-intensive applications, at the expense of extra CPU overhead.

Prefix compression for indexes reducing the in-memory footprint of index storage by around 50% (workload dependent), freeing up more of the working set for frequently accessed documents.

Administrators can modify the default compression settings for all collections and indexes. Compression is also configurable on a per-collection and per-index basis during collection and index creation.

By introducing compression, operations teams get higher performance per node and reduced storage costs.

Creating Multi-Temperature Storage
Combining compression with MongoDB’s location-aware sharding, administrators can build highly efficient tiered storage models to support the data lifecycle. Administrators can balance query latency with storage density and cost by assigning data sets to specific storage devices. For example, consider an application where recent data needs to be accessed quickly, while for older data, latency is less of a priority than storage costs:

Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression enabled.

Older, less frequently accessed data is tagged to higher capacity, lower-throughput hard disk drives where it is compressed with zlib to attain maximum storage density and lower cost-per-bit.

MongoDB will automatically migrate data between storage tiers based on user-defined policies without administrators having to build tools or ETL processes to manage data movement.

Source : Mongo DB

Spreading Knol

Search This Blog

Thursday, May 21, 2015

New Features in Mongo DB 3.0

Pluggable Storage Engines: Extending MongoDB to New Applications

Higher Performance & Efficiency

Search This Blog

Thursday, May 21, 2015

New Features in Mongo DB 3.0

Pluggable Storage Engines: Extending MongoDB to New Applications

Higher Performance & Efficiency

Subscribe To Sundar's Blog