
3 Best Practices for GUID data in MongoDB

5 May 2015 | CPOL | 6 min read
Take a closer look at scenarios where working with GUID and UUID in a MongoDB environment becomes tricky. We will make you aware of those configurations and provide a set of best practices to follow.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

GUIDs are often used to identify custom objects created in software. Developers frequently store those identifiers in the database explicitly rather than relying on identifiers generated by the database system.

MongoDB and the MongoDB drivers come with built-in support for the GUID/UUID data type, so it is easy to start using GUIDs/UUIDs immediately without reviewing the database specification. In most cases you will be fine. There are some scenarios, however, where you might run into unexpected issues that are very difficult to debug.

As soon as you go multi-language with your deployments, you might discover that your applications no longer work as expected, e.g. you might no longer be able to match orders to customers. It is also very likely that your data gets corrupted over time and it will be very difficult to recover from such a disaster without having to restore your backups (you do backups on a regular basis, don’t you?).

In this article, we will take a closer look at scenarios where working with GUIDs/UUIDs might become more complex. We will make you aware of those configurations and provide you with a set of best practices to follow.

What is a UUID? What is a GUID?

A Universally Unique IDentifier (UUID) is a unique number used as an identifier in computer software. The term Globally Unique IDentifier (GUID) is often used interchangeably; strictly speaking, GUID is Microsoft's name for its implementation of the UUID standard.

UUIDs are 128-bit values and are usually displayed as 32 hexadecimal digits in five hyphen-separated groups (8-4-4-4-12), for example:

176BF052-4ECB-4E1D-B48F-F3523F7F277F

(The example above is a random Version 4 UUID)
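
As a small illustration of the text form described above, the following sketch parses the example value with Java's built-in java.util.UUID class and reads back its version and variant. The class name UuidBasics and the helper parse are made up for this example.

```java
import java.util.UUID;

// Illustrative class name; not part of any driver.
class UuidBasics {
    /** Parses the canonical 8-4-4-4-12 text form; letter case is ignored. */
    static UUID parse(String text) {
        return UUID.fromString(text);
    }

    public static void main(String[] args) {
        UUID id = parse("176BF052-4ECB-4E1D-B48F-F3523F7F277F");
        System.out.println(id);            // java.util.UUID prints lowercase hex
        System.out.println(id.version());  // 4 = randomly generated
        System.out.println(id.variant());  // 2 = the RFC 4122 layout
    }
}
```

The version is encoded in the first digit of the third group (the 4 in 4E1D), which is why the example can be recognised as a Version 4 UUID by eye.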

MongoDB and GUID/UUID Support

MongoDB has built-in support for the GUID/UUID data type and most of the MongoDB drivers support GUIDs/UUIDs natively. MongoDB itself stores them as Binary fields. When such fields are accessed from software, the drivers usually convert the stored value to a language-specific GUID or UUID object. For example, when you read a UUID from a MongoDB database using the C# driver, an object of type System.Guid will be returned. The Java driver, on the other hand, will return an object of type java.util.UUID.

This makes it very convenient and easy to work with the GUID/UUID data type from your application code.

Yes, there is a catch

However, you might run into problems when working with UUIDs in multi-platform/multi-language environments: accessing the same database from different programming languages through different MongoDB drivers is not always safe, as we will demonstrate below. You should also be careful when working with UUID data in third-party MongoDB management software, because only a few MongoDB tools take due care in handling it. MongoChef is an example of a MongoDB GUI that does so, offering different options to ensure that UUIDs are interpreted, written and updated correctly.

MongoDB drivers usually convert UUIDs between their Binary database representation and the language-specific UUID data type. Initially, the encoding method was platform-specific and was not implemented consistently across the different MongoDB drivers.

Example:

An example GUID created and displayed in a C# application would be shown as follows:

00010203-0405-0607-0809-0A0B0C0D0E0F

The MongoDB C# driver (in the default configuration) will convert it to a Binary database representation and store it with the following byte order:

03 02 01 00 05 04 07 06 08 09 0A 0B 0C 0D 0E 0F

However, the Java driver used to access the same field will return and display the following:

06070405-0001-0203-0F0E-0D0C0B0A0908

The Python driver will display something different again:

03020100-0504-0706-0809-0A0B0C0D0E0F

You will notice the representation of the same UUID differs depending on the platform used to access it!
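
The three readings above can be reproduced without a database. The sketch below is an illustration only, not driver code: it takes the 16 stored bytes from the example and interprets them the way each legacy encoding is described here. The class and method names are invented for this example, and java.util.UUID prints lowercase hex.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Illustrative names; this mimics the legacy layouts, it does not call any driver.
class LegacyUuidDecoding {
    /** The 16 bytes the C# driver stored in the example above. */
    static final byte[] STORED = {
        0x03, 0x02, 0x01, 0x00, 0x05, 0x04, 0x07, 0x06,
        0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F
    };

    /** Bytes taken in stored order, as in the Python reading above. */
    static UUID readAsStored(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes); // big-endian by default
        return new UUID(buf.getLong(), buf.getLong());
    }

    /** Legacy Java layout: each 8-byte half is a little-endian long. */
    static UUID readAsLegacyJava(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        return new UUID(buf.getLong(), buf.getLong());
    }

    /** Legacy C# layout: the first three fields (4, 2, 2 bytes) are little-endian. */
    static UUID readAsLegacyCSharp(byte[] b) {
        byte[] reordered = {
            b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
            b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]
        };
        return readAsStored(reordered);
    }

    public static void main(String[] args) {
        System.out.println(readAsLegacyCSharp(STORED)); // 00010203-0405-0607-0809-0a0b0c0d0e0f
        System.out.println(readAsLegacyJava(STORED));   // 06070405-0001-0203-0f0e-0d0c0b0a0908
        System.out.println(readAsStored(STORED));       // 03020100-0504-0706-0809-0a0b0c0d0e0f
    }
}
```

Only the C#-style reading recovers the value the C# application wrote; the other two return a different but perfectly valid-looking UUID, which is what makes the problem hard to spot.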

As long as you are not editing those UUIDs on both ends, you should not experience any serious problems (besides confusion when analysing your data). However, if one day you do edit a UUID, you will very likely run into issues that can be very difficult to locate and debug because you will essentially be dealing with scrambled data.

Binary subtypes 0x03 and 0x04

While reviewing the BSON specification, you will notice two Binary subtypes assigned to the UUID data type: 0x03, listed as "UUID (Old)", and 0x04, listed as "UUID".

Originally, only the 0x03 subtype was used to designate UUIDs. To address the portability issues in multi-platform environments that arose from the language-specific UUID implementations, the MongoDB designers later introduced the new 0x04 subtype.

The byte order used to store UUIDs as Binary fields with the 0x04 subtype is implemented consistently across all MongoDB drivers, and the 0x03 subtype is supported for legacy data only.

Using the Binary subtype 0x04 when working with UUIDs guarantees that you will be able to access your data from any platform and any programming language without these compatibility issues.

Best practices

1. Always know which UUID subtype your applications are using

You can find out the subtype in use by looking at the JSON representation of your existing data (e.g. in the mongo shell):

Example:

BinData(3,"B0AFBAMCAQAPDg0MCwoJgA==")
    stands for a UUID stored as a Binary field with the legacy 0x03 subtype

BinData(4,"SDcTgfx0SOq0Cl7DMRAzDQ==")
    stands for a UUID stored as a Binary field with the 0x04 subtype
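
If you need to check many values copied from the shell, the subtype can also be read programmatically. The following is a minimal sketch that assumes the input is text in the BinData(subtype,"base64") notation shown above; the class BinDataInspector and its methods are hypothetical helpers, not part of any MongoDB driver.

```java
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for shell-style BinData(...) text; not a driver API.
class BinDataInspector {
    private static final Pattern BIN_DATA =
        Pattern.compile("BinData\\((\\d+),\\s*\"([A-Za-z0-9+/=]+)\"\\)");

    /** Returns the numeric subtype, e.g. 3 or 4. */
    static int subtypeOf(String shellText) {
        return Integer.parseInt(match(shellText).group(1));
    }

    /** Returns the decoded payload; a UUID payload is 16 bytes long. */
    static byte[] payloadOf(String shellText) {
        return Base64.getDecoder().decode(match(shellText).group(2));
    }

    /** Gives a short human-readable verdict for the subtype. */
    static String describe(String shellText) {
        switch (subtypeOf(shellText)) {
            case 3:  return "legacy UUID (0x03), byte order depends on the writing driver";
            case 4:  return "standard UUID (0x04)";
            default: return "not a UUID subtype";
        }
    }

    private static Matcher match(String shellText) {
        Matcher m = BIN_DATA.matcher(shellText.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a BinData literal: " + shellText);
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(describe("BinData(3,\"B0AFBAMCAQAPDg0MCwoJgA==\")"));
        System.out.println(describe("BinData(4,\"SDcTgfx0SOq0Cl7DMRAzDQ==\")"));
    }
}
```

Note that a subtype of 3 only tells you the value is a legacy UUID; it does not tell you which driver wrote it, so the original byte order still has to be known from elsewhere.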

You can also use your MongoDB GUI with support for legacy UUIDs to access this information. MongoChef annotates regular UUIDs as Binary - UUID and for legacy UUIDs it lets you know which encoding it currently uses (e.g. Java, C# or none):

Image 2

2. Configure your drivers to work with subtype 0x04 for new deployments

MongoDB drivers usually store UUIDs as Binary fields with the legacy 0x03 subtype assigned by default. This configuration can be changed:

C#:

You can override the driver’s default settings and configure it to use the Binary 0x04 subtype by modifying the value of BsonDefaults.GuidRepresentation:

BsonDefaults.GuidRepresentation = GuidRepresentation.Standard;

You can also modify GuidRepresentation at the server, database and collection level.

Python:

You can configure the behavior of the Python driver through its uuid_subtype attribute; see the driver documentation for details.

Java:

Support for UUID configuration will be added in the 3.0 release of the MongoDB driver. With current (pre-3.0) drivers, you will have to perform the conversion yourself:

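Java
import java.util.UUID;

import org.bson.types.Binary;

// Class name is illustrative; place these helpers wherever suits your code base.
public final class StandardUuidCodec {

    /**
     * Convert a UUID object to a Binary with subtype 0x04.
     * The 16 bytes are written in big-endian order.
     */
    public static Binary toStandardBinaryUUID(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        long lsb = uuid.getLeastSignificantBits();

        byte[] uuidBytes = new byte[16];

        // Fill bytes 15..8 from the low-order end of the least significant bits
        for (int i = 15; i >= 8; i--) {
            uuidBytes[i] = (byte) (lsb & 0xFFL);
            lsb >>= 8;
        }

        // Fill bytes 7..0 from the low-order end of the most significant bits
        for (int i = 7; i >= 0; i--) {
            uuidBytes[i] = (byte) (msb & 0xFFL);
            msb >>= 8;
        }

        return new Binary((byte) 0x04, uuidBytes);
    }

    /**
     * Convert a Binary with subtype 0x04 to a UUID object.
     * Please note: the subtype is not checked.
     */
    public static UUID fromStandardBinaryUUID(Binary binary) {
        long msb = 0;
        long lsb = 0;
        byte[] uuidBytes = binary.getData();

        // Bytes 8..15 form the least significant bits, highest byte first
        for (int i = 8; i < 16; i++) {
            lsb <<= 8;
            lsb |= uuidBytes[i] & 0xFFL;
        }

        // Bytes 0..7 form the most significant bits, highest byte first
        for (int i = 0; i < 8; i++) {
            msb <<= 8;
            msb |= uuidBytes[i] & 0xFFL;
        }

        return new UUID(msb, lsb);
    }
}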

3. Configure your MongoDB GUI to handle legacy UUIDs with subtype 0x03

Only a few MongoDB GUI tools let you specify how the legacy UUID fields are to be handled.

MongoChef has built-in support for both the 0x03 and 0x04 subtypes. You can configure the behavior in the Properties dialog and select from the following options:

  • Legacy .NET / C# Encoding
    • the default encoding used by the .NET/C# driver: read, store and display the UUID the same way a .NET application would do
  • Legacy Java Encoding
    • the default encoding used by the Java driver: read, store and display the UUID the same way a Java application would do
  • Legacy Python Encoding
    • the default encoding used by the Python driver: read, store and display the UUID the same way a Python application would do
  • No Encoding / Raw Data
    • no conversion is performed; the data is read, stored and displayed in the existing byte order
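
The encodings listed above differ only in byte order, so each one maps onto the standard layout by a fixed reordering. The sketch below shows that reordering on raw 16-byte arrays, based on the byte orders in the example earlier in this article. It is an assumption-laden illustration (the class and method names are invented here) and not a migration tool; a real migration would also have to rewrite the subtype from 0x03 to 0x04 and must know which driver wrote each value.

```java
import java.util.Arrays;

// Illustrative helper working on raw bytes only; names are not from any driver.
class LegacyToStandard {
    /** Legacy .NET / C# order: reverse the 4-, 2- and 2-byte fields at the front. */
    static byte[] fromLegacyCSharp(byte[] legacy) {
        byte[] out = copy16(legacy);
        reverse(out, 0, 4);
        reverse(out, 4, 2);
        reverse(out, 6, 2);
        return out;
    }

    /** Legacy Java order: reverse each 8-byte half. */
    static byte[] fromLegacyJava(byte[] legacy) {
        byte[] out = copy16(legacy);
        reverse(out, 0, 8);
        reverse(out, 8, 8);
        return out;
    }

    /** Legacy Python order already matches the standard layout. */
    static byte[] fromLegacyPython(byte[] legacy) {
        return copy16(legacy);
    }

    private static byte[] copy16(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("A UUID payload must be 16 bytes");
        }
        return Arrays.copyOf(bytes, 16);
    }

    /** Reverses length bytes of a in place, starting at index start. */
    private static void reverse(byte[] a, int start, int length) {
        for (int i = start, j = start + length - 1; i < j; i++, j--) {
            byte t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }
}
```

Applying the wrong conversion still yields 16 valid bytes, so the result cannot be validated by inspection; this is the same reason a GUI has to be told which legacy encoding to assume.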

Image 3

Summary

Make sure your applications do not suffer from unnecessary portability issues when working with the UUID data type and avoid the legacy Binary 0x03 subtype whenever possible. If you can’t migrate your existing deployment to the new Binary 0x04 subtype, remember to configure your MongoDB tools to the correct encoding for the legacy Binary 0x03 subtype.

About the Author

Tomasz is a software developer and a co-founder of 3T Software Labs where he and his team are building MongoDB developer tools: 3T MongoChef, 3T Data Compare & Sync, and 3T Schema Explorer & Documentation. You’ll find more information about 3T and their MongoDB GUI tools at http://3t.io/mongochef/

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)