Azure DocumentDB – Understanding key features

Azure DocumentDB is a schema-less NoSQL database that also supports the MongoDB wire protocol and tools such as Mongoose, MongoChef, and others.

One of the key questions about Azure DocumentDB is what capabilities it exposes. In this post, I focus on the value proposition of Azure DocumentDB and show you how it works, as a recap.
You will find that it’s a globally distributed and elastically scalable database, and that you can benefit a lot from its reliability.

Here I show the simplest samples using raw RESTful HTTP requests, so that you can see what actually happens. In practice you can use the DocumentDB SDK (Node.js, .NET, Java, etc.), which hides many of the difficulties (finding partitions, parallelism, etc.) and will help you a lot.

Guarantee – SLA and Reserved Throughput

DocumentDB has an SLA of 99.99% availability, and reserved throughput with latency of less than 10 ms on reads and 15 ms on writes. These service levels are completely transparent. (You don’t need to manage the details; the database does.)

The performance level (reserved throughput) is determined when the DocumentDB collection is provisioned. Let’s see the following example.

Get endpoints from DocumentDB account

GET https://myaccount01.documents.azure.com/
x-ms-date: Tue, 06 Dec 2016 12:18:29 GMT
authorization: type%3dmas...
HTTP/1.1 200 OK
Content-Type: application/json

{
  "_self": "",
  "id": "myaccount01",
  "_rid": "myaccount01.documents.azure.com",
  "media": "//media/",
  "addresses": "//addresses/",
  "_dbs": "//dbs/",
  "writableLocations": [
    {
      "name": "East US",
      "databaseAccountEndpoint": "https://myaccount01-eastus.documents.azure.com:443/"
    }
  ],
  "readableLocations": [
    {
      "name": "East US",
      "databaseAccountEndpoint": "https://myaccount01-eastus.documents.azure.com:443/"
    }
  ],
  ...

}

Create a database using endpoint

POST https://myaccount01-eastus.documents.azure.com/dbs
x-ms-date: Tue, 06 Dec 2016 12:18:29 GMT
authorization: type%3dmas...
Accept: application/json

{"id":"db01"}
HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "db01",
  "_rid": "4Gt4AA==",
  "_self": "dbs/4Gt4AA==/",
  "_etag": "\"00000900-0000-0000-0000-583698a10000\"",
  "_colls": "colls/",
  "_users": "users/",
  "_ts": 1479973022
}

Create a collection with RUs (request units)

POST https://myaccount01-eastus.documents.azure.com/dbs/db01/colls
x-ms-offer-throughput: 10000
x-ms-date: Tue, 06 Dec 2016 12:18:29 GMT
authorization: type%3dmas...
Accept: application/json

{
  "id": "test01"
}
HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "test01",
  "indexingPolicy": {
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
      {
        "path": "/*",
        "indexes": [
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          },
          {
            "kind": "Hash",
            "dataType": "String",
            "precision": 3
          }
        ]
      }
    ],
    "excludedPaths": [
      
    ]
  },
  "_rid": "6DpzAIPfiQA=",
  "_ts": 1480590914,
  "_self": "dbs/6DpzAA==/colls/6DpzAIPfiQA=/",
  "_etag": "\"0000dc00-0000-0000-0000-5840064b0000\"",
  "_docs": "docs/",
  "_sprocs": "sprocs/",
  "_triggers": "triggers/",
  "_udfs": "udfs/",
  "_conflicts": "conflicts/"
}

Note : The authorization request header value is the url-encoded form of the string “type=master&ver=1.0&sig={signature}”. The steps to get this value are :
1) Construct your payload string. This string depends on the x-ms-date (or Date) header value, so the resulting token expires soon.
2) Compute the HMAC of this payload with the SHA256 algorithm, and base64-encode the result.
3) Set this encoded string as the signature (sig).
Please see the official document “DocumentDB – Access Control on DocumentDB Resources” for details. (If you’re using PHP, please also refer to my previous post “How to use Azure Storage without SDK”.)
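
The steps above can be sketched in Python as follows. This is a minimal sketch, not the official implementation: the payload layout (lower-cased verb, resource type, resource link, and lower-cased date, each followed by a newline, plus one empty line) follows my reading of the access control document, and the function name, parameter names, and DUMMY_KEY are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac
import urllib.parse


def build_auth_token(verb, resource_type, resource_link, date, master_key):
    """Return the url-encoded authorization header value for a master key."""
    # Payload: verb, resource type and date are lower-cased; the resource
    # link keeps its case. Each part ends with "\n", then one empty line.
    payload = "{}\n{}\n{}\n{}\n\n".format(
        verb.lower(), resource_type.lower(), resource_link, date.lower()
    )
    key = base64.b64decode(master_key)  # the master key is base64 text
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return urllib.parse.quote("type=master&ver=1.0&sig=" + signature, safe="")


# Dummy key for illustration only (not a real account key).
DUMMY_KEY = base64.b64encode(b"not-a-real-key").decode("ascii")

# Token for "create a collection under db01" (POST .../dbs/db01/colls).
token = build_auth_token(
    "POST", "colls", "dbs/db01", "Tue, 06 Dec 2016 12:18:29 GMT", DUMMY_KEY
)
```

The date passed here must be the same string that you send in the x-ms-date header, otherwise the signature does not match on the server side.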

As you can see (in the x-ms-offer-throughput request header), this example assigns 10,000 request units (RUs) per second to this DocumentDB collection (named “test01”). The request unit (RU) is the measure of DocumentDB request processing, and DocumentDB determines how much memory, CPU, and other resources are needed from the RU value. It also affects your DocumentDB fee.

I don’t describe the details of request units (RUs) here, but the required value depends on the number of requests and the read/write operations they need. You can estimate the required RUs using the Request Units Calculator site.
If you already have an actual application and database, you can also measure how much is consumed from the x-ms-request-charge response header of each CRUD or query request. For instance, the following request consumes 3.08 RUs in this single call.

POST https://myaccount01-eastus.documents.azure.com/dbs/db01/colls/test01/docs
x-ms-date: Tue, 06 Dec 2016 03:29:35 GMT
authorization: type%3dmas...
Accept: application/json
Content-Type: application/query+json

{"query":"SELECT * FROM root ... "}
HTTP/1.1 200 Ok
Content-Type: application/json
x-ms-request-charge: 3.08
...

{
  "_rid": "GZsc...",
  "Documents": [
    ...

  ],
  "_count": ...
}

The result from the Request Units Calculator site is only reference data for your cost estimation, and it’s not exact. For instance, remember that the complexity of a query also affects RU consumption. If you need to send very complex queries, you might need more RUs.
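
As a rough illustration of estimating from measured charges, the following Python sketch multiplies each measured x-ms-request-charge by how often that request runs per second. Only the 3.08 value comes from the example above; the request rates and the write charge are made-up numbers, and the function name is hypothetical.

```python
def estimate_rus_per_second(workload):
    """workload: iterable of (request_charge, requests_per_second) pairs."""
    return sum(charge * rate for charge, rate in workload)


# 3.08 is the x-ms-request-charge of the query above. The rates and the
# write charge are made-up numbers for illustration.
workload = [
    (3.08, 500),  # the query above, 500 times per second
    (10.0, 100),  # a hypothetical write costing 10 RUs, 100 times per second
]
required = estimate_rus_per_second(workload)  # roughly 2,540 RUs per second
```

An estimate like this is only a starting point, since the charge of the same query can change as the data grows.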

Note : If many heavy queries are sent and they exceed the reserved throughput per second, the request is rejected with an error. In such a case, check the x-ms-retry-after-ms response header and retry after the specified time. (The SDK does this internally.)
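
If you call the REST API directly, the retry could look like the following Python sketch. It assumes that a throttled request comes back with HTTP status 429 and that `send` is a hypothetical callable (not part of any SDK) returning a (status code, headers, body) tuple with lower-case header names. The 1000 ms fallback is an arbitrary choice for the case where the header is missing.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it is not throttled or the attempts run out.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:  # 429 means the request rate was too large
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point in waiting after the last attempt
        # Wait for the time the server asked for (milliseconds).
        wait_ms = float(headers.get("x-ms-retry-after-ms", "1000"))
        sleep(wait_ms / 1000.0)
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in normal use you leave it as the default.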

Partitioning

DocumentDB uses a distributed partitioning strategy. One collection can span multiple physical partitions, and documents are distributed by the partition key.

Let’s say that we set /divisionid as the partition key of the collection. The following request creates the DocumentDB collection (named “test01”) with this partition key.

POST https://myaccount01-eastus.documents.azure.com/dbs/db01/colls HTTP/1.1
authorization: type%3dmas...
Accept: application/json

{
  "id": "test01",
  "partitionKey": {
    "paths": [
      "/divisionid"
    ],
    "kind": "Hash"
  }
}
HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "test01",
  "indexingPolicy": {
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
      {
        "path": "/*",
        "indexes": [
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          },
          {
            "kind": "Hash",
            "dataType": "String",
            "precision": 3
          }
        ]
      }
    ],
    "excludedPaths": [
      
    ]
  },
  "partitionKey": {
    "paths": [
      "/divisionid"
    ],
    "kind": "Hash"
  },
  "_rid": "GZscAJ56rgA=",
  "_ts": 1480675514,
  "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/",
  "_etag": "\"00001202-0000-0000-0000-584150c60000\"",
  "_docs": "docs/",
  "_sprocs": "sprocs/",
  "_triggers": "triggers/",
  "_udfs": "udfs/",
  "_conflicts": "conflicts/"
}

In this collection, { "divisionid" : "div1", "name" : "engineering" } and { "divisionid" : "div1", "revenue" : 3000000 } reside in the same partition. (Documents with the same key are stored in the same partition.)

If you frequently look up documents by /divisionid, you can route your query to the appropriate partition.
Let’s see how to route to the partition. (If you’re using the SDK, it does this for you and you don’t need to do anything.)

First, get the key ranges of the partitions from https://{your endpoint domain}.documents.azure.com/{your collection's uri fragment}/pkranges . In this example, “dbs/GZscAA==/colls/GZscAJ56rgA=/” (see the _self property above) is the uri fragment of the collection.

GET https://myaccount01-eastus.documents.azure.com/dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges
authorization: type%3dmas...
Accept: application/json
HTTP/1.1 200 Ok
Content-Type: application/json

{
  "_rid": "GZscAJ56rgA=",
  "PartitionKeyRanges": [
    {
      "_rid": "GZscAJ56rgACAAAAAAAAUA==",
      "id": "0",
      "_etag": "\"00001402-0000-0000-0000-584150c60000\"",
      "minInclusive": "",
      "maxExclusive": "05C1A53DB92960",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgACAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgADAAAAAAAAUA==",
      "id": "1",
      "_etag": "\"00001502-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1A53DB92960",
      "maxExclusive": "05C1B53DB92960",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgADAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgAEAAAAAAAAUA==",
      "id": "2",
      "_etag": "\"00001602-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1B53DB92960",
      "maxExclusive": "05C1BF5D153D90",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAEAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgAFAAAAAAAAUA==",
      "id": "3",
      "_etag": "\"00001702-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1BF5D153D90",
      "maxExclusive": "05C1C53DB92960",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAFAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgAGAAAAAAAAUA==",
      "id": "4",
      "_etag": "\"00001802-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1C53DB92960",
      "maxExclusive": "05C1C9CD673378",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAGAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgAHAAAAAAAAUA==",
      "id": "5",
      "_etag": "\"00001902-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1C9CD673378",
      "maxExclusive": "05C1CF5D153D90",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAHAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675514
    },
    {
      "_rid": "GZscAJ56rgAIAAAAAAAAUA==",
      "id": "6",
      "_etag": "\"00001a02-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1CF5D153D90",
      "maxExclusive": "05C1D1F5E1A3D4",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAIAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAJAAAAAAAAUA==",
      "id": "7",
      "_etag": "\"00001b02-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1D1F5E1A3D4",
      "maxExclusive": "05C1D53DB92960",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAJAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAKAAAAAAAAUA==",
      "id": "8",
      "_etag": "\"00001c02-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1D53DB92960",
      "maxExclusive": "05C1D7858FADEC",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAKAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgALAAAAAAAAUA==",
      "id": "9",
      "_etag": "\"00001d02-0000-0000-0000-584150c60000\"",
      "minInclusive": "05C1D7858FADEC",
      "maxExclusive": "05C1D9CD673378",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgALAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAMAAAAAAAAUA==",
      "id": "10",
      "_etag": "\"00001e02-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1D9CD673378",
      "maxExclusive": "05C1DD153DB904",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAMAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgANAAAAAAAAUA==",
      "id": "11",
      "_etag": "\"00001f02-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1DD153DB904",
      "maxExclusive": "05C1DF5D153D90",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgANAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAOAAAAAAAAUA==",
      "id": "12",
      "_etag": "\"00002002-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1DF5D153D90",
      "maxExclusive": "05C1E151F5E18E",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAOAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAPAAAAAAAAUA==",
      "id": "13",
      "_etag": "\"00002102-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E151F5E18E",
      "maxExclusive": "05C1E1F5E1A3D4",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAPAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAQAAAAAAAAUA==",
      "id": "14",
      "_etag": "\"00002202-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E1F5E1A3D4",
      "maxExclusive": "05C1E399CD671A",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAQAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgARAAAAAAAAUA==",
      "id": "15",
      "_etag": "\"00002302-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E399CD671A",
      "maxExclusive": "05C1E53DB92960",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgARAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgASAAAAAAAAUA==",
      "id": "16",
      "_etag": "\"00002402-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E53DB92960",
      "maxExclusive": "05C1E5E1A3EBA6",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgASAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgATAAAAAAAAUA==",
      "id": "17",
      "_etag": "\"00002502-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E5E1A3EBA6",
      "maxExclusive": "05C1E7858FADEC",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgATAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAUAAAAAAAAUA==",
      "id": "18",
      "_etag": "\"00002602-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E7858FADEC",
      "maxExclusive": "05C1E9297B7132",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAUAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAVAAAAAAAAUA==",
      "id": "19",
      "_etag": "\"00002702-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E9297B7132",
      "maxExclusive": "05C1E9CD673378",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAVAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAWAAAAAAAAUA==",
      "id": "20",
      "_etag": "\"00002802-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1E9CD673378",
      "maxExclusive": "05C1EB7151F5BE",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAWAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAXAAAAAAAAUA==",
      "id": "21",
      "_etag": "\"00002902-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1EB7151F5BE",
      "maxExclusive": "05C1ED153DB904",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAXAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAYAAAAAAAAUA==",
      "id": "22",
      "_etag": "\"00002a02-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1ED153DB904",
      "maxExclusive": "05C1EDB9297B4A",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAYAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAZAAAAAAAAUA==",
      "id": "23",
      "_etag": "\"00002b02-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1EDB9297B4A",
      "maxExclusive": "05C1EF5D153D90",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAZAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    },
    {
      "_rid": "GZscAJ56rgAaAAAAAAAAUA==",
      "id": "24",
      "_etag": "\"00002c02-0000-0000-0000-584150c70000\"",
      "minInclusive": "05C1EF5D153D90",
      "maxExclusive": "FF",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/pkranges/GZscAJ56rgAaAAAAAAAAUA==/",
      "throughputFraction": 0.04,
      "_ts": 1480675515
    }
  ],
  "_count": 25
}

Here there are 25 partitions. (The number of partitions is determined automatically from the storage size and RUs of the collection, and you cannot manage it yourself.)
The minInclusive and maxExclusive properties are hex strings that give the hash range of each partition. On the server side, DocumentDB uses the hex-encoded string of a MurmurHash value as the partitioning hash. For example, if the key value is “div200”, the computed partition key value is “05C1EDBFC1A70A”. (I uploaded the hash partitioning code to “Github – tsmatsuz/DocumentDbPartitionResolveSample”.)
That is :

"05C1EDB9297B4A" <= "05C1EDBFC1A70A" < "05C1EF5D153D90"

As a result, this document (which has the key “div200”) is placed in the partition with id “23”.
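
The range lookup itself can be sketched in Python as follows. This sketch does not compute the hash (see the Github sample for that); it only assumes that the hashed key and the minInclusive / maxExclusive values can be compared as plain upper-case hex strings. The function name is hypothetical, and only three of the 25 ranges are copied from the response above.

```python
def find_partition_key_range(ranges, hashed_key):
    """Return the id of the range with minInclusive <= hashed_key < maxExclusive."""
    for r in ranges:
        if r["minInclusive"] <= hashed_key < r["maxExclusive"]:
            return r["id"]
    return None  # the key is not covered by the given ranges


# Three of the 25 ranges from the pkranges response above.
ranges = [
    {"id": "22", "minInclusive": "05C1ED153DB904", "maxExclusive": "05C1EDB9297B4A"},
    {"id": "23", "minInclusive": "05C1EDB9297B4A", "maxExclusive": "05C1EF5D153D90"},
    {"id": "24", "minInclusive": "05C1EF5D153D90", "maxExclusive": "FF"},
]

# "05C1EDBFC1A70A" is the hashed value of the key "div200" (see above).
range_id = find_partition_key_range(ranges, "05C1EDBFC1A70A")
```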

When you search documents by the partition key “div200”, you send the query only to partition 23 as follows, and you don’t need to send it to the others. The value “GZscAJ56rgA=,23” in the following http request header means partition 23.

Note : When you send a query to a partitioned collection, the “x-ms-documentdb-query-enablecrosspartition : True” request header is needed.

POST https://myaccount01-eastus.documents.azure.com/dbs/db01/colls/test01/docs
x-ms-continuation: 
x-ms-documentdb-isquery: True
x-ms-documentdb-query-enablecrosspartition: True
x-ms-documentdb-partitionkeyrangeid: GZscAJ56rgA=,23
authorization: type%3dmas...
Accept: application/json
Content-Type: application/query+json

{"query":"SELECT * FROM root WHERE (root[\"divisionid\"] = \"div200\") "}
HTTP/1.1 200 Ok
Content-Type: application/json
x-ms-item-count: 3

{
  "_rid": "GZscAJ56rgA=",
  "Documents": [
    {
      "divisionid": "div200",
      "content": "cont200",
      "id": "75233879-4ebf-4c37-ac9d-a3b373ac4441",
      "_rid": "GZscAJ56rgAOAAAAAACADg==",
      "_self": "dbs/GZscAA==/colls/GZscAJ56rgA=/docs/GZscAJ56rgAOAAAAAACADg==/",
      "_etag": "\"0001ee0b-0000-0000-0000-584152af0000\"",
      "_attachments": "attachments/",
      "_ts": 1480676011
    },
    {
      "divisionid": "div200",
      ...

    },
    {
      "divisionid": "div200",
      ...

    }
  ],

  "_count": 3
}

Note that a partitioned collection has not only pros (benefits) but also cons (caveats).
Consider the case where you search documents without the partition key. In this case, you must send the query to all the partitions (25 partitions here), and there is significant overhead. So if your application has various kinds of documents, it’s better to divide them into different collections (single-partition collections and partitioned collections) and set an appropriate key for each partitioned collection. (You must choose the partition key carefully.)

If you need a partitioned collection and need to search without the partition key, it’s better to send the queries in parallel. When using the .NET SDK, you must specify the following MaxDegreeOfParallelism property.
Otherwise, the search runs sequentially over many partitions, and it takes a very long time to return the results even if you set a higher performance level (RU). (Each individual call meets the performance level, but the total overhead of all the calls is large.)
In particular, the .NET SDK automatically determines from the LINQ query expression whether multiple calls are required, but it doesn’t automatically run them in parallel. Remember that you must enable parallel calls yourself.

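...
using System;
using System.Linq;
using Microsoft.Azure.Documents.Client;
...

// The element type of the query is missing from the original listing.
// "User" is a hypothetical class with a UserName property.
var client = new DocumentClient(
  new Uri("https://myaccount01.documents.azure.com:443/"),
  "RQCgm3a...");  // key

var query = client.CreateDocumentQuery<User>(
  UriFactory.CreateDocumentCollectionUri("db01", "test01"),
  new FeedOptions
  {
    EnableCrossPartitionQuery = true,  // allow the query to span partitions
    MaxDegreeOfParallelism = 10        // run up to 10 partition queries in parallel
  })
  .Where(p => p.UserName == "Tsuyoshi Matsuzaki");
var tsmatz = query.AsEnumerable().FirstOrDefault();
...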
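
If you call the REST API yourself instead of using the SDK, the same fan-out idea can be sketched in Python with a thread pool. This is only an illustration of the approach, not what the SDK does internally: `query_partition` is a hypothetical callable that sends the query to one partition key range (for example with the x-ms-documentdb-partitionkeyrangeid header) and returns a list of documents.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all_partitions(query_partition, range_ids, max_parallelism=10):
    """Run query_partition(range_id) for every range and merge the results.

    query_partition is a hypothetical callable that queries one partition
    key range and returns a list of documents.
    """
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        # map() keeps the order of range_ids while the calls run in parallel.
        per_partition = pool.map(query_partition, range_ids)
        return [doc for docs in per_partition for doc in docs]
```

With 25 partitions and max_parallelism=10, at most 10 requests are in flight at the same time, and the total time is closer to three round trips than to 25.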

Note : You can also do this partitioning in client-side code. (In this case, you provision multiple collections instead of using server-side partitions.) This is called client-side partitioning.
It leaves you free to customize and apply any partitioning policy (range partitioning, lookup partitioning, etc.) according to your strategy, and the SDK has several helper classes for this client-side implementation. (By default, the SDK uses a consistent hash ring based on the MD5 hash.)
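
To show the idea of a consistent hash ring, here is a small Python sketch. It is not the SDK’s implementation: the class name, the number of virtual nodes, and the way the ring points are derived from the collection names are all assumptions for illustration. Only the use of MD5 follows the description above.

```python
import bisect
import hashlib


class HashRing:
    """A small consistent hash ring that maps keys to collection names."""

    def __init__(self, collections, virtual_nodes=16):
        points = []
        for name in collections:
            # Several points per collection spread the keys more evenly.
            for i in range(virtual_nodes):
                points.append((self._hash("%s#%d" % (name, i)), name))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._names = [n for _, n in points]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def resolve(self, partition_key):
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._hashes, self._hash(partition_key))
        return self._names[i % len(self._names)]


ring = HashRing(["coll0", "coll1", "coll2"])
target = ring.resolve("div200")  # the collection that stores key "div200"
```

The benefit of the ring is that adding a collection moves only the keys that now belong to the new collection, while all other keys stay where they are.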

Replication

Another aspect of DocumentDB scalability is replication (replicas). DocumentDB is not only distributed, but globally distributed.

Let’s look at DocumentDB management in the Azure Portal.
If you click the global map, you can see the “Replicate data globally” blade in the portal. To add a replication region, just click the location in this blade and save. You can add or remove regions even while your collection is running!

http://i1155.photobucket.com/albums/p551/tsmatsuz/20161207_Portal_Replication_zpsakbzbldv.jpg

With the setting above, you can read (or query) documents from any of the 3 regions (“East US”, “Japan East”, and “West Europe”) of the client’s choice, but you can write documents only to the write region (“East US”).
You can get these endpoints from the RESTful API (or the SDK) as follows.

GET https://myaccount01.documents.azure.com/
authorization: type%3dmas...
HTTP/1.1 200 Ok
Content-Type: application/json

{
  "_self": "",
  "id": "myaccount01",
  "_rid": "myaccount01.documents.azure.com",
  "media": "//media/",
  "addresses": "//addresses/",
  "_dbs": "//dbs/",
  "writableLocations": [
    {
      "name": "East US",
      "databaseAccountEndpoint": "https://myaccount01-eastus.documents.azure.com:443/"
    }
  ],
  "readableLocations": [
    {
      "name": "East US",
      "databaseAccountEndpoint": "https://myaccount01-eastus.documents.azure.com:443/"
    },
    {
      "name": "Japan East",
      "databaseAccountEndpoint": "https://myaccount01-japaneast.documents.azure.com:443/"
    },
    {
      "name": "West Europe",
      "databaseAccountEndpoint": "https://myaccount01-westeurope.documents.azure.com:443/"
    }
  ],
  "userReplicationPolicy": {
    "asyncReplication": false,
    "minReplicaSetSize": 3,
    "maxReplicasetSize": 4
  },
  "userConsistencyPolicy": {
    "defaultConsistencyLevel": "Session"
  },
  "systemReplicationPolicy": {
    "minReplicaSetSize": 3,
    "maxReplicasetSize": 4
  },
  "readPolicy": {
    "primaryReadCoefficient": 1,
    "secondaryReadCoefficient": 1
  },
  "queryEngineConfiguration": "{\"maxSqlQueryInputLength\":30720,...}"
}

These regions are prioritized, and if you don’t specify a region in the SDK, the region with the first priority (in this case, “East US”) is used.
You can change this priority by drag-and-drop in the Azure Portal. (You can also change the write region with the manual failover operation.)
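
When you use the REST API directly, you pick the read endpoint yourself. The following Python sketch selects an endpoint from the account response shown above. The function name and the preferred-region list are hypothetical, and it assumes that readableLocations is returned in priority order.

```python
def pick_read_endpoint(account, preferred_regions):
    """Return the read endpoint of the first preferred region that exists.

    Falls back to the first readable location in the account response.
    """
    by_name = {
        loc["name"]: loc["databaseAccountEndpoint"]
        for loc in account["readableLocations"]
    }
    for region in preferred_regions:
        if region in by_name:
            return by_name[region]
    return account["readableLocations"][0]["databaseAccountEndpoint"]


# The readableLocations part of the account response above.
account = {
    "readableLocations": [
        {"name": "East US",
         "databaseAccountEndpoint": "https://myaccount01-eastus.documents.azure.com:443/"},
        {"name": "Japan East",
         "databaseAccountEndpoint": "https://myaccount01-japaneast.documents.azure.com:443/"},
        {"name": "West Europe",
         "databaseAccountEndpoint": "https://myaccount01-westeurope.documents.azure.com:443/"},
    ]
}

# A client running in Japan would prefer the nearby region.
endpoint = pick_read_endpoint(account, ["Japan East", "East US"])
```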

This global multi-region setting affects both availability and performance.

First, DocumentDB can fail over automatically (transparently) between these regions. DocumentDB also has local replication (4 replicas by default), and these replicas protect against failures at various levels.

Second, it affects performance. Let’s say that you’re serving applications or services worldwide. All the partitions are replicated in all regions, and your customers can read the data from a nearby region with low network latency.

Note that if you provision 20,000 RUs in each of the 3 regions, a total of 60,000 RUs is needed for the performance level. However, if the read operations are distributed to the other regions, the RUs needed in each region might eventually be reduced. When using replication, you must estimate the costs carefully.

Note : This replication mechanism is not a database backup (the same as with other databases). For instance, if you overwrite data by mistake, the wrong data is also replicated soon.
If you want backups, DocumentDB takes them automatically (every 4 hours) in Azure blob storage. For more details, please see “Automatic online backup and restore with DocumentDB”. (To restore a database, it seems that you need to file a support request.)

Indexing (Query optimization)

DocumentDB provides rich SQL query expressions, including joins, string functions (contains, concat, …), spatial functions, user-defined functions (custom functions), etc. You can also use LINQ in C#, and the query built from the expression tree is executed on the server side.

DocumentDB automatically indexes documents, and this gives high query throughput in most cases. But remember that not every query is covered by the default index settings, and index design is still important in specific cases. Here I explain how indexes work and how to use them.

Now let’s see the following REST example. This example creates a collection with an index policy.

After this index policy is provisioned, when you insert the document { "divisionid" : "div001", "divisioninfo" : { "membercount" : 5, "name" : "engineering" }, "divisionloc" : "Japan" }, the indexes for /divisionid, /divisioninfo/membercount, and /divisioninfo/name are created. When you search documents by these properties, these indexes are used and improve the query performance.

The path property is the document path. A “?” (question mark) at the end of the path means the exact path. A “*” (asterisk) means that all paths under the specified path are included.

POST https://myaccount01-eastus.documents.azure.com/dbs/db01/colls
authorization: type%3dmas...
Accept: application/json

{
  "id": "test01",
  "indexingPolicy": {
    "automatic": true,
    "indexingMode": "Consistent",
    "includedPaths": [
      {
        "path": "/divisionid/?",
        "indexes": [
          {
            "dataType": "String",
            "precision": -1,
            "kind": "Hash"
          }
        ]
      },
      {
        "path": "/divisioninfo/*",
        "indexes": [
          {
            "dataType": "String",
            "precision": -1,
            "kind": "Hash"
          },
          {
            "dataType": "Number",
            "precision": -1,
            "kind": "Range"
          }
        ]
      }
    ],
    "excludedPaths": [
      {
        "path": "/*"
      }
    ]
  }
}
HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "test01",
  "indexingPolicy": {
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
      {
        "path": "/divisionid/?",
        "indexes": [
          {
            "kind": "Hash",
            "dataType": "String",
            "precision": -1
          },
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          }
        ]
      },
      {
        "path": "/divisioninfo/*",
        "indexes": [
          {
            "kind": "Hash",
            "dataType": "String",
            "precision": -1
          },
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          }
        ]
      }
    ],
    "excludedPaths": [
      {
        "path": "/*"
      }
    ]
  },
  "_rid": "GZscAKjHAQ0=",
  "_ts": 1480945705,
  "_self": "dbs/GZscAA==/colls/GZscAKjHAQ0=/",
  "_etag": "\"0000e300-0000-0000-0000-5845702d0000\"",
  "_docs": "docs/",
  "_sprocs": "sprocs/",
  "_triggers": "triggers/",
  "_udfs": "udfs/",
  "_conflicts": "conflicts/"
}
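
The meaning of “?” and “*” in the included paths can be modeled with the following Python sketch. It is a simplified model for understanding only: the function name is hypothetical, excluded paths are ignored, and the real matching is done by DocumentDB on the server side.

```python
def is_path_indexed(property_path, included_paths):
    """Check a property path against '?' (exact) and '*' (subtree) patterns."""
    for pattern in included_paths:
        # "/a/?" matches exactly the value at "/a".
        if pattern.endswith("/?") and property_path == pattern[:-2]:
            return True
        # "/a/*" matches every path below "/a/".
        if pattern.endswith("/*") and property_path.startswith(pattern[:-1]):
            return True
    return False


# The included paths of the index policy above.
included = ["/divisionid/?", "/divisioninfo/*"]

indexed = {
    path: is_path_indexed(path, included)
    for path in [
        "/divisionid",
        "/divisioninfo/membercount",
        "/divisioninfo/name",
        "/divisionloc",
    ]
}
```

In this model /divisionloc is the only path of the sample document that is not indexed, which matches the explanation above.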

The index kind must be “Hash”, “Range”, or “Spatial”. The hash index works for equality queries (=). When you want an index for range comparisons (<, >, <=, >=, !=) or ORDER BY, use the range index. The spatial index is used for geospatial queries.

Note : In DocumentDB, every document has the id property (which is automatically assigned to all documents). This id is unique within a single partition and has a hash index.

In the previous example I defined the index policy manually, but by default DocumentDB indexes every path (/*) in the document tree, with string properties indexed by a hash index and numeric properties by a range index. (The following is the index policy provisioned by default.)

{
  "id": "test01",
  "indexingPolicy": {
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
      {
        "path": "/*",
        "indexes": [
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          },
          {
            "kind": "Hash",
            "dataType": "String",
            "precision": 3
          }
        ]
      }
    ],
    "excludedPaths": [
      
    ]
  },
  "_rid": "GZscAIxwGQ0=",
  "_ts": 1480993718,
  "_self": "dbs/GZscAA==/colls/GZscAIxwGQ0=/",
  "_etag": "\"0000e900-0000-0000-0000-58462bbc0000\"",
  "_docs": "docs/",
  "_sprocs": "sprocs/",
  "_triggers": "triggers/",
  "_udfs": "udfs/",
  "_conflicts": "conflicts/"
}

When you set “lazy” as the index mode (indexingMode), the index is updated asynchronously and the response returns sooner when a document is changed (create/update/delete). The default is “consistent”.

Note : You can also use the ETag to avoid conflicts and keep the data consistent.

The index precision is the number of bytes of precision, and “-1” means the maximum precision.
For example, if you specify “7” bytes as the index precision for a number property (8 bytes), you can reduce the index storage, but queries consume more IOs (inputs and outputs).

As I described before, understanding indexing is important for your actual applications.
For example, when you have a date/time property and use it in timeline (range) searches, it’s better to store the value as epoch time (a numeric time format) and set a range index on this numeric property.
Indexing is transparent in most cases, but index design is still important in specific cases.
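
As an illustration of the epoch time approach, the following Python sketch converts a date/time to epoch seconds and builds a parameterized range query for the REST API. The property name “createdat” and both function names are hypothetical, and the sketch assumes that the property has a range index.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt):
    """Convert a timezone-aware datetime to integer epoch seconds."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int(dt.timestamp())


def build_timeline_query(start, end):
    # "createdat" is a hypothetical numeric property holding epoch seconds.
    return {
        "query": "SELECT * FROM root r "
                 "WHERE r.createdat >= @start AND r.createdat < @end",
        "parameters": [
            {"name": "@start", "value": to_epoch_seconds(start)},
            {"name": "@end", "value": to_epoch_seconds(end)},
        ],
    }


# All documents created on 1 December 2016 (UTC).
body = build_timeline_query(
    datetime(2016, 12, 1, tzinfo=timezone.utc),
    datetime(2016, 12, 2, tzinfo=timezone.utc),
)
```

Because both bounds are plain numbers, the comparison can be served by the range index on the numeric property.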

Consistency

When data is updated, the update is applied to all replicas. Waiting for all these updates affects the performance of DocumentDB, so there is a trade-off between consistency and performance.
To meet your business needs, DocumentDB has several levels of consistency.

You can use 4 consistency levels: Strong, Bounded staleness, Session, and Eventual (from left to right, increasingly relaxed). By default, Session consistency is used.
You can specify the x-ms-session-token request header (whose value was returned beforehand as a response header) in RESTful requests, and DocumentDB recognizes the specific client session from this header. (When using the SDK, this is done automatically.)
Session consistency keeps the data consistent within this client session.
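
When you call the REST API directly, handling the session token could look like the following Python sketch. The class is hypothetical (the SDK does this for you), and it assumes that the headers are plain dictionaries with lower-case names.

```python
class SessionTracker:
    """Remember the latest x-ms-session-token and send it on later requests."""

    def __init__(self):
        self._token = None

    def update(self, response_headers):
        # Keep the newest token returned by the service.
        token = response_headers.get("x-ms-session-token")
        if token:
            self._token = token

    def apply(self, request_headers):
        # Return a copy of the headers with the session token added.
        headers = dict(request_headers)
        if self._token:
            headers["x-ms-session-token"] = self._token
        return headers
```

You call update() with the headers of every response and apply() before every request, so that the client reads its own writes within the session.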

On the other hand, the “Strong” and “Bounded staleness” levels are concerned with the global consistency of replicas. I don’t describe the details of the other consistency levels here; please refer to the following resource for more details.

Consistency levels in DocumentDB
https://docs.microsoft.com/en-us/azure/documentdb/documentdb-consistency-levels

The consistency level is defined at the scope of the database account, but you can override it for specific requests using the x-ms-consistency-level request header. (You can change the consistency level for each request, to the same or a weaker level than the account’s.)

Note : When you handle a batch of operations, you can also use consistent transactional operations with server-side JavaScript stored procedures or triggers.
