Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

The surprising properties of DynamoDB pagination

(Read this article on the blog)

DynamoDB uses token-based pagination. This means when a query or a scan operation potentially return more results then the result contains a LastEvaluatedKey field with some value. To fetch the next page, pass that value as the ExclusiveStartKey in a separate query. Do this until the LastEvaluatedKey is undefined, and you can be sure you fetched all items.

How it works is logical and simple. But when you start using other DynamoDB features, namely the Limit and the FilterExpression for queries and scans, things become more interesting.

In this article, we’ll look into a specific case where at first the results do not make much sense. Then we’ll look into how these features work and see why this is a feature of DynamoDB and not a bug.

The base query

Let’s run this DynamoDB query and inspect the results:

import { DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient();
const command = new QueryCommand({
	TableName: process.env.TABLE,
	IndexName: "groupId",
	KeyConditionExpression: "#groupId = :groupId",
	ExpressionAttributeNames: {"#groupId": "group_id", "#status": "status"},
	ExpressionAttributeValues: {
		":groupId": {S: "group1"},
		":status": {S: "ACTIVE"},
	},
	ScanIndexForward: false,
	FilterExpression: "#status = :status",
});
console.log(await client.send(command));

This returns a single item with some additional metadata:

{
  '$metadata': {
    "..."
  },
  Items: [
    {
      "..."
    }
  ],
  LastEvaluatedKey: undefined,
}

Resultsgroup_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3

Limits

Let’s see what happens if we add different Limit arguments to the query!

Limit: 1

const command = new QueryCommand({
	TableName: process.env.TABLE,
	IndexName: "groupId",
	KeyConditionExpression: "#groupId = :groupId",
	ExpressionAttributeNames: {"#groupId": "group_id", "#status": "status"},
	ExpressionAttributeValues: {
		":groupId": {S: "group1"},
		":status": {S: "ACTIVE"},
	},
	ScanIndexForward: false,
	FilterExpression: "#status = :status",
	// Add a limit
	Limit: 1,
});

The result now has zero items!

{
	"Count": 0,
	"Items": [],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 1
}

Even when fetching the next page, there are no items, but a different LastEvaluatedKey. Only the third page returns an item.

Results1group_id (PK)last_active (SK)statusid...LastEvaluatedKey1Results2group_id (PK)last_active (SK)statusid...LastEvaluatedKey2ExclusiveStartKey:LastEvaluatedKey1Results3group_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3...LastEvaluatedKey3ExclusiveStartKey:LastEvaluatedKey2Results4group_id (PK)last_active (SK)statusidExclusiveStartKey:LastEvaluatedKey3

Stranger still, the third query returns a pagination token, but the fourth query returns zero results.

Limit: 2

const command = new QueryCommand({
// ...
	Limit: 2,
});

No items, but a LastEvaluatedKey:

{
	"Count": 0,
	"Items": [],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 2
}

Results1group_id (PK)last_active (SK)statusid...LastEvaluatedKey1Results2group_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3ExclusiveStartKey:LastEvaluatedKey1

Limit: 3

const command = new QueryCommand({
// ...
	Limit: 3,
});

We have a result!

{
	"Count": 1,
	"Items": [{
		"..."
	}],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 3
}

Repeating the query with the pagination token returns no items and no LastEvaluatedKey either:

{
	"Count": 0,
	"Items": [],
	"ScannedCount": 0
}

Results1group_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3...LastEvaluatedKey1Results2group_id (PK)last_active (SK)statusidExclusiveStartKey:LastEvaluatedKey1

Limit: 4

const command = new QueryCommand({
// ...
	Limit: 4,
});

We have a result!

{
	"Count": 1,
	"Items": [{
		"..."
	}],
	"ScannedCount": 3
}

With this limit we came back to the original response: 1 item, no pagination token.

Results1group_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3

Seeing these results, there are several pressing questions we can ask. Why the matching item not in the result when the Limit: 1? Why is there a pagination token when the response already contains all items, such as in the Limit: 3 scenario?

Investigation

First, let’s see what items are in the table!

Users tablegroup_id (PK)last_active (SK)statusidgroup12022-06-15INACTIVEuser1group22022-06-16INACTIVEuser2group12022-01-01ACTIVEuser3group12022-01-02INACTIVEuser4

The users are stored with a group_id and a status field. There is also a last_active and that is the sort key. This defines the ordering of the items.

The query expression defines the group_id, so the results only contain items where the group is group1. This is why all the returned users are in this group.

Then the filter expression makes sure that only active users are returned.

For the first query, DynamoDB fetches a page of users that match the query expression. This means all three users are fetched, then the filter expression drops inactive ones.

Fetchgroup_id (PK)last_active (SK)statusidgroup12022-06-15INACTIVEuser1group12022-01-02INACTIVEuser4group12022-01-01ACTIVEuser3Filtergroup_id (PK)last_active (SK)statusidgroup12022-01-01ACTIVEuser3

This is why this query returns 1 item and no pagination token: DynamoDB knows there are no more users, as it reached the end of the table.

Limit: 1

When the Limit is 1, DynamoDB fetches only 1 item. The query expression is effective before the fetch, it will still skip users not in group1. But since the filter expressions runs after, the result will have zero items and a pagination token.

Fetchgroup_id (PK)last_active (SK)statusidgroup12022-06-15INACTIVEuser1...LastEvaluatedKey1Filtergroup_id (PK)last_active (SK)statusid

Limit: 2

When Limit is 2, it works almost the same. The only difference is that DynamoDB fetches 2 items, then drops both.

Fetchgroup_id (PK)last_active (SK)statusidgroup12022-06-15INACTIVEuser1group12022-01-02INACTIVEuser4...LastEvaluatedKey1Filtergroup_id (PK)last_active (SK)statusid

Limit: 3

When the Limit is 3, things change a bit. DynamoDB gets 3 items, but the last one is ACTIVE. This means the filter expression drops only 2 items and returns 1. But since the query did not reach the end of the table, there will be a pagination token.

The next query returns zero results, but also no pagination token. This means DynamoDB reached the end of the table.

Fetch1group_id (PK)last_active (SK)statusidgroup12022-06-15INACTIVEuser1group1



This post first appeared on Blog - Advanced Web Machinery, please read the originial post: here

Share the post

The surprising properties of DynamoDB pagination

×

Subscribe to Blog - Advanced Web Machinery

Get updates delivered right to your inbox!

Thank you for your subscription

×