How to use the AWS Lambda streaming response type

Lambda response type

Lambda recently added support for streaming responses, which lets a function send data to the client while it is still running. Previously, the response a function returned was only sent to the client once it was fully available. This is called "buffered" mode, as the Lambda runtime buffers the whole response.

Buffered response

To see how buffering can increase latency, consider what happens when a function reads an object from S3 and returns it. In the buffered case, the function needs to read the contents fully and return them to the Lambda runtime, and only then can the client get the bytes.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { buffer } from "node:stream/consumers";

export const handler = async (event) => {
	const client = new S3Client();
	// Object key, here derived from the function URL request path
	const file = event.rawPath.slice(1);
	const res = await client.send(new GetObjectCommand({
		Bucket: process.env.Bucket,
		Key: file,
	}));
	// The whole object is read into memory before the response is returned
	return {
		statusCode: 200,
		headers: {
			"Content-Type": res.ContentType,
		},
		body: (await buffer(res.Body)).toString("base64"),
		isBase64Encoded: true,
	};
}

Streaming response

A streaming response is different: it allows the client to receive the response as it becomes available to the function. In the S3 example above, when the function receives the first bytes it can push them to the client without waiting for the whole file.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { pipeline } from "node:stream/promises";

export const handler = awslambda.streamifyResponse(async (event, responseStream) => {
	const client = new S3Client();
	// Object key, here derived from the function URL request path
	const file = event.rawPath.slice(1);
	const res = await client.send(new GetObjectCommand({
		Bucket: process.env.Bucket,
		Key: file,
	}));

	// Pipe the S3 object stream into the response stream as chunks arrive
	await pipeline(res.Body, awslambda.HttpResponseStream.from(responseStream, {
		statusCode: 200,
		headers: {
			"Content-Type": res.ContentType,
		},
	}));
});

Streaming vs buffering

This setup provides multiple benefits over the buffered approach.

First, with streaming the whole file does not need to fit into the memory of the Lambda function. As bytes arrive from the GetObject operation they are pushed to the result stream and can then be discarded by the function.

Second, reading and writing are interleaved in time, providing shorter response times. Instead of the file first downloading in full to the Lambda function and only then to the client, the two transfers happen in parallel.

Third, the client gets the first bytes sooner. This is especially important when it can do something useful with partial responses. For example, an image might be shown in a lower resolution while the file is still downloading, which is a better user experience.

And finally, the Lambda runtime provides a higher limit for the response size in this mode. While a buffered response can only go up to 6 MB, streaming supports up to 20 MB.
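To make the interleaving concrete, here is a minimal Node sketch, independent of Lambda, that pipes a simulated chunked source into a sink. The source stands in for the S3 object stream; the sink receives each chunk as it is produced, instead of one big buffer at the end:

```javascript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Simulated source, standing in for the S3 object stream: emits 4 chunks
const source = Readable.from(["AAAA", "BBBB", "CCCC", "DDDD"]);

const received = [];
const sink = new Writable({
	write(chunk, _encoding, callback) {
		// Each chunk is handled (and could be discarded) as soon as it arrives
		received.push(chunk.toString());
		callback();
	},
});

await pipeline(source, sink);
console.log(received.join(",")); // AAAA,BBBB,CCCC,DDDD
```

The streaming handler above follows the same shape, with res.Body as the source and the wrapped responseStream as the sink.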

Implementation

To use streaming responses, use the awslambda.streamifyResponse function available in the Node runtime to wrap the handler function. This exposes the responseStream value:

export const handler = awslambda.streamifyResponse(async (event, responseStream) => {
	// ...
});
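The responseStream the wrapper passes in behaves like a regular Node writable stream, so instead of piping, a handler can also write to it directly with write and end. The sketch below uses a PassThrough stream as a stand-in for responseStream (which only exists inside the Lambda runtime) to show the pattern:

```javascript
import { PassThrough } from "node:stream";

// Stand-in for the responseStream argument; in the real runtime it is
// provided by awslambda.streamifyResponse and is also a writable stream
const responseStream = new PassThrough();

// Inside the handler: push partial results as they become available...
responseStream.write("first chunk, ");
responseStream.write("second chunk");
// ...and close the stream when done, otherwise the invocation hangs
responseStream.end();

// Reading back what the "client" would receive
const body = responseStream.read().toString();
console.log(body); // first chunk, second chunk
```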

Next, the function URL needs to know in which mode to call the function:

resource "aws_lambda_function_url" "streaming" {
	function_name      = aws_lambda_function.streaming.function_name
	authorization_type = "NONE"
	invoke_mode        = "RESPONSE_STREAM"
}
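The same setting exists outside Terraform as well; for example, with the AWS CLI (my-function is a placeholder name here):

```shell
# Create a function URL that invokes the function in streaming mode
aws lambda create-function-url-config \
	--function-name my-function \
	--auth-type NONE \
	--invoke-mode RESPONSE_STREAM
```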

Headers and cookies

The buffered response type supports returning arbitrary headers and cookies in the response. This is supported for streaming responses as well, through another utility function, awslambda.HttpResponseStream.from(responseStream, metadata):

await pipeline(res.Body, awslambda.HttpResponseStream.from(responseStream, {
	statusCode: 200,
	headers: {
		"Cache-control": "no-store",
		"Content-Type": res.ContentType,
	},
	cookies: ["testcookie=value"],
}));

awslambda.HttpResponseStream.from returns another writable stream that needs to be used in place of the wrapped one.

The above code provides a generic structure to define the response. The res.Body is a readable stream that is piped into the response stream, and the status code, headers, and cookies are defined in the metadata parameter.

Tests

Let’s run a couple of tests to see what speedup streaming provides over buffering!

For this, I’ll use 3 test files:

  • small: 0.2 MB
  • medium: 4.4 MB
  • large: 13.8 MB

For each of these files, I’ll measure the total response time using the autocannon library and the time to first byte using ttfb.

Small (0.2 MB)

Buffered:

npx autocannon -c 1 $(terraform output -raw small_buffered)
┌─────────┬────────┬────────┬────────┬────────┬───────────┬───────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev     │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼───────────┼────────┤
│ Latency │ 228 ms │ 302 ms │ 649 ms │ 649 ms │ 343.21 ms │ 102.41 ms │ 649 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴───────────┴────────┘

Streaming:

npx autocannon -c 1 $(terraform output -raw small_streaming)
┌─────────┬────────┬────────┬─────────┬─────────┬───────────┬───────────┬─────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%   │ 99%     │ Avg       │ Stdev     │ Max     │
├─────────┼────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│ Latency │ 196 ms │ 238 ms │ 1331 ms │ 1331 ms │ 299.25 ms │ 217.79 ms │ 1331 ms │
└─────────┴────────┴────────┴─────────┴─────────┴───────────┴───────────┴─────────┘

A second run, apparently with less deviation:

┌─────────┬────────┬────────┬────────┬────────┬───────────┬──────────┬────────┐
│ Stat    │ 2.5%   │ 50%    │ 97.5%  │ 99%    │ Avg       │ Stdev    │ Max    │
├─────────┼────────┼────────┼────────┼────────┼───────────┼──────────┼────────┤
│ Latency │ 168 ms │ 205 ms │ 387 ms │ 415 ms │ 218.83 ms │ 52.07 ms │ 415 ms │
└─────────┴────────┴────────┴────────┴────────┴───────────┴──────────┴────────┘

The numbers are a bit smaller compared to the buffered case, but I did not find much difference between the two.

TTFB:

npx ttfb -n 10 $(terraform output -raw small_buffered) $(terraform output -raw small_streaming)
...  fastest .156445 slowest .303719 median .206308
...  fastest .118235 slowest .179110 median .151772

Again, the numbers show a slight improvement compared to the buffered case, but not substantial.

Medium (4.4 MB)

Buffered:

npx autocannon -c 1 $(terraform output -raw medium_buffered)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬───────┬─────────┐
│ Stat    │ 2.5%    │ 50%     │ 97.5%   │ 99%     │ Avg     │ Stdev │ Max     │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼───────┼─────────┤
│ Latency │ 3890 ms │ 3890 ms │ 4036 ms │ 4036 ms │ 3963 ms │ 73 ms │ 4036 ms │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴───────┴─────────┘

Streaming:

npx autocannon -c 1 $(terraform output -raw medium_streaming)
┌─────────┬─────────┬─────────┬─────────┬─────────┬───────────┬──────────┬─────────┐
│ Stat    │ 2.5%    │ 50%     │ 97.5%   │ 99%     │ Avg       │ Stdev    │ Max     │
├─────────┼─────────┼─────────┼─────────┼─────────┼───────────┼──────────┼─────────┤
│ Latency │ 1658 ms │ 1766 ms │ 2600 ms │ 2600 ms │ 2039.5 ms │ 368.6 ms │ 2600 ms │
└─────────┴─────────┴─────────┴─────────┴─────────┴───────────┴──────────┴─────────┘

This time the difference is very noticeable: even the fastest buffered response is slower than the slowest streaming one.

TTFB:

npx ttfb -n 10 $(terraform output -raw medium_buffered) $(terraform output -raw medium_streaming)
...  fastest 2.09754 slowest 4.14310 median 2.490555
...  fastest .132218 slowest .687806 median .145763

As expected, the TTFB did not increase with the file size in the streaming case. In the buffered case, the delay is very noticeable.

Large (13.8 MB)

Since the buffered version does not allow returning this file, there is no point comparing the response times, so I just ran the TTFB test.

npx ttfb -n 10 $(terraform output -raw large_streaming)
fastest .122149 slowest .183837 median .160531

Slow downloads

A Lambda function is sensitive to execution time, both because of the risk of hitting the timeout and because a slow function costs more than a quick one. This prompted the question: what happens if the client consuming the response is slow? Would the function be stuck waiting for its output to be consumed by the client?

Fortunately, no. The Lambda runtime does buffering in this case as well and the transfer time of the client does not affect the function execution.

In this case the download took more than a minute and still finished successfully, even though the function execution timeout was set to 30 seconds. And the function's execution did not take longer than in the other cases.



This post first appeared on Blog - Advanced Web Machinery.
