DynamoDB, NodeJS vs. Python and persistent connections

Recently, Yan Cui wrote an enlightening blog post about using keep-alive HTTP connections to significantly speed up DynamoDB operations. He showed how to do it in NodeJS, and I was curious how to do it in Python.

To my surprise, I found out I did not have to do anything at all: DynamoDB keeps the connection open, and boto3 reuses it. See for yourself: run aws dynamodb list-tables --debug with the AWS CLI and look at the response headers section, which looks something like this:

 Response headers:
 {'Server': 'Server', 
  'Date': 'Thu, 07 Mar 2019 19:42:55 GMT', 
  'Content-Type': 'application/x-amz-json-1.0', 
  'Content-Length': '328', 
  'Connection': 'keep-alive', 
  'x-amzn-RequestId': '38N9IJV176MACH027DNIRT5C53VV4KQNSO5AEMVJF66Q9ASUAAJG', 
  'x-amz-crc32': '2150813651'}

DynamoDB sends the Connection: keep-alive header. Unless a client explicitly asks for Connection: close, the connection stays open. Yet explicitly closing it is exactly what NodeJS does by default. Thank you to Stefano Buliani for providing additional visibility into this. The AWS SDK for JavaScript (aws-sdk-js) inherits this behaviour. I think that’s a mistake, so I’ve opened an issue in its GitHub repo. Until that’s resolved, if you’re writing code in JS, be sure to follow Yan’s recommendation.
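The keep-alive rule above is plain HTTP/1.1 behaviour and is not specific to DynamoDB. Here is a minimal sketch that uses only Python's standard library and involves no AWS at all. The handler and helper names are made up for illustration. A local server echoes each request's client port, so you can see one HTTPConnection reuse a single TCP connection until the request asks for Connection: close.

```python
import http.client
import http.server
import threading


class PortEchoHandler(http.server.BaseHTTPRequestHandler):
    # Speak HTTP/1.1 so connections stay open unless "Connection: close" is sent
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port: same port means same TCP connection
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        if self.headers.get("Connection", "").lower() == "close":
            # Echo the client's wish so http.client knows the socket will be closed
            self.send_header("Connection", "close")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def source_ports(port, requests, connection_header=None):
    """Send several GETs through one HTTPConnection; return the source port the server saw each time."""
    headers = {"Connection": connection_header} if connection_header else {}
    conn = http.client.HTTPConnection("127.0.0.1", port)
    ports = []
    for _ in range(requests):
        conn.request("GET", "/", headers=headers)
        # Read the whole body so the connection is free for the next request
        ports.append(int(conn.getresponse().read()))
    conn.close()
    return ports


def run_demo(requests=3):
    """Start a throwaway local server and compare default vs. Connection: close requests."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        kept = source_ports(port, requests)
        closed = source_ports(port, requests, "close")
    finally:
        server.shutdown()
        server.server_close()
    return kept, closed


if __name__ == "__main__":
    kept, closed = run_demo()
    print("keep-alive:", kept)
    print("close:     ", closed)
```

With the default headers, every request reports the same source port. With Connection: close, the server closes the socket after each response and http.client quietly reconnects for the next request, so the ports differ.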

Connection: keep-alive vs. close in Python

I was still curious whether I could replicate Yan’s findings in Python. Here’s a log of running a single PutItem operation repeatedly with the vanilla boto3 DynamoDB client:

[Screenshot: boto3-dynamodb-default, PutItem timings with the default client]

Apart from the first call, which has to open the connection, most of them take under 10 ms because the connection is kept open.

However, when I explicitly added the Connection: close header, things looked very different:

[Screenshot: boto3-dynamodb-connection-close, PutItem timings with the Connection: close header]

Operations took at least 50 ms, often longer. This is in line with Yan’s findings.

Granted, my approach was not very rigorous. For the sake of replicability, here’s the code I used. Feel free to run your own experiments and let me know what you found.

Unit testing AWS services in Python

Consider the following piece of code (models.py):

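import boto3

# Module-level Table resource (creating it requires a default AWS region to be configured)
Table = boto3.resource('dynamodb').Table('foo')


def get_user(user_id):
    ddb_response = Table.get_item(Key={'id': user_id})
    # get_item omits 'Item' from the response when no item has this key
    return ddb_response.get('Item')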

It’s a contrived example that just reads an item of data from a DynamoDB table. How would you write a unit test for the get_user function?

My favourite way to do so is to combine pytest fixtures and botocore’s Stubber (test_models.py):

from botocore.stub import Stubber, ANY
import pytest

import models


@pytest.fixture(scope="function")
def ddb_stubber():
    # Stub the low-level client that the module-level Table uses under the hood
    ddb_stubber = Stubber(models.Table.meta.client)
    ddb_stubber.activate()
    yield ddb_stubber
    ddb_stubber.deactivate()


def test_user_exists(ddb_stubber):
    user_id = 'user123'
    get_item_params = {'TableName': ANY,
                       'Key': {'id': user_id}}
    # Stubbed responses use the low-level typed format ({'S': ...}); the Table converts them
    get_item_response = {'Item': {'id': {'S': user_id},
                                  'name': {'S': 'Spam'}}}
    ddb_stubber.add_response('get_item', get_item_response, get_item_params)

    result = models.get_user(user_id)

    assert result.get('id') == user_id
    ddb_stubber.assert_no_pending_responses()


def test_user_missing(ddb_stubber):
    user_id = 'user123'
    get_item_params = {'TableName': ANY,
                       'Key': {'id': user_id}}
    # An empty response means no item matched the key
    get_item_response = {}
    ddb_stubber.add_response('get_item', get_item_response, get_item_params)

    result = models.get_user(user_id)

    assert result is None
    ddb_stubber.assert_no_pending_responses()

There are a couple of things to note here.

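First, I’m using the wonderful scope functionality of pytest fixtures. With scope="function", pytest creates a new fixture for every test function it runs. Function is also the default scope, so spelling it out just makes the intent explicit. This matters for the Stubber: each test gets a freshly activated Stubber with an empty response queue, and the Stubber is deactivated again when the test finishes, so responses queued in one test can’t leak into the next.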

Second, the Stubber has to wrap the client that actually makes the calls. Since models.py uses a DynamoDB Table resource, I pass its underlying low-level client, models.Table.meta.client, when creating the Stubber instance.

Notice also the “verbose” get_item_response structure in the first test. That’s the format the low-level DynamoDB client exchanges with the DynamoDB API, where every value is tagged with its type ({'S': ...} for a string). Needless to say, this is DynamoDB specific. The Table is a layer of abstraction on top of the client that converts between DynamoDB types and Python types. However, it still uses the client underneath, so the stubbed response must use this structure nevertheless.

Finally, it’s good practice to call assert_no_pending_responses to make sure the tested code actually made the call to an AWS service.

I really like this combination of pytest and Stubber. It’s a great match for writing correct and compact tests.