Recently, Yan Cui wrote an enlightening blog post about using keep-alive HTTP connections to significantly speed up DynamoDB operations. He gave an example of how to do it in NodeJS, and I was curious how to do it in Python.
To my surprise, I found out I did not have to do anything at all: DynamoDB keeps the connection open. See for yourself: using the CLI, run aws dynamodb list-tables --debug and look at the response headers section, which looks something like this:
Response headers: {'Server': 'Server', 'Date': 'Thu, 07 Mar 2019 19:42:55 GMT', 'Content-Type': 'application/x-amz-json-1.0', 'Content-Length': '328', 'Connection': 'keep-alive', 'x-amzn-RequestId': '38N9IJV176MACH027DNIRT5C53VV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2150813651'}
The Connection: keep-alive header is set by DynamoDB. Unless the client explicitly sends Connection: close, the connection stays open. Yet that is exactly what NodeJS does. Thank you to Stefano Buliani for providing additional visibility into this. The aws-sdk-js inherits this behaviour. I think that's a mistake, so I've opened an issue in its GitHub repo. Until it's fixed, if you're writing code in JS, be sure to follow Yan's recommendation.
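If you'd rather check the header from Python than read the CLI's debug output, here is a minimal sketch. It assumes boto3 exposes the raw HTTP response headers under ResponseMetadata / HTTPHeaders in the response dict (it does in the versions I've used), and connection_header is a hypothetical helper name, not part of boto3.

```python
def connection_header(response):
    """Return the value of the Connection header in a boto3 response, or None.

    Hypothetical helper. boto3 responses carry the raw HTTP headers under
    ResponseMetadata -> HTTPHeaders; keys are usually lower-cased, but the
    lookup below ignores case to be safe.
    """
    headers = response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    for name, value in headers.items():
        if name.lower() == "connection":
            return value
    return None


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("dynamodb")
#   print(connection_header(client.list_tables()))
```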
Connection: keep-alive vs. close in Python
I was still curious whether I could replicate Yan's findings in Python. Here's a log of running single putItem operations, one after another, using a vanilla boto3 DynamoDB client:
Except for the first one, which has to open the connection, most of them take under 10 ms, since the connection is kept open.
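A measurement like this can be sketched as below. It is not the exact code behind the log above; it simply times sequential put_item calls made through one client, so every call after the first can reuse the pooled connection. The function name time_put_items, the table name keepalive-test and the string partition key pk are all hypothetical.

```python
import time


def time_put_items(client, table_name, n=10):
    """Issue n sequential PutItem calls and return each latency in milliseconds.

    Hypothetical helper. `client` is expected to be a boto3 DynamoDB client,
    and the item assumes a table whose partition key is a string attribute
    named "pk" (also hypothetical).
    """
    latencies = []
    for i in range(n):
        start = time.perf_counter()
        client.put_item(TableName=table_name, Item={"pk": {"S": f"item-{i}"}})
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Usage (needs boto3, AWS credentials and an existing table; name is hypothetical):
#   import boto3
#   for ms in time_put_items(boto3.client("dynamodb"), "keepalive-test"):
#       print(f"{ms:.1f} ms")
```

Reusing the same client matters: each boto3 client keeps its own connection pool, so creating a fresh client for every call would also throw away the kept-alive connection.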
However, when I explicitly added the Connection: close header, things looked very different:
Operations took at least 50 ms, often longer. This is in line with Yan’s findings.
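One way to add that header with boto3 is a handler on botocore's before-send event, which fires just before each HTTP request goes out (after it has been signed). This is a sketch under that assumption, not necessarily how the experiment above did it; force_connection_close and enable_connection_close are hypothetical names. The handler must return None, because botocore treats a non-None return value from before-send as the response and never sends the request.

```python
def force_connection_close(request, **kwargs):
    """Hypothetical botocore 'before-send' handler.

    Asks the server to close the connection after this response, so the next
    call has to open a new one. Returns None so the request is actually sent.
    """
    request.headers["Connection"] = "close"


def enable_connection_close(client):
    """Hypothetical helper: attach the handler to every PutItem call on `client`."""
    client.meta.events.register(
        "before-send.dynamodb.PutItem", force_connection_close
    )


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("dynamodb")
#   enable_connection_close(client)
#   # ...then time PutItem calls on `client` the same way as with a vanilla client.
```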
Granted, my approach was not very rigorous. For the sake of replicability, here's the code I used. Feel free to run your own experiments and let me know what you find.