Here’s a short recipe for transferring files from an external source to an S3 bucket without downloading the whole file first and needlessly allocating memory for it:
It takes advantage of the streaming capability of the requests library.
Even with files over 2 GB in size, the Lambda container consumed only about 120 MB of memory. Pretty sweet. Of course, this approach is applicable to any platform, not just Lambda.
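A minimal sketch of the idea, assuming the requests and boto3 libraries. The function name stream_to_s3 and the injectable http_get and s3_client parameters are hypothetical, added here only to keep the example easy to try out and test.

```python
def stream_to_s3(url, bucket, key, http_get=None, s3_client=None):
    """Copy the body at `url` to s3://bucket/key without holding it all in memory."""
    if http_get is None:
        import requests  # third-party, imported lazily
        http_get = requests.get
    if s3_client is None:
        import boto3  # third-party, imported lazily
        s3_client = boto3.client("s3")

    # stream=True defers downloading the body until response.raw is read.
    response = http_get(url, stream=True)
    try:
        response.raise_for_status()
        # upload_fileobj reads the file-like object chunk by chunk.
        s3_client.upload_fileobj(response.raw, bucket, key)
    finally:
        response.close()
```

Calling stream_to_s3("https://example.com/big.bin", "my-bucket", "big.bin") would then copy the file across. Note that response.raw hands over the bytes exactly as they arrive, so a compressed response is stored compressed.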
In my day job, we’re using Lambda and Step Functions to create data processing pipelines. This combo works great for a lot of our use cases. However, for some specific long-running tasks (e.g. web scrapers), we “outsource” the computing from Lambda to Fargate.
This poses an issue – how to plug that part of the pipeline into the Step Function orchestrating it. Using an Activity does not work when the processing is distributed among multiple workers.
A solution I came up with is creating a gatekeeper loop in the Step Function, in which a Lambda function oversees the progress of the workers. This is how it looks:
The gatekeeper function (triggered by the GatekeeperState) checks whether the external workers have finished yet. This can be done by waiting until an SQS queue is empty, counting the number of objects in an S3 bucket, or any other way of indicating that the processing can move on to the next state.
If the processing is not done yet, the gatekeeper function raises a NotReadyError. This is caught by the Retry block in the Step Function, which pauses the execution for a certain period of time, as defined by its parameters. Afterwards, the gatekeeper is called again.
Eventually, if the work is not done even after MaxAttempts retries, the ForceGatekeeperState is triggered. It adds a "force: true" parameter to the invocation event and calls the gatekeeper right back again. Notice that the gatekeeper function checks for this force parameter as the very first thing when executed. Since it’s set by the ForceGatekeeperState, the function returns immediately and the Step Function moves on to the DoneState.
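A gatekeeper function along these lines could be sketched as follows. The make_gatekeeper factory, the queue_is_empty check and the queue_url event field are hypothetical names chosen for illustration; only NotReadyError and the force parameter come from the description above.

```python
class NotReadyError(Exception):
    """Raised while the external workers are still busy; matched by the Retry block."""


def make_gatekeeper(is_done):
    """Build a Lambda handler around a hypothetical `is_done(event) -> bool` check."""
    def handler(event, context):
        # Forced run coming from the ForceGatekeeperState: skip the check entirely.
        if event.get("force"):
            return event
        if not is_done(event):
            raise NotReadyError("workers have not finished yet")
        return event
    return handler


def queue_is_empty(event):
    """Example check: the SQS queue named in the event has no messages left."""
    import boto3  # third-party, imported lazily
    sqs = boto3.client("sqs")
    attributes = sqs.get_queue_attributes(
        QueueUrl=event["queue_url"],  # assumed event field
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    return all(int(value) == 0 for value in attributes.values())


# Entry point to configure as the Lambda handler.
handler = make_gatekeeper(queue_is_empty)
```

Passing the check in as a function keeps the force and retry logic separate from whatever signal tells you the workers are done.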
For our use case, it was better to have partial results than no results at all. That’s why the ForceGatekeeperState is present. You can also leave it out altogether and have the Step Function execution fail after MaxAttempts retries of the gatekeeper.
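For reference, a state machine with this shape might look roughly like the definition below, written as a Python dict and serialized to JSON. The Lambda ARN is a placeholder, the interval and attempt counts are arbitrary, and modelling the ForceGatekeeperState as a Pass state is an assumption rather than the exact definition used here.

```python
import json

state_machine = {
    "StartAt": "GatekeeperState",
    "States": {
        "GatekeeperState": {
            "Type": "Task",
            # Placeholder ARN; substitute the real gatekeeper function.
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:gatekeeper",
            "Retry": [{
                "ErrorEquals": ["NotReadyError"],
                "IntervalSeconds": 60,   # pause between checks
                "MaxAttempts": 10,       # give up waiting after this many retries
                "BackoffRate": 1.0,      # keep the interval constant
            }],
            # Reached only once the retries above are exhausted.
            "Catch": [{
                "ErrorEquals": ["NotReadyError"],
                "ResultPath": "$.error",
                "Next": "ForceGatekeeperState",
            }],
            "Next": "DoneState",
        },
        "ForceGatekeeperState": {
            "Type": "Pass",
            "Result": True,
            "ResultPath": "$.force",  # adds "force": true to the event
            "Next": "GatekeeperState",
        },
        "DoneState": {"Type": "Succeed"},
    },
}

definition = json.dumps(state_machine, indent=2)
```

Dropping the Catch block and the ForceGatekeeperState gives the variant in which the execution simply fails once the retries run out.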
The default way of creating a zip package that’s to be deployed to AWS Lambda is to place everything – your source code and any libraries you are using – in the service root directory and compress it. I don’t like this approach: the flat hierarchy can lead to naming conflicts, it makes packaging of isolated functions harder to manage, and it creates a mess in the source directory.
What I do instead is install all dependencies into a lib directory (which is as simple as a pip install -r requirements.txt -t lib step in the deployment pipeline) and set the PYTHONPATH environment variable to /var/runtime:/var/task/lib when deploying the Lambda functions.
This works because the zip package is extracted into /var/task in the Lambda container. While it might seem like an unstable solution, I’ve been using it for over a year now without any problems.
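To see what the PYTHONPATH setting does, here is a small local sketch that starts a fresh interpreter with the variable set and reports the resulting sys.path. The helper name child_sys_path is hypothetical; nothing here is Lambda-specific.

```python
import json
import os
import subprocess
import sys


def child_sys_path(pythonpath):
    """Return sys.path as seen by a fresh interpreter started with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    output = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(output)
```

Running child_sys_path("/var/runtime:/var/task/lib") on a Unix-like system should list both directories ahead of the standard library locations, which is why modules installed into lib are importable without any changes to the function code.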
This is a pre-commit hook I use in my Python projects.
Never mind my weak bash-fu; in the end the script does what I want it to, namely the following three things:
- First, it checks that I haven’t forgotten to add a new module to the requirements.txt file. Most of the time this works like a charm with virtualenv and pip. The only drawback is installing modules in local experimental branches – these modules are not necessary in upstream branches and so they don’t belong in requirements.txt yet. When you switch back and want to commit in an upstream branch, the pre-commit hook fails. However, this is easily avoidable by using the --no-verify option of git commit.
- Second, it runs pyflakes on all the .py files in the repository. If there’s something pyflakes doesn’t like, the pre-commit hook fails and shows the output of pyflakes. There’s one case which is ignored and that is using the _ (underscore) function from the gettext module as install makes it available everywhere. Pyflakes documentation is non-existent and I guess there’s no way to create a configuration profile for it, so I had to resort to this hack.
- Finally, since I deployed code with forgotten set_trace() calls a couple of times, the third thing the script does is check for these and print out the file and line number if it encounters any.
I keep this file as a part of the repository, making a symbolic link to it in .git/hooks/pre-commit. Make sure the file is executable.
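The third check is easy to express in Python as well. The following is a rough sketch rather than the bash hook itself; the function names and the regular expression are assumptions made for this example.

```python
import re

# Matches pdb.set_trace(), ipdb.set_trace(), a bare set_trace( and so on.
SET_TRACE = re.compile(r"\bset_trace\s*\(")


def find_set_trace(paths):
    """Yield (path, line_number, line) for every line that calls set_trace()."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as source:
            for number, line in enumerate(source, start=1):
                if SET_TRACE.search(line):
                    yield path, number, line.rstrip()


def check(paths):
    """Print offending lines and return an exit status for the hook."""
    hits = list(find_set_trace(paths))
    for path, number, line in hits:
        print(f"{path}:{number}: {line}")
    # A non-zero exit status makes git abort the commit.
    return 1 if hits else 0
```

A hook script would pass the list of staged .py files to check() and hand its return value to sys.exit().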
Do you have similar stuff in your VCS hooks? Is there anything I could improve in mine? I’ll be glad to see your tips in the comments.
TL;DR To programmatically verify Google Play subscriptions, you have to use the OAuth 2.0 for web server applications auth flow, not service accounts. This is a massive FUBAR on Google’s side and makes developers’ lives very painful.
Lately, I’ve been working on the backend part of an upcoming app we’re developing for one of our clients. This app offers monthly and yearly subscriptions, so I had to implement a check that the recurring payment happened, the credit card got billed and the app provider got the money. Of course, for multiple reasons, this has to be done server-side, completely automatically and without any intervention from the app user or provider.
Google provides an API called android-publisher for this. To use any API from Google, you first have to enable it in the Console and then authenticate with it. The authentication is done via OAuth 2.0. As Google offers API access to many of their services, which are used in different situations, they also offer different OAuth 2.0 authentication flows.
The flow/mechanism for server-to-server communication is called Service accounts in Google terminology. This is precisely what I needed. However, for reasons beyond my understanding, this is not the one used for the android-publisher API. Instead, they chose the Web server applications flow, which for this use case is absurd.
(Sidenote: When we started to build the aforementioned app, recurring transactions were not even available for Android. We planned to use PayPal as we did for the BlackBerry version. However, during development, Google introduced subscriptions for Android, which made us happy.
I started reading the docs and implementing the whole auth and check code, but it didn’t work; I was getting a “This developer account does not own the application.” HTTP 401 error. Googling for this didn’t help – at that time, the only search results were two Stack Overflow questions that were a couple of hours old. I would swear the docs at that time said to use Service accounts for authentication and Google changed it later. I had to re-read the docs from the beginning to debug this infuriating error.)
Using the Web server applications flow is ridiculous because human interaction is involved. At least once, you (in this case our client!) need to press an “Allow” button in your web browser. Palm, meet face.
Here are the instructions you need to follow to achieve automated subscription verification. The code is in Python but it’s easy to adapt.
First of all, in the Console, you need to create a Client ID for Web applications. You can use http://localhost as the redirect hostname. As you’ll see in a minute, it doesn’t matter much. You mostly need the Client ID and Client secret.
Next, fire up the Python REPL and enter this:
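A minimal sketch of this step using only the standard library instead of oauth2client. The endpoint, the scope and the parameter names follow Google’s OAuth 2.0 web server flow; build_auth_url is a hypothetical helper name and "YOUR_CLIENT_ID" is a placeholder. oauth2client adds access_type=offline on its own, whereas here it is set explicitly.

```python
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth"
SCOPE = "https://www.googleapis.com/auth/androidpublisher"


def build_auth_url(client_id, redirect_uri="http://localhost"):
    """Return the URL to open in a browser to authorize API access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": SCOPE,
        # Without offline access, no refresh token is issued.
        "access_type": "offline",
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


auth_url = build_auth_url("YOUR_CLIENT_ID")  # placeholder client ID
```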
Use the Client ID and Client secret you obtained from the Console. This piece of code will give you an authentication URL; by default, it will contain the access_type=offline parameter. This is very important, so make sure it’s there. Open the URL in your browser and log in with the Google account that you will be using for publishing the Android application. After a successful login and authorization, you’ll be redirected to localhost in your browser. Unless you’re running a webserver locally, this will probably fail, but it doesn’t matter. The address you are redirected to will contain a code parameter. Copy its value and go back to the REPL again:
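Exchanging the code for tokens can be sketched with the standard library as well. This is an illustrative equivalent of the oauth2client exchange step, not the library’s own code; the helper names are hypothetical and the token endpoint is Google’s current one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_code_exchange(code, client_id, client_secret,
                        redirect_uri="http://localhost"):
    """Build the POST request that trades an authorization code for tokens."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        # Must match the redirect URI used when building the auth URL.
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    return Request(
        TOKEN_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def exchange_code(code, client_id, client_secret,
                  redirect_uri="http://localhost"):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_code_exchange(code, client_id, client_secret, redirect_uri)
    with urlopen(request) as response:
        return json.load(response)
```

The returned dict should carry an access_token and, because offline access was requested, a refresh_token.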
Finally, you’ve got an instance of the oauth2client.client.OAuth2Credentials class. It contains a couple of properties, but the only one that’s really interesting is the refresh_token. Store the refresh token in your server configuration; you can use it indefinitely, that is, until someone revokes the access to the API. In that case you would have to go through this whole process again.
Basically, thanks to this refresh token you will be able to obtain a new access token on each call to the API. To do that, you create an instance of
OAuth2Credentials and use that to authorize an
You can now build a service and make the get purchases API call.
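As a rough standard-library sketch of the same sequence (refresh the access token, then query the subscription): the helper names are hypothetical, and the URL below assumes the v3 layout of the Android Publisher API, which should be checked against the current reference before use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"
# Assumed API root (v3); verify against the current API reference.
API_ROOT = "https://androidpublisher.googleapis.com/androidpublisher/v3"


def build_refresh_request(refresh_token, client_id, client_secret):
    """Build the POST request that trades a refresh token for an access token."""
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(TOKEN_ENDPOINT, data=body)


def subscription_url(package_name, subscription_id, purchase_token):
    """URL of the subscription purchase resource for one purchase token."""
    return "{}/applications/{}/purchases/subscriptions/{}/tokens/{}".format(
        API_ROOT,
        quote(package_name, safe=""),
        quote(subscription_id, safe=""),
        quote(purchase_token, safe=""),
    )


def get_subscription(refresh_token, client_id, client_secret,
                     package_name, subscription_id, purchase_token):
    """Fetch the purchase record using a freshly obtained access token."""
    refresh = build_refresh_request(refresh_token, client_id, client_secret)
    with urlopen(refresh) as response:
        access_token = json.load(response)["access_token"]
    request = Request(
        subscription_url(package_name, subscription_id, purchase_token),
        headers={"Authorization": "Bearer " + access_token},
    )
    with urlopen(request) as response:
        return json.load(response)
```

Only the refresh token, Client ID and Client secret need to live in the server configuration; a new access token is requested on every call.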
The following gist summarizes the whole blogpost:
As long as the API access is not revoked, you should be fine using this method.
I gave a talk at Prague’s Python user group meetup. It was about my experience of learning and using Obj-C to develop iOS apps as a Python developer. You can check out the recorded video below. Slides are on Speakerdeck (I tried to embed them but Posterous doesn’t play nicely with Speakerdeck).
Myngo is a web administration interface for MongoDB. It is written in
Python, runs on Tornado and uses jQuery on the front-end. It is a
fresh, new project so there’s no package yet. If you want to try it
out, follow the instructions.
and some server info. You can also do some actions with the DBs and
collections. Check the screenshots for more details.
There are a lot of features planned for Myngo, the most significant being:
* querying (or an interactive console)
* user auth and permission system
* slick UI
* some kind of test suite
requests to the project’s bug tracker or directly to me. I’d also
appreciate any kind of help (especially with design and layout),
so feel free to fork and hack away. If you find Myngo valuable, please consider donating so I can spend
more time improving it.