Working with Amazon Textract (Part 2)

Shubham Singh
4 min read · Apr 8, 2020

This is a continuation of part one of this blog. In this post we’ll write some code and walk through creating the infrastructure shown in the previous post. We’ll use the AWS CDK (Cloud Development Kit) to define our infrastructure in Python. Yes, you don’t need to write CloudFormation templates in YAML or JSON; you can write them in your favorite language.

At present, the AWS CDK supports TypeScript, JavaScript, Python, Java, and C#/.NET.

Prerequisites

All CDK developers need to install Node.js (>= 10.3.0), even those working in languages other than TypeScript or JavaScript. The AWS CDK Toolkit (cdk command-line tool) and the AWS Construct Library are developed in TypeScript and run on Node.js. The bindings for other supported languages use this backend and toolset.
You must provide your credentials and an AWS Region to use the AWS CDK CLI, as described in Specifying Your Credentials and Region.

Other prerequisites depend on your development language, as follows.
1. Python >= 3.6
2. TypeScript >= 2.7
For more information about the supported versions, go through this document: DOC.
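As a quick sanity check before installing anything, the prerequisites above can be verified from Python itself. This is a minimal sketch using only the standard library; the helper names `python_ok` and `missing_tools` are made up for illustration, and it only checks that `node` and `npm` are on your PATH, not which Node.js version is installed.

```python
import shutil
import sys

# Minimum Python version listed in the prerequisites above.
MIN_PYTHON = (3, 6)


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_tools(tools=("node", "npm")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    absent = missing_tools()
    print("Missing tools:", ", ".join(absent) if absent else "none")
```

If `node` or `npm` shows up as missing, install Node.js first, since the cdk command-line tool runs on it.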

# installing aws-cdk globally
$ npm install -g aws-cdk
# updating your language dependencies
$ pip install --upgrade aws-cdk.core
# updating and installing python dependencies
$ python -m ensurepip --upgrade
$ python -m pip install --upgrade pip
$ python -m pip install --upgrade virtualenv

Configure your AWS account with the aws-cli tool. If you don’t have the aws-cli tool, please install it, then run the following command.

$ aws configure
AWS Access Key ID [****************IX76]: <YOUR ACCESS KEY ID>
AWS Secret Access Key [****************v+RW]: <YOUR SECRET KEY>
Default region name [us-east-1]: <SPECIFY THE REGION>
Default output format [json]: <OPTIONAL>
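To confirm that the values were saved, you can read back the files that aws configure writes (by default `~/.aws/credentials` and `~/.aws/config`). The sketch below is only an illustration using the standard library; the function name `read_profile` is hypothetical, and it deliberately reports whether the keys exist rather than printing them.

```python
import configparser
from pathlib import Path


def read_profile(credentials_path, config_path, profile="default"):
    """Collect what `aws configure` stored for one profile.

    Returns a dict with 'has_access_key', 'has_secret_key' and 'region'.
    """
    credentials = configparser.ConfigParser()
    credentials.read(credentials_path)  # missing files are silently skipped
    config = configparser.ConfigParser()
    config.read(config_path)

    # In the config file, non-default profiles are named "profile <name>".
    config_section = profile if profile == "default" else f"profile {profile}"

    return {
        "has_access_key": credentials.has_option(profile, "aws_access_key_id"),
        "has_secret_key": credentials.has_option(profile, "aws_secret_access_key"),
        "region": config.get(config_section, "region", fallback=None),
    }


if __name__ == "__main__":
    aws_dir = Path.home() / ".aws"
    print(read_profile(aws_dir / "credentials", aws_dir / "config"))
```

If `region` comes back as `None`, run aws configure again and specify a region, because the CDK CLI needs one.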
# For this lab we'll use awscdk as the project name; if you change it, you'll also need to update the code accordingly.
$ mkdir awscdk
$ cd awscdk
$ cdk init app --language python

cdk init uses the name of the project folder to name various elements of the project, including classes, subfolders, and files.
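To make that naming concrete for this lab, here is a rough sketch of how a folder name could map to project names. The helper `project_names` is hypothetical and only approximates the convention; `awscdk_stack.py` is the file referred to later in this post, while the class name `AwscdkStack` is my assumption based on the default Python template, so check the generated files for the actual names.

```python
def project_names(folder):
    """Approximate the names derived from a project folder name.

    This is an illustration, not the CDK's own implementation.
    """
    parts = [p for p in folder.replace("-", "_").split("_") if p]
    module = "_".join(p.lower() for p in parts)
    class_name = "".join(p.capitalize() for p in parts) + "Stack"
    return {
        "package": module,
        "stack_file": f"{module}/{module}_stack.py",
        "stack_class": class_name,
    }


if __name__ == "__main__":
    for key, value in project_names("awscdk").items():
        print(f"{key}: {value}")
```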

After initializing the project, activate the project’s virtual environment. This allows the project’s dependencies to be installed locally in the project folder, instead of globally.

$ source .env/bin/activate
# Then install the app's standard dependencies:
$ pip install -r requirements.txt

This will activate the environment and install the standard dependencies.

$ cdk synth

This command synthesizes an AWS CloudFormation template from one or more of the stacks in your AWS CDK app.
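If you have never looked at a CloudFormation template, it helps to see its general shape. The sketch below builds a tiny hand-written template with a single S3 bucket using plain Python, with no CDK involved. The logical ID `DocumentsBucket` is borrowed from the bucket name used later in this post; a real synthesized template will contain generated logical IDs, metadata, and many more resources.

```python
import json


def build_template(bucket_logical_id="DocumentsBucket"):
    """Return a minimal CloudFormation template with one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Each resource has a logical ID and a CloudFormation type.
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The point of the CDK is that you describe resources like this bucket as Python objects and let the toolkit generate the template for you.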

$ cdk bootstrap

It deploys the CDK toolkit stack into an AWS environment.

$ cdk deploy

It deploys the stack(s) named STACKS into your AWS account.

For more information about cdk commands, please refer to this LINK.

Download the folder named awscdk from this GitHub repository, https://github.com/denyshubh/Aws-Textract-Demo/

The folder should look as shown in this image. Replace the awscdk folder inside your project directory with the folder that you downloaded from the Git repository.

The awscdk_stack.py file contains the infrastructure code for this project.

$ cdk diff

It compares the specified stack with the deployed stack or a local template file, and exits with status 1 if any difference is found. It is a good idea to run this command before every deployment to review what will change and check that your code is ready to be deployed.
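To get a feel for what such a comparison involves, here is a simplified sketch that diffs the Resources sections of two template dictionaries and derives an exit status in the way described above. The functions `diff_resources` and `exit_status` are hypothetical helpers written for this post; the real command compares much more than this (for example parameters, outputs, and security-related changes).

```python
def diff_resources(deployed, local):
    """Compare the Resources sections of two template dicts."""
    old = deployed.get("Resources", {})
    new = local.get("Resources", {})
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


def exit_status(diff):
    """Return 1 if anything differs, otherwise 0."""
    return 1 if any(diff.values()) else 0


if __name__ == "__main__":
    deployed = {"Resources": {"Bucket": {"Type": "AWS::S3::Bucket"}}}
    local = {
        "Resources": {
            "Bucket": {"Type": "AWS::S3::Bucket"},
            "Queue": {"Type": "AWS::SQS::Queue"},
        }
    }
    result = diff_resources(deployed, local)
    print(result, "exit status:", exit_status(result))
```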

# generate the CloudFormation template and verify it.
$ cdk synth
# deploy the infrastructure to cloud.
$ cdk deploy

After the deployment is successful, you’ll get a CloudFormation stack ARN.

AWS CLOUD FORMATION DESIGN

Upload files to the S3 DocumentsBucket (left image). The infrastructure will generate the Textract report in the same S3 bucket, within a folder. The folder contents are shown in the right image.
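You can upload through the S3 console, or from a script. Below is a minimal sketch using boto3 (the AWS SDK for Python, installed separately with pip). The helper names `object_key` and `upload_document` are hypothetical, and the bucket name is a placeholder: CloudFormation generates the real name at deploy time, so look it up in the S3 console or in the stack's resources.

```python
from pathlib import Path


def object_key(local_path, prefix=""):
    """Build the S3 object key for a local file, with an optional prefix."""
    name = Path(local_path).name
    return f"{prefix.rstrip('/')}/{name}" if prefix else name


def upload_document(local_path, bucket_name, prefix=""):
    """Upload one file to the given bucket and return the key that was used."""
    # boto3 is a third-party package; it is imported here so that
    # object_key() above can be used without it.
    import boto3

    key = object_key(local_path, prefix)
    boto3.client("s3").upload_file(str(local_path), bucket_name, key)
    return key


if __name__ == "__main__":
    # Placeholder values: replace them with your own file and bucket name.
    print(object_key("samples/invoice.pdf"))
    # upload_document("samples/invoice.pdf", "<YOUR DOCUMENTS BUCKET NAME>")
```

The upload uses the credentials and region that you set up earlier with aws configure.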

You can watch all the process logs in CloudWatch log groups. If any error occurs, you can read the logs to resolve it.

If you want to delete the stack and all the resources that it created, just run the command below.

$ cdk destroy awscdk

Here is a link to the Amazon Textract documentation. I suggest going through it for a better understanding.
https://docs.aws.amazon.com/textract/latest/dg/textract-dg.pdf

Also, you’ll find all the code from this blog on the AWS GitHub (TypeScript):
https://github.com/aws-samples/amazon-textract-serverless-large-scale-document-processing

If you want the stack in Python, refer to my GitHub: https://github.com/denyshubh/Aws-Textract-Demo

Link to part one of this blog: https://medium.com/@shubham.singh98/working-with-amazon-textract-part-1-2a9225703c20

Thank You and Happy Coding.

Shubham Singh

I am a Software Engineer with 2 years of experience working for Dassault Systems and FIS, with a keen interest in Kubernetes and container technology.