{"id":600,"date":"2023-03-08T12:16:25","date_gmt":"2023-03-08T12:16:25","guid":{"rendered":"https:\/\/computeoncloud.eu\/?p=600"},"modified":"2023-03-08T12:23:09","modified_gmt":"2023-03-08T12:23:09","slug":"executing-ad-hoc-jobs-with-amazon-step-functions","status":"publish","type":"post","link":"https:\/\/computeoncloud.eu\/index.php\/2023\/03\/08\/executing-ad-hoc-jobs-with-amazon-step-functions\/","title":{"rendered":"Executing ad-hoc jobs with Amazon Step Functions"},"content":{"rendered":"\n<p>We came across a requirement from a client to process thousands of data records in a database, generate a report, and present it to the client. The requirements were as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Filter the data based on the request<\/li>\n\n\n\n<li>Monitor the state of every request<\/li>\n\n\n\n<li>There could be multiple requests at a time<\/li>\n\n\n\n<li>Processing a single request could take anywhere from 5 minutes to 5 hours<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">TL;DR \u26a1<\/h2>\n\n\n\n<p>We create an <strong>AWS Step Functions<\/strong> state machine that takes the request as input and generates the custom report from the database using <strong>ECS tasks<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Solution \ud83d\udd25<\/h2>\n\n\n\n<p>Suppose we receive a request from Client A, which has the ID&nbsp;<strong>1266cd5a-1dfa-44f2-83ca-534c30b38555<\/strong>&nbsp;in our system. The client wants to generate a report on all the pending transactions in December 2021. 
The input from the client looks as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n    \"filter\": {\n        \"client_id\": \"1266cd5a-1dfa-44f2-83ca-534c30b38555\",\n        \"start_date\": \"2021-12-01\",\n        \"end_date\": \"2021-12-31\",\n        \"status\": \"pending\"\n    }\n}<\/code><\/pre>\n\n\n\n<p>On the code side, we read this input from the environment variable <strong>REPORT_REQUEST<\/strong> and send it to another RESTful API to get the required data:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import json\nimport os\n\ndef main():\n    # The request arrives as a JSON string in REPORT_REQUEST\n    data_request_str = os.environ.get('REPORT_REQUEST')\n    if data_request_str is None:\n        raise EnvironmentError(\"REPORT_REQUEST isn't set\")\n    data_request = json.loads(data_request_str)\n    # Get data using this data request\n\nif __name__ == '__main__':\n    main()<\/code><\/pre>\n\n\n\n<p>We <strong>Dockerize<\/strong> the script above and push it to Amazon ECR, the AWS-managed Docker container registry. We then create a task definition, which tells AWS what resources the container needs, for example 4 GB of RAM.<\/p>\n\n\n\n<p>We create a <strong>Step Functions<\/strong> state machine that takes a JSON input, stringifies it, and sets it as the ECS task environment variable <strong>REPORT_REQUEST<\/strong>. 
The state machine definition looks like the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n  \"Comment\": \"Report Task\",\n  \"StartAt\": \"main\",\n  \"States\": {\n    \"main\": {\n      \"Type\": \"Task\",\n      \"Resource\": \"arn:aws:states:::ecs:runTask.sync\",\n      \"Parameters\": {\n        \"LaunchType\": \"FARGATE\",\n        \"Cluster\": \"arn:aws:ecs:us-west-2:123456789012:cluster\/reporting-cluster\",\n        \"TaskDefinition\": \"arn:aws:ecs:us-west-2:123456789012:task-definition\/client-report:1\",\n        \"NetworkConfiguration\": {\n          \"AwsvpcConfiguration\": {\n            \"Subnets\": &#91;\n              \"subnet-0e365e9c4570c3fd2\"\n            ],\n            \"SecurityGroups\": &#91;\n              \"sg-07066bac0ae8b88t2\"\n            ],\n            \"AssignPublicIp\": \"DISABLED\"\n          }\n        },\n        \"Overrides\": {\n          \"ContainerOverrides\": &#91;\n            {\n              \"Name\": \"main\",\n              \"Environment\": &#91;\n                {\n                  \"Name\": \"REPORT_REQUEST\",\n                  \"Value.$\": \"States.JsonToString($)\"\n                }\n              ]\n            }\n          ]\n        }\n      },\n      \"End\": true\n    }\n  }\n}<\/code><\/pre>\n\n\n\n<p>The <strong>runTask.sync<\/strong> integration blocks the step until the ECS task stops.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"561\" height=\"341\" src=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/Executing-ad-hoc-task-1.jpg\" alt=\"\" class=\"wp-image-611\" srcset=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/Executing-ad-hoc-task-1.jpg 561w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/Executing-ad-hoc-task-1-300x182.jpg 300w\" sizes=\"auto, (max-width: 561px) 100vw, 561px\" \/><\/figure>\n<\/div>\n\n\n<p>Now we just start a new execution of the state machine with 
the above input, name it client-a-report, and run it by clicking Start execution.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"265\" src=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1024x265.png\" alt=\"\" class=\"wp-image-601\" srcset=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1024x265.png 1024w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-300x78.png 300w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-768x199.png 768w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1536x397.png 1536w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image.png 1698w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The function&nbsp;<strong><code>States.JsonToString($)<\/code>&nbsp;<\/strong>converts the request JSON to a string.<br><br>If we receive another request from Client B, we can just start another execution, which takes care of starting another ECS task. Both reports can run in parallel, and we can monitor the state of all reports in the AWS Step Functions console.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"242\" src=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1-1024x242.png\" alt=\"\" class=\"wp-image-602\" srcset=\"https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1-1024x242.png 1024w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1-300x71.png 300w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1-768x181.png 768w, https:\/\/computeoncloud.eu\/wp-content\/uploads\/2023\/03\/image-1.png 1501w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Key Takeaways \ud83d\uddd2\ufe0f<\/h2>\n\n\n\n<p>This 
solution helps automate repetitive ad hoc tasks. We can achieve the same end result without Step Functions by calling the <strong>ECS run-task<\/strong> API directly. Step Functions helps us visualize the status of tasks and view their input in a more human-readable manner. We can also add retries in case of failures directly in the state machine definition.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We came across a requirement from a client to process thousands of data records in a database, generate a report, and present it to the client. Following are some of the requirements for this: TL;DR \u26a1 We create an AWS Step Function which takes the input and generate the custom reports from database using ECS &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/computeoncloud.eu\/index.php\/2023\/03\/08\/executing-ad-hoc-jobs-with-amazon-step-functions\/\"> <span class=\"screen-reader-text\">Executing ad-hoc jobs with Amazon Step Functions<\/span> Read More 
&raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":614,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,11,5],"tags":[],"class_list":["post-600","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aws","category-lambda","category-serverless"],"_links":{"self":[{"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/posts\/600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/comments?post=600"}],"version-history":[{"count":10,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/posts\/600\/revisions"}],"predecessor-version":[{"id":613,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/posts\/600\/revisions\/613"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/media\/614"}],"wp:attachment":[{"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/media?parent=600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/categories?post=600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/computeoncloud.eu\/index.php\/wp-json\/wp\/v2\/tags?post=600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}