Ross Youngblood - 10 months ago
Python Question

Python 'yield' statements cause JSON not serializable errors in LAMBDA AWS test case

I'm learning how to use Python in the Amazon AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I can copy the S3 object to a local tmp file, but I wanted to "stream" the S3 input into the script, process it, and write the output without the local copy stage if possible. I'm using code from this Stack Overflow question (second answer) that suggests a solution for this.

This code contains two "yield" statements, which cause my otherwise working script to throw a "generator is not JSON serializable" error.
I'm trying to understand why a "yield" statement would throw this error. Is this a Lambda environment restriction, or is something specific to my code creating the serialization issue (likely because I'm using an S3 file object)?

Here is my code that I run in Lambda. If I comment out the two yield statements it runs but the output file is empty.

from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey = "out" + inkey
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)

    except Exception as e:
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise e

    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")
    # upload_path = '/tmp/resized-{}'.format(key)

    with open(tmp_path,'w') as out:
        unfinished_line = ''
        for byte in infile:
            byte = unfinished_line + byte
            #split on whatever, or use a regex with re.split()
            lines = byte.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                yield line # This line causes JSON error if uncommented
        yield unfinished_line # This line causes JSON error if uncommented
    # Upload the file to S3
    try:
        tmp = open(tmp_path,"r")
        outfile = s3.put_object(Bucket=outbucket,Key=outkey,Body=tmp)
    except Exception as e:
        print('Error putting object {} from bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket,"tmp.txt"))
        raise e



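A function that contains yield is actually a generator function: calling it returns a generator object without running the body. The Lambda handler, by contrast, needs to be a regular function that optionally returns a JSON-serializable value. Lambda tries to serialize whatever the handler returns, and a generator object is not JSON serializable, hence the error. This is a property of Python generators, not a Lambda-specific restriction on yield.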
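One way to keep the line-splitting logic is to move the yield statements into a separate helper and have the handler consume it. The following is a minimal sketch of that idea, not a drop-in replacement: the names iter_lines, broken_handler and working_handler, and the 'chunks' event key, are hypothetical stand-ins for the streamed S3 data, and no boto3 calls are made.

```python
import json


def iter_lines(chunks):
    """Generator helper: yield complete lines from an iterable of text chunks."""
    unfinished_line = ''
    for chunk in chunks:
        chunk = unfinished_line + chunk
        lines = chunk.split('\n')
        # The last piece may be an incomplete line; carry it into the next chunk.
        unfinished_line = lines.pop()
        for line in lines:
            yield line
    yield unfinished_line


def broken_handler(event, context):
    # Because this function contains yield, calling it only creates a
    # generator object; none of the body runs at call time.
    for line in iter_lines(event['chunks']):
        yield line


def working_handler(event, context):
    # A regular function: it consumes the generator itself and returns
    # a plain, JSON-serializable value.
    lines = list(iter_lines(event['chunks']))
    return {'line_count': len(lines)}


# Hypothetical event standing in for data streamed from S3.
event = {'chunks': ['ab\ncd', 'ef\ng']}

try:
    json.dumps(broken_handler(event, None))
    broken_error = None
except TypeError as exc:
    # json cannot serialize a generator object.
    broken_error = str(exc)

working_result = json.dumps(working_handler(event, None))
```

In the handler from the question, the same pattern would mean writing each line to the temporary file inside the loop (for example with out.write) instead of yielding it from lambda_handler.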