Copying Files to S3

If you want to host a simple website in S3 (like this one!) it's really easy to upload your content - either use the AWS S3 console or do something like:

aws s3 cp index.html s3://yourbuckethere/

But what happens if you write some Python to copy the files instead? After all, let's do some automation, people!

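import boto3
import os

S3Client = boto3.client("s3")
S3BucketName = "yourbuckethere"
SourceDir = "source-dir"

# Upload every file in SourceDir to the root of the bucket.
# No ContentType is passed, so S3 falls back to a generic binary type.
for Filename in os.listdir(SourceDir):
    LocalPath = os.path.join(SourceDir, Filename)
    if not os.path.isfile(LocalPath):
        continue  # skip subdirectories
    try:
        # upload_file returns None; failures are raised as exceptions
        S3Client.upload_file(LocalPath, S3BucketName, Filename)
    except Exception as e:
        print("Unknown error copying to S3: %s" % e)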

The files are there, but if you browse to one of the HTML pages your browser just downloads the file rather than rendering it. Yet if you then open the downloaded file, the browser renders it exactly as you'd expect. Huh?

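Turns out that the metadata (specifically the Content-Type) is set incorrectly. The AWS CLI (and the console) set it for you automatically, typically by guessing from the file extension, but when you copy files with your own code the Python library doesn't do that. After all, you're in control of your own destiny now! Without a Content-Type, S3 stores the object as generic binary data (binary/octet-stream), so the browser saves it instead of displaying it.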
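For comparison, Python's standard library ships a mimetypes module that does the same kind of extension-based guessing with no extra dependencies. This is just a sketch of the idea, not a claim about exactly what the CLI or console run internally; guess_content_type is a hypothetical helper name, and falling back to application/octet-stream for unknown extensions is my own assumption.

```python
import mimetypes


def guess_content_type(filename):
    """Guess a MIME type from a file's extension (hypothetical helper).

    mimetypes.guess_type returns a (type, encoding) tuple; the type is None
    when the extension isn't recognised, so fall back to a generic binary type.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or "application/octet-stream"


for name in ["index.html", "style.css", "logo.png", "notes.unknownext"]:
    print(name, "->", guess_content_type(name))
```

The mapping comes from Python's built-in table plus any system mime.types files, so a few less common extensions may vary between machines.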

Ok. Let's do it manually then. Admittedly, this is quite dodgy and I should really be using the python-magic library, but because I didn't want any dependencies and I knew exactly which file types I was going to use, I hacked this a bit:

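import boto3
import os

def GetMIMEType(Filename):
    # Map of lower-case file extensions (without the dot) to MIME types
    MIMEList = {"html":"text/html", "css":"text/css", "js":"text/javascript", "pdf":"application/pdf", "png":"image/png",
                "zip":"application/zip", "py":"text/x-script.python"}

    # os.path.splitext returns the extension *with* its leading dot (".html"),
    # so strip the dot and lower-case it before looking it up
    Extension = os.path.splitext(Filename)[1].lstrip(".").lower()
    # Anything not in the list gets the standard generic binary type
    return MIMEList.get(Extension, "application/octet-stream")

S3Client = boto3.client("s3")
S3BucketName = "yourbuckethere"
SourceDir = "source-dir"

for Filename in os.listdir(SourceDir):
    LocalPath = os.path.join(SourceDir, Filename)
    if not os.path.isfile(LocalPath):
        continue  # skip subdirectories

    # Note the key is "ContentType", not "Content-Type"
    ContentType = {"ContentType": GetMIMEType(Filename)}

    try:
        S3Client.upload_file(LocalPath, S3BucketName, Filename, ExtraArgs=ContentType)
    except Exception as e:
        print("Unknown error copying to S3: %s" % e)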
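One limitation of the loop above is that os.listdir only looks at the top level of source-dir, and a website usually has subfolders (css/, images/ and so on). Here's a minimal, dependency-free sketch, assuming you want the bucket layout to mirror the local folder tree. build_upload_plan and MIME_TYPES are hypothetical names; the helper walks the tree with os.walk and returns (local_path, key, extra_args) tuples using a trimmed-down extension map like the one in GetMIMEType.

```python
import os

# Trimmed-down extension -> MIME type map (same idea as GetMIMEType)
MIME_TYPES = {"html": "text/html", "css": "text/css", "js": "text/javascript",
              "png": "image/png", "pdf": "application/pdf"}


def build_upload_plan(source_dir):
    """Return (local_path, s3_key, extra_args) for every file under source_dir.

    Hypothetical helper: keys are paths relative to source_dir, always
    joined with "/" because S3 keys use forward slashes on every OS.
    """
    plan = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel_path = os.path.relpath(local_path, source_dir)
            key = "/".join(rel_path.split(os.sep))
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            content_type = MIME_TYPES.get(ext, "application/octet-stream")
            plan.append((local_path, key, {"ContentType": content_type}))
    return plan
```

Each tuple can then be passed to S3Client.upload_file(local_path, S3BucketName, key, ExtraArgs=extra_args) inside the same try/except as before. Keeping the planning step free of boto3 also makes it easy to check what would be uploaded before touching the bucket.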

The only curiosity here is that you'd expect to set the content type using "Content-Type", but you don't; you use "ContentType" instead. If you try to work around the error you get from putting "Content-Type" into ExtraArgs (boto3 rejects it as an invalid key) by passing it as custom metadata instead, things still don't work: S3 stores it as user-defined metadata under "x-amz-meta-content-type", which is quite unhelpful unless you really are trying to set some custom metadata.
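To make the difference concrete, here is a rough sketch of the two shapes of ExtraArgs, plus a hypothetical helper that mimics how S3 names user-defined metadata (an x-amz-meta- prefix on a lower-cased key). It's a model for illustration only and doesn't call S3; user_metadata_headers and the two constant names are made up for this example.

```python
# What you want: ContentType is a first-class upload argument in boto3,
# and S3 serves it back as the real Content-Type header.
GOOD_EXTRA_ARGS = {"ContentType": "text/html"}

# The workaround that doesn't help: anything under "Metadata" becomes
# user-defined metadata, not the Content-Type header the browser reads.
WORKAROUND_EXTRA_ARGS = {"Metadata": {"Content-Type": "text/html"}}


def user_metadata_headers(metadata):
    """Hypothetical helper mimicking S3's naming of user-defined metadata.

    Each key is lower-cased and prefixed with "x-amz-meta-".
    """
    return {"x-amz-meta-" + key.lower(): value for key, value in metadata.items()}


print(user_metadata_headers(WORKAROUND_EXTRA_ARGS["Metadata"]))
```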

Go figure. Here endeth the lesson.