Python Immutable Objects

While reading about implementing DDD (Domain-Driven Design), you often come across a plea for the use of immutable objects. The main motivation is that an object that is initially valid always remains valid; no later verification or validation is required. Secondly, working with an immutable object cannot cause any side effects. Some data objects in Python are immutable, but dataclasses themselves are not. Let's take this simple class:

```python
class SimpleClass:
    def __init__(self, attr1: int):
        self.attr1 = attr1
```

With this setup you can change any of its attributes, so the object is not immutable: ...
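As a hint of where this goes, one common way to get immutability in Python is a frozen dataclass; a minimal sketch (the class and attribute names are illustrative, not necessarily the post's):

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenClass:
    attr1: int


obj = FrozenClass(attr1=1)
try:
    obj.attr1 = 2  # reassignment is blocked on a frozen dataclass
except FrozenInstanceError:
    print("immutable: attribute cannot be reassigned")
```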

January 13, 2021 · 1 min · 199 words · Joost

How does Airflow schedule Daylight Saving Time?

One morning you find out your favorite Airflow DAG did not run that night. Sad… Six months later the task ran twice, and now you understand: you scheduled your DAG timezone-aware, and the clock goes back and forth because of Daylight Saving Time. For example, in Central European Time (CET) on Sunday 29 March 2020, the clocks were turned forward 1 hour from 02:00 “local standard time” to 03:00 “local daylight time”. And recently, on Sunday 25 October 2020, 03:00, the clocks were turned backward 1 hour from “local daylight time” to “local standard time”. That means that any wall-clock time between 02:00 and 03:00 either does not exist or exists twice. ...
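For context, declaring a timezone-aware DAG looks roughly like this; a minimal sketch assuming Airflow 2.x-style imports, with the DAG id, timezone, and schedule chosen for illustration:

```python
import pendulum
from airflow import DAG

# Hypothetical nightly DAG scheduled at 02:30 *local* time in a
# DST-observing timezone. On 29 March 2020 that wall-clock time did not
# exist (the clocks jumped from 02:00 to 03:00); on 25 October 2020 it
# occurred twice.
with DAG(
    dag_id="nightly_example",
    start_date=pendulum.datetime(2020, 1, 1, tz="Europe/Amsterdam"),
    schedule_interval="30 2 * * *",  # cron schedules follow local wall-clock time
) as dag:
    ...  # tasks go here
```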

October 29, 2020 · 2 min · 326 words · Joost

Control-flow structure for database connections

With Python, creating a database connection is straightforward. Yet I often see the following case go wrong, while a simple solution is easily at hand: the context manager pattern. For database connections, you'll need at least one secret. Let's say you get this secret from a secret manager by calling the get_secret() method. You also use a utility like JayDeBeApi to set up the connection, and you are smart enough to close the connection after querying and to delete the password: ...
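For reference, a context manager along these lines guarantees cleanup even when a query raises; a minimal sketch assuming JayDeBeApi and a get_secret() helper as in the post (the driver class, URL, and user name are illustrative):

```python
from contextlib import contextmanager

import jaydebeapi  # the utility mentioned in the post


def get_secret() -> str:
    """Hypothetical stand-in for the secret manager lookup."""
    raise NotImplementedError


@contextmanager
def db_connection(url: str):
    password = get_secret()
    conn = jaydebeapi.connect(
        "org.postgresql.Driver",  # illustrative JDBC driver class
        url,
        ["db_user", password],    # illustrative credentials
    )
    del password  # drop the secret as soon as it is no longer needed
    try:
        yield conn
    finally:
        conn.close()  # runs even if the query raises


# Usage: the connection is closed even when an exception occurs.
# with db_connection("jdbc:postgresql://host/db") as conn:
#     cursor = conn.cursor()
#     cursor.execute("SELECT 1")
```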

October 5, 2020 · 2 min · 386 words · Joost

Provide Spark with cross-account access

If you need to provide Spark with resources from a different AWS account, that can be quite tricky to figure out. Let's assume you have two AWS accounts: the alpha account, where you run Python with IAM role alpha-role and access to the Spark cluster; and the beta account, which holds the S3 bucket you want to access. You could grant S3 read access to the alpha-role directly, but a more durable and manageable approach is to create an access-role in the beta account that can be assumed by the alpha-role. ...
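To give a flavour of the Spark side, here is a minimal sketch using Hadoop's S3A assumed-role credential provider (available in Hadoop 3.1+); the role ARN, account id, and bucket name are illustrative, and the post may wire this up differently:

```python
from pyspark.sql import SparkSession

# Hypothetical ARN of the access-role in the beta account.
BETA_ROLE_ARN = "arn:aws:iam::222222222222:role/access-role"

spark = (
    SparkSession.builder
    # Let S3A assume the beta-account role when talking to S3.
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
    )
    .config("spark.hadoop.fs.s3a.assumed.role.arn", BETA_ROLE_ARN)
    .getOrCreate()
)

# Read from the bucket that lives in the beta account.
df = spark.read.parquet("s3a://beta-bucket/data/")
```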

August 21, 2020 · 2 min · 413 words · Joost

Upload GitLab CI artifacts to S3

With GitLab CI it is incredibly easy to build a Hugo website (like mine); you can even host it there. But in my case I use AWS S3 and CloudFront because it is cheap and easy to set up. The CI pipeline to build and upload the static website is also straightforward with the following .gitlab-ci.yml:

```yaml
variables:
  GIT_SUBMODULE_STRATEGY: recursive

stages:
  - build
  - upload

build:
  stage: build
  image: monachus/hugo
  script:
    - hugo version
    - hugo
  only:
    - master
  artifacts:
    paths:
      - ./public

upload:
  stage: upload
  dependencies:
    - build
  image: dobdata/primo-triumvirato:v0.1.7
  script:
    - aws --version
    - aws configure set region $AWS_DEFAULT_REGION
    - aws s3 sync --delete ./public s3://$S3_BUCKET
  only:
    - master
```

The build stage generates the static website, which is shared with subsequent stages as an artifact. The upload stage uses my primo-triumvirato image, but this can be any image that has the aws CLI installed. The sync --delete ... command recursively copies new and updated files from the source directory to the destination and deletes files that exist in the destination but not in the source. ...

July 5, 2020 · 1 min · 206 words · Joost