Summarize large files - an introduction

ChatPDF providers, which let you query large files with large language models (LLMs), are springing up like mushrooms. The technique relies mainly on vector embeddings stored in a vector index or vector database: based on the question, semantically relevant chunks of the file are passed to the LLM so it can compose an answer. While this technique is cool, it falls short when you ask a question that spans the entire text, such as a request for a summary, since answering that requires not a couple of chunks but the full text....

July 31, 2023 · 4 min · 711 words · Joost
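A minimal sketch of the chunk-retrieval step described above. It assumes a toy bag-of-words "embedding" in place of a real embedding model and an in-memory list in place of a vector database; chunk_text, embed, cosine and top_k_chunks are hypothetical names, not part of any library. It shows why a summary question is hard for this setup: only the k best-matching chunks ever reach the LLM.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 20) -> list[str]:
    """Split text into chunks of at most `size` words (naive, hypothetical chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question; only these go to the LLM."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


document = "Airflow schedules DAGs. " * 30 + "Pydantic validates events. " * 30
chunks = chunk_text(document)
context = top_k_chunks("How does Airflow schedule?", chunks)
print(f"{len(context)} of {len(chunks)} chunks reach the LLM")
```

Here only 2 of the 9 chunks are passed on. That is fine for a targeted question, but a prompt like "summarize this file" needs every chunk, which is exactly the limitation described above.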

Async method decorator

I had a complete headache trying to figure out how a class-based decorator can preserve the async nature of the method it decorates. The solution is actually very simple: when the decorator is called, use inspect.iscoroutinefunction to check whether the method is a coroutine function and, if so, return an async method again! The example adds the given paths to a registry:

    import inspect
    from functools import wraps

    paths_registry = []

    class route(object):
        def __init__(self, path: str, **kwargs) -> None:
            self....

September 24, 2021 · 1 min · 131 words · Joost
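A minimal sketch of the idea above, filling in the truncated route example under assumptions: the attribute names, storing only the path in paths_registry, and registering at decoration time are guesses, not necessarily what the original post does. The key part is the inspect.iscoroutinefunction branch in __call__.

```python
import asyncio
import inspect
from functools import wraps

paths_registry = []


class route(object):
    """Class-based decorator that registers a path and keeps sync/async intact."""

    def __init__(self, path: str, **kwargs) -> None:
        self.path = path
        self.kwargs = kwargs  # extra options; unused in this sketch

    def __call__(self, func):
        # Assumption: the path is registered when the method is decorated.
        paths_registry.append(self.path)

        if inspect.iscoroutinefunction(func):
            # Coroutine function in, coroutine function out, so callers can still await it.
            @wraps(func)
            async def async_wrapper(*args, **kwargs):
                return await func(*args, **kwargs)

            return async_wrapper

        @wraps(func)
        def sync_wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        return sync_wrapper


class Handler:
    @route("/hello")
    def hello(self) -> str:
        return "hello"

    @route("/hello-async")
    async def hello_async(self) -> str:
        await asyncio.sleep(0)
        return "hello async"


handler = Handler()
print(handler.hello())
print(asyncio.run(handler.hello_async()))
print(inspect.iscoroutinefunction(Handler.hello_async))
print(paths_registry)
```

Without the branch, a plain def wrapper would still hand back a coroutine object when called, but inspect.iscoroutinefunction would report False, so any code that inspects the method to decide whether to await it would treat it as synchronous.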

A Simple Factory for Domain Events

This is a simple demonstration of a domain event factory in Python. I assume you are familiar with the Factory Method Pattern. I also use the pydantic package for attribute validation. Once implemented, we can use the factory to create immutable domain events with a homogeneous data structure across instances of the same type. The metadata is generated by the underlying BaseEvent. With this approach we always produce complete events....

January 25, 2021 · 2 min · 403 words · Joost

Python Immutable Objects

When reading up on implementing DDD, you often come across a plea for immutable objects. The main motivation is that an object that is valid when created always remains valid; no later verification or validation is required. Second, an immutable object cannot be changed as a side effect somewhere else in the code. Some data objects in Python are immutable (tuples, strings and frozensets, for example), but dataclasses are not, at least not by default. Take this simple class:

    class SimpleClass:
        def __init__(self, attr1: int):
            self....

January 13, 2021 · 1 min · 199 words · Joost
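To make the difference concrete, here is a sketch of SimpleClass next to two common ways to make it immutable: overriding __setattr__, or declaring a frozen dataclass. ImmutableClass and FrozenSimpleClass are hypothetical names, and the post itself may take a different route.

```python
from dataclasses import dataclass


class SimpleClass:
    def __init__(self, attr1: int):
        self.attr1 = attr1


class ImmutableClass:
    """Rejects any attribute assignment or deletion after construction."""

    def __init__(self, attr1: int):
        # Bypass our own __setattr__ once, during initialization.
        object.__setattr__(self, "attr1", attr1)

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} is immutable")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__} is immutable")


@dataclass(frozen=True)
class FrozenSimpleClass:
    attr1: int


mutable = SimpleClass(1)
mutable.attr1 = 2  # allowed: the object silently changes

for obj in (ImmutableClass(1), FrozenSimpleClass(1)):
    try:
        obj.attr1 = 2
    except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
        print(f"{type(obj).__name__}: {exc}")
```

The frozen dataclass also gets __eq__ and __hash__ generated, which suits value objects in DDD.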

How does Airflow schedule Daylight Saving Time?

One morning you find out your favorite Airflow DAG did not run that night. Sad… Six months later the task ran twice, and now you understand: you scheduled your DAG timezone-aware, and because of Daylight Saving Time the clock sometimes jumps forward or back. For example, in Central European Time (CET) on Sunday 29 March 2020 at 02:00, the clocks were turned forward 1 hour from “local standard time” to 03:00:00 “local daylight time”....

October 29, 2020 · 2 min · 326 words · Joost
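A small sketch using the stdlib zoneinfo module (Python 3.9+, needs the IANA tz database on the system) that advances real UTC time in 30-minute steps and prints the local wall clock around both 2020 transitions. Europe/Amsterdam is assumed as a representative CET/CEST zone. It does not reproduce Airflow's scheduler; it only shows why a DAG scheduled at, say, 02:30 local time has no matching moment in March and two in October.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: Europe/Amsterdam as a representative CET/CEST zone.
TZ = ZoneInfo("Europe/Amsterdam")


def local_wall_clock(start_utc: datetime, steps: int,
                     step: timedelta = timedelta(minutes=30)) -> list[str]:
    """Advance real (UTC) time in fixed steps and return the local wall-clock readings."""
    return [
        (start_utc + i * step).astimezone(TZ).strftime("%H:%M %Z")
        for i in range(steps)
    ]


spring = local_wall_clock(datetime(2020, 3, 29, 0, 0, tzinfo=timezone.utc), 5)
autumn = local_wall_clock(datetime(2020, 10, 25, 0, 0, tzinfo=timezone.utc), 5)
print(spring)  # local 02:xx never occurs
print(autumn)  # local 02:00 and 02:30 occur twice

# The repeated 02:30 is disambiguated by `fold` (PEP 495): 0 = first, 1 = second occurrence.
first = datetime(2020, 10, 25, 2, 30, tzinfo=TZ)
second = first.replace(fold=1)
print(first.utcoffset(), second.utcoffset())
```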