Async Pandas

Pandas is great for Python because it offers efficient data manipulation and analysis capabilities, leveraging the speed of the underlying NumPy library. But how does it behave with asyncio? I could not find much about it. Say you have an enormous dataset and need to call an API for every row, and the API handles 10 calls at once. As a simple example, this pandas.DataFrame consists of 100 rows of lorem text:

import lorem
import pandas as pd

df = pd.DataFrame({"Text": [lorem.text() for _ in range(100)]})

>>> df.head()
                                                Text
0  Labore quisquam neque adipisci labore non quae...
1  Aliquam etincidunt dolore dolore voluptatem. A...
2  Aliquam consectetur dolor dolorem dolorem ipsu...
3  Labore non aliquam numquam sed. Eius neque con...
4  Voluptatem ipsum modi amet tempora tempora eti...

Asyncio

Say we want to send every row to an API, and each call takes about a second. Let's consider a method that reverses the text and returns the final three letters: ...
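A minimal sketch of one way to approach this, assuming the API call can be simulated with asyncio.sleep: the hypothetical fake_api_call coroutine stands in for the real API, and an asyncio.Semaphore caps the number of calls in flight at 10. I read "reverses the text and returns the final three letters" as taking the last three characters of the reversed string; the post's own method may differ.

```python
import asyncio
from typing import Iterable, List


# Hypothetical stand-in for the slow API call: it sleeps to simulate
# latency, reverses the text and returns the final three characters
# of the reversed string.
async def fake_api_call(text: str, delay: float = 1.0) -> str:
    await asyncio.sleep(delay)
    return text[::-1][-3:]


async def process_all(
    texts: Iterable[str], limit: int = 10, delay: float = 1.0
) -> List[str]:
    # The semaphore allows at most `limit` calls in flight at any moment,
    # matching an API that accepts about 10 calls at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text: str) -> str:
        async with semaphore:
            return await fake_api_call(text, delay)

    # gather returns results in input order, so they line up with the rows.
    return await asyncio.gather(*(bounded(t) for t in texts))


if __name__ == "__main__":
    rows = [f"row number {i}" for i in range(100)]
    results = asyncio.run(process_all(rows, limit=10, delay=0.01))
    print(len(results), results[:3])
```

With the DataFrame above, df["Result"] = asyncio.run(process_all(df["Text"])) should assign the results back in row order, since iterating a Series yields its values and gather preserves their order. In a Jupyter notebook, where an event loop is already running, use await process_all(df["Text"]) instead of asyncio.run. With 100 rows, 10 concurrent calls and a one-second delay, this takes roughly 10 seconds instead of 100.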

May 16, 2024 · 2 min · 331 words · Joost

Row-Level Security with SQLAlchemy

With Row-Level Security (RLS), you manage access control at the row level within the database instead of in the application. RLS lets you define policies that determine which rows of a given table a particular user or role can access.

Postgres Tables

For this demonstration we create a simple setup with a User table and an Item table using SQLAlchemy 2.0:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, relationship

admin_engine = create_engine("postgresql://postgres:postgres@0.0.0.0:5432/postgres")
Base = declarative_base()


class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    user_id = Column(Integer, ForeignKey("users.id"))
    user = relationship("User", back_populates="item_entries")


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String)
    password = Column(String)
    item_entries = relationship("Item", back_populates="user")


Base.metadata.create_all(admin_engine)

Using the PostgreSQL superuser for application access is not a great idea because of its extensive privileges and the security risks that come with them. It's advisable to create a dedicated user with limited permissions tailored to the application's requirements, for better security and operational control. ...
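To make the next step concrete, here is a minimal sketch, not the post's own code, of the SQL a setup like this typically needs: a dedicated role, table grants, RLS enabled on items, and a policy that compares user_id with a per-session setting. The role name app_user, the setting app.current_user_id, the policy name items_owner and the sequence name items_id_seq (the default Postgres creates for a SERIAL id column on items) are all assumptions. The function only builds the statements, so they can be inspected without a database.

```python
import re
from typing import List

# Hypothetical names; the post may use different ones.
APP_ROLE = "app_user"
USER_SETTING = "app.current_user_id"


def _ident(name: str) -> str:
    # Only accept plain lowercase identifiers so they can be inlined safely.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def rls_statements(role: str = APP_ROLE, password: str = "change-me") -> List[str]:
    role = _ident(role)
    pw = password.replace("'", "''")  # escape quotes for a SQL string literal
    return [
        # A dedicated, non-superuser role: superusers bypass RLS entirely.
        f"CREATE ROLE {role} LOGIN PASSWORD '{pw}'",
        # Read access to users (e.g. for logins); drop if the app does not need it.
        f"GRANT SELECT ON users TO {role}",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON items TO {role}",
        # A SERIAL primary key needs its sequence for INSERTs.
        f"GRANT USAGE ON SEQUENCE items_id_seq TO {role}",
        "ALTER TABLE items ENABLE ROW LEVEL SECURITY",
        # Rows are visible and writable only when user_id matches the session
        # setting; with missing_ok = true an unset value yields NULL, so no rows match.
        "CREATE POLICY items_owner ON items "
        f"USING (user_id = NULLIF(current_setting('{USER_SETTING}', true), '')::integer)",
    ]


if __name__ == "__main__":
    for stmt in rls_statements():
        print(stmt + ";")
```

As the admin, the statements could be applied inside a with admin_engine.begin() as conn: block by calling conn.execute(text(stmt)) for each one. The application then connects as app_user and runs SET app.current_user_id = '1' (or SELECT set_config('app.current_user_id', '1', false)) before querying items, so it only sees that user's rows. Table owners also bypass RLS unless FORCE ROW LEVEL SECURITY is set, which is one more reason to use a separate role.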

January 31, 2024 · 3 min · 562 words · Joost

Obfuscate Python

This post is under construction: I have the approaches here but still need some time to share my experience with them… How can you obscure Python code from anyone who runs it? I am no expert, but I have tried a few things and will share my steps and recommendations here. In this case, I have a main.py with a simple hello-world FastAPI app. There is also an /error endpoint to see how much source code is exposed in the logs. ...

November 6, 2023 · 2 min · 289 words · Joost

Limit Concurrency in AsyncIO

You can run multiple async tasks simultaneously in Python using the native asyncio library, which lets you create and manage asynchronous tasks, such as coroutines, and run them concurrently. Take the following async function, which counts up to the length of the given name and then returns the name:

import asyncio


async def count_word(name: str) -> str:
    if len(name) > 8:
        raise ValueError(f"{name} is too long...")
    for ii in range(len(name)):
        print(name, ii)
        await asyncio.sleep(1)
    return name

Now, run this coroutine twice:

await count_word("first"), await count_word("second")

first 0
first 1
first 2
first 3
first 4
second 0
second 1
second 2
second 3
second 4
second 5
('first', 'second')

This does not run the two tasks concurrently: each await waits for its coroutine to finish, so the second one only starts after the first has completed. If you want to run these tasks concurrently, use asyncio.gather(): ...
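A minimal sketch of where this leads, assuming the goal in the title is a cap on how many coroutines run at once: asyncio.gather() starts them together, and an asyncio.Semaphore (one common way to limit concurrency, not necessarily the post's) lets at most limit of them count at the same time. The delay parameter is an addition so the example can run quickly, and return_exceptions=True keeps the ValueError raised for a long name from hiding the other results.

```python
import asyncio
from typing import List, Union


async def count_word(name: str, delay: float = 1.0) -> str:
    if len(name) > 8:
        raise ValueError(f"{name} is too long...")
    for ii in range(len(name)):
        print(name, ii)
        await asyncio.sleep(delay)
    return name


async def main(
    names: List[str], limit: int = 2, delay: float = 1.0
) -> List[Union[str, BaseException]]:
    # At most `limit` coroutines are inside count_word at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def limited(name: str) -> str:
        async with semaphore:
            return await count_word(name, delay)

    # return_exceptions=True returns the ValueError as a result instead of
    # propagating it, so the successful names are still available.
    return await asyncio.gather(*(limited(n) for n in names), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(main(["first", "second", "third", "much_too_long"])))
```

With limit=2, "first" and "second" count side by side, and "third" starts only once one of them has finished; raising limit lets more names count at once.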

October 23, 2023 · 3 min · 530 words · Joost

Summarize large files - an introduction

ChatPDF providers, where you can query large files with large language models (LLMs), are sprouting like mushrooms. The technique is mainly based on vector embeddings stored in a vector index or vector database: based on the question, semantically relevant chunks from the file are passed to the LLM so it can compose an answer. While this technique is cool, it falls short for questions that span the entire text, such as a request for a summary, because those need the full text rather than a couple of chunks. ...
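To illustrate the retrieval step described above, here is a toy sketch: word-count vectors stand in for a real embedding model (an assumption, purely for illustration), and the hypothetical top_chunks function returns the k chunks most similar to the question by cosine similarity.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, size: int = 50) -> List[str]:
    # Split the file into chunks of `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[str]:
    q = embed(question)
    # The "vector index": every chunk scored against the question.
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

Whatever the question, the LLM only receives k chunks; for a request such as "summarize this document" those few chunks cannot cover the full text, which is the limitation described above.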

July 31, 2023 · 4 min · 716 words · Joost