Fill PostgreSQL database with AI using Python

Combining power of both SQL and AI we are going to fill out missing description columns in database using AI generated text.

This notebook was created with MLJAR Studio

MLJAR Studio is Python code editior with interactive code recipes and local AI assistant.
You have code recipes UI displayed at the top of code cells.

Documentation

All required packages are automatically imported by MLJAR Studio for you so you don't have to worry about them.

# import packages
import psycopg
import os
from dotenv import load_dotenv
from psycopg import sql
from openai import OpenAI, AuthenticationError

Make sure you opened a connection in your notebook. To learn how to do it, check out open and test database connection example notebook.

# load credentials from .env file:
load_dotenv(override=True)

# get the credentials
def create_new_connection():
    try:
        conn = psycopg.connect(
            dbname=os.getenv("POSTGRES_DB_NAME"),
            user=os.getenv("POSTGRES_USERNAME"),
            password=os.getenv("POSTGRES_PASSWORD"),
            host=os.getenv("POSTGRES_HOST"),
            port=os.getenv("POSTGRES_PORT"),
        )
        return conn
    # check for errors
    except psycopg.Error as e:
        raise psycopg.Error(f"""
Error occurred while establishing connection: 
    {e}

Maybe you provided wrong credentials, use define new connection recipe to edit them.
Other option is that database server is down or you dont have the right acces to this database.
            """)

# open new connection:
conn = create_new_connection()

I chose the products table. Lets list all columns of that table using show all columns recipe

# if connection was used and closed it is reopen here
if conn.closed:
    conn = create_new_connection()

# run query
with conn:
    with conn.cursor() as cur:

        # query db
        try:
            cur.execute("""
                    SELECT attname AS col, atttypid::regtype AS datatype
                    FROM pg_attribute
                    WHERE attrelid = %s::regclass 
                    AND attnum > 0
                    AND NOT attisdropped
                    ORDER BY attnum;""", 
                    ("products",))
        # check for errors
        except psycopg.ProgrammingError as e:
            raise psycopg.ProgrammingError(f"""
Problem running query:
    {e}

Did you spell everything correctly?
You can use show tables and columns recipes.
            """)

        # print the results
        print("Columns of products:")
        for column in cur.fetchall():
            print(f"{column}")

Lets check the contents of products table using select query recipe.

# if connection was used and closed it is reopen here
if conn.closed:
    conn = create_new_connection()

# run query
with conn:
    with conn.cursor() as cur:

        # query db
        try:
            cur.execute(
                sql.SQL("SELECT {columns} FROM {table}").format(
                    columns=sql.SQL("*"),
                    table=sql.Identifier("products"),
                )
            )
        # check for errors
        except psycopg.ProgrammingError as e:
            raise psycopg.ProgrammingError(f"""
Problem running query:
    {e}

Did you spell everything correctly?
You can use show tables and columns recipes.
            """)

        # print the results
        for row in cur.fetchall():
            print(f"{row}")

We can see that the description column is empty. Lets fill it out. To do so we will use AI.

First we need to open a new connection to the LLM provider. Here is an example on how to do it

# load .env file
load_dotenv()

# get api key from environment
api_key = os.environ["OPENAI_KEY"]

# create OpenAI client
def create_client(api_key):
    try:
        client = OpenAI(api_key=api_key)
        client.models.list()
        return client
    except AuthenticationError:
        print("Incorrect API")
    return None

client = create_client(api_key)

Now lets ask our question to the all mighty, super inteligent LLM.

# create a chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "I have a list of products in my database: (1, 'grass cutter', 130, None) (2, 'laptop', 1529, None) (3, 'metal detector', 502, None) (4, 'cup', 10, None) (5, 'camera', 655, None) those records are missing values in a description column  i'd like you to fill it out, so there are no blanks left return only the descriptions and in a form of a python list"},
    ],
    max_tokens=300
)

# get and print response
print(response.choices[0].message.content)

We will copy the returned list and paste it into the notebook. We will also create an ids list. Don't forget to execute this cell!

# list of descriptions generated by LLM
descriptions = [
    "A tool used to cut grass in gardens and lawns efficiently.",
    "A portable personal computer with a screen and keyboard.",
    "A device used for detecting metal objects beneath the ground.",
    "A small bowl-shaped drinking vessel used to drink liquids.",
    "A device for capturing photographs and recording videos."
]
# list of id to update on, 
# the range is 1 to 6 because in pytohn the upper bound is exclusive
ids = [id for id in range(1,6)]
# result is: ids = [1, 2, 3, 4, 5]

To update records in out database we will use Update selected columns recipe. Here is a notebook explaining how it works.

# if connection was used and closed it is reopen here
if conn.closed: 
    conn = create_new_connection()

# run query
with conn:
    with conn.cursor() as cur:

        # update data
        try: 
            query = sql.SQL(
                "UPDATE {table} SET {column} = %(value)s WHERE id = %(id)s RETURNING *"
            ).format(table=sql.Identifier("products"), column=sql.Identifier("description"))
            cur.executemany(query, params_seq=[{"id": ids[i], "value": descriptions[i]} for i in range(len(ids))], returning=True)
        # check for errors
        except psycopg.ProgrammingError as e:
            raise psycopg.ProgrammingError(f"""
Problem running query:
    {e}

Are you sure every name is spelled correctly?
You can use show all tables or columns to check db contents.
            """)

        # print the results
        while True:
            for row in cur.fetchall():
                print(row)
            if not cur.nextset():
                break

Conclusions

MLJAR studio provides plethora of very powerful tool which when strung together empowers you to accomplish complex task in just a few clicks.

Recipes used in the postgresql-python-fill-out-db-with-ai.ipynb

All code recipes used in this notebook are listed below. You can click them to check their documentation.

Packages used in the postgresql-python-fill-out-db-with-ai.ipynb

List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.

psycopg[binary]>=3.2.1

python-dotenv>=1.0.1

openai>=1.35.14