Raw SQL query straight to pandas dataframe

When you want to use combined full powers of SQL and MLJAR studio you land here. A recipe to automatically convert your database data into pandas dataframe object, ready for further analysis.

This notebook was created with MLJAR Studio

MLJAR Studio is Python code editior with interactive code recipes and local AI assistant.
You have code recipes UI displayed at the top of code cells.

Documentation

All required packages are automatically imported by MLJAR Studio for you so you don't have to worry about them.

# import packages
import psycopg
import os
from dotenv import load_dotenv
import pandas as pd

Make sure you opened a connection in your notebook. To learn how to do it, check out open and test database connection example notebook.

# load credentials from .env file:
load_dotenv(override=True)

# get the credentials
def create_new_connection():
    try:
        conn = psycopg.connect(
            dbname=os.getenv("POSTGRES_DB_NAME"),
            user=os.getenv("POSTGRES_USERNAME"),
            password=os.getenv("POSTGRES_PASSWORD"),
            host=os.getenv("POSTGRES_HOST"),
            port=os.getenv("POSTGRES_PORT"),
        )
        return conn
    # check for errors
    except psycopg.Error as e:
        raise psycopg.Error(f"""
Error occurred while establishing connection: 
    {e}

Maybe you provided wrong credentials, use define new connection recipe to edit them.
Other option is that database server is down or you dont have the right acces to this database.
            """)

# open new connection:
conn = create_new_connection()

To create dataframe we need data, so our keyword of choice is SELECT. And tables are id,num and data. We named them thse same as in the database for simplicity.

# if connection was used and closed it is reopen here
if conn.closed:
    conn = create_new_connection()

# run query
with conn:
    with conn.cursor() as cur:

        try:
            cur.execute("""
select * from test
            """)
        # check for errors
        except psycopg.ProgrammingError as e:
            raise psycopg.ProgrammingError(f"""
Problem running query:
    {e}
            """)

        # check if query was a select query
        if str(cur.statusmessage).upper().startswith("SELECT"):
            # fetch all the results
            results = cur.fetchall()

            # create dataframe
            try:
                df = pd.DataFrame(data=results, columns=pd.Index(["id","num","data","description"]))
                print(f"{df.shape = }")
                print(df.head())
            except ValueError as e:
                raise ValueError(f"""
Error creating DataFrame:
    {e}

Does number of columns specified matches number of columns in the data.
                """)

Results section shows us the shape and the head of the dataframe.

Conclusions

Now you can treat the dataframe as any other object of this type. For exapmle display a binary tree or other data analysis.

Recipes used in the postgresql-python-raw-query-to-pandas.ipynb

All code recipes used in this notebook are listed below. You can click them to check their documentation.

Packages used in the postgresql-python-raw-query-to-pandas.ipynb

List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.

psycopg>=3.2.1

python-dotenv>=1.0.1

pandas>=1.0.0