Raw SQL query straight to pandas dataframe
When you want to use combined full powers of SQL and MLJAR studio you land here. A recipe to automatically convert your database data into pandas dataframe object, ready for further analysis.
MLJAR Studio is Python code editior with interactive code recipes and local AI assistant.
You have code recipes UI displayed at the top of code cells.
All required packages are automatically imported by MLJAR Studio for you so you don't have to worry about them.
# import packages
import psycopg
import os
from dotenv import load_dotenv
import pandas as pd
Make sure you opened a connection in your notebook. To learn how to do it, check out open and test database connection example notebook.
# load credentials from .env file:
load_dotenv(override=True)
# get the credentials
def create_new_connection():
try:
conn = psycopg.connect(
dbname=os.getenv("POSTGRES_DB_NAME"),
user=os.getenv("POSTGRES_USERNAME"),
password=os.getenv("POSTGRES_PASSWORD"),
host=os.getenv("POSTGRES_HOST"),
port=os.getenv("POSTGRES_PORT"),
)
return conn
# check for errors
except psycopg.Error as e:
raise psycopg.Error(f"""
Error occurred while establishing connection:
{e}
Maybe you provided wrong credentials, use define new connection recipe to edit them.
Other option is that database server is down or you dont have the right acces to this database.
""")
# open new connection:
conn = create_new_connection()
To create dataframe we need data, so our keyword of choice is SELECT
. And tables are id,num
and data
. We named them thse same as in the database for simplicity.
# if connection was used and closed it is reopen here
if conn.closed:
conn = create_new_connection()
# run query
with conn:
with conn.cursor() as cur:
try:
cur.execute("""
select * from test
""")
# check for errors
except psycopg.ProgrammingError as e:
raise psycopg.ProgrammingError(f"""
Problem running query:
{e}
""")
# check if query was a select query
if str(cur.statusmessage).upper().startswith("SELECT"):
# fetch all the results
results = cur.fetchall()
# create dataframe
try:
df = pd.DataFrame(data=results, columns=pd.Index(["id","num","data","description"]))
print(f"{df.shape = }")
print(df.head())
except ValueError as e:
raise ValueError(f"""
Error creating DataFrame:
{e}
Does number of columns specified matches number of columns in the data.
""")
Results section shows us the shape and the head of the dataframe.
Conclusions
Now you can treat the dataframe as any other object of this type. For exapmle display a binary tree or other data analysis.
Recipes used in the postgresql-python-raw-query-to-pandas.ipynb
All code recipes used in this notebook are listed below. You can click them to check their documentation.
Packages used in the postgresql-python-raw-query-to-pandas.ipynb
List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.
psycopg>=3.2.1
python-dotenv>=1.0.1
pandas>=1.0.0
Similar notebooks
List of similar Python notebooks, so you can find more inspiration ๐
Define a new PostgreSQL connection in Python
Connect to PostgreSQL database in Python
Show tables from PostgreSQL database
Show columns from table in PostgreSQL
Create new table using python and PostgreSQL
Drop table using Python and PostgreSQL
Backup table using python
Update column in table using Python
Select every column from table using python
Insert row into a table using Python
Run raw SQL query in python
Raw SQL query straight to pandas dataframe
From database to chart plotting
Update selected rows of a column in table ...
Fill PostgreSQL database with AI using Python
Transfer data from database to a Google sheet