Data Analyst · Data Exploration · 14 prompts · Beginner → Advanced · 13 single prompts · 1 chain · Free to use

Data Exploration AI Prompts

AI prompts for exploratory data analysis (EDA), dataset understanding, distributions, correlations, and initial data investigation.

Prompts in this category

14 prompts
Intermediate · Single prompt
01

Bivariate Relationship Analysis

Bivariate Relationship Analysis is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Analyze pairwise relationships between the key variables in this dataset:
1. Identify the most important target or outcome variable
2. For each other numeric column, create a scatter plot vs the target variable and compute the correlation coefficient
3. For each categorical column, show the mean target value per category (group-by analysis)
4. Flag any non-linear relationships that a correlation coefficient would miss
5. Identify the single variable that has the strongest relationship with the target, linear or otherwise
6. Note any interaction effects: pairs of variables that together predict the target better than either alone
Return a ranked list of variables by predictive relationship strength.
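For orientation, here is a minimal sketch of two of the checks this prompt asks for: the mean target value per category, and a comparison of Pearson and rank (Spearman) correlation as one possible way to spot a non-linear relationship. It uses only the Python standard library. The sample data and the helper names (`pearson_r`, `ranks`, `spearman_r`, `mean_by_category`) are illustrative assumptions, not part of the prompt.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def mean_by_category(categories, target):
    """Group-by analysis: mean target value per category."""
    groups = defaultdict(list)
    for category, value in zip(categories, target):
        groups[category].append(value)
    return {c: sum(v) / len(v) for c, v in groups.items()}


# Hypothetical data: the target grows with the cube of the feature.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
target = [v ** 3 for v in feature]
segment = ["a", "a", "b", "b", "a", "b", "a", "b"]

pearson = pearson_r(feature, target)
spearman = spearman_r(feature, target)
means = mean_by_category(segment, target)
print(f"pearson={pearson:.3f} spearman={spearman:.3f} means={means}")
```

A rank correlation noticeably above the Pearson value hints at a monotonic but non-linear relationship. A U-shaped relationship would score low on both measures, which is why the prompt also asks for scatter plots.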
Intermediate · Single prompt
02

Categorical Column Profiling

Categorical Column Profiling is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Profile all categorical and text columns in this dataset:
- For each column: unique value count, top 10 most frequent values with percentages
- Flag high-cardinality columns (more than 50 unique values)
- Identify columns that look like free text vs controlled vocabulary
- Check for inconsistent formatting within the same column (e.g. 'USA' vs 'United States' vs 'us')
- Identify any categorical column that could be useful as a grouping or segmentation dimension
Return a profile table and highlight the 3 most analytically useful categorical columns.
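As a rough illustration of the per-column profile described above, the sketch below counts unique values, lists the most frequent values with percentages, and applies the 50-unique-value cardinality flag from the prompt. The function name `profile_categorical` and the sample column are assumptions made for this example.

```python
from collections import Counter

HIGH_CARDINALITY_THRESHOLD = 50  # "more than 50 unique values", per the prompt


def profile_categorical(values, top_n=10):
    """Return unique count, top values with percentages, and a cardinality flag."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    top = [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common(top_n)
    ]
    return {
        "unique": len(counts),
        "top": top,
        "high_cardinality": len(counts) > HIGH_CARDINALITY_THRESHOLD,
    }


# Hypothetical column with inconsistent spellings of the same country.
country = ["USA", "USA", "United States", "us", "Canada", "USA", None, "Canada"]
profile = profile_categorical(country)
print(profile)
```

In this sample, 'USA', 'United States', and 'us' are counted as three separate values, which is exactly the kind of inconsistent formatting the prompt asks the AI to flag.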
Beginner · Single prompt
03

Column Relationship Map

Column Relationship Map is a beginner prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should remain approachable and easy to review, even for someone with limited analytical background.

Prompt text
Map the relationships between columns in this dataset:
1. Identify likely primary key columns (unique identifiers)
2. Identify likely foreign key columns (references to other entities)
3. Group columns into logical categories: identifiers, dimensions, measures, dates, flags
4. For each measure column, identify which dimension columns are most likely used to slice or filter it
5. Draw a simple text-based entity map showing how columns relate to each other
This should help me understand the data model before I start querying.
Intermediate · Single prompt
04

Correlation Deep Dive

Correlation Deep Dive is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Find all significant correlations in this dataset:
- Compute the full correlation matrix for all numeric columns
- List the top 10 strongest positive and negative correlations with their r values
- Flag any pairs with |r| > 0.85 as multicollinearity risks
- For each flagged pair, recommend which column to keep based on relationship to the target or business relevance
- Visualize the correlation matrix as a heatmap with annotations
- Note any correlations that are surprising or counterintuitive
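The |r| > 0.85 rule above can be sketched in a few lines. This is a minimal standard-library example, assuming small in-memory columns; the column names and the helper `flag_collinear` are hypothetical, and a constant column would need to be excluded first because its variance is zero.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_collinear(columns, threshold=0.85):
    """Return (col_a, col_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    # Strongest relationships first, regardless of sign.
    return sorted(flagged, key=lambda item: -abs(item[2]))


# Hypothetical columns: price_with_tax is an exact multiple of price.
columns = {
    "price": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "price_with_tax": [11.0, 13.2, 15.4, 17.6, 19.8, 22.0],
    "visits": [5.0, 3.0, 8.0, 1.0, 9.0, 4.0],
}
flagged = flag_collinear(columns)
print(flagged)
```

Which column of a flagged pair to keep remains a judgment call; the prompt leaves that to the relationship with the target or to business relevance.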
Intermediate · Single prompt
05

Data Freshness and Latency Check

Data Freshness and Latency Check is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Check how fresh and timely this dataset is:
1. What is the most recent record date in the dataset?
2. How many hours or days old is the most recent data compared to today?
3. Is there evidence of data latency: events that happened recently but haven't appeared yet?
4. Are records added in batches (e.g. large jumps at specific times) or continuously?
5. Compare record volume in the most recent period vs the equivalent prior period: does it look complete or truncated?
6. Flag any columns that suggest pipeline delays (e.g. processing_date significantly later than event_date)
Return a freshness verdict: Real-time / Near real-time / Daily batch / Delayed / Stale.
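To make checks 1, 2, and 6 concrete, here is a small sketch that measures the age of the newest record and the lag between `event_date` and `processing_date` (the column names used in the prompt's own example). The function name `freshness_report` and the sample rows are assumptions. The reference time is passed in explicitly so the result is reproducible, and no verdict is computed because the prompt does not define thresholds for its verdict labels.

```python
from datetime import datetime
from statistics import median


def freshness_report(rows, now):
    """Summarize record age and event-to-processing lag, both in hours."""
    latest_event = max(row["event_date"] for row in rows)
    lags = [
        (row["processing_date"] - row["event_date"]).total_seconds() / 3600
        for row in rows
    ]
    return {
        "latest_event": latest_event,
        "age_hours": (now - latest_event).total_seconds() / 3600,
        "median_lag_hours": median(lags),
        "max_lag_hours": max(lags),
    }


# Hypothetical records: the last one took a full day to be processed.
rows = [
    {"event_date": datetime(2024, 3, 1, 8), "processing_date": datetime(2024, 3, 1, 9)},
    {"event_date": datetime(2024, 3, 2, 8), "processing_date": datetime(2024, 3, 2, 10)},
    {"event_date": datetime(2024, 3, 3, 8), "processing_date": datetime(2024, 3, 4, 8)},
]
now = datetime(2024, 3, 5, 8)  # stand-in for "today"
report = freshness_report(rows, now)
print(report)
```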
Beginner · Single prompt
06

Dataset Overview

Dataset Overview is a beginner prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should remain approachable and easy to review, even for someone with limited analytical background.

Prompt text
Give me a complete overview of this dataset. Include:
- Shape (rows, columns)
- Column names and data types
- Missing values per column (%)
- Basic statistics for numeric columns (mean, std, min, max, quartiles)
- Sample of first 5 rows
Highlight any immediate data quality issues you notice.
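The overview items above map onto a short function. The sketch below assumes the dataset is a list of dictionaries with the same keys in every row; the name `overview` and the sample rows are illustrative only. Reporting the set of Python types seen per column is one simple way to surface mixed-type columns as a data quality issue.

```python
def overview(rows):
    """Shape, missing-value percentages, observed types, and first 5 rows."""
    columns = list(rows[0].keys())
    n_rows = len(rows)
    missing_pct = {
        col: round(100 * sum(row.get(col) is None for row in rows) / n_rows, 1)
        for col in columns
    }
    types = {
        col: sorted(
            {type(row[col]).__name__ for row in rows if row.get(col) is not None}
        )
        for col in columns
    }
    return {
        "shape": (n_rows, len(columns)),
        "missing_pct": missing_pct,
        "types": types,
        "head": rows[:5],
    }


# Hypothetical rows: one missing city, one missing amount, one amount stored as text.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.5},
    {"id": 2, "city": None, "amount": 7.0},
    {"id": 3, "city": "Bergen", "amount": None},
    {"id": 4, "city": "Oslo", "amount": "12"},
]
summary = overview(rows)
print(summary)
```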
Advanced · Single prompt
07

Dimensionality Assessment

Dimensionality Assessment is an advanced prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should be comprehensive, methodical, and suitable for expert review or production-style work.

Prompt text
Assess the dimensionality and information density of this dataset:
1. How many features are there relative to the number of rows? Is this dataset wide, tall, or balanced?
2. Apply PCA and report how many components explain 80%, 90%, and 95% of the variance; this shows the true effective dimensionality
3. Identify groups of highly correlated features that are effectively measuring the same thing
4. Flag any features that appear to be near-linear combinations of others (redundant features)
5. Identify features with near-zero variance, which carry almost no information
6. Recommend a minimum feature set that retains 90% of the information in the dataset
Return a dimensionality report with: original features, effective dimensions, redundant groups, and recommended feature set.
Intermediate · Single prompt
08

Distribution Analysis

Distribution Analysis is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Analyze the distribution of every numeric column in this dataset:
- Compute mean, median, std, skewness, and kurtosis
- Identify columns with skewness above 1 or below -1
- Flag outliers using the IQR method (1.5 × IQR rule)
- Suggest an appropriate transformation for skewed columns (log, sqrt, Box-Cox)
- Plot a histogram for each numeric column
Return a summary table: column | skewness | outlier count | recommended transformation.
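Two of the checks above, skewness and the 1.5 × IQR outlier rule, can be sketched with the standard library. The example assumes population skewness (the mean of cubed z-scores) and the "inclusive" quartile method of `statistics.quantiles`; other tools may use slightly different conventions and so report slightly different numbers. The sample values are made up.

```python
from statistics import mean, pstdev, quantiles


def skewness(values):
    """Population skewness: the mean of cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical column with one very large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
skew = skewness(values)
outliers = iqr_outliers(values)
print(f"skewness={skew:.2f} outliers={outliers}")
```

A skewness above 1, as in this sample, is the prompt's cue to suggest a transformation such as a log or square root.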
Beginner · Single prompt
09

First Look at a New Dataset

First Look at a New Dataset is a beginner prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should remain approachable and easy to review, even for someone with limited analytical background.

Prompt text
I just received this dataset and have never seen it before. Help me understand it from scratch:
1. What does this dataset appear to be about? What business domain or process does it describe?
2. What is the grain of the data: what does one row represent?
3. What are the most important columns, and what do they measure?
4. What time period does it cover?
5. What are the top 3 questions this data could answer?
6. What are the top 3 questions it clearly cannot answer?
Write your response in plain English, as if explaining to someone seeing data for the first time.
Advanced · Chain
10

Full EDA Chain

Full EDA Chain is an advanced chain for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is structured as a multi-step chain so the AI can reason through the problem in a deliberate order and produce a more complete result. The requested output should be comprehensive, methodical, and suitable for expert review or production-style work.

Prompt text
Step 1: Profile the dataset (shape, column types, missing values, duplicates, memory usage).
Step 2: Analyze distributions and detect outliers in all numeric columns.
Step 3: Analyze cardinality and value frequencies in all categorical columns. Flag any with high cardinality (>50 unique values).
Step 4: Compute and visualize the correlation matrix. Flag pairs with |r| > 0.85.
Step 5: Identify the 5 most interesting patterns, anomalies, or relationships in the data.
Step 6: Write a 1-page EDA summary report: dataset description, key findings, data quality issues, and recommended next steps.
Beginner · Single prompt
11

Numeric Column Summary Table

Numeric Column Summary Table is a beginner prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should remain approachable and easy to review, even for someone with limited analytical background.

Prompt text
Create a clean summary table for all numeric columns in this dataset. For each numeric column include:
- Count of non-null values
- Mean and median
- Standard deviation
- Min and max
- 25th, 50th, and 75th percentiles
- Number of zeros
- Number of negative values
- Number of unique values
Format as a transposed table where each column name is a row. Highlight any column where the mean and median differ by more than 20%, which indicates skewness. Highlight any column with more than 10% zero values, as these may need special treatment.
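The two highlight rules above are simple enough to sketch directly. The prompt does not say what the 20% difference is measured against, so this example assumes it is relative to the median; that choice, the function name `summarize_numeric`, and the sample column are all assumptions.

```python
from statistics import mean, median


def summarize_numeric(values):
    """One summary row for a numeric column, with the two highlight flags."""
    present = [v for v in values if v is not None]
    avg, mid = mean(present), median(present)
    # Assumption: the 20% gap is measured relative to the median.
    if mid != 0:
        gap = abs(avg - mid) / abs(mid)
    else:
        gap = 0.0 if avg == 0 else float("inf")
    zero_share = sum(v == 0 for v in present) / len(present)
    return {
        "count": len(present),
        "mean": avg,
        "median": mid,
        "min": min(present),
        "max": max(present),
        "zeros": sum(v == 0 for v in present),
        "negatives": sum(v < 0 for v in present),
        "unique": len(set(present)),
        "flag_skew": gap > 0.20,
        "flag_zeros": zero_share > 0.10,
    }


# Hypothetical revenue column with zeros, a missing value, and one large order.
revenue = [0, 0, 10, 12, 15, 20, 400, None]
summary = summarize_numeric(revenue)
print(summary)
```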
Intermediate · Single prompt
12

Outlier Landscape Overview

Outlier Landscape Overview is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Give me a comprehensive map of extreme values across this entire dataset:
1. For every numeric column, show the top 5 highest and bottom 5 lowest values with their row indices
2. Flag any value that exceeds 5 standard deviations from the mean; these are extreme outliers
3. Check whether extreme values cluster in the same rows (a single row that is extreme across many columns is suspicious)
4. Classify each extreme value: plausible business value, likely data error, or needs investigation
5. Calculate what percentage of rows contain at least one extreme value
Return a summary table and highlight the 3 rows most deserving of manual inspection.
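Steps 2 and 5 above can be sketched as follows, assuming the population standard deviation and columns held as plain lists. The helper name `extreme_cells` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def extreme_cells(columns, threshold=5.0):
    """Return (row_index, column, value, z) for every |z| above the threshold."""
    cells = []
    for name, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant column has no outliers by this definition
        for row, value in enumerate(values):
            z = (value - mu) / sigma
            if abs(z) > threshold:
                cells.append((row, name, value, round(z, 2)))
    return cells


# Hypothetical data: 60 rows, with one implausible amount in row 17.
amount = [10 + (i % 5) for i in range(60)]
amount[17] = 500
columns = {"amount": amount, "quantity": list(range(60))}

cells = extreme_cells(columns)
rows_with_extremes = {row for row, _, _, _ in cells}
pct_rows = 100 * len(rows_with_extremes) / len(amount)
print(cells, f"{pct_rows:.2f}% of rows")
```

One caveat with this rule: the mean and standard deviation are themselves pulled by the outliers, and in a column of n values no single value can lie more than sqrt(n - 1) population standard deviations from the mean, so the 5-sigma threshold can never fire on very small datasets.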
Beginner · Single prompt
13

Quick Data Health Check

Quick Data Health Check is a beginner prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output should remain approachable and easy to review, even for someone with limited analytical background.

Prompt text
Run a quick health check on this dataset and return a traffic-light summary: 🟢 Good / 🟡 Needs attention / 🔴 Critical issue
Check:
1. Completeness: missing values above 5% per column?
2. Consistency: mixed data types, formatting issues, encoding errors?
3. Timeliness: what is the date range? Are there gaps?
4. Accuracy: values that seem impossible or implausible?
5. Uniqueness: duplicate rows present?
For each check, state the status and a one-sentence finding.
Intermediate · Single prompt
14

Time Series Structure Check

Time Series Structure Check is an intermediate prompt for data exploration. This prompt helps the user understand the structure, meaning, and analytical potential of a dataset before moving into deeper work. It is designed to surface what is in the data, how trustworthy it looks, and which columns, relationships, or patterns deserve attention first. Use it early in an analysis workflow to reduce guesswork and create a shared understanding of the dataset. It is best suited for direct execution against a real dataset. The requested output can include more technical detail, prioritization, and interpretation while still staying practical.

Prompt text
Inspect the temporal structure of this dataset:
- Identify all date, datetime, or timestamp columns
- For each: min date, max date, total time span, and inferred frequency (daily, weekly, monthly)
- Check for gaps in the time series: are there missing periods?
- Check for duplicate timestamps
- Identify any time-zone inconsistencies
- Plot the number of records per time period to visualize data volume over time
Summarise whether this dataset is suitable for time series analysis and flag any issues that must be resolved first.
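The gap and duplicate checks above might look like this for a series that is assumed to be daily. The frequency assumption, the function name `check_daily_series`, and the sample dates are all illustrative; a weekly or monthly series would need a different expected calendar.

```python
from collections import Counter
from datetime import date, timedelta


def check_daily_series(dates):
    """Span, duplicate dates, and missing days for an assumed daily series."""
    counts = Counter(dates)
    start, end = min(dates), max(dates)
    span_days = (end - start).days
    expected = {start + timedelta(days=i) for i in range(span_days + 1)}
    return {
        "start": start,
        "end": end,
        "span_days": span_days,
        "duplicates": sorted(d for d, c in counts.items() if c > 1),
        "missing": sorted(expected - set(dates)),
    }


# Hypothetical record dates: 2 January appears twice, 4 and 5 January are absent.
dates = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 2),
    date(2024, 1, 3),
    date(2024, 1, 6),
    date(2024, 1, 7),
]
result = check_daily_series(dates)
print(result)
```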

Recommended workflow

1

Bivariate Relationship Analysis

Start with a focused prompt in Data Exploration so you establish the first reliable signal before doing broader work.

2

Categorical Column Profiling

Review the output and identify what needs follow-up, cleanup, explanation, or deeper analysis.

3

Column Relationship Map

Continue with the next prompt in the category to turn the result into a more complete workflow.

4

Correlation Deep Dive

When the category has done its job, move into the next adjacent category or role-specific workflow.


Frequently asked questions

What is data exploration in data analyst work?

Data Exploration is a practical workflow area inside the Data Analyst prompt library. It groups prompts that solve closely related tasks instead of leaving users to search through one flat list.

Which prompt should I start with?

Start with the most general prompt in the list, then move toward the more specific or advanced prompts once you have initial output.

What is the difference between a prompt and a chain?

A single prompt gives you one instruction and one output. A chain is a multi-step sequence designed to build on earlier results and produce a more complete workflow.

Can I use these prompts outside MLJAR Studio?

Yes. They work in other AI tools too. MLJAR Studio is still the best fit when you want local execution, visible code, and notebook-based reproducibility.

Where should I go next after this category?

Good next stops are Visualization, Data Cleaning, and Business Insights, depending on what the current output reveals.
