when data is being prepared for research, analytics sharing, or sandbox access
De-identification Verification AI Prompt
This prompt verifies whether a dataset is sufficiently de-identified for compliant secondary use, sharing, or analysis. It scans for direct HIPAA identifiers as well as combinations of quasi-identifiers that could still create re-identification risk. It is especially useful before data leaves a protected clinical environment or is used in research, analytics sandboxes, or external reporting.
Verify that this dataset has been properly de-identified in compliance with HIPAA Safe Harbor or Expert Determination standards. Check for the presence of the 18 HIPAA identifiers:
1. Direct identifiers to scan for:
- Names: scan all text columns for patterns matching full names
- Geographic data: any subdivision smaller than a state, including street addresses, cities, counties, and ZIP codes (only the first three digits of a ZIP code may remain, and only where that three-digit area contains more than 20,000 people)
- Dates: scan for specific dates of birth, death, admission, or discharge that could identify individuals (under Safe Harbor keep the year only and group ages over 89 as 90 or older; date shifting needs Expert Determination support)
- Phone numbers, fax numbers, email addresses
- Social Security Numbers (pattern: XXX-XX-XXXX)
- Medical record numbers, health plan numbers, account numbers
- Certificate/license numbers, vehicle identifiers, device serial numbers
- URLs and IP addresses
- Biometric identifiers
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
2. Quasi-identifiers: flag any combination of age + zip + sex + rare diagnosis that could re-identify a patient
3. For each identifier found: column name, number of affected rows, severity (direct identifier vs quasi-identifier)
Return a de-identification gap report with recommended remediation for each finding.
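To make the pattern-based part of the prompt concrete, here is a minimal Python sketch of the kind of scan an assistant might generate for it. It is an illustration under stated assumptions, not the code MLJAR Studio produces: the regular expressions are deliberately simple, the function name `scan_direct_identifiers` and the sample columns are hypothetical, and names, addresses, and record numbers would need additional rules.

```python
import re

# Illustrative patterns only (assumption): real data needs broader, locale-aware rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_direct_identifiers(rows):
    """Count rows per (column, identifier type) in which a pattern matches."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            text = "" if value is None else str(value)
            for name, pattern in PATTERNS.items():
                if pattern.search(text):
                    key = (column, name)
                    counts[key] = counts.get(key, 0) + 1
    # One report entry per finding, in the shape the prompt asks for.
    return [
        {
            "column": column,
            "identifier": name,
            "rows_affected": n,
            "severity": "direct identifier",
        }
        for (column, name), n in sorted(counts.items())
    ]


# Hypothetical sample data with made-up values.
sample_rows = [
    {"patient_note": "Call 555-123-4567 re: follow-up", "contact": "jane@example.org"},
    {"patient_note": "SSN 123-45-6789 on file", "contact": None},
    {"patient_note": "No identifiers here", "contact": "n/a"},
]

for finding in scan_direct_identifiers(sample_rows):
    print(finding)
```

A regex hit is only a candidate finding. Free-text columns in particular tend to produce both false positives and misses, so the counts are best treated as a starting point for manual review.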
When to use this prompt
when you need to verify HIPAA Safe Harbor-style de-identification
when leadership wants a documented list of residual identifier risks
when quasi-identifier combinations may still permit re-identification
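One common way to reason about quasi-identifier risk is to count how many rows share each combination of values and flag the combinations that are too rare. The sketch below assumes hypothetical column names (`age`, `zip3`, `sex`, `diagnosis`) and an arbitrary threshold of `k=5`. HIPAA does not prescribe a specific threshold, so that value is a placeholder to be set by your own policy or expert.

```python
from collections import Counter

# Hypothetical column names; adjust to the dataset being checked.
QUASI_IDENTIFIERS = ("age", "zip3", "sex", "diagnosis")


def small_groups(rows, keys=QUASI_IDENTIFIERS, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    sizes = Counter(tuple(row.get(key) for key in keys) for row in rows)
    return {combo: n for combo, n in sizes.items() if n < k}


# Made-up records: six patients share one profile, one patient is unique.
records = (
    [{"age": 34, "zip3": "021", "sex": "F", "diagnosis": "asthma"}] * 6
    + [{"age": 71, "zip3": "597", "sex": "M", "diagnosis": "Fabry disease"}]
)

risky = small_groups(records, k=5)
print(risky)
```

Each flagged combination identifies a small group of rows that may need generalization (wider age bands, coarser geography) or suppression before the data is shared.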
What the AI should return
A de-identification gap report listing detected identifiers or quasi-identifiers by column, severity classification, row counts affected, and specific remediation recommendations.
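The remediation recommendations in such a report often come down to a few generalization steps. The following Python sketch shows three of them under Safe Harbor-style assumptions: truncating ZIP codes to three digits, keeping only the year of a date, and top-coding ages at 90. All function names are hypothetical, and the set of low-population ZIP prefixes passed in the example is a placeholder, not the real Census-derived list.

```python
from datetime import date


def generalize_zip(zip_code, low_population_zip3):
    """Keep the first three ZIP digits, or "000" for low-population areas."""
    zip3 = str(zip_code)[:3]
    return "000" if zip3 in low_population_zip3 else zip3


def year_only(d):
    """Reduce a full date to its year."""
    return d.year


def top_code_age(age, cap=90):
    """Report ages at or above the cap as a single open-ended category."""
    return f"{cap}+" if age >= cap else str(age)


# Placeholder set (assumption): populate from current Census data in practice.
example_low_population = {"597"}

print(generalize_zip("59701", example_low_population))
print(generalize_zip("02139", example_low_population))
print(year_only(date(1948, 3, 14)))
print(top_code_age(93), top_code_age(45))
```

Whether these steps are sufficient depends on the rest of the dataset, so it is worth re-running the identifier and quasi-identifier checks after applying them.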
How to use this prompt
Open your data context
Load your dataset, notebook, or working environment so the AI can operate on the actual project context.
Copy the prompt text
Copy the prompt text above and paste it into the AI assistant or prompt input area.
Review the output critically
Check whether the result matches your data, assumptions, and desired format before moving on.
Chain into the next prompt
Once you have the first result, continue deeper with related prompts in Data Quality and Compliance.
Frequently asked questions
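What does the De-identification Verification prompt do?
It asks an AI assistant to scan a dataset for direct HIPAA identifiers and risky quasi-identifier combinations, then return a gap report with affected columns, row counts, severity, and recommended remediation. This gives healthcare data analysts a structured starting point for data quality and compliance checks instead of a blank page.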
Who is this prompt for?
It is designed for healthcare data analyst workflows and marked as intermediate, so it works well as a guided starting point for that level of experience.
What type of prompt is this?
De-identification Verification is a single prompt. You can copy it as-is, adapt it, or use it as one step inside a larger workflow.
Can I use this outside MLJAR Studio?
Yes. The prompt text works in other AI tools too, but MLJAR Studio is the best fit when you want local execution, visible Python code, and reusable notebooks.
What should I open next?
Natural next steps from here are Clinical Data Quality Audit, Coding Accuracy Analysis, and POA Flag Validation.