Use this prompt when publishing a dataset for multiple downstream consumers.
Data Contract Definition AI Prompt
This prompt creates a formal data contract that defines what a dataset is, what it means, how fresh and accurate it will be, and how changes are managed. It is useful when producer and consumer teams need explicit expectations instead of informal assumptions. The contract should be precise enough to support governance and automation.
Write a data contract for the dataset: {{dataset_name}} produced by the {{producer_team}} team and consumed by {{consumer_teams}}.
A data contract is a formal agreement between data producers and consumers specifying what data will be delivered, in what format, with what quality guarantees, and on what schedule.
1. Dataset identity:
- Dataset name and version
- Producer: team, contact, and escalation path
- Consumers: teams currently depending on this dataset
2. Schema definition:
- Table or topic name
- For each column/field: name, data type, nullable (Y/N), description, example value, PII classification (Y/N)
- Primary key or unique identifier
- Partitioning columns
3. Semantics and business rules:
- Grain: what does one row represent?
- Business rules: constraints and derived logic (e.g. 'order_total is always the sum of line items')
- Key relationships to other datasets
4. Quality commitments:
- Completeness: which columns are guaranteed non-null?
- Uniqueness: which column combinations are guaranteed unique?
- Freshness: data will be available by {{sla_time}} on each {{frequency}}
- Accuracy: key measures are reconciled to source within {{tolerance}}
5. Change management:
- Breaking change definition: removed column, type change, semantic change
- Notice period: {{notice_period}} days' notice required before a breaking change
- Deprecation process: how will consumers be notified and given time to migrate?
6. SLA and support:
- Incident response time: {{response_time}}
- Scheduled maintenance window: {{maintenance_window}}
- Where to report issues: {{issue_channel}}
Return: complete data contract document in YAML format.
When to use this prompt
When introducing producer-consumer accountability in a platform.
When freshness, quality, and change guarantees must be documented.
When you want a machine-readable contract format such as YAML.
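Because the contract is machine-readable, parts of it can be enforced automatically. As one illustration, the sketch below flags two of the breaking-change categories the prompt defines (removed columns and type changes) by diffing two schema versions. It assumes each schema has been reduced to a hypothetical mapping of column name to declared type. Semantic changes cannot be detected this way and still need human review.

```python
from typing import Dict, List

# Hypothetical simplified schema: column name -> declared data type.
Schema = Dict[str, str]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List removed columns and type changes between two schema versions.

    Semantic changes (same name and type, different meaning) are not
    detectable from the schema alone and are not reported here.
    """
    changes = []
    for column, old_type in old.items():
        if column not in new:
            changes.append(f"removed column: {column}")
        elif new[column] != old_type:
            changes.append(f"type change: {column} {old_type} -> {new[column]}")
    return changes


v1 = {"order_id": "string", "order_total": "decimal(12,2)", "status": "string"}
v2 = {"order_id": "string", "order_total": "float", "created_at": "timestamp"}
for change in breaking_changes(v1, v2):
    print(change)
```

The added `created_at` column is not reported, since additive changes are usually treated as non-breaking; tighten that rule if your consumers read with strict schemas.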
What the AI should return
Return a complete YAML data contract covering identity, schema, grain, business rules, quality commitments, change management, and support details. Ensure required fields, SLA language, and breaking-change definitions are explicit. The output should be ready to store in version control or a registry.
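Before storing the output, it can help to confirm that the YAML actually contains every section the prompt asked for. The sketch below assumes the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The key names in `REQUIRED_SECTIONS` and `REQUIRED_COLUMN_FIELDS` are hypothetical, so rename them to match the structure your assistant produced.

```python
from typing import Any, Dict, List

# Hypothetical key names for the sections requested by the prompt;
# rename them to match the YAML your assistant actually returns.
REQUIRED_SECTIONS = ["identity", "schema", "semantics", "quality", "change_management", "sla"]
# Per-column attributes the prompt asks for: name, type, nullable, description, example, PII flag.
REQUIRED_COLUMN_FIELDS = ["name", "type", "nullable", "description", "example", "pii"]


def missing_fields(contract: Dict[str, Any]) -> List[str]:
    """List required sections and per-column attributes absent from a parsed contract."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in contract]
    columns = contract.get("schema", {}).get("columns", [])
    for i, column in enumerate(columns):
        for field in REQUIRED_COLUMN_FIELDS:
            if field not in column:
                problems.append(f"schema.columns[{i}] missing: {field}")
    return problems


contract = {
    "identity": {"name": "orders", "version": "1.0.0"},
    "schema": {
        "columns": [
            {"name": "order_id", "type": "string", "nullable": False,
             "description": "Order identifier", "example": "A-1001", "pii": False},
            {"name": "email", "type": "string", "nullable": True},
        ]
    },
    "quality": {},
    "change_management": {},
    "sla": {},
}
for problem in missing_fields(contract):
    print(problem)
```

A check like this only catches missing structure; whether the SLA wording and breaking-change definitions are actually explicit still needs a human read.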
How to use this prompt
Open your data context
Load your dataset, notebook, or working environment so the AI can operate on the actual project context.
Copy the prompt text
Copy the prompt text above and paste it into your AI assistant's input area.
Review the output critically
Check whether the result matches your data, assumptions, and desired format before moving on.
Chain into the next prompt
Once you have the first result, continue with related prompts in the Data Contracts collection.
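Reviewing the output critically includes confirming that your data can actually meet the quality commitments the contract makes. The sketch below checks completeness, uniqueness, freshness, and accuracy against an in-memory list of rows. The function names, the row representation, and the reading of `{{tolerance}}` as a relative fraction are assumptions for illustration, not part of the prompt.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def check_quality(rows: List[Row], non_null: Sequence[str], unique_key: Sequence[str],
                  loaded_at: datetime, deadline: datetime) -> List[str]:
    """Describe each violated completeness, uniqueness, or freshness commitment."""
    failures = []
    # Completeness: guaranteed columns must be present and non-null in every row.
    for col in non_null:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            failures.append(f"completeness: {col} has {nulls} null value(s)")
    # Uniqueness: count rows whose key combination was already seen.
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r.get(c) for c in unique_key)
        if key in seen:
            dupes += 1
        seen.add(key)
    if dupes:
        failures.append(f"uniqueness: {dupes} duplicate key(s) on {tuple(unique_key)}")
    # Freshness: the load must finish by the SLA deadline.
    if loaded_at > deadline:
        failures.append(f"freshness: loaded {loaded_at - deadline} after the deadline")
    return failures


def reconciles(dataset_total: float, source_total: float, tolerance: float) -> bool:
    """Accuracy: relative difference from the source total must be within tolerance."""
    if source_total == 0:
        return dataset_total == 0
    return abs(dataset_total - source_total) / abs(source_total) <= tolerance


rows = [
    {"order_id": "A1", "customer_id": 1},
    {"order_id": "A1", "customer_id": None},
    {"order_id": "A2", "customer_id": 3},
]
deadline = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
for failure in check_quality(rows, ["order_id", "customer_id"], ["order_id"],
                             loaded_at=deadline + timedelta(minutes=30), deadline=deadline):
    print(failure)
print(reconciles(1000.0, 1002.0, tolerance=0.005))
```

On real tables you would run equivalent checks in SQL or a validation tool; the in-memory version only shows what each commitment means operationally.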
Frequently asked questions
What does the Data Contract Definition prompt do?
It gives data engineers a structured starting point for writing a data contract, so you can move faster without starting from a blank page.
Who is this prompt for?
It is designed for data engineering workflows and is rated beginner level, so it works well as a guided starting point at that level of experience.
What type of prompt is this?
Data Contract Definition is a single prompt. You can copy it as-is, adapt it, or use it as one step inside a larger workflow.
Can I use this outside MLJAR Studio?
Yes. The prompt text works in other AI tools too, but MLJAR Studio is the best fit when you want local execution, visible Python code, and reusable notebooks.
What should I open next?
Natural next steps from here are Breaking Change Migration, Contract Validation Pipeline, Data Mesh Contract Governance.