# Copilot Instructions for md-ddl
## What this repository is
This repo is the **source of the MD-DDL standard** — its specification, agent prompts, skills, and worked examples. It is not a runtime application and not a domain modelling workspace.
### Purpose boundary
- `copilot-instructions.md` governs **contributor behaviour in this repository** (maintaining the spec, agent prompts/skills, examples, and tooling).
- Custom agents govern **modelling behaviour when applying MD-DDL** to user domains.
- Keep these concerns separate: repository maintenance guidance belongs here; end-user modelling workflow guidance belongs in agent prompts/skills.
Contributors here are working on one of four things:
1. **The specification** — the normative rules of the MD-DDL language
2. **Agent prompts and skills** — the AI guidance layer built on top of the spec
3. **Examples** — reference domain and entity files that demonstrate the spec
4. **Tooling** — any validator or utility that processes MD-DDL files
Understanding which of these you are working on determines everything about how to proceed.
---
## Repository layout
```
md-ddl-specification/                   Normative spec: source of truth for all rules
  1-Foundation.md                       Core principles and document structure
  2-Domains.md                          Domain file format and metadata
  3-Entities.md                         Entity and attribute definitions
  4-Enumerations.md                     Enum structure and naming
  5-Relationships.md                    Relationship semantics and YAML
  6-Events.md                           Event structure and temporal rules
  7-Sources.md                          Source system declarations and source-layer structure
  8-Transformations.md                  Transformation vocabulary and mapping types
  9-Data-Products.md                    Data product classes, declaration, and generation rules
  10-Adoption.md                        Brownfield adoption, maturity model, baseline-to-canonical path
  MD-DDL-Complete.md                    Concatenated single-file version (generated)
agents/
  agent-guide/                          Learning and navigation agent
    AGENT.md                            Core prompt: identity, modes, user archetypes
    skills/
      orientation/                      Role profiling, MD-DDL overview, workflow mapping
      concept-explorer/                 Interactive spec teaching with analogies
      worked-examples/                  Simple Customer + Financial Crime + Brownfield Retail walkthroughs
      adoption-planning/                Brownfield adoption assessment and roadmap planning
      platform-setup/                   VS Code Copilot + Claude Code setup and workflow
  agent-ontology/                       Discovery and design agent
    AGENT.md                            Core prompt: identity, modes, skill index
    skills/
      domain-scoping/                   Interview protocol + Domains spec (includes brownfield baseline-to-canonical path)
      entity-modelling/                 Concept realisation + Entities + Enumerations spec
      relationship-events/              Relationships + Events spec
      standards-alignment/              Industry standards mapping (self-contained)
      source-mapping/                   Source system declaration + field-level transformations
      domain-review/                    Structural and decision-quality review protocol
      baseline-capture/                 Document existing schemas, ETL, catalog as baselines
      schema-import/                    Fast-track DDL-to-domain inference (brownfield)
  agent-artifact/                       Physical artifact generation agent
    AGENT.md                            Core prompt: identity, modes, skill index
    skills/
      dimensional/                      Star schema, fact/dimension/bridge mapping
      normalized/                       Normalized operational schema, DDL/JSON Schema/Parquet
      wide-column/                      Denormalized wide-column reporting schemas
      knowledge-graph/                  Knowledge graph schema, Neo4j Cypher DDL
      reconciliation/                   Compare generated artifacts against existing state
  agent-architect/                      Strategic design and data product publication agent
    AGENT.md                            Core prompt: identity, modes, skill index
    skills/
      architecture/                     Architectural philosophy, tenets, comparison, presentation outputs
      product-design/                   Product class selection, entity scoping, governance, masking
      odps-alignment/                   ODPS v4.0 manifest generation and mapping
        references/                     ODPS specification reference
  agent-governance/                     Standards conformance and compliance assurance agent
    AGENT.md                            Core prompt: identity, modes (Conformance/Compliance/Monitor/Remediate), skill index
    skills/
      standards-conformance/            Post-modelling industry standards audit
      regulatory-compliance/            Jurisdiction mapping + regulator file loader (shared with agent-ontology)
        regulators/                     Per-regulator guidance files (apra.md, gdpr.md, fatf.md, etc.)
      compliance-audit/                 Three-level audit protocol, gap report format, severity rules
.github/
  agents/                               Copilot custom-agent entrypoints (wrappers)
    agent-guide.agent.md                Frontmatter + include of canonical `agents/agent-guide/AGENT.md`
    agent-ontology.agent.md             Frontmatter + include of canonical `agents/agent-ontology/AGENT.md`
    agent-artifact.agent.md             Frontmatter + include of canonical `agents/agent-artifact/AGENT.md`
    agent-architect.agent.md            Frontmatter + include of canonical `agents/agent-architect/AGENT.md`
    agent-governance.agent.md           Frontmatter + include of canonical `agents/agent-governance/AGENT.md`
    review-md-ddl.agent.md              Copilot review agent wrapper
  copilot-instructions.md               Copilot-specific contributor guidance
.prompts/
  md-ddl-review-prompt.md               Layer 1: Structural review prompt
  md-ddl-adversarial-review-prompt.md   Layer 2: Adversarial review prompt
  md-ddl-evaluation-prompt.md           Layer 3: Stakeholder simulation prompt
  md-ddl-layered-review-process.md      Review process orchestration
  concat-md-ddl-specs.prompt.md         Spec concatenation rules
examples/
  Financial Crime/                      Primary reference example (most current)
    domain.md                           Reference-quality domain file
    entities/party.md                   Reference-quality entity detail file
    entities/party_role.md              Reference-quality entity detail file
```
---
## Working on the specification
### The spec is the authority — agents defer to it
Every rule in every agent prompt and skill must be traceable to the spec. If a rule exists in an agent prompt but not in the spec, it should either be
added to the spec (if it's a genuine standard) or removed from the prompt (if it's agent-specific behaviour).
### Section ownership
Each spec section owns a distinct layer of the language:
File | Owns
--- | ---
`1-Foundation.md` | Principles, document structure, two-layer model
`2-Domains.md` | Domain file format, metadata schema, diagram rules, summary tables
`3-Entities.md` | Entity YAML, attribute types, constraints, diagrams, inheritance
`4-Enumerations.md` | Enum formats, naming, dictionary vs. simple list
`5-Relationships.md` | Relationship types, granularity, cardinality, constraint syntax
`6-Events.md` | Event structure, payload design, temporal priority, actor/entity
`7-Sources.md` | Source file format, change models, domain feed tables, source-layer rules
`8-Transformations.md` | Transformation types, YAML syntax, expression language, generation behaviour
`9-Data-Products.md` | Data product classes, declaration syntax, governance, masking, product-driven generation
`10-Adoption.md` | Brownfield adoption maturity model, baseline capture format, baseline-to-canonical transition
When adding or changing a rule, edit the owning section only. Do not duplicate rules across sections.
### MD-DDL-Complete.md
Never edit `MD-DDL-Complete.md` directly. It is a generated file: a concatenation of sections 1-10 in order, joined from the individual spec files before pushing to GitHub.
It exists for AI context loading (single-file spec injection into agent prompts).
Regenerate with the repo script:
`powershell -ExecutionPolicy Bypass -File .\.github\scripts\concat-md-ddl-specs.ps1`
Canonical generation rules are maintained in `.prompts/concat-md-ddl-specs.prompt.md`. Keep concatenation behavior defined there and avoid duplicating detailed algorithm rules in multiple files.
After regeneration, run the verification checklist defined in `.prompts/concat-md-ddl-specs.prompt.md` before committing.
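One ordering pitfall is worth illustrating, because a plain alphabetical sort places `10-Adoption.md` before `2-Domains.md`. The sketch below is not the repo script and does not restate the canonical generation rules; `spec_sort_key` and `ordered_sections` are hypothetical helper names used only to show numeric ordering of the section files.

```python
import re

def spec_sort_key(filename):
    """Return the leading section number of a name such as '10-Adoption.md'."""
    match = re.match(r"(\d+)-", filename)
    if match is None:
        raise ValueError(f"not a numbered spec section: {filename}")
    return int(match.group(1))

def ordered_sections(filenames):
    """Keep only numbered section files and order them by section number."""
    numbered = [name for name in filenames if re.match(r"\d+-", name)]
    return sorted(numbered, key=spec_sort_key)

files = ["10-Adoption.md", "2-Domains.md", "MD-DDL-Complete.md", "1-Foundation.md"]
print(ordered_sections(files))
# prints ['1-Foundation.md', '2-Domains.md', '10-Adoption.md']
```

The generated `MD-DDL-Complete.md` is filtered out because it has no numeric prefix, so it can never be concatenated into itself.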
### Versioning
The spec uses semantic versioning in the H1 heading of each file. A version bump is required any time a rule changes in a way that would alter the output of a correctly-authored MD-DDL file. Corrections to examples or prose clarifications do not require a version bump.
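As a rough aid when checking whether a bump was applied, the sketch below reads a semantic version out of the first H1 heading. The exact heading format is an assumption (the sample heading is invented), and `h1_version` and `is_bump` are hypothetical names, not repo tooling.

```python
import re

# Assumption: the H1 carries a MAJOR.MINOR.PATCH triple somewhere in its text.
SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")

def h1_version(markdown_text):
    """Return (major, minor, patch) from the first H1 heading, or None."""
    for line in markdown_text.splitlines():
        if line.startswith("# "):
            match = SEMVER.search(line)
            if match is None:
                return None
            return tuple(int(part) for part in match.groups())
    return None

def is_bump(old, new):
    """True when the new version tuple is strictly greater than the old one."""
    return new > old

before = h1_version("# MD-DDL Entities v1.4.0\n\nBody text")  # invented heading
after = h1_version("# MD-DDL Entities v1.5.0\n\nBody text")   # invented heading
print(before, after, is_bump(before, after))
# prints (1, 4, 0) (1, 5, 0) True
```

Comparing integer tuples rather than strings keeps `1.10.0` ordered after `1.9.0`.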
---
## Working on agents and skills
### Canonical vs wrapper locations
Use `agents/` as the canonical source of agent behaviour and skill content.
Use `.github/agents/` for Copilot custom-agent wrapper files only. Wrapper files should contain:
- Custom-agent frontmatter (`name`, `description`, `argument-hint`, optional `tools`)
- A single include to the canonical prompt in `agents/.../AGENT.md`
Do not duplicate full agent prompts in both locations. Update canonical files in `agents/` and keep wrappers minimal.
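A minimal sketch of how a contributor might sanity-check that a wrapper stays minimal. It assumes YAML-style frontmatter delimited by `---` lines and the `{{INCLUDE: ...}}` directive form used elsewhere in this file; the sample wrapper text, its include path, and the `check_wrapper` helper are hypothetical.

```python
import re

INCLUDE = re.compile(r"\{\{INCLUDE:\s*([^}]+?)\s*\}\}")
REQUIRED_KEYS = ("name", "description", "argument-hint")  # `tools` is optional

def check_wrapper(text):
    """Return a list of problems found in a wrapper file's text."""
    lines = [line.strip() for line in text.splitlines()]
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return ["missing frontmatter block"]
    end = lines.index("---", 1)  # closing frontmatter delimiter
    keys = {line.split(":", 1)[0] for line in lines[1:end] if ":" in line}
    problems = [f"missing frontmatter key: {key}" for key in REQUIRED_KEYS if key not in keys]
    body = [line for line in lines[end + 1:] if line]
    includes = [path for line in body for path in INCLUDE.findall(line)]
    if len(includes) != 1:
        problems.append(f"expected exactly one include, found {len(includes)}")
    elif not includes[0].endswith("AGENT.md"):
        problems.append("include does not point at an AGENT.md file")
    if any(not INCLUDE.fullmatch(line) for line in body):
        problems.append("body contains content beyond the include")
    return problems

# Hypothetical wrapper content, for illustration only.
wrapper = """---
name: agent-guide
description: Learning and navigation agent
argument-hint: Ask a question about MD-DDL
---

{{INCLUDE: ../../agents/agent-guide/AGENT.md}}
"""
print(check_wrapper(wrapper))
# prints []
```

Any prompt text pasted below the include shows up as "body contains content beyond the include", which is the duplication this section warns against.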
### Responsibility split: repo guidance vs modelling guidance
- Put repository authoring/maintenance rules in `copilot-instructions.md`.
- Put modelling interview, drafting, refinement, and compliance execution rules in agent prompts/skills.
- If guidance controls how an agent models a business domain, it belongs under `agents/`.
- If guidance controls how contributors edit this repo's assets, it belongs here.
### The agent/spec relationship
Agent prompts and skills are **guidance built on top of the spec** — they teach an AI how to apply the rules, not what the rules are. If you find yourself
repeating a spec rule verbatim inside a skill, that's a signal to reference the spec file instead.
### Skill structure
Each skill follows the progressive disclosure pattern:
```
skills/<skill-name>/
SKILL.md Trigger description (frontmatter) + process guidance
references/ Spec sections or external references loaded on demand
```
The `SKILL.md` body should stay under 500 lines. Heavy spec content belongs in `references/` and is loaded only when the skill is active. See the skill-creator skill pattern for detailed guidance on writing effective skill files.
### The spec reference stub pattern
Reference files in `skills/*/references/` are stubs that point to the canonical spec file — they do not duplicate content. This means a spec update propagates automatically without touching agent files. When adding a new spec reference, create the stub and point it at the correct `md-ddl-specification/` file.
Use file-relative paths in both Markdown links and `{{INCLUDE: ...}}` directives inside reference stubs. Do not use workspace-root paths (for example `md-ddl-specification/...`) because this repo is commonly consumed as a `.md-ddl` submodule and root-based paths break in consumer projects.
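To make the difference concrete, the sketch below derives the file-relative path a stub would use to reach a spec file. The stub location shown (an `entities-spec.md` stub under the `entity-modelling` skill) is an assumed example, and `stub_relative_path` is a hypothetical helper.

```python
import posixpath

def stub_relative_path(stub_path, spec_path):
    """Return spec_path as seen from the directory that contains stub_path.

    Both arguments are repo-root-relative POSIX paths.
    """
    # Anchor both paths at a fake root so the result does not depend on the cwd.
    stub_dir = posixpath.dirname(posixpath.join("/", stub_path))
    return posixpath.relpath(posixpath.join("/", spec_path), start=stub_dir)

# Assumed stub location, for illustration only.
stub = "agents/agent-ontology/skills/entity-modelling/references/entities-spec.md"
spec = "md-ddl-specification/3-Entities.md"
print("{{INCLUDE: " + stub_relative_path(stub, spec) + "}}")
# prints {{INCLUDE: ../../../../../md-ddl-specification/3-Entities.md}}
```

Because the result is expressed relative to the stub itself, it resolves the same way whether the repo is checked out standalone or mounted as a `.md-ddl` submodule.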
### Reference Loading
Treat `{{INCLUDE: ...}}` as platform-dependent.
- VS Code Copilot custom-agent wrappers in `.github/agents/` process `{{INCLUDE}}` natively.
- Many other chat platforms do not process `{{INCLUDE}}` in arbitrary Markdown files.
- In SKILL.md guidance, instruct agents to load canonical spec files directly from `md-ddl-specification/*.md`.
- Keep `skills/*/references/*-spec.md` stub files as dependency documentation and include-aware fallbacks.
When updating a skill, prefer wording like: `Load md-ddl-specification/3-Entities.md (reference stub: references/entities-spec.md)`.
### Agent responsibilities and boundaries
Each agent owns a distinct lifecycle stage. Do not add capabilities to an agent that belong to another agent's stage.
Agent | Lifecycle stage | Owns
--- | --- | ---
`agent-guide` | Learning and navigation | Standard explanation, user orientation, concept teaching, worked example walkthroughs, platform setup, agent navigation and handoff
`agent-architect` | Strategic design and data products | Architecture philosophy discussion, data product class selection, entity scoping, governance overrides, masking strategies, ODPS manifest generation, external catalogue alignment
`agent-ontology` | Discovery and design | Domain modelling, entity authoring, relationship and event design, source system mapping and field-level transformations, standards alignment during authoring
`agent-artifact` | Physical artifact generation | Dimensional star schemas, normalized 3NF designs, wide-column reporting schemas, knowledge graph schemas, SQL DDL, JSON Schema, Cypher, Parquet schema contracts
`agent-governance` | Standards conformance and compliance assurance | Standards conformance auditing, compliance metadata auditing, regulatory monitoring, governance remediation
**Boundary rule — Guide vs All Specialists:** Agent Guide teaches concepts, walks through examples, helps with platform setup, and navigates users to the right specialist agent. It never creates production MD-DDL artifacts — domain files, entity details, data product declarations, physical schemas, or compliance audit reports. When a user is ready for production work, Agent Guide hands off explicitly to the appropriate specialist agent. Do not add production modelling, generation, product design, architecture discussion, or compliance auditing to Agent Guide.
**Boundary rule — Guide vs Ontology:** Agent Guide may sketch MD-DDL to illustrate concepts (clearly marked as demonstrations, not production artifacts). Agent Ontology creates production domain files, entity details, relationships, and events. If Agent Guide identifies that a user is ready to model, it hands off to Agent Ontology with a suggested opening prompt. Do not add production modelling capability to Agent Guide, and do not add tutorial or onboarding capability to Agent Ontology.
**Boundary rule — Guide vs Architect:** Agent Guide may mention architecture concepts when teaching (Teach mode). Agent Architect engages in strategic architecture discussion, positioning, and comparison (Discuss mode). If Agent Guide identifies that a user wants to discuss architecture philosophy, compare approaches, or prepare strategic material, it hands off to Agent Architect. Do not add strategic discussion capability to Agent Guide, and do not add tutorial or onboarding capability to Agent Architect.
**Boundary rule — Ontology vs Artifact:** Agent Ontology produces conceptual and logical MD-DDL models. Agent Artifact consumes those models and generates physical artifacts (DDL, JSON Schema, Parquet). If Agent Artifact identifies a conceptual gap (missing entity, attribute, or relationship), it flags the gap and defers the structural change to Agent Ontology. Do not add physical generation capability to Agent Ontology or domain modelling capability to Agent Artifact.
**Boundary rule — Ontology vs Architect:** Agent Ontology creates the initial `## Data Products` summary table during domain drafting. Agent Architect takes over for detailed product design — choosing product class, scoping entities, setting governance overrides, masking strategies, and generating external manifests (ODPS). Agent Architect also handles architecture philosophy discussion. Do not add detailed product design, ODPS generation, or architecture discussion to Agent Ontology, and do not add entity modelling or relationship design to Agent Architect.
**Boundary rule — Architect vs Artifact:** Agent Architect produces MD-DDL data product declarations that serve as input contracts for Agent Artifact. Agent Artifact generates physical artifacts (DDL, JSON Schema, Parquet, Cypher) scoped by the product's `entities`, `schema_type`, `governance`, and `masking` fields. Do not add physical artifact generation to Agent Architect, and do not add product design or ODPS alignment to Agent Artifact.
**Boundary rule — Ontology vs Governance:** Agent Ontology applies governance metadata and industry standards alignment during authoring (design-time). Agent Governance audits standards conformance and regulatory compliance after modelling (assurance-time). If a compliance or conformance gap requires a structural model change — a new entity, attribute, or relationship — Agent Governance flags it and defers the structural work to Agent Ontology. Do not add structural modelling capability to Agent Governance, and do not add post-hoc conformance auditing to Agent Ontology.
**Boundary rule — Governance vs Architect:** Agent Governance may audit data product governance and masking metadata (see Level 4 in the compliance-audit skill) and produce recommendations. Agent Architect owns all product declarations and applies approved changes. Agent Governance flags product governance gaps; Agent Architect fixes them. Do not add product declaration authoring to Agent Governance, and do not add compliance auditing to Agent Architect.
### Shared skills
Some skills are used by more than one agent. The `regulatory-compliance` skill is the current example — Agent Ontology uses it to apply governance metadata
during domain authoring; Agent Governance uses it as the requirements benchmark during audits.
Shared skills live under the agent that owns them conceptually. Agent Governance owns `regulatory-compliance` because compliance assurance is its primary purpose. Agent Ontology loads it as an external reference.
Industry standard reference files (`industry_standards/bian/`, `industry_standards/fhir/`, `industry_standards/tmforum/`) are shared read-only references used by both Agent Ontology (standards-alignment skill, design-time) and Agent Governance (standards-conformance skill, assurance-time).
When editing a shared skill, consider the impact on both agents. The skill's trigger description should reflect all contexts in which it is used.
### Adding a new agent
New agents follow the same structure as `agent-ontology/`:
- `AGENT.md` — identity, behaviour modes, skill index, non-negotiable output rules
- `skills/` — one skill per coherent process area, not per spec section
The skill index in `AGENT.md` is the triggering mechanism. Write trigger descriptions to be specific and slightly pushy — the agent should load a skill
when in doubt, not skip it to save context.
Before adding a new agent, confirm it occupies a distinct lifecycle stage not already covered. Add it to the agent responsibilities table above.
### What belongs in agents, not in the spec
- Interview protocols and question sequences
- Decision frameworks for trade-offs (entity vs. enum vs. attribute)
- Checklists for output quality review
- Industry standards mapping tables
- Behaviour modes (Interview / Drafting / Refinement)
None of these are rules of the language. They are guidance for applying the language.
---
## Table formatting in markdown
When writing tables in Markdown, use pipe (`|`) syntax with a header row and a separator line, and omit the leading and trailing pipes. For example:
```markdown
Column 1 | Column 2 | Column 3
--- | --- | ---
Value 1 | Value 2 | Value 3
```
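If a table is ever produced by a script rather than by hand, the same convention applies. A minimal sketch, with `render_table` as a hypothetical helper name; it does not escape pipe characters inside cell values.

```python
def render_table(headers, rows):
    """Render a Markdown table without leading or trailing pipes."""
    lines = [" | ".join(headers), " | ".join("---" for _ in headers)]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row width does not match header width")
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)

print(render_table(["Column 1", "Column 2", "Column 3"],
                   [["Value 1", "Value 2", "Value 3"]]))
```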
---
## Working on examples
### What examples are for
Examples serve two purposes: they demonstrate correct spec application for human readers, and they act as AI context references (agents are instructed to use `examples/Financial Crime/` as the quality benchmark).
### The current reference example
`examples/Financial Crime/` is the highest-quality example in the repo. When in doubt about what correct MD-DDL looks like, this is the reference. The domain file uses the current table format; the entity files use current classDiagram and YAML patterns.
### Adding or updating examples
- New examples must conform to the current spec version — check the version header in `1-Foundation.md` before writing.
- If spec and example disagree, the spec wins. Flag the discrepancy and update the example to match the spec — do not leave known-incorrect examples in place.
- Keep links in examples relative and navigable: `entities/party.md`, `#party`.
- Never invent reference URLs for standards. Verify before adding.
### Upgrading existing examples
Examples must stay current with the spec. When a spec version bumps, check all examples for patterns that the new version has superseded and update them.
When upgrading an example, update all patterns in the file in a single pass — do not partially upgrade a file, as mixed old/new patterns are more confusing than either consistently old or consistently new.
---
## Validation philosophy
MD-DDL distinguishes between **mechanical pre-flight checks** (syntax, links, entity references) and **agent-driven quality review** (structure, convention, governance, domain fitness). This split is deliberate and normative — see `md-ddl-specification/1-Foundation.md` "Validation Model" for the authoritative definition.
Rules for contributors:
- **Do not add lint-style enforcement to agent prompts.** If an agent rejects a file for a convention violation, that is a prompt bug. Agents flag deviations; they do not reject files for anything above syntax level.
- **Validation language in agent prompts must match the tier.** Use `flag` / `note` / `suggest` / `observe` for convention and quality issues (Levels 3-5). Use `error` / `reject` / `fail` only for syntax-level failures (Level 1). When reviewing an agent prompt, search for error-language and confirm it is limited to syntax contexts.
- **Organisational vocabulary deviations are observations, not errors.** When an agent encounters `phi` instead of `pii`, or `data_class` instead of `classification`, the correct response is to note it as a potential spec vocabulary gap and continue working. Do not add prompt rules that reject non-standard vocabulary.
- **Pre-flight checks are fixed and minimal.** There are exactly 5 checks (YAML syntax, Mermaid syntax, internal link integrity, entity reference consistency, domain version field). Adding a new check requires a spec version bump — it is a deliberate, reviewed decision, not a casual addition.
- **The `1-Foundation.md` validation model section is the normative reference.** If you are unsure whether something is a pre-flight check or an agent-driven concern, consult that section. Do not resolve the ambiguity by adding enforcement rules to agent prompts.
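The search for error-language mentioned above can be done with any text search; the sketch below shows one possible form. It only lists candidate lines, and the reviewer still decides whether each hit sits in a syntax-level context. `find_error_language` is a hypothetical helper and the sample prompt text is invented.

```python
import re

# Matches error, reject, fail and their inflections (errors, rejected, failure).
ERROR_WORDS = re.compile(r"\b(error|reject|fail)\w*\b", re.IGNORECASE)

def find_error_language(prompt_text):
    """Return (line_number, line) pairs that use error-tier wording."""
    hits = []
    for number, line in enumerate(prompt_text.splitlines(), start=1):
        if ERROR_WORDS.search(line):
            hits.append((number, line.strip()))
    return hits

# Invented prompt excerpt, for illustration only.
sample = (
    "Flag naming deviations and continue.\n"
    "Reject files that use `phi` instead of `pii`.\n"
    "Report a YAML syntax error before any other finding.\n"
)
for number, line in find_error_language(sample):
    print(number, line)
```

In this invented excerpt, line 2 is the kind of hit that signals a prompt bug (rejection for a vocabulary deviation), while line 3 is acceptable because it concerns syntax.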
---
## Architectural philosophy
MD-DDL implements the **Data Autonomy** architectural style. Every design decision in the spec — domain-aligned ownership, canonical models, model-driven generation, polyglot persistence, governance as metadata — traces to one or more of 13 architectural tenets distilled from the foundational blog series.
The tenets are documented in `agents/agent-architect/skills/architecture/SKILL.md`. The source material is in `references/architecture/` (17 blog posts, 3 external references, 7 Mermaid diagram conversions).
### The 13 tenets (summary)
1. Master data where change is least
2. Separate data from logic ownership
3. Design for loose coupling
4. Model for business semantics
5. Encode governance as metadata
6. Use polyglot persistence
7. Embrace small, regular change
8. Ask "what does good look like?"
9. Standardise 80%, differentiate 20%
10. Data products are architecture quantum
11. Canonical models + key mapping
12. Event-driven = real-time semantics
13. Data ethics as relational philosophy
### Rules for contributors
- When adding or changing spec rules, consider which tenet(s) the rule implements. If none, question whether the rule belongs.
- When reviewing agent prompts, verify they teach the *why* (architecture) not just the *what* (syntax).
- Tenets are living — they may evolve as the architectural thinking matures. Blog posts are historical references; the tenet table in SKILL.md is the current version.
- The tenets are informed positions with rationale and acknowledged counter-positions, not dogma. Agent prompts must present them as design heuristics, not rules.
---
## Cross-cutting rules for all contributions
- **Spec owns the rules.** Agents and examples implement them. When they conflict, fix the agent or example, not the spec (unless the spec is genuinely wrong).
- **One source of truth per rule.** If a rule appears in multiple places, that's technical debt. Flag it.
- **No invented content.** Do not fabricate standards references, regulatory requirements, or example data that is not verifiable.
- **Validate Markdown structure** on any changed file: table/link integrity, Mermaid syntax, heading hierarchy, and YAML block correctness.
- **No runtime assumptions.** There is no build system. Validation is structural and manual (or via a linter if one is added). Do not add code that assumes a build or test pipeline exists unless one has been defined.
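Manual structural checks can still be assisted by a throwaway local script. The sketch below is one such aid for the heading-hierarchy check, not repo tooling and not a pipeline step. `heading_skips` is a hypothetical helper; it only recognises backtick fences and ATX (`#`) headings.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")

def heading_skips(markdown_text):
    """Return (line_number, previous_level, level) for headings that jump
    more than one level deeper than the heading before them."""
    skips, previous, in_fence = [], 0, False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside code fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            if previous and level > previous + 1:
                skips.append((number, previous, level))
            previous = level
    return skips

print(heading_skips("# Title\n### Skipped a level\n"))
# prints [(2, 1, 3)]
```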
---
## Reviewing the standard
When a user asks to review, evaluate, audit, or assess the MD-DDL standard, agents, or examples, use the layered review process defined in `.prompts/md-ddl-layered-review-process.md`.
### Quick reference
Request | What to do
--- | ---
"Review the standard" / "check for issues" / "run a review" | Load `.prompts/md-ddl-review-prompt.md` (Layer 1 — structural). Run it. Report findings.
"Find weaknesses" / "what's wrong" / "stress test" / "adversarial review" | Load `.prompts/md-ddl-adversarial-review-prompt.md` (Layer 2 — adversarial). Run it. Report findings.
"Evaluate for users" / "would people adopt this" / "stakeholder review" | Load `.prompts/md-ddl-evaluation-prompt.md` (Layer 3 — stakeholder simulation). Run it. Report findings.
"Full review" / "layered review" / "comprehensive review" | Load `.prompts/md-ddl-layered-review-process.md` for the orchestration protocol, then run layers 1 → 2 → 3 in order, cross-referencing findings between layers.
Review from a specific viewpoint (e.g. "review as a data engineer") | Load `.prompts/md-ddl-layered-review-process.md`, check the Ad-Hoc Viewpoint Reviews table, and frame the review from the requested perspective.
### Key rules for reviews
- Each layer should run in a separate conversation to prevent cross-contamination between evaluation stances.
- Every review report must include a "What I Cannot Evaluate" section declaring the limits of AI assessment.
- Cross-model diversity improves review quality — run different layers with different AI models when possible.
- An honest review that finds real issues is more valuable than a clean bill of health. Never soften findings.