Task: Create a standardized, reusable schema for a Python dictionary that stores linguistics data, based solely on a provided list of dot-notation key paths, some of which may be aliases of one another.
Input:
- You will receive a flat list of strings representing key paths using dot notation (e.g., "a.b", "phonology.stress.primary", "metadata.source.author").
- Each string represents a nested dictionary path.
- No explicit alias-to-canonical mapping will be provided.
- Some paths may represent synonymous or structurally equivalent concepts that should be normalized to a single canonical key structure.
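To make the input format concrete, here is a minimal sketch of how a flat list of dot-notation paths expands into a nested dictionary skeleton. The function name `expand_paths` and the `placeholder` parameter are hypothetical, chosen only for illustration; the three paths are the ones quoted in the Input section.

```python
from typing import Any


def expand_paths(paths: list[str], placeholder: Any = None) -> dict[str, Any]:
    """Build a nested dict skeleton from dot-notation key paths (illustrative helper)."""
    root: dict[str, Any] = {}
    for path in paths:
        node = root
        *parents, leaf = path.split(".")
        for segment in parents:
            child = node.get(segment)
            if not isinstance(child, dict):
                # A segment seen earlier as a leaf (e.g. "a" before "a.b") becomes a branch.
                child = {}
                node[segment] = child
            node = child
        # setdefault keeps an existing branch instead of overwriting it with a leaf.
        node.setdefault(leaf, placeholder)
    return root


skeleton = expand_paths(["a.b", "phonology.stress.primary", "metadata.source.author"])
```

Each leaf holds the placeholder until a real value is supplied; the skeleton only records structure.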
Objective:
1) Infer and define a clean, standardized set of canonical keys (in snake_case) suitable for linguistics data.
2) Consolidate structurally or semantically redundant paths into a single canonical structure where appropriate.
3) Design a schema that can consistently represent all provided paths.
Requirements:
1) Canonical Schema Definition
- Organize canonical keys hierarchically using nested structure (reflecting dot notation).
- Group into relevant linguistic categories where applicable (e.g., phonology, morphology, syntax, semantics, pragmatics, lexicon, corpus, metadata).
- For each terminal (leaf) field, specify:
  a) Canonical key path (dot notation)
  b) Data type (string, integer, float, boolean, list, object)
  c) Required or optional
  d) Description (1–2 sentences)
  e) Example value
2) Alias & Path Normalization Rules
- Define rules for mapping variant paths to canonical paths.
- Explain how to handle:
  - Different nesting depths
  - Singular vs plural forms
  - Case variations
  - Synonymous segments
- Define how conflicts are resolved if multiple input paths map to the same canonical path.
- Define how unknown or unmapped paths are handled.
3) Output Format (strict):
A) Canonical Schema Table (Markdown)
Columns: Canonical Path | Type | Required? | Description | Example
B) JSON Schema (Draft 2020-12 compatible)
- Represent the nested structure.
- Use proper object nesting instead of flat dot keys.
- Provide in a fenced code block labeled json.
C) Normalization Rules
- Bullet list of transformation and consolidation logic.
D) Example
- Show 3–5 example raw input paths
- Show a sample raw dictionary built from those paths
- Show the normalized dictionary using canonical structure
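The normalization rules in requirement 2 could be sketched roughly as follows. Everything in the tables below is an assumption made for illustration: the synonym entries, the path alias, and the set of canonical paths are hypothetical and would be inferred from the actual input list. Plural forms are handled here through explicit table entries rather than suffix stripping, since blindly removing a trailing "s" would mangle segments such as "stress". Conflicts are resolved by keeping the first raw path that maps to a canonical path and reporting later ones; unknown paths are collected rather than dropped. Both policies are one possible choice, not the only one.

```python
import re

# Hypothetical synonym table: variant segment -> canonical segment.
SEGMENT_SYNONYMS = {
    "phon": "phonology",
    "meta": "metadata",
    "src": "source",
    "writer": "author",
    "authors": "author",  # plural handled by an explicit entry
}

# Hypothetical whole-path aliases for inputs with a different nesting depth.
PATH_ALIASES = {
    "stress.primary": "phonology.stress.primary",
}

# Hypothetical canonical leaf paths, as they would appear in the schema table.
CANONICAL_PATHS = {
    "phonology.stress.primary",
    "metadata.source.author",
}


def to_snake_case(segment: str) -> str:
    """Convert camelCase, PascalCase, hyphenated, or spaced segments to snake_case."""
    segment = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", segment.strip())
    segment = re.sub(r"[\s\-]+", "_", segment)
    return segment.lower()


def normalize_path(path: str) -> str:
    """Apply case, synonym, and nesting-depth rules to one raw path."""
    segments = [to_snake_case(s) for s in path.split(".") if s.strip()]
    joined = ".".join(SEGMENT_SYNONYMS.get(s, s) for s in segments)
    return PATH_ALIASES.get(joined, joined)


def normalize_paths(paths):
    """Return (mapped, conflicts, unmapped) for a list of raw paths."""
    mapped = {}     # canonical path -> first raw path that mapped to it
    conflicts = []  # (later raw path, canonical path) pairs
    unmapped = []   # raw paths with no canonical counterpart
    for raw in paths:
        canonical = normalize_path(raw)
        if canonical not in CANONICAL_PATHS:
            unmapped.append(raw)
        elif canonical in mapped:
            conflicts.append((raw, canonical))
        else:
            mapped[canonical] = raw
    return mapped, conflicts, unmapped


mapped, conflicts, unmapped = normalize_paths(
    ["Phon.Stress.Primary", "meta.src.writer", "metadata.source.author", "syntax.word_order"]
)
```

In this run the first two raw paths claim the two canonical paths, the third is reported as a conflict with the second, and the fourth is left unmapped.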
Constraints:
- All canonical keys must use snake_case.
- Preserve the original semantic intent of provided paths.
- Do not introduce linguistic categories unless supported by the input paths.
- Keep the total number of canonical leaf fields reasonable (15–40 unless input requires more).
- The schema must be broadly reusable across languages and linguistic subfields.
- Do not execute or interpret linguistic content; focus strictly on schema design.
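For output item B, the following sketch shows one way a set of canonical leaf definitions could be turned into a properly nested JSON Schema object instead of flat dot keys. The `LEAVES` table and the `build_schema` function are hypothetical. The type mapping reflects that JSON Schema uses "number" and "array" where the task's type list says float and list.

```python
import json

# Task type names -> JSON Schema type names.
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
    "list": "array",
    "object": "object",
}

# Hypothetical leaf specs: canonical path -> (task type, required?).
LEAVES = {
    "phonology.stress.primary": ("string", False),
    "metadata.source.author": ("string", True),
}


def build_schema(leaves):
    """Nest leaf definitions under 'properties' objects, one level per path segment."""
    root = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {},
    }
    for path, (task_type, required) in leaves.items():
        node = root
        *parents, leaf = path.split(".")
        for segment in parents:
            node = node["properties"].setdefault(
                segment, {"type": "object", "properties": {}}
            )
        node["properties"][leaf] = {"type": TYPE_MAP[task_type]}
        if required:
            # Marks the leaf as required within its parent object only;
            # the parent objects themselves stay optional in this sketch.
            node.setdefault("required", []).append(leaf)
    return root


schema = build_schema(LEAVES)
schema_json = json.dumps(schema, indent=2)
```

Whether a required leaf should also force its parent objects to be required is a design decision the schema author would need to state explicitly.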