Task: Create a standardized, reusable schema for a Python dictionary that stores linguistics data, based solely on a provided list of dot-notated key aliases.

Input:
- You will receive a flat list of strings representing key paths using dot notation (e.g., "a.b", "phonology.stress.primary", "metadata.source.author").
- Each string represents a nested dictionary path.
- No explicit alias-to-canonical mapping will be provided.
- Some paths may represent synonymous or structurally equivalent concepts that should be normalized to a single canonical key structure.
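
As an illustration of how such a list corresponds to a nested dictionary, here is a minimal sketch. The helper name build_nested and the None placeholder are assumptions made for this example only; they are not part of the task.

```python
from typing import Any


def build_nested(paths: list[str], placeholder: Any = None) -> dict:
    """Expand dot-notated paths into a nested dictionary skeleton."""
    root: dict = {}
    for path in paths:
        *parents, leaf = path.split(".")
        node = root
        for segment in parents:
            child = node.get(segment)
            if not isinstance(child, dict):
                # Create the intermediate object (or promote a leaf to an object).
                child = {}
                node[segment] = child
            node = child
        # Keep an existing subtree if the leaf was already expanded.
        node.setdefault(leaf, placeholder)
    return root


paths = ["a.b", "phonology.stress.primary", "metadata.source.author"]
skeleton = build_nested(paths)
```

Each dot-separated segment becomes one level of nesting, and the last segment becomes the leaf key.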

Objective:
1) Infer and define a clean, standardized set of canonical keys (in snake_case) suitable for linguistics data.
2) Consolidate structurally or semantically redundant paths into a single canonical structure where appropriate.
3) Design a schema that can consistently represent all provided paths.

Requirements:

1) Canonical Schema Definition
   - Organize canonical keys hierarchically, using a nested structure that mirrors the dot notation.
   - Group keys into relevant linguistic categories where applicable (e.g., phonology, morphology, syntax, semantics, pragmatics, lexicon, corpus, metadata).
   - For each terminal (leaf) field, specify:
     a) Canonical key path (dot notation)
     b) Data type (string, integer, float, boolean, list, object; in the JSON Schema, float and list correspond to number and array)
     c) Required or optional
     d) Description (1–2 sentences)
     e) Example value

2) Alias & Path Normalization Rules
   - Define rules for mapping variant paths to canonical paths.
   - Explain how to handle:
     - Different nesting depths
     - Singular vs plural forms
     - Case variations
     - Synonymous segments
   - Define how conflicts are resolved if multiple input paths map to the same canonical path.
   - Define how unknown or unmapped paths are handled.
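
One possible shape for these rules, sketched under stated assumptions: the synonym table, the whole-path alias table, the list of known roots, and the "extensions" prefix for unmapped paths are all hypothetical and would have to be inferred from the actual input. The singularization rule is deliberately naive.

```python
import re

# Hypothetical tables; real entries must be derived from the provided paths.
SEGMENT_SYNONYMS = {"meta": "metadata", "phon": "phonology", "writer": "author"}
PATH_ALIASES = {"phonology.primary_stress": "phonology.stress.primary"}
KNOWN_ROOTS = {"phonology", "metadata"}


def to_snake_case(segment: str) -> str:
    # Insert "_" at camelCase boundaries, unify separators, then lowercase.
    segment = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", segment)
    segment = re.sub(r"[\s\-]+", "_", segment)
    return segment.lower()


def singularize(segment: str) -> str:
    # Naive rule: drop a final "s" unless the ending is usually singular.
    if not segment.endswith("s") or segment.endswith(("ss", "us", "is", "ics")):
        return segment
    return segment[:-1]


def normalize_path(raw: str) -> str:
    segments = []
    for part in raw.strip().split("."):
        part = singularize(to_snake_case(part))
        segments.append(SEGMENT_SYNONYMS.get(part, part))
    canonical = ".".join(segments)
    # Whole-path aliases reconcile different nesting depths.
    canonical = PATH_ALIASES.get(canonical, canonical)
    # Assumed policy: unmapped paths are kept under an "extensions" root.
    if canonical.split(".")[0] not in KNOWN_ROOTS:
        canonical = "extensions." + canonical
    return canonical


def group_by_canonical(raw_paths: list[str]) -> dict[str, list[str]]:
    # Groups with more than one raw path are the conflicts to resolve.
    groups: dict[str, list[str]] = {}
    for raw in raw_paths:
        groups.setdefault(normalize_path(raw), []).append(raw)
    return groups


groups = group_by_canonical(
    ["Phonology.Stress.Primary", "phon.primaryStress", "meta.sources.writer", "Notes.text"]
)
```

Grouping raw paths by their canonical form makes collisions visible, so a conflict policy (for example, first path wins or values are merged into a list) can be applied explicitly rather than implicitly.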

3) Output Format (strict):

A) Canonical Schema Table (Markdown)
   Columns: Canonical Path | Type | Required? | Description | Example

B) JSON Schema (Draft 2020-12 compatible)
   - Represent the nested structure.
   - Use proper object nesting instead of flat dot keys.
   - Provide the schema in a fenced code block labeled json.
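
To show what "proper object nesting" could look like, the sketch below derives a nested schema from dotted canonical paths. The three fields in FIELDS are hypothetical placeholders, and build_schema is an illustrative helper, not a required part of the output.

```python
import json

# Table type names mapped to JSON Schema type keywords.
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
    "list": "array",
    "object": "object",
}

# Hypothetical canonical fields: path -> (type, required?).
FIELDS = {
    "phonology.stress.primary": ("string", False),
    "metadata.source.author": ("string", True),
    "metadata.source.year": ("integer", False),
}


def build_schema(fields: dict[str, tuple[str, bool]]) -> dict:
    schema: dict = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
    }
    for path, (type_name, required) in fields.items():
        *parents, leaf = path.split(".")
        node = schema
        for segment in parents:
            # Descend into (or create) the nested object for this segment.
            node = node.setdefault("properties", {}).setdefault(segment, {"type": "object"})
        node.setdefault("properties", {})[leaf] = {"type": TYPE_MAP[type_name]}
        if required:
            # Applies only when the parent object is present in the data.
            node.setdefault("required", []).append(leaf)
    return schema


schema = build_schema(FIELDS)
schema_text = json.dumps(schema, indent=2)
```

Note that "required" is recorded on the immediate parent object, so a required leaf is enforced only when that parent object itself appears in the data.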

C) Normalization Rules
   - Bullet list of transformation and consolidation logic.

D) Example
   - Show 3–5 example raw input paths.
   - Show a sample raw dictionary built from those paths.
   - Show the normalized dictionary using the canonical structure.
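
The raw-to-normalized step in such an example can be sketched as flatten, remap, rebuild. The mapping RAW_TO_CANONICAL and all values below are invented for illustration; the real mapping comes from the normalization rules.

```python
def flatten(data: dict, prefix: str = "") -> dict:
    """Turn a nested dictionary into {dotted_path: value}."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict) -> dict:
    """Rebuild a nested dictionary from {dotted_path: value}."""
    root: dict = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = root
        for segment in parents:
            node = node.setdefault(segment, {})
        node[leaf] = value
    return root


# Hypothetical raw-to-canonical mapping.
RAW_TO_CANONICAL = {
    "phon.primaryStress": "phonology.stress.primary",
    "meta.sources.writer": "metadata.source.author",
    "Metadata.Source.Year": "metadata.source.year",
}

raw = {
    "phon": {"primaryStress": "initial"},
    "meta": {"sources": {"writer": "A. Example"}},
    "Metadata": {"Source": {"Year": 2020}},
}

# Unmapped paths are left unchanged in this sketch.
normalized = unflatten(
    {RAW_TO_CANONICAL.get(path, path): value for path, value in flatten(raw).items()}
)
```

Here the two differently spelled metadata branches end up merged under a single canonical metadata.source object.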

Constraints:
- All canonical keys must use snake_case.
- Preserve the original semantic intent of provided paths.
- Do not introduce linguistic categories unless supported by the input paths.
- Keep the total number of canonical leaf fields manageable (roughly 15–40, unless the input requires more).
- The schema must be broadly reusable across languages and linguistic subfields.
- Do not execute or interpret linguistic content; focus strictly on schema design.