TypeScript (Zod)
Amino Acid Annotation (A3) format — TypeScript/Zod implementation design
Requirements
- Strict canonical format only — no legacy input accommodation
- Runtime validation via Zod (TypeScript types are compile-time only)
- Immutable value objects (Object.freeze at construction)
- Serialize to JSON
- Primary consumer: rtemislive-draw Next.js visualization app
- 100% wire-format compatible with the R implementation
Tooling
- Package manager: pnpm
- Formatter / linter: biome
- Test runner: vitest
- Validation: zod ^3
Type Hierarchy
Types are inferred from Zod schemas — no separate interface/type declarations.
// Primitives (Zod schemas → inferred TypeScript types)
PositionSchema → number (positive integer, 1-based)
PositionsSchema → number[] (sorted, deduplicated)
RangeTupleSchema → [number, number] (start <= end)
RangesSchema → [number, number][] (sorted, overlaps rejected)
// Annotation entries
SiteEntryData → { index: number[]; type: string }
RegionEntryData → { index: [number, number][]; type: string }
FlexEntryData → { index: number[] | [number, number][]; type: string }
VariantData → { position: number; [key: string]: unknown }
// Top-level
A3Data → {
  sequence: string
  annotations: {
    site: Record<string, SiteEntryData>
    region: Record<string, RegionEntryData>
    ptm: Record<string, FlexEntryData>
    processing: Record<string, FlexEntryData>
    variant: VariantData[]
  }
  metadata: {
    uniprot_id: string // default ""
    description: string // default ""
    reference: string // default ""
    organism: string // default ""
  }
}
Schema Details
Sequence
- z.string() with .min(2), regex [A-Za-z*]+, .transform(s => s.toUpperCase())
- Lowercase accepted and normalized to uppercase
- Characters outside [A-Za-z*] are rejected
Positions (PositionsSchema)
- z.array(z.number().int().min(1))
- .transform(sortDedup): sorted ascending, duplicates removed
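As an illustration of the transform step, here is a minimal sketch of sortDedup. The name and signature come from the Normalization Helpers listing; the body is an assumption, not the project's actual code.

```typescript
// Sketch of sortDedup (body is an assumption; only the signature is from the design).
function sortDedup(arr: readonly number[]): number[] {
  // Copy first so the caller's array is not mutated, then sort numerically.
  // The explicit comparator avoids the default lexicographic sort,
  // which would place 10 before 4.
  const sorted = arr.slice().sort((a, b) => a - b);
  // After sorting, duplicates are adjacent, so keep only the first of each run.
  return sorted.filter((value, i) => i === 0 || value !== sorted[i - 1]);
}

console.log(sortDedup([14, 4, 5, 4, 10])); // [ 4, 5, 10, 14 ]
```

Because the function is pure, it can be unit-tested on its own (normalize.test.ts) before it is wired into the schema.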
Ranges (RangesSchema)
- z.array(z.tuple([PositionSchema, PositionSchema]).refine(([s, e]) => s < e))
- .transform(sortRanges): sorted by start (then end for ties)
- .superRefine(checkNoOverlap): rejects if any two consecutive ranges overlap (curr[0] <= prev[1]); adjacent ranges (curr[0] = prev[1] + 1) are permitted
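The sort-then-check sequence can be sketched without Zod as two pure functions. sortRanges follows the signature listed under Normalization Helpers; firstOverlapIndex and the RangeTuple alias are hypothetical names introduced here to show the comparison that checkNoOverlap is described as performing.

```typescript
type RangeTuple = [number, number];

// Sort by start, then by end for ties. Returns a new array; nothing is merged.
function sortRanges(arr: readonly RangeTuple[]): RangeTuple[] {
  return arr.slice().sort((a, b) => a[0] - b[0] || a[1] - b[1]);
}

// Hypothetical helper: index of the first range that overlaps its predecessor,
// or -1 when there is no overlap. Expects input already sorted by sortRanges.
function firstOverlapIndex(sorted: readonly RangeTuple[]): number {
  for (let i = 1; i < sorted.length; i++) {
    // Overlap means the current start is at or before the previous end.
    if (sorted[i][0] <= sorted[i - 1][1]) return i;
  }
  return -1;
}

console.log(firstOverlapIndex(sortRanges([[290, 293], [259, 262]]))); // -1 (disjoint)
console.log(firstOverlapIndex(sortRanges([[6, 10], [1, 5]])));        // -1 (adjacent is allowed)
console.log(firstOverlapIndex(sortRanges([[5, 10], [1, 5]])));        // 1  (position 5 is shared)
```

Inside .superRefine, a non-negative result would presumably be reported through ctx.addIssue with a path pointing at the offending element, so the error names the exact range.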
Annotation entry schemas
Site (SiteEntrySchema): { index: PositionsSchema, type: z.string().default("") }
Region (RegionEntrySchema): { index: RangesSchema, type: z.string().default("") }
PTM / Processing (FlexEntrySchema):
{ index: z.union([RangesSchema, PositionsSchema]), type: z.string().default("") }
Union order is significant: RangesSchema is tried first because it is more
specific (every element must be a 2-element tuple). Input whose elements are
scalar numbers fails that schema and falls through to PositionsSchema.
Variant (VariantSchema)
- z.object({ position: PositionSchema }).catchall(z.unknown())
- .refine(isJsonCompatible): all fields must be recursively JSON-compatible
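A minimal sketch of the recursive check, assuming the accept/reject lists given for isJsonCompatible under Normalization Helpers. Rejecting NaN and Infinity is an extra assumption made here, since JSON.stringify would silently turn them into null.

```typescript
// Sketch of isJsonCompatible (body is an assumption; signature is from the design).
function isJsonCompatible(v: unknown): boolean {
  if (v === null) return true;
  switch (typeof v) {
    case "boolean":
    case "string":
      return true;
    case "number":
      // Assumption: non-finite numbers are rejected because JSON cannot represent them.
      return isFinite(v);
    case "object": {
      if (Array.isArray(v)) return v.every(isJsonCompatible);
      // Only plain objects pass; class instances (Date, Map, ...) have another prototype.
      const proto = Object.getPrototypeOf(v);
      if (proto !== Object.prototype && proto !== null) return false;
      const record = v as Record<string, unknown>;
      return Object.keys(record).every((key) => isJsonCompatible(record[key]));
    }
    default:
      // undefined, function, symbol, bigint
      return false;
  }
}

console.log(isJsonCompatible({ position: 301, from: "P", to: "L" })); // true
console.log(isJsonCompatible({ position: 301, seen: new Date() }));   // false
```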
Annotation families (AnnotationsSchema)
- z.object({ site, region, ptm, processing, variant }).strict()
- .strict() rejects any key not in { site, region, ptm, processing, variant }
- All families default to {} / [] when absent
Metadata (MetadataSchema)
- z.object({ uniprot_id, description, reference, organism }).strict()
- All fields are z.string().default("")
Root schema (A3InputSchema)
- z.object({ sequence, annotations, metadata }).strict()
- .strict() rejects unknown top-level keys
- .superRefine(boundsCheck): stage 2 contextual validation
Normalization Helpers (normalize.ts)
Pure functions used inside Zod transforms:
sortDedup(arr: readonly number[]): number[]
// Deduplicate and sort ascending
sortRanges(arr: readonly [number, number][]): [number, number][]
// Sort by start (then end for ties); no merging
// Overlap detection is a separate step in RangesSchema
isJsonCompatible(v: unknown): boolean
// Recursive check: null | boolean | number | string | array | plain object
// Rejects: undefined, functions, symbols, class instances
A3 Class
class A3 {
  readonly #data: A3Data // Object.freeze'd at construction
  constructor(input: unknown)
  static fromData(data: unknown): A3
  static fromJSONText(text: string): A3
  static async readJSON(path: string): Promise<A3> // via io.ts
  get length(): number // sequence length
  residueAt(position: number): string // 1-based; throws RangeError
  variantsAt(position: number): VariantData[]
  toData(): A3Data // frozen reference
  toJSON(): A3Data // called by JSON.stringify
  toJSONString(indent?: number): string
  async writeJSON(path: string, indent?: number): Promise<void>
}
toJSON() returns the plain data object (not a string), so JSON.stringify(a3)
works naturally and produces canonical output.
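The toJSON() behaviour and the frozen-at-construction rule can be seen in a reduced sketch. MiniA3, MiniData, and deepFreeze are hypothetical stand-ins introduced for illustration; the real class validates through Zod and uses a #private field. Freezing recursively is also an assumption, since Object.freeze alone is shallow.

```typescript
// Recursively freeze plain objects and arrays (assumed helper, not from the design).
function deepFreeze<T>(value: T): T {
  const v: unknown = value;
  if (typeof v === "object" && v !== null && !Object.isFrozen(v)) {
    Object.freeze(v);
    const record = v as Record<string, unknown>;
    for (const key of Object.keys(record)) deepFreeze(record[key]);
  }
  return value;
}

// Hypothetical stand-in for A3Data, trimmed to two fields.
interface MiniData {
  sequence: string;
  tags: string[];
}

class MiniA3 {
  private readonly data: MiniData;

  constructor(input: MiniData) {
    // Copy first so later changes to `input` cannot leak in, then freeze.
    this.data = deepFreeze(JSON.parse(JSON.stringify(input)) as MiniData);
  }

  get length(): number {
    return this.data.sequence.length;
  }

  // 1-based lookup; anything outside 1..length is a RangeError.
  residueAt(position: number): string {
    if (Math.floor(position) !== position || position < 1 || position > this.length) {
      throw new RangeError(`position ${position} is outside 1..${this.length}`);
    }
    return this.data.sequence.charAt(position - 1);
  }

  // Returns the data object itself; JSON.stringify calls this automatically.
  toJSON(): MiniData {
    return this.data;
  }
}

const mini = new MiniA3({ sequence: "MAEPRQ", tags: ["tau"] });
console.log(JSON.stringify(mini));                 // {"sequence":"MAEPRQ","tags":["tau"]}
console.log(mini.residueAt(1));                    // M
console.log(Object.isFrozen(mini.toJSON().tags));  // true
```

Returning an object rather than a string from toJSON() is what lets an instance nest inside a larger structure and still serialize correctly.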
Error Classes
class A3ValidationError extends Error
  issues: ZodError["issues"] // full Zod issue list for programmatic inspection
class A3ParseError extends Error
  // wraps JSON.parse failures and file I/O errors
File Structure
typescript/
  src/
    normalize.ts // pure normalization helpers
    schemas.ts // Zod schemas + exported inferred types
    a3.ts // A3 class, A3ValidationError, A3ParseError
    io.ts // readJSON / writeJSON (node:fs/promises)
    index.ts // public exports
  tests/
    normalize.test.ts
    schemas.test.ts
    a3.test.ts
    roundtrip.test.ts
Wire Format
Strict canonical format. Unknown keys are rejected at the top level and in
annotation families. The type field is always present in output (defaults to "").
{
  "sequence": "MAEPRQ...",
  "annotations": {
    "site": {
      "Disease_associated_variant": { "index": [4, 5, 14], "type": "" },
      "catalyticResidues": { "index": [57, 102], "type": "activeSite" }
    },
    "region": {
      "KXGS": { "index": [[259, 262], [290, 293]], "type": "" }
    },
    "ptm": {
      "Phosphorylation": { "index": [17, 18, 29], "type": "" }
    },
    "processing": {},
    "variant": [
      { "position": 301, "from": "P", "to": "L" }
    ]
  },
  "metadata": {
    "uniprot_id": "P10636",
    "description": "Microtubule-associated protein tau",
    "reference": "",
    "organism": "Homo sapiens"
  }
}
Validation
Stage 1 — Structural (Zod schemas)
- sequence: non-empty, [A-Za-z*]+, uppercased
- Positions: positive integers, sorted, deduplicated
- Ranges: start <= end, sorted, overlaps rejected
- Annotation entries: must be { index, type } objects; bare arrays rejected
- Annotation family keys: non-empty strings
- Unknown annotation families: rejected (.strict())
- Variant fields: JSON-compatible
- Metadata fields: strings; unknown keys rejected (.strict())
- Unknown top-level keys: rejected (.strict())
Stage 2 — Contextual (.superRefine)
Runs on the fully normalized data (after all transforms):
- All site / ptm / processing positions satisfy 1 <= pos <= sequence.length
- All region / ptm / processing range endpoints satisfy the same
- All variant positions satisfy the same
- Error paths include the full field path for precise error messages
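The stage 2 rules above can be sketched as a pure function that walks the normalized data and collects one issue per out-of-bounds position, each with its full field path. collectBoundsIssues, BoundsIssue, and BoundsInput are hypothetical names; the input type is reduced to the fields the check reads.

```typescript
type IndexItem = number | [number, number];
type Family = Record<string, { index: IndexItem[] }>;

// Hypothetical minimal shape: only what the bounds check needs.
interface BoundsInput {
  sequence: string;
  annotations: {
    site: Family;
    region: Family;
    ptm: Family;
    processing: Family;
    variant: { position: number }[];
  };
}

interface BoundsIssue {
  path: (string | number)[];
  message: string;
}

function collectBoundsIssues(data: BoundsInput): BoundsIssue[] {
  const max = data.sequence.length;
  const issues: BoundsIssue[] = [];
  const check = (pos: number, path: (string | number)[]): void => {
    if (pos < 1 || pos > max) {
      issues.push({ path, message: `position ${pos} is outside 1..${max}` });
    }
  };

  for (const family of ["site", "region", "ptm", "processing"] as const) {
    const entries = data.annotations[family];
    for (const name of Object.keys(entries)) {
      entries[name].index.forEach((item, i) => {
        const base = ["annotations", family, name, "index", i];
        if (typeof item === "number") check(item, base);
        // For a range, check both endpoints and extend the path with 0 or 1.
        else item.forEach((endpoint, j) => check(endpoint, [...base, j]));
      });
    }
  }
  data.annotations.variant.forEach((v, i) =>
    check(v.position, ["annotations", "variant", i, "position"]),
  );
  return issues;
}

const issues = collectBoundsIssues({
  sequence: "MAEPRQ", // length 6
  annotations: {
    site: { s: { index: [4, 9] } },
    region: { r: { index: [[2, 7]] } },
    ptm: {},
    processing: {},
    variant: [{ position: 0 }],
  },
});
console.log(issues.map((issue) => issue.path.join(".")));
// [ 'annotations.site.s.index.1', 'annotations.region.r.index.0.1', 'annotations.variant.0.position' ]
```

In the Zod version, each collected issue would presumably be passed to ctx.addIssue inside .superRefine(boundsCheck), with the same path array.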