A3 Specification

Amino Acid Annotation (A3) — language-agnostic specification

Purpose

A3 is a structured format for annotating amino acid sequences with site, region, post-translational modification, processing, and variant information, alongside sequence metadata. It is designed for:

  • Exchange between analysis tools and visualization applications
  • Long-term storage of curated protein annotation data
  • 100% round-trip fidelity through JSON serialization

Wire Format

JSON is the canonical serialization format. TOML is a secondary target for human-authoring workflows.

The canonical file extension for A3 JSON files is .a3.json.

All five annotation families are always present in serialized output, even when empty. The type field is always present on every site, region, ptm, and processing entry (empty string when unset).
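The output rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name normalize_annotations is hypothetical, and the sequence and annotation values in the usage lines are made-up placeholders.

```python
import json

# Family names come from the specification; everything else here is illustrative.
MAP_FAMILIES = ("site", "region", "ptm", "processing")


def normalize_annotations(annotations):
    """Return a copy with all five families present and `type` filled in."""
    out = {}
    for family in MAP_FAMILIES:
        entries = annotations.get(family, {})
        out[family] = {
            # `type` is optional on input and defaults to "" in output.
            name: {"index": entry["index"], "type": entry.get("type", "")}
            for name, entry in entries.items()
        }
    # `variant` is a list, not a named map.
    out["variant"] = list(annotations.get("variant", []))
    return out


# Placeholder data, not taken from any real protein record.
doc = {
    "sequence": "MKTAYIAK",
    "annotations": normalize_annotations({"site": {"example_site": {"index": [3]}}}),
    "metadata": {"uniprot_id": "", "description": "", "reference": "", "organism": ""},
}
print(json.dumps(doc, indent=2))
```

Empty families are serialized as {} (or [] for variant) rather than being omitted, so a reader never has to test for a missing family key.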

Example — MAPT (P10636)

Data Model

A3
 ├── sequence:    string
 ├── annotations:
 │   ├── site:       map<string, { index: integer[],           type: string }>
 │   ├── region:     map<string, { index: [integer,integer][], type: string }>
 │   ├── ptm:        map<string, { index: integer[] | [integer,integer][], type: string }>
 │   ├── processing: map<string, { index: integer[] | [integer,integer][], type: string }>
 │   └── variant:    list of { position: integer, [key: string]: any }
 └── metadata:
     ├── uniprot_id:  string
     ├── description: string
     ├── reference:   string
     └── organism:    string
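One possible way to mirror this tree in Python type hints is shown below. The class names and the empty_a3 helper are assumptions made for illustration; the specification itself is language-agnostic and defines no such API.

```python
from typing import Any, Dict, List, TypedDict, Union

Position = int
Range = List[int]  # by convention exactly two integers: [start, end]


class Entry(TypedDict):
    index: Union[List[Position], List[Range]]
    type: str


class Annotations(TypedDict):
    site: Dict[str, Entry]
    region: Dict[str, Entry]
    ptm: Dict[str, Entry]
    processing: Dict[str, Entry]
    # Each variant record carries "position" plus arbitrary JSON-compatible keys,
    # which TypedDict cannot express, so a plain dict type is used here.
    variant: List[Dict[str, Any]]


class Metadata(TypedDict):
    uniprot_id: str
    description: str
    reference: str
    organism: str


class A3(TypedDict):
    sequence: str
    annotations: Annotations
    metadata: Metadata


def empty_a3(sequence: str) -> A3:
    """Hypothetical helper: an A3 document with no annotations and default metadata."""
    return {
        "sequence": sequence,
        "annotations": {"site": {}, "region": {}, "ptm": {}, "processing": {}, "variant": []},
        "metadata": {"uniprot_id": "", "description": "", "reference": "", "organism": ""},
    }
```

These hints describe shape only. Value constraints such as positive positions or non-overlapping ranges still need runtime checks.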

Field Definitions

sequence

  • String of at least 2 characters
  • Allowed characters: [A-Z*], that is, any uppercase letter (a superset of the IUPAC one-letter amino acid codes) plus * (stop codon)
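Both sequence rules fit in a single regular expression. The sketch below assumes a hypothetical helper name, is_valid_sequence, and checks only what is stated above.

```python
import re

# Two or more characters, each an uppercase letter or "*".
SEQUENCE_RE = re.compile(r"[A-Z*]{2,}")


def is_valid_sequence(seq):
    """True when `seq` is a string that satisfies the sequence rules."""
    return isinstance(seq, str) and SEQUENCE_RE.fullmatch(seq) is not None
```

Using fullmatch rather than match matters here: match would accept a string that merely starts with valid characters.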

position (integer[])

An ordered collection of 1-based residue positions.

  • All values are positive integers (≥ 1)
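A position index can be checked as sketched below. The function names are hypothetical. The sketch assumes an empty list is acceptable and does not enforce sorting, uniqueness, or an upper bound tied to the sequence length, since none of these is stated above.

```python
def is_position(value):
    """True for a positive integer (1-based residue position)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def is_position_index(index):
    """True when `index` is a list whose items are all valid positions."""
    return isinstance(index, list) and all(is_position(v) for v in index)
```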

range ([integer, integer][])

An ordered collection of inclusive [start, end] range pairs.

  • All values are positive integers (≥ 1)
  • Each pair: start < end (strict — degenerate single-position ranges are not permitted; use a position-indexed family instead)
  • No two ranges may overlap: ranges [a, b] and [c, d] (where c ≥ a) overlap when c ≤ b. Adjacent ranges (c = b + 1) are permitted.
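The three range rules can be checked together as in the following sketch. The name is_range_index is hypothetical, and the overlap test sorts a copy of the input by start position so that it does not depend on the order in which ranges were listed. Whether ranges must already be sorted is not stated above, so that is not enforced.

```python
def _is_pos(value):
    # Positive integer; bool is excluded because it subclasses int.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def is_range_index(index):
    """True when `index` is a list of non-overlapping [start, end] pairs."""
    if not isinstance(index, list):
        return False
    for pair in index:
        if not (isinstance(pair, list) and len(pair) == 2):
            return False
        start, end = pair
        # Strict inequality: single-position ranges such as [3, 3] are rejected.
        if not (_is_pos(start) and _is_pos(end) and start < end):
            return False
    ordered = sorted(index)
    for (a, b), (c, d) in zip(ordered, ordered[1:]):
        # After sorting, c >= a; the pair overlaps when c <= b.
        if c <= b:
            return False
    return True
```

Because the pairs are sorted first, comparing each range only with its successor is enough to detect any overlap.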

Annotation families

Five fixed families — no others are permitted:

Family       Index type            Semantics
site         positions only        Individual residues of interest
region       ranges only           Contiguous spans
ptm          positions or ranges   Post-translational modifications
processing   positions or ranges   Signal peptides, cleavage, maturation
variant      (see below)           Sequence variants

Each entry within site, region, ptm, and processing is a named object with two fields:

  • index — positions or ranges (as defined above)
  • type — string label; optional on input, always present in output (default "")

Annotation names (map keys) are non-empty strings. No constraint on characters beyond that.

Bare index arrays (without the { index, type } wrapper) are not permitted. The canonical object form is the only accepted input.
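The per-family rules can be combined into a single entry check, sketched below. The names ALLOWED_KINDS, index_kind, and check_entry are hypothetical. The sketch inspects only the shape of the index (positions versus ranges), assumes an empty index is acceptable in every family, and leaves value checks such as positivity and overlap to separate validation.

```python
# Which index shapes each named-map family accepts.
ALLOWED_KINDS = {
    "site": {"positions"},
    "region": {"ranges"},
    "ptm": {"positions", "ranges"},
    "processing": {"positions", "ranges"},
}


def _is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def index_kind(index):
    """Return 'positions', 'ranges', 'empty', or None for an invalid shape."""
    if not isinstance(index, list):
        return None
    if not index:
        return "empty"
    if all(_is_int(v) for v in index):
        return "positions"
    if all(isinstance(p, list) and len(p) == 2 and all(_is_int(v) for v in p)
           for p in index):
        return "ranges"
    return None  # mixed or malformed


def check_entry(family, name, entry):
    """Return a list of problems; an empty list means the entry passed."""
    if family not in ALLOWED_KINDS:
        return [f"not a named-map family: {family!r}"]
    problems = []
    if not isinstance(name, str) or name == "":
        problems.append("annotation name must be a non-empty string")
    # Bare index arrays are rejected: the entry must be an object with `index`.
    if not isinstance(entry, dict) or "index" not in entry:
        problems.append("entry must be an object with an 'index' field")
        return problems
    if not isinstance(entry.get("type", ""), str):
        problems.append("'type' must be a string")
    kind = index_kind(entry["index"])
    if kind is None:
        problems.append("'index' must be a list of positions or of [start, end] pairs")
    elif kind != "empty" and kind not in ALLOWED_KINDS[family]:
        problems.append(f"{family} does not accept {kind}")
    return problems
```

For example, check_entry("region", "r1", [[1, 5]]) reports a problem because the index is bare, while the same index wrapped as {"index": [[1, 5]]} passes.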

variant

An ordered list (not a named map) of variant records. Each record:

  • position: required, 1-based positive integer
  • All other fields: optional, must be JSON-compatible (no functions, symbols, class instances, or undefined)
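A variant record can be checked as sketched below. The function names are hypothetical, and "JSON-compatible" is interpreted here as: null, booleans, strings, integers, finite floats, lists, and string-keyed objects, nested to any depth. Rejecting NaN and infinity is an assumption based on JSON having no representation for them.

```python
import math


def is_json_value(value):
    """True when `value` maps directly onto a JSON value."""
    if value is None or isinstance(value, (bool, str, int)):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # NaN and infinity are not valid JSON
    if isinstance(value, list):
        return all(is_json_value(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json_value(v) for k, v in value.items())
    return False  # functions, class instances, sets, tuples, ...


def is_variant_record(record):
    """True when `record` has a valid `position` and only JSON-compatible fields."""
    if not isinstance(record, dict):
        return False
    pos = record.get("position")
    if not (isinstance(pos, int) and not isinstance(pos, bool) and pos >= 1):
        return False
    return is_json_value(record)
```

A record such as {"position": 7, "note": "placeholder"} passes; the note key is a made-up example of an optional field, not a name defined by the specification.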

metadata

Four string fields, all optional (default ""):

  • uniprot_id — UniProt accession
  • description — human-readable protein description
  • reference — citation or URL
  • organism — species name

Unknown metadata fields are not permitted. Unknown top-level keys are not permitted.
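The metadata defaults and the unknown-key rules might be enforced as in this sketch. The function names are hypothetical, and raising ValueError or TypeError is an illustrative choice; the specification does not prescribe an error mechanism.

```python
METADATA_FIELDS = ("uniprot_id", "description", "reference", "organism")
TOP_LEVEL_KEYS = {"sequence", "annotations", "metadata"}


def normalize_metadata(metadata=None):
    """Return metadata with all four fields present, defaulting to ""."""
    metadata = metadata or {}
    unknown = set(metadata) - set(METADATA_FIELDS)
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    out = {}
    for field in METADATA_FIELDS:
        value = metadata.get(field, "")
        if not isinstance(value, str):
            raise TypeError(f"metadata field {field!r} must be a string")
        out[field] = value
    return out


def check_top_level(doc):
    """Raise ValueError when `doc` has keys outside the three defined ones."""
    unknown = set(doc) - TOP_LEVEL_KEYS
    if unknown:
        raise ValueError(f"unknown top-level keys: {sorted(unknown)}")
```

Rejecting unknown keys outright, rather than silently dropping them, keeps typos such as "organsim" from disappearing unnoticed.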
