Amino Acid Annotation (A3) format — R/S7 implementation design

Requirements

  • Thorough validation (type checks, bounds checks, etc.)
  • Serialize to JSON
  • Visualize with echarts in the rtemislive-draw Next.js WASM app
  • Compatible with popular protein sequence/annotation formats without loss of information (UniProt, ClinVar, GFF, GTF, GenBank, etc.)
  • Fixed annotation categories aligned with UniProt: site, region, ptm, processing, variant
  • type is optional per annotation — not part of the schema, but serialized as "" when absent so the field is always present in JSON output (supports manual editing in the web app)

S7 Class Hierarchy

A3Index (abstract base — pure geometry, no type)
  ├── A3Position(data: integer())
  └── A3Range(data: integer matrix N x 2, colnames = c("start", "end"))

A3Sequence(data: character(1))

A3Feature(type: character(1) = "")
  ├── A3Site(index: A3Position)
  ├── A3Region(index: A3Range)
  ├── A3PTM(index: A3Index)
  └── A3Processing(index: A3Index)

A3Variant(position: A3Position, info: named list)

A3Annotation
  site:       named list of A3Site
  region:     named list of A3Region
  ptm:        named list of A3PTM
  processing: named list of A3Processing
  variant:    list of A3Variant

Metadata (abstract base)
  └── A3Metadata
        uniprot_id:  character(1), default ""
        description: character(1), default ""
        reference:   character(1), default ""
        organism:    character(1), default ""

A3
 ├── sequence:    A3Sequence
 ├── annotations: A3Annotation
 └── metadata:    A3Metadata

Class Details

A3Sequence

Wraps character(1) with its own validator.

Validation (stage 1):

  • Non-empty string
  • Uppercase
  • Characters in [A-Z*] only
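
The three checks above can be sketched in base R as follows. This is a minimal sketch, not the package's validator: the name validate_sequence is hypothetical, and it borrows the S7 validator convention of returning NULL when valid and a message otherwise.

```r
# Hypothetical helper (not part of the spec): returns NULL when the
# sequence is valid, otherwise a character string describing the problem.
validate_sequence <- function(x) {
  if (!is.character(x) || length(x) != 1L || is.na(x)) {
    return("sequence must be a single non-NA string")
  }
  if (!nzchar(x)) {
    return("sequence must be non-empty")
  }
  # Any character outside A-Z and * is invalid. Lowercase letters fall
  # outside this set, so the same test also enforces the uppercase rule.
  # perl = TRUE keeps the A-Z range independent of the locale's collation.
  if (grepl("[^A-Z*]", x, perl = TRUE)) {
    return("sequence may contain only the characters A-Z and *")
  }
  NULL
}

validate_sequence("MAEPRQ")  # NULL (valid)
validate_sequence("maeprq")  # returns an error message
```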

A3Index

Abstract base class for sequence index types. Provides a common type for A3PTM and A3Processing, which accept either positions or ranges. Pure geometry — carries data only, no type (type lives on A3Feature).

A3Position

Sorted unique 1-based positive integer positions. Wraps integer().

Validation (stage 1):

  • Elements are positive integers
  • Sorted ascending
  • No duplicates
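
A base R sketch of these position checks, under the assumption that the input is already an integer vector (no coercion is attempted). The function name validate_position is hypothetical.

```r
# Hypothetical helper (not part of the spec): NULL when valid, else a message.
validate_position <- function(x) {
  if (!is.integer(x) || anyNA(x)) {
    return("positions must be a non-NA integer vector")
  }
  if (any(x < 1L)) {
    return("positions must be positive (1-based)")
  }
  # strictly = TRUE treats ties as unsorted, so this single test covers
  # both "sorted ascending" and "no duplicates".
  if (is.unsorted(x, strictly = TRUE)) {
    return("positions must be sorted ascending with no duplicates")
  }
  NULL
}

validate_position(c(4L, 5L, 14L))  # NULL (valid)
validate_position(c(5L, 4L))       # returns an error message
```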

A3Range

N x 2 integer matrix of inclusive [start, end] range pairs. Column names: "start", "end".

Structurally parallel to A3Position — both are collection types wrapping native R vectorized data structures.

Validation (stage 1):

  • All values are positive integers
  • Each row: start < end
  • Rows sorted by start, then end

Empty: matrix(integer(), ncol = 2).
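
The range checks can be sketched the same way. This is an illustration under stated assumptions: validate_range is a hypothetical name, and column names are not checked because the empty form above carries none.

```r
# Hypothetical helper (not part of the spec): NULL when valid, else a message.
validate_range <- function(x) {
  if (!is.matrix(x) || !is.integer(x) || ncol(x) != 2L) {
    return("ranges must be an integer matrix with two columns")
  }
  if (anyNA(x) || any(x < 1L)) {
    return("range values must be positive integers")
  }
  start <- x[, 1L]
  end <- x[, 2L]
  if (any(start >= end)) {
    return("each row must satisfy start < end")
  }
  # Rows are in order when sorting by (start, end) leaves them unchanged.
  if (!identical(order(start, end), seq_len(nrow(x)))) {
    return("rows must be sorted by start, then end")
  }
  NULL
}

kxgs <- matrix(c(259L, 262L, 290L, 293L), ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("start", "end")))
validate_range(kxgs)                      # NULL (valid)
validate_range(kxgs[2:1, ])               # returns an error message (unsorted rows)
validate_range(matrix(integer(), ncol = 2))  # NULL (empty is valid)
```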

A3Feature

Abstract base class for annotation feature types (A3Site, A3Region, A3PTM, A3Processing). Named after the standard bioinformatics term used by UniProt, GFF, GTF, and GenBank.

Properties:

  • type: character(1), default "". Always serialized in JSON output.

A3Site

Point annotation feature. Index is A3Position.

A3Region

Range annotation feature. Index is A3Range.

Sites and regions are visualized differently: sites show individual circles at each residue position; regions show contiguous bands spanning [start, end].

A3PTM

Post-translational modification feature. Index is A3Index (either A3Position or A3Range, never mixed within one entry).

A3Processing

Sequence processing/maturation feature (e.g. signal peptides, cleavage sites, mature chains). Index is A3Index (either A3Position or A3Range, never mixed within one entry).
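
One way the index and feature classes described so far could be declared with the S7 package is sketched below. It is a simplified illustration, not the package source: validators are omitted, only A3Site and A3PTM are shown, and the property layout is inferred from the hierarchy at the top of this page.

```r
library(S7)

# Abstract geometry base: cannot be instantiated directly.
A3Index <- new_class("A3Index", abstract = TRUE)

A3Position <- new_class("A3Position", parent = A3Index,
  properties = list(data = class_integer))

# Assumption: an integer matrix has integer storage, so class_integer
# is used for the N x 2 range matrix as well.
A3Range <- new_class("A3Range", parent = A3Index,
  properties = list(data = class_integer))

# Abstract feature base carrying the optional type, default "".
A3Feature <- new_class("A3Feature", abstract = TRUE,
  properties = list(type = new_property(class_character, default = "")))

# A3Site only accepts positions; A3PTM accepts any A3Index subclass.
A3Site <- new_class("A3Site", parent = A3Feature,
  properties = list(index = A3Position))
A3PTM <- new_class("A3PTM", parent = A3Feature,
  properties = list(index = A3Index))

pos <- A3Position(data = c(17L, 18L, 29L))
rng <- A3Range(data = matrix(c(259L, 262L), ncol = 2))

ptm_pos <- A3PTM(index = pos)  # positions accepted
ptm_rng <- A3PTM(index = rng)  # ranges accepted

# Passing a range where a position is required fails property validation.
site_accepts_range <- tryCatch({
  A3Site(index = rng)
  TRUE
}, error = function(e) FALSE)
```

Typing the index property as the abstract A3Index is what lets a single A3PTM or A3Processing entry hold either positions or ranges while still ruling out a mix within one entry.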

A3Variant

Variant record with required 1-based position (A3Position) and open JSON-compatible info (named list).

A3Annotation

Container for the five annotation families. The site, region, ptm, and processing families are named lists keyed by annotation name; variant is an ordered, unnamed list:

site:       named list of A3Site
region:     named list of A3Region
ptm:        named list of A3PTM
processing: named list of A3Processing
variant:    list of A3Variant (ordered, not named)

Metadata / A3Metadata

Metadata is an abstract base class. A3Metadata inherits from it with fields specific to amino acid annotations. Other data types will have their own Metadata subtypes. A generic Metadata viewer is planned for the web app.

All metadata fields are character(1) with default "".

A3

Top-level class. Construction validates and normalizes all input.

Wire Format

Annotation families use a named map. Each entry value is an object with index and type fields — canonical form only. Bare arrays are rejected. type is always present in output (empty string when unset):

{
  "sequence": "MAEPRQ...",
  "annotations": {
    "site": {
      "Disease_associated_variant": {"index": [4, 5, 14], "type": ""},
      "catalyticResidues": {"index": [57, 102], "type": "activeSite"}
    },
    "region": {
      "KXGS": {"index": [[259, 262], [290, 293]], "type": ""}
    },
    "ptm": {
      "Phosphorylation": {"index": [17, 18, 29], "type": ""}
    },
    "processing": {},
    "variant": [
      {"position": 301, "from": "P", "to": "L"}
    ]
  },
  "metadata": {
    "uniprot_id": "P10636",
    "description": "Microtubule-associated protein tau",
    "reference": "",
    "organism": "Homo sapiens"
  }
}

Serialization

to_json(x) — S7 generic

Converts an A3 object to a canonical JSON string. Uses jsonlite::unbox() for scalar fields and preserves arrays for index data.
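
The effect of jsonlite::unbox() on this layout can be sketched with plain lists standing in for the S7 objects. The helper feature_entry is hypothetical and only illustrates the scalar versus array distinction; it is not the to_json method itself.

```r
library(jsonlite)

# Hypothetical helper: builds the list form of one feature entry.
# unbox() marks `type` as a JSON scalar; `index` is left as-is so it
# stays an array even when it holds a single position.
feature_entry <- function(index, type = "") {
  list(index = index, type = unbox(type))
}

annotations <- list(
  site = list(
    catalyticResidues = feature_entry(c(57L, 102L), "activeSite")
  ),
  region = list(
    # A two-column matrix serializes row by row as nested arrays.
    KXGS = feature_entry(
      matrix(c(259L, 262L, 290L, 293L), ncol = 2, byrow = TRUE)
    )
  ),
  ptm = list(
    Phosphorylation = feature_entry(17L)
  ),
  # A named empty list serializes as {} rather than [].
  processing = structure(list(), names = character(0))
)

json <- toJSON(annotations)
json
```

Without unbox(), jsonlite would emit type as a one-element array, which would not match the wire format shown above.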

A3from_json(x) — function

Accepts a JSON string or pre-parsed named list and returns an A3 object. Each annotation entry must be an object with index and type fields; bare arrays are rejected.
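
A minimal sketch of the "object with index and type, no bare arrays" rule, assuming the JSON is parsed with jsonlite. The function check_entry is hypothetical and is not the package's parser.

```r
library(jsonlite)

# Hypothetical check: one parsed annotation entry must be a JSON object
# carrying both `index` and `type`.
check_entry <- function(entry, name) {
  ok <- is.list(entry) && all(c("index", "type") %in% names(entry))
  if (!ok) {
    stop(
      sprintf("annotation '%s' must be an object with index and type fields", name),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# simplifyVector = FALSE keeps JSON objects as named lists and JSON arrays
# as unnamed lists, so the two shapes are easy to tell apart.
parsed <- fromJSON(
  '{"site": {"good": {"index": [4, 5, 14], "type": ""}, "bare": [4, 5, 14]}}',
  simplifyVector = FALSE
)

check_entry(parsed$site$good, "good")  # passes silently

# The bare array is rejected; capture the message instead of stopping.
bare_result <- tryCatch(
  check_entry(parsed$site$bare, "bare"),
  error = function(e) conditionMessage(e)
)

# Recover integer positions from the list-form index.
good_index <- as.integer(unlist(parsed$site$good$index))
```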

write_A3json / read_A3json

File I/O wrappers around to_json() and A3from_json().

Validation

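Validation runs in two stages: each class first checks its own structure, then A3 checks all indices against the sequence.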

Stage 1 — Structural (each class validates itself)

  • A3Position: elements are integers, positive, sorted ascending, unique
  • A3Range: integer matrix, positive values, each row start < end, rows sorted
  • A3Sequence: non-empty, uppercase, characters in [A-Z*]
  • A3Feature subtypes: type is character(1)
  • A3Variant: position is a positive integer scalar
  • A3Annotation: each family contains only its allowed feature types; annotation names are non-empty strings

Stage 2 — Contextual (A3 validates the whole)

  • All positions satisfy 1 <= pos <= nchar(sequence)
  • All range endpoints satisfy the same
  • Sequence is valid (handled by A3Sequence in stage 1)
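
The contextual bounds check can be sketched in base R as below. The function check_bounds is a hypothetical name; it assumes stage 1 has already guaranteed integer inputs and a valid sequence string.

```r
# Hypothetical helper: NULL when every index fits the sequence, else a message.
check_bounds <- function(sequence, positions = integer(),
                         ranges = matrix(integer(), ncol = 2)) {
  n <- nchar(sequence)
  # c() flattens the range matrix, so start and end points are checked
  # together with the plain positions.
  idx <- c(positions, ranges)
  bad <- unique(idx[idx < 1L | idx > n])
  if (length(bad) > 0L) {
    return(sprintf(
      "out of bounds for sequence of length %d: %s",
      n, paste(bad, collapse = ", ")
    ))
  }
  NULL
}

check_bounds("MAEPRQ", positions = c(1L, 6L))                 # NULL (valid)
check_bounds("MAEPRQ", ranges = matrix(c(2L, 9L), ncol = 2))  # returns an error message
```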

Internal classes are not exported. All user-facing construction goes through create_A3() or A3from_json(), each of which runs both stages.
