R (S7)

Requirements

Thorough validation (type checks, bounds checks, etc.)
Serialize to JSON
Visualize using echarts in rtemislive-draw Next.js WASM app
Compatible with popular protein sequence/annotation formats without loss of information (UniProt, ClinVar, GFF, GTF, GenBank, etc.)
Fixed annotation categories aligned with UniProt: site, region, ptm, processing, variant
type is optional per annotation — not part of the schema, but serialized as "" when absent so the field is always present in JSON output (supports manual editing in the web app)

S7 Class Hierarchy

A3Index (abstract base — pure geometry, no type)
  ├── A3Position(data: integer())
  └── A3Range(data: integer matrix N x 2, colnames = c("start", "end"))

A3Sequence(data: character(1))

A3Feature(type: character(1) = "")
  ├── A3Site(index: A3Position)
  ├── A3Region(index: A3Range)
  ├── A3PTM(index: A3Index)
  └── A3Processing(index: A3Index)

A3Variant(position: A3Position, info: named list)

A3Annotation
  site:       named list of A3Site
  region:     named list of A3Region
  ptm:        named list of A3PTM
  processing: named list of A3Processing
  variant:    list of A3Variant

Metadata (abstract base)
  └── A3Metadata
        uniprot_id:  character(1), default ""
        description: character(1), default ""
        reference:   character(1), default ""
        organism:    character(1), default ""

A3
 ├── sequence:    A3Sequence
 ├── annotations: A3Annotation
 └── metadata:    A3Metadata

Class Details

A3Sequence

Wraps character(1) with its own validator.

Validation (stage 1):

Non-empty string
Uppercase
Characters in [A-Z*] only
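
The three rules above can be sketched as a standalone check in base R. This is a minimal illustration, not the package's validator: `validate_sequence` is a hypothetical name, and the return convention (NULL when valid, a message otherwise) is borrowed from how S7 validators report problems.

```r
# Hypothetical helper: returns NULL when valid, otherwise an error message.
validate_sequence <- function(x) {
  if (!is.character(x) || length(x) != 1L || is.na(x)) {
    return("sequence must be a single non-missing string")
  }
  if (!nzchar(x)) {
    return("sequence must be non-empty")
  }
  # perl = TRUE makes [A-Z] an ASCII range regardless of locale, so this
  # pattern also enforces the uppercase rule.
  if (!grepl("^[A-Z*]+$", x, perl = TRUE)) {
    return("sequence may contain only uppercase letters A-Z and '*'")
  }
  NULL
}

validate_sequence("MAEPRQ")  # NULL (valid)
validate_sequence("maeprq")  # returns a message: lowercase is rejected
```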

A3Index

Abstract base class for sequence index types. Provides a common type for A3PTM and A3Processing, which accept either positions or ranges. Pure geometry: carries data only, no type (type lives on A3Feature).

A3Position

Sorted unique 1-based positive integer positions. Wraps integer().

Validation (stage 1):

Elements are positive integers
Sorted ascending
No duplicates
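
As a rough sketch of how these rules could be expressed with S7, the class below stores the positions in a `data` property and rejects invalid input at construction time. The classes in the package are internal, so the definitions here are an assumption based on the hierarchy listed earlier, not the actual source.

```r
library(S7)

# Abstract parent, mirroring the hierarchy described in this document.
A3Index <- new_class("A3Index", abstract = TRUE)

# Sketch only: the real class may differ in details.
A3Position <- new_class(
  "A3Position",
  parent = A3Index,
  properties = list(data = class_integer),
  validator = function(self) {
    x <- self@data
    if (anyNA(x) || any(x < 1L)) {
      "@data must contain positive, non-missing integers"
    } else if (is.unsorted(x, strictly = TRUE)) {
      # strictly = TRUE treats ties as unsorted, so duplicates fail here too.
      "@data must be sorted ascending with no duplicates"
    }
  }
)

pos <- A3Position(data = c(4L, 5L, 14L))
pos@data

# Unsorted input is rejected; capture the message instead of stopping.
bad <- tryCatch(
  A3Position(data = c(5L, 4L)),
  error = function(e) conditionMessage(e)
)
```

Using `is.unsorted(x, strictly = TRUE)` covers the ordering and uniqueness rules in a single test, which keeps the validator short.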

A3Range

N x 2 integer matrix of inclusive [start, end] range pairs. Column names: "start", "end".

Structurally parallel to A3Position — both are collection types wrapping native R vectorized data structures.

Validation (stage 1):

All values are positive integers
Each row: start < end
Rows sorted by start, then end

Empty: matrix(integer(), ncol = 2).
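
The range rules can be sketched as a base R check over the matrix. `validate_range_matrix` is a hypothetical helper name; it does not check column names, since the empty form shown above carries none.

```r
# Hypothetical helper: NULL when valid, otherwise an error message.
validate_range_matrix <- function(m) {
  if (!is.matrix(m) || !is.integer(m) || ncol(m) != 2L) {
    return("ranges must be an integer matrix with 2 columns")
  }
  if (nrow(m) == 0L) {
    return(NULL)  # the empty matrix is valid
  }
  if (anyNA(m) || any(m < 1L)) {
    return("all values must be positive integers")
  }
  if (any(m[, 1L] >= m[, 2L])) {
    return("each row must satisfy start < end")
  }
  # order() with two keys sorts by start, then by end.
  if (!identical(order(m[, 1L], m[, 2L]), seq_len(nrow(m)))) {
    return("rows must be sorted by start, then end")
  }
  NULL
}

ranges <- matrix(
  c(259L, 262L,
    290L, 293L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("start", "end"))
)
validate_range_matrix(ranges)                       # NULL (valid)
validate_range_matrix(matrix(integer(), ncol = 2))  # NULL (empty is valid)
```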

A3Feature

Abstract base class for annotation feature types (A3Site, A3Region, A3PTM, A3Processing). Named after the standard bioinformatics term used by UniProt, GFF, GTF, and GenBank.

Properties:

type: character(1), default "". Always serialized in JSON output.

A3Site

Point annotation feature. Index is A3Position.

A3Region

Range annotation feature. Index is A3Range.

Sites and regions are visualized differently: sites show individual circles at each residue position; regions show contiguous bands spanning [start, end].

A3PTM

Post-translational modification feature. Index is A3Index (either A3Position or A3Range, never mixed within one entry).

A3Processing

Sequence processing/maturation feature (e.g. signal peptides, cleavage sites, mature chains). Index is A3Index (either A3Position or A3Range, never mixed within one entry).
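
Because a PTM or processing entry may carry either kind of index, a reader of the JSON form has to decide which one it received. The sketch below assumes, by analogy with the site and region entries, that positions arrive as a flat array and ranges as an array of pairs. `classify_index` is a hypothetical helper, and its input is a list as produced by a JSON parser with simplification turned off.

```r
# Hypothetical helper: classify a parsed JSON index as "position" or "range".
# Positions look like list(17L, 18L); ranges like list(list(1L, 23L), ...).
classify_index <- function(index) {
  is_scalar <- vapply(
    index, function(el) is.numeric(el) && length(el) == 1L, logical(1)
  )
  is_pair <- vapply(
    index, function(el) is.list(el) && length(el) == 2L, logical(1)
  )
  if (all(is_scalar)) {
    "position"  # an empty index also lands here; that choice is arbitrary
  } else if (all(is_pair)) {
    "range"
  } else {
    stop("index must be all positions or all ranges, never mixed")
  }
}

classify_index(list(17L, 18L, 29L))                  # "position"
classify_index(list(list(1L, 23L), list(24L, 110L))) # "range"
```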

A3Variant

Variant record with required 1-based position (A3Position) and open JSON-compatible info (named list).

A3Annotation

Container for the five annotation families. Each named annotation family is a named list keyed by annotation name:

site:       named list of A3Site
region:     named list of A3Region
ptm:        named list of A3PTM
processing: named list of A3Processing
variant:    list of A3Variant (ordered, not named)

Metadata / A3Metadata

Metadata is an abstract base class. A3Metadata inherits from it with fields specific to amino acid annotations. Other data types will have their own Metadata subtypes. A generic Metadata viewer is planned for the web app.

All metadata fields are character(1) with default "".

A3

Top-level class. Construction validates and normalizes all input.

Wire Format

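The named annotation families (site, region, ptm, processing) use a named map. Each entry value is an object with index and type fields; this canonical form is the only one accepted, and bare arrays are rejected. type is always present in output (empty string when unset). variant is an array of objects, each with a position field alongside its info fields: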

{
  "sequence": "MAEPRQ...",
  "annotations": {
    "site": {
      "Disease_associated_variant": {"index": [4, 5, 14], "type": ""},
      "catalyticResidues": {"index": [57, 102], "type": "activeSite"}
    },
    "region": {
      "KXGS": {"index": [[259, 262], [290, 293]], "type": ""}
    },
    "ptm": {
      "Phosphorylation": {"index": [17, 18, 29], "type": ""}
    },
    "processing": {},
    "variant": [
      {"position": 301, "from": "P", "to": "L"}
    ]
  },
  "metadata": {
    "uniprot_id": "P10636",
    "description": "Microtubule-associated protein tau",
    "reference": "",
    "organism": "Homo sapiens"
  }
}
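
A structure of this shape can be produced from plain R lists with the jsonlite package. The sketch below is an illustration under assumptions, not the package's serializer: `entry` and `empty_map` are hypothetical names, and the sequence and annotations are placeholder data.

```r
library(jsonlite)

# Hypothetical helper: one named-family entry in canonical form.
entry <- function(index, type = "") {
  list(index = index, type = type)
}

# A named list with zero entries serializes as {} rather than [].
empty_map <- structure(list(), names = character(0))

# Placeholder data, not taken from a real protein.
a3 <- list(
  sequence = "MKTAYIAKQR",
  annotations = list(
    # I() keeps an index as a JSON array even when it has one element.
    site = list(exampleSite = entry(I(c(2L, 5L)), "activeSite")),
    # A matrix is written row by row, giving an array of [start, end] pairs.
    region = list(exampleRegion = entry(rbind(c(2L, 4L), c(6L, 9L)))),
    ptm = list(Phosphorylation = entry(I(3L))),
    processing = empty_map,
    variant = list(list(position = 3L, from = "T", to = "A"))
  ),
  metadata = list(
    uniprot_id = "", description = "", reference = "", organism = ""
  )
)

json <- toJSON(a3, auto_unbox = TRUE, pretty = TRUE)
cat(json, "\n")
```

With `auto_unbox = TRUE`, scalar fields such as type and position become JSON scalars, while the `I()` wrapper prevents a single-element index from collapsing to a bare number.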

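Serialization

`to_json(x)` (S7 generic)

`A3from_json(x)` (function)

`write_A3json` / `read_A3json`

Validation

Stage 1 — Structural (each class validates itself)

A3Position: elements are integers, positive, sorted ascending, unique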
A3Range: integer matrix, positive values, each row start < end, rows sorted
A3Sequence: non-empty, uppercase, characters in [A-Z*]
A3Feature subtypes: type is character(1)
A3Variant: position is a positive integer scalar
A3Annotation: each family contains only its allowed feature types; annotation names are non-empty strings

Stage 2 — Contextual (A3 validates the whole)

All positions satisfy 1 <= pos <= nchar(sequence)
All range endpoints satisfy the same
Sequence is valid (handled by A3Sequence in stage 1)
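
The contextual checks need the sequence length, which is why they live on the top-level object rather than on the index classes. A minimal base R sketch follows; `check_bounds` is a hypothetical helper that collects every violation instead of stopping at the first.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
# `positions` is an integer vector; `ranges` is an N x 2 integer matrix.
check_bounds <- function(sequence, positions = integer(),
                         ranges = matrix(integer(), ncol = 2)) {
  n <- nchar(sequence)
  problems <- character()

  bad_pos <- positions[positions < 1L | positions > n]
  if (length(bad_pos) > 0L) {
    problems <- c(problems, sprintf("position %d is outside 1..%d", bad_pos, n))
  }

  # A range is out of bounds if either endpoint is.
  bad_rows <- which(rowSums(ranges < 1L | ranges > n) > 0)
  if (length(bad_rows) > 0L) {
    problems <- c(problems, sprintf(
      "range [%d, %d] is outside 1..%d",
      ranges[bad_rows, 1L], ranges[bad_rows, 2L], n
    ))
  }
  problems
}

check_bounds("MAEPRQ", positions = c(1L, 6L))  # character(0)
check_bounds("MAEPRQ", positions = 7L)         # "position 7 is outside 1..6"
```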

Internal classes are not exported. All user-facing construction goes through create_A3() / A3from_json(), which run both stages.
