R (S7)
Amino Acid Annotation (A3) format — R/S7 implementation design
Requirements
- Thorough validation (type checks, bounds checks, etc.)
- Serialize to JSON
- Visualize using echarts in rtemislive-draw Next.js WASM app
- Compatible with popular protein sequence/annotation formats without loss of information (UniProt, ClinVar, GFF, GTF, GenBank, etc.)
- Fixed annotation categories aligned with UniProt: site, region, ptm, processing, variant
- type is optional per annotation: not part of the schema, but serialized as "" when absent so the field is always present in JSON output (supports manual editing in the web app)
S7 Class Hierarchy
A3Index (abstract base — pure geometry, no type)
├── A3Position(data: integer())
└── A3Range(data: integer matrix N x 2, colnames = c("start", "end"))
A3Sequence(data: character(1))
A3Feature(type: character(1) = "")
├── A3Site(index: A3Position)
├── A3Region(index: A3Range)
├── A3PTM(index: A3Index)
└── A3Processing(index: A3Index)
A3Variant(position: A3Position, info: named list)
A3Annotation
  site: named list of A3Site
  region: named list of A3Region
  ptm: named list of A3PTM
  processing: named list of A3Processing
  variant: list of A3Variant
Metadata (abstract base)
└── A3Metadata
    uniprot_id: character(1), default ""
    description: character(1), default ""
    reference: character(1), default ""
    organism: character(1), default ""
A3
├── sequence: A3Sequence
├── annotations: A3Annotation
└── metadata: A3Metadata
Class Details
A3Sequence
Wraps character(1) with its own validator.
Validation (stage 1):
- Non-empty string
- Uppercase
- Characters in [A-Z*] only
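The stage 1 checks above could be written as an S7 validator. This is a minimal sketch, not the package's implementation: it assumes the S7 package is installed and that the string is stored in a property named data, following the hierarchy notation.

```r
library(S7)

# Sketch only: the property name `data` is an assumption taken from the
# hierarchy notation A3Sequence(data: character(1)).
A3Sequence <- new_class(
  "A3Sequence",
  properties = list(data = class_character),
  validator = function(self) {
    x <- self@data
    if (length(x) != 1L || is.na(x)) {
      return("@data must be a single non-missing string")
    }
    if (!nzchar(x)) {
      return("@data must be a non-empty string")
    }
    # perl = TRUE keeps [A-Z] independent of the locale's collation order;
    # the pattern also rejects lowercase letters.
    if (!grepl("^[A-Z*]+$", x, perl = TRUE)) {
      return("@data must contain only characters in [A-Z*]")
    }
    NULL
  }
)

seq_ok <- A3Sequence(data = "MAEPRQ")
seq_bad <- tryCatch(A3Sequence(data = "maeprq"), error = function(e) "rejected")
```

S7 validators return NULL for a valid object and a character vector describing the problem otherwise, so each failed check maps to one message.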
A3Index
Abstract base class for sequence index types. Provides a common type for
A3PTM and A3Processing, which accept either positions or ranges.
Pure geometry — carries data only, no type (type lives on A3Feature).
A3Position
Sorted unique 1-based positive integer positions. Wraps integer().
Validation (stage 1):
- Elements are positive integers
- Sorted ascending
- No duplicates
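A possible S7 definition for these checks is sketched below. It assumes the S7 package and a data property as in the hierarchy notation; the error messages are illustrative.

```r
library(S7)

# Abstract base: cannot be instantiated, only used as a common parent type.
A3Index <- new_class("A3Index", abstract = TRUE)

A3Position <- new_class(
  "A3Position",
  parent = A3Index,
  # class_integer rejects doubles such as c(4, 5) before the validator runs.
  properties = list(data = class_integer),
  validator = function(self) {
    x <- self@data
    if (anyNA(x) || any(x < 1L)) {
      return("@data must contain positive, non-missing integers")
    }
    if (anyDuplicated(x) > 0L) {
      return("@data must not contain duplicates")
    }
    if (is.unsorted(x)) {
      return("@data must be sorted ascending")
    }
    NULL
  }
)

pos <- A3Position(data = c(4L, 5L, 14L))
unsorted <- tryCatch(A3Position(data = c(5L, 4L)), error = function(e) "rejected")
```

In this sketch the validator only rejects unsorted input; sorting and deduplicating raw user input would be the job of the user-facing constructor.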
A3Range
N x 2 integer matrix of inclusive [start, end] range pairs.
Column names: "start", "end".
Structurally parallel to A3Position — both are collection types wrapping
native R vectorized data structures.
Validation (stage 1):
- All values are positive integers
- Each row: start < end
- Rows sorted by start, then end
Empty: matrix(integer(), ncol = 2).
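The range checks can be sketched the same way. The block below is an assumption-laden illustration: it uses class_any for the data property and tests the matrix shape inside the validator, and the helper new_range_matrix() is hypothetical.

```r
library(S7)

A3Index <- new_class("A3Index", abstract = TRUE)

A3Range <- new_class(
  "A3Range",
  parent = A3Index,
  # class_any plus explicit checks keeps all matrix tests in the validator.
  properties = list(data = class_any),
  validator = function(self) {
    m <- self@data
    if (!is.matrix(m) || !is.integer(m) || ncol(m) != 2L) {
      return("@data must be an integer matrix with 2 columns")
    }
    if (!identical(colnames(m), c("start", "end"))) {
      return("@data must have column names start and end")
    }
    if (anyNA(m) || any(m < 1L)) {
      return("@data must contain positive, non-missing integers")
    }
    if (any(m[, "start"] >= m[, "end"])) {
      return("each row must satisfy start < end")
    }
    # order() is stable, so already-sorted rows map to 1..N.
    if (!identical(order(m[, "start"], m[, "end"]), seq_len(nrow(m)))) {
      return("rows must be sorted by start, then end")
    }
    NULL
  }
)

# Hypothetical helper: build an N x 2 matrix from values given row by row.
new_range_matrix <- function(values = integer()) {
  matrix(values, ncol = 2, byrow = TRUE,
         dimnames = list(NULL, c("start", "end")))
}

rng <- A3Range(data = new_range_matrix(c(259L, 262L, 290L, 293L)))
empty <- A3Range(data = new_range_matrix())
reversed <- tryCatch(
  A3Range(data = new_range_matrix(c(262L, 259L))),
  error = function(e) "rejected"
)
```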
A3Feature
Abstract base class for annotation feature types (A3Site, A3Region,
A3PTM, A3Processing). Named after the standard bioinformatics term used
by UniProt, GFF, GTF, and GenBank.
Properties:
type: character(1), default "". Always serialized in JSON output.
A3Site
Point annotation feature. Index is A3Position.
A3Region
Range annotation feature. Index is A3Range.
Sites and regions are visualized differently: sites show individual circles
at each residue position; regions show contiguous bands spanning [start, end].
A3PTM
Post-translational modification feature. Index is A3Index (either
A3Position or A3Range, never mixed within one entry).
A3Processing
Sequence processing/maturation feature (e.g. signal peptides, cleavage sites,
mature chains). Index is A3Index (either A3Position or A3Range, never
mixed within one entry).
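The feature hierarchy described above can be sketched in S7 as follows. This is an illustration under assumptions: the index classes are reduced to bare stubs without validators, and the property names index and type follow the hierarchy notation.

```r
library(S7)

# Stub index classes (validators omitted to keep the sketch short).
A3Index <- new_class("A3Index", abstract = TRUE)
A3Position <- new_class("A3Position", parent = A3Index,
                        properties = list(data = class_integer))
A3Range <- new_class("A3Range", parent = A3Index,
                     properties = list(data = class_any))

# type: character(1), default "".
type_prop <- new_property(
  class_character,
  default = "",
  validator = function(value) {
    if (length(value) != 1L || is.na(value)) "must be a single string"
  }
)

A3Feature <- new_class("A3Feature", abstract = TRUE,
                       properties = list(type = type_prop))

# Site and region pin the index type; PTM and processing accept any A3Index.
A3Site <- new_class("A3Site", parent = A3Feature,
                    properties = list(index = A3Position))
A3Region <- new_class("A3Region", parent = A3Feature,
                      properties = list(index = A3Range))
A3PTM <- new_class("A3PTM", parent = A3Feature,
                   properties = list(index = A3Index))
A3Processing <- new_class("A3Processing", parent = A3Feature,
                          properties = list(index = A3Index))

site <- A3Site(index = A3Position(data = c(57L, 102L)), type = "activeSite")
ptm <- A3PTM(index = A3Position(data = c(17L, 18L, 29L)))

# A range is not a valid index for a site, so the property type check fails.
ranges <- matrix(c(259L, 262L), ncol = 2,
                 dimnames = list(NULL, c("start", "end")))
mismatch <- tryCatch(A3Site(index = A3Range(data = ranges)),
                     error = function(e) "rejected")
```

Because each feature holds a single index object, an entry is either all positions or all ranges; mixing cannot occur within one entry.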
A3Variant
Variant record with required 1-based position (A3Position) and open
JSON-compatible info (named list).
A3Annotation
Container for the five annotation families. Each named annotation family is a named list keyed by annotation name:
site: named list of A3Site
region: named list of A3Region
ptm: named list of A3PTM
processing: named list of A3Processing
variant: list of A3Variant (ordered, not named)
Metadata / A3Metadata
Metadata is an abstract base class. A3Metadata inherits from it with
fields specific to amino acid annotations. Other data types will have their
own Metadata subtypes. A generic Metadata viewer is planned for the web app.
All metadata fields are character(1) with default "".
A3
Top-level class. Construction validates and normalizes all input.
Wire Format
Annotation families use a named map. Each entry value is an object with
index and type fields — canonical form only. Bare arrays are rejected.
type is always present in output (empty string when unset):
{
  "sequence": "MAEPRQ...",
  "annotations": {
    "site": {
      "Disease_associated_variant": {"index": [4, 5, 14], "type": ""},
      "catalyticResidues": {"index": [57, 102], "type": "activeSite"}
    },
    "region": {
      "KXGS": {"index": [[259, 262], [290, 293]], "type": ""}
    },
    "ptm": {
      "Phosphorylation": {"index": [17, 18, 29], "type": ""}
    },
    "processing": {},
    "variant": [
      {"position": 301, "from": "P", "to": "L"}
    ]
  },
  "metadata": {
    "uniprot_id": "P10636",
    "description": "Microtubule-associated protein tau",
    "reference": "",
    "organism": "Homo sapiens"
  }
}
Serialization
to_json(x) — S7 generic
Converts an A3 object to a canonical JSON string. Uses jsonlite::unbox()
for scalar fields and preserves arrays for index data.
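The split between unboxed scalars and preserved arrays can be illustrated with a reduced example. This sketch assumes the S7 and jsonlite packages and a simplified A3Site that stores a plain integer vector instead of an A3Position; the real method covers the full A3 object.

```r
library(S7)

# Simplified stand-in for the real class: index is a plain integer vector.
A3Site <- new_class("A3Site", properties = list(
  index = class_integer,
  type = new_property(class_character, default = "")
))

to_json <- new_generic("to_json", "x")

method(to_json, A3Site) <- function(x, ...) {
  jsonlite::toJSON(list(
    index = x@index,                 # stays an array, even with one element
    type = jsonlite::unbox(x@type)   # scalar string instead of ["..."]
  ))
}

json <- to_json(A3Site(index = 57L, type = "activeSite"))
# {"index":[57],"type":"activeSite"}
```

Without unbox(), jsonlite would emit "type" as a one-element array, which would not match the wire format.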
A3from_json(x) — function
Accepts a JSON string or pre-parsed named list and returns an A3 object.
Each annotation entry must be an object with index and type fields;
bare arrays are rejected.
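The rejection of bare arrays could look like the sketch below. The helper parse_entry() is hypothetical and only handles position-style indices; it assumes jsonlite and parsing with simplifyVector = FALSE, so JSON objects arrive as named lists and JSON arrays as unnamed lists.

```r
# Hypothetical helper: check one parsed annotation entry.
parse_entry <- function(entry, name) {
  is_object <- is.list(entry) && !is.null(names(entry))
  if (!is_object || !all(c("index", "type") %in% names(entry))) {
    stop(sprintf("annotation '%s' must be an object with index and type", name),
         call. = FALSE)
  }
  list(
    index = as.integer(unlist(entry$index)),  # [] becomes integer(0)
    type = entry$type
  )
}

parsed <- jsonlite::fromJSON(
  '{"site": {"a": {"index": [4, 5], "type": ""}, "b": [4, 5]}}',
  simplifyVector = FALSE
)

ok <- parse_entry(parsed$site$a, "a")
bad <- tryCatch(parse_entry(parsed$site$b, "b"),
                error = function(e) conditionMessage(e))
```

Here ok holds the canonical entry and bad holds the error message for the bare array under "b".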
write_A3json / read_A3json
File I/O wrappers around to_json() / A3from_json().
Validation
Two-stage validation.
Stage 1 — Structural (each class validates itself)
- A3Position: elements are integers, positive, sorted ascending, unique
- A3Range: integer matrix, positive values, each row start < end, rows sorted
- A3Sequence: non-empty, uppercase, characters in [A-Z*]
- A3Feature subtypes: type is character(1)
- A3Variant: position is a positive integer scalar
- A3Annotation: each family contains only its allowed feature types; annotation names are non-empty strings
Stage 2 — Contextual (A3 validates the whole)
- All positions satisfy 1 <= pos <= nchar(sequence)
- All range endpoints satisfy the same
- Sequence is valid (handled by A3Sequence in stage 1)
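The stage 2 bounds check can be sketched in base R on plain values. The helper check_bounds() is hypothetical; it follows the S7 validator convention of returning NULL when valid and a message otherwise.

```r
# Hypothetical helper: positions is an integer vector, ranges an N x 2 matrix.
check_bounds <- function(sequence, positions = integer(), ranges = NULL) {
  n <- nchar(sequence)
  # Range endpoints are checked exactly like positions.
  idx <- c(positions, as.vector(ranges))
  out <- idx[idx < 1L | idx > n]
  if (length(out) > 0L) {
    return(sprintf("index out of bounds for sequence of length %d: %s",
                   n, paste(out, collapse = ", ")))
  }
  NULL
}

check_bounds("MAEPRQ", positions = c(1L, 6L))  # NULL: all within 1..6
check_bounds("MAEPRQ", positions = 7L)         # message: 7 exceeds length 6
```

This check needs the sequence length, which is why it sits in the top-level A3 validation and not in the index classes.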
Internal classes are not exported. All user-facing construction goes through
create_A3() / A3from_json(), each of which runs both stages.