NoETL Runtime Results — Storage & Access
This document defines how NoETL stores and retrieves task outcomes and step results in a reference-first, event-sourced system.
Aligned with current DSL:
- No `sink` tool kind: storage is just tools that write data and return references.
- No `retry`/`pagination`/`case` blocks: retries + pagination are handled by task policies (`task.spec.policy.rules`) and iteration scope (`iter.*`).
- No `eval:`/`expr:`: use `when` in policies.
- Events and context pass references only (inline only for size-capped preview/extracted fields).
See also:
1) Goals
- Event-sourced correctness: the event log is the source of truth for what happened.
- Efficiency: large payloads MUST NOT bloat the event log.
- Composable access: downstream steps MUST be able to reference:
- latest outcome
- per-attempt outcomes (retry)
- per-page outcomes (pagination streams)
- per-iteration outcomes (loop)
- combined outcomes via manifests (streamable)
- Pluggable storage: store bodies in Postgres, NATS KV, NATS Object Store, or Google Cloud Storage (GCS), and keep only references + extracted fields in events/context.
- Streaming-friendly: represent combined results as manifests, not huge merged arrays.
2) Concepts
2.1 Task outcome vs step result
- Task outcome: the final result of a single task invocation in a step pipeline (e.g., one HTTP call).
- Step result: logical outcome of a step; may be composed of many task outcomes due to:
- retries (multiple attempts)
- pagination (multiple pages)
- loops (multiple iterations)
The current DSL produces step boundaries via step.done / step.failed events.
2.2 Reference-first output rule
A task’s output body is either:
- inline (only if small and within caps), OR
- externalized to a backend and returned as a ResultRef.
The event log stores:
- the outcome envelope (`status`, `error`, `meta`)
- ResultRef + extracted fields + preview
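The reference-first rule above can be sketched as a small decision function. This is illustrative only: `inline_max_bytes` follows the DSL control introduced later, and `externalize` is a hypothetical callback standing in for whatever backend writer is configured.

```python
import json

def package_outcome(body: dict, inline_max_bytes: int, externalize) -> dict:
    """Return an event-safe outcome: inline if small, else a ResultRef."""
    raw = json.dumps(body).encode("utf-8")
    if len(raw) <= inline_max_bytes:
        # Small enough: store the body directly in the event payload.
        return {"inline": body}
    # Too large: write to a backend and keep only the reference.
    ref = externalize(raw)  # hypothetical writer; returns a ResultRef-shaped dict
    return {"result_ref": ref}

# Example: a deliberately tiny cap forces externalization.
outcome = package_outcome(
    {"items": list(range(100))},
    inline_max_bytes=1,
    externalize=lambda raw: {"kind": "result_ref", "meta": {"bytes": len(raw)}},
)
```

Either branch yields a payload safe to embed in an event: the inline body is size-capped, and the externalized body contributes only a reference.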
3) ResultRef
A ResultRef is a lightweight pointer to externally stored data.
3.1 Standard shape
{
"kind": "result_ref",
"ref": "noetl://execution/<eid>/step/<step>/task/<task>/run/<task_run_id>/attempt/<n>",
"store": "nats_kv|nats_object|gcs|postgres",
"scope": "step|execution|workflow|permanent",
"expires_at": "2026-02-01T13:00:00Z",
"meta": {
"content_type": "application/json",
"bytes": 123456,
"sha256": "...",
"compression": "gzip"
},
"extracted": {
"page": 2,
"has_more": true
},
"preview": {
"truncated": true,
"bytes": 2048,
"sample": [{"id": 1}]
}
}
3.2 Logical vs physical addressing
- `ref` is a logical NoETL URI.
- Physical location is derived from `store` + `meta` (bucket/key/object/table range), and/or a control-plane mapping.
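Because the logical URI alternates key and value segments, the correlation keys can be recovered from the `ref` alone. A minimal parser sketch, following the path layout shown in 3.1 (the function itself is illustrative, not part of NoETL's API):

```python
def parse_result_ref(ref: str) -> dict:
    """Recover correlation keys from a logical noetl:// URI."""
    if not ref.startswith("noetl://"):
        raise ValueError("not a noetl ref")
    parts = ref[len("noetl://"):].split("/")
    # Path alternates key/value segments: execution/<eid>/step/<step>/...
    return dict(zip(parts[0::2], parts[1::2]))

keys = parse_result_ref(
    "noetl://execution/e1/step/fetch/task/page/run/r7/attempt/2"
)
```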
4) Storage tiers and recommended usage
4.1 Inline (small)
Store directly in the event payload for small results.
- controlled by `inline_max_bytes`
Use for:
- small status objects
- small aggregates
- extracted fields
4.2 Postgres (queryable)
Use for:
- queryable intermediate tables
- projections / indices
- moderate-sized JSON stored in tables (when useful)
Recommended ResultRef meta:
{ "schema": "...", "table": "...", "pk": "...", "range": "id:100-150" }
4.3 NATS KV (small, fast)
Use for:
- cursors
- session-like state
- small JSON parts (within practical limits)
Recommended ResultRef meta:
{ "bucket": "...", "key": "execution/<eid>/..." }
4.4 NATS Object Store (medium artifacts)
Use for:
- paginated pages
- chunked streaming parts
- medium artifacts
Recommended ResultRef meta:
{ "bucket": "...", "key": ".../page_001.json.gz" }
4.5 Google Cloud Storage (large, durable)
Use for:
- large payloads
- durable datasets
- cross-system distribution
Recommended ResultRef meta:
{ "bucket": "...", "object": ".../payload.json.gz" }
5) DSL controls for result storage
Result storage controls live under task.spec.result.
- fetch_page:
kind: http
method: GET
url: "{{ workload.api_url }}/items"
spec:
result:
inline_max_bytes: 65536
preview_max_bytes: 2048
store:
kind: auto # auto|nats_kv|nats_object|gcs|postgres
scope: execution # step|execution|workflow|permanent
ttl: "1h"
compression: gzip
select: # extracted fields for routing/state
- path: "$.paging.hasMore"
as: has_more
- path: "$.paging.page"
as: page
5.1 Extracted fields
`select` extracts small values into `ResultRef.extracted`.
These are safe to pass in context and to use in routing decisions without resolving the full body.
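A minimal sketch of `select` extraction, matching the DSL example above. It walks dotted paths instead of full JSONPath to stay dependency-free; the real implementation would use a proper JSONPath engine:

```python
def extract_fields(body: dict, select: list) -> dict:
    """Pull small routing fields out of a result body (dotted-path sketch)."""
    extracted = {}
    for rule in select:
        node = body
        # "$.paging.hasMore" -> strip the "$." root marker, then walk keys.
        for key in rule["path"].lstrip("$.").split("."):
            node = node[key]
        extracted[rule["as"]] = node
    return extracted

body = {"paging": {"hasMore": True, "page": 2}, "data": {"items": []}}
select = [
    {"path": "$.paging.hasMore", "as": "has_more"},
    {"path": "$.paging.page", "as": "page"},
]
fields = extract_fields(body, select)
```

Only `fields` travels in context and events; the full `body` is externalized behind a ResultRef.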
6) Indexing (how to address pieces)
To retrieve “pieces” deterministically, every task completion SHOULD record correlation keys:
- `execution_id`
- `step_name`, `step_run_id`
- `task_label`, `task_run_id`
- `attempt` (retry attempt number, starting at 1)
- `iteration` / `iteration_id` (loop)
- `page` (pagination)
This enables queries like:
- final successful attempt for iteration 2 / page 3
- all pages for iteration 7
- latest output for task `transform` in step `fetch_transform_store`
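The first query above ("final successful attempt for iteration 2 / page 3") can be sketched against a result-index-like table. SQLite stands in for Postgres purely to keep the example runnable; the columns follow the correlation keys listed above:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE result_index (
        execution_id TEXT, step_name TEXT, task_label TEXT,
        iteration INT, page INT, attempt INT, status TEXT, ref TEXT
    )
""")
rows = [
    ("e1", "fetch", "page", 2, 3, 1, "failed", "noetl://.../attempt/1"),
    ("e1", "fetch", "page", 2, 3, 2, "done",   "noetl://.../attempt/2"),
]
db.executemany("INSERT INTO result_index VALUES (?,?,?,?,?,?,?,?)", rows)

# Final successful attempt for iteration 2 / page 3.
ref = db.execute("""
    SELECT ref FROM result_index
    WHERE execution_id = 'e1' AND iteration = 2 AND page = 3
      AND status = 'done'
    ORDER BY attempt DESC LIMIT 1
""").fetchone()[0]
```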
7) Manifests (aggregation without bloat)
7.1 Why manifests
Do not materialize huge merged arrays in memory/events. Use manifest refs listing part refs.
{
"kind": "manifest",
"strategy": "append",
"merge_path": "$.data.items",
"parts": [
{"ref": "noetl://.../page/1/..."},
{"ref": "noetl://.../page/2/..."}
]
}
7.2 Where manifests live
A manifest itself is stored reference-first:
- inline if tiny
- else NATS Object Store / GCS / Postgres
The step boundary event stores only a reference to the manifest.
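Consuming a manifest can then be a streaming iteration over its parts, never materializing the merged array. A sketch, where `resolve` is a hypothetical callback that fetches one part body by ref:

```python
def iter_manifest_items(manifest: dict, resolve):
    """Yield items from manifest parts one page at a time (sketch)."""
    # merge_path "$.data.items" -> keys to walk inside each part body.
    keys = manifest["merge_path"].lstrip("$.").split(".")
    for part in manifest["parts"]:
        body = resolve(part["ref"])   # fetch a single part, not the whole set
        for key in keys:
            body = body[key]
        yield from body

manifest = {
    "kind": "manifest", "strategy": "append", "merge_path": "$.data.items",
    "parts": [{"ref": "p1"}, {"ref": "p2"}],
}
pages = {
    "p1": {"data": {"items": [1, 2]}},
    "p2": {"data": {"items": [3]}},
}
items = list(iter_manifest_items(manifest, pages.__getitem__))
```

Peak memory stays proportional to one part, regardless of how many pages the stream produced.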
8) How downstream steps access results (reference-only)
Standard rule: downstream steps receive ResultRef + extracted fields, not full payload bodies.
Recommended bindings for a task label `fetch_page`:
- `fetch_page.__ref__` → ResultRef
- `fetch_page.page`, `fetch_page.has_more` → extracted fields
- `fetch_page.__preview__` → optional preview
To read the full body, the playbook MUST explicitly resolve the ref:
- via server API (resolve endpoint), or
- via a tool (`kind: artifact/result`, `action: get`), or
- by scanning the artifact directly (DuckDB, etc.), depending on backend.
9) Implementation strategy (event-sourced & efficient)
9.1 Worker responsibilities
On each task completion:
- compute `outcome` (status/error/meta + raw result)
- apply `task.spec.result`:
  - create preview (optional)
  - extract selected fields (optional)
  - choose store backend (auto/explicit)
  - store full body if needed
- emit `task.attempt.*` + `task.done`/`task.failed` events containing:
  - `outcome` (size-capped)
  - ResultRef (if externalized) and extracted fields
  - correlation keys
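The emitted event body can be sketched as follows. The field layout is illustrative, composed from the pieces named above (outcome envelope, ResultRef, extracted fields, correlation keys):

```python
def build_task_done_event(status: str, result_ref: dict, extracted: dict, keys: dict) -> dict:
    """Assemble a size-capped task completion event (illustrative layout)."""
    return {
        "event": "task.done" if status == "done" else "task.failed",
        "outcome": {"status": status, "error": None,
                    "meta": result_ref.get("meta", {})},
        "result_ref": result_ref,   # body externalized; reference only
        "extracted": extracted,     # small fields safe for routing
        **keys,                     # execution_id, step_name, attempt, ...
    }

event = build_task_done_event(
    "done",
    {"kind": "result_ref", "ref": "noetl://execution/e1/...", "meta": {"bytes": 9}},
    {"has_more": False},
    {"execution_id": "e1", "step_name": "fetch", "attempt": 1},
)
```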
9.2 Server responsibilities
On event ingestion:
- persist events
- upsert projections (rebuildable):
noetl.result_index (recommended)
- execution_id, step_name, task_label
- step_run_id, task_run_id
- iteration, page, attempt
- result_ref (json)
- status, created_at
noetl.step_state (recommended)
- execution_id, step_name
- last_result_ref
- aggregate_result_ref (manifest)
- status
These projections prevent scanning the full event stream for common reads.
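Because projections are rebuildable from the log, a `step_state`-like view is just a fold over completion events. A minimal sketch (last event per step wins; schema simplified from the projection listed above):

```python
def project_step_state(events: list) -> dict:
    """Fold completion events into a per-step latest-state view (sketch)."""
    state = {}
    for ev in events:  # events in log order
        if ev["event"] in ("task.done", "task.failed"):
            key = (ev["execution_id"], ev["step_name"])
            state[key] = {
                "last_result_ref": ev.get("result_ref"),
                "status": "done" if ev["event"] == "task.done" else "failed",
            }
    return state

events = [
    {"event": "task.done", "execution_id": "e1", "step_name": "fetch",
     "result_ref": {"ref": "noetl://.../attempt/1"}},
    {"event": "task.failed", "execution_id": "e1", "step_name": "fetch",
     "result_ref": None},
]
state = project_step_state(events)
```

Common reads then hit this table directly instead of replaying the event stream.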
10) Pagination + retry + loop (Standard mental model)
A single step with loop + pagination + retry yields a lattice of pieces indexed by:
- iteration (outer loop: endpoint/city/hotel)
- page (pagination stream inside that iteration)
- attempt (retry per page)
Each externally stored page becomes a ResultRef part, and the entire stream is represented as a manifest ref.
11) Quantum orchestration note
Quantum tools often return large measurement datasets. Standard pattern:
- tool returns a ResultRef (NATS Object Store or GCS)
- extracted fields store small metadata (job id, backend, shots, cost)
- manifests represent iterative algorithms (VQE/QAOA) or shot batches
This preserves a complete experiment trace keyed by execution_id without embedding large payloads in events.