Result Storage & References — current DSL
This document merges and supersedes the older result/TempRef docs and updates them to the latest current DSL rules:
- No
sinkconcept: “sink” is a pattern, not a tool kind. - No
eval:/expr:. Outcome handling usesspec.policy.ruleswithwhen. - Events and step context pass references only (inline only for small previews / extracted fields).
- Storage backends: NATS KV, NATS Object Store, Google Cloud Storage, Postgres.
See also:
- current DSL runtime results model:
documentation/docs/reference/dsl/runtime_results.md - Legacy v2 doc (pre-policy DSL):
documentation/docs/reference/result_storage_v2.md
0) Goals
- Event-sourced correctness: event log remains the source of truth for what happened.
- Efficiency: large payloads MUST NOT bloat the event log.
- Composable access: downstream steps can reference:
- latest result
- per-attempt (retry) results
- per-page (pagination) results
- per-iteration (loop) results
- combined results via manifests (streamable)
- Pluggable storage: store results in Postgres / NATS KV / NATS Object Store / GCS, and keep only refs in events.
- Streaming-friendly: combine via manifests; avoid giant merged arrays.
1) Concepts
1.1 Tool output vs step result
- Tool output: output of a single task invocation (HTTP call, DB command, Python run).
- Step result: the logical outcome of a step; may include multiple tool outputs from pagination/loop/retry.
1.2 Reference-first rule
A task output is either:
- inline (only if size is less than or equal to the cap), OR
- stored externally with a ResultRef (preferred).
The event log stores: metadata + ResultRef + extracted fields + preview.
2) ResultRef (Standard pointer)
2.1 Structure
{
"kind": "result_ref",
"ref": "noetl://execution/123/step/fetch/task/fetch_page/run/abc123",
"store": "nats_kv|nats_object|gcs|postgres",
"scope": "step|execution|workflow|permanent",
"expires_at": "2026-02-01T13:00:00Z",
"meta": {
"content_type": "application/json",
"bytes": 52480,
"sha256": "abc123...",
"compression": "gzip"
},
"extracted": {
"has_more": true,
"page": 2,
"next_cursor": "c_123"
},
"preview": {
"truncated": true,
"bytes": 1024,
"sample": [{"id": 1}, {"id": 2}]
}
}
2.2 Logical vs physical addressing
refis a logical URI (noetl://...) stable across backends.- Physical details (bucket/key/object/table range) live in
metaand/or a server mapping.
3) Storage backends (Standard)
3.1 NATS KV (small)
Use for:
- cursors, tokens, small JSON
- small page payloads (within practical limits)
Recommended ResultRef meta:
meta.bucket,meta.key
3.2 NATS Object Store (medium)
Use for:
- medium artifacts (page payloads, streamed chunks)
Recommended ResultRef meta:
meta.bucket,meta.key, optionalmeta.etag
3.3 Google Cloud Storage (large / durable)
Use for:
- large payloads
- longer retention
- cross-system access
Recommended ResultRef meta:
meta.bucket,meta.object(ormeta.uri)
3.4 Postgres (queryable)
Use for:
- queryable intermediate tables (facts, audit rows)
- projections and indices for refs
Recommended ResultRef meta:
meta.schema,meta.table,meta.range(ormeta.pk)
standard: Postgres is both event store/projections and optionally a result store (tables).
4) Auto-selection (recommended)
A runtime SHOULD support store.kind: auto with size-aware selection:
<= inline_max_bytes→ inline<= kv_max_bytes→ NATS KV<= object_max_bytes→ NATS Object Store- else → GCS (or Postgres table when queryability is required)
Thresholds are runtime config, but the tier model is stable.
5) DSL configuration
5.1 Where config lives
Result storage config is per-task under task.spec.result.
- fetch_page:
kind: http
method: GET
url: "{{ workload.api_url }}/items"
spec:
result:
inline_max_bytes: 65536
preview_max_bytes: 2048
store:
kind: auto # auto|nats_kv|nats_object|gcs|postgres
scope: execution # step|execution|workflow|permanent
ttl: "1h"
compression: gzip
select:
- path: "$.paging.hasMore"
as: has_more
- path: "$.paging.page"
as: page
- path: "$.paging.nextCursor"
as: next_cursor
5.2 Extracted fields
select fields become ResultRef.extracted.*.
They are small, safe for:
- routing decisions
- pagination state
- UI/observability
6) Manifests (combined result without bloat)
6.1 Manifest object
{
"kind": "manifest",
"strategy": "append|concat|merge|replace",
"merge_path": "$.data.items",
"parts": [
{ "ref": "noetl://.../page/1/..." },
{ "ref": "noetl://.../page/2/..." }
],
"total_parts": 2,
"total_bytes": 30270
}
6.2 Storage
Manifests are stored reference-first like any other result:
- inline if tiny
- else NATS Object Store / GCS / Postgres
7) Correlation keys (MUST for indexing)
Each task.done event SHOULD include when applicable:
iteration,iteration_id(loop)page(pagination)attempt(retry)step_run_id,task_run_id
This supports retrieval patterns:
- page 3 of iteration 7
- last successful attempt for page 2
- all parts for step X
8) Passing results to next steps (reference-only)
Standard rule: server binds only extracted + refs, not full bodies.
Recommended binding for task label fetch_page:
fetch_page.<field>for extracted fieldsfetch_page.__ref__for ResultRef- optional
fetch_page.__preview__for small UI/debug sample
Downstream steps:
- use extracted fields directly
- explicitly resolve full payload if needed
9) Resolving refs (explicit)
Full payload resolution MUST be explicit:
- server API:
GET /results/resolve?ref=... - or tool:
kind: artifact/kind: resultwithaction: get
Example:
- load_full_page:
kind: artifact
action: get
args:
ref: "{{ fetch_page.__ref__ }}"
10) Standard pagination + streaming pattern (single worker logical thread)
Use loop for outer fan-out (endpoints/cities/hotels), but inside each iteration run a sequential stream:
- fetch page
- transform
- route store by response code (200 vs 404 etc.)
- decide continue (jump fetch) or finish (break)
All control is via task policy: spec.policy.rules.
11) Implementation requirements (what to build)
11.1 Worker-side (data plane)
Implement a result handler that runs after each task:
- stable serialize (JSON)
- apply
task.spec.result:- preview (optional)
- select/extract (optional)
- choose store backend
- write payload externally (unless inline)
- return final
outcomewith:outcome.resultcontaining small fieldsoutcome.result.__ref__containing ResultRef (if externalized)
Emit task.done with:
outcome.status,outcome.error,outcome.meta- inline small result or ResultRef (+ extracted + preview)
- correlation keys
11.2 Server-side (control plane)
On event ingest:
- persist event
- upsert projections:
result_indexkeyed by execution/step/task/iteration/page/attemptstep_statekeyed by execution/step with latest refs and status
Context binding:
- inject extracted fields and ResultRefs only
12) Garbage collection
Use scope + ttl:
step→ cleanup at step finalizeexecution→ cleanup at execution finalizeworkflow→ cleanup at workflow finalizepermanent→ manual only
Backend-specific deletes:
- NATS KV: delete key
- NATS Object Store: delete object
- GCS: delete object
- Postgres: delete rows (or partition retention)
13) Terminology: TempRef vs ResultRef
Legacy docs used TempRef (kind: temp_ref) for step-to-step pointers.
current DSL uses ResultRef (kind: result_ref) everywhere.
If reading legacy:
- treat
temp_refas alias ofresult_refat resolution time - emit only
result_refgoing forward