Quickstart
Two steps to your first authenticated API call. This page assumes you already have an Orkidata account — sign-up happens through the web UI, not the API.
1. Mint a Bearer token
Tokens are minted from Profile → API tokens in the web UI.
Click Generate, copy the token (it's shown only once, so store it
somewhere safe), and send it in the Authorization header
of every API request.
2. First authenticated request
List the workflows in your account:
curl -H "Authorization: Bearer $ORKIDATA_TOKEN" \
https://orkidata.com/api/workflows
Response — empty account
{
"confirmEmpty": true,
"items": []
}
Response — populated account
{
"items": [
{
"id": "id_1777515532786_jwshcxvic",
"name": "Daily sales report",
"type": "workflow",
"createdAt": "2026-04-30T02:18:52Z",
"showInSidebar": false
},
{
"id": "fold_customer_pipelines",
"name": "Customer pipelines",
"type": "folder"
}
]
}
The tree may also include a top-level confirmEmpty field
— that's a write-time safety flag (see tutorial step 3) the server
persists from your last save; it's not meaningful in read responses.
From here, jump to the tutorial for an end-to-end
walkthrough that creates a workflow, runs it, and downloads its export
— entirely via curl.
Authentication
The public API uses Bearer-token authentication only. Every request must
carry an Authorization: Bearer <token> header.
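As a rough Python counterpart to the curl calls on this page, the sketch below builds (but does not send) a request that carries the header. The helper name build_request and the placeholder token are illustrative assumptions, not part of any Orkidata SDK.

```python
import urllib.request

BASE = "https://orkidata.com"  # same host as the curl examples


def build_request(path: str, token: str, base: str = BASE) -> urllib.request.Request:
    """Return an unsent GET request with the Bearer header attached."""
    req = urllib.request.Request(base + path, method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    return req


# Placeholder token for illustration only.
req = build_request("/api/workflows", "etl_example_token")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; that call is left out here so the sketch runs without network access.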
Token lifecycle
| Operation | Where | Notes |
|---|---|---|
| Mint | Profile → API tokens (UI) or POST /api/auth/token | The plaintext token is shown ONCE. Store it in a secret manager. |
| Revoke | Profile → API tokens (UI) or DELETE /api/auth/token | Revocation is immediate. The next request with that token returns 401. |
| Rotate | Mint a new one | Re-minting auto-revokes the previous active token. |
Each request counts toward your account's RPM quota — see the rate limits table.
Rate limits & tiers
Every account has per-minute, per-hour, and concurrency budgets sized to
its tier. GET /api/usage returns your current consumption
and the active limits at any time.
| Tier | RPM | Exec/hr | Concurrent | Max data | Max exec sec | Emails/day |
|---|---|---|---|---|---|---|
| Free | 30 | 10 | 1 | 10 MB | 60 | 0 |
| Entry | 120 | 60 | 5 | 500 MB | 300 | 500 |
| Pro | 600 | 300 | 20 | 5 GB | 870 | 5,000 |
Inspect your current usage
curl -H "Authorization: Bearer $ORKIDATA_TOKEN" \
https://orkidata.com/api/usage
Response (free tier, idle)
{
"tier": "free",
"current": {
"concurrent_active": 0,
"exec_hour_used": 0,
"rpm_used": 2
},
"limits": {
"rpm": 30,
"exec_per_hour": 10,
"concurrent": 1,
"max_data_mb": 10,
"max_exec_seconds": 60,
"max_emails_per_day": 0
}
}
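One way to use this response is to compute the headroom left before a limit is hit. The sketch below does that subtraction on the sample payload shown above; the function name remaining_budget is a hypothetical helper, and only three of the limits are covered because only those have a matching counter under current.

```python
def remaining_budget(usage: dict) -> dict:
    """Subtract current consumption from the active limits."""
    current, limits = usage["current"], usage["limits"]
    return {
        "rpm": limits["rpm"] - current["rpm_used"],
        "exec_per_hour": limits["exec_per_hour"] - current["exec_hour_used"],
        "concurrent": limits["concurrent"] - current["concurrent_active"],
    }


# The free-tier sample response from above.
sample_usage = {
    "tier": "free",
    "current": {"concurrent_active": 0, "exec_hour_used": 0, "rpm_used": 2},
    "limits": {"rpm": 30, "exec_per_hour": 10, "concurrent": 1,
               "max_data_mb": 10, "max_exec_seconds": 60, "max_emails_per_day": 0},
}
budget = remaining_budget(sample_usage)
print(budget)
```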
Error responses
Every non-2xx response is JSON with at minimum an error field:
{ "error": "Human-readable description", "retry_after": 60 }
The retry_after field appears on rate-limit responses (429).
| Status | Meaning |
|---|---|
| 400 | Validation error in the request body or query. |
| 401 | Missing, invalid, or expired Bearer token. |
| 403 | Authenticated, but not authorized for this resource. |
| 404 | Resource doesn't exist (or you can't see it — same code on purpose). |
| 409 | Conflict — for example, re-adding a previously unsubscribed email recipient. |
| 429 | Rate limit hit. retry_after indicates seconds until next attempt. |
| 500 | Unhandled server error. Please report at support@orkidata.com. |
Example: missing token
curl -i https://orkidata.com/api/workflows
Response
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{"error":"Authentication required"}
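A client can use the status code and the retry_after field to decide whether to wait and retry. The sketch below is one possible policy, not a prescribed one: it retries only on 429, and the fallback delay of 1 second when retry_after is missing or the body is not JSON is an assumption.

```python
import json


def retry_delay(status: int, body: str, default: float = 1.0):
    """Return seconds to wait before retrying, or None when a retry will not help."""
    if status != 429:
        return None  # 400/401/403/404/409/500 are not fixed by waiting
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return default  # assumed fallback; the docs promise JSON
    if not isinstance(payload, dict):
        return default
    return float(payload.get("retry_after", default))


print(retry_delay(429, '{"error": "Human-readable description", "retry_after": 60}'))
print(retry_delay(401, '{"error": "Authentication required"}'))
```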
Tutorial: build & introspect a complete pipeline via API
A nine-step end-to-end walkthrough — build a workflow, run it, pull
per-step results, download the export, and sample everything else
your account exposes. Every command and response below was captured
from a real Free-tier account on https://dev.orkidata.com.
Set your token once for the rest of the tutorial:
export ORKIDATA_TOKEN='etl_…your token…'
export H="Authorization: Bearer $ORKIDATA_TOKEN"
export BASE='https://orkidata.com'
1. List what's in your account
curl -s -H "$H" "$BASE/api/workflows"
Response — fresh account
{ "items": [] }
A populated tree returns {"items":[…]} with each
workflow / folder entry. The response may also carry a
confirmEmpty flag persisted from your last write
— see step 3 for what it does on the write side.
2. Upload a tiny CSV (data for the workflow)
Uploads use a presigned-URL pattern, which bypasses the API Gateway 10 MB body limit: request a signed URL, PUT the file to it, then confirm the upload. First, ask for the URL:
curl -s -H "$H" \
"$BASE/api/data-storage/upload-url?filename=orders.csv&contentType=text/csv"
Response
{
"bucket": "etl-platform-prod-…",
"expiresIn": 300,
"key": "data-storage/<user-id>/1777513758_orders.csv",
"uploadUrl": "https://…s3.amazonaws.com/…?AWSAccessKeyId=…&Signature=…&Expires=…"
}
Then PUT the file bytes to that signed URL, and confirm the upload:
curl -X PUT -H "Content-Type: text/csv" \
--data-binary @orders.csv \
"<uploadUrl from above>"
curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
-d '{"name":"orders.csv","s3Key":"data-storage/<user-id>/…orders.csv","format":"csv","mimeType":"text/csv"}' \
"$BASE/api/data-storage/confirm-upload"
Response
{
"status": "success",
"file": {
"id": "491ec56f-0e2d-41b4-8f8c-db530fdaeea8",
"type": "file",
"name": "orders.csv",
"format": "csv",
"mimeType": "text/csv",
"rowCount": 10,
"columnCount": 5,
"s3Key": "data-storage/<user-id>/…orders.csv",
"uploadedAt": "2026-04-30T01:49:54Z"
}
}
Hold on to file.id — the workflow's load_data step references it.
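The two API calls around the PUT can be prepared as below. This is a sketch under one stated assumption: that the s3Key sent to confirm-upload is the key value returned by upload-url (the examples above elide part of that path, so the match is inferred, not confirmed). The helper names are hypothetical.

```python
from urllib.parse import urlencode


def upload_url_path(filename: str, content_type: str) -> str:
    """Build the path and query string for the upload-url request."""
    query = urlencode({"filename": filename, "contentType": content_type})
    return f"/api/data-storage/upload-url?{query}"


def confirm_payload(upload_info: dict, name: str, fmt: str, mime_type: str) -> dict:
    """Build the confirm-upload body, reusing the key from the upload-url response."""
    return {
        "name": name,
        "s3Key": upload_info["key"],  # assumption: same value as the returned key
        "format": fmt,
        "mimeType": mime_type,
    }


path = upload_url_path("orders.csv", "text/csv")
upload_info = {"key": "data-storage/<user-id>/1777513758_orders.csv", "expiresIn": 300}
payload = confirm_payload(upload_info, "orders.csv", "csv", "text/csv")
print(path)
print(payload)
```

Note that urlencode percent-encodes the slash in text/csv, which is equivalent to the unencoded form used in the curl example.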
3. Add the workflow to your tree
POST your tree to /api/workflows. The confirmEmpty: true flag is a write-time
guard for saving an empty tree; partial trees do not have that guard.
curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
-d '{"items":[{"id":"docs-tutorial","name":"docs-tutorial","type":"workflow","parentId":null}]}' \
"$BASE/api/workflows"
Response
{ "success": true }
4. Save the step graph
Three steps — load_data reads the file you just uploaded,
filter_rows keeps the paid orders, and export_file
writes the result to CSV.
curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
-d '{
"steps": [
{
"id": "load",
"type": "load_data",
"name": "Load orders",
"config": {
"sources": [
{ "fileId": "491ec56f-…", "outputVariable": "orders" }
]
}
},
{
"id": "filter",
"type": "filter_rows",
"name": "Paid only",
"config": {
"source": "orders",
"outputVariable": "paidOrders",
"conditions": [
{ "field": "status", "operator": "==", "value": "paid" }
]
}
},
{
"id": "export",
"type": "export_file",
"name": "Export paid orders",
"config": {
"source": "paidOrders",
"delimiter": "comma",
"filename": "paid-orders.csv"
}
}
]
}' \
"$BASE/api/workflow/docs-tutorial/definition"
Response
{ "success": true }
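Before saving a definition, it can help to check locally that every step reads a variable some earlier step produced. The sketch below is a hypothetical client-side check, not something the API is documented to do. It assumes variables are introduced by sources[].outputVariable and config.outputVariable and read through config.source, and it handles only plain variable names (not dotted paths such as steps.filter.output).

```python
def check_variable_flow(steps: list) -> list:
    """Return a message for each step that reads a variable no earlier step produced."""
    defined = set()
    problems = []
    for step in steps:
        config = step.get("config", {})
        source = config.get("source")
        if source is not None and source not in defined:
            problems.append(f"step '{step['id']}' reads undefined variable '{source}'")
        for entry in config.get("sources", []):
            defined.add(entry["outputVariable"])
        if "outputVariable" in config:
            defined.add(config["outputVariable"])
    return problems


# The step graph from the request above (file id left elided, as in the docs).
tutorial_steps = [
    {"id": "load", "type": "load_data", "name": "Load orders",
     "config": {"sources": [{"fileId": "491ec56f-…", "outputVariable": "orders"}]}},
    {"id": "filter", "type": "filter_rows", "name": "Paid only",
     "config": {"source": "orders", "outputVariable": "paidOrders",
                "conditions": [{"field": "status", "operator": "==", "value": "paid"}]}},
    {"id": "export", "type": "export_file", "name": "Export paid orders",
     "config": {"source": "paidOrders", "delimiter": "comma", "filename": "paid-orders.csv"}},
]
print(check_variable_flow(tutorial_steps))
```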
5. Inspect the saved definition
The same definition you just stored, now read back:
curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/definition"
Response
{
"_permission": "owner",
"steps": [
{ "id": "load", "type": "load_data", "name": "Load orders",
"config": { "sources": [ { "fileId": "491ec56f-…", "outputVariable": "orders" } ] } },
{ "id": "filter", "type": "filter_rows", "name": "Paid only",
"config": { "source": "orders", "outputVariable": "paidOrders",
"conditions": [ { "field": "status", "operator": "==", "value": "paid" } ] } },
{ "id": "export", "type": "export_file", "name": "Export paid orders",
"config": { "source": "paidOrders", "delimiter": "comma", "filename": "paid-orders.csv" } }
]
}
6. Run it
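Tier 1 runs (≤10 MB estimated peak) return the full result inline. Tier 2
or 3 runs return {"async": true, "runId": "…"};
poll /api/execution/{runId}/status until the status is terminal
(completed, failed, timeout, or cancelled). These execution tiers are sized by
estimated memory peak and are separate from the Free/Entry/Pro account tiers.
See step 8 below.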
curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
-d '{"input":{},"source":"api"}' \
"$BASE/api/workflow/docs-tutorial/execute"
Response (Tier 1, sync)
{
"workflow_id": "docs-tutorial",
"success": true,
"started_at": "2026-04-30T01:50:54Z",
"completed_at": "2026-04-30T01:50:54Z",
"duration_seconds": 0.000109,
"step_count": 3,
"steps": [
{
"step_id": "load", "step_type": "load_data", "success": true,
"message": "Loaded 10 row(s) from orders.csv into 'orders'",
"output": {
"totalRows": 10,
"sources": [
{ "fileName": "orders.csv", "rowCount": 10, "columnCount": 5,
"columns": ["order_id","customer_email","status","amount_usd","created_at"],
"preview": [/* first 3 rows */] }
]
}
},
{
"step_id": "filter", "step_type": "filter_rows", "success": true,
"message": "Filtered 10 → 6 rows (4 removed)",
"output": {
"originalCount": 10, "filteredCount": 6, "removedCount": 4,
"logic": "AND",
"conditions": ["status = paid"],
"preview": [/* first 3 matching rows */]
}
},
{
"step_id": "export", "step_type": "export_file", "success": true,
"message": "Export ready: 6 rows, 5 columns (comma)",
"output": {
"filename": "paid-orders.csv",
"delimiter": "comma",
"rowCount": 6,
"columnCount": 5,
"exportReady": true
}
}
]
}
7. Inspect run history (per-step results)
List recent runs (use ?slim=true for a lightweight payload — exports/HTML are stripped):
curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/history?slim=true"
Response (slim)
{
"runs": [
{
"id": "run_1777513854059",
"runId": "run_1777513854059",
"status": "success",
"apiTriggered": true,
"startedAt": "2026-04-30T01:50:54Z",
"completedAt": "2026-04-30T01:50:54Z",
"duration": 0.000109,
"stepCount": 3,
"hasExports": true,
"exportSummary": [
{ "stepId": "export", "name": "Export paid orders", "rows": 6, "delimiter": "comma" }
],
"dataMetrics": {
"totalInputRows": 10,
"totalOutputRows": 6,
"emptyResult": false
}
}
]
}
Pull the full per-step detail for a specific run:
curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/history/run/run_1777513854059"
Returns the same top-level fields as the slim list plus
result.steps[] — each step's full output payload,
duration, errors, and the stepsSnapshot (the exact
config the run executed against).
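The slim run entries carry enough information to build the download path for each export, using the POST /api/workflow/{id}/history/{runId}/export/{stepId} route from step 9. The helper below is a hypothetical sketch of that mapping.

```python
def export_paths(workflow_id: str, run: dict) -> list:
    """List the export download paths for one run from the slim history payload."""
    if not run.get("hasExports"):
        return []
    return [
        f"/api/workflow/{workflow_id}/history/{run['runId']}/export/{entry['stepId']}"
        for entry in run.get("exportSummary", [])
    ]


# Trimmed copy of the slim run shown above.
slim_run = {
    "runId": "run_1777513854059",
    "status": "success",
    "hasExports": True,
    "exportSummary": [
        {"stepId": "export", "name": "Export paid orders", "rows": 6, "delimiter": "comma"}
    ],
}
paths = export_paths("docs-tutorial", slim_run)
print(paths)
```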
8. Async variant (for larger data)
When a run's estimated memory peak exceeds 10 MB, execute
returns immediately with an async-dispatch response:
{ "async": true, "runId": "async_id_17772_a0c5ffd8", "tier": 2 }
Poll until terminal:
RUN_ID='async_id_17772_a0c5ffd8'
while true; do
STATUS=$(curl -s -H "$H" "$BASE/api/execution/$RUN_ID/status" | jq -r .status)
case "$STATUS" in
completed|failed|timeout|cancelled) break ;;
esac
sleep 3
done
echo "Final: $STATUS"
Once completed, fetch the full result via the
same /history/run/{runId} endpoint as step 7.
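The same polling loop can be sketched in Python. To keep it runnable without network access, the status lookup is passed in as a callable; in real use it would wrap a GET to /api/execution/{runId}/status. The function name, the max_polls cap, and the non-terminal status name "running" in the demo are assumptions for illustration.

```python
import time
from typing import Callable

# Terminal statuses, taken from the shell loop above.
TERMINAL = {"completed", "failed", "timeout", "cancelled"}


def wait_for_run(fetch_status: Callable[[], str], interval: float = 3.0,
                 max_polls: int = 200, sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal status is seen; give up after max_polls checks."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        sleep(interval)
    raise TimeoutError(f"run still not terminal after {max_polls} polls")


# Demo with a fake status source instead of a real HTTP call.
fake_statuses = iter(["running", "running", "completed"])
final = wait_for_run(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print("Final:", final)
```

Unlike the shell loop, this version stops after a bounded number of polls, which avoids spinning forever if a run never reaches a terminal state.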
9. Download the export
curl -X POST -H "$H" \
-o paid-orders.csv \
"$BASE/api/workflow/docs-tutorial/history/run_1777513854059/export/export"
Tier-1 (sync) runs return the CSV bytes directly with
Content-Type: text/csv. Tier-2/3 (async) runs return
{"downloadUrl": "https://s3…","filename": "…"}
instead; fetch downloadUrl with a second request (for example, curl -o).
The presigned URL is valid for 5 minutes.
For the tutorial run, paid-orders.csv contains:
order_id,customer_email,status,amount_usd,created_at
ord-1001,alice@example.com,paid,49.50,2026-04-21
ord-1003,carol@example.com,paid,128.75,2026-04-22
ord-1005,eve@example.com,paid,21.40,2026-04-24
ord-1006,frank@example.com,paid,87.25,2026-04-25
ord-1008,heidi@example.com,paid,9.99,2026-04-27
ord-1009,ivan@example.com,paid,210.00,2026-04-28
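A client that handles both sync and async runs has to branch on the response type. The sketch below does so on the Content-Type header; it assumes the async variant is served as application/json, which is not stated explicitly above, and that the caller has already checked for a 2xx status (error bodies are JSON too). The function name is hypothetical.

```python
import json


def interpret_export_response(content_type: str, body: bytes):
    """Return ("csv", bytes) for inline exports or ("url", downloadUrl) for presigned ones."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return ("url", json.loads(body)["downloadUrl"])
    return ("csv", body)


inline = interpret_export_response("text/csv", b"order_id,status\nord-1001,paid\n")
# example.com stands in for the real presigned URL.
linked = interpret_export_response(
    "application/json; charset=utf-8",
    b'{"downloadUrl": "https://example.com/paid-orders.csv", "filename": "paid-orders.csv"}',
)
print(inline[0], linked)
```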
You can see your entire account from curl
The endpoints used above are a subset. The same Bearer token also reads:
| What | Endpoint |
|---|---|
| Current usage & limits | GET /api/usage |
| Day-by-day usage history | GET /api/usage/history |
| Subscription state | GET /api/billing/subscription |
| Plan catalog | GET /api/billing/plans |
| Data Storage tree | GET /api/data-storage |
| Connections list | GET /api/connections |
| Dashboards | GET /api/dashboards |
| Verified email recipients | GET /api/email/recipients |
| Per-workflow usage breakdown | GET /api/workflow/{id}/usage |
Every one of those is a plain curl -H "$H" away.
See the API reference below for the
full list with request/response schemas.
Step types reference
Each card shows the step's purpose, config schema, and a minimal
create example — the JSON you'd embed in steps[] when
saving a workflow definition (see tutorial step 4). To see the
runtime output of any step, run the workflow and read
result.steps[] from GET /history/run/{runId}.
Free-tier accounts cannot execute webhook or
send_email — those step types are gated to Entry+.
load_data
One step can load multiple sources in parallel. Each source becomes a context variable downstream. Pick exactly one source mode per entry: Data Storage file, DB connection + query, or SFTP connection + path.
Config
{
"sources": [
// Data Storage file
{ "fileId": "<data-storage file id>",
"outputVariable": "myData",
"maxRows": 10000 /* optional cap */ },
// DB connector (requires a connection of type postgres/mysql/etc.)
{ "connectionId": "<connection id>",
"query": "SELECT … FROM …",
"outputVariable": "myData",
"maxRows": 10000 },
// SFTP file (requires a connection of type sftp)
{ "connectionId": "<sftp connection id>",
"remotePath": "/exports/orders.csv",
"outputVariable": "myData",
"matchMode": "specific" /* or "pattern" with folderPath + pattern + aggregateMode */ }
]
}
Example
{ "id": "load", "type": "load_data", "name": "Load orders",
"config": { "sources": [
{ "fileId": "491ec56f-…", "outputVariable": "orders" }
] } }
Output: { totalRows, sourceCount, sources: [{ outputVariable, fileName, fileFormat, rowCount, columnCount, columns, preview, truncated }] }
Diagnostic step. Surfaces the resolved value of each listed variable path in the run's output. Doesn't transform data.
Config
{
"variables": ["orders", "steps.filter.output.filteredCount"]
}
Config
{
"leftSource": "orders",
"rightSource": "customers",
"leftKey": "customer_email",
"rightKey": "email",
"joinType": "left", // inner | left | right | outer (default: left)
"rightPrefix": "customer", // optional — prefix for right-side columns to avoid collisions
"flatten": false, // optional — flatten nested right-side object into top-level keys
"outputVariable": "joined"
}
select_columns
Each columns[] entry can be either a plain string (the original name is used as-is) or a {original, rename} object that aliases the column on the way through.
Config
{
"source": "orders",
"outputVariable": "trimmed",
"mode": "keep", // keep | drop
"columns": [
"order_id", // plain — keep / drop as-is
{ "original": "amount_usd", "rename": "amount" } // alias while keeping
]
}
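The two entry shapes can be reduced to (original, output name) pairs. The sketch below shows that normalization and applies it to rows for mode "keep" only; it is a local illustration of the config shape, not the platform's implementation, and how rename interacts with mode "drop" is not covered.

```python
def normalize_columns(columns: list) -> list:
    """Turn plain strings and {original, rename} objects into (original, output) pairs."""
    pairs = []
    for entry in columns:
        if isinstance(entry, str):
            pairs.append((entry, entry))
        else:
            pairs.append((entry["original"], entry.get("rename", entry["original"])))
    return pairs


def keep_columns(rows: list, columns: list) -> list:
    """Keep only the listed columns, applying any aliases (mode "keep")."""
    pairs = normalize_columns(columns)
    return [{output: row[original] for original, output in pairs} for row in rows]


columns = ["order_id", {"original": "amount_usd", "rename": "amount"}]
rows = [{"order_id": "ord-1001", "status": "paid", "amount_usd": "49.50"}]
print(normalize_columns(columns))
print(keep_columns(rows, columns))
```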
filter_rows
Config
{
"source": "orders",
"outputVariable": "filtered",
"logic": "and", // and | or (default: and)
"conditions": [
{ "field": "status", "operator": "==", "value": "paid" },
{ "field": "amount_usd", "operator": ">", "value": 50 }
]
}
Operators: ==, !=, >, <, >=, <=, contains, not_contains, is_null, is_not_null.
Empty strings count as null for the null checks.
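To reason about what a set of conditions will keep, a local approximation can be useful. The sketch below is not the platform's implementation. It assumes values that parse as numbers on both sides are compared numerically (CSV values arrive as strings) and that contains is a substring test; only the empty-string-as-null rule comes from the text above.

```python
import operator

_COMPARE = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
            "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def _is_null(value) -> bool:
    return value is None or value == ""  # empty strings count as null


def _as_number(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def matches(row: dict, cond: dict) -> bool:
    value, op, target = row.get(cond["field"]), cond["operator"], cond.get("value")
    if op == "is_null":
        return _is_null(value)
    if op == "is_not_null":
        return not _is_null(value)
    if op in ("contains", "not_contains"):
        found = (not _is_null(value)) and str(target) in str(value)
        return found if op == "contains" else not found
    left, right = _as_number(value), _as_number(target)
    if left is None or right is None:  # assumed fallback: compare as strings
        left, right = str(value), str(target)
    return _COMPARE[op](left, right)


def filter_rows(rows: list, conditions: list, logic: str = "and") -> list:
    combine = all if logic == "and" else any
    return [row for row in rows if combine(matches(row, c) for c in conditions)]


rows = [
    {"order_id": "ord-1001", "status": "paid", "amount_usd": "49.50"},
    {"order_id": "ord-1003", "status": "paid", "amount_usd": "128.75"},
    {"order_id": "ord-1005", "status": "paid", "amount_usd": "21.40"},
    {"order_id": "ord-1006", "status": "paid", "amount_usd": "87.25"},
]
conditions = [
    {"field": "status", "operator": "==", "value": "paid"},
    {"field": "amount_usd", "operator": ">", "value": 50},
]
kept = [row["order_id"] for row in filter_rows(rows, conditions)]
print(kept)
```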
Config
{
"source": "rows",
"outputVariable": "rowsSplit",
"column": "full_name",
"delimiter": " ", // any string; default is "|"
"newColumns": ["first_name", "last_name"], // required unless dynamicSplit=true
"dynamicSplit": false, // if true, auto-detect max parts → column_1, column_2, …
"removeOriginal": true // drop the source column from output rows (default: true)
}
Map one or more variables into the named keys you want returned. Useful when you want a tidy result object instead of dumping the entire context.
Config
{
"outputs": [
{ "name": "orders", "source": "paidOrders" },
{ "name": "summary", "source": "steps.filter.output" }
],
"dataOnly": false // optional — when true, the run response is just the data dict
}
export_file
The step itself doesn't write a file; it records metadata (delimiter, columns, row count). The actual file bytes are generated on demand by the download endpoint.
Config
{
"source": "paidOrders",
"delimiter": "comma", // comma | tab | semicolon | pipe
"filename": "paid-orders.csv" // free-form
}
File extension is auto-derived from the delimiter:
comma → .csv, tab → .tsv,
anything else → .txt. Download via
POST /api/workflow/{id}/history/{runId}/export/{stepId}
— see tutorial step 9.
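The extension rule above is small enough to state as a lookup. The function name is a hypothetical helper; the mapping itself mirrors the rule as written.

```python
def export_extension(delimiter: str) -> str:
    """Map an export_file delimiter name to the derived file extension."""
    return {"comma": ".csv", "tab": ".tsv"}.get(delimiter, ".txt")


for name in ("comma", "tab", "semicolon", "pipe"):
    print(name, export_extension(name))
```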
Two systems, two match modes. system: data_storage moves entries inside your Data Storage tree (recorded as a deferred intent that applies after the workflow succeeds). system: sftp moves remote files immediately during the step. Path templates support {{variable}} placeholders that resolve from context.
Config — Data Storage, specific file
{
"system": "data_storage", // default
"matchMode": "specific", // default
"sourcePath": "inbox/orders.csv",
"destPath": "processed/{{date}}"
}
Config — Data Storage, glob pattern
{
"system": "data_storage",
"matchMode": "pattern",
"sourceFolderPath": "inbox",
"sourcePattern": "orders-*.csv",
"destPath": "processed/{{date}}"
}
Config — SFTP
{
"system": "sftp",
"connectionId": "<sftp connection id>",
"sourcePath": "/inbox/orders.csv",
"destPath": "/processed/orders-{{date}}.csv"
}
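To preview what a path template in the configs above might resolve to, the {{variable}} substitution can be approximated locally. This is a sketch only: the date value and its format are made up for the example, and leaving unknown placeholders untouched is an assumption, since the behavior for missing variables is not described here.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_path(template: str, context: dict) -> str:
    """Replace {{name}} placeholders with values from context."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: unknown names are left as-is rather than raising.
        return str(context[name]) if name in context else match.group(0)
    return _PLACEHOLDER.sub(substitute, template)


context = {"date": "2026-04-30"}  # hypothetical value and format
print(render_path("processed/{{date}}", context))
print(render_path("/processed/orders-{{date}}.csv", context))
```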
webhook
The detailed shape, retry rules, and SSRF guards are covered in the Outbound webhook step section below.
Config
{ "reason": "All upstream rows already processed" }
Marks the run successful and skips remaining steps. Pairs naturally with conditional.
conditional
Config
{
"conditions": [
{ "variable": "steps.load.output.totalRows", "operator": "==", "value": 0 },
{ "variable": "orders", "operator": "has_rows", "value": null }
],
"logic": "and", // and | or (default: and)
"then_steps": [
/* step definitions to execute if conditions pass */
],
"else_steps": [
/* step definitions to execute otherwise (optional) */
]
}
Operators: ==, !=, >, <, >=, <=, contains, not_contains, has_rows, is_empty, exists, not_exists.
send_email
Config
{
"recipients": ["ops@acme.com"], // must be in your verified-recipients list (status=verified)
"subject": "Daily report — ${steps.summary.output.row_count} rows",
"body_mode": "inline_data", // plain | inline_data
"body_text": "…${path}…", // when body_mode=plain
"body_variable": "steps.summary.output.rows", // when body_mode=inline_data
"attachment": {
"kind": "variable", // none | variable | step
"source": "steps.transform.output.results",
"format": "csv", // csv | tsv | psv | ssv | json (variable kind only)
"filename": "report-${run_date}.csv",
"delivery": "auto", // auto | attach | link (auto flips to link at >5MB)
"requires_auth": false // link/auto kinds only
}
}
Recipients are managed via /api/email/recipients,
which includes a verification flow. Daily send caps depend on your tier
(see the Emails/day column of the rate limits table). The sender's
account email must be confirmed.
v1 supports PostgreSQL via the connector framework. Single-transaction semantics with batched commits.
Config
{
"source": "rows",
"connectionId": "<db-connection id>",
"targetTable": "fact_orders",
"targetSchema": "analytics", // optional
"mode": "append", // append | upsert
"keyColumns": ["order_id"], // required when mode=upsert
"batchSize": 5000, // optional (default 5000)
"maxRows": 5000000 // optional safety cap (default 5M)
}
Column names are taken from the source rows verbatim — there's
no per-column source→target mapping. Rename columns upstream
with select_columns if you need to.
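Two of the rules in this config can be checked before the definition is saved: keyColumns is required for upsert, and the row count together with batchSize determines how many batches a run will commit. The helpers below are hypothetical client-side sketches; whether the server reports the same error wording is not known.

```python
import math


def validate_db_write(config: dict) -> list:
    """Return problems with the mode/keyColumns combination."""
    errors = []
    mode = config.get("mode")
    if mode not in ("append", "upsert"):
        errors.append("mode must be 'append' or 'upsert'")
    if mode == "upsert" and not config.get("keyColumns"):
        errors.append("keyColumns is required when mode=upsert")
    return errors


def batch_count(row_count: int, batch_size: int = 5000) -> int:
    """Number of batches needed for row_count rows (default batch size from the config)."""
    return math.ceil(row_count / batch_size)


print(validate_db_write({"mode": "upsert", "targetTable": "fact_orders"}))
print(batch_count(12000))
```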
Atomic upload: payload is written to <remotePath>.tmp.<run_id>, then renamed to the final path on success. Consumers polling remotePath never see a partial file.
Config
{
"source": "rows",
"connectionId": "<sftp-connection id>",
"remotePath": "/exports/orders-${date}.csv",
"outputFormat": "csv" // csv | tsv | psv | ssv | json
}
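The write-then-rename pattern described for this step can be illustrated on a local filesystem. This is an analogue, not the SFTP code path: it uses os.replace for the rename, whereas a real upload would use the SFTP client's rename. The temp-name format follows the <remotePath>.tmp.<run_id> convention above.

```python
import os
import tempfile


def atomic_write(final_path: str, payload: bytes, run_id: str) -> str:
    """Write to a run-specific temp name, then rename onto the final path."""
    tmp_path = f"{final_path}.tmp.{run_id}"
    with open(tmp_path, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes are on disk before the rename
    os.replace(tmp_path, final_path)  # atomic when both paths are on one filesystem
    return tmp_path


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "orders.csv")
    tmp_used = atomic_write(target, b"order_id\nord-1001\n", "run_1777513854059")
    final_exists = os.path.exists(target)
    tmp_left_behind = os.path.exists(tmp_used)

print(final_exists, tmp_left_behind)
```

A reader polling the final path sees either no file or the complete file, never a half-written one.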
Outbound webhook step
The webhook step takes one of your variables, serializes
it to JSON, and POSTs the bytes to a URL you control. The body is
the variable's value as-is — there is no payload templating
layer; if you want a custom shape, build it upstream and feed that
variable in.
Config
{
"id": "notify",
"type": "webhook",
"name": "Notify ops API",
"config": {
"source": "summary", // path to the variable to POST
"url": "https://ops.acme.com/orkidata-events", // https only — http:// is rejected
"method": "POST", // POST | PUT
"headers": { "X-Source": "orkidata", "Authorization": "Bearer …" },
"outputVariable": "ops_response", // optional — captures {statusCode, responseSnippet}
"maxPayloadRows": 10000 // optional — caps row-count when source is a list (default 10,000)
}
}
The Free tier cannot use this step: webhook destinations require the Entry tier or higher (egress reputation hygiene).
Authentication on your receiver
Put any auth your receiver needs (Bearer tokens, basic auth, custom
keys) into the headers object. The platform doesn't add
its own signature header; what you send is what your receiver sees.
Headers matching authorization, api-key,
token, secret are redacted from execution
history so credentials don't leak through the run UI.
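If you log webhook configs on your own side, a similar redaction can be applied. The sketch below assumes "matching" means a case-insensitive substring match on the header name; the exact rule and the mask text used in execution history are not specified here, so both are illustrative.

```python
SENSITIVE_FRAGMENTS = ("authorization", "api-key", "token", "secret")


def redact_headers(headers: dict, mask: str = "[REDACTED]") -> dict:
    """Mask the value of any header whose name contains a sensitive fragment."""
    return {
        name: mask if any(frag in name.lower() for frag in SENSITIVE_FRAGMENTS) else value
        for name, value in headers.items()
    }


redacted = redact_headers({
    "X-Source": "orkidata",
    "Authorization": "Bearer abc123",
    "X-Api-Key": "k-456",
})
print(redacted)
```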
SSRF guard
URLs resolving to private CIDR ranges (10/8,
172.16/12, 192.168/16, 127/8,
169.254/16, IPv6 loopback / link-local / ULA) are rejected
before any request goes out. http:// URLs are also rejected
— HTTPS only.
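You can pre-check a destination against the same ranges before saving a webhook step. The sketch below uses Python's ipaddress module with the CIDR blocks listed above (written out in full, with ::1/128, fe80::/10, and fc00::/7 for IPv6 loopback, link-local, and ULA). Resolved addresses are passed in so the example runs offline. It approximates the guard and is not the platform's code; it does not handle cases such as IPv4-mapped IPv6 addresses.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8",
    "169.254.0.0/16", "::1/128", "fe80::/10", "fc00::/7",
)]


def is_blocked_ip(ip: str) -> bool:
    """True when the address falls in one of the private or loopback ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def check_webhook_url(url: str, resolved_ips: list) -> str:
    """Apply the HTTPS-only rule, then the address-range rule."""
    if urlsplit(url).scheme != "https":
        return "rejected: HTTPS only"
    if any(is_blocked_ip(ip) for ip in resolved_ips):
        return "rejected: private address"
    return "ok"


# 203.0.113.10 is a documentation-range address used as a stand-in for a public IP.
print(check_webhook_url("https://ops.acme.com/orkidata-events", ["203.0.113.10"]))
print(check_webhook_url("https://ops.acme.com/orkidata-events", ["192.168.1.5"]))
print(check_webhook_url("http://ops.acme.com/orkidata-events", ["203.0.113.10"]))
```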
Retry behavior
- Tier 1 (sync, ≤10 MB) — 1 attempt, no retries (the API Gateway 29 s budget can't accommodate them).
- Tier 2/3 (async worker) — up to 3 attempts with 1 s / 2 s / 4 s exponential backoff on connection errors and 5xx responses.
- 4xx responses — terminal failure, no retry. The step fails immediately.
The per-attempt timeout is 30 seconds. Final-attempt failure fails the workflow step; the response status code and body snippet land in execution history (and in outputVariable if you set one).
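The retry rules above can be read as a small decision function, which is handy when sizing a receiver's timeouts. The sketch below is an interpretation, not the platform's code. It assumes delays are applied between attempts, so with three attempts only the 1 s and 2 s delays are used; when the 4 s delay applies is not clear from the text above.

```python
BACKOFF_SECONDS = (1, 2, 4)


def next_delay(async_run: bool, attempt: int, status, max_attempts: int = 3):
    """Return seconds to wait before the next attempt, or None to stop.

    attempt is 1-based; status is the HTTP code, or None for a connection error.
    """
    if not async_run:
        return None  # Tier 1 (sync): single attempt, no retries
    if status is not None and status < 500:
        return None  # success or 4xx: terminal either way
    if attempt >= max_attempts:
        return None  # final attempt failed, the step fails
    return BACKOFF_SECONDS[attempt - 1]


print(next_delay(True, 1, 503))   # first failure on an async run
print(next_delay(True, 2, None))  # connection error on the second attempt
print(next_delay(True, 3, 503))   # out of attempts
```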
API reference (interactive)
Every public endpoint above (plus the rest of the surface) lives in a dedicated full-width interactive reference — try requests inline, see exact request/response schemas, copy generated curl commands.