Automate data transformations from your terminal.

Everything you can do programmatically with your Orkidata account is here: create workflows, define their step graph, ingest data, run synchronously or asynchronously, browse execution history, download exports, and inspect every step's output — all from curl.


Quickstart

Two steps to your first authenticated API call. This page assumes you already have an Orkidata account — sign-up happens through the web UI, not the API.

1. Mint a Bearer token

Tokens are minted from Profile → API tokens in the web UI. Click Generate, copy the token (it's shown once — store it somewhere safe), and use it as your Authorization header for every API request.

Each account holds one active token at a time, and generating a new token revokes the previous one immediately. When you rotate, update every system that uses the old token right away: requests made with it return 401 from that point on.

2. First authenticated request

List the workflows in your account:

curl -H "Authorization: Bearer $ORKIDATA_TOKEN" \
  https://orkidata.com/api/workflows
Response — empty account
{
  "confirmEmpty": true,
  "items": []
}
Response — populated account
{
  "items": [
    {
      "id":            "id_1777515532786_jwshcxvic",
      "name":          "Daily sales report",
      "type":          "workflow",
      "createdAt":     "2026-04-30T02:18:52Z",
      "showInSidebar": false
    },
    {
      "id":   "fold_customer_pipelines",
      "name": "Customer pipelines",
      "type": "folder"
    }
  ]
}

The tree may also include a top-level confirmEmpty field — that's a write-time safety flag (see tutorial step 3) the server persists from your last save; it's not meaningful in read responses.

From here, jump to the tutorial for an end-to-end walkthrough that creates a workflow, runs it, and downloads its export — entirely via curl.

Authentication

The public API uses Bearer-token authentication only. Every request must carry an Authorization: Bearer <token> header.

Token lifecycle

Operation | Where | Notes
Mint | Profile → API tokens (UI) or POST /api/auth/token | The plaintext token is shown ONCE. Store it in a secret manager.
Revoke | Profile → API tokens (UI) or DELETE /api/auth/token | Revocation is immediate. The next request with that token returns 401.
Rotate | Mint a new one | Re-minting auto-revokes the previous active token.

Each request counts toward your account's RPM quota — see the rate limits table.

Rate limits & tiers

Every account has per-minute, per-hour, and concurrency budgets sized to its tier. GET /api/usage returns your current consumption and the active limits at any time.

Tier | RPM | Exec/hr | Concurrent | Max data | Max exec sec | Emails/day
Free | 30 | 10 | 1 | 10 MB | 60 | 0
Entry | 120 | 60 | 5 | 500 MB | 300 | 500
Pro | 600 | 300 | 20 | 5 GB | 870 | 5,000

Inspect your current usage

curl -H "Authorization: Bearer $ORKIDATA_TOKEN" \
  https://orkidata.com/api/usage
Response (free tier, idle)
{
  "tier": "free",
  "current": {
    "concurrent_active": 0,
    "exec_hour_used": 0,
    "rpm_used": 2
  },
  "limits": {
    "rpm": 30,
    "exec_per_hour": 10,
    "concurrent": 1,
    "max_data_mb": 10,
    "max_exec_seconds": 60,
    "max_emails_per_day": 0
  }
}

Error responses

Every non-2xx response is JSON with at minimum an error field:

{ "error": "Human-readable description", "retry_after": 60 }

The retry_after field appears on rate-limit responses (429).

Status | Meaning
400 | Validation error in the request body or query.
401 | Missing, invalid, or expired Bearer token.
403 | Authenticated, but not authorized for this resource.
404 | Resource doesn't exist (or you can't see it; the same code is returned on purpose).
409 | Conflict, for example re-adding a previously unsubscribed email recipient.
429 | Rate limit hit. retry_after indicates seconds until next attempt.
500 | Unhandled server error. Please report at support@orkidata.com.
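
The 429 row can be handled mechanically. The following is a minimal sketch, not an official client: it retries a GET up to three times, sleeping for the retry_after value from the error body. The helper names api_get and retry_after_of, the three-attempt cap, and the 60-second fallback when retry_after is missing are all assumptions made for illustration.

```shell
# Sketch only: api_get and retry_after_of are hypothetical helper names.
# Assumes curl is installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# Print the integer retry_after from an error body.
# Falls back to 60 when the field is absent (the fallback is an assumption).
retry_after_of() {
  local secs
  secs=$(printf '%s\n' "$1" |
    sed -n 's/.*"retry_after"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
    head -n 1)
  printf '%s\n' "${secs:-60}"
}

# GET a path; retry on 429, print the body on 2xx, fail on anything else.
api_get() {
  local path="$1" attempt=1 out code body
  while [ "$attempt" -le 3 ]; do
    # -w appends the HTTP status code on its own final line.
    out=$(curl -s -H "$H" -w '\n%{http_code}' "$BASE$path") || return 1
    code=$(printf '%s\n' "$out" | tail -n 1)
    body=$(printf '%s\n' "$out" | sed '$d')
    case "$code" in
      2??) printf '%s\n' "$body"; return 0 ;;
      429) sleep "$(retry_after_of "$body")" ;;
      *)   printf 'HTTP %s: %s\n' "$code" "$body" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  return 1
}
```

Usage: api_get /api/workflows prints the tree on success and writes the status code plus the error body to stderr otherwise.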

Example: missing token

curl -i https://orkidata.com/api/workflows
Response
HTTP/1.1 401 Unauthorized
Content-Type: application/json

{"error":"Authentication required"}

Tutorial: build & introspect a complete pipeline via API

A nine-step end-to-end walkthrough — build a workflow, run it, pull per-step results, download the export, and sample everything else your account exposes. Every command and response below was captured from a real Free-tier account on https://dev.orkidata.com.

Set your token once for the rest of the tutorial:

export ORKIDATA_TOKEN='etl_…your token…'
export H="Authorization: Bearer $ORKIDATA_TOKEN"
export BASE='https://orkidata.com'

1. List what's in your account

curl -s -H "$H" "$BASE/api/workflows"
Response — fresh account
{ "items": [] }

A populated tree returns {"items":[…]} with each workflow / folder entry. The response may also carry a confirmEmpty flag persisted from your last write — see step 3 for what it does on the write side.

2. Upload a tiny CSV (data for the workflow)

Uploads go through a presigned URL, which bypasses the API Gateway 10 MB body limit: request the URL, PUT the file to it, then confirm the upload. First, ask for the URL:

curl -s -H "$H" \
  "$BASE/api/data-storage/upload-url?filename=orders.csv&contentType=text/csv"
Response
{
  "bucket":     "etl-platform-prod-…",
  "expiresIn":  300,
  "key":        "data-storage/<user-id>/1777513758_orders.csv",
  "uploadUrl":  "https://…s3.amazonaws.com/…?AWSAccessKeyId=…&Signature=…&Expires=…"
}

Then PUT the file bytes to that signed URL, and confirm the upload:

curl -X PUT -H "Content-Type: text/csv" \
  --data-binary @orders.csv \
  "<uploadUrl from above>"

curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
  -d '{"name":"orders.csv","s3Key":"data-storage/<user-id>/…orders.csv","format":"csv","mimeType":"text/csv"}' \
  "$BASE/api/data-storage/confirm-upload"
Response
{
  "status": "success",
  "file": {
    "id":          "491ec56f-0e2d-41b4-8f8c-db530fdaeea8",
    "type":        "file",
    "name":        "orders.csv",
    "format":      "csv",
    "mimeType":    "text/csv",
    "rowCount":    10,
    "columnCount": 5,
    "s3Key":       "data-storage/<user-id>/…orders.csv",
    "uploadedAt":  "2026-04-30T01:49:54Z"
  }
}

Hold on to file.id — the workflow's load_data step references it.
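
The three calls above can be chained into one function. This is a sketch under a few assumptions: jq is installed, the key returned by upload-url is the value confirm-upload expects as s3Key (the two look the same in the captured responses), and the file is a CSV. The names upload_csv and confirm_body are hypothetical.

```shell
# Sketch only: upload_csv and confirm_body are hypothetical helper names.
# Assumes curl and jq are installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# Build the confirm-upload body for a CSV. The name and key must not contain
# quotes or backslashes, since printf does no JSON escaping.
confirm_body() {
  printf '{"name":"%s","s3Key":"%s","format":"csv","mimeType":"text/csv"}' "$1" "$2"
}

upload_csv() {
  local file="$1" name grant url key
  name=$(basename "$file")
  # 1. Ask for a presigned URL; -G sends the url-encoded pairs as a query string.
  grant=$(curl -sf -G -H "$H" \
    --data-urlencode "filename=$name" \
    --data-urlencode "contentType=text/csv" \
    "$BASE/api/data-storage/upload-url") || return 1
  url=$(printf '%s' "$grant" | jq -r .uploadUrl)
  key=$(printf '%s' "$grant" | jq -r .key)
  # 2. PUT the bytes to the signed URL (no Bearer header, as in the tutorial command).
  curl -sf -X PUT -H 'Content-Type: text/csv' --data-binary "@$file" "$url" || return 1
  # 3. Confirm the upload and print the new file id for the load_data step.
  curl -sf -X POST -H "$H" -H 'Content-Type: application/json' \
    -d "$(confirm_body "$name" "$key")" \
    "$BASE/api/data-storage/confirm-upload" | jq -r .file.id
}
```

Usage: FILE_ID=$(upload_csv orders.csv) leaves the id in a variable for step 4. The signed URL expires after expiresIn seconds (300 in the response above), so the PUT should follow the first call promptly.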

3. Add the workflow to your tree

POST /api/workflows replaces the entire tree. Always GET first, append your new entry, then POST the full result. Sending a partial tree deletes every entry it leaves out. Only the fully empty tree is guarded (it requires confirmEmpty: true); partial trees are accepted without any check.

curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
  -d '{"items":[{"id":"docs-tutorial","name":"docs-tutorial","type":"workflow","parentId":null}]}' \
  "$BASE/api/workflows"
Response
{ "success": true }
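
Because a partial tree deletes whatever it omits, the GET, append, POST sequence is worth scripting. The sketch below is one way to do it, assuming jq is installed; add_workflow and new_workflow_item are hypothetical names, and the script keeps every other top-level field of the tree unchanged, which is an assumption about what the server accepts. It does not check for duplicate ids.

```shell
# Sketch only: add_workflow and new_workflow_item are hypothetical helper names.
# Assumes curl and jq are installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# Build one workflow entry. The id and name must not contain quotes or
# backslashes, since printf does no JSON escaping.
new_workflow_item() {
  printf '{"id":"%s","name":"%s","type":"workflow","parentId":null}' "$1" "$2"
}

# GET the current tree, append the new entry, POST the whole tree back.
add_workflow() {
  local item tree next
  item=$(new_workflow_item "$1" "$2")
  tree=$(curl -sf -H "$H" "$BASE/api/workflows") || return 1
  # Stop before writing if the tree could not be parsed, so a bad read
  # can never turn into a destructive write.
  next=$(printf '%s' "$tree" |
    jq -c --argjson item "$item" '.items += [$item]') || return 1
  curl -sf -X POST -H "$H" -H 'Content-Type: application/json' \
    -d "$next" "$BASE/api/workflows"
}
```

Usage: add_workflow docs-tutorial docs-tutorial reproduces the request above on a fresh account and appends safely on a populated one.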

4. Save the step graph

Three steps — load_data reads the file you just uploaded, filter_rows keeps the paid orders, and export_file writes the result to CSV.

curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
  -d '{
    "steps": [
      {
        "id":   "load",
        "type": "load_data",
        "name": "Load orders",
        "config": {
          "sources": [
            { "fileId": "491ec56f-…", "outputVariable": "orders" }
          ]
        }
      },
      {
        "id":   "filter",
        "type": "filter_rows",
        "name": "Paid only",
        "config": {
          "source":         "orders",
          "outputVariable": "paidOrders",
          "conditions": [
            { "field": "status", "operator": "==", "value": "paid" }
          ]
        }
      },
      {
        "id":   "export",
        "type": "export_file",
        "name": "Export paid orders",
        "config": {
          "source":    "paidOrders",
          "delimiter": "comma",
          "filename":  "paid-orders.csv"
        }
      }
    ]
  }' \
  "$BASE/api/workflow/docs-tutorial/definition"
Response
{ "success": true }

5. Inspect the saved definition

The same definition you just stored, now read back:

curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/definition"
Response
{
  "_permission": "owner",
  "steps": [
    { "id": "load",   "type": "load_data",    "name": "Load orders",
      "config": { "sources": [ { "fileId": "491ec56f-…", "outputVariable": "orders" } ] } },
    { "id": "filter", "type": "filter_rows",  "name": "Paid only",
      "config": { "source": "orders", "outputVariable": "paidOrders",
                  "conditions": [ { "field": "status", "operator": "==", "value": "paid" } ] } },
    { "id": "export", "type": "export_file",  "name": "Export paid orders",
      "config": { "source": "paidOrders", "delimiter": "comma", "filename": "paid-orders.csv" } }
  ]
}

6. Run it

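Execution tier 1 (≤10 MB estimated peak) returns the full result inline. Tier 2 or 3 returns {"async": true, "runId": "…"}; poll /api/execution/{runId}/status until the status is terminal (completed, failed, timeout, or cancelled). See step 8 below. Note that these execution tiers describe the size of a run and are not the Free/Entry/Pro account tiers.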

curl -s -X POST -H "$H" -H 'Content-Type: application/json' \
  -d '{"input":{},"source":"api"}' \
  "$BASE/api/workflow/docs-tutorial/execute"
Response (Tier 1, sync)
{
  "workflow_id":      "docs-tutorial",
  "success":          true,
  "started_at":       "2026-04-30T01:50:54Z",
  "completed_at":     "2026-04-30T01:50:54Z",
  "duration_seconds": 0.000109,
  "step_count":       3,
  "steps": [
    {
      "step_id":   "load", "step_type": "load_data", "success": true,
      "message":   "Loaded 10 row(s) from orders.csv into 'orders'",
      "output": {
        "totalRows":   10,
        "sources": [
          { "fileName": "orders.csv", "rowCount": 10, "columnCount": 5,
            "columns":   ["order_id","customer_email","status","amount_usd","created_at"],
            "preview":   [/* first 3 rows */] }
        ]
      }
    },
    {
      "step_id":   "filter", "step_type": "filter_rows", "success": true,
      "message":   "Filtered 10 → 6 rows (4 removed)",
      "output": {
        "originalCount": 10, "filteredCount": 6, "removedCount": 4,
        "logic":         "AND",
        "conditions":    ["status = paid"],
        "preview":       [/* first 3 matching rows */]
      }
    },
    {
      "step_id":   "export", "step_type": "export_file", "success": true,
      "message":   "Export ready: 6 rows, 5 columns (comma)",
      "output": {
        "filename":    "paid-orders.csv",
        "delimiter":   "comma",
        "rowCount":    6,
        "columnCount": 5,
        "exportReady": true
      }
    }
  ]
}

7. Inspect run history (per-step results)

List recent runs (use ?slim=true for a lightweight payload — exports/HTML are stripped):

curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/history?slim=true"
Response (slim)
{
  "runs": [
    {
      "id":             "run_1777513854059",
      "runId":          "run_1777513854059",
      "status":         "success",
      "apiTriggered":   true,
      "startedAt":      "2026-04-30T01:50:54Z",
      "completedAt":    "2026-04-30T01:50:54Z",
      "duration":       0.000109,
      "stepCount":      3,
      "hasExports":     true,
      "exportSummary": [
        { "stepId": "export", "name": "Export paid orders", "rows": 6, "delimiter": "comma" }
      ],
      "dataMetrics": {
        "totalInputRows":  10,
        "totalOutputRows": 6,
        "emptyResult":     false
      }
    }
  ]
}

Pull the full per-step detail for a specific run:

curl -s -H "$H" "$BASE/api/workflow/docs-tutorial/history/run/run_1777513854059"

Returns the same top-level fields as the slim list plus result.steps[] — each step's full output payload, duration, errors, and the stepsSnapshot (the exact config the run executed against).

8. Async variant (for larger data)

When a run's estimated memory peak exceeds 10 MB, execute returns immediately with an async-dispatch response:

{ "async": true, "runId": "async_id_17772_a0c5ffd8", "tier": 2 }

Poll until terminal:

RUN_ID='async_id_17772_a0c5ffd8'
while true; do
  STATUS=$(curl -s -H "$H" "$BASE/api/execution/$RUN_ID/status" | jq -r .status)
  case "$STATUS" in
    completed|failed|timeout|cancelled) break ;;
  esac
  sleep 3
done
echo "Final: $STATUS"

Once completed, fetch the full result via the same /history/run/{runId} endpoint as step 7.
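
Steps 6 and 8 can be combined so a caller does not need to know in advance whether a run will be sync or async. The sketch below assumes jq is installed; run_workflow and is_terminal are hypothetical names, and the 300-try cap (about 15 minutes at 3 seconds per poll) is an arbitrary safety limit rather than a documented value.

```shell
# Sketch only: run_workflow and is_terminal are hypothetical helper names.
# Assumes curl and jq are installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# Succeed when a status is one of the terminal states used in the loop above.
is_terminal() {
  case "$1" in
    completed|failed|timeout|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Execute a workflow. Sync runs print the inline result; async runs are
# polled and print "<runId> <final status>".
run_workflow() {
  local wf="$1" resp run_id status tries=0
  resp=$(curl -sf -X POST -H "$H" -H 'Content-Type: application/json' \
    -d '{"input":{},"source":"api"}' "$BASE/api/workflow/$wf/execute") || return 1
  if [ "$(printf '%s' "$resp" | jq -r '.async // false')" != "true" ]; then
    printf '%s\n' "$resp"
    return 0
  fi
  run_id=$(printf '%s' "$resp" | jq -r .runId)
  while [ "$tries" -lt 300 ]; do
    status=$(curl -s -H "$H" "$BASE/api/execution/$run_id/status" | jq -r .status)
    if is_terminal "$status"; then
      printf '%s %s\n' "$run_id" "$status"
      if [ "$status" = "completed" ]; then return 0; else return 1; fi
    fi
    tries=$((tries + 1))
    sleep 3
  done
  return 1
}
```

Usage: run_workflow docs-tutorial. The function returns non-zero for failed, timeout, and cancelled runs, and also when the poll cap is reached.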

9. Download the export

curl -X POST -H "$H" \
  -o paid-orders.csv \
  "$BASE/api/workflow/docs-tutorial/history/run_1777513854059/export/export"

Tier-1 (sync) runs return the CSV bytes directly with Content-Type: text/csv. Tier-2/3 (async) runs return a JSON body instead, {"downloadUrl": "https://s3…","filename": "…"}; download the file from downloadUrl with a second request (for example, curl -o). The presigned URL is valid for 5 minutes.

paid-orders.csv
order_id,customer_email,status,amount_usd,created_at
ord-1001,alice@example.com,paid,49.50,2026-04-21
ord-1003,carol@example.com,paid,128.75,2026-04-22
ord-1005,eve@example.com,paid,21.40,2026-04-24
ord-1006,frank@example.com,paid,87.25,2026-04-25
ord-1008,heidi@example.com,paid,9.99,2026-04-27
ord-1009,ivan@example.com,paid,210.00,2026-04-28
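
Since the download endpoint answers with either file bytes or a JSON pointer, a script has to look at the response before saving it. The sketch below branches on the Content-Type header. It assumes jq is installed and that the async variant is served as application/json, which the text above does not state explicitly; download_export and is_json_type are hypothetical names.

```shell
# Sketch only: download_export and is_json_type are hypothetical helper names.
# Assumes curl and jq are installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# Succeed when a Content-Type value denotes JSON (parameters are ignored).
is_json_type() {
  case "$1" in
    application/json*) return 0 ;;
    *) return 1 ;;
  esac
}

# download_export <workflow id> <run id> <step id> <output file>
download_export() {
  local wf="$1" run="$2" step="$3" out="$4" tmp ctype url
  tmp=$(mktemp) || return 1
  # -o saves the body to $tmp; -w prints the response Content-Type to stdout.
  ctype=$(curl -sf -X POST -H "$H" -o "$tmp" -w '%{content_type}' \
    "$BASE/api/workflow/$wf/history/$run/export/$step") || { rm -f "$tmp"; return 1; }
  if is_json_type "$ctype"; then
    # Async run: the body holds a presigned URL (valid for 5 minutes).
    url=$(jq -r .downloadUrl "$tmp")
    rm -f "$tmp"
    curl -sf -o "$out" "$url"
  else
    # Sync run: the body is the file itself.
    mv "$tmp" "$out"
  fi
}
```

Usage: download_export docs-tutorial run_1777513854059 export paid-orders.csv.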

You can see your entire account from curl

The endpoints used above are a subset. The same Bearer token also reads:

What | Endpoint
Current usage & limits | GET /api/usage
Day-by-day usage history | GET /api/usage/history
Subscription state | GET /api/billing/subscription
Plan catalog | GET /api/billing/plans
Data Storage tree | GET /api/data-storage
Connections list | GET /api/connections
Dashboards | GET /api/dashboards
Verified email recipients | GET /api/email/recipients
Per-workflow usage breakdown | GET /api/workflow/{id}/usage

Every one of those is a plain curl -H "$H" away. See the API reference below for the full list with request/response schemas.
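
As a quick check that a token can reach each of these, the sketch below requests every account-wide endpoint from the table and prints only the status code. The per-workflow usage endpoint is left out because it needs an id. The names read_endpoints and survey_account are hypothetical, and each call counts toward the RPM quota (eight requests per survey).

```shell
# Sketch only: read_endpoints and survey_account are hypothetical helper names.
# Assumes curl is installed and ORKIDATA_TOKEN is exported.
BASE="${BASE:-https://orkidata.com}"
H="Authorization: Bearer ${ORKIDATA_TOKEN:-}"

# The account-wide read endpoints from the table above, one per line.
read_endpoints() {
  printf '%s\n' /api/usage /api/usage/history /api/billing/subscription \
    /api/billing/plans /api/data-storage /api/connections /api/dashboards \
    /api/email/recipients
}

# Print "<status> <path>" for each endpoint, discarding the response bodies.
survey_account() {
  read_endpoints | while read -r path; do
    curl -s -o /dev/null -H "$H" -w "%{http_code} $path\n" "$BASE$path"
  done
}
```

Usage: survey_account prints one line per endpoint; a 401 on every line points at the token rather than at any single endpoint.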

Step types reference

Each card shows the step's purpose, config schema, and a minimal create example: the JSON you'd embed in steps[] when saving a workflow definition (see tutorial step 4). To see the runtime output of any step, run the workflow and read result.steps[] from GET /api/workflow/{id}/history/run/{runId}.

Free-tier accounts cannot execute webhook or send_email — those step types are gated to Entry+.

load_data

One step can load multiple sources in parallel. Each source becomes a context variable downstream. Pick exactly one source mode per entry: Data Storage file, DB connection + query, or SFTP connection + path.

Config

{
  "sources": [
    // Data Storage file
    { "fileId":         "<data-storage file id>",
      "outputVariable": "myData",
      "maxRows":        10000  /* optional cap */ },

    // DB connector (requires a connection of type postgres/mysql/etc.)
    { "connectionId":   "<connection id>",
      "query":          "SELECT … FROM …",
      "outputVariable": "myData",
      "maxRows":        10000 },

    // SFTP file (requires a connection of type sftp)
    { "connectionId":   "<sftp connection id>",
      "remotePath":     "/exports/orders.csv",
      "outputVariable": "myData",
      "matchMode":      "specific"  /* or "pattern" with folderPath + pattern + aggregateMode */ }
  ]
}

Example

{ "id": "load", "type": "load_data", "name": "Load orders",
  "config": { "sources": [
    { "fileId": "491ec56f-…", "outputVariable": "orders" }
  ] } }

Output: { totalRows, sourceCount, sources: [{ outputVariable, fileName, fileFormat, rowCount, columnCount, columns, preview, truncated }] }

Diagnostic step. Surfaces the resolved value of each listed variable path in the run's output. Doesn't transform data.

Config

{
  "variables": ["orders", "steps.filter.output.filteredCount"]
}

Config

{
  "leftSource":     "orders",
  "rightSource":    "customers",
  "leftKey":        "customer_email",
  "rightKey":       "email",
  "joinType":       "left",            // inner | left | right | outer  (default: left)
  "rightPrefix":    "customer",        // optional — prefix for right-side columns to avoid collisions
  "flatten":        false,             // optional — flatten nested right-side object into top-level keys
  "outputVariable": "joined"
}

select_columns

Each columns[] entry can be either a plain string (use the original name as-is) or a {original, rename} object to alias on the way through.

Config

{
  "source":         "orders",
  "outputVariable": "trimmed",
  "mode":           "keep",                                  // keep | drop
  "columns": [
    "order_id",                                              // plain — keep / drop as-is
    { "original": "amount_usd", "rename": "amount" }         // alias while keeping
  ]
}

filter_rows

Config

{
  "source":         "orders",
  "outputVariable": "filtered",
  "logic":          "and",            // and | or  (default: and)
  "conditions": [
    { "field": "status",     "operator": "==",  "value": "paid" },
    { "field": "amount_usd", "operator": ">",   "value": 50 }
  ]
}

Operators: ==, !=, >, <, >=, <=, contains, not_contains, is_null, is_not_null. Empty strings count as null for the null checks.

Config

{
  "source":         "rows",
  "outputVariable": "rowsSplit",
  "column":         "full_name",
  "delimiter":      " ",                       // any string; default is "|"
  "newColumns":     ["first_name", "last_name"],   // required unless dynamicSplit=true
  "dynamicSplit":   false,                     // if true, auto-detect max parts → column_1, column_2, …
  "removeOriginal": true                       // drop the source column from output rows (default: true)
}

Map one or more variables into the named keys you want returned. Useful when you want a tidy result object instead of dumping the entire context.

Config

{
  "outputs": [
    { "name": "orders",  "source": "paidOrders" },
    { "name": "summary", "source": "steps.filter.output" }
  ],
  "dataOnly": false  // optional — when true, the run response is just the data dict
}

export_file

The step itself doesn't write a file; it records metadata (delimiter, columns, row count). The actual file bytes are generated on demand by the download endpoint.

Config

{
  "source":    "paidOrders",
  "delimiter": "comma",                          // comma | tab | semicolon | pipe
  "filename":  "paid-orders.csv"                 // free-form
}

File extension is auto-derived from the delimiter: comma → .csv, tab → .tsv, anything else → .txt. Download via POST /api/workflow/{id}/history/{runId}/export/{stepId} — see tutorial step 9.
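
The extension rule is small enough to restate as code, which can help when a script needs to predict the name of a downloaded file. This sketch only mirrors the mapping described above; export_ext is a hypothetical name, and how the derived extension interacts with an extension already present in filename is not covered here.

```shell
# Sketch only: export_ext is a hypothetical helper that mirrors the documented
# delimiter-to-extension rule. It is not part of the API.
export_ext() {
  case "$1" in
    comma) echo .csv ;;
    tab)   echo .tsv ;;
    *)     echo .txt ;;   # semicolon, pipe, and anything else
  esac
}
```

Usage: export_ext tab prints .tsv, and export_ext pipe prints .txt.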

Two systems, two match modes. system: data_storage moves entries inside your Data Storage tree (recorded as a deferred intent that applies after the workflow succeeds). system: sftp moves remote files immediately during the step. Path templates support {{variable}} placeholders that resolve from context.

Config — Data Storage, specific file

{
  "system":     "data_storage",     // default
  "matchMode":  "specific",         // default
  "sourcePath": "inbox/orders.csv",
  "destPath":   "processed/{{date}}"
}

Config — Data Storage, glob pattern

{
  "system":           "data_storage",
  "matchMode":        "pattern",
  "sourceFolderPath": "inbox",
  "sourcePattern":    "orders-*.csv",
  "destPath":         "processed/{{date}}"
}

Config — SFTP

{
  "system":       "sftp",
  "connectionId": "<sftp connection id>",
  "sourcePath":   "/inbox/orders.csv",
  "destPath":     "/processed/orders-{{date}}.csv"
}

webhook

Detailed shape, retry rules, and SSRF guards are covered in Outbound webhook step below.

Config

{ "reason": "All upstream rows already processed" }

Marks the run successful and skips remaining steps. Pairs naturally with conditional.

conditional

Config

{
  "conditions": [
    { "variable": "steps.load.output.totalRows", "operator": "==",        "value": 0 },
    { "variable": "orders",                       "operator": "has_rows",  "value": null }
  ],
  "logic":      "and",   // and | or  (default: and)
  "then_steps": [
    /* step definitions to execute if conditions pass */
  ],
  "else_steps": [
    /* step definitions to execute otherwise (optional) */
  ]
}

Operators: ==, !=, >, <, >=, <=, contains, not_contains, has_rows, is_empty, exists, not_exists.

send_email

Config

{
  "recipients":    ["ops@acme.com"],     // must be in your verified-recipients list (status=verified)
  "subject":       "Daily report — ${steps.summary.output.row_count} rows",
  "body_mode":     "inline_data",         // plain | inline_data
  "body_text":     "…${path}…",            // when body_mode=plain
  "body_variable": "steps.summary.output.rows",  // when body_mode=inline_data
  "attachment": {
    "kind":          "variable",          // none | variable | step
    "source":        "steps.transform.output.results",
    "format":        "csv",                // csv | tsv | psv | ssv | json  (variable kind only)
    "filename":      "report-${run_date}.csv",
    "delivery":      "auto",               // auto | attach | link  (auto flips to link at >5MB)
    "requires_auth": false                 // link/auto kinds only
  }
}

Recipients are managed via /api/email/recipients with a verification flow. Daily caps per tier — see the rate limits table. Sender's account email must be confirmed.

v1 supports PostgreSQL via the connector framework. Single-transaction semantics with batched commits.

Config

{
  "source":       "rows",
  "connectionId": "<db-connection id>",
  "targetTable":  "fact_orders",
  "targetSchema": "analytics",       // optional
  "mode":         "append",          // append | upsert
  "keyColumns":   ["order_id"],       // required when mode=upsert
  "batchSize":    5000,               // optional (default 5000)
  "maxRows":      5000000             // optional safety cap (default 5M)
}

Column names are taken from the source rows verbatim — there's no per-column source→target mapping. Rename columns upstream with select_columns if you need to.

Atomic upload: payload is written to <remotePath>.tmp.<run_id>, then renamed to the final path on success. Consumers polling remotePath never see a partial file.

Config

{
  "source":       "rows",
  "connectionId": "<sftp-connection id>",
  "remotePath":   "/exports/orders-${date}.csv",
  "outputFormat": "csv"             // csv | tsv | psv | ssv | json
}

Outbound webhook step

The webhook step takes one of your variables, serializes it to JSON, and POSTs the bytes to a URL you control. The body is the variable's value as-is — there is no payload templating layer; if you want a custom shape, build it upstream and feed that variable in.

Config

{
  "id":   "notify",
  "type": "webhook",
  "name": "Notify ops API",
  "config": {
    "source":          "summary",                              // path to the variable to POST
    "url":             "https://ops.acme.com/orkidata-events",  // https only — http:// is rejected
    "method":          "POST",                                  // POST | PUT
    "headers":         { "X-Source": "orkidata", "Authorization": "Bearer …" },
    "outputVariable":  "ops_response",                          // optional — captures {statusCode, responseSnippet}
    "maxPayloadRows":  10000                                    // optional — caps row-count when source is a list (default 10,000)
  }
}

Free tier is gated out — webhook destinations require Entry tier or higher (egress reputation hygiene).

Authentication on your receiver

Put any auth your receiver needs (Bearer tokens, basic auth, custom keys) into the headers object. The platform doesn't add its own signature header; what you send is what your receiver sees. Headers matching authorization, api-key, token, secret are redacted from execution history so credentials don't leak through the run UI.

SSRF guard

URLs resolving to private CIDR ranges (10/8, 172.16/12, 192.168/16, 127/8, 169.254/16, IPv6 loopback / link-local / ULA) are rejected before any request goes out. http:// URLs are also rejected — HTTPS only.
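
A rough client-side pre-check can catch the obvious rejections before a definition is saved. The sketch below is not the platform's implementation: it only inspects the scheme and, when the host is a dotted IPv4 literal, the listed IPv4 ranges. Hostnames are not resolved and IPv6 literals are not examined, so the server-side guard can still reject a URL that passes here. The name url_looks_allowed is hypothetical.

```shell
# Sketch only: url_looks_allowed is a hypothetical pre-check, not the
# server-side guard. Returns 0 when the URL might be accepted, 1 when it
# would clearly be rejected.
url_looks_allowed() {
  local url="$1" host a b
  case "$url" in
    https://*) ;;
    *) return 1 ;;             # http:// and other schemes are rejected
  esac
  host=${url#https://}
  host=${host%%/*}             # drop the path
  host=${host%%:*}             # drop an optional :port
  case "$host" in
    *[!0-9.]*) return 0 ;;     # not an IPv4 literal: leave it to the server
  esac
  a=${host%%.*}                # first octet
  b=${host#*.}; b=${b%%.*}     # second octet
  case "$a" in
    10|127) return 1 ;;                                       # 10/8, 127/8
    169) [ "$b" = 254 ] && return 1 ;;                        # 169.254/16
    192) [ "$b" = 168 ] && return 1 ;;                        # 192.168/16
    172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 1 ;;    # 172.16/12
  esac
  return 0
}
```

Usage: url_looks_allowed "https://10.1.2.3/hook" || echo "would be rejected".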

Retry behavior

  • Tier 1 (sync, ≤10 MB) — 1 attempt, no retries (the API Gateway 29 s budget can't accommodate them).
  • Tier 2/3 (async worker) — up to 3 attempts with 1 s / 2 s / 4 s exponential backoff on connection errors and 5xx responses.
  • 4xx responses — terminal failure, no retry. The step fails immediately.

The per-attempt timeout is 30 seconds. Final-attempt failure fails the workflow step; the response status code and body snippet land in execution history (and in outputVariable if you set one).

API reference (interactive)

Every public endpoint above (plus the rest of the surface) lives in a dedicated full-width interactive reference — try requests inline, see exact request/response schemas, copy generated curl commands.
