Uploading Data: Overview

Alphacast stores tabular data in datasets, which live inside repositories. To put rows into a dataset, you upload a CSV (plain or gzip-compressed) through the PUT /datasets/{id}/data endpoint. The upload is processed asynchronously by a background job that validates the file, merges it with existing data, and updates the dataset’s column schema, date range, and inferred frequency. This section explains every requirement before, during, and after an upload — what the manifest is, why it matters, what gets validated, and how the merge logic decides whether to insert, update, or delete rows.

Looking for the API reference? See PUT /datasets/{id}/data for the request shape, parameters, and example payloads. The pages in this section explain the concepts behind those parameters.

Before you upload: initialize the dataset

A dataset must already exist and have a manifest before its first upload. The manifest is the column schema that tells Alphacast what every column in your CSV means: which one carries the date, which ones identify a row (entities), which ones hold the actual measurements, and what data type each column has. There are two ways to provide a manifest:

In the upload request itself — pass a manifest form field on the first PUT /datasets/{id}/data call. Alphacast saves the manifest against the dataset and uses it for that upload and all subsequent ones (until you replace it).
From the Alphacast web UI — open the dataset and configure its columns visually before any upload happens.

If you call PUT /datasets/{id}/data without a manifest and the dataset has never had one, the upload fails with the error "Dataset has no manifest". The next page covers the manifest schema in detail.

The upload lifecycle

Every upload creates an upload process record that is always in one of four states. It starts in Requested, moves to Processing, and ends in either Processed or Error:

Requested

The API receives your PUT /datasets/{id}/data request, validates basic permissions, stores the CSV in temporary cloud storage, and returns a process object with an id and status: "Requested". The HTTP call returns immediately — the actual processing happens in the background.

Processing

A background worker picks up the request. Only one process per dataset can run at a time. If another upload for the same dataset is already in flight, the new request waits in Requested until the in-flight job finishes.

During processing, Alphacast:

Loads the CSV from storage and applies the manifest (rename, ignore columns).
Runs every validation rule.
Compares the incoming rows with the dataset’s existing rows by their entity key.
Computes the final dataset using the conflict-resolution flags you passed.
Writes the merged dataset back to storage and updates the dataset’s metadata (date range, inferred frequency, last-edited timestamp).
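The compare-and-merge steps above can be pictured with a minimal sketch. It is not Alphacast's implementation: it assumes each row is keyed by a tuple of its date and entity values, and it assumes flag semantics suggested only by the flag names (onConflictUpdateDB overwrites an existing row whose values differ, deleteMissingFromDB drops existing rows that are absent from the upload). The function name, parameter names, and counter names are hypothetical; the Validation & upload modes page is the authority on the real behavior.

```python
from typing import Dict, Tuple

Key = Tuple[str, ...]      # assumed key: date plus entity values, e.g. ("2024-01-01", "AR")
Rows = Dict[Key, dict]     # key -> value columns for that row


def merge_rows(existing: Rows, incoming: Rows,
               on_conflict_update_db: bool = True,
               delete_missing_from_db: bool = False):
    """Return (merged_rows, counters) for one upload. Illustrative only."""
    merged = dict(existing)
    counters = {"inserted": 0, "updated": 0, "deleted": 0}

    for key, values in incoming.items():
        if key not in existing:
            # New key: always inserted.
            merged[key] = values
            counters["inserted"] += 1
        elif existing[key] != values and on_conflict_update_db:
            # Same key, different values: overwrite only if the flag allows it.
            merged[key] = values
            counters["updated"] += 1

    if delete_missing_from_db:
        # Assumed meaning: rows not present in the upload are removed.
        for key in existing:
            if key not in incoming:
                del merged[key]
                counters["deleted"] += 1

    counters["total"] = len(merged)
    return merged, counters
```

With both flags off, the sketch only ever adds rows, which is the most conservative combination under these assumptions.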

Processed

The upload completed successfully. The process’s stats field is populated with row counts (insertedValues, updatedValues, deletedValues, changedValues, totalValues, minDate, maxDate, etc.) and a statusDescription summarizing what happened. Subscribers to the dataset receive a notification (rate-limited to one per dataset per four hours).

Error

Validation failed or processing raised an exception. The statusDescription field contains the error message, for example "Data has not Date column", "Entity column 'Country' has empty values", or "42 rows are duplicated". Fix the CSV (or the manifest) and retry.
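Because an Error only surfaces after the background job runs, it can help to catch the most common problems locally first. The sketch below mirrors only the three example errors quoted above (missing date column, empty entity values, duplicated rows); it is not the processor's full rule set, the messages are its own wording rather than the server's, and the way it counts duplicates (every row that shares a key) is an assumption. The function name and parameters are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List


def preflight_errors(csv_text: str, date_column: str,
                     entity_columns: List[str]) -> List[str]:
    """Return a list of local validation problems (empty list means none found)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    if date_column not in header:
        return [f"missing date column {date_column!r}"]
    missing = [c for c in entity_columns if c not in header]
    if missing:
        return [f"missing entity columns {missing}"]

    rows = list(reader)
    errors = []

    # Entity columns identify a row, so blank cells are treated as errors.
    for col in entity_columns:
        if any(not (row[col] or "").strip() for row in rows):
            errors.append(f"entity column {col!r} has empty values")

    # Rows sharing the same date and entity values are counted as duplicates.
    keys = Counter(
        (row[date_column],) + tuple(row[c] for c in entity_columns)
        for row in rows
    )
    duplicated = sum(n for n in keys.values() if n > 1)
    if duplicated:
        errors.append(f"{duplicated} rows are duplicated")

    return errors
```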

The stats field on the process record is the source of truth for what changed. After a successful upload, fetch the process by ID to see the row-level breakdown — useful for logging, alerting, or building your own upload dashboard.
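Since the HTTP call returns before processing finishes, a client usually polls the process record until it reaches Processed or Error. The sketch below shows one way to do that. This page does not name the endpoint for fetching a process, so the lookup is left to a caller-supplied function (fetch_process, a hypothetical name) that returns the process object as a dict; the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"Processed", "Error"}


def wait_for_process(fetch_process: Callable[[], Dict],
                     interval_s: float = 5.0,
                     timeout_s: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the process reaches a terminal state, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        process = fetch_process()
        if process.get("status") in TERMINAL_STATES:
            return process
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"process still in state {process.get('status')!r} "
                f"after {timeout_s} seconds"
            )
        sleep(interval_s)
```

After the call returns, check status first: on "Processed" read stats for the row counts, and on "Error" read statusDescription for the message.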

Idempotency: identical content is skipped

If you upload the exact same content with the exact same parameters (manifest, deleteMissingFromDB, onConflictUpdateDB) as the most recent successful upload, Alphacast detects this via a content hash and skips the merge entirely. The process is still recorded, but with skippedReason: "identical_content" and zero inserts/updates/deletes. This makes scheduled re-uploads safe and cheap — pushing the same file every hour from a cron job will not generate spurious change notifications or re-write storage.
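To illustrate the idea of a content hash that covers both the file and the parameters, here is a minimal sketch. The hash algorithm and the way Alphacast combines its inputs are not documented on this page, so SHA-256 over the file bytes plus a canonical JSON encoding of the parameters is an assumption, and upload_fingerprint is a hypothetical name.

```python
import hashlib
import json


def upload_fingerprint(file_bytes: bytes, manifest,
                       delete_missing_from_db: bool,
                       on_conflict_update_db: bool) -> str:
    """Return a hex digest that changes if the file or any parameter changes."""
    # Canonical JSON (sorted keys, fixed separators) so that equal parameters
    # always serialize to the same bytes.
    params = json.dumps(
        {
            "manifest": manifest,
            "deleteMissingFromDB": delete_missing_from_db,
            "onConflictUpdateDB": on_conflict_update_db,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256()
    digest.update(file_bytes)
    digest.update(b"\x00")  # separator between file content and parameters
    digest.update(params.encode("utf-8"))
    return digest.hexdigest()
```

A client can use the same trick on its side, for example to skip a scheduled upload when the fingerprint matches the one it stored after the last successful run.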

File formats

Format	Extension	Notes
Plain CSV	`.csv`	First row must be the header. UTF-8 or latin1 encoding.
Gzip-compressed CSV	`.csv.gz`	Same content as plain CSV, gzip-encoded. Reduces upload time for large files.

The upload endpoint accepts the file as a multipart/form-data field named data. Anything else in the form (such as the manifest JSON string) is treated as a parameter, not file content.
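The request body described above can be assembled with the standard library alone. The sketch below puts the file in the data field and every other parameter in an ordinary form field, then wraps the result in a PUT request without sending it. The base URL is a placeholder, authentication is omitted because this page does not describe it, and the function names are hypothetical.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def build_upload_body(file_bytes: bytes, filename: str,
                      params: Dict[str, str]) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain form fields, such as the manifest JSON string.
    for name, value in params.items():
        field = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field.encode("utf-8"))

    # The file itself goes in the field named "data".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))

    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def make_request(base_url: str, dataset_id: int, body: bytes,
                 content_type: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request. Add auth headers before sending."""
    url = f"{base_url}/datasets/{dataset_id}/data"
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": content_type},
    )
```

For a compressed upload, pass gzip.compress(csv_bytes) as file_bytes with a filename ending in .csv.gz. On a dataset's first upload, include the manifest as a JSON string in params under the key manifest.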

What you’ll find in this section

The manifest

Column definitions: entity vs value columns, data types, date format, ignored and renamed columns, manifest locking.

Validation & upload modes

Every validation rule the processor enforces, plus the three flags that control how new data is merged with existing rows.
