The mechanics of validation, manifests, and the merge logic are documented in the Uploading Data section. This page focuses on the SDK surface; refer back to the concept pages when you need to understand why an upload behaved a certain way.
1. Create the dataset
datasets.create() registers a new empty dataset inside a repository:
Parameters
Display name. Must be unique within the parent repository.
Numeric ID of the parent repository.
Long-form description shown on the dataset page.
When returnIdIfExists is True, an existing dataset with the same name in the same repo is returned instead of raising. Use this for idempotent setup scripts. When returnIdIfExists=False and a dataset with that name already exists, the call raises ValueError. The error message includes the existing dataset’s ID.
2. Initialize the columns
Before the first upload, declare which column carries the date and which columns identify a row. You can do this in either of two ways:
- Call initialize_columns() before uploading: this sets the manifest as a separate API call (covered in this section).
- Pass the manifest fields directly to the upload call: upload_data_from_df and upload_data_from_csv accept the same parameters and build the manifest in-flight (covered in Inline schema declaration below). This is what the SDK’s own tests use.
initialize_columns()
Use initialize_columns() on the dataset handle:
Name of the date column in your CSV / DataFrame. Must match exactly (case-sensitive).
Names of the entity columns. Together with the date column they form the dataset’s unique key: every (date, entity values...) combination must be unique.
strftime format string describing how dates are written in the CSV, for example "%Y-%m-%d" for ISO dates or "%d/%m/%Y" for European-style.
Alphacast’s chart engine currently expects a single entity column. Multiple entities are supported for storage and download, but charting works only on single-entity datasets.
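The two constraints above (dates must match the declared format, and each date plus entity combination must be unique) can be checked locally before uploading. The following is a minimal sketch using only the standard library. The helper name `check_rows` and the column names `Date` and `country` are hypothetical and not part of the SDK, and the server-side validation may apply further rules.

```python
from datetime import datetime


def check_rows(rows, date_col, entity_cols, date_format):
    """Return a list of human-readable problems found in `rows` (a list of dicts)."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # The date must parse with the declared strftime-style format.
        try:
            datetime.strptime(row[date_col], date_format)
        except ValueError:
            problems.append(f"row {i}: {row[date_col]!r} does not match {date_format!r}")
        # The (date, entity values...) tuple must not repeat.
        key = (row[date_col],) + tuple(row[c] for c in entity_cols)
        if key in seen:
            problems.append(f"row {i}: duplicate key {key!r}")
        seen.add(key)
    return problems


# Hypothetical sample data: one duplicate key and one badly formatted date.
rows = [
    {"Date": "2024-01-01", "country": "AR", "value": 1.0},
    {"Date": "2024-01-01", "country": "BR", "value": 2.0},
    {"Date": "2024-01-01", "country": "AR", "value": 3.0},
    {"Date": "01/02/2024", "country": "AR", "value": 4.0},
]
problems = check_rows(rows, "Date", ["country"], "%Y-%m-%d")
for p in problems:
    print(p)
```

Running a check like this first tends to be cheaper than waiting for an upload to fail validation.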
3. Upload the data
The SDK provides two upload methods. Both upload to the same endpoint and accept the same merge flags; the only difference is the input type.
From a pandas.DataFrame
upload_data_from_df takes the DataFrame and PUTs it to /datasets/{id}/data. The call returns the API’s raw response: the JSON of an upload process record.
The data to upload. Must contain the date column and every entity column declared in initialize_columns. Raises if the DataFrame is empty.
Whether to include the DataFrame’s index column in the upload. Set to False if your DataFrame has a meaningless RangeIndex.
From a CSV string
upload_data_from_csv accepts the CSV as a Python string. The SDK gzip-compresses it before uploading.
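To make the compression step concrete, here is a rough sketch of what gzip-compressing a CSV string looks like with the standard library. It only illustrates the idea; the SDK’s internal implementation may differ, and the sample CSV content is made up.

```python
import gzip

# Hypothetical CSV content, as you would pass it to upload_data_from_csv.
csv_text = "Date,country,value\n2024-01-01,AR,1.0\n2024-01-02,AR,1.5\n"

# Encode to bytes, then gzip-compress the result.
payload = gzip.compress(csv_text.encode("utf-8"))

# Decompressing recovers the original text, so nothing is lost in transit.
restored = gzip.decompress(payload).decode("utf-8")
print(restored == csv_text)
```

Because the SDK does this for you, pass the plain CSV string to upload_data_from_csv rather than pre-compressed bytes.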
Merge flags
Both upload methods accept the same conflict-resolution flags:
When deleteMissingFromDB is True, rows already in the dataset that are absent from the upload are deleted. Use for full replacements.
When onConflictUpdateDB is True, rows whose key matches an existing row are updated with the incoming value. When False, the existing value is kept.
When acceptNewColumns is True, columns in the CSV that aren’t in the existing schema are added to the dataset. When False (or omitted), unknown columns cause the upload to fail.
Common combinations
| Goal | deleteMissingFromDB | onConflictUpdateDB |
|---|---|---|
| Append new rows, never overwrite existing | False | False |
| Append new rows, overwrite duplicates | False | True |
| Full replacement (mirror the upload exactly) | True | True |
| Trim the dataset to match the upload, but keep existing values where they overlap | True | False |
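The four combinations in the table can be modeled locally with plain dictionaries keyed by (date, entity). This is a sketch of the documented semantics only, not the server’s merge code. The function name `merge` and the sample rows are hypothetical.

```python
def merge(existing, incoming, deleteMissingFromDB=False, onConflictUpdateDB=False):
    """Model the documented merge flags on dicts keyed by (date, entity)."""
    result = {}
    for key, value in existing.items():
        if key in incoming:
            # Key conflict: take the incoming value only if updates are enabled.
            result[key] = incoming[key] if onConflictUpdateDB else value
        elif not deleteMissingFromDB:
            # Row absent from the upload: kept unless deletion is enabled.
            result[key] = value
    for key, value in incoming.items():
        if key not in existing:
            # Rows with new keys are appended in every combination.
            result[key] = value
    return result


existing = {("2024-01-01", "AR"): 1.0, ("2024-01-02", "AR"): 2.0}
incoming = {("2024-01-02", "AR"): 9.0, ("2024-01-03", "AR"): 3.0}

append_only = merge(existing, incoming)
append_overwrite = merge(existing, incoming, onConflictUpdateDB=True)
full_replace = merge(existing, incoming, deleteMissingFromDB=True, onConflictUpdateDB=True)
trim_keep = merge(existing, incoming, deleteMissingFromDB=True)

print(full_replace == incoming)
```

In this model a full replacement leaves the dataset equal to the upload, while the trim variant keeps the old value 2.0 for the overlapping key.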
Inline schema declaration
Both upload methods can declare the column schema inline, which is useful when you want to push data and set the manifest in a single call. Pass the same field names as initialize_columns, plus optional stringColumnNames and acceptNewColumns.
The fields are sent in the manifest form field (see the manifest reference). With this pattern you can skip the separate initialize_columns() call entirely.
Return value
Both upload_data_from_df and upload_data_from_csv return the API response body as bytes: a JSON-encoded process record. Decode it with json.loads(...) to use it.
A process moves through the states Requested → Processing → Processed (or Error). To poll for completion, see Process status.
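As a sketch of handling the return value, the snippet below decodes a bytes payload and checks whether the process has reached a terminal state. The sample payload and its field names (`id`, `status`) are assumptions made for illustration. Inspect a real response to confirm the actual keys.

```python
import json

# Hypothetical response body; real field names may differ.
raw = b'{"id": 123, "status": "Processing"}'

# json.loads accepts bytes directly, so no manual decode is needed.
record = json.loads(raw)

# Terminal states taken from the lifecycle described above.
TERMINAL_STATES = {"Processed", "Error"}


def is_finished(record):
    """Return True once the process record is in a terminal state."""
    return record.get("status") in TERMINAL_STATES


print(is_finished(record))
```

A polling loop would call a check like this on each refreshed record until it returns True.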
End-to-end example
Next steps
- Verify the upload succeeded — see Process status.
- Pull the data back as a DataFrame — see Downloading data.
- Read the validation rules to understand why an upload might end up in the Error state.