The SDK turns the three-step Alphacast upload workflow — create dataset, initialize columns, upload data — into three short Python calls. This page walks through each step and the conflict-resolution flags that control how new data is merged with existing rows.
The mechanics of validation, manifests, and the merge logic are documented in the Uploading Data section. This page focuses on the SDK surface; refer back to the concept pages when you need to understand why an upload behaved a certain way.

1. Create the dataset

datasets.create() registers a new empty dataset inside a repository:
ds = alphacast.datasets.create(
    "Quarterly GDP",
    repo_id=42,
    description="GDP series by country",
    returnIdIfExists=True,
)
dataset_id = ds["id"]

Parameters

dataset_name (str, required): Display name. Must be unique within the parent repository.
repo_id (int, required): Numeric ID of the parent repository.
description (str, default ""): Long-form description shown on the dataset page.
returnIdIfExists (bool, default False): When True, an existing dataset with the same name in the same repo is returned instead of raising. Use this for idempotent setup scripts.
If the dataset already exists and returnIdIfExists=False, the call raises ValueError. The error message includes the existing dataset’s ID.

2. Initialize the columns

Before the first upload, declare which column carries the date and which columns identify a row. You can do this in either of two ways:
  1. Call initialize_columns() before uploading — sets the manifest as a separate API call (covered in this section).
  2. Pass the manifest fields directly to the upload call: upload_data_from_df and upload_data_from_csv accept the same parameters and build the manifest in-flight (covered in Inline schema declaration below). This is what the SDK’s own tests use.
Both approaches set the same underlying manifest. Pick whichever fits your code.

initialize_columns()

Use initialize_columns() on the dataset handle:
alphacast.datasets.dataset(dataset_id).initialize_columns(
    dateColumnName="Date",
    entitiesColumnNames=["country"],
    dateFormat="%Y-%m-%d",
)
dateColumnName (str, required): Name of the date column in your CSV / DataFrame. Must match exactly (case-sensitive).
entitiesColumnNames (list[str], required): Names of the entity columns. Together with the date column they form the dataset’s unique key: every (date, entity values...) combination must be unique.
dateFormat (str, required): strftime format string describing how dates are written in the CSV, for example "%Y-%m-%d" for ISO dates or "%d/%m/%Y" for European-style.
Alphacast’s chart engine currently expects a single entity column. Multiple entities are supported for storage and download, but charting works only on single-entity datasets.
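The uniqueness and date-format requirements above can be checked locally before anything is sent. The following is a minimal sketch using only the standard library; check_upload_keys is a hypothetical helper, not part of the SDK, and it only approximates whatever validation the server performs.

```python
import csv
import io
from datetime import datetime


def check_upload_keys(csv_text, date_column, entity_columns, date_format):
    """Return a list of problems found in the CSV; an empty list means the checks passed.

    Hypothetical pre-upload helper, not an SDK function.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Column names must match exactly, including case.
    missing = [c for c in [date_column, *entity_columns] if c not in header]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        date_value = row[date_column] or ""
        try:
            datetime.strptime(date_value, date_format)
        except ValueError:
            problems.append(
                f"line {line_no}: date {date_value!r} does not match {date_format!r}"
            )
        # The key is the date plus every entity value, in declaration order.
        key = (date_value, *(row[c] for c in entity_columns))
        if key in seen:
            problems.append(f"line {line_no}: duplicate key {key}")
        seen.add(key)
    return problems


sample = "Date,country,gdp\n2024-01-01,AR,100\n2024-01-01,BR,200\n2024-01-01,AR,101\n"
print(check_upload_keys(sample, "Date", ["country"], "%Y-%m-%d"))
```

The sample repeats the (2024-01-01, AR) key, so the helper reports one duplicate on the last data line.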

3. Upload the data

The SDK provides two upload methods. Both upload to the same endpoint and accept the same merge flags — the only difference is the input type.

From a pandas.DataFrame

process = alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
)
The SDK serializes the DataFrame to CSV in memory, gzip-compresses it, and PUTs it to /datasets/{id}/data. The call returns the API’s raw response — the JSON of an upload process record.
df (pandas.DataFrame, required): The data to upload. Must contain the date column and every entity column declared in initialize_columns. Raises if the DataFrame is empty.
uploadIndex (bool, default True): Whether to include the DataFrame’s index column in the upload. Set to False if your DataFrame has a meaningless RangeIndex.

From a CSV string

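# Read the whole file into a string; the context manager closes the file afterwards.
with open("gdp.csv", encoding="utf-8") as f:
    csv_text = f.read()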

process = alphacast.datasets.dataset(dataset_id).upload_data_from_csv(
    csv_text,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
)
upload_data_from_csv accepts the CSV as a Python string. The SDK gzip-compresses it before uploading.
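To illustrate what in-memory gzip compression of a CSV string looks like, here is a small standard-library sketch. It is not the SDK's code: the UTF-8 encoding and the default compression level are assumptions, and the sample rows are made up.

```python
import gzip

# Made-up sample data standing in for the contents of gdp.csv.
csv_text = "Date,country,gdp\n2024-01-01,AR,100\n2024-04-01,AR,102\n"

# Encode to bytes (UTF-8 assumed), then compress entirely in memory, with no temp file.
payload = gzip.compress(csv_text.encode("utf-8"))

# Round-trip to confirm the payload decodes back to the original CSV.
restored = gzip.decompress(payload).decode("utf-8")
print(len(csv_text.encode("utf-8")), len(payload), restored == csv_text)
```

For very small inputs the compressed payload can be larger than the original because of the gzip header; the saving shows up on realistic file sizes.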

Merge flags

Both upload methods accept the same conflict-resolution flags:
deleteMissingFromDB (bool, default False): When True, rows already in the dataset that are absent from the upload are deleted. Use for full replacements.
onConflictUpdateDB (bool, default False): When True, rows whose key matches an existing row are updated with the incoming value. When False, the existing value is kept.
acceptNewColumns (bool, default None): When True, columns in the CSV that aren’t in the existing schema are added to the dataset. When False (or omitted), unknown columns cause the upload to fail.
The same flags map directly to the REST API query parameters. The full validation logic and edge cases are described in Validation & upload modes.

Common combinations

| Goal | deleteMissingFromDB | onConflictUpdateDB |
| --- | --- | --- |
| Append new rows, never overwrite existing | False | False |
| Append new rows, overwrite duplicates | False | True |
| Full replacement (mirror the upload exactly) | True | True |
| Trim the dataset to match the upload, but keep existing values where they overlap | True | False |
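The four combinations can be modeled locally to see what each one leaves in the dataset. This is a simplified sketch of the semantics described on this page, not the server's merge logic: merge_rows is a hypothetical function, rows are reduced to a single value per (date, entity) key, and the numbers are invented.

```python
def merge_rows(existing, incoming, delete_missing=False, on_conflict_update=False):
    """Model the two merge flags on dicts keyed by (date, entity).

    Hypothetical illustration only; the real merge happens server-side.
    """
    result = dict(existing)
    for key, value in incoming.items():
        # New keys are always added; existing keys change only when updates are allowed.
        if key not in result or on_conflict_update:
            result[key] = value
    if delete_missing:
        # Drop rows that the upload does not mention.
        result = {k: v for k, v in result.items() if k in incoming}
    return result


existing = {("2024-01-01", "AR"): 100, ("2024-04-01", "AR"): 102}
incoming = {("2024-04-01", "AR"): 105, ("2024-07-01", "AR"): 107}

for delete_missing in (False, True):
    for on_conflict_update in (False, True):
        merged = merge_rows(existing, incoming, delete_missing, on_conflict_update)
        print(delete_missing, on_conflict_update, merged)
```

With both flags True the result equals the upload itself, which is the full-replacement case.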

Inline schema declaration

Both upload methods can declare the column schema inline — useful when you want to push data and set the manifest in a single call. Pass the same field names as initialize_columns, plus optional stringColumnNames and acceptNewColumns:
alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
    dateColumnName="Date",
    dateFormat="%Y-%m-%d",
    entitiesColumnNames=["country"],
    stringColumnNames=["notes"],
    acceptNewColumns=True,
)
When provided, these arguments are sent as the upload’s manifest form field — see the manifest reference. With this pattern you can skip the separate initialize_columns() call entirely.
Use acceptNewColumns=True together with the inline manifest to add new columns to an existing dataset on the next upload. Without it, columns not already in the schema cause the upload to fail.
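Before deciding whether acceptNewColumns=True is needed, it can help to list which columns in the upload are not yet in the schema. The sketch below is a hypothetical helper, not an SDK call; known_columns is supplied by the caller, since retrieving the existing schema is outside the scope of this page.

```python
import csv
import io


def find_new_columns(csv_text, known_columns):
    """Return header names in the CSV that are absent from known_columns, in file order.

    Hypothetical helper; the comparison is exact and case-sensitive.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    known = set(known_columns)
    return [name for name in header if name not in known]


csv_text = "Date,country,gdp,notes\n2024-01-01,AR,100,preliminary\n"
new_columns = find_new_columns(csv_text, ["Date", "country", "gdp"])
print(new_columns)
```

If the returned list is non-empty, an upload without acceptNewColumns=True would be expected to fail according to the rule above.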

Return value

Both upload_data_from_df and upload_data_from_csv return the API response body as bytes — a JSON-encoded process record. Decode with json.loads(...) to use it:
import json

raw = alphacast.datasets.dataset(dataset_id).upload_data_from_df(df)
process = json.loads(raw)

# Example record:
# {"id": 45141, "status": "Requested", "createdAt": "2026-04-28T16:58:18.999786", "datasetId": 7938}
print(process["id"], process["status"])
The process moves through Requested → Processing → Processed (or Error). To poll for completion, see Process status.
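A generic polling loop for the status sequence above might look like the following sketch. The wait_for_process function and its fetch_status argument are hypothetical: fetch_status stands for whatever call returns the current status string, which is documented in Process status rather than here. The interval and timeout values are arbitrary.

```python
import time

# Statuses after which the process will not change again, per the sequence above.
TERMINAL_STATUSES = {"Processed", "Error"}


def wait_for_process(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or the timeout expires.

    fetch_status is a caller-supplied function returning the status string (assumption).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"process still {status!r} after {timeout} seconds")
        sleep(interval)


# Offline demonstration with a canned sequence instead of real API calls.
statuses = iter(["Requested", "Processing", "Processed"])
print(wait_for_process(lambda: next(statuses), interval=0, timeout=5))
```

Passing the status lookup in as a function keeps the loop independent of how the status is fetched and makes it easy to exercise without network access.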

End-to-end example

import os
import pandas as pd
from alphacast import Alphacast

alphacast = Alphacast(os.environ["ALPHACAST_API_KEY"])

# 1. Repository
repo = alphacast.repository.create(
    "Macro Indicators", privacy="Private", returnIdIfExists=True
)

# 2. Dataset
ds = alphacast.datasets.create(
    "Quarterly GDP", repo["id"], returnIdIfExists=True
)
dataset_id = ds["id"]

# 3. Columns
alphacast.datasets.dataset(dataset_id).initialize_columns(
    dateColumnName="Date",
    entitiesColumnNames=["country"],
    dateFormat="%Y-%m-%d",
)

# 4. Data
df = pd.read_csv("gdp.csv")
alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
)
