Uploading Data

The SDK turns the three-step Alphacast upload workflow — create dataset, initialize columns, upload data — into three short Python calls. This page walks through each step and the conflict-resolution flags that control how new data is merged with existing rows.

The mechanics of validation, manifests, and the merge logic are documented in the Uploading Data section. This page focuses on the SDK surface; refer back to the concept pages when you need to understand why an upload behaved a certain way.

1. Create the dataset

datasets.create() registers a new empty dataset inside a repository:

ds = alphacast.datasets.create(
    "Quarterly GDP",
    repo_id=42,
    description="GDP series by country",
    returnIdIfExists=True,
)
dataset_id = ds["id"]

Parameters

First positional argument (str, required): Display name. Must be unique within the parent repository.

`repo_id` (int, required): Numeric ID of the parent repository.

`description` (str, default: ""): Long-form description shown on the dataset page.

`returnIdIfExists` (bool, default: False): When True, an existing dataset with the same name in the same repo is returned instead of raising. Use this for idempotent setup scripts.

If the dataset already exists and returnIdIfExists=False, the call raises ValueError. The error message includes the existing dataset’s ID.

2. Initialize the columns

Before the first upload, declare which column carries the date and which columns identify a row. You can do this in either of two ways:

Call initialize_columns() before uploading — sets the manifest as a separate API call (covered in this section).
Pass the manifest fields directly to the upload call — upload_data_from_df and upload_data_from_csv accept the same parameters and build the manifest in-flight (covered in Inline schema declaration below). This is what the SDK’s own tests use.

Both approaches set the same underlying manifest. Pick whichever fits your code.

`initialize_columns()`

Use initialize_columns() on the dataset handle:

alphacast.datasets.dataset(dataset_id).initialize_columns(
    dateColumnName="Date",
    entitiesColumnNames=["country"],
    dateFormat="%Y-%m-%d",
)

`dateColumnName` (str, required): Name of the date column in your CSV / DataFrame. Must match exactly (case-sensitive).

`entitiesColumnNames` (list[str], required): Names of the entity columns. Together with the date column they form the dataset’s unique key: every (date, entity values...) combination must be unique.

`dateFormat` (str, required): strftime format string describing how dates are written in the CSV, for example "%Y-%m-%d" for ISO dates or "%d/%m/%Y" for European-style.

Alphacast’s chart engine currently expects a single entity column. Multiple entities are supported for storage and download, but charting works only on single-entity datasets.
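The uniqueness and date-format requirements above can be checked locally before anything is sent. The following is a minimal sketch using only the standard library; `check_upload_keys` is a hypothetical helper, not part of the SDK, and it assumes the data is available as CSV text.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def check_upload_keys(csv_text, date_column, entity_columns, date_format):
    """Return a list of problems found in the date and entity columns."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    required = [date_column, *entity_columns]
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]

    seen = Counter()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            # strptime accepts the same directives as the strftime format string
            datetime.strptime(row[date_column], date_format)
        except ValueError:
            problems.append(f"line {line_no}: bad date {row[date_column]!r}")
        # Keys are compared as raw strings, exactly as written in the CSV
        key = (row[date_column], *(row[c] for c in entity_columns))
        seen[key] += 1

    problems += [f"duplicate key {k}" for k, n in seen.items() if n > 1]
    return problems


sample = "Date,country,gdp\n2024-01-01,AR,100\n2024-01-01,AR,101\n01/04/2024,BR,200\n"
issues = check_upload_keys(sample, "Date", ["country"], "%Y-%m-%d")
for issue in issues:
    print(issue)
```

The sample deliberately contains one repeated (date, country) pair and one date that does not match the declared format, so both kinds of problem are reported. The server-side validation may apply additional rules that this sketch does not model.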

3. Upload the data

The SDK provides two upload methods. Both upload to the same endpoint and accept the same merge flags — the only difference is the input type.

From a `pandas.DataFrame`

process = alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
)

The SDK serializes the DataFrame to CSV in memory, gzip-compresses it, and PUTs it to /datasets/{id}/data. The call returns the API’s raw response — the JSON of an upload process record.

First positional argument (pandas.DataFrame, required): The data to upload. Must contain the date column and every entity column declared in initialize_columns. Raises if the DataFrame is empty.

`uploadIndex` (bool, default: True): Whether to include the DataFrame’s index column in the upload. Set to False if your DataFrame has a meaningless RangeIndex.

From a CSV string

# Read the whole file into a string; the context manager closes the file.
with open("gdp.csv") as f:
    csv_text = f.read()

process = alphacast.datasets.dataset(dataset_id).upload_data_from_csv(
    csv_text,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
)

upload_data_from_csv accepts the CSV as a Python string. The SDK gzip-compresses it before uploading.
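To make the compression step concrete, here is a rough sketch of gzip-compressing CSV text in memory with the standard library. It illustrates the general technique only; the SDK's actual encoding and compression settings are not documented on this page, and UTF-8 is an assumption.

```python
import gzip

csv_text = "Date,country,gdp\n2024-01-01,AR,100\n2024-04-01,AR,102\n"

# Encode to bytes (UTF-8 assumed), then gzip-compress entirely in memory.
payload = gzip.compress(csv_text.encode("utf-8"))

# Round-trip to confirm nothing was lost.
restored = gzip.decompress(payload).decode("utf-8")
print(len(csv_text), len(payload), restored == csv_text)
```

No temporary file is involved, which is why a string (or a DataFrame serialized to a string) is all the upload methods need.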

Merge flags

Both upload methods accept the same conflict-resolution flags:

`deleteMissingFromDB` (bool, default: False): When True, rows already in the dataset that are absent from the upload are deleted. Use for full replacements.

`onConflictUpdateDB` (bool, default: False): When True, rows whose key matches an existing row are updated with the incoming value. When False, the existing value is kept.

`acceptNewColumns` (bool, default: None): When True, columns in the CSV that aren’t in the existing schema are added to the dataset. When False (or omitted), unknown columns cause the upload to fail.

The same flags map directly to the REST API query parameters. The full validation logic and edge cases are described in Validation & upload modes.

Common combinations

Goal	`deleteMissingFromDB`	`onConflictUpdateDB`
Append new rows, never overwrite existing	`False`	`False`
Append new rows, overwrite duplicates	`False`	`True`
Full replacement (mirror the upload exactly)	`True`	`True`
Trim the dataset to match the upload, but keep existing values where they overlap	`True`	`False`
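The four combinations in the table can be modelled on plain dictionaries keyed by (date, entity). This is an illustrative sketch of the semantics described on this page, not the server's implementation; `merge_rows` and its argument names are hypothetical.

```python
def merge_rows(existing, incoming, delete_missing=False, on_conflict_update=False):
    """Model the merge flags on dicts keyed by (date, entity)."""
    merged = {}
    for key, value in existing.items():
        if key not in incoming and delete_missing:
            continue  # deleteMissingFromDB=True: drop rows absent from the upload
        merged[key] = value
    for key, value in incoming.items():
        if key in existing and not on_conflict_update:
            continue  # onConflictUpdateDB=False: keep the stored value
        merged[key] = value  # new row, or overwrite on conflict
    return merged


stored = {("2024-01-01", "AR"): 100, ("2024-04-01", "AR"): 102}
upload = {("2024-04-01", "AR"): 105, ("2024-07-01", "AR"): 108}

append_only = merge_rows(stored, upload)
upsert = merge_rows(stored, upload, on_conflict_update=True)
mirror = merge_rows(stored, upload, delete_missing=True, on_conflict_update=True)
trim = merge_rows(stored, upload, delete_missing=True)

for name, result in [("append", append_only), ("upsert", upsert),
                     ("mirror", mirror), ("trim", trim)]:
    print(name, result)
```

In this model the full-replacement case ends up identical to the upload, while the trim case keeps the stored value 102 for the overlapping row and drops the row that the upload omits.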

Inline schema declaration

Both upload methods can declare the column schema inline — useful when you want to push data and set the manifest in a single call. Pass the same field names as initialize_columns, plus optional stringColumnNames and acceptNewColumns:

alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
    dateColumnName="Date",
    dateFormat="%Y-%m-%d",
    entitiesColumnNames=["country"],
    stringColumnNames=["notes"],
    acceptNewColumns=True,
)

When provided, these arguments are sent as the upload’s manifest form field — see the manifest reference. With this pattern you can skip the separate initialize_columns() call entirely.

Use acceptNewColumns=True together with the inline manifest to add new columns to an existing dataset on the next upload. Without it, columns not already in the schema cause the upload to fail.
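The column rule can be pictured as a simple set comparison. The sketch below is a local model of that rule under stated assumptions: `check_columns` is a hypothetical helper, and `ValueError` stands in for whatever failure the upload actually reports.

```python
def check_columns(schema_columns, upload_columns, accept_new_columns=None):
    """Return the columns that would be added, or raise if they are not allowed."""
    new_columns = [c for c in upload_columns if c not in schema_columns]
    if new_columns and not accept_new_columns:
        # False and None (omitted) are treated the same way
        raise ValueError(f"columns not in schema: {new_columns}")
    return new_columns


schema = ["Date", "country", "gdp"]
added = check_columns(schema, ["Date", "country", "gdp", "notes"], accept_new_columns=True)

try:
    check_columns(schema, ["Date", "country", "gdp", "notes"])
    rejected = False
except ValueError:
    rejected = True

print(added, rejected)
```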

Return value

Both upload_data_from_df and upload_data_from_csv return the API response body as bytes — a JSON-encoded process record. Decode with json.loads(...) to use it:

import json

raw = alphacast.datasets.dataset(dataset_id).upload_data_from_df(df)
process = json.loads(raw)

print(process["id"], process["status"])
# process is a dict such as:
# {"id": 45141, "status": "Requested", "createdAt": "2026-04-28T16:58:18.999786", "datasetId": 7938}

The process moves through Requested → Processing → Processed (or Error). To poll for completion, see Process status.
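A polling loop over these states might look like the sketch below. `wait_for_process` and the `fetch_status` callable are hypothetical; the real status lookup is the one described in Process status, and here a fixed sequence of states stands in for it so the example runs on its own.

```python
import json
import time


def wait_for_process(fetch_status, process_id, interval=2.0, timeout=300.0):
    """Poll until the process reaches a terminal state, then return that state."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(process_id)
        if status in ("Processed", "Error"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"process {process_id} still {status!r}")
        time.sleep(interval)


# Stand-in for the bytes returned by an upload call.
raw = b'{"id": 45141, "status": "Requested"}'
process = json.loads(raw)

# Stand-in for a real status lookup: replays a fixed sequence of states.
states = iter(["Requested", "Processing", "Processed"])
final = wait_for_process(lambda _id: next(states), process["id"], interval=0)
print(final)
```

Treating both Processed and Error as terminal keeps the loop from spinning on a failed upload; the caller decides what to do with an Error result.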

End-to-end example

import os
import pandas as pd
from alphacast import Alphacast

alphacast = Alphacast(os.environ["ALPHACAST_API_KEY"])

# 1. Repository
repo = alphacast.repository.create(
    "Macro Indicators", privacy="Private", returnIdIfExists=True
)

# 2. Dataset
ds = alphacast.datasets.create(
    "Quarterly GDP", repo["id"], returnIdIfExists=True
)
dataset_id = ds["id"]

# 3. Columns
alphacast.datasets.dataset(dataset_id).initialize_columns(
    dateColumnName="Date",
    entitiesColumnNames=["country"],
    dateFormat="%Y-%m-%d",
)

# 4. Data
df = pd.read_csv("gdp.csv")
alphacast.datasets.dataset(dataset_id).upload_data_from_df(
    df,
    deleteMissingFromDB=False,
    onConflictUpdateDB=True,
    uploadIndex=False,
)

Next steps

Verify the upload succeeded — see Process status.
Pull the data back as a DataFrame — see Downloading data.
Read the validation rules to understand why an upload might end up in Error state.
