Downloading Data

download_data() is the SDK’s main read method. Call it on a per-dataset handle to retrieve the dataset’s rows in the format of your choice. The method also supports server-side filtering by date range, entity values, and column subset — saving bandwidth and post-processing for large datasets.

from alphacast import Alphacast

alphacast = Alphacast("YOUR_API_KEY")
df = alphacast.datasets.dataset(6755).download_data(format="pandas")

You only need read permission on a dataset to download it. For public datasets that you don’t own, look the dataset ID up in the URL on alphacast.io and pass it directly to dataset().

Output formats

Pass the format argument to choose how the data is returned:

`format`	Returns	Use it when
`"pandas"`	`pandas.DataFrame`	Working in a notebook or pipeline that consumes DataFrames.
`"csv"`	`bytes` (CSV)	Saving to disk or piping into another tool.
`"json"`	`list[dict]`	Iterating row-by-row in pure Python.
`"xlsx"`	`bytes` (XLSX)	Producing an Excel file for non-technical users.
`"tsv"`	`bytes` (TSV)	Tab-separated workflows.

When format="json" the API responds with newline-delimited JSON (NDJSON). The SDK parses it for you and returns a flat list of dicts — one dict per row.
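To make the NDJSON shape concrete, the sketch below shows roughly what parsing such a payload involves: one JSON object per line, collected into a list. This illustrates the format only and is not the SDK's actual implementation. The helper name parse_ndjson and the sample payload (including its column names) are made up for the example.

```python
import json


def parse_ndjson(payload: bytes) -> list:
    """Parse newline-delimited JSON into a list of dicts, one per non-empty line."""
    rows = []
    # Split on "\n" only, so unusual Unicode line separators inside strings are left alone.
    for line in payload.decode("utf-8").split("\n"):
        line = line.strip()
        if line:  # skip blank lines, such as the one after a trailing newline
            rows.append(json.loads(line))
    return rows


# Made-up sample payload; the column names are illustrative only.
sample = (
    b'{"Date": "2020-01-01", "country": "USA", "value": 1.5}\n'
    b'{"Date": "2020-01-01", "country": "Brazil", "value": 2.0}\n'
)
rows = parse_ndjson(sample)
```

With format="json" you never need to do this yourself; the sketch is only meant to show why the result is a flat list with one dict per row.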

# Direct to DataFrame
df = alphacast.datasets.dataset(6755).download_data(format="pandas")

# Raw CSV bytes
csv_bytes = alphacast.datasets.dataset(6755).download_data("csv")

# Save XLSX to disk
xlsx_bytes = alphacast.datasets.dataset(6755).download_data("xlsx")
with open("data.xlsx", "wb") as f:
    f.write(xlsx_bytes)

# JSON as a list of dicts
rows = alphacast.datasets.dataset(6755).download_data(format="json")
for row in rows[:5]:
    print(row)

When format="pandas", the SDK fetches CSV under the hood and parses it with pandas.read_csv. You get the same data as format="csv", just already wrapped in a DataFrame.
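Because format="csv" carries the same data, you can also turn the bytes into rows without pandas. The following is a minimal sketch using only the standard library. It assumes the payload is UTF-8 encoded with a header row, which this page does not state explicitly. The helper name csv_bytes_to_rows and the sample bytes are hypothetical.

```python
import csv
import io


def csv_bytes_to_rows(csv_bytes: bytes, encoding: str = "utf-8") -> list:
    """Decode CSV bytes and return one dict per data row, keyed by the header."""
    text = csv_bytes.decode(encoding)  # assumption: UTF-8 unless told otherwise
    # newline="" lets the csv module handle line endings itself.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


# Made-up sample standing in for the bytes returned by download_data("csv").
sample_csv = (
    b"Date,country,GDP_USD_Billions\n"
    b"2020-01-01,USA,21060.5\n"
    b"2020-01-01,Argentina,385.5\n"
)
records = csv_bytes_to_rows(sample_csv)
```

Unlike pandas.read_csv, csv.DictReader leaves every value as a string, so convert numeric and date columns yourself if you take this route.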

Filtering by date range

Pass startDate and/or endDate as datetime.date (or datetime.datetime) objects to restrict the rows returned:

import datetime as dt

df = alphacast.datasets.dataset(6755).download_data(
    format="pandas",
    startDate=dt.date(2020, 1, 1),
    endDate=dt.date(2024, 12, 31),
)

The SDK reads the dataset’s column definitions to detect the date column automatically, so you don’t need to pass its name. If the dataset has no date column, omit these parameters.
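If you usually want a rolling window rather than fixed dates, you can compute the bounds first and unpack them into the call. The sketch below is one way to do that. The helper trailing_years is hypothetical and not part of the SDK; it only builds the startDate and endDate keyword arguments described above.

```python
import datetime as dt
from typing import Optional


def trailing_years(years: int, today: Optional[dt.date] = None) -> dict:
    """Return startDate/endDate keyword arguments covering the last `years` years."""
    end = today or dt.date.today()
    try:
        start = end.replace(year=end.year - years)
    except ValueError:
        # Feb 29 mapped onto a non-leap year: fall back to Feb 28.
        start = end.replace(year=end.year - years, day=28)
    return {"startDate": start, "endDate": end}


# A fixed "today" keeps the example reproducible.
window = trailing_years(5, today=dt.date(2024, 12, 31))

# With a dataset handle, the window would be passed like this:
# df = alphacast.datasets.dataset(6755).download_data(format="pandas", **window)
```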

Filtering by entity values

Use filterEntities to restrict rows by the value of one or more entity columns. Pass a dict mapping each entity column to a list of accepted values:

df = alphacast.datasets.dataset(6755).download_data(
    format="pandas",
    filterEntities={"country": ["USA", "Argentina", "Brazil"]},
)

You can pass multiple entity columns at once. Each column’s values are OR’d internally; the columns are combined with AND:

df = alphacast.datasets.dataset(6755).download_data(
    format="pandas",
    filterEntities={
        "country": ["USA", "Mexico"],
        "sector": ["Manufacturing", "Services"],
    },
)
# country in (USA, Mexico) AND sector in (Manufacturing, Services)
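To make the AND/OR semantics explicit, here is a local, pure-Python emulation of the matching rule applied to a few made-up rows. It only illustrates the logic described above. The real filtering happens on the server, and the helper row_matches and the sample rows are hypothetical.

```python
def row_matches(row: dict, filter_entities: dict) -> bool:
    """True if, for every filtered column, the row's value is one of the accepted values."""
    # all(...) is the AND across columns; `in accepted` is the OR within a column.
    return all(
        row.get(column) in accepted
        for column, accepted in filter_entities.items()
    )


filters = {
    "country": ["USA", "Mexico"],
    "sector": ["Manufacturing", "Services"],
}

# Made-up rows for illustration.
sample_rows = [
    {"country": "USA", "sector": "Manufacturing"},  # kept: both columns match
    {"country": "USA", "sector": "Agriculture"},    # dropped: sector not accepted
    {"country": "Brazil", "sector": "Services"},    # dropped: country not accepted
    {"country": "Mexico", "sector": "Services"},    # kept: both columns match
]
kept = [row for row in sample_rows if row_matches(row, filters)]
```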

Selecting specific columns

Use filterVariables to keep only certain value columns in the output. Entity and date columns are always returned automatically.

df = alphacast.datasets.dataset(6755).download_data(
    format="pandas",
    filterVariables=["GDP_USD_Billions", "Inflation_YoY"],
)

Column names must match the names in the dataset schema exactly (case-sensitive). Use get_column_definitions() to inspect the available columns first.
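Since matching is case-sensitive, it can help to check the requested names before downloading. The sketch below assumes you have already extracted a plain list of column names from get_column_definitions(). This page does not describe that method's return shape, so the helper takes a list of strings. The name check_columns is hypothetical.

```python
def check_columns(requested: list, available: list) -> dict:
    """Map each requested name that is not an exact match to a suggestion (or None).

    A suggestion is an available column that differs only by letter case.
    """
    available_set = set(available)
    by_lower = {name.lower(): name for name in available}
    problems = {}
    for name in requested:
        if name in available_set:
            continue  # exact, case-sensitive match: nothing to report
        problems[name] = by_lower.get(name.lower())  # None when nothing similar exists
    return problems


# Made-up schema for illustration.
available_columns = ["GDP_USD_Billions", "Inflation_YoY"]
issues = check_columns(
    ["gdp_usd_billions", "Inflation_YoY", "Unemployment"],
    available_columns,
)
```

An empty result means every requested name matches the schema exactly and can be passed to filterVariables as is.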

Combining filters

All three filters compose. The example below downloads US and Argentine GDP from 2020 onwards as a DataFrame:

import datetime as dt

df = alphacast.datasets.dataset(6755).download_data(
    format="pandas",
    startDate=dt.date(2020, 1, 1),
    filterEntities={"country": ["USA", "Argentina"]},
    filterVariables=["GDP_USD_Billions"],
)

Filters are applied server-side via OData query parameters, so only the matching rows travel over the network.

Method signature

download_data(
    format="csv",
    startDate=None,
    endDate=None,
    filterVariables=[],
    filterEntities=[],
)

format: str, default "csv"

One of "csv", "json", "xlsx", "tsv", or "pandas". "csv", "xlsx", and "tsv" return raw bytes, "json" returns a list of dicts, and "pandas" returns a DataFrame.

startDate: datetime.date

Inclusive lower bound for the date column. The SDK detects the date column automatically.

endDate: datetime.date

Inclusive upper bound for the date column.

filterVariables: list[str]

Column names to include. Entity and date columns are always included even if not listed.

filterEntities: dict[str, list[str]]

Mapping of entity column name to accepted values. Multiple keys are AND’d; values within a key are OR’d.

Public datasets

You can download from any public dataset on Alphacast — even ones you don’t own — by passing its ID. Find the ID in the URL on alphacast.io (e.g. /datasets/5208/... → ID 5208).

df = alphacast.datasets.dataset(5208).download_data(format="pandas")
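If you start from a copied URL, you can pull the ID out programmatically. The sketch below relies only on the /datasets/<ID>/... path shape shown above. The helper dataset_id_from_url and the example slug are made up for illustration.

```python
import re
from typing import Optional


def dataset_id_from_url(url: str) -> Optional[int]:
    """Return the numeric ID that follows /datasets/ in the URL, or None if absent."""
    # The lookahead ensures the digits are a full path segment.
    match = re.search(r"/datasets/(\d+)(?=[/?#]|$)", url)
    return int(match.group(1)) if match else None


# The slug below is a placeholder, not a real page.
dataset_id = dataset_id_from_url("https://www.alphacast.io/datasets/5208/example-slug")

# The ID can then be passed to dataset():
# df = alphacast.datasets.dataset(dataset_id).download_data(format="pandas")
```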

Next steps

Discover dataset IDs across the catalog with Search.
Read the download_data REST reference for the underlying OData parameters.
