File Access

This guide demonstrates how to access Jua's weather forecasts through our HTTP file server at https://data.jua.ai/forecasts/.

For information on when to use the API vs. File access, please refer to the Forecast README.

Authentication

To access the file server, use the same API key as for the API, which consists of a key ID and a secret. The file server uses HTTP Basic authentication, with the key ID as the username and the secret as the password.
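Clients other than the Python examples in this guide can send the credentials as a standard Basic `Authorization` header. Below is a minimal sketch using only the Python standard library. It assumes the key ID is the username and the secret is the password, matching the `BasicAuth(API_KEY_ID, API_KEY_SECRET)` setup used in the examples. `basic_auth_header` is a hypothetical helper, not part of any Jua library.

```python
import base64


def basic_auth_header(api_key_id: str, api_key_secret: str) -> dict:
    """Build an HTTP Basic Authorization header from an API key ID and secret."""
    # RFC 7617: base64-encode "<username>:<password>" and prefix it with "Basic "
    credentials = f"{api_key_id}:{api_key_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example with placeholder credentials (not a real key)
print(basic_auth_header("my-key-id", "my-key-secret"))
```

The resulting dictionary can be passed as the request headers of any HTTP client that talks to the file server.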

File format

We use Xarray-compatible Zarr datasets for our forecasts. These datasets contain all the metadata needed to read them conveniently using the Xarray Python library.

For more help on using Zarr and Xarray, please see the official documentation:

Each Zarr dataset contains the full set of weather parameters for the corresponding forecast data. For the full list of available parameters, please see:

Folder structure

Our file server is located at https://data.jua.ai. All forecasts are uploaded to this server, but the path structure differs between model versions.

Important: The storage format differs between model versions:

For EPT-1.5: Each lead time is stored in a separate Zarr v2 file
For EPT-2: All lead times are stored in a single Zarr v3 file

EPT-1.5 model

For the EPT-1.5 model, each lead time is stored as a separate Zarr file at the following path:

https://data.jua.ai/forecasts/MODEL/RUNDATEHOUR/LEADTIME.zarr

where:

MODEL is the name of the model (for example ept-1.5 or ept-1.5-early)
RUNDATEHOUR is the initialisation date/time in UTC of the forecast in format "YYYYMMDDHH", for example 2025020106
LEADTIME is the number of hours since the initialisation time for this specific predicted timestamp

For example:

https://data.jua.ai/forecasts/ept-1.5/2025020106/24.zarr

where:

MODEL is ept-1.5
RUNDATEHOUR is 2025020106, i.e. the 1st of February 2025 at 06:00 UTC
LEADTIME is 24 (hours)

This dataset therefore contains the weather forecast for the 2nd of February 2025 at 06:00 UTC, as predicted by the forecast initialised at 06:00 UTC on the 1st of February.
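The worked example above can be reproduced in code. Below is a minimal sketch using only the standard library. The helper names `ept15_url` and `valid_time` are hypothetical. The base URL is the file server address given in this guide.

```python
from datetime import datetime, timedelta, timezone

# Base address of the file server, as given in this guide
BASE_URL = "https://data.jua.ai/forecasts"


def ept15_url(model: str, run_date_hour: str, lead_time_hours: int, base_url: str = BASE_URL) -> str:
    """Build the per-lead-time Zarr URL used by the EPT-1.5 layout."""
    return f"{base_url}/{model}/{run_date_hour}/{lead_time_hours}.zarr"


def valid_time(run_date_hour: str, lead_time_hours: int) -> datetime:
    """Return the UTC timestamp that a given lead time of a forecast run refers to."""
    # RUNDATEHOUR is the initialisation time in UTC, formatted as YYYYMMDDHH
    init_time = datetime.strptime(run_date_hour, "%Y%m%d%H").replace(tzinfo=timezone.utc)
    # The lead time is an offset in hours from the initialisation time
    return init_time + timedelta(hours=lead_time_hours)


print(ept15_url("ept-1.5", "2025020106", 24))
# https://data.jua.ai/forecasts/ept-1.5/2025020106/24.zarr
print(valid_time("2025020106", 24).isoformat())
# 2025-02-02T06:00:00+00:00
```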

EPT-2 model

For the EPT-2 model, all lead times for a given forecast run are stored in a single Zarr file at the following path:

https://data.jua.ai/forecasts/MODEL/RUNDATEHOUR.zarr

where:

MODEL is the name of the model (for example ept-2)
RUNDATEHOUR is the initialisation date/time in UTC of the forecast in format "YYYYMMDDHH", for example 2025020106

For example:

https://data.jua.ai/forecasts/ept-2/2025020106.zarr

This single Zarr dataset contains all lead times for the forecast initialised on the 1st of February 2025 at 06:00 UTC.
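Because a single EPT-2 dataset covers every lead time, reading a particular timestamp means selecting the matching lead time within it. Below is a minimal sketch for computing that offset. `lead_time_hours` is a hypothetical helper, and the sketch assumes that lead times are whole, non-negative numbers of hours.

```python
from datetime import datetime, timezone


def lead_time_hours(init_time: datetime, target_time: datetime) -> int:
    """Return the lead time, in whole hours, at which target_time is predicted."""
    delta = target_time - init_time
    hours, remainder = divmod(delta.total_seconds(), 3600)
    # Assumption: lead times are whole hours and never negative
    if remainder or hours < 0:
        raise ValueError("target_time must be a whole number of hours at or after init_time")
    return int(hours)


init_time = datetime(2025, 2, 1, 6, tzinfo=timezone.utc)    # run 2025020106
target_time = datetime(2025, 2, 2, 6, tzinfo=timezone.utc)  # 2nd of February at 06:00 UTC
print(lead_time_hours(init_time, target_time))
# 24
```

The result can then be used as `np.timedelta64(hours, "h")` in `ds.sel(prediction_timedelta=...)`, as in the EPT-2 access example in this guide.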

Accessing Forecast Data

Below are examples of how to authenticate and access forecast data for each model type. The examples require the following Python packages:

xarray
aiohttp
fsspec
zarr
dask
numpy
requests

We recommend installing the dependencies in a Python virtual environment.
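To confirm that the virtual environment has everything the examples import, you can run a quick check with the standard library. Below is a minimal sketch. `missing_packages` is a hypothetical helper, and the list covers every package imported by the examples in this guide.

```python
import importlib.util

# Packages imported by the examples in this guide
REQUIRED_PACKAGES = ["xarray", "aiohttp", "fsspec", "zarr", "dask", "numpy", "requests"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed")
```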

Authentication Setup

First, import the required libraries and set up the authentication used in all examples. The API key ID and secret are read from the JUA_API_KEY_ID and JUA_API_KEY_SECRET environment variables:

import os
from aiohttp import BasicAuth
import xarray as xr
import numpy as np

API_KEY_ID = os.environ["JUA_API_KEY_ID"]
API_KEY_SECRET = os.environ["JUA_API_KEY_SECRET"]
AUTH = BasicAuth(API_KEY_ID, API_KEY_SECRET)

Accessing EPT-2 Forecast Data

For EPT-2, all lead times are stored in a single Zarr file:

# Define the path to the specific forecast
model = "ept-2" # or "ept-1.5-b"
init_time = "2025061112"  # 2025-06-11 at 12:00 UTC
zarr_url = f"https://data.jua.sh/forecasts/{model}/{init_time}.zarr"

ds = xr.open_dataset(
    zarr_url,
    engine="zarr",
    decode_timedelta=True,
    storage_options={"auth": AUTH}
)

# Display basic information
print(ds)

# The dataset already includes all lead times as a dimension
# You can access specific lead times like this:
lead_time = np.timedelta64(24, "h") # 24 hours ahead
ds_at_lead_time = ds.sel(prediction_timedelta=lead_time)

Checking Forecast Availability

When accessing the latest forecast, there is a risk that the Zarr files are only partially written. Attempting to read partially written files can result in errors or inconsistent data.

To make sure that the latest Zarr dataset is complete, use the API to check that the forecast is fully available before reading it.

Below is a partial example of checking whether a forecast is available. For more details, please refer to the API data access guide or the API reference documentation.

from datetime import datetime, timezone
import requests

AUTH_HEADER = {"X-API-Key": f"{API_KEY_ID}:{API_KEY_SECRET}"}

# In this example, we want the first two days' predictions of
# the EPT-1.5 forecast initialised on 2025-05-08 at 06:00 UTC
desired_init_time = datetime(2025, 5, 8, 6, tzinfo=timezone.utc)
desired_forecasted_hours = 48

# Get the latest available forecast metadata from the API
response = requests.get(
    "https://api.jua.ai/v1/forecasting/ept1_5/forecasts/latest",
    headers=AUTH_HEADER,
    timeout=30,
)
response.raise_for_status()
latest = response.json()

# Normalise a trailing "Z" (not parsed by fromisoformat before Python 3.11)
found_init_time = datetime.fromisoformat(latest["init_time"].replace("Z", "+00:00"))
if found_init_time.tzinfo is None:
    found_init_time = found_init_time.replace(tzinfo=timezone.utc)  # assume UTC
available_forecasted_hours = latest["available_forecasted_hours"]

# Check that our desired data is ready to be accessed
if found_init_time > desired_init_time:
    print("Yes, we are accessing a historic forecast, so it should already be fully written")
elif found_init_time == desired_init_time and available_forecasted_hours >= desired_forecasted_hours:
    print("Yes, we are accessing the latest forecast, which already has the desired hours completed")
else:
    print("No, we are trying to access a forecast which is not yet fully available")

When retrieving the most recent forecast, poll the API above for the latest available init_time and available_forecasted_hours, and only read the data once the forecast is available. Also refer to the expected dissemination times.
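The polling described above can be sketched as a small loop around the same availability rule. This sketch assumes that the API response contains the `init_time` and `available_forecasted_hours` fields used in the example above. The helpers `is_forecast_available` and `wait_for_forecast` are hypothetical, and the default polling interval and timeout are arbitrary placeholders.

```python
import time
from datetime import datetime, timezone
from typing import Callable


def is_forecast_available(latest: dict, desired_init_time: datetime, desired_hours: int) -> bool:
    """Apply the availability rule from the example above to one API response.

    desired_init_time must be timezone-aware (UTC).
    """
    found = datetime.fromisoformat(latest["init_time"].replace("Z", "+00:00"))
    if found.tzinfo is None:
        found = found.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    if found > desired_init_time:
        # A newer run has been published, so the desired run should be complete
        return True
    return found == desired_init_time and latest["available_forecasted_hours"] >= desired_hours


def wait_for_forecast(fetch_latest: Callable[[], dict], desired_init_time: datetime,
                      desired_hours: int, poll_seconds: float = 300.0,
                      timeout_seconds: float = 3 * 3600.0) -> bool:
    """Poll fetch_latest() until the forecast is available; give up after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_forecast_available(fetch_latest(), desired_init_time, desired_hours):
            return True
        # Stop instead of sleeping past the deadline
        if time.monotonic() + poll_seconds > deadline:
            return False
        time.sleep(poll_seconds)
```

The fetch function is passed in as a parameter, which keeps the loop independent of the HTTP client. In practice, it would wrap the `requests.get` call shown above and return the parsed JSON.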

Accessing the Legacy EPT-1.5 Forecast Data

We currently support two access types for the EPT-1.5 and EPT-1.5-early models. The standard approach, described above for EPT-2, can also be used to access EPT-1.5 data under the model names ept-1.5-b and ept-1.5-early-b. The legacy approach is described below.

For the legacy EPT-1.5 format, each lead time is stored in a separate Zarr file. Here is how to open one or more lead times with Python:

# Define the path to the specific forecast
model = "ept-1.5"         # or "ept-1.5-early"
init_time = "2025050806"  # 2025-05-08 at 06:00 UTC
lead_time = 24            # 24 hours ahead
zarr_url = f"https://data.jua.sh/forecasts/{model}/{init_time}/{lead_time}.zarr"

# Open the remote dataset
ds = xr.open_dataset(
    zarr_url,
    engine="zarr",
    decode_timedelta=True,
    storage_options={"auth": AUTH}
)

# Print an overview of the dataset
print(ds)

To open several lead times at once and merge them into a single dataset, use xr.open_mfdataset:

# Define the paths to the specific forecasts
model = "ept-1.5"          # or "ept-1.5-early"
init_time = "2025050806"   # 2025-05-08 at 06:00 UTC
lead_times = range(48 + 1) # 0-48 hours ahead
zarr_urls = [
    f"https://data.jua.sh/forecasts/{model}/{init_time}/{lead_time}.zarr"
    for lead_time in lead_times
]

# Open the remote datasets and merge them into one
ds = xr.open_mfdataset(
    zarr_urls,
    engine="zarr",
    decode_timedelta=True,
    parallel=True,
    storage_options={"auth": AUTH}
)

# Print an overview of the dataset
print(ds)

# The dataset now includes all selected lead times as a dimension
# You can access specific lead times like this:
lead_time = np.timedelta64(24, "h") # 24 hours ahead
ds_at_lead_time = ds.sel(prediction_timedelta=lead_time)
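When scripts need to handle both access types, the URL or URLs to open can be chosen from the model name. This sketch assumes that only `ept-1.5` and `ept-1.5-early` use the legacy per-lead-time layout. Every other model name, such as `ept-2`, `ept-1.5-b` and `ept-1.5-early-b`, is assumed to use a single Zarr dataset per run. `forecast_urls` is a hypothetical helper, and its default `base_url` is the file server address given at the top of this guide.

```python
from datetime import datetime, timezone
from typing import Iterable, List

# Assumption: model names that still use the legacy one-Zarr-per-lead-time layout
LEGACY_MODELS = {"ept-1.5", "ept-1.5-early"}


def forecast_urls(model: str, init_time: datetime, lead_times: Iterable[int],
                  base_url: str = "https://data.jua.ai/forecasts") -> List[str]:
    """Return the Zarr URL(s) needed to read the given lead times of one forecast run."""
    if init_time.tzinfo is None:
        raise ValueError("init_time must be timezone-aware (UTC)")
    run_date_hour = init_time.astimezone(timezone.utc).strftime("%Y%m%d%H")
    if model in LEGACY_MODELS:
        # Legacy layout: one Zarr dataset per lead time
        return [f"{base_url}/{model}/{run_date_hour}/{lead}.zarr" for lead in lead_times]
    # Standard layout: one Zarr dataset containing all lead times of the run
    return [f"{base_url}/{model}/{run_date_hour}.zarr"]


init_time = datetime(2025, 5, 8, 6, tzinfo=timezone.utc)
print(len(forecast_urls("ept-1.5", init_time, range(48 + 1))))  # one URL per lead time
print(forecast_urls("ept-1.5-b", init_time, range(48 + 1)))     # a single URL for the run
```

A single returned URL can be opened with `xr.open_dataset` and a list of several with `xr.open_mfdataset`, as in the examples above.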
