FAQs

How can I tell that a Zarr file is complete?

While a forecast is being computed, its Zarr file may not yet be fully uploaded at the moment you list the files. To avoid this race condition, check that the store is complete by testing for the presence of the .zmetadata file:

# `fs` is an fsspec-style filesystem object and `latest_lead_time` is the
# path to the forecast's latest lead-time directory on the file server.
# Verify that the latest lead time has finished uploading.
# Once this file is written, the store is ready to open/download.
print(f"Checking if {latest_lead_time}/.zmetadata exists")
assert fs.exists(f"{latest_lead_time}/.zmetadata")

We write the .zmetadata file last, so if it exists, you are safe to assume that the Zarr store is complete.
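If you need to block until a forecast finishes uploading, the check above can be wrapped in a small polling loop. This is a sketch rather than an official helper; it assumes `fs` is any object with an fsspec-style `.exists()` method and `lead_time_path` points at the lead time's directory:

```python
import time


def wait_until_complete(fs, lead_time_path, timeout=600, poll=10):
    """Poll until .zmetadata appears, signalling the Zarr store is complete.

    `fs` is any object with an fsspec-style .exists() method; returns False
    if the marker has not appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fs.exists(f"{lead_time_path}/.zmetadata"):
            return True
        time.sleep(poll)
    return False
```

Tune `timeout` and `poll` to how long your pipeline can afford to wait between forecast runs.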

How should I set up my Python environment?

We recommend using Python 3.11 or newer in a virtual environment. The following commands are enough to set up a virtual environment with all of the recommended packages on macOS or Linux:

python3 -m venv jua-venv
source jua-venv/bin/activate
pip install "xarray[complete]" "dask[complete]" zarr

Alternatively, we recommend uv as a faster and more reliable package manager. You can set up a virtual environment with all the recommended packages as follows:

pip install uv
uv venv jua-venv
source jua-venv/bin/activate
uv pip install "xarray[complete]" "dask[complete]" zarr

For more help, see the official documentation for virtualenv and the individual packages listed above.

The Python library ecosystem evolves rapidly, so depending on the exact versions of the tools you use, APIs may differ subtly.

All of the code examples have been tested with the following versions:

python = "^3.11"
xarray = "^2025.03.0"
cartopy = "^0.24.1"
cmocean = "^4.0.3"
zarr = "^3.0.0"
dask = "^2025.1.0"

Zarr

Zarr is a file format, similar to NetCDF, GRIB, and GeoTIFF: all are storage formats for n-dimensional arrays, commonly used for geospatial data.

It is a modern, open-source, cloud-native data format that is well suited to large datasets, thanks to its support for parallel access patterns via chunking.

Because Zarr is optimised for the cloud, you can also treat it as an API and use it in much the same way you would use a REST API.

See https://zarr.dev/

You can use the zarr-python library to access Zarr stores either locally or remotely, though it is more typical to access them through Xarray.

Xarray

Xarray is a Python library for working with labeled multi-dimensional arrays. Its data model is based on NetCDF, its API is heavily inspired by Pandas, and it is built on top of NumPy.

You typically load data into Xarray from NetCDF or Zarr files, which is as simple as:

import xarray as xr

# Any fsspec-compatible URL works here, e.g. an s3:// or gs:// path
zarr_file_loc = "s3://my-bucket/my-file.zarr"
data = xr.open_zarr(zarr_file_loc)

When we load a Zarr file we get a Dataset, a dict-like object that wraps one or more DataArrays.

Each DataArray in turn wraps a NumPy-style n-dimensional array together with its associated metadata.

From here we can continue to work with our data in Xarray, or export it to NetCDF, Pandas, or any other tooling of choice.
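For instance, with a tiny synthetic Dataset standing in for a forecast file (the variable and coordinate names here are made up, not our schema), you can select the grid point nearest a requested location and export to pandas:

```python
import numpy as np
import xarray as xr

# A tiny synthetic Dataset standing in for a forecast file.
ds = xr.Dataset(
    {"air_temperature": (("latitude", "longitude"), 20.0 + np.zeros((3, 4)))},
    coords={"latitude": [50.0, 50.1, 50.2], "longitude": [4.0, 4.1, 4.2, 4.3]},
)

# Snap to the grid point nearest a requested location...
point = ds.sel(latitude=50.07, longitude=4.12, method="nearest")

# ...or keep a dimension and export to a pandas DataFrame.
df = ds.sel(latitude=50.07, method="nearest").to_dataframe()
```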

Xarray vs Pandas

Xarray's API is heavily influenced by Pandas, so if you are familiar with Pandas, Xarray is easy to pick up. Some key differences:

  • Pandas does not support labeled dimensions; Xarray does

  • Pandas only supports two-dimensional DataFrames; Xarray supports n-dimensional DataArrays, multiple of which can be contained within a Dataset
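A minimal side-by-side sketch of those differences:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Pandas: a 2-D DataFrame with labeled rows and columns, but unnamed axes.
df = pd.DataFrame(np.zeros((2, 3)), index=["a", "b"], columns=["x", "y", "z"])

# Xarray: an n-D DataArray with *named* dimensions.
da = xr.DataArray(
    np.zeros((2, 3, 4)),
    dims=("time", "latitude", "longitude"),
)

# Selection by dimension name rather than by axis position:
first_step = da.isel(time=0)
```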

Xarray vs Numpy

Xarray is built on top of NumPy, so again, if you are familiar with NumPy then Xarray should be easy to pick up.

When are forecasts made available?

Our forecasts are delivered 4 times per day per model. For detailed dissemination times, please see our Models and Products documentation and Dissemination Times page.

Which parameters do you expose?

We provide a range of weather parameters including temperature, wind speed, solar radiation, and more. For the complete list with units and descriptions, see our parameters documentation.

How big are the forecast and hindcast files?

Each Zarr file contains a single lead time (prediction hour) for all parameters across the entire model domain. For EPT-1.5:

  • Forecast files: ~40MB per variable per lead time for the full globe (~150GB for all variables for a full 20-day forecast)

  • Hindcast files: ~1.5TB for a full year, 4 runs a day, all variables for Europe

The data is stored as compressed float32, so the amount you actually download will be smaller than the figures listed above.

Zarr's chunked access means you never need to download the full globe's worth of data just to read the region you are interested in.

Why does the API return different points to what I request?

Our models operate on a fixed grid with a resolution of 0.081° (~9km at the equator). When you request a specific point, we return data from the nearest grid point. The API response includes both the requested coordinates and the actual returned coordinates (returned_latlon) for transparency.
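To build intuition for the snapping behaviour, here is an illustrative calculation only; it assumes a grid anchored at 0°, which may not match the model's actual grid origin, so treat the API's returned_latlon as the source of truth:

```python
# Illustrative only: snapping a requested coordinate to a 0.081-degree grid,
# assuming the grid is anchored at 0 degrees (the real origin may differ).
RESOLUTION = 0.081


def nearest_grid_point(value, resolution=RESOLUTION):
    """Return the grid coordinate nearest to the requested value."""
    return round(round(value / resolution) * resolution, 6)
```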

How can I access previous forecasts?

Previous forecasts remain available on our file server and can be accessed using the same folder structure as real-time forecasts. See our guide on Historic Forecasts for details and examples.

How far back do your historical archives go?

We maintain two types of historical data:

  1. Recent Forecasts: All forecasts since operational launch remain accessible through our file server

  2. Hindcast Data: Our hindcast datasets cover specific periods and regions. See our Hindcast Documentation for current coverage details.

How far back does the live forecast bucket go?

Our live forecast bucket contains data from January 30th, 2025 06:00 UTC onwards for both EPT-1.5 and EPT-1.5 Early models.

For more details about accessing historical forecasts, see our Forecast Documentation.

What's the difference between EPT-1.5 and EPT-1.5 Early?

Both models use a similar underlying architecture, but EPT-1.5 Early provides earlier dissemination with different model characteristics. For specific details and dissemination times, see our Models and Products documentation and Dissemination Times page.

Where can I find data for backtesting?

We offer two options for backtesting:

  1. Recent Historical Forecasts: Access previous operational forecasts through our file server. See Historic Forecasts.

  2. Hindcast Datasets: Large multi-year datasets specifically designed for backtesting. See our Hindcast Documentation for coverage and access details.


Can't find the answer you're looking for?

If you have any questions that aren't covered here, please don't hesitate to reach out to our support team at [email protected]. We're here to help!
