# Query Structure

### Overview

A forecast query to the data endpoint is a JSON object that specifies:

1. **What data** you want (models, variables)
2. **Where** you want it (geographic filter)
3. **When** you want it (temporal filter)
4. **How** to process it (aggregation, grouping, weighting)
5. **How** to return it (format, ordering, pagination)

#### Basic Query Structure

```json
{
  "models": ["ept2"],
  "geo": { ... },
  "init_time": "latest",
  "variables": [ ... ],
  "prediction_timedelta": { ... },
  ...
}
```

***

### Required Parameters

Every query must include these three parameters:

#### `models` (required)

List of forecast model identifiers to query.

**Type:** `array of strings`\
**Minimum length:** 1

```json
{
  "models": ["ept2"]
}
```

**Multiple models:**

```json
{
  "models": ["ept2", "aifs", "ecmwf_ifs_single"]
}
```

You can find an overview of the available models [here](/models-and-products/models-and-products.md).

{% hint style="info" %}
Use `GET /v1/forecast/meta` to fetch available models and supported variables programmatically.
{% endhint %}

***

#### `geo` (required)

Geographic filter specifying where to query data.

**Type:** `object`\
**Structure:**

```json
{
  "type": "point" | "bounding_box" | "polygon" | "market_zone" | "country_key",
  "value": ...,
  "method": "nearest" | "bilinear"  // Only for type="point"
}
```

See the Geographic Filtering section below for details on each type.

***

#### `init_time` (required)

Forecast initialization time(s).

**Type:** `string | array | object`

**Options:**

1. **Latest forecast:**

   ```json
   { "init_time": "latest" }
   ```
2. **Specific datetime** (ISO 8601 format):

   ```json
   { "init_time": "2025-10-22T00:00:00Z" }
   { "init_time": "2025-10-22 00:00:00" }
   ```
3. **List of datetimes:**

   ```json
   {
     "init_time": ["2025-10-22T00:00:00Z", "2025-10-22T12:00:00Z"]
   }
   ```
4. **Time range:**

   ```json
   {
     "init_time": {
       "start": "2025-10-01T00:00:00Z",
       "end": "2025-10-31T23:59:59Z"
     }
   }
   ```
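
The three required parameters can be assembled and checked client-side before a request is sent. The following is a minimal sketch in Python; `build_query` is a hypothetical helper (not part of any official client), and it only builds the JSON body without sending it anywhere.

```python
import json


def build_query(models, geo, init_time, **optional):
    """Assemble a query body (hypothetical helper, not an official client)."""
    if not isinstance(models, list) or len(models) < 1:
        raise ValueError("models must be a non-empty list of strings")
    query = {"models": models, "geo": geo, "init_time": init_time}
    # Optional parameters (variables, prediction_timedelta, ...) pass through unchanged.
    query.update(optional)
    return query


query = build_query(
    models=["ept2"],
    geo={"type": "point", "value": [47.37, 8.54], "method": "nearest"},
    init_time={"start": "2025-10-01T00:00:00Z", "end": "2025-10-31T23:59:59Z"},
    variables=["air_temperature_at_height_level_2m"],
)
body = json.dumps(query)  # JSON string to use as the request body
print(body)
```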

***

### Geographic Filtering

The `geo` parameter supports five types of geographic queries.

#### 1. Point Query

Query at specific coordinate(s).

**Coordinates:** `[latitude, longitude]`

**Single point:**

```json
{
  "type": "point",
  "value": [47.37, 8.54], // Zurich [lat, lon]
  "method": "nearest" // or "bilinear"
}
```

**Multiple points:**

```json
{
  "type": "point",
  "value": [
    [52.52, 13.4], // Berlin
    [47.37, 8.54], // Zurich
    [48.86, 2.35] // Paris
  ],
  "method": "nearest"
}
```

**Interpolation methods:**

* `"nearest"` (default): Returns value from nearest grid point
* `"bilinear"`: Interpolates between 4 surrounding grid points

***

#### 2. Bounding Box

Query a rectangular region.

**Format:** `[[lat_min, lon_min], [lat_max, lon_max]]`

```json
{
  "type": "bounding_box",
  "value": [
    [45.0, 5.0],
    [50.0, 15.0]
  ]
}
```

This queries all grid points within the rectangle from (45°N, 5°E) to (50°N, 15°E).

{% hint style="warning" %}
Large bounding boxes return many grid points. Consider using `group_by` with `aggregation` to aggregate data over the region.

Large requests might get blocked by the `request_credit_limit`, which helps avoid large costs but can be increased for intentionally querying large amounts of raw data.
{% endhint %}
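
To get a feel for how quickly a bounding box grows, the sketch below counts the points of a regular latitude/longitude grid that fall inside a box. The 0.25° spacing and the assumption that grid points sit at multiples of the spacing are hypothetical choices for illustration; the actual grid depends on the model.

```python
import math


def estimate_grid_points(bbox, spacing_deg):
    """Count regular-grid points inside [[lat_min, lon_min], [lat_max, lon_max]].

    Assumes grid points sit at multiples of spacing_deg (illustrative assumption).
    """
    (lat_min, lon_min), (lat_max, lon_max) = bbox
    eps = 1e-9  # tolerance for floating-point rounding at the box edges
    n_lat = math.floor(lat_max / spacing_deg + eps) - math.ceil(lat_min / spacing_deg - eps) + 1
    n_lon = math.floor(lon_max / spacing_deg + eps) - math.ceil(lon_min / spacing_deg - eps) + 1
    return max(n_lat, 0) * max(n_lon, 0)


bbox = [[45.0, 5.0], [50.0, 15.0]]
print(estimate_grid_points(bbox, 0.25))  # 21 latitudes x 41 longitudes = 861
```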

***

#### 3. Polygon

Query data within a custom polygon area.

**Format:** Array of `[latitude, longitude]` coordinates defining the polygon boundary.

```json
{
  "type": "polygon",
  "value": [
    [45.82, 5.96], // Southwest corner
    [45.82, 10.49], // Southeast corner
    [47.81, 10.49], // Northeast corner
    [47.81, 5.96], // Northwest corner
    [45.82, 5.96] // Close the polygon (repeat first point)
  ]
}
```

**Requirements:**

* Minimum 4 points (3 unique vertices + closing point)
* Polygon should follow counter-clockwise ordering
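
The requirements above can be checked locally before sending a query. This is a minimal sketch, assuming that orientation is judged with longitude as the x-axis and latitude as the y-axis on a flat plane; `check_polygon` is a hypothetical helper and the API may validate differently.

```python
def check_polygon(points):
    """Check a list of [lat, lon] points against the polygon requirements."""
    problems = []
    if len(points) < 4:
        problems.append("need at least 4 points (3 unique vertices + closing point)")
    if points and points[0] != points[-1]:
        problems.append("first and last point must be identical")
    # Shoelace formula with x = longitude, y = latitude (assumed convention):
    # a positive signed area means counter-clockwise ordering.
    area2 = sum(
        lon0 * lat1 - lon1 * lat0
        for (lat0, lon0), (lat1, lon1) in zip(points, points[1:])
    )
    if area2 <= 0:
        problems.append("vertices are not in counter-clockwise order")
    return problems


polygon = [
    [45.82, 5.96],
    [45.82, 10.49],
    [47.81, 10.49],
    [47.81, 5.96],
    [45.82, 5.96],
]
print(check_polygon(polygon))        # no problems for the example polygon
print(check_polygon(polygon[::-1]))  # reversed order is clockwise
```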

{% hint style="warning" %}
Polygons covering a large area return many grid points. Consider using `group_by` with `aggregation` to aggregate data over the region.

Large requests might get blocked by the `request_credit_limit`, which helps avoid large costs but can be increased for intentionally querying large amounts of raw data.
{% endhint %}

***

#### 4. Market Zone

Query data for predefined energy market zones.

**Format:** String or array of market zone codes

```json
{
  "type": "market_zone",
  "value": "DE" // Germany
}
```

**Multiple market zones:**

```json
{
  "type": "market_zone",
  "value": ["IR", "GB-NIR"]
}
```

{% hint style="warning" %}
Market zones covering a large area return many grid points. Consider using `group_by` with `aggregation` to aggregate data over the region.

Large requests might get blocked by the `request_credit_limit`, which helps avoid large costs but can be increased for intentionally querying large amounts of raw data.
{% endhint %}

***

#### 5. Country

Query data for entire countries.

**Format:** String or array of ISO country codes

```json
{
  "type": "country_key",
  "value": "DE"
}
```

**Multiple countries:**

```json
{
  "type": "country_key",
  "value": ["DE", "FR", "US"]
}
```

{% hint style="warning" %}
Querying countries returns many grid points. Consider using `group_by` with `aggregation` to aggregate data over the region.

Large requests might get blocked by the `request_credit_limit`, which helps avoid large costs but can be increased for intentionally querying large amounts of raw data.
{% endhint %}

***

### Temporal Filtering

Control which forecast times to retrieve using temporal filters.

#### Forecast Time Concepts

* **`init_time`** (required): When the forecast was generated
* **`prediction_timedelta`**: How far ahead from init\_time (lead time)
* **`time`**: Absolute forecast valid time (`init_time + prediction_timedelta`)
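
The relationship between the three concepts is plain datetime arithmetic. The sketch below uses Python's standard library with example values; it does not call the API.

```python
from datetime import datetime, timedelta, timezone

init_time = datetime(2025, 10, 22, 0, 0, tzinfo=timezone.utc)
lead_times_h = [0, 6, 12, 18, 24]  # prediction_timedelta values in hours

# time = init_time + prediction_timedelta
valid_times = [init_time + timedelta(hours=h) for h in lead_times_h]
for h, t in zip(lead_times_h, valid_times):
    print(f"+{h:>2}h -> {t.isoformat()}")
```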

#### `prediction_timedelta` (optional)

Forecast lead time from initialization.

**Units:** Controlled by `timedelta_unit` parameter (default: `"h"` for hours)

{% hint style="info" %}
The `timedelta_unit` also affects the unit in which `prediction_timedelta` is returned.
{% endhint %}

**Single value:**

```json
{
  "prediction_timedelta": 24,
  "timedelta_unit": "h" // 24 hours ahead
}
```

**List of values:**

```json
{
  "prediction_timedelta": [0, 6, 12, 18, 24],
  "timedelta_unit": "h"
}
```

**Range:**

```json
{
  "prediction_timedelta": {
    "start": 0,
    "end": 168 // 0 to 168 hours (7 days)
  },
  "timedelta_unit": "h"
}
```

**If omitted:** Returns all available lead times for the specified init\_time(s).

{% hint style="warning" %}
Requesting longer time periods results in higher costs. Only request the time period you are interested in to save costs.
{% endhint %}

***

#### `time` (optional)

Filter by absolute forecast valid time.

**Single datetime:**

```json
{
  "time": "2025-10-22T12:00:00Z"
}
```

**List of datetimes:**

```json
{
  "time": [
    "2025-10-22T00:00:00Z",
    "2025-10-22T06:00:00Z",
    "2025-10-22T12:00:00Z"
  ]
}
```

**Time range:**

```json
{
  "time": {
    "start": "2025-10-22T00:00:00Z",
    "end": "2025-10-23T00:00:00Z"
  }
}
```

{% hint style="info" %}
`time` and `prediction_timedelta` are complementary ways to filter temporal data. You can use either or both.

You can combine `time` with multiple `init_time` values to compare how the forecast for a given point in time has changed between runs.
{% endhint %}
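
As an illustration of that comparison, the sketch below builds a query body that pins one valid time and lists several initialization times. The specific init times are example values and assume the model has runs at those times; only parameters described on this page are used.

```python
import json

# Three consecutive runs, all filtered to the same valid time.
query = {
    "models": ["ept2"],
    "geo": {"type": "point", "value": [47.37, 8.54]},
    "init_time": [
        "2025-10-21T00:00:00Z",
        "2025-10-21T12:00:00Z",
        "2025-10-22T00:00:00Z",
    ],
    "time": "2025-10-22T12:00:00Z",
    "variables": ["air_temperature_at_height_level_2m"],
    "order_by": ["init_time"],
}
print(json.dumps(query, indent=2))
```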

***

#### `timedelta_unit` (optional)

Units for `prediction_timedelta` and `latest_min_prediction_timedelta`.

**Type:** `string`\
**Default:** `"h"` (hours)

**Supported values:**

* `"m"`, `"minute"`, `"minutes"` - Minutes
* `"h"`, `"hour"`, `"hours"` - Hours
* `"d"`, `"day"`, `"days"` - Days

**Example:**

```json
{
  "prediction_timedelta": {
    "start": 0,
    "end": 7
  },
  "timedelta_unit": "d" // 0 to 7 days
}
```

***

#### `latest_min_prediction_timedelta` (optional)

When using `init_time: "latest"`, only include forecasts with at least this much lead time available.

**Example:**

```json
{
  "init_time": "latest",
  "latest_min_prediction_timedelta": 24,
  "timedelta_unit": "h" // Only use latest forecast if it has ≥24h lead time
}
```

This is useful when you need a minimum forecast horizon regardless of when the forecast was generated.

***

### Variable Selection

#### `variables` (optional)

Weather variables to retrieve.

**Type:** `array of strings`\
**Default:** All variables common to the selected models

```json
{
  "variables": [
    "air_temperature_at_height_level_2m",
    "wind_speed_at_height_level_100m",
    "precipitation_amount_sum_1h"
  ]
}
```

**Common variables:**

* `air_temperature_at_height_level_2m` - Temperature at 2m (Kelvin)
* `wind_speed_at_height_level_10m` - Wind speed at 10m (m/s)
* `wind_speed_at_height_level_100m` - Wind speed at 100m (m/s)
* `relative_humidity_at_height_level_2m` - Relative humidity (%)
* `precipitation_amount_sum_1h` - 1-hour accumulated precipitation (mm)
* `surface_downwelling_shortwave_flux_sum_1h` - Solar radiation (W/m²)
* `air_pressure_at_mean_sea_level` - Sea level pressure (Pa)
* And many more...

{% hint style="info" %}
You can always query `GET /v1/forecast/meta` to check the available variables per model.
{% endhint %}

**If omitted:** Returns all variables common to the selected models.

{% hint style="warning" %}
Only request the variables you are interested in to save costs.

Some variables such as solar and wind at 100m are only available in a Pro subscription or higher.
{% endhint %}

***

### Aggregation and Grouping

Aggregate data across space and/or time using `group_by` and `aggregation`.

#### Concept

* **`group_by`**: Dimensions to preserve in the result
* **`aggregation`**: How to aggregate values within each group

#### `group_by` (optional)

Dimensions to group by.

**Type:** `array of strings`

**Supported dimensions:**

* `"model"` - Model identifier
* `"init_time"` - Forecast initialization time
* `"time"` - Forecast valid time
* `"prediction_timedelta"` - Lead time
* `"latitude"` - Latitude coordinate
* `"longitude"` - Longitude coordinate
* `"point"` - Equivalent to setting both `latitude` and `longitude`. Useful for ensemble models where you are interested in statistics per location (e.g. min, max, std)
* `"market_zone"` - Market zone (when using geo type market\_zone)
* `"country_key"` - Country (when using geo type country\_key)

**Time transformations:**

* `"time__to_start_of(hour)"` or `"hourly"` - Grouped by start of hour
* `"time__to_start_of(day)"` or `"daily"` - Grouped by start of day
* `"time__to_start_of(week)"` or `"weekly"` - Grouped by start of week
* `"time__to_start_of(month)"` or `"monthly"` - Grouped by start of month
* `"time__to_start_of(year)"` or `"yearly"` - Grouped by start of year
* Same transformations available for `init_time`

{% hint style="info" %}
You can use time transformations to get, for example, the daily min and max temperature for a given location. Make sure to set `time_zone` so that the data is grouped by the start of day in the region you are interested in.
{% endhint %}

**Example - Average over all grid points for each valid time:**

```json
{
  "group_by": ["model", "init_time", "time"],
  "aggregation": ["avg"]
}
```

**Example - Daily min & max:**

```json
{
  "group_by": ["model", "daily"], // "daily" is shortcut for "time__to_start_of(day)"
  "aggregation": ["min", "max"]
}
```

{% hint style="info" %}
An aggregation is always applied when using `group_by`. If `aggregation` is not set, `avg` is used by default.
{% endhint %}
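
Conceptually, `group_by` with `aggregation` behaves like a grouped reduction over the ungrouped rows. The sketch below reproduces the daily min and max example locally in plain Python; the rows are made-up values and the code only illustrates the idea, not how the API computes it.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical ungrouped rows: (model, valid time, temperature in K).
rows = [
    ("ept2", "2025-10-22T03:00:00+00:00", 281.2),
    ("ept2", "2025-10-22T14:00:00+00:00", 288.9),
    ("ept2", "2025-10-23T04:00:00+00:00", 280.1),
    ("ept2", "2025-10-23T15:00:00+00:00", 287.4),
]

groups = defaultdict(list)
for model, time_str, value in rows:
    # Counterpart of "daily" / "time__to_start_of(day)": keep only the calendar day.
    day = datetime.fromisoformat(time_str).date().isoformat()
    groups[(model, day)].append(value)

# Counterpart of "aggregation": ["min", "max"], applied within each group.
daily = {key: {"min": min(vals), "max": max(vals)} for key, vals in groups.items()}
print(daily)
```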

***

#### `aggregation` (optional)

Aggregation functions to apply when grouping.

**Type:** `array of objects or strings`

**Supported functions:**

* `"avg"` - Average
* `"std"` - Standard deviation
* `"min"` - Minimum
* `"max"` - Maximum
* `"sum"` - Sum
* `"count"` - Count
* `"median"` - Median
* `"quantile"` - Quantile (requires parameter)

**Simple aggregation (all variables):**

```json
{
  "aggregation": ["avg"]
}
```

**Multiple aggregations:**

```json
{
  "aggregation": ["avg", "std", "min", "max"]
}
```

**Variable-specific aggregation:**

```json
{
  "aggregation": [
    {
      "aggregation": "avg",
      "variables": ["air_temperature_at_height_level_2m"]
    },
    {
      "aggregation": "max",
      "variables": ["wind_speed_at_height_level_100m"]
    }
  ]
}
```

**Parameterized aggregation (quantile):**

```json
{
  "aggregation": [
    {
      "aggregation": "quantile",
      "parameters": [0.95], // 95th percentile
      "variables": ["wind_speed_at_height_level_100m"]
    }
  ]
}
```

**Short syntax for parameterized aggregation:**

```json
{
  "aggregation": ["quantile_(0.95)__wind_speed_at_height_level_100m"]
}
```
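
The object form and the short syntax carry the same information, so one can be generated from the other. The helpers below are hypothetical and only cover the quantile pattern shown above; the 0 to 1 range check is an assumption based on the `0.95` example.

```python
def quantile_long(q, variable):
    """Build the object form of a quantile aggregation for one variable."""
    return {"aggregation": "quantile", "parameters": [q], "variables": [variable]}


def quantile_short(q, variable):
    """Build the short-syntax string for the same aggregation."""
    if not 0 <= q <= 1:
        raise ValueError("quantile must be between 0 and 1 (assumed range)")
    return f"quantile_({q})__{variable}"


variable = "wind_speed_at_height_level_100m"
print(quantile_short(0.95, variable))
print(quantile_long(0.95, variable))
```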

{% hint style="warning" %}
`group_by` is required when `aggregation` is set.
{% endhint %}

***

#### Common Aggregation Patterns

**1. Spatial average at each timestep:**

```json
{
  "geo": { "type": "market_zone", "value": "DE" },
  "group_by": ["model", "init_time", "time"],
  "aggregation": ["avg"]
}
```

**2. Daily maximum over a region:**

```json
{
  "geo": {
    "type": "bounding_box",
    "value": [
      [45.0, 5.0],
      [50.0, 15.0]
    ]
  },
  "group_by": ["model", "daily"],
  "aggregation": ["max"]
}
```

**3. Statistics at a specific location (for ensemble models, e.g. `ept2_e`):**

```json
{
  "geo": { "type": "point", "value": [47.37, 8.54] },
  "group_by": ["model", "latitude", "longitude", "time"],
  "aggregation": ["avg", "std", "min", "max"]
}
```

***

### Weighting

Apply weighted aggregation based on capacity or population distribution.

#### `weighting` (optional)

**Type:** `object`

**Structure:**

```json
{
  "type": "wind_capacity" | "solar_capacity" | "population"
}
```

**Weighting types:**

1. **`wind_capacity`** - Weight by installed wind power capacity

   ```json
   {
     "weighting": { "type": "wind_capacity" }
   }
   ```
2. **`solar_capacity`** - Weight by installed solar power capacity

   ```json
   {
     "weighting": { "type": "solar_capacity" }
   }
   ```
3. **`population`** - Weight by population density

   ```json
   {
     "weighting": { "type": "population" }
   }
   ```

**Example - Wind capacity-weighted average:**

```json
{
  "models": ["ept2"],
  "geo": { "type": "market_zone", "value": "DE" },
  "init_time": "latest",
  "variables": ["wind_speed_at_height_level_100m"],
  "prediction_timedelta": { "start": 0, "end": 72 },
  "weighting": { "type": "wind_capacity" },
  "group_by": ["model", "init_time", "time"],
  "aggregation": ["avg"]
}
```

This computes the wind speed weighted by where wind turbines are located, giving more weight to regions with higher wind capacity and less weight to regions without production capacity.

{% hint style="warning" %}
Weighting only applies when using `aggregation` with `"avg"`. Other aggregation functions ignore the weighting parameter.
{% endhint %}
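
The effect of weighting on an average can be seen with a small local calculation. The wind speeds and capacities below are made-up values, and how the API derives and normalizes its weights is not specified here; the sketch only shows the general weighted-mean formula.

```python
def weighted_average(values, weights):
    """Return sum(v * w) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


wind_speed = [4.0, 8.0, 12.0]   # hypothetical m/s at three grid points
capacity = [0.0, 100.0, 300.0]  # hypothetical installed wind capacity per grid point

plain = sum(wind_speed) / len(wind_speed)
weighted = weighted_average(wind_speed, capacity)
print(plain)     # 8.0
print(weighted)  # 11.0, pulled toward the high-capacity grid point
```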

***

### Output Control

Control how results are formatted and returned.

#### `include_time` (optional)

Include the forecast valid time column in results.

**Type:** `boolean`\
**Default:** `false`

```json
{
  "include_time": true
}
```

**Result with `include_time: true`:**

```json
{
  "model": ["ept2", "ept2"],
  "init_time": ["2025-10-22T00:00:00Z", "2025-10-22T00:00:00Z"],
  "prediction_timedelta": [1, 2],
  "time": ["2025-10-22T01:00:00Z", "2025-10-22T02:00:00Z"],
  "air_temperature_at_height_level_2m": [285.3, 284.8]
}
```

{% hint style="info" %}
`include_time` is automatically set to `true` if `time` is in `group_by` or `order_by`.
{% endhint %}

***

#### `time_zone` (optional)

IANA time zone for time formatting.

**Type:** `string`\
**Default:** `"UTC"`

```json
{
  "time_zone": "Europe/Berlin"
}
```

**Common time zones:**

* `"UTC"` (default)
* `"Europe/Berlin"`
* `"America/New_York"`
* `"America/Los_Angeles"`
* `"Asia/Tokyo"`

All `time` values in responses are formatted in the specified time zone.

{% hint style="warning" %}
The time zone is **not** applied to `init_time`, which is always in UTC.
{% endhint %}

***

#### `order_by` (optional)

Sort results by specific dimensions.

**Type:** `array of strings`

```json
{
  "order_by": ["model", "init_time", "prediction_timedelta"]
}
```

**Sortable dimensions:**

* `"model"`
* `"init_time"`
* `"time"`
* `"prediction_timedelta"`
* `"latitude"`
* `"longitude"`
* Any variable name (e.g., `"air_temperature_at_height_level_2m"`)

**Constraints:**

* When using `group_by`, `order_by` dimensions must be in the `group_by` list
* Required when using `pagination`

***

#### `pagination` (optional)

Limit the number of results returned.

**Type:** `object`

**Structure:**

```json
{
  "limit": 1000, // Max rows to return
  "offset": 0 // Number of rows to skip
}
```

**Example - Get first 1000 rows:**

```json
{
  "order_by": ["model", "init_time", "time"],
  "pagination": {
    "limit": 1000,
    "offset": 0
  }
}
```

**Example - Get next 1000 rows:**

```json
{
  "order_by": ["model", "init_time", "time"],
  "pagination": {
    "limit": 1000,
    "offset": 1000
  }
}
```

{% hint style="warning" %}
`order_by` is required when using `pagination`.
{% endhint %}
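
Paging through a large result amounts to repeating the same query with an increasing `offset`. The following is a minimal sketch: `fetch_page` is a hypothetical caller-supplied function that sends one query and returns its rows as a list, and a stand-in is used here in place of a real request.

```python
def fetch_all(base_query, fetch_page, limit=1000):
    """Collect all rows by increasing the offset until a short page is returned."""
    if "order_by" not in base_query:
        raise ValueError("order_by is required when using pagination")
    rows, offset = [], 0
    while True:
        query = {**base_query, "pagination": {"limit": limit, "offset": offset}}
        page = fetch_page(query)
        rows.extend(page)
        if len(page) < limit:  # a short (or empty) page means the last page was reached
            return rows
        offset += limit


# Stand-in for a real request: serves slices of 2500 fake rows.
fake_rows = list(range(2500))


def fake_fetch(query):
    page = query["pagination"]
    return fake_rows[page["offset"] : page["offset"] + page["limit"]]


all_rows = fetch_all({"order_by": ["model", "init_time", "time"]}, fake_fetch)
print(len(all_rows))  # 2500, collected in three pages
```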

***

### Best Practices

#### Performance

1. **Use Arrow format for large queries:** Set `?format=arrow` in the URL for queries returning >10k rows
2. **Enable streaming for very large queries:** Add `&stream=true` with Arrow format for >100k rows
3. **Limit prediction\_timedelta range:** Request only the lead times you need
4. **Split historical data into multiple requests:** Fetch data in chunks

#### Cost Optimization

1. **Set `request_credit_limit`:** Prevents accidentally expensive queries (default: 50 credits)
2. **Aggregate when possible:** Grouped queries accessing many points cost less per point
3. **Select specific variables:** Don't request all variables if you only need a few

#### Query Construction

1. **Use `group_by` for spatial aggregates:** When querying regions, group by time dimensions
2. **Include `time` in results:** Set `include_time: true` for easier result interpretation
3. **Specify time zones:** Use `time_zone` to get times in your local time zone
4. **Order results:** Use `order_by` for predictable result ordering
5. **Test with small queries first:** Start with short time ranges, then expand

***

### Validation and Errors

The API validates all query parameters and returns helpful error messages.

**Common validation errors:**

* **Invalid model:** Model not available or not in your subscription

  ```json
  { "detail": "Model 'xyz' is not valid" }
  ```
* **Variable not supported:** Variable not available for selected model(s)

  ```json
  { "detail": "Variables ['xyz'] are not supported by all models" }
  ```
* **Invalid geo filter:** Geographic coordinates out of range

  ```json
  { "detail": "Latitude must be between -90 and 90" }
  ```
* **Missing required parameter:** Required field not provided

  ```json
  { "detail": "Field required: 'models'" }
  ```
* **Insufficient credits:** Not enough credits for the query

  ```json
  { "detail": "Insufficient credits. Available: 10.5. Required: 25.3" }
  ```
* **Response too large:** Query returns too many rows

  ```json
  {
    "detail": "Query exceeds maximum rows for JSON format (50000). Use format=arrow or add pagination."
  }
  ```
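
Some of these errors can be caught before a request is sent. The sketch below checks a query body against constraints stated on this page (required parameters, `group_by` with `aggregation`, `order_by` with `pagination`). `check_constraints` is a hypothetical helper; it applies the rules literally and does not replace the server-side validation.

```python
def check_constraints(query):
    """Return a list of problems found in a query body (empty if none)."""
    problems = []
    for field in ("models", "geo", "init_time"):
        if field not in query:
            problems.append(f"missing required parameter: {field}")
    if "aggregation" in query and "group_by" not in query:
        problems.append("group_by is required when aggregation is set")
    if "pagination" in query and "order_by" not in query:
        problems.append("order_by is required when using pagination")
    if "group_by" in query:
        extra = [d for d in query.get("order_by", []) if d not in query["group_by"]]
        if extra:
            problems.append(f"order_by dimensions not in group_by: {extra}")
    return problems


query = {
    "models": ["ept2"],
    "geo": {"type": "market_zone", "value": "DE"},
    "aggregation": ["avg"],
    "pagination": {"limit": 1000, "offset": 0},
}
for problem in check_constraints(query):
    print(problem)
```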

***

### Next Steps

* **See examples:** Check out the [collection of example queries](/api-v2/query-engine/examples.md)
* **Explore variables:** Use `GET /v1/forecast/meta` to see all available models and variables
* **Check availability:** Use `GET /v1/forecast/available-forecasts` to see available forecast times
* **Check the OpenAPI docs:** [Endpoint & data models](https://query.jua.ai/docs) ready for you to try out

