Response Formats

The Jua Query Engine supports three response formats optimized for different use cases: JSON, Apache Arrow, and Arrow Streaming. This guide explains each format and provides Python examples for working with each response type.


Format Overview

| Feature | JSON | Arrow | Arrow Streaming |
| --- | --- | --- | --- |
| Query Parameter | ?format=json | ?format=arrow | ?format=arrow&stream=true |
| Content-Type | application/json | application/vnd.apache.arrow.stream | application/vnd.apache.arrow.stream |
| Max Rows | 50,000 | 5,000,000 | 1,000,000,000 |
| Use Case | Small queries, quick testing | Medium to large datasets | Very large datasets |
| Memory Usage | Higher | Lower | Lowest (incremental) |
| Parse Speed | Slower | Faster | Fastest |
| Best For | Quick exploration, small data | Data analysis, medium data | Production, historical data |


JSON Format

Description

JSON format returns data in a columnar structure where each column is a key with an array of values. This format is human-readable and easy to work with but becomes inefficient for large datasets.

Request

Add ?format=json to the query endpoint (this is the default):

POST /v1/forecast/data?format=json

Response Structure

{
  "model": ["ept2", "ept2", "ept2"],
  "init_time": [
    "2025-10-22T00:00:00Z",
    "2025-10-22T00:00:00Z",
    "2025-10-22T00:00:00Z"
  ],
  "latitude": [47.37, 47.37, 47.37],
  "longitude": [8.54, 8.54, 8.54],
  "prediction_timedelta": [0, 60, 120],
  "time": [
    "2025-10-22T00:00:00Z",
    "2025-10-22T01:00:00Z",
    "2025-10-22T02:00:00Z"
  ],
  "air_temperature_at_height_level_2m": [285.3, 284.8, 284.5],
  "wind_speed_at_height_level_100m": [12.5, 13.2, 13.8]
}
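
Because every key holds an equal-length array, the body maps directly onto a pandas DataFrame; a minimal sketch using two columns from the response above:

import pandas as pd

# Columnar JSON: each key becomes a DataFrame column, aligned by position.
data = {
    "model": ["ept2", "ept2", "ept2"],
    "air_temperature_at_height_level_2m": [285.3, 284.8, 284.5],
}
df = pd.DataFrame(data)
print(df)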

Limitations

  • Maximum 50,000 rows

  • Larger memory footprint compared to Arrow

  • Slower parsing for large datasets

  • Not suitable for production queries with large data volumes

When to Use

  • Quick testing and exploration

  • Small queries (<10k rows)

  • Debugging query structure

  • Simple web applications with limited data needs


Apache Arrow Format

Description

Apache Arrow is a high-performance columnar data format designed for efficient data interchange. It provides zero-copy reads and significantly faster parsing compared to JSON.

Request

Add ?format=arrow to the query endpoint:

POST /v1/forecast/data?format=arrow

Response Format

Returns a binary stream in Apache Arrow IPC format. The response must be parsed using an Arrow library.
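
A minimal parsing sketch (the full request flow is shown in Example 2 below); response is assumed to be a completed requests response fetched with ?format=arrow:

import pyarrow.ipc as pa_ipc

# The body is a complete Arrow IPC stream; open_stream accepts the raw bytes.
table = pa_ipc.open_stream(response.content).read_all()
df = table.to_pandas()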

Limitations

  • Maximum 5,000,000 rows (non-streaming)

  • Requires Arrow library to parse

  • Binary format (not human-readable)

When to Use

  • Medium to large datasets (10k - 1M rows)

  • Data science workflows with pandas/polars

  • When performance is important

  • Batch processing
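
For the pandas/polars workflows mentioned above, an Arrow table also converts to polars without copying the column buffers; a minimal sketch, assuming table was read as in the snippet above and polars is installed (pip install polars):

import polars as pl

# polars is Arrow-native, so this conversion is essentially zero-copy.
df = pl.from_arrow(table)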


Arrow Streaming Format

Description

Arrow Streaming builds on the Arrow format but streams data in chunks, allowing you to process datasets larger than available memory. This is the most efficient format for very large queries.

Request

Add ?format=arrow&stream=true to the query endpoint:

POST /v1/forecast/data?format=arrow&stream=true

Response Format

Returns a chunked binary stream in Apache Arrow IPC format. Data arrives incrementally and can be processed as it's received.
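
Because the body is a sequence of Arrow record batches, you can consume it batch by batch and keep only one chunk in memory at a time; a minimal sketch, where url, headers, and payload are the same as in Example 3 below:

import requests
import pyarrow.ipc as pa_ipc

# stream=True keeps requests from buffering the whole body in memory.
response = requests.post(f"{url}?format=arrow&stream=true",
                         headers=headers, json=payload, stream=True)
response.raw.decode_content = True

rows = 0
with pa_ipc.open_stream(response.raw) as reader:
    for batch in reader:  # each batch is a pyarrow.RecordBatch
        rows += batch.num_rows  # process or persist the chunk here
print(f"Processed {rows} rows incrementally")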

Limitations

  • Requires Arrow library with streaming support

  • Binary format (not human-readable)

  • Cannot easily inspect data during transfer

When to Use

  • Very large datasets (>1M rows)

  • Historical data queries

  • Production applications

  • When memory is constrained

  • Long-running queries

Are you using Python? Jua's Python SDK handles requests and streaming responses for you.


Choosing the Right Format

Decision Flow

Query returning < 10k rows?
  └─ Yes → Use JSON (easiest to work with)
  └─ No → Continue

Query returning < 5M rows?
  └─ Yes → Use Arrow (good performance)
  └─ No → Use Arrow Streaming (more setup, but scales to any size)
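
If you pick the format in code, the flow above collapses to a small helper; a sketch using the row limits from the overview table (the thresholds and function name are illustrative):

def choose_format(estimated_rows: int) -> str:
    """Return the query-string fragment for the lightest suitable format."""
    if estimated_rows < 10_000:
        return "format=json"           # easiest to inspect
    if estimated_rows < 5_000_000:
        return "format=arrow"          # fast single-buffer parse
    return "format=arrow&stream=true"  # incremental, handles any size

print(choose_format(250_000))  # -> format=arrow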

Recommendations by Use Case

| Use Case | Recommended Format | Reason |
| --- | --- | --- |
| API testing in browser | JSON | Easy to inspect |
| Dashboard (live data) | JSON or Arrow | Fast updates, moderate data |
| Data analysis (Jupyter) | Arrow | Fast pandas conversion |
| Historical data download | Arrow Streaming | Handles large volumes |
| Production ETL pipeline | Arrow or Arrow Streaming | Most efficient |


Examples

Prerequisites

pip install requests pandas pyarrow

Example 1: JSON Format

import requests
import pandas as pd

# Query configuration
url = "https://query.jua.ai/v1/forecast/data"
api_key = "your_api_key_id:your_api_key_secret"

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json"
}

payload = {
    "models": ["ept2"],
    "geo": {
        "type": "point",
        "value": [47.37, 8.54],
        "method": "nearest"
    },
    "init_time": "latest",
    "variables": [
        "air_temperature_at_height_level_2m",
        "wind_speed_at_height_level_100m"
    ],
    "prediction_timedelta": {
        "start": 0,
        "end": 72
    },
    "timedelta_unit": "h",
    "include_time": True
}

# Make request with JSON format (default)
response = requests.post(
    f"{url}?format=json",
    headers=headers,
    json=payload
)

# Check response
if response.ok:
    # Parse JSON response
    data = response.json()

    # Convert to pandas DataFrame
    df = pd.DataFrame(data)

    print(f"Downloaded {len(df)} rows")
    print(f"Columns: {list(df.columns)}")
    print("\nFirst few rows:")
    print(df.head())

    # Data is ready to use
    avg_temp = df['air_temperature_at_height_level_2m'].mean()
    print(f"\nAverage temperature: {avg_temp:.2f} K")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Output:

Downloaded 73 rows
Columns: ['model', 'init_time', 'latitude', 'longitude', 'prediction_timedelta', 'time', 'air_temperature_at_height_level_2m', 'wind_speed_at_height_level_100m']

First few rows:
   model             init_time  latitude  longitude  prediction_timedelta                 time  air_temperature_at_height_level_2m  wind_speed_at_height_level_100m
0   ept2  2025-10-22T00:00:00Z     47.37       8.54                     0  2025-10-22T00:00:00Z                              285.3                                 12.5
1   ept2  2025-10-22T00:00:00Z     47.37       8.54                     1  2025-10-22T01:00:00Z                              284.8                                 13.2
2   ept2  2025-10-22T00:00:00Z     47.37       8.54                     2  2025-10-22T02:00:00Z                              284.5                                 13.8

Average temperature: 285.12 K
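
One caveat with the JSON format: the init_time and time columns arrive as ISO-8601 strings (as in the output above), so parse them before doing time-based analysis. Continuing from Example 1:

# Timestamps are plain strings in JSON responses; convert for time-series work.
df["init_time"] = pd.to_datetime(df["init_time"])
df["time"] = pd.to_datetime(df["time"])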

Example 2: Apache Arrow Format

import requests
import pandas as pd
import pyarrow as pa
import pyarrow.ipc as pa_ipc

# Query configuration (same as JSON example)
url = "https://query.jua.ai/v1/forecast/data"
api_key = "your_api_key_id:your_api_key_secret"

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json"
}

payload = {
    "models": ["ept2"],
    "geo": {
        "type": "market_zone",
        "value": "DE"
    },
    "init_time": {
        "start": "2025-10-01T00:00:00Z",
        "end": "2025-10-07T23:59:59Z"
    },
    "variables": ["wind_speed_at_height_level_100m"],
    "prediction_timedelta": {
        "start": 0,
        "end": 168
    },
    "timedelta_unit": "h",
    "group_by": ["model", "init_time", "time"],
    "aggregation": ["avg"],
    "weighting": {"type": "wind_capacity"},
    "include_time": True
}

# Make request with Arrow format
response = requests.post(
    f"{url}?format=arrow",
    headers=headers,
    json=payload
)

if response.ok:
    # Parse Arrow response
    arrow_buffer = pa.py_buffer(response.content)

    # Read Arrow IPC stream
    with pa_ipc.open_stream(arrow_buffer) as reader:
        # Read all batches into a table
        table = reader.read_all()

    # Convert to pandas DataFrame
    df = table.to_pandas()

    print(f"Downloaded {len(df)} rows")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    print(f"\nArrow schema: {table.schema}")
    print("\nDataFrame:")
    print(df.head())

    # Work with the data
    print(f"\nAverage wind speed: {df['wind_speed_at_height_level_100m'].mean():.2f} m/s")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Output:

Downloaded 1176 rows
Memory usage: 0.18 MB

Arrow schema: model: string
init_time: timestamp[us, tz=UTC]
time: timestamp[us, tz=UTC]
wind_speed_at_height_level_100m: double

DataFrame:
  model             init_time                 time  wind_speed_at_height_level_100m
0  ept2  2025-10-01T00:00:00Z  2025-10-01T00:00:00Z                             8.23
1  ept2  2025-10-01T00:00:00Z  2025-10-01T01:00:00Z                             8.45
2  ept2  2025-10-01T00:00:00Z  2025-10-01T02:00:00Z                             8.67

Average wind speed: 8.52 m/s

Example 3: Arrow Streaming Format

import requests
import pandas as pd
import pyarrow.ipc as pa_ipc

# Query configuration
url = "https://query.jua.ai/v1/forecast/data"
api_key = "your_api_key_id:your_api_key_secret"

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json",
    "Accept-Encoding": "identity"  # Disable compression for streaming
}

payload = {
    "models": ["ept2", "aifs"],
    "geo": {
        "type": "market_zone",
        "value": ["DE", "FR", "IT"]
    },
    "init_time": {
        "start": "2025-01-01T00:00:00Z",
        "end": "2025-03-31T23:59:59Z"
    },
    "variables": [
        "air_temperature_at_height_level_2m",
        "wind_speed_at_height_level_100m"
    ],
    "prediction_timedelta": {
        "start": 0,
        "end": 168
    },
    "timedelta_unit": "h",
    "group_by": ["model", "market_zone", "init_time", "time"],
    "aggregation": ["avg"],
    "weighting": {"type": "wind_capacity"},
    "include_time": True,
    "order_by": ["model", "market_zone", "init_time", "time"]
}

# Make request with Arrow streaming format
response = requests.post(
    f"{url}?format=arrow&stream=true",
    headers=headers,
    json=payload,
    stream=True,  # Important: enable streaming in requests
    timeout=(10, 600)  # (connect timeout, read timeout)
)

if response.ok:
    # Enable decoding for the raw stream
    response.raw.decode_content = True

    # Read Arrow stream incrementally
    with pa_ipc.open_stream(response.raw) as reader:
        # Read all batches into a table
        table = reader.read_all()

    # Convert to pandas DataFrame
    df = table.to_pandas()

    print(f"Downloaded {len(df)} rows")
    print(f"Columns: {list(df.columns)}")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    print("\nFirst few rows:")
    print(df.head())

    # Save to file for later use
    df.to_parquet("forecast_data.parquet", compression="snappy")
    print("\nSaved to forecast_data.parquet")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Output:

Downloaded 45360 rows
Columns: ['model', 'market_zone', 'init_time', 'time', 'air_temperature_at_height_level_2m', 'wind_speed_at_height_level_100m']
Memory usage: 6.92 MB

First few rows:
   model market_zone             init_time                 time  air_temperature_at_height_level_2m  wind_speed_at_height_level_100m
0   aifs          DE  2025-01-01T00:00:00Z  2025-01-01T00:00:00Z                              278.45                                 9.23
1   aifs          DE  2025-01-01T00:00:00Z  2025-01-01T01:00:00Z                              278.12                                 9.45
2   aifs          DE  2025-01-01T00:00:00Z  2025-01-01T02:00:00Z                              277.89                                 9.67

Saved to forecast_data.parquet
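
Note that read_all() in Example 3 still materializes the complete table in memory before converting it to a DataFrame. To handle results that are genuinely larger than RAM, write each record batch to disk as it arrives; a sketch using pyarrow's ParquetWriter, assuming the same streaming response object as in Example 3:

import pyarrow.ipc as pa_ipc
import pyarrow.parquet as pq

# Stream batches straight to a Parquet file without building the full table.
response.raw.decode_content = True
with pa_ipc.open_stream(response.raw) as reader:
    writer = None
    for batch in reader:
        if writer is None:
            # Initialize the writer lazily to reuse the stream's schema.
            writer = pq.ParquetWriter("forecast_data.parquet", batch.schema)
        writer.write_batch(batch)
    if writer is not None:
        writer.close()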

Next Steps

  • Query Structure: Learn how to construct queries in docs-query-structure.md

  • Examples: See complete examples in docs-examples.md

  • API Reference: Explore all endpoints in the OpenAPI documentation
