# Model Evaluation

## Overview

This guide describes how to systematically evaluate weather model performance for energy market applications, with an emphasis on quantitative validation and operational testing.

## Evaluation Steps

Focus on the metrics most relevant to your operations, such as power or price forecasts powered by Jua. If you start with weather data, use real-world observations as ground truth rather than reanalysis datasets such as ERA5, which can carry their own internal biases.

### Step 1: Model Assessment

1. Review model performance reports ([EPT-1.5](https://arxiv.org/abs/2410.15076), [EPT-2](https://arxiv.org/abs/2507.09703))
2. Decision: Proceed with your own evaluation?

{% hint style="info" %}
We recommend starting with our technical reports to understand the model's capabilities before running your own evaluation.
{% endhint %}

### Step 2.1: Evaluate Your Impact Forecast

1. Produce power generation or demand forecasts from Jua model data
2. Compare them with the forecasts from your existing provider
3. Decision: Proceed to the trading evaluation?
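
To judge whether a difference against your existing provider is more than noise, a paired bootstrap of the absolute errors is one option. The sketch below assumes three aligned lists (realized values, current-provider forecast, Jua-based forecast). The function name, the 95% percentile interval and the resample count are illustrative choices, not part of any Jua API.

```python
import random
import statistics


def bootstrap_mae_difference(actual, forecast_current, forecast_jua, n_boot=2000, seed=0):
    """Paired bootstrap of MAE(current provider) - MAE(Jua-based forecast).

    Hypothetical helper. Positive values mean the Jua-based forecast has the
    lower MAE. Returns the observed mean difference and an approximate 95%
    percentile interval from the bootstrap distribution.
    """
    # Per-timestamp difference in absolute error; pairing keeps weather regimes matched.
    diffs = [abs(c - a) - abs(j - a)
             for a, c, j in zip(actual, forecast_current, forecast_jua)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample timestamps with replacement and record the mean difference each time.
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    low = boot_means[int(0.025 * n_boot)]
    high = boot_means[int(0.975 * n_boot) - 1]
    return statistics.fmean(diffs), (low, high)
```

If the whole interval lies above zero, the improvement is unlikely to be a sampling artefact. Hourly errors are autocorrelated, so resampling whole days (a block bootstrap) usually gives a wider and more honest interval than resampling individual hours.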

### Step 2.2: Observations Benchmark

1. Impact forecasts (Step 2.1) usually give the most meaningful results. If you want a direct weather comparison instead, we recommend benchmarking forecast performance against station observations
2. Evaluate basic accuracy metrics (for example RMSE, MAE, and bias)
3. Decision: Continue to detailed testing?
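
Before any metric can be computed, forecasts and station observations have to be aligned on the same station and valid time. A minimal sketch, assuming both datasets are lists of dicts with hypothetical `station_id`, `valid_time` and `value` keys:

```python
def pair_with_observations(forecasts, observations):
    """Match forecast values to station observations on (station_id, valid_time).

    Hypothetical helper. Returns a list of (forecast, observed) tuples.
    Forecasts without a matching, non-missing observation are dropped.
    """
    # Index observations by their join key for O(1) lookups.
    obs_lookup = {(o["station_id"], o["valid_time"]): o["value"] for o in observations}
    pairs = []
    for f in forecasts:
        observed = obs_lookup.get((f["station_id"], f["valid_time"]))
        if observed is not None:  # skip gaps in the station record
            pairs.append((f["value"], observed))
    return pairs
```

Dropping unmatched timestamps from both models in the same way keeps the comparison fair: each model is scored on exactly the same set of observations.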

### Step 3: Trading Signal Test and PnL Backtest

1. Simulate historical market scenarios and strategies, and run paper trading tests
2. Decision: Does the forecast improvement deliver a positive ROI?
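
A paper-trading backtest can start very simple. The sketch below assumes a hypothetical strategy that goes long in the day-ahead auction when the forecast-implied later price is above the day-ahead price and short when it is below, then settles each position against a realized later price (for example intraday or imbalance). The function, its parameters and the settlement convention are illustrative assumptions, not a recommended strategy.

```python
def backtest_spread_strategy(day_ahead, realized, predicted, volume_mwh=1.0, threshold=0.0):
    """Hourly PnL of a hypothetical day-ahead long/short strategy.

    day_ahead: cleared day-ahead prices (EUR/MWh)
    realized:  prices the position is settled against later (EUR/MWh)
    predicted: forecast-implied later prices (EUR/MWh)
    PnL per hour = position * (realized - day_ahead).
    """
    pnl = []
    for da, rt, pred in zip(day_ahead, realized, predicted):
        edge = pred - da
        if edge > threshold:        # forecast expects a higher later price: buy day-ahead
            position = volume_mwh
        elif edge < -threshold:     # forecast expects a lower later price: sell day-ahead
            position = -volume_mwh
        else:                       # edge too small to trade
            position = 0.0
        pnl.append(position * (rt - da))
    return pnl
```

For example, `backtest_spread_strategy([50, 50, 50], [60, 40, 50], [55, 45, 50])` returns `[10.0, 10.0, 0.0]`. Running the same backtest with forecasts derived from each provider turns forecast accuracy into a directly comparable PnL figure.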

## Example: Weather Model Testing Framework for Spot Markets

### Approach 1: Generation or Demand Forecast

**Prerequisites:**

* An existing operational forecast based on your current forecast provider
* A new impact forecast powered by Jua

**Key Metrics:**

* **Wind Power Forecast accuracy (12-36hr horizon)**
* Solar Power Forecast accuracy (12-36hr horizon)
* Demand forecast accuracy (12-36hr horizon)
* Price forecast accuracy (12-36hr horizon)

**Evaluation approach (example: German day-ahead market):**

1. Generate wind power forecasts for Germany from both Jua EPT-2 and a third-party weather forecast
2. Compare each model's forecast against realized wind power generation in Germany
3. Calculate improvement metrics between the models
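
Steps 2 and 3 can be sketched as follows, assuming both forecasts and the realized generation are aligned hourly lists in MW. Normalizing MAE by installed capacity (nMAE) is a common convention for wind power; the capacity value, function names and dictionary layout are assumptions for illustration.

```python
import math


def compare_power_forecasts(actual_mw, forecasts_mw, capacity_mw):
    """Score several wind power forecasts against realized generation.

    Hypothetical helper. forecasts_mw maps a model name to a list aligned
    with actual_mw. capacity_mw is the installed capacity used for nMAE.
    """
    scores = {}
    for name, fc in forecasts_mw.items():
        errors = [f - a for f, a in zip(fc, actual_mw)]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        scores[name] = {"mae": mae, "rmse": rmse, "nmae_pct": 100.0 * mae / capacity_mw}
    return scores


def improvement_pct(score_new, score_ref, metric="mae"):
    """Relative error reduction of the new model vs. the reference (positive = better)."""
    return 100.0 * (1.0 - score_new[metric] / score_ref[metric])
```

Reporting the improvement separately for each lead-time bucket (for example 12 to 24 h and 24 to 36 h) shows whether the gain holds across the whole day-ahead window.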

### Approach 2: Ground Truth Validation

**Prerequisites:**

* Weather station data
* Defined forecast window (typically 12-36hr for day-ahead markets)
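
Restricting the evaluation to the forecast window is a filter on lead time (valid time minus initialization time). A minimal sketch, assuming each record is a dict with hypothetical `init_time` and `valid_time` datetime fields and an inclusive 12 to 36 hour window:

```python
from datetime import timedelta


def in_lead_window(init_time, valid_time, min_hours=12, max_hours=36):
    """True if the lead time lies within [min_hours, max_hours] (inclusive)."""
    lead = valid_time - init_time
    return timedelta(hours=min_hours) <= lead <= timedelta(hours=max_hours)


def select_window(records, min_hours=12, max_hours=36):
    """Keep only forecast records whose lead time falls inside the window."""
    return [r for r in records
            if in_lead_window(r["init_time"], r["valid_time"], min_hours, max_hours)]
```

Apply the same window to every model under comparison; mixing lead times would favour whichever model happens to contribute more short-range forecasts.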

**Testing Protocol:**

1. Forecast Evaluation:
   * Compare 12-36hr forecasts from ECMWF HRES (or your preferred model) against ground truth
   * Compare Jua 12-36hr forecasts against the same ground truth
2. Performance Analysis:
   * Calculate error metrics (RMSE, MAE, bias)
   * Analyze temporal patterns in forecast accuracy
   * Identify systematic biases or errors
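
The error metrics and a first look at temporal patterns can be sketched by grouping errors by hour of day. This assumes aligned lists of valid times, forecasts and observations; errors are defined as forecast minus observed, so a positive bias means over-forecasting. The function names are illustrative.

```python
import math
from collections import defaultdict


def error_metrics(errors):
    """Bias, MAE and RMSE of a list of (forecast - observed) errors."""
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


def metrics_by_hour(valid_times, forecast, observed):
    """Group errors by hour of day to reveal diurnal patterns and systematic biases."""
    groups = defaultdict(list)
    for t, f, o in zip(valid_times, forecast, observed):
        groups[t.hour].append(f - o)
    return {hour: error_metrics(errs) for hour, errs in sorted(groups.items())}
```

A bias that is consistently non-zero at certain hours points to a systematic error (for example in the diurnal cycle) rather than random noise, and can often be corrected downstream.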

### Approach 3: Advanced Validation

**Option A: Live Strategy Testing & Historical PnL Backtesting**

1. Setup:
   * Historical market simulation
   * Implementation of the trading strategy under test
2. Analysis:
   * Calculate theoretical PnL
   * Stress test under different market conditions
   * Sensitivity analysis of PnL to forecast errors
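
For the sensitivity analysis, one option is to add synthetic noise of increasing size to the price forecast and observe how the backtest PnL degrades. The sketch uses a hypothetical long/short day-ahead strategy as a toy example; adding independent Gaussian noise in EUR/MWh is an assumption, and real forecast errors are usually correlated in time.

```python
import random
import statistics


def strategy_pnl(day_ahead, realized, predicted, volume_mwh=1.0):
    """Total PnL of a hypothetical strategy: long day-ahead if predicted > DA, short if below."""
    total = 0.0
    for da, rt, pred in zip(day_ahead, realized, predicted):
        position = volume_mwh if pred > da else -volume_mwh if pred < da else 0.0
        total += position * (rt - da)
    return total


def pnl_sensitivity(day_ahead, realized, predicted, noise_levels, n_trials=200, seed=0):
    """Mean PnL when Gaussian noise with each standard deviation is added to the forecast."""
    rng = random.Random(seed)
    results = {}
    for sigma in noise_levels:
        trials = []
        for _ in range(n_trials):
            noisy = [p + rng.gauss(0.0, sigma) for p in predicted]
            trials.append(strategy_pnl(day_ahead, realized, noisy))
        results[sigma] = statistics.fmean(trials)
    return results
```

Plotting mean PnL against the noise level gives a rough estimate of how much PnL each unit of forecast error costs, which helps translate an accuracy improvement into an expected ROI.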


