Element | Element_description |
---|---|

$mean_std | data frame with calculated metrics for the selected regression methods. For each regression method and each calculated metric, mean and standard deviation are given |

$std_ranks | data frame with ranks of calculated metrics: mean rank and share of rank_1 are given |

$edge_results | data frame with calculated performance metrics for the central-edge test. The central part of the data represents the calibration data, while the edge data, i.e. extreme values, represent the validation data. Different regression models are calibrated using the central data and validated for the edge (extreme) data. This test is particularly important to assess the performance of models for the prediction of the extreme data. The share of the edge (extreme) data is defined with the edge_share argument |

$holdout_results | calculated metrics for the holdout data |

$bias_cal | ggplot object of mean bias for calibration data |

$bias_val | ggplot object of mean bias for validation data |

$transfer_functions | ggplot or plotly object with transfer functions of different methods, facet is used to separate methods |

$transfer_functions_together | ggplot or plotly object with transfer functions of methods plotted together |

$parameter_values | a data frame with specifications of parameters used for different regression methods |

$PCA_output | princomp object: the result output of the PCA analysis |

$reconstructions | ggplot object: reconstructed dependent variable based on the dataset_complete argument, facet is used to split plots by methods |

$reconstructions_together | ggplot object: reconstructed dependent variable based on the dataset_complete argument, all reconstructions are on the same plot |

$normal_QQ_cal | normal q-q plot for calibration data |

$normal_QQ_holdout | normal q-q plot for holdout data |

$normal_QQ_edge | normal q-q plot for edge data |

$residuals_vs_fitted_cal | residuals vs fitted values plot for calibration data |

$residuals_vs_fitted_holdout | residuals vs fitted values plot for holdout data |

$residuals_vs_fitted_edge | residuals vs fitted values plot for edge data |

For the basic example, we will use the dataset with the Mean Vessel Area (MVA) chronology and the mean April temperature, the dataset is saved as dataset_MVA. All five regression methods will be compared with 10-fold cross-validation repeated 1 times. The *optimize* argument is set to TRUE, therefore all tuning parameters will be defined in a preliminary optimization phase. After the comparison, the output elements are retrieved with the “$” operator.

```
# Load the dendroTools R package
library(dendroTools)
# Load the data
data(dataset_MVA)
# Basic example
basic_example <- compare_methods(formula = T_Apr ~ MVA, dataset = dataset_MVA, k = 10, repeats = 1, optimize = TRUE, MT_committees_vector = c(1), RF_maxnodes_vector = c(5), RF_nodesize_vector = c(10))
```

```
# The data frame with mean and standard deviation of performance metrics for the calibration and the validation data
kable(basic_example$mean_std)
```

Metric | Period | Mean BRNN | Std BRNN | Mean MLR | Std MLR | Mean MT | Std MT | Mean RF | Std RF | |
---|---|---|---|---|---|---|---|---|---|---|

1 | r | cal | 0.607 | 0.026 | 0.596 | 0.026 | 0.596 | 0.026 | 0.712 | 0.016 |

2 | r | val | 0.632 | 0.286 | 0.635 | 0.278 | 0.635 | 0.278 | 0.567 | 0.321 |

3 | RMSE | cal | 1.162 | 0.036 | 1.174 | 0.036 | 1.174 | 0.036 | 1.032 | 0.028 |

4 | RMSE | val | 1.186 | 0.326 | 1.193 | 0.335 | 1.193 | 0.335 | 1.235 | 0.318 |

5 | RRSE | cal | 0.794 | 0.020 | 0.802 | 0.020 | 0.802 | 0.020 | 0.706 | 0.017 |

6 | RRSE | val | 0.879 | 0.219 | 0.884 | 0.221 | 0.884 | 0.221 | 0.921 | 0.244 |

7 | d | cal | 0.716 | 0.024 | 0.709 | 0.025 | 0.709 | 0.025 | 0.791 | 0.017 |

8 | d | val | 0.669 | 0.149 | 0.664 | 0.147 | 0.664 | 0.147 | 0.653 | 0.165 |

10 | RE | val | 0.237 | 0.426 | 0.229 | 0.432 | 0.230 | 0.431 | 0.156 | 0.530 |

12 | CE | val | 0.183 | 0.420 | 0.175 | 0.427 | 0.175 | 0.427 | 0.097 | 0.524 |

14 | DE | val | -0.300 | 1.053 | -0.305 | 1.027 | -0.305 | 1.027 | -0.500 | 1.600 |

```
# The data frame with non-parametric estimation of different methods: average rank and share of rank one
kable(basic_example$rank)
```

Metric | Period | Avg rank BRNN | %rank_1 BRNN | Avg rank MLR | %rank_1 MLR | Avg rank MT | %rank_1 MT | Avg rank RF | %rank_1 RF | |
---|---|---|---|---|---|---|---|---|---|---|

13 | r | cal | 2.0 | 0.0 | 3.3 | 0.0 | 3.6 | 0.0 | 1.0 | 1.0 |

14 | r | val | 2.4 | 0.2 | 1.9 | 0.3 | 2.3 | 0.3 | 3.4 | 0.2 |

7 | RMSE | cal | 2.0 | 0.0 | 3.0 | 0.0 | 4.0 | 0.0 | 1.0 | 1.0 |

8 | RMSE | val | 2.3 | 0.3 | 2.1 | 0.4 | 2.3 | 0.1 | 3.3 | 0.2 |

9 | RRSE | cal | 2.0 | 0.0 | 3.0 | 0.0 | 4.0 | 0.0 | 1.0 | 1.0 |

10 | RRSE | val | 2.3 | 0.3 | 2.1 | 0.4 | 2.3 | 0.1 | 3.3 | 0.2 |

11 | d | cal | 2.2 | 0.0 | 3.4 | 0.0 | 3.4 | 0.0 | 1.0 | 1.0 |

12 | d | val | 2.1 | 0.3 | 2.6 | 0.1 | 2.4 | 0.3 | 2.9 | 0.3 |

6 | RE | val | 2.3 | 0.3 | 2.1 | 0.4 | 2.3 | 0.1 | 3.3 | 0.2 |

2 | CE | val | 2.3 | 0.3 | 2.1 | 0.4 | 2.3 | 0.1 | 3.3 | 0.2 |

4 | DE | val | 2.3 | 0.3 | 2.1 | 0.4 | 2.3 | 0.1 | 3.3 | 0.2 |

```
# See the transfer functions, separated by facets. This is a ggplot object and could be easily customized.
library(ggplot2)
basic_example$transfer_functions +
xlab(expression(paste('MVA [',mm^2,']'))) +
ylab("April Mean Temperature [°C]")
```

```
# See the transfer functions, plotted together. This is a ggplot object and could be easily customized.
basic_example$transfer_functions_together +
xlab(expression(paste('MVA [',mm^2,']'))) +
ylab("April Mean Temperature [°C]")
```

```
# The data frame of optimized tuning parameters for different methods
kable(basic_example$parameter_values)
```

Method | Parameter | Considered values | Selected value |
---|---|---|---|

BRNN | BRNN_neurons | 1, 2, 3 | 2 |

MT | MT_committees | 1 | 1 |

MT | MT_neighbors | 0, 5 | 5 |

MT | MT_rules | 100, 200 | 200 |

MT | MT_unbiased | TRUE, FALSE | TRUE |

MT | MT_extrapolation | 100 | 100 |

MT | MT_sample | 0 | 0 |

RF | RF_mtry | 1 | 1 |

RF | RF_maxnodes | 5 | 5 |

RF | RF_ntree | 100 250 500 | 250 |

RF | RF_nodesize | 10 | 10 |

```
# For calibration data, there are residual diagnosti plots available. Similar plots are available also for holdout and edge data.
library(gridExtra)
grid.arrange(basic_example$normal_QQ_cal, basic_example$residuals_vs_fitted_cal, ncol = 2)
```

`## `geom_smooth()` using formula 'y ~ x'`

Principal Component Analysis (PCA) is commonly used with tree-ring data to reduce the full set of original tree-ring chronologies to a more manageable set of transformed variables. These transformed variables, the set of principal component scores, are then used as predictors in climate reconstruction models. The PCA also acts to strengthen the common regional-scale climate response within a group of tree-ring chronologies by concentrating the common signal in the components with the largest eigenvalues.

To use PCA regression within the *compare_methods()*, set the argument *PCA_transformation* to TRUE. All independent variables in the *dataset* data frame will be transformed using the PCA transformation. If the parameter *log_preprocess* is set to TRUE, variables will be transformed with logarithmic transformation before used in PCA. With the argument *components_selection*, we specify how to select PC scores that will be used as predictors. There are three options: “automatic”, “manual” and “plot_selection”. If argument is set to “automatic”, all PC scores with eigenvalues above 1 will be selected. This threshold could be changed by changing the *eigenvalues_threshold* argument. If argument is set to “manual”, user should set the number of components with *N_components* argument. If *components_selection* is set to “plot_selection”, A scree plot will be shown, and a user must manually enter the number of components to be used as predictors. The latter seems to be the most reasonable choice.

For the example with PCA, we use dataset dataset_MVA_individual, which consist of 10 individual Mean Vessel Area (MVA) chronologies and mean April temperature for the Ljubljana region, Slovenia. The dataset has 56 observations. The selection of components is set to “manual”, *N_components* to be used in the later analysis is set to 2. In this example, the 5-fold cross-validation with 1 repeat will be used to compare MT, MLR and BRNN. The subset of methods could be set with the *methods* argument. The argument *optimize* is set to TRUE, therefore all tuning parameters will be set automatically.

```
# Load the dendroTools R package
library(dendroTools)
# Load data
data(dataset_MVA_individual)
# Example PCA
example_PCA <- compare_methods(formula = T_Apr ~ ., dataset = dataset_MVA_individual, k = 5, repeats = 1, optimize = TRUE, methods = c("MLR", "MT", "BRNN"), PCA_transformation = TRUE, components_selection = "manual", N_components = 2, seed_factor = 5, MT_committees_vector = c(1), RF_maxnodes_vector = c(5), RF_nodesize_vector = c(10))
```

```
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.8691406 1.5007125 1.1867201 0.87168877 0.79432916
## Proportion of Variance 0.3493686 0.2252138 0.1408305 0.07598413 0.06309588
## Cumulative Proportion 0.3493686 0.5745824 0.7154129 0.79139704 0.85449292
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.68818504 0.61467322 0.57233447 0.49611738 0.173060035
## Proportion of Variance 0.04735987 0.03778232 0.03275667 0.02461325 0.002994978
## Cumulative Proportion 0.90185279 0.93963510 0.97239178 0.99700502 1.000000000
```

Metric | Period | Mean BRNN | Std BRNN | Mean MLR | Std MLR | Mean MT | Std MT | |
---|---|---|---|---|---|---|---|---|

1 | r | cal | 0.646 | 0.025 | 0.639 | 0.024 | 0.631 | 0.024 |

2 | r | val | 0.543 | 0.227 | 0.542 | 0.208 | 0.467 | 0.315 |

3 | RMSE | cal | 1.157 | 0.068 | 1.164 | 0.070 | 1.176 | 0.083 |

4 | RMSE | val | 1.234 | 0.286 | 1.245 | 0.288 | 1.304 | 0.259 |

5 | RRSE | cal | 0.764 | 0.021 | 0.768 | 0.020 | 0.775 | 0.020 |

6 | RRSE | val | 0.971 | 0.247 | 0.975 | 0.226 | 1.043 | 0.318 |

7 | d | cal | 0.743 | 0.022 | 0.745 | 0.020 | 0.738 | 0.020 |

8 | d | val | 0.663 | 0.114 | 0.666 | 0.107 | 0.628 | 0.157 |

10 | RE | val | 0.286 | 0.254 | 0.282 | 0.214 | 0.174 | 0.371 |

12 | CE | val | 0.009 | 0.490 | 0.009 | 0.447 | -0.169 | 0.690 |

14 | DE | val | -0.178 | 0.692 | -0.177 | 0.657 | -0.397 | 0.941 |

The *compare_methods()* enables the comparison of methods for regression problems with two or more independent variables. However, users should select multiple proxies reasonably and with caution, since there is nothing to prevent from including colinear variables. To perform the comparison of methods with multiproxy variables, simply include dataset with more than one independent variable and specify the relationship with the *formula* argument. If metrics on validation data are much lower than on calibration data, there is a problem of overfitting and users should exclude some independent variables and repeat the analysis.

For the *multiproxy_example*, we will use *example_dataset_1*, which consist of the Mean Vessel Area (MVA) chronology and two temperature variables, the mean April temperature (T_APR) and the mean temperature from August to September (T_aug_sep) from the previous growing season. To compare methods with multiproxy approach, specify formula with two independent variables, such as *formula = MVA ~ T_APR + T_aug_sep*. Here, we will compare MT, BRNN and RF with 10-fold cross-validation and 1 repeat.

```
# Load the dendroTools R package
library(dendroTools)
# Load data
data(example_dataset_1)
# Example multiproxy
example_multiproxy <- compare_methods(formula = MVA ~ T_APR + T_aug_sep, dataset = example_dataset_1, k = 10, repeats = 1, optimize = FALSE, methods = c("MT", "BRNN", "RF"))
```

Metric | Period | Mean BRNN | Std BRNN | Mean MT | Std MT | Mean RF | Std RF | |
---|---|---|---|---|---|---|---|---|

1 | r | cal | 0.723 | 0.018 | 0.693 | 0.036 | 0.809 | 0.014 |

2 | r | val | 0.690 | 0.243 | 0.639 | 0.290 | 0.634 | 0.234 |

3 | RMSE | cal | 0.405 | 0.013 | 0.422 | 0.018 | 0.362 | 0.013 |

4 | RMSE | val | 0.409 | 0.112 | 0.441 | 0.138 | 0.449 | 0.098 |

5 | RRSE | cal | 0.690 | 0.019 | 0.719 | 0.034 | 0.618 | 0.017 |

6 | RRSE | val | 0.792 | 0.325 | 0.847 | 0.343 | 0.860 | 0.273 |

7 | d | cal | 0.817 | 0.015 | 0.795 | 0.031 | 0.845 | 0.011 |

8 | d | val | 0.729 | 0.186 | 0.685 | 0.204 | 0.660 | 0.152 |

10 | RE | val | 0.326 | 0.694 | 0.231 | 0.751 | 0.253 | 0.619 |

12 | CE | val | 0.277 | 0.704 | 0.176 | 0.761 | 0.193 | 0.623 |

14 | DE | val | 0.026 | 0.918 | -0.101 | 0.998 | -0.149 | 0.845 |

Reconstructions of past climate conditions include reconstructions of past temperature, precipitation, vegetation, streamflow, sea surface temperature, and other climatic or climate-dependent conditions. With the *compare_methods()* it is possible to directly reconstruct the dependent variable specified with the *formula* argument. To do so, supply additional complete dataset with tree-ring chronology that goes beyond the observations of instrumental records.

For the *example_reconstruction*, we use *data_TRW* dataset, which includes a tree-ring width (TRW) chronology of *Pinus nigra* from Albania and mean June-July temperature from Albania. The complete TRW chronology is supplied with the *dataset_TRW_complete*. In this example, we will compare RF and MLR models with 3-fold cross-validation and 1 repeat.

```
# Load the dendroTools R package
library(dendroTools)
# Load the data
data(dataset_TRW)
data(dataset_TRW_complete)
# Example reconstruction
example_reconstruction <- compare_methods(formula = T_Jun_Jul ~ TRW, dataset = dataset_TRW, k = 3, optimize = FALSE, methods = c("MLR", "BRNN", "MT", "RF"), dataset_complete = dataset_TRW_complete)
```

The comparison among different methods usually shows, that different regression techniques perform relatively similar for the central-calibration data, while for the extreme values, predictions differ. Therefore, we implemented additional central-edge test, where the central part of the dataset represents the calibration data, while the edge data, i.e. extreme values, represent the edge data. Different regression models are calibrated using the central data and validated using the edge (extreme) data. This test is particularly important to assess the performance of different methods for the prediction of the extreme and out of calibration data. The share of the edge (extreme) data is defined with the *edge_share* argument. To get the results for the central-edge test, use example_reconstruction$edge_results.

Metric | Period | BRNN | MLR | MT | RF |
---|---|---|---|---|---|

r | cal_central | 0.648 | 0.645 | 0.645 | 0.775 |

r | val_edge | 0.756 | 0.749 | 0.749 | 0.761 |

RMSE | cal_central | 0.687 | 0.689 | 0.689 | 0.576 |

RMSE | val_edge | 0.752 | 0.918 | 0.918 | 0.606 |

RRSE | cal_central | 0.762 | 0.764 | 0.764 | 0.639 |

RRSE | val_edge | 0.822 | 1.003 | 1.003 | 0.663 |

d | cal_central | 0.750 | 0.751 | 0.751 | 0.841 |

d | val_edge | 0.861 | 0.831 | 0.831 | 0.874 |

RE | val_edge | 0.326 | -0.004 | -0.003 | 0.562 |

CE | val_edge | 0.324 | -0.006 | -0.006 | 0.561 |

DE | val_edge | -0.066 | -0.588 | -0.588 | 0.307 |

In this example, MLR and MT had worse validation results than RF and BRNN. This indicates that linear interpolation for extreme values is not the best choice.

Machine learning methods have several tuning parameters that need to be set, e.g. the number of neurons for the BRNN (parameter *BRNN_neurons = 2*). By default the *optimize* argument is set to *TRUE*, therefore all parameters will be automatically optimized in a preliminary cross-validation phase, where different combinations of parameters are tested and the best combination for each method is later used for the final model comparison. Each parameter has a pre-defined vector of possible values, however, this vector of possible values could be extended and therefore a wider space of tuned values could be explored. To change the vector of possible values for the *BRNN_neurons* parameter, use e.g. *BRNN_neurons_vector = c(1, 2, 3, 4, 5)*. Bellow, see the table of all tuning parameters together with the vectors of possible values.

Method | Parameter | Vector_for_optimization |
---|---|---|

BRNN | BRNN_neurons | BRNN_neurons_vector |

MT | MT_committees | MT_committees_vector |

MT | MT_neighbors | MT_neighbors_vector |

MT | MT_rules | MT_rules_vector |

MT | MT_unbiased | MT_unbiased_vector |

MT | MT_extrapolation | MT_extrapolation_vector |

MT | MT_sample | MT_sample_vector |

RF | RF_mtry | RF_mtry_vector |

RF | RF_maxnodes | RF_maxnodes_vector |

RF | RF_ntree | RF_ntree_vector |

RF | RF_nodesize | RF_nodesize_vector |

To set the tuning parameters manually, set the parameter *optimize* to *FALSE* and supply the selected value of each tuning parameter with the corresponding argument (if not, the default value will be used). Here is a simple example, where tuning parameters are set manually.

```
# Load the dendroTools R package
library(dendroTools)
# Load the data
data(example_dataset_1)
example_optimize <- compare_methods(formula = MVA ~ T_APR, dataset = example_dataset_1, k = 5, repeats = 2, optimize = FALSE, BRNN_neurons = 1, MT_committees = 1, MT_neighbors = 0, MT_rules = 100, MT_unbiased = FALSE, MT_extrapolation = 100, MT_sample = 0, RF_mtry = 1, RF_ntree = 100, RF_maxnodes = 20, seed_factor = 5)
```

```
# The data frame of tuning parameters, as defined by the user
kable(example_optimize$parameter_values)
```

Method | Parameter | Selected value |
---|---|---|

BRNN | BRNN_neurons | 1 |

MT | MT_committees | 1 |

MT | MT_neighbors | 0 |

MT | MT_rules | 100 |

MT | MT_unbiased | FALSE |

MT | MT_extrapolation | 100 |

MT | MT_sample | 0 |

RF | RF_mtry | 1 |

RF | RF_maxnodes | 20 |

RF | RF_ntree | 100 |

RF | RF_nodesize | 1 |