REN R 690

Canonical Correspondence Analysis

cca (vegan)
Canonical correspondence analysis was used to analyze the relationship between the phenological variable (brown-down date) and the environmental variables (geographic and climatic variables).
(adapted from R Documentation)
Function cca (vegan) performs correspondence analysis (as a rotation technique), or optionally constrained correspondence analysis (a.k.a. canonical correspondence analysis), or optionally partial constrained correspondence analysis. Function rda performs redundancy analysis, or optionally principal components analysis. These are both very popular ordination techniques in community ecology. 

cca assumes a unimodal response of the data (Y ~ X).  It links the community data matrix (Y) with the constraining matrix (environmental variables, X) and, optionally, a conditioning matrix.  The function Chi-square transforms the data matrix, subjects it to weighted linear regression on the X variables, and submits the fitted values to correspondence analysis via singular value decomposition (svd).  It focuses on the relations between X and Y and provides an automated interpretation of the ordination axes (Ter Braak, 1986).
cca displays only the variation that can be explained by the constraints.  It does not select among the supplied environmental variables; all of them are included in the calculation.  cca is preferred when there is a clear a priori hypothesis on the constraints and the major structure of the data set is not of primary interest.
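The Chi-square transformation mentioned above can be sketched in a few lines.  This is a minimal illustration in plain Python, not the vegan implementation: the function names and the toy table are hypothetical, and the sketch assumes every row and column of the table has a positive sum.  It shows only the transformation step (before any regression on constraints or svd) and the total inertia, i.e. the mean squared contingency coefficient.

```python
import math

def chi_square_transform(table):
    """Return the matrix Q of standardized residuals used in correspondence analysis.

    Q[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p is the table divided
    by its grand total, r are the row masses and c are the column masses.
    Assumes all row and column sums are positive.
    """
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]
    r = [sum(row) for row in p]            # row masses
    c = [sum(col) for col in zip(*p)]      # column masses
    return [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]

def total_inertia(q):
    """Sum of squared entries of Q (the mean squared contingency coefficient)."""
    return sum(x * x for row in q for x in row)

# Hypothetical 3 x 3 table of counts (rows = sites, columns = response classes).
toy = [[10, 5, 1],
       [4, 8, 4],
       [1, 5, 10]]
Q = chi_square_transform(toy)
inertia = total_inertia(Q)
```

The total inertia equals the Pearson Chi-square statistic of the table divided by its grand total, which is the quantity that the eigenvalues of the ordination partition.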

The biplot shows the ordination of brown-down date and environmental variables (arrows).  The horizontal axis is CCA 1 (the first canonical axis) and the vertical axis is CCA 2.  Brown-down data points are shown away from the origin.  The color legend is the same as mentioned before.  CCA 1 and CCA 2 together explain 49.7% of the variance.  Green points representing MN aspen lie along a set of environmental gradients: longitude, MAT, MSP, eFFP, DD>5, etc.  The black and red points (FH and BP) are mostly distributed between the MN and TP points, which reflects a geographical gradient across the study area.  Latitude is a significant constraint, negatively related to the distribution of the MN brown-down points.  In the correlation image, brown-down date is also negatively correlated with latitude.
Picture
Picture
Picture
Figure 1 Biplot of CCA1 and CCA2. 
Table 1 Eigenvalues, and their contribution to the mean squared contingency coefficient.
Picture
The cca function was applied to the relationship between brown-down date and the environmental variables (Longitude, Latitude, MAT, MAP, MSP, DD_5, NFFD, eFFP and FFP).  The biplot shows the vectors of the different environmental variables as well as the brown-down dates, colored with the same legend as the figures above.
The following maps show the spatial distribution of the brown-down data points by CCA 1 and CCA 2.  The color scheme in the first and second maps is by site scores (brown-down date as "Y").  The color scheme in the third and fourth maps is by site constraints (environmental variables as "X").
Picture
Picture
Picture
Picture
Figure 2  Brown-down points in the landscape by CCA 1 and CCA 2 (site scores and site constraints).  The color scheme shows the magnitude of the CCA score.

Principal Component Analysis

PCA is a rotation method that summarizes the data by rotating the dataset so that successive axes explain the most variance in the original variables.  The component loadings (eigenvectors) show the correlation with the original variables.  The principal components condense multiple variables so that the major gradients and relations among variables can be described more simply.  This analysis was done with a SAS procedure.
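As a rough illustration of the rotation idea, the first principal component of a correlation matrix can be found by power iteration.  The sketch below is plain Python and is not the SAS procedure used for this report; the function names and the three toy variables (with made-up values) are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long variable columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z.append([(x - mean) / sd for x in col])   # standardized variable
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(corr, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix.

    Uses power iteration; assumes the (uneven) start vector is not
    orthogonal to the leading eigenvector.
    """
    k = len(corr)
    v = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return eigenvalue, v

# Hypothetical values for three variables at six sites.
latitude = [49, 51, 53, 55, 57, 59]
mat = [4.1, 3.0, 2.2, 0.9, 0.2, -1.1]
brown_down = [285, 281, 279, 272, 270, 265]   # day of year

corr = correlation_matrix([latitude, mat, brown_down])
eigenvalue, loadings = first_component(corr)
share = eigenvalue / len(corr)   # proportion of variance on PC 1
```

Because the eigenvalues of a correlation matrix sum to the number of variables, dividing the leading eigenvalue by that number gives the proportion of variance carried by the first component.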

PCA biplots of mean brown-down date and environmental variables (e.g., longitude, latitude, MWMT, MCMT, TD, MAP, MSP, AHM, SHM, DD_0, DD_5, NFFD, eFFP, FFP, and PAS) showed no significant relationship between mean brown-down date and some climatic variables such as TD, SHM and PAS.  The relationship between brown-down date and DD_0 is weaker than that between bud set and DD_0.  Some variables, such as NFFD, DD_5, latitude, MSP and MAT, are more strongly correlated with brown-down date, which is similar to the case of bud set.

The maps of the principal components show that:
1. PC 1 is mainly related to latitudinal and heat-moisture conditions across western Canada, and explains 63% of the variance;
2. PC 2 mainly describes continentality; together, PC 1 and PC 2 cumulatively explain 83.32% of the variance.
Table 2 Eigenvectors for different principal components.
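The proportion and cumulative proportion of variance reported for each component follow directly from the eigenvalues.  A minimal sketch in Python, using hypothetical eigenvalues rather than the values in the tables of this report:

```python
def explained_variance(eigenvalues):
    """Return (proportions, cumulative proportions) for a list of eigenvalues."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative

# Hypothetical eigenvalues of a correlation matrix of four variables;
# they sum to 4, the number of variables.
proportions, cumulative = explained_variance([2.4, 1.0, 0.4, 0.2])
```

With these made-up eigenvalues, the first component carries 60% of the variance and the first two together carry 85%.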
Picture
Table 3  Eigenvalues of the Correlation Matrix for Principal Components 1-17.
Picture
 
However, the scatter plot shows a clearer relationship between brown-down date and daylight length, which indicates that daylight length could be the main trigger of brown-down.  This also provides evidence of the spatial correlation between the remote sensing data and latitude.  Because leaf coloring and leaf senescence can vary widely in timing, for example lasting one month or occurring suddenly without a fall color phase, brown-down was chosen to represent leaf phenology over large areas.  Latitude and early winter frost conditions, as well as the daylight trigger, play an important role in large-scale leaf phenology.
Picture
Figure 3 Biplot of Prin1 and Prin 2. 
Picture
Picture
Figure 4 PC 1 and PC2 distribution on the landscape. 


RPART

rpart stands for Recursive Partitioning and Regression Trees and is a package in R.  It is a tree-based method that recursively splits the data on the predictor variables (Breiman et al., 1984).  To classify the brown-down points based on the environmental variables, I tried different combinations of variables:
1.  Brown-down ~ daymean + Longitude + Latitude + climatic variables
2.  Brown-down ~ Longitude + Latitude + climatic variables
3.  Brown-down ~ climatic variables (MWMT, MCMT, TD, MAP, MSP, AHM, SHM, DD_0, DD_5, NFFD, eFFP, FFP).
The first partition process has five nodes and ends up with seven leaves (subsets).  The main splitting variables are daylight length and latitude.  The tree graph and the spatial distribution of the subsets are shown below.  The spatial zones do not resemble the eco-region system.
More than 90% of the variance is explained by this procedure.
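The core step of a regression tree, choosing the split that most reduces the within-group sum of squares, can be sketched as follows.  This is a simplified single-variable illustration in Python, not the rpart algorithm itself (which also handles several predictors, surrogate splits and pruning); the function names and toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, sse_after) for the single split on x that minimises
    the summed SSE of y in the two resulting groups."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates identical x values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical latitudes and brown-down dates (day of year) for six points.
latitude = [49.0, 50.0, 51.0, 55.0, 56.0, 57.0]
brown_down = [284, 286, 285, 270, 271, 269]

threshold, sse_after = best_split(latitude, brown_down)
explained = 1 - sse_after / sse(brown_down)   # share of variance explained
```

The share of variance explained by a tree is one minus the ratio of the remaining within-leaf sum of squares to the total sum of squares; applying the same search recursively within each subset grows the full tree.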
Picture
Figure 5 Tree graph of rpart. 
Picture
Figure 6 Spatial distribution of rpart subsets. 
For the second function, daylight length is not included.  The output has eight nodes and nine leaves.  The major splitting variables are longitude, latitude and MSP.  The map of the subset distribution is similar to the eco-region system, which is built on the growth traits of the aspen provenances.  Cumulatively, 92% of the total variance is explained.
Picture
Figure 7 Tree graph of rpart. 
Picture
Figure 8 Spatial distribution of rpart subsets. 

In the third function, only the climatic variables are used for partitioning, to explore the climatic factors affecting brown-down.  The major splitting variables for each node are MWMT, DD.0, eFFP, MSP, DD_0, TD and MAP.  There are eight nodes and nine leaves in the tree graph.  86.5% of the total variance is explained, which is less than in the second approach.  The map below shows the distribution of the brown-down point subsets, which is not too different from the eco-region system, except that Foothill and North Boreal Plain are merged with the Boreal Plain.  Among the years from 2001 to 2006, brown-down date in 2002 shows the greatest variation according to the rpart output.
Picture
Figure 9 Tree graph of rpart. 
Picture
Figure 10  Spatial distribution of rpart subsets. 
In the fourth function, the environmental variables other than daylight length (longitude, latitude, MAT, MAP, MSP, DD_5, NFFD, eFFP and FFP) are used for partitioning to explore their effect on brown-down date.  The complexity parameter is 0.5, which gives three nodes and four leaves in the tree graph.  83.6% of the total variance is explained, which is less than in the third approach.

Thus, latitude is one of the most important factors for the spatial distribution of aspen brown-down date, because daylight length, for which it acts as a proxy, is a function of latitude and brown-down date.  The overall climatic conditions delineate the growth rate boundaries in the aspen population distribution as eco-regions, which are similar to the distribution of brown-down events.
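The dependence of daylight length on latitude and date can be made explicit with a standard astronomical approximation.  The sketch below is in Python and uses Cooper's formula for solar declination together with the sunrise equation; it is an assumption for illustration, not necessarily how daylight length was derived for this report, and it ignores atmospheric refraction and the size of the solar disc.

```python
import math

def day_length_hours(latitude_deg, day_of_year):
    """Approximate daylight length in hours for a latitude and day of year."""
    # Cooper's approximation of solar declination (radians).
    declination = math.radians(23.45) * math.sin(
        2 * math.pi * (284 + day_of_year) / 365)
    # Sunrise equation: cos(hour angle) = -tan(latitude) * tan(declination).
    x = -math.tan(math.radians(latitude_deg)) * math.tan(declination)
    x = max(-1.0, min(1.0, x))   # clamp for polar day and polar night
    return 24 / math.pi * math.acos(x)

# Hypothetical comparison: the same autumn date at two latitudes.
south = day_length_hours(49.0, 280)
north = day_length_hours(55.0, 280)
```

After the autumn equinox the more northern site has the shorter day, so a fixed daylight threshold is reached earlier at higher latitude, which is consistent with a negative relationship between brown-down date and latitude.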
Picture
Figure 11 Classification tree of brown-down date.   

Relationship between the bud set time and the brown-down date

The statistical relationship between bud set score and brown-down date was analyzed using ecosystem variants as the unit.  The scatter plot below shows their relationship, obtained by joining the provenance coordinates with the remote sensing pixel points within the same local region.  Each point represents a point within a variant unit.  Because there are fewer data points between SK and MN, the points form two clusters at the two ends.  The linear relationship is not weak; however, the assumptions of linear regression are not well met.  The residuals are only marginally normally distributed (Shapiro test: W = 0.9463, p-value = 0.08704) and do not have equal variance.  The remotely sensed leaf phenology data could be validated with ground provenance trial data.
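The equal-variance concern can be illustrated with a crude check: fit the line by least squares and compare the mean squared residual in the lower and upper halves of the predictor range.  This Python sketch is not the analysis used for this report and is not a formal test (it is only loosely in the spirit of a Goldfeld-Quandt comparison); the function names and toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_variance_ratio(x, y):
    """Mean squared residual of the upper half of x divided by that of the
    lower half.  Values far from 1 suggest unequal variance.  Assumes the
    lower half does not fit perfectly (nonzero residuals)."""
    intercept, slope = fit_line(x, y)
    residuals = [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]
    half = len(residuals) // 2
    low, high = residuals[:half], residuals[-half:]
    mean_square = lambda r: sum(e * e for e in r) / len(r)
    return mean_square(high) / mean_square(low)

# Hypothetical data: y = 2x plus noise that is larger at high x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
y = [2 * a + e for a, e in zip(x, noise)]

ratio = residual_variance_ratio(x, y)
```

A ratio well above 1, as in this toy example, indicates that the residual spread grows with the predictor, which is the kind of pattern that violates the equal-variance assumption.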
Picture
Figure 12 Regression of bud set score (squared) versus brown-down date.