## INTRODUCTION

Surface air temperature in central Asia has undergone a striking warming trend (Lioubimtseva, 2004; Chen *et al*., 2009) and aridity may also have increased across the region during the past century. These effects are pronounced in the western region (Lioubimtseva and Henebry, 2009). The past century also marked the change from a climate system dominated by natural influences to one dominated by anthropogenic activities (Brönnimann *et al*., 2007). Demand for water also increased during that time, as regional industry and agriculture expanded in central Asia (O’Hara, 2000; Saiko and Zonn, 2000; Qi *et al*., 2005).

Lakes play an essential role in the regional water cycle and reflect watershed water balance in arid regions (Lamb *et al*., 1999; Williams, 1999; Ma *et al*., 2011a). In arid central Asia, rapid shrinking of the Aral Sea, Lop Nur, Manas Lake, and Ebinur Lake reflects regional environmental effects and threatens human livelihoods (Ma *et al*., 2011a). For example, the surface area of Ebinur Lake was reduced from 1107 km^{2} in 1955 to 428 km^{2} in 2010. The exposed land surface became a vast bare *solonchak* with large salt deposits, predominantly sodium sulfates. The northwestern winds blowing from the Dzungarian Gate move the salt-dust hundreds of kilometers (Abuduwaili *et al*., 2008).

Increasing environmental and ecological awareness led us to explore reliable methods to evaluate the impact of anthropogenic factors and climatic variability on the evolution of the lake (Gell *et al*., 2007; Dearing *et al*., 2006). In previous work, Wu and Lin (2004) simply compared lake area with precipitation and agricultural irrigation water demand at decadal resolution. Zhou *et al.* (2010) analyzed the correlation between lake area and meteorological variables, and concluded that precipitation was one of the most important factors. In this paper, macro-economic data were used to describe the anthropogenic impact on lake size, and then, the anthropogenic and climatic effects on the size of Ebinur Lake were quantitatively evaluated.

### Regional setting

Ebinur Lake is a shallow, closed lake in arid northwest China (Fig. 1). The lake has a drainage area of 50,321 km^{2}, including 24,317 km^{2} of mountainous terrain. Ala Mountain borders the lake to the north and northwest, Boertala Valley is to the west, the Jing River pluvial fan is to the south, and sand dunes around the Kuitun River are to the east. Mean annual precipitation around the lake is about 95 mm, whereas annual evaporation is 1315 mm (Wu *et al*., 2009). The lake has a maximum water depth of 3.5 m and an average depth of 1.2 m. The lake water has 85-124 g L^{–1} of total dissolved solids. Ebinur Lake receives surface water inputs from the Bo and Jing Rivers. The Ala Mountain pass, northwest of the lake, is a well-known wind corridor, with wind speeds exceeding 20 m s^{–1} on 164 days of the year and maximum wind speeds of up to 55 m s^{–1} (Wu *et al.*, 2009; Ma *et al*., 2011b).

## METHODS

### Data collection

Ebinur Lake surface-area data are from previous research (Ma *et al*., 2011b). We interpolated the lake surface-area data linearly to annual resolution and used the annual values in the modeling approach. Climate data recorded at the Jinghe meteorological station (44°37’ N, 82°54’ E, 321.2 m) were provided by China’s meteorological data-sharing service. Measured meteorological variables, and the resolution with which they were reported, included mean annual precipitation (0.1 mm), barometric pressure (0.1 hPa), wind speed (0.1 m s^{–1}), annual surface air temperature (0.1°C), water vapor pressure (0.1 hPa), relative humidity (%), percent sunshine (%) and sunshine duration (0.1 hour) (Supplementary Tab. 1).
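The interpolation step can be illustrated with a minimal Python sketch. It is not the authors' code (they report using R), and the function name `interpolate_annual` is hypothetical. The four anchor values are lake areas quoted elsewhere in this paper; the actual series contains more measured points.

```python
def interpolate_annual(years, areas):
    """Linearly interpolate measured (year, area) pairs to one value per year.

    `years` must be sorted integers; `areas` are the measured lake areas (km^2).
    """
    annual = {}
    points = list(zip(years, areas))
    for (y0, a0), (y1, a1) in zip(points, points[1:]):
        for year in range(y0, y1 + 1):
            fraction = (year - y0) / (y1 - y0)  # 0 at y0, 1 at y1
            annual[year] = a0 + fraction * (a1 - a0)
    return annual

# Areas quoted in the text for 1955, 1995, 2003 and 2010 (illustrative subset).
annual_area = interpolate_annual([1955, 1995, 2003, 2010],
                                 [1107.0, 476.0, 915.0, 428.0])
```

With these anchors the result holds one value for each year from 1955 to 2010, and every measured year keeps its measured area.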

Economic data, used as a proxy for anthropogenic impact, are from the Xinjiang economic statistics yearbooks. Economic variables included population (10^{4} persons), gross domestic product (GDP) [10^{4} CNY; 1 CNY (Chinese Yuan)=0.16 USD (US Dollar)], primary industry (10^{4} CNY), secondary industry (10^{4} CNY) and tertiary industry (10^{4} CNY), total investment in fixed assets (10^{4} CNY), total sown area of farm crops (10^{3} ha), total sown area of food crops (10^{3} ha), food production (10^{3} kg), cotton production (10^{3} kg) and oil crop production (10^{3} kg) (Supplementary Tab. 2).

### Quantitative analysis: a multivariate linear model

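Multivariate regression was used to distinguish between climatic and anthropogenic impacts on observed lake surface area fluctuation in the past 50 years (eq. 1). The standard linear regression model assumes that the value of Y is a linear function of the set of predictor variables (X_{1}, X_{2}, X_{3}…, X_{p}), as follows:

$$Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{p}X_{p} + \varepsilon \qquad (1)$$

where β_{0} is the intercept, β_{1}, β_{2},…, β_{p} are the regression coefficients and ε is the error term.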

The surface area of Ebinur Lake was the dependent variable Y (Fig. 2). For a closed lake, hydrologic inputs included direct precipitation, stream and groundwater inflows. Outputs were evaporation and transpiration. In the Ebinur Lake catchment, there were insufficient hydrological data to complete a water balance analysis; consequently, hydrology was accounted for indirectly using climate and economic data. Industrial demand and agricultural irrigation in the catchment increase water consumption and reduce runoff, which directly affects lake surface area. We captured this economic development in economic data published by the government.

We used principal component analysis (PCA) (Pardo *et al.*, 1990) of the climatic and economic data to find the latent variables (X_{1}, X_{2}, X_{3}…, X_{p}) that explain the original variance while reducing the dimensionality of the dataset. We completed the PCA for climatic and economic variables separately to reduce the number of variables and extract latent variables for multivariate regression. The extracted climatic and economic variables, however, have some degree of correlation. We therefore used partial least squares (PLS) modeling to eliminate potential collinearity among the independent latent variables (X_{1}, X_{2}, X_{3}…, X_{p}).

PLS was first introduced by H. Wold (1975) under the name NIPALS (nonlinear iterative partial least squares), which focuses on maximizing the variance of the dependent variables explained by the independent ones, instead of reproducing the empirical covariance matrix (Haenlein and Kaplan, 2004). A PLS model consists of a structural part, which reflects the relationships between the latent variables, and a measurement component, which shows how the latent variables and their indicators are related; it also has a third component, the weight relations, which are used to estimate case values for the latent variables (Chin and Newsted, 1999). Haenlein and Kaplan (2004) provided a comprehensible introduction to this technique. PLS regression is appropriate when the matrix of predictors has more variables than observations, and when there is multi-collinearity among X values. By contrast, standard regression will fail in these cases. PLS regression was carried out using the Unscrambler software package (CAMO ASA, 1997). We completed the PCA and the linear interpolation with the R stats package (R Core Team, 2012).
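As an illustration of how a latent variable is extracted, the following Python sketch computes the first principal component of a small set of standardized variables by power iteration on their correlation matrix. It is a simplified stand-in, not the authors' R workflow: the function names and the toy data are hypothetical, only the leading component is computed, and the PLS step is not shown.

```python
import math

def standardize(columns):
    """Scale each variable (one list per variable) to zero mean and unit variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled

def correlation_matrix(scaled):
    """Correlation matrix of variables that are already standardized."""
    n = len(scaled[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in scaled]
            for ci in scaled]

def first_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector (the loadings)."""
    p = len(matrix)
    vec = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, vec

def component_scores(scaled, loadings):
    """Project every observation (year) onto the component: one latent value per year."""
    n = len(scaled[0])
    return [sum(loadings[j] * scaled[j][t] for j in range(len(scaled)))
            for t in range(n)]

# Hypothetical toy data: three variables observed in five years.
toy = [[1.0, 2.0, 3.0, 4.0, 5.0],
       [2.0, 4.0, 6.0, 8.0, 10.0],
       [1.0, -1.0, 0.0, -1.0, 1.0]]
scaled = standardize(toy)
eigenvalue, loadings = first_component(correlation_matrix(scaled))
explained = eigenvalue / len(toy)  # share of total variance carried by PC1
scores = component_scores(scaled, loadings)
```

In the toy data the first two variables are perfectly correlated, so a single component carries two thirds of the total variance. The yearly `scores` play the role of one latent variable X in the regression.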

### Assessment indices for the multivariate linear model

We used the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), the ratio of the root mean square error to the standard deviation of measured data (RSR), and percentage bias (PBIAS) to evaluate the model (Moriasi *et al.*, 2007). They are as follows:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(X_{i}-\hat{X}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}} \qquad (2)$$

$$\mathrm{RSR} = \frac{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\hat{X}_{i}\right)^{2}}}{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}} \qquad (3)$$

$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n}\left(X_{i}-\hat{X}_{i}\right)}{\sum_{i=1}^{n}X_{i}} \times 100 \qquad (4)$$

where X_{i}, X̂_{i}, X̄ and n are, respectively, the observed value, the predicted value, the mean value of the observed data, and the number of observations. The theoretical range of NSE is from –∞ to 1, with NSE=1 being the optimal value. Values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, whereas values <0.0 indicate that the mean observed value is a better predictor than the simulated value, which indicates unacceptable performance. The RSR varies from the optimal value of 0, which indicates zero root mean square error (RMSE), or residual variation, and therefore perfect model simulation, to a large positive value. Lower RSR implies lower RMSE and better model simulation performance (Moriasi *et al.*, 2007). The PBIAS measures the average tendency of the simulated data to be larger or smaller than their observed counterparts. The optimal value of PBIAS is 0.0, with low values indicating accurate model simulation (Gupta *et al.*, 1999).
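The three indices can be computed with a few lines of Python. This sketch follows the definitions of NSE, RSR and PBIAS given by Moriasi *et al.* (2007); the function names and the three-value example series are hypothetical and are not data from this study.

```python
import math

def _sums(observed, predicted):
    """Return (sum of squared errors, sum of squared deviations from the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return sse, sst

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is optimal, values below 0 are unacceptable."""
    sse, sst = _sums(observed, predicted)
    return 1.0 - sse / sst

def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations: 0 is optimal."""
    sse, sst = _sums(observed, predicted)
    return math.sqrt(sse) / math.sqrt(sst)

def pbias(observed, predicted):
    """Percentage bias: 0 is optimal; positive values mean the model underestimates."""
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)

# Hypothetical lake areas (km^2), for illustration only.
obs = [500.0, 600.0, 700.0]
pred = [520.0, 600.0, 680.0]
example = (nse(obs, pred), rsr(obs, pred), pbias(obs, pred))
```

In this example the two errors of 20 km^{2} have opposite signs, so PBIAS is zero even though NSE is below 1, which shows why the indices are reported together.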

## RESULTS

### Variation of lake surface area over the past 50 years

The evolution of the lake was divided into four periods (Fig. 2). During the first period, which ended in 1975, the lake shrank sharply. In the second period (1975-1995), lake area was relatively stable and was 476 km^{2} in 1995. In the third period (1995-2003), the lake expanded to a large area of 915 km^{2} in 2003, and in the fourth period (2003-2010) it shrank again to 428 km^{2} in 2010.

### Latent variables extracted from the climate and economic data

Four climate factors accounted for 93% of the total variance in the PCA (Tab. 1). The remaining factors did not contribute significantly to information in the data matrix. The first factor (PC1_climate), which explained 49% of the variance, was closely related to precipitation, wind speed, water vapor pressure, relative humidity, percent sunshine and sunshine duration. Temperature and station pressure were significantly correlated with PC2_climate. PC3_climate and PC4_climate accounted for 13% and 9% of total variance, respectively. The four principal components, as independent variables (X_{1}, X_{2}, X_{3} and X_{4}) in the multivariable linear model, represented climate change over the past 50 years (Supplementary Tab. 3).

Three social and economic factors accounted for 95% of total variance and the remaining factors were not significant (Tab. 2). PC1_human accounted for 70% of total variance, and was positively correlated with population, gross domestic product, primary industry, secondary industry, tertiary industry, total investment in fixed assets and total sown area of farm crops. PC2_human and PC3_human accounted for 18% and 7% of total variance, respectively. The three principal components, as independent variables (X_{5}, X_{6} and X_{7}) in the multivariable linear model, represented human activities (Supplementary Tab. 3).

### Assessment of the multivariate linear model

In our model, parameters X_{1}, X_{2},…, X_{7} extracted from the PCA, were assigned to the independent variables, and lake area was assigned to the dependent variable (Y) (Supplementary Tab. 3). The best match between the observed lake-surface area and the model predicted value is shown in Fig. 3. The regression coefficients (β_{1}, β_{2}, β_{3}…, β_{7}) were 21.83, -35.64, -30.11, -4.458, -47.57, -132.86 and -69.02, respectively. We use equation (5) to calculate the standard error of the estimate (SE).

Model assessment indices were NSE=0.76, RSR=0.49, and PBIAS=0. Based on general performance ratings for recommended statistics (Tab. 3) (Moriasi *et al*., 2007), model performance was rated very good, and the reconstructed annual surface area of Ebinur Lake was described well by the model.
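The reported indices can be cross-checked against each other. With RSR defined as the RMSE divided by the standard deviation of the observations, and NSE as one minus the ratio of the squared-error sum to the squared-deviation sum of the observations, the two are linked by NSE = 1 − RSR². The short calculation below uses only the values reported in this paper:

$$\mathrm{NSE} = 1 - \mathrm{RSR}^{2} = 1 - 0.49^{2} = 1 - 0.2401 \approx 0.76$$

This agrees with the reported NSE. Assuming that the squared correlation between predicted and observed values is comparable to NSE for a fitted linear model, the correlation coefficient reported in the Conclusions gives a similar figure:

$$r^{2} = 0.87^{2} = 0.7569 \approx 0.76$$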

### Contribution of climate change and human activities to change in the surface area of Ebinur Lake

We define a comprehensive climatic factor as Φ_{climate}=β_{1}X_{1}+β_{2}X_{2}+β_{3}X_{3}+β_{4}X_{4}, and a comprehensive anthropogenic factor as Φ_{human}=β_{5}X_{5}+β_{6}X_{6}+β_{7}X_{7}. A change in the surface area of Ebinur Lake can be calculated as follows: ΔE_{predicted}=E2_{average}-E1_{average}, where ΔE_{predicted} is the predicted surface area change between two different stages, E1_{average} is the average lake surface area during the reference stage, and E2_{average} is the average annual surface area during the following stage.

The change in lake area was also estimated as follows: ΔE_{predicted}=ΔΦ_{climate}+ΔΦ_{human}. The impact of climatic change on lake surface area is ΔΦ_{climate}=Φ_{climate–2}-Φ_{climate–1}, where Φ_{climate–1} is the average value of Φ_{climate} during the reference stage, and Φ_{climate–2} is the average value of Φ_{climate} during the following stage. The impact of human activities on lake surface area is ΔΦ_{human}=Φ_{human–2}-Φ_{human–1}, where Φ_{human–1} is the average value of Φ_{human} during the reference stage, and Φ_{human–2} is the average value of Φ_{human} during the following stage.

When compared to the reference stage of 1955-1960, the impacts of climate change across the catchment were generally positive for Ebinur Lake except during the 1961-1970 stage (Tab. 4). Based on our model, climate change increased annual lake area by 50.6 km^{2}, 22.8 km^{2}, 12.6 km^{2} and 38.2 km^{2} in the 1970s, 1980s, 1990s and 2000s, respectively. The impacts of human activities increased during the 1960s, 1970s and 1980s (Tab. 4, ΔΦ_{human}). Compared with previous stages, anthropogenic impacts on lake variation decreased from 1991 to 2010 (a positive value for ΔΦ_{human} in 1990s and 2000s).
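The decomposition into climatic and anthropogenic contributions can be sketched in Python as follows. The coefficients are those reported in the Results; the function names and the score vectors in the example are hypothetical placeholders, not the values of Supplementary Tab. 3.

```python
# Regression coefficients as reported in the Results:
# beta_1..beta_4 weight the climatic components, beta_5..beta_7 the human ones.
CLIMATE_BETAS = [21.83, -35.64, -30.11, -4.458]
HUMAN_BETAS = [-47.57, -132.86, -69.02]

def composite(betas, scores):
    """Composite factor Phi = sum of beta_i * X_i for one year."""
    return sum(b * x for b, x in zip(betas, scores))

def stage_mean_phi(rows, betas, columns):
    """Average Phi over all years of a stage; `columns` selects the relevant X_i."""
    values = [composite(betas, row[columns]) for row in rows]
    return sum(values) / len(values)

def decompose(reference_rows, following_rows):
    """Return (delta_phi_climate, delta_phi_human, delta_e_predicted).

    Each argument is a list of yearly score vectors [X1, ..., X7].
    """
    climate, human = slice(0, 4), slice(4, 7)
    d_climate = (stage_mean_phi(following_rows, CLIMATE_BETAS, climate)
                 - stage_mean_phi(reference_rows, CLIMATE_BETAS, climate))
    d_human = (stage_mean_phi(following_rows, HUMAN_BETAS, human)
               - stage_mean_phi(reference_rows, HUMAN_BETAS, human))
    return d_climate, d_human, d_climate + d_human

# Hypothetical scores: a reference stage at zero and a later stage in which
# only X1 and X7 have risen by one unit.
reference = [[0.0] * 7, [0.0] * 7]
following = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]] * 2
d_climate, d_human, d_total = decompose(reference, following)
```

In this toy case the climatic term is positive and the human term negative, so the predicted net change is the sum of two opposing contributions, as in the stage comparisons of Tab. 4.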

## DISCUSSION

Regression coefficients for the climate and economic variables were β_{1}=21.83, β_{2}=-35.64, β_{3}=-30.12, β_{4}=-4.46, β_{5}=-47.57, β_{6}=-132.86, and β_{7}=-69.02. Among the PCA-extracted economic variables, X_{6} had the largest coefficient in absolute value and reflected the change in food production. Water demand for agricultural irrigation was the main driver of water consumption in the Ebinur Lake watershed.

The surface area of Ebinur Lake shrank from 2330 km^{2} *ca.* 4.5 ka BP to 1107 km^{2} in 1955, whereas in only 55 years the lake shrank from 1107 km^{2} in 1955 to 428 km^{2} in 2010 (Ma *et al.*, 2011a). Sayram Lake is another lake in the Ebinur Lake catchment. There are no residential villages or agricultural development around Sayram Lake. The surface area of Sayram Lake was relatively stable from 1960 (443.9 km^{2}) to 1987 (448.3 km^{2}). In 1988, the lake area increased to 459 km^{2} and has remained high since then (Wu and Ma, 2011). Human activity was responsible for Ebinur Lake shrinking by 286.8 km^{2} over the past half century. Assuming only climate impacts on lake variation (*i.e.*, no human activities in the watershed), the lake would have expanded from 873.3 km^{2} in the 1960s to 973.2 km^{2} in the 2000s. On 1 October 1955, the Xinjiang Uygur Autonomous Region was established, opening new possibilities for development in Xinjiang. During the past half century, Xinjiang’s economy has advanced rapidly, as have its social undertakings. Although climatic conditions alone would have led to a 99.8 km^{2} increase in lake surface area, intensive human activities in the catchment over the past 50 years led to the shrinkage of Ebinur Lake.

Although our model works well in terms of assessment statistics, the model could be improved. In some years, not all data were measured, and we estimated these missing values by linear interpolation. Of the 56 data points for lake surface area in the model, only 20 were measured directly. Consequently, the climate and economic variables cannot exactly reflect lake surface area, and this effect partly explains the difference between the observed and predicted values for lake surface area.

For Ebinur Lake, precipitation, stream input, groundwater inflow, evaporation and transpiration were the direct factors influencing lake surface area. These factors were indirectly influenced by interactions between climate and human activities. Moreover, the internal mechanisms by which climatic and economic variables influenced lake size were complex, whereas our model expresses them only through linear formulas. These limitations all affect our model estimates of lake surface area.

## CONCLUSIONS

Economic variables were used as proxies for anthropogenic impact on Ebinur Lake size. Seven independent variables (four climatic and three economic principal components) were extracted from a larger set of variables using principal component analysis (PCA); they accounted for 93% and 95% of the variation in the climate and economic data, respectively.

A multivariate regression with the principal components as independent variables was used to distinguish between climatic and anthropogenic impacts on the lake surface area. There was a good linear correlation between the predicted and observed data (r=0.87, P<0.01). We conclude that human activity was responsible for the lake surface area reduction of 286.8 km^{2} over the past half century. In the absence of human activities in the watershed, climate conditions would have led to a 99.8 km^{2} increase in lake surface area.