TM3C4 Sediment Spreadsheet User Manual
7/21/2010
Introduction
Getting Started
In order to use this program, macros must be enabled. For Excel 2003, go to the Tools menu. Under the
Macros option, select Security. Set the security level to either Low or Medium. You must then close and
reopen Excel and this program. If Medium is selected, indicate that you wish to enable macros when
prompted. For Excel 2007, a Security Warning appears above the formula bar. Click Options and choose
the “Enable this content” option. Click OK, and macros will be enabled.
Note: This program was created on Microsoft Windows XP using Microsoft Office Excel 2007 with 32-bit
color. All properties and formats are described as they appear using this software. Variations may occur
depending on the operating system and settings.
Data Entry
First select the “Regression and Analysis 1” tab located at the bottom of the Excel window. Enter the Site
Number in cell B3 and the Site Name in cell F3. If this information is not entered before a macro is run,
the program will require you to enter it before proceeding. Insert the date and time beginning in
cells C7 and D7, respectively. Insert Turbidity, Streamflow, Suspended Sediment Concentration (SSC), and
Percent Sand data beginning in cells E7, F7, G7, and H7, respectively. All cells for user-entered data
are shaded a darker green. The Date+Time and Julian Day columns are filled in automatically
by the macro once the “Data Analysis” button is clicked. The program will not work if there are empty
cells in the streamflow, SSC, or turbidity columns. Empty cells are permissible for percent sand. The
program can handle data up to row 2500; the more data entered, the more time the program needs to
process it. For a sample of the data entry, see Image 1 in the Appendix.
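The empty-cell rule above can be checked before the data is pasted into the sheet. The following Python sketch is only an illustration and is not part of the spreadsheet; the column names and the function `find_incomplete_rows` are hypothetical, and the first data row is assumed to be row 7 as described above.

```python
# Hypothetical pre-check, not part of the spreadsheet macros.
# Required columns may not be empty; percent sand may be left empty.
REQUIRED = ("turbidity", "streamflow", "ssc")

def find_incomplete_rows(rows, first_row=7):
    """Return the spreadsheet row numbers that lack a required value."""
    bad = []
    for offset, row in enumerate(rows):
        if any(row.get(name) in (None, "") for name in REQUIRED):
            bad.append(first_row + offset)
    return bad

rows = [
    {"turbidity": 12.0, "streamflow": 150.0, "ssc": 30.0, "percent_sand": 40.0},
    {"turbidity": 25.0, "streamflow": None, "ssc": 55.0, "percent_sand": None},
    {"turbidity": 8.0, "streamflow": 90.0, "ssc": 21.0, "percent_sand": None},
]
incomplete = find_incomplete_rows(rows)
```

Here only the second sample (spreadsheet row 8) is flagged, because its streamflow is missing; the empty percent sand cells are accepted.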
Process
After the data has been entered, the buttons should be clicked in increasing numerical order. The only
exception is button #3, which does not need to be clicked to proceed to button #4. Button #3 should
only be pressed when the user wishes to generate a Candidate Model for the current data on the current
Regression and Analysis sheet. For a sample of the buttons, see Image 2 in the Appendix.
Use the drop-down boxes in cells J5 through L5 to choose the transformation for each variable
(user.selected.transform). For a sample of the drop-down boxes, see Image 3 in the Appendix.
NOTE: Currently, MSPE and Duan BCF cannot be computed using the sqrt
transformation for the response variable.
1. Data Analysis
This button does a number of calculations designed to help the user choose the appropriate model for
the data submitted. Residual plots are automatically generated as a result of these calculations. This
should be done for each Regression and Analysis sheet.
a. Transforms the Turb, Q, and SSC data based on the user-selected transformations
b. Computes six different regression models
    a. (user.selected.transform)SSC vs (user.selected.transform)Turb
    b. SSC vs Turb
    c. (user.selected.transform)SSC vs (user.selected.transform)Q
    d. SSC vs Q
    e. (user.selected.transform)SSC vs (user.selected.transform)Turb + (user.selected.transform)Q
    f. SSC vs Turb + Q
c. Computes a number of regression statistics for each of the six models
    a. Slope(s)
    b. Intercept
    c. R²
    d. Adjusted R²
    e. Standard Error
    f. T-Statistics
    g. Duan Bias Correction Factor
    h. PRESS
    i. Mallows' Cp
    j. MSPE
    k. VIF (for multiple explanatory variables)
    l. P-value for the explanatory variable(s)
d. Calculates estimated SSC values and residuals
e. Gives each data point a data reference number
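As an illustration of steps a through d for a single model, the Python sketch below fits log10(SSC) against log10(Turb) by ordinary least squares and computes residuals, PRESS, and a Duan bias correction factor. It is a sketch under stated assumptions, not the spreadsheet's own code: the sample values are made up, the function names are hypothetical, PRESS is taken as the sum of squared leave-one-out residuals, and the Duan factor is taken as the mean of 10 raised to each residual, which is the usual form for a log10-transformed response.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def residual_statistics(x, y, slope, intercept):
    """Return residuals, residual sum of squares, and PRESS."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    # Leverage of each point in a one-variable regression.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # PRESS: each residual is inflated to its leave-one-out value.
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverage))
    return residuals, ss_res, press

def duan_bcf_log10(residuals):
    """Assumed Duan factor for a log10 response: mean of 10**residual."""
    return sum(10 ** e for e in residuals) / len(residuals)

# Made-up sample data (turbidity and SSC), for illustration only.
turb = [10.0, 20.0, 40.0, 80.0, 160.0]
ssc = [12.0, 25.0, 45.0, 95.0, 170.0]

log_turb = [math.log10(v) for v in turb]
log_ssc = [math.log10(v) for v in ssc]
slope, intercept = fit_line(log_turb, log_ssc)
residuals, ss_res, press = residual_statistics(log_turb, log_ssc, slope, intercept)
bcf = duan_bcf_log10(residuals)
# Back-transform to SSC units, applying the bias correction factor.
estimated_ssc = [bcf * 10 ** (slope * xi + intercept) for xi in log_turb]
```

PRESS is never smaller than the residual sum of squares, so a large gap between the two suggests that a few points have a strong influence on the fit.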
2. Normal Quantiles
This button calculates the normal quantiles for the residuals of each model. Normal quantile plots are
automatically generated. Possible outliers are identified using quartiles. Upper and lower outliers are
defined by:
3rd Quartile + 1.5*Inter-Quartile Range < Residual ≤ 3rd Quartile + 3*Inter-Quartile Range
or
1st Quartile – 3*Inter-Quartile Range ≤ Residual < 1st Quartile – 1.5*Inter-Quartile Range
The extreme upper and extreme lower outliers are defined by:
Residual > 3rd Quartile + 3*Inter-Quartile Range
or
Residual < 1st Quartile – 3*Inter-Quartile Range
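The outlier rules can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the quartiles are assumed to be computed by linear interpolation between ordered values, and the extreme lower bound is assumed to mirror the extreme upper bound.

```python
def quartiles(values):
    """Return (Q1, Q3), interpolating linearly between ordered values."""
    s = sorted(values)

    def quantile(p):
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return quantile(0.25), quantile(0.75)

def classify(residual, q1, q3):
    """Label a residual using the 1.5*IQR and 3*IQR fences."""
    iqr = q3 - q1
    if residual > q3 + 3 * iqr:
        return "extreme upper"
    if residual > q3 + 1.5 * iqr:
        return "upper"
    if residual < q1 - 3 * iqr:  # assumed mirror of the upper rule
        return "extreme lower"
    if residual < q1 - 1.5 * iqr:
        return "lower"
    return "none"

# Made-up residuals: Q1 = 0 and Q3 = 2, so the fences are at 5, 8, -3, and -6.
q1, q3 = quartiles([-1.0, 0.0, 1.0, 2.0, 3.0])
labels = [classify(r, q1, q3) for r in (5.0, 6.0, 8.0, 9.0, -3.5, -7.0)]
```

A residual that falls exactly on a 3*IQR fence is still an ordinary outlier, not an extreme one, which matches the ≤ and ≥ signs in the definitions.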
The user is asked to review the outliers, models, and plots to determine whether any possible
outliers should be removed. To remove a point, review columns BK through BP and type the
word “yes” into column BQ in the row of the corresponding data reference number. Extreme outliers
are marked in red. If the user determines that no points should be removed, then button #4 is
unnecessary. This should be done for each Regression and Analysis sheet.
Button #3 does not have to be clicked for each Regression and Analysis sheet, only when the user
wishes to choose a model.
4. Delete Points
If the user has indicated that a point is to be deleted, this button deletes the point(s) and carries the rest
of the data over to the next Regression and Analysis sheet. This entire process (buttons 1-4) can be
repeated on all Regression and Analysis sheets except for sheet R & A 4, which will not delete points.
SSC/SSL Model Switch
This button is on each of the Regression and Analysis sheets. By clicking this button, the user switches
between analyzing the data relative to SSC and relative to SSL. The button label changes to display
which model is active. When the button is clicked, a message box will pop up asking whether the user
would like to switch models. The “Yes” button switches the model and “No” cancels the macro. The SSC
model should be used the majority of the time; the top two rows on the candidate model sheets only
apply if the SSC model is in use. The SSL model should only be used if there is a bias in the SSC model.
Candidate Model
There are a total of six candidate model sheets that can be created from any of the four Regression and
Analysis sheets. Each candidate model sheet is laid out as follows:
The top two rows hold all statistical and numerical information for easy transfer to the USGS National
Real-Time Water Quality website. The top row consists of the labels, while the second row holds the
data, which can be copied and pasted for website use. Next, the layout matches the National Real-Time
Water Quality model info tab for computed suspended sediment. It displays the model equation, model
calibration, explanatory variables, and covariance matrix. See Image 8 in the Appendix. The candidate
model sheet also shows the measured data from its specific Regression and Analysis sheet, including the
regression-computed data, residuals, and normal quantiles. The four corresponding graphs for that
model are also included on the sheet.
If a simple linear regression model is chosen, an additional graph will be created to compare the accuracy
of the transformed model with the accuracy of the untransformed model. This graph shows whether
the chosen model is biased at lower or higher values.
Cell B5 contains a reference from where the candidate model was created; if this cell is double clicked it
will take the user to that Regression and Analysis page.
The candidate model sheet also contains two buttons; see Image 9 in the Appendix. The two buttons are:
Clear Results
This button clears all data and graphs from the candidate model sheet. Once the candidate model is
cleared it is hidden from the user to simplify the tabs located at the bottom of the screen.
The Regression and Analysis sheet from which the candidate model came is reported in cell B5 of
each candidate model sheet. Double-clicking that cell takes the user to that sheet.
Time-Series
There are a total of two time-series sheets, which compute estimated SSC or SSL for a chosen candidate
model along with upper and lower 90% confidence intervals.
Once a model has been chosen, the user should click the “Transfer to Time-Series Sheet” button on that
candidate model sheet. The equation and measured data will then be moved to an empty time-series
sheet, where the estimated SSC and SSL will be calculated along with upper and lower 90% confidence
intervals for both. The loads table, located to the right of the time-series table, computes
the total load for each month and then sums the loads for the calendar year and for the water year.
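The grouping done by the loads table can be sketched in Python as follows. This is an illustration, not the spreadsheet's code: the function names and sample loads are made up, and the water year is assumed to follow the USGS convention of running from October 1 through September 30 and being named for the year in which it ends.

```python
from collections import defaultdict
from datetime import date

def water_year(d):
    """Assumed USGS convention: October through December belong to the next year."""
    return d.year + 1 if d.month >= 10 else d.year

def sum_loads(dated_loads):
    """Sum (date, load) pairs by month, calendar year, and water year."""
    monthly = defaultdict(float)
    calendar = defaultdict(float)
    water = defaultdict(float)
    for d, load in dated_loads:
        monthly[(d.year, d.month)] += load
        calendar[d.year] += load
        water[water_year(d)] += load
    return dict(monthly), dict(calendar), dict(water)

# Made-up loads in arbitrary units.
samples = [
    (date(2009, 9, 30), 1.0),
    (date(2009, 10, 1), 2.0),
    (date(2010, 1, 15), 4.0),
]
monthly, calendar, water = sum_loads(samples)
```

With these values the calendar-year totals are 3.0 for 2009 and 4.0 for 2010, while the water-year totals are 1.0 for 2009 and 6.0 for 2010, because the October load counts toward water year 2010.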
The samples, estimated SSC or SSL, upper and lower confidence intervals will then be graphed together
so the user can visually compare the estimated data to the measured data. A drop down bar lets the
user see either SSC or SSL and a check box lets the user see the data on a log or linear scale.
Each time-series sheet will be filled in increasing order. If both sheets are full, a Userform will allow the
user to choose which model to replace. The equation used for the time series is
given for both time-series sheets. If the user does not wish to replace either of the time-series models,
clicking Continue without selecting a model to replace ends the transfer. See Image 10 in the Appendix.
The user must enter data into columns J through P before transferring a candidate model to the
time-series sheet. The entered time-series data will be graphed with the sample data so the user can
see the accuracy of the model graphically. If there is no data in the first row of the required columns,
a message box will be displayed asking the user to enter the data before proceeding, and no action will
be taken. Excel graphs are only capable of graphing up to 32,000 data points; if more are entered in the
time-series columns, anything over 32,000 will not be displayed. For a sample of the Time-Series sheet
layout, see Image 11 in the Appendix.
The candidate model sheet from which the time-series model was generated is reported in cell D2
of each time-series sheet. Double-clicking that cell takes the user to that sheet. If the candidate
model sheet has been cleared and is hidden, no action is taken. If the candidate model sheet has
been replaced, the new candidate model will have no relevance to the time series.
Note: The additional models added to the time series graph will not have their specific equation
displayed; only the generic equation form will be displayed.
Note: If the Regression and Analysis sheet or Candidate Model sheet from which the time-series model
was generated has been deleted, the program has no way of knowing where to obtain the
equations for additional models, and none will be added.
The “Log Scale” check box allows the user to switch between a log base 10 scale and linear scale. It is
suggested that a log scale is used for log transformed equations and a linear scale is used for an
untransformed equation.
The top scroll bar allows the user to scroll through starting dates on the graph. The minimum will start
on the first day of the earliest month of the earliest year in the time-series dates. Each click of the
forward and back arrows changes the start date forward and back by one month. Clicking the empty
space to the left or right of the indicator moves the start date back or forward one year, respectively.
The starting date is displayed just above the scroll bar.
The bottom scroll bar allows the user to change the period of time displayed on the graph. The shortest
period is 15 days and each click of the forward and back arrow increases and decreases the period by 15
days. After the 90 day period, the last position is for a display of one year. A scale is placed under the
scroll bar to assist the user. The period length is displayed just above the scroll bar.
X-Section
2.33 Cross-Section
This button will create the same number of cross-section sheets as you have samples. From here, cross-
section data can be entered. This step is not required, but may be helpful. To use button 2.67, the date,
time and turbidity values are required.
Other
There are notes throughout the first frame detailing how some processes are done. Just hold the mouse
over the red “flags” to see the notes.
Methods
LINEST
LINEST reports regression statistics as follows:
Slope_m                      Slope_(m-1)
Standard error_m             Standard error_(m-1)
R²                           Standard error of y (RMSE)
F-statistic                  Degrees of freedom
Regression sum of squares    Residual sum of squares
T-Statistics
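A hypothesis test that determines whether each slope coefficient is useful, that is, significantly
different from zero. The larger the absolute value of the T-statistic, the more useful the coefficient is.
$t_m = \text{slope}_m / \text{standard error}_m$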
Adjusted R-Squared
The Adjusted R-squared is an R-squared adjusted for the number of explanatory variables in the
regression model. The regression model with the highest adjusted R-squared is identical to the
regression model with the lowest standard error.
$R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n-1}{n-p}$, where p is the number of explanatory variables plus one
and n is the number of observations
Estimated SSC
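$\hat{y}_i = \text{slope}_1 \, x_{1,i} + b$ for simple linear regression (SLR)
Residuals
$e_i = y_i - \hat{y}_i$, where $y_i$ is the actual value and $\hat{y}_i$ is the estimated value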
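The formulas in this section can be combined into one short Python sketch for a simple linear regression. It is an illustration with made-up data and a hypothetical function name, not the spreadsheet's code. The standard error of y is assumed to be the square root of the residual sum of squares divided by the degrees of freedom n - p, and the standard error of the slope is assumed to be that value divided by the square root of the sum of squared deviations of x.

```python
import math

def slr_statistics(x, y):
    """Slope, intercept, residuals, adjusted R-squared, and slope T-statistic."""
    n = len(x)
    p = 2  # one explanatory variable plus one
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    y_hat = [slope * xi + intercept for xi in x]      # estimated values
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # actual minus estimated
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    rmse = math.sqrt(ss_res / (n - p))   # standard error of y
    se_slope = rmse / math.sqrt(sxx)     # standard error of the slope
    return {
        "slope": slope, "intercept": intercept, "residuals": residuals,
        "r2": r2, "adj_r2": adj_r2, "rmse": rmse, "t_slope": slope / se_slope,
    }

# Made-up data for illustration.
stats = slr_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
```

Because the penalty factor (n - 1)/(n - p) is at least one, the adjusted R-squared never exceeds the ordinary R-squared.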
Variance Inflation Factor
A measure of collinearity used in multiple linear regression (MLR). Multicollinearity occurs when two or
more explanatory variables are related to each other. There are serious problems when VIF > 10, and
one should be cautious when VIF > 5.
$\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the R-squared from the regression of the jth
explanatory variable on all other explanatory variables
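For a model with exactly two explanatory variables, such as Turb and Q, the R-squared from regressing one variable on the other equals their squared correlation, so both variables share the same VIF. The Python sketch below uses that shortcut; the function name and the data are made up for illustration.

```python
def vif_two_variables(x1, x2):
    """VIF for a two-variable model: 1 / (1 - r**2), r = correlation of x1 and x2."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r_squared)

# Unrelated variables give the minimum VIF of 1.
vif_unrelated = vif_two_variables([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
# Nearly proportional variables give a VIF well above 10.
vif_related = vif_two_variables([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

With more than two explanatory variables, each variable needs its own regression on all of the others, and the shortcut above no longer applies.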
Mallows' Cp
Mallows' Cp is designed to minimize bias and to minimize the standard error by keeping the number of
coefficients small. The best model is the one with the lowest Cp.
Model Standard Percentage Error
$\text{MSPE} = \pm(\text{RMSE}/\bar{y}) \times 100$, where $\bar{y}$ is the average of y
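A worked example of the MSPE formula, written as a Python sketch with a hypothetical function name and made-up numbers:

```python
def mspe_percent(rmse, y_values):
    """RMSE expressed as a percentage of the average of y."""
    y_avg = sum(y_values) / len(y_values)
    return rmse / y_avg * 100

# Made-up values: RMSE of 5 with measured values that average 50.
measured = [20.0, 40.0, 60.0, 80.0]
mspe = mspe_percent(5.0, measured)
```

Here the RMSE is one tenth of the average measured value, so the MSPE is reported as plus or minus 10 percent.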
References
Helsel, D.R., and Hirsch, R.M., 2002, Statistical Methods in Water Resources: U.S. Geological Survey
Report.
Hurvich, C.M., and Tsai, C.L., 1989, Regression and time series model selection in small samples:
Biometrika, v. 76, p. 297-307.
NOTE: These statistics are not appropriate for comparing models with different units of y.
Appendix
Image 3: Drop Down Box Sample
Image 5: MSPE Analysis Sample
Image 7: Regression and Analysis Buttons Sample
Image 9: Candidate Model Buttons
Image 12: Time-Series Add a Model Userform