Krig                 package:fields                 R Documentation

_K_r_i_g_i_n_g _s_u_r_f_a_c_e _e_s_t_i_m_a_t_e

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a surface to irregularly spaced data. The Kriging model
     assumes   that the unknown  function is a realization  of a
     Gaussian  random spatial processes. The assumed model is additive 
     Y = P(x) +  Z(X) + e, where P is a low order polynomial and Z is a
      mean zero,  Gaussian stochastic process with a   covariance that
     is unknown up to a scale constant. The main advantages of this
     function are the flexibility in specifying the covariance as an R
     language function and also the supporting functions plot, predict,
     predict.se, surface for subsequent analysis. Krig also supports a
     correlation model where the mean and marginal variances are
     supplied.

_U_s_a_g_e:

     Krig(x, Y, cov.function=exp.cov, lambda=NA, df = NA,
     cost=1, knots, weights=rep(1,length(Y)), 
     m=2, return.matrices=TRUE, nstep.cv=80, scale.type="user", 
     x.center=rep(0, ncol(x)), x.scale=rep(1, ncol(x)), rho=NA, sigma2=NA, 
     method="GCV", decomp="DR", verbose=FALSE, cond.number=10^8, mean.obj=NULL, 
     sd.obj=NULL, yname=NULL, return.X=TRUE, null.function=make.tmatrix, offset=0, 
      outputcall = NULL, cov.args=NULL,na.rm=FALSE,...)

_A_r_g_u_m_e_n_t_s:

       x: Matrix of independent variables.  

       Y: Vector of dependent variables.  

cov.function: Covariance function for data in the form of an R function
          (see  exp.cov).   Default assumes that correlation is an
          exponential function of distance. 

      df: The effective number of parameters for the fitted surface.
          Conversely,  N- df, where N is the total number of
          observations is the degrees of freedom associated with the
          residuals.  This is an alternative to specifying lambda and
          much more interpretable.  

  lambda: Smoothing parameter that is the ratio of the error variance
          (sigma**2)  to the scale parameter of the   covariance
          function (rho). If omitted this is estimated by GCV ( see
          method below).  

    cost: Cost value used in GCV criterion. Corresponds to a penalty
          for   increased number of parameters.  

   knots: A matrix of locations similar to x. These can define an
          alternative set of basis functions for representing the
          estimate. One choice may be a space-filling subset of the
          original x locations. (See details.) 

 weights: Weights are proportional to the reciprocal variance of the
          measurement   error. The default is no weighting i.e. vector
          of unit weights.  

       m: A polynomial function of degree (m-1) will be   included in
          the model as the drift (or spatial trend) component.  

return.matrices: Matrices from the decompositions are returned. The
          default is T.   

nstep.cv: Number of grid points for minimum GCV search.  

scale.type: This is a character string among: "range", "unit.sd",
          "user", "unscaled". The independent variables and knots are
          scaled to the specified scale.type.  By default no scaling is
          done. Scale type of "range" scales the data to the interval
          (0,1) by forming  (x-min(x))/range(x) for each x. Scale type
          of "unit.sd"  Scale type of "user" allows specification of an
          x.center and x.scale by the  user. The default for "user" is
          mean 0 and standard deviation 1. Scale  type of "unscaled"
          does not scale the data.  

x.center: Centering values to be subtracted from each column of the x
          matrix.  

 x.scale: Scale values that are divided into each column after
          centering.  

     rho: Scale factor for covariance.  

  sigma2: Variance of the errors, often called the nugget variance. If
          weights are specified then the error variance is sigma2
          divided by weights.  

  method: How should the "smoothing" parameter be estimated? The
          default is by standard GCV  Other choices are: GCV.model,
          GCV.one, RMSE, pure error. The differences  are explained
          below.   

  decomp: Type of matrix decompositions used to compute the solution.
          Default is   "DR" Demmler-Reinsch an alternative that more
          numerically stable is   "WBW" Wendelberger-Bates-Wahba. This
          is the strategy used in GCV pack.    "WBW" can not be used if
          knots are specified. 

 verbose: If true will print out all kinds of intermediate stuff.
          Default is false, of course.    

cond.number: Maximum size of condition number to allow when using DR
          decomposition.  

mean.obj: Object to predict the mean of the spatial process. This used
          in when fitting a correlation model with varying spatial
          means and varying marginal variances. (See details.) 

  sd.obj: Object to predict the marginal standard deviation of the
          spatial process.  

   yname: Name of y variable  

return.X: If true returns the big X matrix used for the estimate.   

null.function: An S function that creates the matrices for the null
          space model.   The default is make.tmatrix, an S function
          that creates polynomial  null spaces of degree up to m-1.
          (See details)   

  offset: The offset to be used in the GCV criterion. Default is 0.
          This would be  used when Krig is part of a backfitting
          algorithm and the offset has to be  adjusted to reflect other
          model degrees of freedom.  

cov.args: A list with the arguments to call the covariance function.
          (in addition to the locations) 

outputcall: If NULL the output object will have a $call argument based
          on this call.  If no NULL the output call will have whatever
          is passed. This is kludge for the Tps function so that it
          return a Krig object but have the right call argument. 

   na.rm: If true NAs will be removed from the 'y' vector and the 
          corresponding  rows of 'x'.

     ...: Optional arguments that appear are assume to be additional
          arguments to the covariance function.  

_D_e_t_a_i_l_s:

     This function produces a object of class Krig. With this object it
     is easy to subsequently  predict with this fitted surface, find
     standard errors, alter the y data ( but not x),  etc. 

     The Kriging model is:  Y(x)= P(x) + Z(x) + e 

     where Y is the dependent variable observed at location x, P is a
     low order polynomial, Z is a mean zero, Gaussian field with
     covariance function K and e is assumed to be independent normal
     errors. The estimated surface is the best linear unbiased estimate
     (BLUE)  of P(x) + Z(x) given the observed data. For this estimate
     K, is taken to be rho*cov.function and the errors have variance
     sigma**2. In more conventional geostatistical terms rho is the
     "sill" if the covariance function is actually a correlation
     function  and  sigma**2 is the nugget variance or measure error
     variance (the two are confounded in this  model.)  

     If these parameters rho and sigma2 are omitted in the call, then
     they are estimated in the following way. If lambda is given, then
     sigma2 is estimated from the residual sum of squares divided by
     the degrees of freedom associated with the residuals.  Rho is
     found as the difference between the sums of squares of the
     predicted values having subtracted off the polynomial part and
     sigma2.  

     A useful extension of a stationary correlation to a nonstationary
     covariance is what we term a correlation model.  If mean and
     marginal standard deviation objects are included in the call.  
     Then the observed data is standardized based on these functions. 
     The spatial process is then estimated with respect to the
     standardized scale. However for predictions and standard errors
     the mean and standard deviation surfaces are used to produce
     results in the original scale of the observations.

     The GCV function has several alternative definitions when
     replicate observations are present or if one uses a reduced set
     knots.  Here are the choices based on the method argument:  

     GCV: leave-one-out GCV. But if there are replicates it is leave
     one group out. (Wendy and Doug prefer this one.)  

     GCV.one: Really leave-one-out GCV even if there are replicate
     points.  This what the old tps function used in FUNFITS.

     rmse: Match the estimate of sigma**2 to a external value ( called
     rmse)  

     pure error: Match the estimate of sigma**2 to the estimate based
     on replicated data (pure error estimate in ANOVA language).  

     GCV.model: Only considers the residual sums of squares explained
     by the basis functions.  

     WARNING: The covariance functions often have a nonlinear parameter
     that control the strength of the correlations as a function of
     separation, usually referred to as the range parameter. This
     parameter must be specified in the call to Krig and will not be
     estimated.

_V_a_l_u_e:

     A object of class Krig. This includes the predicted values in  
     fitted.values and the residuals in residuals. The results of the
     grid  search to minimize the generalized cross validation function
     are  returned in gcv.grid. 

    call: Call to the function  

       y: Vector of dependent variables.  

       x: Matrix of independent variables.  

 weights: Vector of weights.  

   knots: Locations used to define the basis functions.   

transform: List of components used in centering and scaling data.  

      np: Total number of parameters in the model.  

      nt: Number of parameters in the null space.  

matrices: List of matrices from the decompositions (D, G, u, X, qr.T).  

gcv.grid: Matrix of values from the GCV grid search. The first column 
          is the grid of lambda values used in the search, the second
          column   is the trace of the A matrix, the third column is
          the GCV values and  the fourth column is the estimated value
          of sigma conditional on the vlaue of lambda.   

lambda.est: A table of estimated smoothing parameters with
          corresponding degrees  of freedom and estimates of sigma
          found by different methods.   

    cost: Cost value used in GCV criterion.  

       m: Order of the polynomial space: highest degree polynomial is
          (m-1).  This is a fixed part of the surface often referred to
          as the drift  or spatial trend.   

  eff.df: Effective degrees of freedom of the model.  

fitted.values: Predicted values from the fit.  

residuals: Residuals from the fit.  

  lambda: Value of the smoothing parameter used in the fit.  

   yname: Name of the response.  

cov.function: Covariance function of the model.  

    beta: Estimated coefficients in the ridge regression format  

       d: Estimated coefficients for the polynomial basis functions
          that span the  null space  

fitted.values.null: Fitted values for just the polynomial part of the
          estimate  

   trace: Effective number of parameters in model.  

       c: Estimated coefficients for the basis functions derived from
          the  covariance.  

coefficients: Same as the beta vector.  

just.solve: Logical describing if the data has been interpolated using
          the basis   functions.   

    shat: Estimated standard deviation of the measurement error (nugget
          effect).  

  sigma2: Estimated variance of the measurement error (shat**2).  

     rho: Scale factor for covariance.  COV(h(x),h(x')) =
          rho*cov.function(x,x')  If the covariance is actually a 
          correlation function then rho is also the "sill".  

mean.var: Normalization of the covariance function used to find rho.  

best.model: Vector containing the value of lambda, the estimated
          variance of the   measurement error and the scale factor for
          covariance used in the fit.  

_R_e_f_e_r_e_n_c_e_s:

     See "Additive Models" by Hastie and Tibshirani, "Spatial
     Statistics" by     Cressie and the FIELDS manual.

_S_e_e _A_l_s_o:

     summary.Krig, predict.Krig, predict.se.Krig, predict.surface.se,
     predict.surface, plot.Krig, surface.Krig

_E_x_a_m_p_l_e_s:

     #2-d example 
     # fitting a surface to ozone  
     # measurements. Range parameter is 10 (in miles) 

     fit <- Krig(ozone$x, ozone$y, exp.cov, theta=10)  
      
     summary( fit) # summary of fit 
     plot(fit) # diagnostic plots of fit  
     surface( fit, type="C") # look at the surface 
     out.p<- predict.surface.se( fit) 
     image(out.p)

     # predict at data

     predict( fit)

     # predict on a grid ( grid chosen here by defaults)
     out<- predict.surface( fit)
     persp( out)

     # predict at arbitrary points (10,-10) and (20, 15)
     xnew<- rbind( c( 10, -10), c( 20, 15))
     predict( fit, xnew)

     # standard errors of prediction based on covariance model.  
     predict.se( fit, xnew)

     #
     # Roll your own: using a user defined Gaussian covariance 
     #
     test.cov <- function(x1,x2,theta){exp(-(rdist(x1,x2)/theta)**2)} 
     # use this and put in quadratic polynomial fixed function 
     fit.flame<- Krig(flame$x, flame$y, test.cov, m=3, theta=.5)
     #
     # note how range parameter is passed to Krig.   
     # BTW:  GCV indicates an interpolating model (nugget variance is zero) 
     #
     # take a look ...
     surface(fit.flame, type="I") 

     # 
     # Thin plate spline fit to ozone data using the radial 
     # basis function as a generalized covariance function 
     #
     # p=2 is the power in the radial basis function (with a log term added for 
     # even dimensions)
     # If m is the degree of derivative in penalty then p=2m-d 
     # where d is the dimension of x. p must be greater than 0. 
     #  In the example below p = 2*2 - 2 = 2  
     #
     # See also the Fields function Tps

     out<- Krig( ozone$x, ozone$y, rad.cov, m=2,  p=2, 
     scale.type="range",decomp="WBW") # these last two options make sure the
                                      # the results with Tps are identical
     # A Knot example

     data(ozone2)
     y16<- ozone2$y[16,] # there are some missing values! 
     good<- !is.na( y16)
     y<- y16[good] # Krig does not remove missing values. 
     x<- ozone2$lon.lat[ good,]

     #
     # the knots can be arbitrary but just for fun find them with a space 
     # filling design. Here we select  50 from the full set of 147 points
     #
     xknots<- cover.design( x, 50)$design # select 50 knot points

     out<- Krig( x, y, knots=xknots,  rad.cov, m=2,  p=2,scale.type="range")

     # Zee plot 
     surface( out, type="C")
     US( add=TRUE)
     points( x, col=2)
     points( xknots, cex=2, pch="O")

     summary( out)
     #
     # note that that trA found by GCV is around 15 so 50>15  knots may be a 
     # reasonable approximation to the full estimator. 
     #
     surface( out, type='C')
     # obs and knots
     points( x)
     points( xknots, col=2)

     # Correlation model example

     # fit krig surface using a mean and sd function to standardize 
     # first get stats from 1987 summer Midwest O3 data set 
     # Compare the function Tps to the call to Krig given above 
     # fit tps surfaces to the mean and sd  points.  
     # (a shortcut is being taken here just using the lon/lat coordinates) 
     data(ozone2)
     stats.o3<- stats( ozone2$y)
     mean.o3<- Tps( ozone2$lon.lat, c( stats.o3[2,]))
     sd.o3<- Tps(  ozone2$lon.lat, c( stats.o3[3,]))

     # Now use these to fit particular day ( day 16) 

     y16<- ozone2$y[16,] # there are some missing values! 
     good<- !is.na( y16) 
      
     fit<- Krig( ozone2$lon.lat[good,], y16[good],exp.earth.cov, theta=353, 
     mean.obj=mean.o3, sd.obj=sd.o3)
     #
     # the finale
     surface( fit, type="I")
     US( add=TRUE)
     title("Estimated ozone surface")

       

