To make matters worse, the RegEM algorithm performs two very different functions: 1) it infills missing values in the station history data (which goes back to 1957) and 2) because there is no satellite data prior to 1982, it extrapolates values for the period before 1982 – a process that is conceptually very different from simple “infilling”.
In our situation, the layering of the RegEM algorithm has the extreme disadvantage of making the underlying linear algebra even murkier. Layered onto this methodology is an expectation-maximization algorithm, whose properties are also poorly understood. Many CA readers know a lot about regression and quite a bit about principal components, but I dare say that far fewer are familiar with Truncated Total Least Squares (to which the regpar parameter belongs). We’re seeing interesting interactions between PC=k and regpar=r – and there is little, if anything, in the regular statistical literature about this particular interaction in the context of spatially autocorrelated time series.
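For readers who haven't met Truncated Total Least Squares before, here is a minimal sketch of the textbook version in Python with NumPy. To be clear about what is assumed: this is not the RegEM code, and the function name `ttls` and its argument `r` are my own hypothetical stand-ins, with `r` playing the role that the truncation parameter (regpar) plays in the discussion above. The idea is to take the SVD of the stacked matrix [A B], keep only the leading r right singular vectors as "signal", and build the coefficient matrix from the discarded ones.

```python
import numpy as np

def ttls(A, B, r):
    """Textbook truncated total least squares for A @ X ~ B (hypothetical helper).

    A : (n, p) predictors, B : (n, q) or (n,) responses,
    r : truncation rank, 1 <= r <= p (stand-in for a regpar-like parameter).
    Returns X with shape (p, q).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if B.ndim == 1:
        B = B[:, None]  # treat a vector response as a single column
    p = A.shape[1]
    if not 1 <= r <= p:
        raise ValueError("r must satisfy 1 <= r <= number of columns of A")

    # Right singular vectors of the stacked matrix [A B].
    _, _, Vh = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vh.T

    # Partition the discarded right singular vectors (columns r onward).
    V12 = V[:p, r:]  # rows belonging to A
    V22 = V[p:, r:]  # rows belonging to B

    # Minimum-norm TTLS solution: X = -V12 @ pinv(V22).
    return -V12 @ np.linalg.pinv(V22)
```

With noise-free data and r equal to the number of columns of A, this should simply recover the true coefficients; the interesting behaviour starts when r is set lower than that, which is where the interaction with the number of retained PCs comes in.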