Keith Briggs

### pipemath-1.2

#### introduction

This package is intended for processing very large data sets via shell pipelines. The programs do not store the data. They respond to a challenge: can one perform some of the standard computations of statistical data analysis (autocorrelation of a scalar time series, covariance matrix of a set of vectors, and least-squares polynomial fitting) if the data points arrive one at a time, and each must be processed and thrown away before the next one arrives? Of course, all this must be done while preserving numerical stability. The three C programs I provide seem to achieve these aims for the three specific problems mentioned. The ideas could be relevant more generally to stream computing and distributed data analysis; see e.g.
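To make the "process and throw away" constraint concrete, here is a minimal sketch of a one-pass running mean and variance using Welford's update. It is not taken from the pipemath sources; the names `running_stats`, `running_stats_push` and `running_stats_variance` are hypothetical, and the choice of the unbiased (n-1) normalisation is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Running state for one scalar stream.
   Hypothetical names for illustration; not from pipemath. */
typedef struct {
    size_t n;     /* number of points seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} running_stats;

/* Absorb one data point. The point itself is not kept afterwards. */
void running_stats_push(running_stats *s, double x)
{
    double delta = x - s->mean;       /* deviation from the old mean */
    s->n += 1;
    s->mean += delta / (double)s->n;  /* move the mean towards x */
    s->m2 += delta * (x - s->mean);   /* old deviation times new deviation */
}

/* Unbiased sample variance (assumed normalisation); needs two or more points. */
double running_stats_variance(const running_stats *s)
{
    assert(s->n >= 2);
    return s->m2 / (double)(s->n - 1);
}
```

Initialise the state with `running_stats s = {0, 0.0, 0.0};` and call `running_stats_push` once per incoming value. Updating deviations from the running mean, rather than accumulating the sum of squares and subtracting the squared mean at the end, avoids the cancellation that makes the naive formula unstable for long streams.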

Version 1.2 is 64-bit clean. A new feature is that the covariance program takes no arguments.

#### quick start

tar zvxf pipemath-1.2.tgz; cd pipemath-1.2; make

#### programs

Lines in the data file starting with # are ignored.
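A reader loop that honours this rule might look like the sketch below. It is an illustration only, not the parsing code used by the three programs: `is_comment_line` and `consume_scalars` are hypothetical names, the fixed 4096-byte line buffer is an assumption, and so is the reading that `#` must be the very first character of the line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if the line is a comment.
   Assumption: '#' must be the very first character of the line. */
int is_comment_line(const char *line)
{
    return line[0] == '#';
}

/* Read one scalar per line from `in`, skipping comment lines.
   Returns the number of values read and stores their sum in *sum.
   Each value is used once and then forgotten.
   Lines longer than the buffer would be read in pieces (sketch limitation). */
size_t consume_scalars(FILE *in, double *sum)
{
    char line[4096];
    size_t count = 0;

    assert(in != NULL && sum != NULL);
    *sum = 0.0;
    while (fgets(line, sizeof line, in) != NULL) {
        char *end;
        double x;

        if (is_comment_line(line))
            continue;
        x = strtod(line, &end);
        if (end == line)   /* nothing parsed, e.g. a blank line: skip it */
            continue;
        *sum += x;
        count++;
    }
    return count;
}
```

In a pipeline the stream would be `stdin`, for example `consume_scalars(stdin, &sum)`.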

• autocorrelation:
Computes the autocorrelation function of a scalar time series. Usage: cat datafile | autocorrelation [maxlag=20 [stride=1 [dt=1]]]
• covariance:
Computes the covariance matrix of a set of n-vectors. Usage: cat datafile | covariance, or: covariance < datafile. Each line of datafile holds one n-vector. The value of n is determined by the number of items on the first line; all subsequent lines must have the same number of items.
• lsqpoly:
Fits a least-squares polynomial. Usage: cat datafile | lsqpoly [degree=1]. Each line of datafile has an x,y pair and an optional weight.
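As an illustration of how a covariance matrix can be accumulated without storing the vectors, the sketch below extends the running-mean update to co-moments: each new vector adjusts the n means and the n×n matrix of summed co-deviations, and is then discarded. This is one well-known stable scheme and is not claimed to be what the covariance program actually does; the `cov_acc` type and its functions are hypothetical, and the (count-1) normalisation is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical accumulator for illustration; not pipemath's data structure. */
typedef struct {
    size_t dim;        /* vector length n */
    size_t count;      /* vectors seen so far */
    double *mean;      /* dim running means */
    double *comoment;  /* dim*dim summed co-deviations, row-major */
    double *delta;     /* scratch: deviations from the old means */
} cov_acc;

/* Allocate zeroed storage; returns 0 on success, -1 on allocation failure. */
int cov_acc_init(cov_acc *a, size_t dim)
{
    a->dim = dim;
    a->count = 0;
    a->mean = calloc(dim, sizeof *a->mean);
    a->comoment = calloc(dim * dim, sizeof *a->comoment);
    a->delta = calloc(dim, sizeof *a->delta);
    return (a->mean && a->comoment && a->delta) ? 0 : -1;
}

/* Absorb one vector x of length dim; x is not kept. */
void cov_acc_push(cov_acc *a, const double *x)
{
    size_t i, j;

    a->count += 1;
    for (i = 0; i < a->dim; i++) {
        a->delta[i] = x[i] - a->mean[i];             /* against old mean */
        a->mean[i] += a->delta[i] / (double)a->count;
    }
    for (i = 0; i < a->dim; i++)
        for (j = 0; j < a->dim; j++)                 /* old dev. times new dev. */
            a->comoment[i * a->dim + j] += a->delta[i] * (x[j] - a->mean[j]);
}

/* Sample covariance of components i and j (assumed count-1 normalisation). */
double cov_acc_get(const cov_acc *a, size_t i, size_t j)
{
    assert(a->count >= 2 && i < a->dim && j < a->dim);
    return a->comoment[i * a->dim + j] / (double)(a->count - 1);
}

void cov_acc_free(cov_acc *a)
{
    free(a->mean);
    free(a->comoment);
    free(a->delta);
}
```

Memory use is proportional to n² and independent of the number of input lines, which is the property that matters for very large data sets. Setting `dim` from the item count of the first line would match the usage described above.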

```