Module astStats

Module for performing statistical calculations.

(c) 2007-2012 Matt Hilton

(c) 2013-2014 Matt Hilton & Steven Boada

http://astlib.sourceforge.net

This module (as you may notice) provides very few statistical routines. It does, however, provide biweight (robust) estimators of location and scale, as described in Beers et al. 1990 (AJ, 100, 32), in addition to a robust least squares fitting routine that uses the biweight transform.

Some routines may fail with a 'divide by zero' error if they are passed lists with too few items. Where this occurs, the function returns None. If astStats.REPORT_ERRORS=True (the default), an error message is also printed to the console. Scripts can handle such errors by testing whether an astStats function returned None.
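
The None-on-error convention can be handled in a script along the following lines. This is a minimal sketch only: sample_stdev_or_none and the module-level REPORT_ERRORS flag are hypothetical stand-ins written for this example so that it runs without astLib installed; they are not the astStats implementation.

```python
import math

REPORT_ERRORS = True  # hypothetical stand-in for astStats.REPORT_ERRORS


def sample_stdev_or_none(values):
    """Hypothetical stand-in: sample standard deviation, or None if undefined."""
    n = len(values)
    if n < 2:
        # n - 1 == 0 here, which would otherwise be a divide by zero
        if REPORT_ERRORS:
            print("ERROR: sample_stdev_or_none() : too few items in list")
        return None
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


result = sample_stdev_or_none([4.2])
if result is None:
    spread = float("nan")  # the caller decides how to recover
else:
    spread = result
```

Testing with "is None" rather than truthiness matters here, because a legitimate result of 0.0 is also falsy.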

For more extensive statistics modules, the Python bindings for GNU R (http://rpy.sourceforge.net) or SciPy (http://www.scipy.org) are suggested.

Functions
float
mean(dataList)
Calculates the mean average of a list of numbers.
float
weightedMean(dataList)
Calculates the weighted mean average of a two dimensional list (value, weight) of numbers.
float
stdev(dataList)
Calculates the (sample) standard deviation of a list of numbers.
float
rms(dataList)
Calculates the root mean square of a list of numbers.
float
weightedStdev(dataList)
Calculates the weighted (sample) standard deviation of a list of numbers.
float
median(dataList)
Calculates the median of a list of numbers.
float
modeEstimate(dataList)
Returns an estimate of the mode of a set of values by mode=(3*median)-(2*mean).
float
MAD(dataList)
Calculates the Median Absolute Deviation of a list of numbers.
float
biweightLocation(dataList, tuningConstant)
Calculates the biweight location estimator (like a robust average) of a list of numbers.
float
biweightScale(dataList, tuningConstant)
Calculates the biweight scale estimator (like a robust standard deviation) of a list of numbers.
dictionary
biweightClipped(dataList, tuningConstant, sigmaCut)
Iteratively calculates biweight location and scale, using sigma clipping, for a list of values.
list
biweightTransform(dataList, tuningConstant)
Calculates the biweight transform for a set of values.
dictionary
OLSFit(dataList)
Performs an ordinary least squares fit on a two dimensional list of numbers.
dictionary
clippedMeanStdev(dataList, sigmaCut=3.0, maxIterations=10.0)
Calculates the clipped mean and stdev of a list of numbers.
dictionary
clippedWeightedLSFit(dataList, sigmaCut)
Performs a weighted least squares fit on a list of numbers with sigma clipping.
dictionary
weightedLSFit(dataList, weightType)
Performs a weighted least squares fit on a three dimensional list of numbers [x, y, y error].
dictionary
biweightLSFit(dataList, tuningConstant, sigmaCut=None)
Performs a weighted least squares fit, where the weights used are the biweight transforms of the residuals to the previous best fit (i.e. the procedure is iterative).
list
cumulativeBinner(data, binMin, binMax, binTotal)
Bins the input data cumulatively.
list
binner(data, binMin, binMax, binTotal)
Bins the input data.
list
weightedBinner(data, weights, binMin, binMax, binTotal)
Bins the input data, recorded frequency is sum of weights in bin.
Variables
  REPORT_ERRORS = True
  __package__ = 'astLib'
Function Details

mean(dataList)

Calculates the mean average of a list of numbers.

Parameters:
  • dataList (list or numpy array) - input data, must be a one dimensional list
Returns: float
mean average

weightedMean(dataList)

Calculates the weighted mean average of a two dimensional list (value, weight) of numbers.

Parameters:
  • dataList (list) - input data, must be a two dimensional list in format [value, weight]
Returns: float
weighted mean average
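
The [value, weight] input format can be illustrated with a short sketch. It assumes the usual definition sum(value * weight) / sum(weight); weighted_mean is a hypothetical name for this example, not the astStats code.

```python
def weighted_mean(pairs):
    """Weighted mean of rows in the format [value, weight]."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


data = [[10.0, 1.0], [20.0, 3.0]]
print(weighted_mean(data))  # (10*1 + 20*3) / 4 = 17.5
```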

stdev(dataList)

Calculates the (sample) standard deviation of a list of numbers.

Parameters:
  • dataList (list or numpy array) - input data, must be a one dimensional list
Returns: float
standard deviation

rms(dataList)

Calculates the root mean square of a list of numbers.

Parameters:
  • dataList (list) - input data, must be a one dimensional list
Returns: float
root mean square

weightedStdev(dataList)

Calculates the weighted (sample) standard deviation of a list of numbers.

Parameters:
  • dataList (list) - input data, must be a two dimensional list in format [value, weight]
Returns: float
weighted standard deviation

Note: Returns None if an error occurs.

median(dataList)

Calculates the median of a list of numbers.

Parameters:
  • dataList (list or numpy array) - input data, must be a one dimensional list
Returns: float
median

modeEstimate(dataList)

Returns an estimate of the mode of a set of values by mode=(3*median)-(2*mean).

Parameters:
  • dataList (list) - input data, must be a one dimensional list
Returns: float
estimate of the mode
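
The formula mode=(3*median)-(2*mean) can be worked through with the standard library. mode_estimate is a hypothetical name used for this sketch; only the formula itself comes from the description above.

```python
import statistics


def mode_estimate(values):
    """Estimate the mode as (3 * median) - (2 * mean)."""
    return 3.0 * statistics.median(values) - 2.0 * statistics.mean(values)


values = [1.0, 2.0, 2.0, 3.0, 7.0]
# median = 2.0 and mean = 3.0, so the estimate is 3*2.0 - 2*3.0 = 0.0
print(mode_estimate(values))
```

For a strongly skewed sample such as this one the estimate can land well away from the most frequent value (2.0), so it is best treated as a rough guide.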

MAD(dataList)

Calculates the Median Absolute Deviation of a list of numbers.

Parameters:
  • dataList (list) - input data, must be a one dimensional list
Returns: float
median absolute deviation
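
As an illustration, the median absolute deviation is the median of the absolute differences from the median. The sketch below assumes the plain, unscaled definition (no normalisation factor such as 1.4826); median_absolute_deviation is a hypothetical name, not the astStats function.

```python
import statistics


def median_absolute_deviation(values):
    """Median of |x - median(x)|, with no normalisation factor applied."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])


values = [1.0, 2.0, 3.0, 4.0, 100.0]
# median = 3.0; absolute deviations are [2, 1, 0, 1, 97]; their median is 1.0
print(median_absolute_deviation(values))
```

The outlier at 100.0 has no effect on the result, which is why the MAD is used as the scale in the biweight estimators.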

biweightLocation(dataList, tuningConstant)

Calculates the biweight location estimator (like a robust average) of a list of numbers.

Parameters:
  • dataList (list) - input data, must be a one dimensional list
  • tuningConstant (float) - 6.0 is recommended.
Returns: float
biweight location

Note: Returns None if an error occurs.
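
A minimal sketch of a one-step biweight location estimate, following the form usually quoted from Beers et al. 1990: with M the median and u = (x - M) / (c * MAD), the estimate is M + sum((x - M)(1 - u^2)^2) / sum((1 - u^2)^2), summed over points with |u| < 1. The function name, the default of 6.0, and the choice to return None when the MAD is zero are assumptions made for this example; the astStats code may differ in detail.

```python
import statistics


def biweight_location(values, tuning_constant=6.0):
    """One-step biweight location estimate; None if the MAD is zero."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    if mad == 0:
        return None  # u would require a divide by zero
    top = 0.0
    bottom = 0.0
    for v in values:
        u = (v - m) / (tuning_constant * mad)
        if abs(u) < 1.0:  # points with |u| >= 1 get zero weight
            weight = (1.0 - u * u) ** 2
            top += (v - m) * weight
            bottom += weight
    return m + top / bottom


location = biweight_location([1.0, 2.0, 3.0, 4.0, 100.0])
```

In this example the value 100.0 lies far outside the tuning window and is ignored, so the estimate stays between 2 and 3 while the ordinary mean is 22.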

biweightScale(dataList, tuningConstant)

Calculates the biweight scale estimator (like a robust standard deviation) of a list of numbers.

Parameters:
  • dataList (list) - input data, must be a one dimensional list
  • tuningConstant (float) - 9.0 is recommended.
Returns: float
biweight scale

Note: Returns None if an error occurs.
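
A corresponding sketch for the biweight scale, again in the form usually quoted from Beers et al. 1990: sqrt(n) * sqrt(sum((x - M)^2 (1 - u^2)^4)) / |sum((1 - u^2)(1 - 5u^2))|, with u = (x - M) / (c * MAD) and sums over |u| < 1. The function name, the default of 9.0, and the None return values are assumptions for this example rather than a copy of the astStats code.

```python
import math
import statistics


def biweight_scale(values, tuning_constant=9.0):
    """Biweight scale estimate; None if it cannot be computed."""
    n = len(values)
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    if mad == 0:
        return None  # u would require a divide by zero
    top = 0.0
    bottom = 0.0
    for v in values:
        u = (v - m) / (tuning_constant * mad)
        if abs(u) < 1.0:  # points with |u| >= 1 are excluded
            top += (v - m) ** 2 * (1.0 - u * u) ** 4
            bottom += (1.0 - u * u) * (1.0 - 5.0 * u * u)
    if bottom == 0:
        return None
    return math.sqrt(n) * math.sqrt(top) / abs(bottom)


values = [1.0, 2.0, 3.0, 4.0, 100.0]
scale = biweight_scale(values)
```

For this sample the ordinary sample standard deviation is dominated by the outlier, whereas the biweight scale stays close to the spread of the four clustered points.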

biweightClipped(dataList, tuningConstant, sigmaCut)

Iteratively calculates biweight location and scale, using sigma clipping, for a list of values. The calculation is performed on the first column of a multi-dimensional list; other columns are ignored.

Parameters:
  • dataList (list) - input data
  • tuningConstant (float) - 6.0 is recommended for location estimates, 9.0 is recommended for scale estimates
  • sigmaCut (float) - sigma clipping to apply
Returns: dictionary
estimate of biweight location, scale, and list of non-clipped data, in the format {'biweightLocation', 'biweightScale', 'dataList'}

Note: Returns None if an error occurs.

biweightTransform(dataList, tuningConstant)

Calculates the biweight transform for a set of values. Useful for using as weights in robust line fitting.

Parameters:
  • dataList (list) - input data, must be a one dimensional list
  • tuningConstant (float) - 6.0 is recommended for location estimates, 9.0 is recommended for scale estimates
Returns: list
list of biweights

OLSFit(dataList)

Performs an ordinary least squares fit on a two dimensional list of numbers. Minimum number of data points is 5.

Parameters:
  • dataList (list) - input data, must be a two dimensional list in format [x, y]
Returns: dictionary
slope and intercept on y-axis, with associated errors, in the format {'slope', 'intercept', 'slopeError', 'interceptError'}

Note: Returns None if an error occurs.
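
The returned dictionary can be illustrated with a textbook ordinary least squares fit. This sketch assumes the standard formulas for the slope, intercept, and their standard errors (residual variance with n - 2 degrees of freedom); ols_fit and min_points are hypothetical names, with min_points mirroring the documented minimum of 5. The error estimates in astStats may be computed differently.

```python
import math


def ols_fit(points, min_points=5):
    """Textbook OLS fit of y = slope * x + intercept on [x, y] rows."""
    n = len(points)
    if n < min_points:
        return None
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all x values identical: the slope is undefined
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance about the fitted line
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    variance = rss / (n - 2)
    return {
        'slope': slope,
        'intercept': intercept,
        'slopeError': math.sqrt(variance / sxx),
        'interceptError': math.sqrt(variance * (1.0 / n + x_mean ** 2 / sxx)),
    }


fit = ols_fit([[0.0, 1.0], [1.0, 3.1], [2.0, 4.9], [3.0, 7.2], [4.0, 9.0]])
if fit is not None:
    print(fit['slope'], fit['intercept'])
```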

clippedMeanStdev(dataList, sigmaCut=3.0, maxIterations=10.0)

Calculates the clipped mean and stdev of a list of numbers.

Parameters:
  • dataList (list) - input data, one dimensional list of numbers
  • sigmaCut (float) - clipping in Gaussian sigma to apply
  • maxIterations (int) - maximum number of iterations
Returns: dictionary
format {'clippedMean', 'clippedStdev', 'numPoints'}
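
Sigma clipping of this kind can be sketched as a loop that recomputes the mean and sample standard deviation and discards points further than sigmaCut standard deviations from the mean, stopping when nothing more is removed or the iteration limit is reached. The function below is an illustrative assumption about the procedure, not the astStats source; only the dictionary keys are taken from the description above.

```python
import math


def clipped_mean_stdev(values, sigma_cut=3.0, max_iterations=10):
    """Iteratively sigma-clipped mean and sample standard deviation."""

    def mean_and_stdev(items):
        n = len(items)
        m = sum(items) / n
        s = math.sqrt(sum((v - m) ** 2 for v in items) / (n - 1)) if n > 1 else 0.0
        return m, s

    kept = list(values)
    for _ in range(int(max_iterations)):
        m, s = mean_and_stdev(kept)
        survivors = [v for v in kept if abs(v - m) <= sigma_cut * s]
        if len(survivors) == len(kept):
            break  # converged: nothing was clipped this pass
        kept = survivors
    m, s = mean_and_stdev(kept)
    return {'clippedMean': m, 'clippedStdev': s, 'numPoints': len(kept)}


data = [9.0, 10.0, 11.0] * 6 + [10.0, 1000.0]
result = clipped_mean_stdev(data)
print(result['clippedMean'], result['numPoints'])
```

With very small samples a single outlier can never lie more than (n - 1) / sqrt(n) sample standard deviations from the mean, so a 3-sigma cut only starts to remove points once the list has more than about ten items.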

clippedWeightedLSFit(dataList, sigmaCut)

Performs a weighted least squares fit on a list of numbers with sigma clipping. Minimum number of data points is 5.

Parameters:
  • dataList (list) - input data, must be a three dimensional list in format [x, y, y weight]
  • sigmaCut (float) - sigma clipping to apply
Returns: dictionary
slope and intercept on y-axis, with associated errors, in the format {'slope', 'intercept', 'slopeError', 'interceptError'}

Note: Returns None if an error occurs.

weightedLSFit(dataList, weightType)

Performs a weighted least squares fit on a three dimensional list of numbers [x, y, y error].

Parameters:
  • dataList (list) - input data, must be a three dimensional list in format [x, y, y error]
  • weightType (string) - if "errors", weights are calculated assuming the input data is in the format [x, y, error on y]; if "weights", the weights are assumed to be already calculated and stored in a fourth column [x, y, error on y, weight] (as used by e.g. astStats.biweightLSFit)
Returns: dictionary
slope and intercept on y-axis, with associated errors, in the format {'slope', 'intercept', 'slopeError', 'interceptError'}

Note: Returns None if an error occurs.
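
For the "errors" case, a weighted straight-line fit is conventionally done with weights of 1 / error^2. The sketch below uses the standard chi-square line-fit sums under that assumption; weighted_ls_fit is a hypothetical name, and the way astStats derives its weights and error estimates may differ.

```python
import math


def weighted_ls_fit(rows):
    """Weighted fit of y = slope * x + intercept on [x, y, y error] rows."""
    if any(err <= 0 for _, _, err in rows):
        return None  # 1 / error^2 is undefined
    weights = [1.0 / (err * err) for _, _, err in rows]
    s = sum(weights)
    sx = sum(w * x for w, (x, _, _) in zip(weights, rows))
    sy = sum(w * y for w, (_, y, _) in zip(weights, rows))
    sxx = sum(w * x * x for w, (x, _, _) in zip(weights, rows))
    sxy = sum(w * x * y for w, (x, y, _) in zip(weights, rows))
    delta = s * sxx - sx * sx
    if delta == 0:
        return None  # all x values identical: the slope is undefined
    return {
        'slope': (s * sxy - sx * sy) / delta,
        'intercept': (sxx * sy - sx * sxy) / delta,
        'slopeError': math.sqrt(s / delta),
        'interceptError': math.sqrt(sxx / delta),
    }


rows = [[float(x), 3.0 * x - 2.0, 0.5] for x in range(5)]
fit = weighted_ls_fit(rows)
```

The errors returned here reflect only the quoted y errors, not the scatter of the points about the line.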

biweightLSFit(dataList, tuningConstant, sigmaCut=None)

Performs a weighted least squares fit, where the weights used are the biweight transforms of the residuals to the previous best fit, i.e. the procedure is iterative. It converges very quickly (the number of iterations is set to 10 by default). Minimum number of data points is 10.

This seems to give slightly different results to the equivalent R routine, so use at your own risk!

Parameters:
  • dataList (list) - input data, must be a three dimensional list in format [x, y, y weight]
  • tuningConstant (float) - 6.0 is recommended for location estimates, 9.0 is recommended for scale estimates
  • sigmaCut (float) - sigma clipping to apply (set to None if not required)
Returns: dictionary
slope and intercept on y-axis, with associated errors, in the format {'slope', 'intercept', 'slopeError', 'interceptError'}

Note: Returns None if an error occurs.

cumulativeBinner(data, binMin, binMax, binTotal)

Bins the input data cumulatively.

Parameters:
  • data - input data, must be a one dimensional list
  • binMin (float) - minimum value from which to bin data
  • binMax (float) - maximum value up to which to bin data
  • binTotal (int) - number of bins
Returns: list
binned data, in format [bin centre, frequency]

binner(data, binMin, binMax, binTotal)

Bins the input data.

Parameters:
  • data - input data, must be a one dimensional list
  • binMin (float) - minimum value from which to bin data
  • binMax (float) - maximum value up to which to bin data
  • binTotal (int) - number of bins
Returns: list
binned data, in format [bin centre, frequency]
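
The [bin centre, frequency] output format can be illustrated with a simple equal-width binner. The edge handling (lower edge inclusive, upper edge exclusive, values outside the range ignored) is an assumption made for this sketch, and bin_data is a hypothetical name; the behaviour of astStats.binner at the bin edges is not stated above.

```python
def bin_data(values, bin_min, bin_max, bin_total):
    """Count values in equal-width bins; returns [bin centre, frequency] rows."""
    step = (bin_max - bin_min) / float(bin_total)
    bins = [[bin_min + (i + 0.5) * step, 0] for i in range(bin_total)]
    for v in values:
        if bin_min <= v < bin_max:
            # Guard against rounding pushing the index past the last bin
            index = min(int((v - bin_min) / step), bin_total - 1)
            bins[index][1] += 1
    return bins


binned = bin_data([0.5, 1.5, 1.7, 2.5, 9.0], 0.0, 3.0, 3)
print(binned)  # [[0.5, 1], [1.5, 2], [2.5, 1]]
```

The value 9.0 falls outside the range [0.0, 3.0) and is not counted.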

weightedBinner(data, weights, binMin, binMax, binTotal)

Bins the input data, recorded frequency is sum of weights in bin.

Parameters:
  • data - input data, must be a one dimensional list
  • weights - weights for the input data, one per item in data
  • binMin (float) - minimum value from which to bin data
  • binMax (float) - maximum value up to which to bin data
  • binTotal (int) - number of bins
Returns: list
binned data, in format [bin centre, frequency]