colour.algebra.regression Module

Regression Analysis

Defines various objects to perform statistical regression analysis.

References

[1] Wikipedia. (n.d.). Regression analysis. Retrieved May 24, 2014, from http://en.wikipedia.org/wiki/Regression_analysis

colour.algebra.regression.linear_regression(y, x=None, additional_statistics=False)

Computes the statistics of the ideal trend line for the given data using the least-squares method.

The equation of the line is \(y = b + mx\) or, with several independent variables, \(y = b + m_1 x_1 + m_2 x_2 + \ldots + m_n x_n\), where the dependent variable \(y\) is a function of the independent variable(s) \(x\).
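The least-squares fit of this equation can be sketched with NumPy as follows. This is a minimal illustration, not necessarily how linear_regression itself is implemented: the helper name least_squares_line is hypothetical, and using 1, 2, ..., n as the independent variable when x is omitted is an assumption.

```python
import numpy as np


def least_squares_line(y, x=None):
    """Hypothetical helper: fit y = b + m1*x1 + ... + mn*xn by least squares."""
    y = np.asarray(y, dtype=float)
    # Assumption: when x is omitted, use 1, 2, ..., len(y) as the independent variable.
    x = np.arange(1, len(y) + 1) if x is None else np.asarray(x, dtype=float)
    if len(x) != len(y):
        raise ValueError('"y" and "x" variables have incompatible dimensions!')
    # Design matrix: one column per independent variable, then a column of ones for b.
    a = np.column_stack([x.reshape(len(y), -1), np.ones(len(y))])
    coefficients, residuals, _rank, _singular_values = np.linalg.lstsq(a, y, rcond=None)
    return coefficients, residuals


coefficients, residuals = least_squares_line([1, 2, 1, 3, 2, 3, 3, 4, 4, 3])
# coefficients is approximately [0.2909, 1.0]: slope first, intercept last.
```

In this sketch the coefficients follow the column order of the design matrix, with the intercept \(b\) last.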

Parameters:
  • y (array_like) – Dependent and already known \(y\) variable values used to curve fit an ideal trend line.
  • x (array_like, optional) – Independent \(x\) variable(s) values corresponding to the \(y\) variable.
  • additional_statistics (bool, optional) – Whether to output additional regression statistics. By default, only the \(b\) variable and \(m\) coefficients are returned.
Returns:

Regression statistics.

Return type:

ndarray, ({{mn, mn-1, ..., b}, {sum_of_squares_residual}})

Raises:

ValueError – If \(y\) and \(x\) variables have incompatible dimensions.

References

[2] Wikipedia. (n.d.). Simple linear regression. Retrieved May 24, 2014, from http://en.wikipedia.org/wiki/Simple_linear_regression

Examples

Linear regression with the dependent and already known \(y\) variable:

>>> import numpy as np
>>> from colour.algebra.regression import linear_regression
>>> y = np.array([1, 2, 1, 3, 2, 3, 3, 4, 4, 3])
>>> linear_regression(y)  
array([ 0.2909090...,  1.        ])

Linear regression with the dependent \(y\) variable and independent \(x\) variable:

>>> x1 = np.array([40, 45, 38, 50, 48, 55, 53, 55, 58, 40])
>>> linear_regression(y, x1)  
array([ 0.1225194..., -3.3054357...])
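For a single independent variable, the slope and intercept can also be checked by hand with the closed-form least-squares expressions \(m = S_{xy} / S_{xx}\) and \(b = \bar{y} - m\bar{x}\). The sketch below uses only the standard library; the helper name simple_linear_regression is hypothetical and is not part of the module.

```python
def simple_linear_regression(x, y):
    """Hypothetical helper: closed-form least-squares fit of y = b + m * x."""
    n = len(y)
    if len(x) != n:
        raise ValueError("x and y must have the same length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    m = s_xy / s_xx
    b = y_mean - m * x_mean
    return m, b


y = [1, 2, 1, 3, 2, 3, 3, 4, 4, 3]
x1 = [40, 45, 38, 50, 48, 55, 53, 55, 58, 40]
m, b = simple_linear_regression(x1, y)
# m is approximately 0.1225194 and b is approximately -3.3054357.
```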

Multiple linear regression with the dependent \(y\) variable and multiple independent \(x_i\) variables:

>>> x2 = np.array([25, 20, 30, 30, 28, 30, 34, 36, 32, 34])
>>> linear_regression(y, tuple(zip(x1, x2)))  
array([ 0.0998002...,  0.0876257..., -4.8303807...])

Multiple linear regression with additional statistics:

>>> linear_regression(y, tuple(zip(x1, x2)), True)  
(array([ 0.0998002...,  0.0876257..., -4.8303807...]), array([ 2.1376249...]))
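The additional statistic can be read as the sum of squared differences between the known \(y\) values and the values predicted by the fitted equation. The sketch below illustrates that relationship with numpy.linalg.lstsq directly. It assumes a design matrix with one column per independent variable followed by a column of ones, which may not match the column order used by linear_regression.

```python
import numpy as np

y = np.array([1, 2, 1, 3, 2, 3, 3, 4, 4, 3], dtype=float)
x1 = np.array([40, 45, 38, 50, 48, 55, 53, 55, 58, 40], dtype=float)
x2 = np.array([25, 20, 30, 30, 28, 30, 34, 36, 32, 34], dtype=float)

# Assumed design matrix layout: [x1, x2, 1], so the intercept b comes last.
a = np.column_stack([x1, x2, np.ones(len(y))])
coefficients, residuals, _rank, _singular_values = np.linalg.lstsq(a, y, rcond=None)

# Evaluate the fitted equation and accumulate the squared residuals by hand.
predicted = a @ coefficients
sum_of_squares_residual = float(np.sum((y - predicted) ** 2))

print(coefficients)
print(sum_of_squares_residual, residuals[0])
```

The two printed residual values should agree up to floating-point rounding.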