Source code for colour.algebra.regression

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Regression Analysis
===================

Defines various objects to perform statistical regression analysis:

-   :func:linear_regression: Implements multiple linear regression.

References
----------
.. [1]  http://en.wikipedia.org/wiki/Regression_analysis
"""

from __future__ import division, unicode_literals

import numpy as np

from colour.algebra import to_ndarray

__author__ = 'Colour Developers'
__maintainer__ = 'Colour Developers'
__status__ = 'Production'

__all__ = ['linear_regression']

"""
Performs the statistics computation about the ideal trend line from given
data using the *least-squares* method.

The equation of the line is :math:y=b+mx or
:math:y=b+m1x1+m1x2+...+mnxn where the dependent variable :math:y value
is a function of the independent variable :math:x values.

Parameters
----------
y : array_like
Dependent and already known :math:y variable values used to curve
fit an ideal trend line.
x : array_like, optional
Independent :math:x variable(s) values corresponding with :math:y
variable.
Output additional regression statistics, by default only the :math:b
variable and :math:m coefficients are returned.

Returns
-------
ndarray, ({{mn, mn-1, ..., b}, {sum_of_squares_residual}})
Regression statistics.

Raises
------
ValueError
If :math:y and :math:x variables have incompatible dimensions.

References
----------
.. [2]  http://en.wikipedia.org/wiki/Simple_linear_regression
(Last accessed 24 May 2014)

Examples
--------
Linear regression with the dependent and already known :math:y variable:

>>> y = np.array([1, 2, 1, 3, 2, 3, 3, 4, 4, 3])
>>> linear_regression(y)  # doctest: +ELLIPSIS
array([ 0.2909090...,  1.        ])

Linear regression with the dependent :math:y variable and independent
:math:x variable:

>>> x1 = np.array([40, 45, 38, 50, 48, 55, 53, 55, 58, 40])
>>> linear_regression(y, x1)  # doctest: +ELLIPSIS
array([ 0.1225194..., -3.3054357...])

Multiple linear regression with the dependent :math:y variable and
multiple independent :math:x_i variables:

>>> x2 = np.array([25, 20, 30, 30, 28, 30, 34, 36, 32, 34])
>>> linear_regression(y, tuple(zip(x1, x2)))  # doctest: +ELLIPSIS
array([ 0.0998002...,  0.0876257..., -4.8303807...])

Multiple linear regression with additional statistics:

>>> linear_regression(y, tuple(zip(x1, x2)), True)  # doctest: +ELLIPSIS
(array([ 0.0998002...,  0.0876257..., -4.8303807...]), array([ 2.1376249...]))
"""

y = to_ndarray(y)

if x is None:
x = np.arange(1, len(y) + 1)
else:
x = to_ndarray(x)
if len(x) != len(y):
raise ValueError(
'"y" and "x" variables have incompatible dimensions!')

x = np.vstack([np.array(x).T, np.ones(len(x))]).T
result = np.linalg.lstsq(x, y)