mahal

Mahalanobis distance

Syntax

d = mahal(Y,X)

Description

d = mahal(Y,X) computes the Mahalanobis distance (in squared units) of each observation in Y from the reference sample in matrix X. If Y is n-by-m, where n is the number of observations and m is the dimension of the data, d is n-by-1. X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.

For observation I, the Mahalanobis distance is defined by d(I) = (Y(I,:)-mu)*inv(SIGMA)*(Y(I,:)-mu)', where mu and SIGMA are the sample mean and covariance of the data in X. mahal performs an equivalent, but more efficient, computation.
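The following sketch (not the internal implementation of mahal) illustrates this equivalence on hypothetical inputs Xref and Ynew; the explicit loop should agree with mahal up to round-off.

rng default                              % for reproducibility
Xref = randn(50,3);                      % hypothetical reference sample: 50 observations, 3 variables
Ynew = randn(4,3);                       % hypothetical observations to measure

mu    = mean(Xref);                      % sample mean of the reference data
Sigma = cov(Xref);                       % sample covariance of the reference data

dExplicit = zeros(size(Ynew,1),1);
for I = 1:size(Ynew,1)
    dev = Ynew(I,:) - mu;
    dExplicit(I) = dev/Sigma*dev';       % dev/Sigma computes dev*inv(Sigma) without forming the inverse
end

dMahal = mahal(Ynew,Xref);               % same distances computed by mahal

max(abs(dExplicit - dMahal))             % should be near zero (round-off only)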

Examples


Generate correlated bivariate data.

X = mvnrnd([0;0],[1 .9;.9 1],100);

Input observations.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the Mahalanobis distance of observations in Y from the reference sample in X.

d1 = mahal(Y,X)
d1 =

    0.6288
   19.3520
   21.1384
    0.9404

Compute their squared Euclidean distances from the mean of X.

d2 = sum((Y-repmat(mean(X),4,1)).^2, 2)
d2 =

    1.6170
    1.9334
    2.1094
    2.4258

Plot the observations with Y values colored according to the Mahalanobis distance.

scatter(X(:,1),X(:,2))
hold on
scatter(Y(:,1),Y(:,2),100,d1,'*','LineWidth',2)
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','NW')

The observations in Y with equal coordinate values are much closer to X in Mahalanobis distance than observations with opposite coordinate values, even though all observations are approximately equidistant from the mean of X in Euclidean distance. The Mahalanobis distance, by considering the covariance of the data and the scales of the different variables, is useful for detecting outliers in such cases.
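A common follow-up, sketched here under the assumption that the reference data are approximately multivariate normal: the squared Mahalanobis distances then follow roughly a chi-square distribution with m degrees of freedom, so a chi-square quantile gives a rough outlier cutoff. The 97.5% level below is an illustrative choice, not a recommendation from this reference page.

cutoff    = chi2inv(0.975, size(X,2));   % chi-square quantile, df = number of variables (illustrative level)
isOutlier = d1 > cutoff                  % flags observations in Y that are far from X in Mahalanobis distance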

Introduced before R2006a

