Are high-dimensional problems linearly separable?

In machine learning we often run into fitting problems. High dimensionality of the input data or features is one cause of overfitting: the higher the dimension, the sparser the data becomes along each feature dimension, which is inherently catastrophic for machine learning algorithms. Many dimensionality-reduction methods exist; today we discuss LDA (linear discriminant analysis) dimensionality reduction.

The idea behind LDA dimensionality reduction is as follows: if the two classes of data are linearly separable, there is a hyperplane that separates them. Then there exists a projection vector such that, when both classes are projected onto one dimension along it, they remain linearly separable.

Problem statement

Suppose we have a set of N labeled samples (x_i, c_i), where the label c takes one of two values, c_i = 0 or c_i = 1, and we want to design a classifier that separates the data. If the dimensionality of x is very high, possibly even exceeding N, dimensionality reduction is required.

Problem solving process

1. Reduce x to one dimension by a linear transformation

Assume the projection vector is w. Projecting a sample x onto one dimension gives y = w^T x, i.e. the input data x is projected along the vector w.

In this way the original high-dimensional vector x is converted to a one-dimensional y, and the data can be classified in this one dimension. A threshold w_0 can then be found: if y > w_0 the sample belongs to one class, and if y ≤ w_0 it belongs to the other.
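The projection-and-threshold step can be sketched in a few lines of NumPy; the vector w and threshold w0 below are illustrative placeholder values, not values learned by LDA:

```python
import numpy as np

# Minimal sketch of classifying by projecting onto one dimension.
# The projection vector w and threshold w0 are illustrative placeholders,
# not values learned by LDA.
w = np.array([1.0, 1.0])
w0 = 0.0

X = np.array([[2.0, 1.0],    # two samples from one class
              [1.5, 2.0],
              [-1.0, -2.0],  # two samples from the other class
              [-2.0, -0.5]])

y = X @ w                        # y = w^T x for each sample
labels = (y > w0).astype(int)    # y > w0 -> class 1, otherwise class 0
print(labels)                    # [1 1 0 0]
```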

2. Calculate the within-class mean and scatter of each class

Let C1 have N1 elements and C2 have N2 elements. Compute the within-class means before projection, and the within-class means and scatter (variance) after projection:

μ_i = (1/N_i) Σ_{x ∈ C_i} x,   m_i = w^T μ_i,   s_i^2 = Σ_{x ∈ C_i} (w^T x - m_i)^2,   for i = 1, 2
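The quantities in this step can be computed directly on toy data; the samples and the projection vector w below are made up for illustration:

```python
import numpy as np

# Toy data: C1 has N1 = 3 samples, C2 has N2 = 2 samples.
# The projection vector w is an illustrative choice, not the LDA optimum.
C1 = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 2.5]])
C2 = np.array([[-1.0, -2.0], [-2.0, -0.5]])
w = np.array([1.0, 1.0])

mu1, mu2 = C1.mean(axis=0), C2.mean(axis=0)   # within-class means before projection
m1, m2 = w @ mu1, w @ mu2                     # projected within-class means
s1 = np.sum((C1 @ w - m1) ** 2)               # within-class scatter after projection
s2 = np.sum((C2 @ w - m2) ** 2)
print(m1, m2, s1, s2)
```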


3. Form the Fisher criterion

The Fisher criterion measures how well the projection separates the classes: the squared distance between the projected means relative to the total within-class scatter,

J(w) = (m_1 - m_2)^2 / (s_1^2 + s_2^2)

Maximizing J(w) pushes the projected class means apart while keeping each class tightly clustered around its own mean.

4. Optimize the objective function

Writing S_b = (μ_1 - μ_2)(μ_1 - μ_2)^T for the between-class scatter matrix and S_w = Σ_{x ∈ C_1} (x - μ_1)(x - μ_1)^T + Σ_{x ∈ C_2} (x - μ_2)(x - μ_2)^T for the within-class scatter matrix, the criterion becomes J(w) = (w^T S_b w) / (w^T S_w w). Take the derivative of this objective and set it to zero to find the extremum.

The derivative condition is:

(w^T S_w w) S_b w - (w^T S_b w) S_w w = 0,   i.e.   S_b w = J(w) S_w w

Since S_b w = (μ_1 - μ_2)(μ_1 - μ_2)^T w always points in the direction of μ_1 - μ_2, the three vectors S_w w, S_b w and μ_1 - μ_2 all lie in the same direction, hence

w ∝ S_w^{-1}(μ_1 - μ_2)
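The closed-form direction can be checked numerically on toy data (the samples below are made up for illustration):

```python
import numpy as np

# Sketch of the closed-form LDA solution: w is proportional to
# Sw^{-1} (mu1 - mu2), where Sw is the pooled within-class scatter matrix.
# The toy samples are illustrative.
C1 = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 2.5]])
C2 = np.array([[-1.0, -2.0], [-2.0, -0.5]])

mu1, mu2 = C1.mean(axis=0), C2.mean(axis=0)
Sw = (C1 - mu1).T @ (C1 - mu1) + (C2 - mu2).T @ (C2 - mu2)
w = np.linalg.solve(Sw, mu1 - mu2)   # direction, determined up to scale
w /= np.linalg.norm(w)

print(w @ mu1 > w @ mu2)             # True: projected means are separated
```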


Principal component analysis (PCA)

The difference between PCA and LDA

LDA: the projection direction that best separates the classes

PCA: the direction along which the projected sample points have the greatest variance

Practical problems often require examining several features, and these features are usually correlated with one another.

We can combine the original features into a few representative ones. The combined features not only capture most of the information in the original features but are also mutually uncorrelated, which removes the redundancy. This method of extracting the principal components of the original features is called principal component analysis.


For data consisting of m samples with n features each, write each sample as a row vector to obtain the m × n matrix A:

Ideas to solve the problem:

Find the principal direction u of the samples: project the m samples onto a line L to obtain m points on L, and compute the variance of these m projected points. The direction of the line that maximizes this variance is the principal direction.

Assume the samples have been de-meaned (the mean of each feature has been subtracted).

Find the variance: the core derivation of PCA

Take the unit direction vector u of the projection line L and compute the projections A u.

Since A is de-meaned, the variance of the projected values A u is, up to a constant factor, the quantity to maximize.

Objective function: J(u) = u^T A^T A u

Find the stationary points of the objective function.

Since multiplying u by a scalar does not change its direction, we constrain u to be a unit vector, i.e. ||u||^2 = u^T u = 1.

Write down the Lagrangian:

L(u) = u^T A^T A u - λ(u^T u - 1)

Setting the derivative with respect to u to zero:

∇L(u) = 2 A^T A u - 2λu = 0,   which gives   A^T A u = λu

If the samples in A are de-meaned, then A^T A differs from the covariance matrix of A only by the factor m - 1, where m is the number of samples.

So u is an eigenvector of A^T A, and the eigenvalue λ is the scatter (the sum of squared projected values, proportional to the variance) of the original data projected onto the direction u.
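This eigenvalue relationship can be verified numerically; the random data below is illustrative:

```python
import numpy as np

# Numerical check on random (illustrative) data: the top eigenvector u
# of A^T A is the principal direction, and its eigenvalue equals the
# scatter of the projections A @ u.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
A = A - A.mean(axis=0)                 # de-mean the samples

eigvals, eigvecs = np.linalg.eigh(A.T @ A)
u = eigvecs[:, -1]                     # unit eigenvector of the largest eigenvalue

proj = A @ u
print(np.isclose(eigvals[-1], np.sum(proj ** 2)))   # True
```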



Important uses of PCA

Noise reduction, dimensionality reduction, pattern recognition, data correlation analysis, multi-source data fusion, and so on.
