Our data often has many features, but many of those features are correlated with each other in linear and non-linear ways. Dimensionality reduction lets us represent roughly the same “information” using fewer dimensions, which removes that redundancy.
One common use of dimensionality reduction is projecting data down to 2 or 3 dimensions. At that point we can plot every data point directly, which makes it easy to see how data points relate to each other, how many clusters there might be, and how easily data from different classes can be separated.
Techniques
- Autoencoder
- t-SNE
- UMAP
- Principal component analysis / PCA
- Non-linear principal component analysis
- Random projection
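As a minimal sketch of the simplest technique on the list, here is PCA implemented with NumPy's SVD. The data is synthetic and the dimensions (200 points in 5 features, reduced to 2) are arbitrary choices for illustration: we generate points whose variance mostly lives in a 2-D subspace, then recover a 2-D embedding suitable for plotting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in 5 dimensions, where most of the
# variance lies in a 2-D subspace (plus a little noise). This mimics
# the "correlated features" situation described above.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA via SVD: center the data, then project onto the top-k right
# singular vectors (the principal components).
k = 2
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:k].T  # shape (200, 2), ready to scatter-plot

# Fraction of total variance kept by the top-k components. A value
# near 1.0 means the 2-D embedding preserves most of the "information".
explained = float((S[:k] ** 2).sum() / (S ** 2).sum())
print(X_reduced.shape, explained)
```

Because the data here is nearly 2-dimensional by construction, the two retained components capture almost all of the variance; on real data you would inspect this ratio to decide how many dimensions to keep.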