Intro to Machine Learning 6 | Unsupervised Learning and Clustering

1. Unsupervised Learning Techniques
- Principal Component Analysis (PCA)
- PageRank
- Word embeddings like GloVe
- Anomaly detection
2. Common Unsupervised Learning Models
- Clustering: k-means / k-medians / k-medoids / mean-shift
- Hierarchical Clustering
- Spectral Clustering
- Collaborative Filtering
3. Clustering Problems
- Useful in specific circumstances like compression
- Generally doesn't work well with imbalanced data
- No clear measure of success
4. Common Distances for Clustering
- We need to normalize the features (e.g., z-score each dimension) so distance means the same thing in all directions
- Taxicab Distance: L1 distance for Euclidean space
- Euclidean Distance: L2 distance for Euclidean space
- Chebyshev Distance: L∞ distance can also be useful
- Cosine Similarity: cos(θ) for the angle θ between two vectors; the corresponding cosine distance is 1 − cos(θ)
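A minimal sketch of these four distances with NumPy (the function names here are just illustrative):

```python
import numpy as np

def taxicab(u, v):
    # L1 / Manhattan distance: sum of absolute coordinate differences
    return np.sum(np.abs(u - v))

def euclidean(u, v):
    # L2 distance: straight-line distance
    return np.sqrt(np.sum((u - v) ** 2))

def chebyshev(u, v):
    # L-infinity distance: the largest single-coordinate difference
    return np.max(np.abs(u - v))

def cosine_distance(u, v):
    # 1 - cos(theta): 0 for parallel vectors, 1 for orthogonal ones
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - cos

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(taxicab(u, v))          # 2.0
print(euclidean(u, v))        # ~1.414
print(chebyshev(u, v))        # 1.0
print(cosine_distance(u, v))  # 1.0 (orthogonal vectors)
```

Note that the three Lp distances change if the features are rescaled, which is why normalizing the data first matters; cosine distance only depends on direction, not magnitude.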
5. k-Means, k-Medians, and k-Medoids
- k-means uses the mean for centroids, and centroids don't have to be points in X
- k-medians uses the median instead of the mean for centroids, and it minimizes L1 rather than L2 distance
- k-medoids requires medoids to be points in X, and it picks the most centrally located point of each cluster
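The k-means loop above can be sketched as plain Lloyd's algorithm: alternate an assignment step and a centroid-update step until the centroids stop moving. This is a minimal illustration, not a production implementation (real libraries add k-means++ seeding, multiple restarts, and empty-cluster handling):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd's algorithm sketch for k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points drawn from X
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (squared L2)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs as toy data
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
```

Swapping the mean for a median in the update step (and L1 in the assignment step) would turn this into k-medians; restricting centroids to points of X gives k-medoids.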
6. Troubles with k-means
- Requires specifying k in advance, which is often difficult
- Not stable: different starting centroids can lead to different results
- Hard predictions: would prefer to have soft predictions like probabilities
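To illustrate the last point, one simple way to turn hard centroid assignments into soft, probability-like predictions is a softmax over negative squared distances to the centroids. This is only a sketch (the `temperature` knob is a hypothetical parameter, not part of k-means itself); it mirrors what a Gaussian mixture model does with equal spherical covariances:

```python
import numpy as np

def soft_assign(X, centroids, temperature=1.0):
    # Squared L2 distance from every point to every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Softmax over negative distances: nearer centroid -> higher probability.
    # Smaller temperature -> sharper, closer to hard k-means labels.
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
points = np.array([[0.5, 0.5], [4.0, 4.5]])
probs = soft_assign(points, centroids)  # rows sum to 1
```

Each row of `probs` sums to 1 and puts more mass on the closer centroid, so a point halfway between two clusters gets an honestly uncertain prediction instead of an arbitrary hard label.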