Clustering Wines with K-Means and DBSCAN
Coursework · Solo · 2025
Question
Can clustering tell us anything meaningful about wine attributes, and how do K-Means and DBSCAN compare on the same dataset?
Hypothesis
I expected K-Means to give cleaner clusters because the wine dataset doesn't contain much noise, but I wanted to see how DBSCAN would handle the same data as a comparison.
Approach
I started by normalizing the wine features so that no single feature would dominate the distance calculations just because of its scale. To choose the number of clusters I used both the elbow method and the silhouette method, and both pointed to k=3. I then ran K-Means with k=3 and ran DBSCAN on the same normalized data to compare how each algorithm handled the structure.
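A minimal sketch of the scaling and model-selection steps, assuming the dataset is the UCI Wine data bundled with scikit-learn (`load_wine`) and that "normalizing" means z-score standardization with `StandardScaler`. The k range (2 to 8), `n_init=10`, and `random_state=0` are illustrative choices, not the original settings. The elbow is still read off the printed inertia values by eye; only the silhouette choice is automated.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumption: "the wine dataset" is scikit-learn's bundled UCI Wine data
# (178 samples, 13 numeric features).
X_raw = load_wine().data

# Standardize each feature to zero mean and unit variance so that large-scale
# features (e.g. proline) don't dominate Euclidean distances.
X = StandardScaler().fit_transform(X_raw)

ks = list(range(2, 9))  # illustrative range of candidate k values
inertias = []           # within-cluster sum of squares, for the elbow curve
silhouettes = []        # mean silhouette coefficient, higher is better

for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
    silhouettes.append(silhouette_score(X, km.labels_))

for k, inertia, sil in zip(ks, inertias, silhouettes):
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")

# Silhouette choice = the k with the highest mean silhouette score.
best_k = ks[int(np.argmax(silhouettes))]
print("k with highest silhouette:", best_k)

# Final model with k=3, the value both methods pointed to in the write-up.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
```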
Result
The three K-Means clusters appeared to line up with real wine types. One looked like dark wines with high color intensity and low flavanoids, one looked like high-quality wines with high alcohol and phenols, and one looked like lighter, simpler wines with low color intensity. DBSCAN gave a different view: instead of forcing every wine into a cluster, it also flagged low-density points as noise. The biggest takeaway was that K-Means is fast and produces clean partitions but assumes roughly spherical clusters of similar size, while DBSCAN handles noise better but is harder to tune because it depends on two parameters, eps (the neighborhood radius) and min_samples (the number of points needed to form a dense region).
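The comparison can be sketched under the same assumptions (scikit-learn's `load_wine`, standardized features). The first part profiles each K-Means cluster on the unscaled features the result mentions, which is one way to attach wine-style descriptions to cluster labels. The second part runs DBSCAN, where any point labelled `-1` is noise. The `min_samples=5` value and the rule of placing `eps` at the 90th percentile of the k-distance curve are hypothetical stand-ins for the usual practice of eyeballing the knee of that curve; other values can give very different clusterings, which is exactly the tuning difficulty noted above.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_wine
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled UCI Wine data, standardized per feature.
wine = load_wine()
X = StandardScaler().fit_transform(wine.data)
names = list(wine.feature_names)

# Profile K-Means clusters on the original (unscaled) features mentioned in the text.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
features = ("alcohol", "total_phenols", "flavanoids", "color_intensity")
cols = [names.index(f) for f in features]
for c in range(3):
    means = wine.data[kmeans.labels_ == c][:, cols].mean(axis=0)
    print(f"cluster {c}:", dict(zip(features, np.round(means, 2))))

# DBSCAN's two parameters: eps (neighborhood radius) and min_samples.
min_samples = 5  # hypothetical starting value
# k-distance curve: distance from each point to its min_samples-th nearest
# neighbor, counting the point itself, matching DBSCAN's core-point rule.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
dists, _ = nn.kneighbors(X)
k_dist = np.sort(dists[:, -1])
eps = float(np.percentile(k_dist, 90))  # hypothetical stand-in for the knee

db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
labels = db.labels_
n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise, not a cluster
n_noise = int(np.sum(labels == -1))
print(f"eps={eps:.2f}: {n_clusters} clusters, {n_noise} noise points")
```

Because every point whose k-distance is at most `eps` is a core point, the percentile rule roughly caps the share of noise points; sweeping `eps` over several values shows how sensitive the result is.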