Cluster Analysis on musical data from Last.fm
|
For each user, the list of the 50 most played composers (with their respective playcounts) was fetched.
This list represents the user's musical preference. A user can be seen as a point in
a multidimensional space in which the composers are the dimensions and the user's 50 playcounts
are the coordinates.
Using principal component analysis, the large composer space (4932 artists)
can be reduced to 64 dimensions while retaining 99% of the variance in the data.
As a measure of similarity between two users, we use the correlation of their vectors in the
reduced composer space.
The graph on the left again shows the 388 users, each node connected to its three most
similar neighbours.
The layout of the graph is produced by a force-directed algorithm.
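
The reduction and similarity steps described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the graph: it assumes a dense users-by-composers playcount matrix `X` (zero where a composer is not in a user's top 50), uses a plain SVD-based PCA, and the function names `pca_reduce` and `top_neighbours` are made up for this example.

```python
import numpy as np

def pca_reduce(X, variance_kept=0.99):
    """Project the rows of X onto the fewest principal components
    whose cumulative explained variance reaches `variance_kept`."""
    Xc = X - X.mean(axis=0)                      # centre each composer column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), variance_kept)) + 1
    k = min(k, len(s))                           # guard against rounding at the end
    return Xc @ Vt[:k].T

def top_neighbours(Z, n_neighbours=3):
    """For each user (row of Z), return the indices of the users with the
    highest Pearson correlation in the reduced space."""
    sim = np.corrcoef(Z)                         # rows are treated as variables
    np.fill_diagonal(sim, -np.inf)               # a user is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :n_neighbours]
```

Each row of the array returned by `top_neighbours` gives the three edges leaving one node; a force-directed layout from any graph library can then be applied to the resulting edge list.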
The colors represent 8 clusters in the 64-dimensional space, found by
the so-called QT clustering algorithm (quality threshold clustering).
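
For readers unfamiliar with QT clustering, the idea can be sketched as below. This is a simplified version under stated assumptions, not the implementation used here: it takes a precomputed distance matrix `dist` (for instance one minus the correlation, which is an assumption), the threshold `max_diameter` is a free parameter whose value is not given in this text, and the usual minimum-cluster-size stopping rule is omitted.

```python
import numpy as np

def grow_candidate(dist, seed, remaining, max_diameter):
    """Greedily grow a cluster around `seed`, always adding the point that
    keeps the cluster diameter smallest, until the threshold would be exceeded."""
    members = [seed]
    others = np.array([p for p in remaining if p != seed], dtype=int)
    # reach[j]: distance from others[j] to its farthest current member
    reach = dist[others, seed].astype(float)
    while others.size:
        j = int(np.argmin(reach))
        if reach[j] > max_diameter:
            break                                # adding any point breaks the threshold
        new = int(others[j])
        members.append(new)
        others = np.delete(others, j)
        reach = np.delete(reach, j)
        reach = np.maximum(reach, dist[others, new])
    return members

def qt_cluster(dist, max_diameter):
    """Repeatedly keep the largest candidate cluster and remove its points."""
    remaining = list(range(dist.shape[0]))
    clusters = []
    while remaining:
        best = max((grow_candidate(dist, s, remaining, max_diameter)
                    for s in remaining), key=len)
        clusters.append(sorted(best))
        taken = set(best)
        remaining = [p for p in remaining if p not in taken]
    return clusters
```

Unlike k-means, the number of clusters is not chosen in advance; it follows from the diameter threshold. The cost grows quickly with the number of points, which is acceptable for a few hundred users.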
It is interesting to note that these clusters also happen to be perfectly separated by the
force-directed layout of the (synthetic) graph. Below is a list of the 10 most important dimensions
(composers) of the 8 cluster centres. Combining cluster C-38 with C-24 and C-11,
one arrives at the same clustering as the previous (last.fm) graph. This shows that our similarity
measure is able to make a much finer distinction in taste than the undisclosed method of last.fm.
|