Differences

This shows you the differences between two versions of the page.

Previous revision: cs-401r:adjusted-rand-index [2014/11/18 03:36] cs401rPML [Adjusted Rand Index] clarified that ARI is defined for clusterings with different numbers of clusters.
Current revision: cs-401r:adjusted-rand-index [2014/11/18 03:38] (current) cs401rPML [Adjusted Rand Index] Added note about number of clusters in the clusterings.

Revision history:
2014/11/18 03:38 cs401rPML [Adjusted Rand Index] Added note about number of clusters in the clusterings.
2014/11/18 03:36 cs401rPML [Adjusted Rand Index] clarified that ARI is defined for clusterings with different numbers of clusters.
2014/11/18 03:34 cs401rPML Added example of what data items might be: documents.
2014/11/18 03:32 cs401rPML Corrected the counts for n_{3,1} and n_{3,2} in the first two tables.
2014/11/14 20:26 ringger
2014/11/11 23:20 cs401rPML Added another link to ARI on Wikipedia.
2014/11/11 23:17 cs401rPML Updated link to assignment 6 to point back to the area that points to this page.
2014/11/11 23:13 cs401rPML Finished, including math and acknowledgements.
2014/11/11 22:50 cs401rPML Added notes about binomial coefficient.
2014/11/11 22:41 cs401rPML created with example up to the ARI equation
Line 4:
An ARI score of 1 indicates that the two clusterings are the same apart from the cluster ID/name.
+
+ Note that the ARI can be computed even for clusterings with a different number of clusters in each clustering.

To compute the ARI, start by building the contingency table (similar to a confusion matrix) for the two clusterings. Say our data is the set of items (e.g., documents) $\{A, B, C, D, E, F\}$, that the gold-standard (GS) clustering is $(\{A, D\}, \{B,C\}, \{E,F\})$---where $\{A,D\}$ is the first cluster, $\{B,C\}$ is the second and so on---and that the EM clustering is $(\{A, B\}, \{E,F\}, \{C,D\})$. The contingency table is then filled in by calculating the size of the intersection of each EM cluster with each GS cluster:

Line 59 (previous revision), Line 61 (current revision):
^  EM' 3         |  $|\emptyset| = 0$ |  $|\{C,D\}| = 2$ |  $|\emptyset| = 0$ |  2   |
^  Column Sums  |  2  |  2  |  2  |  6  |
-
- Further note that the ARI can be computed even for clusterings with a different number of clusters. For example, the GS with three clusters can be compared to an EM'' clustering with fewer than three clusters or more than three clusters.
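The contingency-table step described in this revision can be sketched in Python. This is only an illustrative sketch, not code from the wiki page: the function names `contingency_table` and `adjusted_rand_index` are hypothetical, the ARI formula is the standard one based on binomial coefficients (it is not reproduced in this diff), and returning 1.0 when the denominator is zero is an assumption about how to treat degenerate clusterings.

```python
from math import comb


def contingency_table(rows, cols):
    """Return table[i][j] = size of the intersection of rows[i] and cols[j]."""
    return [[len(r & c) for c in cols] for r in rows]


def adjusted_rand_index(rows, cols):
    """Standard ARI for two clusterings, each given as a list of disjoint sets."""
    table = contingency_table(rows, cols)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    # Pairs of items that land together in both clusterings.
    index = sum(comb(n_ij, 2) for row in table for n_ij in row)
    row_pairs = sum(comb(a, 2) for a in row_sums)
    col_pairs = sum(comb(b, 2) for b in col_sums)

    expected = row_pairs * col_pairs / comb(n, 2)
    maximum = (row_pairs + col_pairs) / 2
    if maximum == expected:
        # Assumption: degenerate case (e.g. both clusterings are one big
        # cluster); treated here as perfect agreement.
        return 1.0
    return (index - expected) / (maximum - expected)


# The example from the text: GS clusters are the columns, EM clusters the rows.
gs = [{"A", "D"}, {"B", "C"}, {"E", "F"}]
em = [{"A", "B"}, {"E", "F"}, {"C", "D"}]
print(contingency_table(em, gs))    # [[1, 1, 0], [0, 0, 2], [1, 1, 0]]
print(adjusted_rand_index(em, gs))  # about 0.167

# A hypothetical two-cluster clustering compared against the three GS clusters.
em_two = [{"A", "B", "C"}, {"D", "E", "F"}]
print(adjusted_rand_index(em_two, gs))
```

Because the table is simply rectangular when the cluster counts differ, the same function handles the two-cluster case without any special treatment, which is the point of the note added in this revision.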
=== Acknowledgements ===

Thanks to Mike Brodie and Abraham Frandsen for help in developing this tutorial in a Google Group discussion.