r - Hierarchical clustering on rows of varying length with sequences of numbers
I want to perform hierarchical clustering in one of my projects.
My original problem involves a huge graph, over which I have iterated a large number of paths and reported the nodes of each path in the format below. Each number in the sample below represents a graph node, and each row represents a path. I want to cluster these paths on the basis of the number of nodes they share, as a way to segregate similar kinds of paths.
1210, 158, 1222, 1468
1210, 1222, 198
158, 1468, 25, 26, 27, 28
Now I want to perform hierarchical clustering between rows based upon the number of similar nodes. In the table above, rows (paths) 1 and 2 are part of one cluster due to the shared nodes 1210 and 1222. Rows (paths) 1 and 3 are part of a cluster due to the shared nodes 158 and 1468.
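To make "number of shared nodes" concrete, here is a minimal sketch in base R that scores each pair of the sample paths with Jaccard similarity (shared nodes divided by all distinct nodes across the two paths). The names `paths` and `jaccard_sim` are made up for illustration and are not part of the original post.

```r
# Sample paths from the question, one vector of node IDs per path.
paths <- list(
  c(1210, 158, 1222, 1468),
  c(1210, 1222, 198),
  c(158, 1468, 25, 26, 27, 28)
)

# Hypothetical helper: Jaccard similarity of two node sets,
# i.e. |intersection| / |union|.
jaccard_sim <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

sim_12 <- jaccard_sim(paths[[1]], paths[[2]])  # 2 shared of 5 distinct nodes
sim_13 <- jaccard_sim(paths[[1]], paths[[3]])  # 2 shared of 8 distinct nodes
sim_23 <- jaccard_sim(paths[[2]], paths[[3]])  # no shared nodes
```

Because the score is normalised by the size of the union, paths 1 and 2 come out more similar than paths 1 and 3, even though both pairs share exactly two nodes.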
I checked that I can use the hclust function for hierarchical clustering. This function takes a dissimilarity matrix as its argument, and I am not sure how to create the distance metric. It seems I should use the Jaccard similarity measure, but I don't find an option in the dist method for Jaccard similarity with the variable column format above.
regards,
Here's an example of hclust with Jaccard distance (using vegdist in the vegan package), based on an abstraction of your data to a binary dataset:
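dat
  25 26 27 28 158 198 1210 1222 1468
1  0  0  0  0   1   0    1    1    1
2  0  0  0  0   0   1    1    1    0
3  1  1  1  1   1   0    0    0    1

library(vegan)
# Jaccard dissimilarity between rows (paths) of the binary matrix
d <- vegdist(dat, method = "jaccard")
# Cluster and draw the dendrogram (no pipe, so no extra package is needed)
plot(hclust(d))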
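The answer does not show how the binary table is built from rows of varying length. One possible way, sketched here in base R only, is to collect the distinct node IDs and mark each path's membership. The names `paths`, `nodes`, `d` and `hc` are illustrative. This sketch also swaps in base R's `dist(method = "binary")` rather than `vegdist`; on 0/1 data it gives the proportion of mismatching nodes among those present in at least one path, which is the Jaccard distance. The `"average"` linkage is an arbitrary choice for the example.

```r
# Paths as a list of node-ID vectors (rows of varying length).
paths <- list(
  c(1210, 158, 1222, 1468),
  c(1210, 1222, 198),
  c(158, 1468, 25, 26, 27, 28)
)

# All distinct nodes, in sorted order, become the columns.
nodes <- sort(unique(unlist(paths)))

# One row per path, one column per node: 1 if the path visits the node.
dat <- t(sapply(paths, function(p) as.integer(nodes %in% p)))
dimnames(dat) <- list(seq_along(paths), nodes)

# Binary (Jaccard) distance between paths, then hierarchical clustering.
d  <- dist(dat, method = "binary")
hc <- hclust(d, method = "average")
plot(hc)

# Cut the tree into two groups of paths.
groups <- cutree(hc, k = 2)
```

With the three sample paths, paths 1 and 2 merge first and path 3 joins last. For a very large number of distinct nodes, a dense 0/1 matrix like this may become too big, and a sparse representation would be worth considering.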