r - Hierarchical clustering on rows of varying length with sequence of numbers -

- August 15, 2011

I want to do hierarchical clustering in one of my projects.

My original problem involves a huge graph on which I have iterated over a large number of paths and reported the nodes of each path in the format below. Each number in the sample represents a graph node, and each row represents a path. I want to cluster these paths on the basis of the number of nodes they share, as a way to segregate similar kinds of paths.
    1210, 158, 1222, 1468
    1210, 1222, 198
    158, 1468, 25, 26, 27, 28

Now I want to do hierarchical clustering of the rows based on the number of shared nodes. In the table above, rows (paths) 1 and 2 are part of one cluster because they share nodes 1210 and 1222. Rows (paths) 1 and 3 are part of a cluster because they share nodes 158 and 1468.

I checked, and I can use the hclust function for hierarchical clustering. This function takes a dissimilarity matrix as its argument. I am not sure how to create the distance metric. It seems I should use the Jaccard similarity measure, but I can't find an option in the dist method for Jaccard similarity, nor for a variable-column format like the one above.

regards,
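
The shared-node idea in the question can be turned into a dissimilarity matrix without first forcing the rows into a fixed number of columns. The following is a minimal base R sketch, assuming the paths are held in a list of numeric vectors; the names `paths`, `jaccard_dist`, and `m` are illustrative and do not come from the original post.

```r
# Paths from the question, stored as a ragged list (one vector per path).
paths <- list(
  c(1210, 158, 1222, 1468),
  c(1210, 1222, 198),
  c(158, 1468, 25, 26, 27, 28)
)

# Jaccard distance between two node sets:
# 1 - (number of shared nodes) / (number of distinct nodes in either path).
# Hypothetical helper, assumes both paths are non-empty.
jaccard_dist <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Fill a symmetric path-by-path matrix of distances.
n <- length(paths)
m <- matrix(0, n, n, dimnames = list(seq_len(n), seq_len(n)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- jaccard_dist(paths[[i]], paths[[j]])
  }
}

# Convert to the "dist" class that hclust() expects as its first argument.
d <- as.dist(m)
```

Paths 1 and 2 share 2 of 5 distinct nodes, so their distance is 0.6; paths 2 and 3 share nothing, so their distance is 1. The double loop is fine for a handful of paths but grows quadratically, so a very large path set would likely need a sparser approach.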

Here's an example of hclust with Jaccard distance (using vegdist in the vegan package), based on an abstraction of your data into a binary dataset:

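    # Binary incidence table: one row per path, one column per node
    dat <- read.table(header = TRUE, check.names = FALSE, text = "
      25 26 27 28 158 198 1210 1222 1468
    1  0  0  0  0   1   0    1    1    1
    2  0  0  0  0   0   1    1    1    0
    3  1  1  1  1   1   0    0    0    1")

    library(vegan)
    # Named d rather than dist to avoid masking stats::dist
    d <- vegdist(dat, method = "jaccard")
    plot(hclust(d))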
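
The binary table in the answer is typed in by hand. As a sketch of how it might be derived from the variable-length rows, the code below builds the same 0/1 matrix from a list of paths. It also uses base R's `dist(method = "binary")`, which on 0/1 rows is the Jaccard distance, as an alternative to `vegdist` that needs no package. The names `paths`, `nodes`, and `groups`, and the choice of `k = 2`, are assumptions made for illustration.

```r
# Paths from the question; each vector holds the node ids of one path.
paths <- list(
  c(1210, 158, 1222, 1468),
  c(1210, 1222, 198),
  c(158, 1468, 25, 26, 27, 28)
)

# Every distinct node becomes one column.
nodes <- sort(unique(unlist(paths)))

# One row per path: 1 if the path visits the node, 0 otherwise.
dat <- t(sapply(paths, function(p) as.integer(nodes %in% p)))
dimnames(dat) <- list(seq_along(paths), nodes)

# method = "binary": proportion of columns where only one row is "on"
# among columns where at least one row is "on" (Jaccard distance).
d <- dist(dat, method = "binary")

# Default complete linkage; cut the tree into two groups of paths.
hc <- hclust(d)
groups <- cutree(hc, k = 2)
```

With these three paths, paths 1 and 2 merge first (distance 0.6) and path 3 stays in its own group. Calling `plot(hc)` draws the dendrogram, as in the answer.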
