graph - Constrained k-medoids clustering in R -

- February 15, 2014

I am looking for a way to implement semi-supervised clustering, or perhaps constrained clustering, in R, particularly the "cannot-link" part (I think; see below). I found this question, but I don't know the languages used there.

I have a data set (of word indices) that looks like this:

11 195 752 67 77
243 130 8 2178 2581 201 129 77
3872 8967 282
880 77
65 535 363 282

The first index on each line is a "leaf" in a directed acyclic graph, i.e., a tree/hierarchy. There is more than one word hierarchy/tree in the database; the last item on each line is the root/top node (semantically, the most general/abstract word), so each line represents the path from a leaf to its root. What I ultimately want is dimensionality reduction: instead of 1000 unique words there would be 200 clusters of words, which can be useful when dealing with limited data (short texts), since it reduces data sparsity in, e.g., topic/summarization models.
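
Because each line is a leaf-to-root path, the root of every leaf (and therefore which leaves can never share a hierarchy) can be read directly from the input. A minimal Python sketch, used here only as a language-neutral illustration; it assumes one path per line and, for simplicity, one path per leaf (in a true DAG a leaf may have several paths, and this sketch keeps only the last one). The sample indices and function names are hypothetical.

```python
# Sketch: parse leaf-to-root paths (one per line, leaf first, root last)
# and group the leaves by the root node of their hierarchy.
from collections import defaultdict

def parse_paths(text):
    """Return {leaf: [leaf, ..., root]}; a repeated leaf keeps its last path."""
    paths = {}
    for line in text.strip().splitlines():
        nodes = [int(tok) for tok in line.split()]
        if nodes:
            paths[nodes[0]] = nodes
    return paths

def leaves_by_root(paths):
    """Group leaves by the last node on their path (the root)."""
    groups = defaultdict(list)
    for leaf, path in paths.items():
        groups[path[-1]].append(leaf)
    return dict(groups)

# Hypothetical input in the format described above.
sample = """
1 10 100
2 10 100
3 20 100
4 30 200
5 30 200
"""
paths = parse_paths(sample)
print(leaves_by_root(paths))  # {100: [1, 2, 3], 200: [4, 5]}
```

Leaves that end up under different keys of this grouping are exactly the pairs the question wants kept apart.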

What I have so far: first, I derive a similarity matrix (based on how many nodes a pair of leaves has in common in the tree, or more precisely, the Wu & Palmer WordNet similarity measure), then perform partitioning around medoids (PAM) using the cluster package in R. On the example data, k = 3..5 clusters is fine: with fewer than 3, leaves with no common root get shoved into the same cluster. That makes no sense, since they have nothing in common. So this works. However, on larger datasets (500+ such items), even when I take an appropriately high k, it still happens: things that should not be clustered together get clustered together. My current workaround is to print a warning when this occurs (by checking whether each cluster's members share a common root node) and then choose a different k.
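
The pipeline just described (similarity from shared path nodes, k-medoids, then a mixed-root check) can be sketched as follows. This is a minimal Python sketch, not cluster::pam: the Wu & Palmer formula is one common formulation (root counted as depth 1, similarity 0 when no node is shared), the k-medoids step is a simplified greedy swap search with a naive start instead of PAM's BUILD step, and the paths and function names are hypothetical. In R, the equivalent matrix of 1 - similarity would be passed to `pam` with `diss = TRUE`.

```python
# Sketch: Wu & Palmer similarity from leaf-to-root paths, a simple
# PAM-style k-medoids on 1 - similarity, and a check for clusters that
# mix hierarchies. Paths and function names are hypothetical.
from collections import defaultdict

def wup(pa, pb):
    """2 * depth(lcs) / (depth(a) + depth(b)), root at depth 1; 0 if disjoint."""
    shared = set(pa) & set(pb)
    if not shared:
        return 0.0
    depth_a = {node: len(pa) - i for i, node in enumerate(pa)}
    lcs_depth = max(depth_a[node] for node in shared)  # deepest shared node
    return 2 * lcs_depth / (len(pa) + len(pb))

def total_cost(D, medoids):
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def k_medoids(D, k):
    """Greedy medoid/non-medoid swaps until the total cost stops improving."""
    medoids = list(range(k))  # naive start (PAM proper uses a BUILD step)
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(D)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(len(D))]
    return medoids, labels

def mixed_root_clusters(labels, leaves, paths):
    """Return ids of clusters whose leaves come from more than one root."""
    roots = defaultdict(set)
    for leaf, label in zip(leaves, labels):
        roots[label].add(paths[leaf][-1])
    return sorted(c for c, r in roots.items() if len(r) > 1)

# Hypothetical paths: leaf first, root last; the roots are 100 and 200.
paths = {1: [1, 10, 100], 2: [2, 10, 100], 3: [3, 20, 100],
         4: [4, 30, 200], 5: [5, 30, 200]}
leaves = sorted(paths)
D = [[1 - wup(paths[a], paths[b]) for b in leaves] for a in leaves]

medoids, labels = k_medoids(D, 2)
print(labels)                                        # [1, 1, 1, 0, 0]
print(mixed_root_clusters(labels, leaves, paths))    # []
_, labels1 = k_medoids(D, 1)
print(mixed_root_clusters(labels1, leaves, paths))   # [0]
```

The last line reproduces the warning case from the question: with too small a k, leaves from different roots share a cluster, and the check can only report this after the fact.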

What I need is a way to "tell"/supervise the clustering algorithm which items I don't want clustered together. Expressing this is easy: e.g., when calculating the (dis)similarity matrix, I can fill a parallel matrix of the same dimensions and put a mark wherever a pair of items with no common root is encountered. However, the pam function I've been using does not seem to allow such constraints. Currently, items with no common root are assigned 0 similarity (which, after converting to a dissimilarity matrix, becomes the maximum distance present). I tried boosting the distance values of such pairs in the matrix, but realized that pushing those items farther apart makes the other items "look closer" to the clustering algorithm, once again distorting the results.
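
The parallel constraint matrix described above is the easy part; what pam lacks is an assignment step that honours it. A minimal sketch of one option, assuming hard cannot-link constraints: assign each item greedily to the nearest medoid whose cluster holds no cannot-link partner (the idea behind COP-KMeans, here swapped onto medoids), then move each medoid to the member with the smallest summed distance, and repeat. It is written in Python only for illustration; the distance values, root labels and function names are hypothetical and are not part of cluster or flexclust.

```python
# Sketch: k-medoids with hard cannot-link constraints (COP-KMeans-style
# greedy assignment, adapted to medoids). Names and data are hypothetical.

def cannot_link_matrix(roots):
    """cannot[i][j] is True when items i and j have different roots."""
    n = len(roots)
    return [[roots[i] != roots[j] for j in range(n)] for i in range(n)]

def farthest_first(D, k):
    """Deterministic start: item 0, then repeatedly the item farthest
    from the medoids chosen so far."""
    if not 1 <= k <= len(D):
        raise ValueError("k must be between 1 and the number of items")
    medoids = [0]
    while len(medoids) < k:
        rest = [i for i in range(len(D)) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(D[i][m] for m in medoids)))
    return medoids

def constrained_assign(D, medoids, cannot):
    """Put each item in the nearest cluster holding no cannot-link partner;
    return None if some item fits nowhere."""
    labels = [None] * len(D)
    members = [[m] for m in medoids]
    for c, m in enumerate(medoids):
        labels[m] = c
    for i in range(len(D)):
        if labels[i] is not None:
            continue
        for c in sorted(range(len(medoids)), key=lambda c: D[i][medoids[c]]):
            if not any(cannot[i][j] for j in members[c]):
                labels[i] = c
                members[c].append(i)
                break
        else:
            return None  # every cluster already holds a cannot-link partner
    return labels

def constrained_k_medoids(D, k, cannot, max_iter=100):
    """Return (medoids, labels), or None if the constraints cannot be met."""
    medoids = farthest_first(D, k)
    for _ in range(max_iter):
        labels = constrained_assign(D, medoids, cannot)
        if labels is None:
            return None
        # Move each medoid to the member with the smallest summed distance.
        new = []
        for c in range(k):
            group = [i for i, lab in enumerate(labels) if lab == c]
            new.append(min(group, key=lambda i: sum(D[i][j] for j in group)))
        if new == medoids:
            return medoids, labels
        medoids = new
    labels = constrained_assign(D, medoids, cannot)
    return None if labels is None else (medoids, labels)

# Hypothetical dissimilarities for five leaves under roots 100 and 200.
D = [[0.0, 0.33, 0.67, 1.0, 1.0],
     [0.33, 0.0, 0.67, 1.0, 1.0],
     [0.67, 0.67, 0.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 0.0, 0.33],
     [1.0, 1.0, 1.0, 0.33, 0.0]]
cannot = cannot_link_matrix([100, 100, 100, 200, 200])

print(constrained_k_medoids(D, 2, cannot))  # ([0, 3], [0, 0, 0, 1, 1])
print(constrained_k_medoids(D, 1, cannot))  # None
```

The distances themselves are left untouched, so the "other items look closer" distortion does not arise. With hard constraints, any k below the number of distinct roots is infeasible, and the sketch returns None instead of silently mixing hierarchies. Since "no common root" here means "different root", an alternative is to split the items by root and run an ordinary k-medoids within each group, choosing how many clusters each group gets.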

After googling, I have come to the conclusion that I either need to implement this myself (I have found no R package that does it out of the box) or have been searching with the wrong keywords, due to my limited knowledge of machine learning. So either kind of answer to my question (how do I do this in R?) would be helpful: either suggest an R package for it, or suggest a way to implement it (some example R code would help me get started). Or perhaps I've been going about this the wrong way, and k-medoids clustering is not the right solution here?

Small update: I found that the package flexclust does seem to support this: its function kcca can use a family built with kccaFamily(..., groupFun = "differentClusters"), which is supposed to implement a cannot-link constraint, but the documentation doesn't say much more (nor does the cited paper). Also, kcca takes a regular data matrix as input, while I have a distance matrix (which I construct cell by cell, computing the Wu & Palmer measure for each pair of words from the leaf-to-root paths described in the data example above). So I guess it is not the solution here.

Tags: r, graph, constraints, cluster-analysis, wordnet
