R: How to extract all labels in a certain node of a dendrogram


Part of a program I am writing automatically creates dendrograms from an input dataset. For each node (split), I want to extract all the labels under that node and the node's location on the dendrogram plot (for further plotting). Let's say my data looks like this:

> Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
> dend <- as.dendrogram(hclust(dist(t(Ltrs))))
> plot(dend)

[Figure: the resulting dendrogram]

Now I can extract the location of the splits/nodes:

> library(dendextend)
> nodes <- get_nodes_xy(dend)
> nodes <- nodes[nodes[,2] != 0, ]
> nodes
      [,1]     [,2]
[1,] 1.875 7.071068
[2,] 2.750 3.162278
[3,] 3.500 2.000000

Now I want to get all the labels under each node, i.e. for each row of 'nodes'.

This should look something like this:

$`1`
[1] "D" "C" "B" "A"

$`2`
[1] "C" "B" "A"

$`3`
[1] "B" "A"

Can anybody help me out? Thanks in advance :)


There are 2 answers

Answer by steveLangsford (accepted answer)

How about something like this?

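library(tidyverse)   # for purrr::reduce()
library(dendextend)
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

accumulator <- list()
myleaves <- function(anode) {
    # a leaf is not a list: return its label
    if (!is.list(anode)) return(attr(anode, "label"))
    # internal node: collect the labels of all children and store them;
    # the assignment's value is also returned to the parent call
    accumulator[[length(accumulator) + 1]] <<- reduce(lapply(anode, myleaves), c)
}

myleaves(dend)
ret <- rev(accumulator)  # nodes were stored in post-order (children first), so the root came last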

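Better test this; I am not very trustworthy. In particular, check that ret is in the same order as the rows of your nodes matrix, otherwise it's going to be a pain associating the entries with the correct nodes. accumulator is filled in post-order (children before their parent), so rev() puts the root first, but the result is not a pre-order traversal in general: when both children of a node are internal nodes, the right subtree's nodes come out before the left subtree's. Good luck.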
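One way to sidestep the ordering concern is to reserve each internal node's slot in the result before recursing into its children, so nodes are stored in pre-order (parent, then left subtree, then right subtree), which appears to be the order get_nodes_xy() uses. This is a minimal base-R sketch, not part of the answer above; node_info() is a hypothetical helper name, and the x coordinate assumes the default plot layout, where leaves sit at x = 1, 2, ..., n and an internal node sits at its leftmost leaf's x plus its "midpoint" attribute.

```r
# Base-R sketch (no tidyverse/dendextend): record every internal node in
# pre-order together with its plot position and the labels beneath it.
# node_info() is a hypothetical helper, not an existing function.
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

node_info <- function(dend) {
  out <- list()
  next_leaf <- 1  # x of the next leaf (assumes leaves are drawn at 1, 2, ..., n)
  walk <- function(node) {
    if (is.leaf(node)) {
      next_leaf <<- next_leaf + 1
      return(attr(node, "label"))
    }
    x_left <- next_leaf   # x of this node's leftmost leaf
    idx <- length(out) + 1
    out[[idx]] <<- NA     # reserve the slot first, so parents precede children
    labs <- unlist(lapply(node, walk), use.names = FALSE)
    out[[idx]] <<- list(
      x = x_left + attr(node, "midpoint"),  # assumed default (non-centred) layout
      y = attr(node, "height"),
      labels = labs
    )
    labs
  }
  walk(dend)
  out
}

res <- node_info(dend)
nodes <- cbind(sapply(res, `[[`, "x"), sapply(res, `[[`, "y"))
node_labels <- lapply(res, `[[`, "labels")
```

For the example data, nodes should hold the same three rows as the question's get_nodes_xy() result, and node_labels[[i]] the labels under row i.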

Answer by taprs

The dendextend function partition_leaves() extracts the leaf labels under every node (leaves included) and returns them as a list ordered in the same way as the get_nodes_xy() output. With your example:

library(dendextend)
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))
plot(dend)

partition_leaves(dend)

yields:

[[1]]
[1] "D" "C" "A" "B"

[[2]]
[1] "D"

[[3]]
[1] "C" "A" "B"

[[4]]
[1] "C"

[[5]]
[1] "A" "B"

[[6]]
[1] "A"

[[7]]
[1] "B"

Filtering the list by vector length (keeping only elements with more than one label, i.e. dropping the leaves) gives output similar to the desired one.
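A sketch of that filtering, assuming dendextend is installed and, as stated above, that partition_leaves() and get_nodes_xy() list the nodes (leaves included) in the same order, so one logical mask can select the internal nodes from both. The names parts, xy, is_internal and node_labels are only illustrative.

```r
library(dendextend)

Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

parts <- partition_leaves(dend)   # labels under every node, leaves included
xy <- get_nodes_xy(dend)          # x/y of every node, same order (assumed)

# Internal nodes are the ones with more than one label beneath them.
is_internal <- lengths(parts) > 1

nodes <- xy[is_internal, , drop = FALSE]
node_labels <- parts[is_internal]
names(node_labels) <- seq_along(node_labels)  # 1, 2, 3 as in the question
```

Filtering on the number of labels rather than on height, as the question's nodes[,2] != 0 does, also keeps internal nodes whose merge height is 0, for example when two observations are identical.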