Sankey Diagram in R & plotly: unexpected connections


I am trying to build a Sankey plot with three layers in R using the plotly package (plotly_4.10.2). Although the source-to-target connections look reasonable in the "links" data, the plot itself displays the connections incorrectly.

For example, in "example.data", Gene3-Treatment-Category2 is displayed as Gene3-Treatment-Category1. The connections for Gene8 are wrong as well. Should I rearrange the labels before plotting?

Screenshot of the plot


library(plotly)

# this is an example data

example.data <- data.frame(
  genes = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", "Gene9"),
  conditions = c("Control", "Control", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment"),
  category = c("Category1", "Category1", "Category2", "Category2", "Category2", "Category2", "Category2", "Category1", "Category2")
)

nodes <- data.frame(name = unique(c(as.character(example.data$genes),
                                    as.character(example.data$conditions),
                                    as.character(example.data$category))))

links <- data.frame(source = match(example.data$genes, nodes$name) - 1,
                    target = match(example.data$conditions, nodes$name) - 1,
                    stringsAsFactors = FALSE)

links <- rbind(links,
               data.frame(source = match(example.data$conditions, nodes$name) - 1,
                          target = match(example.data$category, nodes$name) - 1,
                          stringsAsFactors = FALSE))


plotly::plot_ly(
  type = "sankey",
  domain = list(x =  c(0,1),
                y =  c(0,1)),
  orientation = "h",
  customdata = nodes$name,
  node = list(
    label = nodes$name,
    pad = 15,
    thickness = 15,
    line = list(color = "black",
                width = 0.5)),
  link = list(source = links$source,
              target = links$target,
              value =   rep(1, nrow(links))
  ))

There are 2 answers

zx8754 (best answer)

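Try plotting in this order instead: conditions -> genes -> category. Each gene then sits in the middle layer with its own incoming and outgoing link, so its path can be followed across the whole diagram: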

nodes <- unique(unlist(example.data))

links <- list(
  source = c(match(example.data$conditions, nodes) - 1, 
             match(example.data$genes, nodes) - 1),
  target = c(match(example.data$genes, nodes) - 1,
             match(example.data$category, nodes) - 1),
  value = rep(1, nrow(example.data) * 2))

plot_ly(type = "sankey",
        node = list(label = nodes),
        link = links)

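To check that the reordered links say what is intended, the zero-based indices can be decoded back into labels. This is only a minimal sketch, assuming the `example.data` from the question; the `decoded` data frame is an illustrative name, not part of plotly.

```r
# Example data from the question, rebuilt so the snippet is self-contained
example.data <- data.frame(
  genes = paste0("Gene", 1:9),
  conditions = c("Control", "Control", rep("Treatment", 7)),
  category = c("Category1", "Category1", rep("Category2", 5),
               "Category1", "Category2"),
  stringsAsFactors = FALSE
)

# Node labels, in the same order as in the answer above
nodes <- unique(as.character(unlist(example.data)))

# Links in the order conditions -> genes -> category (zero-based indices)
links <- list(
  source = c(match(example.data$conditions, nodes) - 1,
             match(example.data$genes, nodes) - 1),
  target = c(match(example.data$genes, nodes) - 1,
             match(example.data$category, nodes) - 1),
  value = rep(1, nrow(example.data) * 2))

# Decode the indices back to labels (add 1 because R indexing is one-based)
decoded <- data.frame(
  from = nodes[links$source + 1],
  to = nodes[links$target + 1],
  value = links$value,
  stringsAsFactors = FALSE
)
print(decoded)

# Each gene should have exactly one incoming and one outgoing link
incoming <- table(decoded$to[decoded$to %in% example.data$genes])
outgoing <- table(decoded$from[decoded$from %in% example.data$genes])
```

If `incoming` and `outgoing` are all 1, every gene has a single traceable path, and the same `links` list can be passed to `plot_ly()` as shown above.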

ihecker

The connections are actually correct. The Sankey diagram displays the flows between State 1 (source node: genes) and State 2 (target node: conditions) and then, separately, the flows between State 2 (conditions) and State 3 (category).

It makes more sense if you hover over the flows: you will see that, for example, one unit from Treatment went to Category1. However, just because that flow appears to line up with Gene3 does not mean it belongs to Gene3.

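One way to see this is to tabulate what the second set of links actually contains. The sketch below assumes the `example.data` from the question and uses an illustrative `flows` data frame; it only counts condition/category pairs and does not call plotly.

```r
# Example data from the question, rebuilt so the snippet is self-contained
example.data <- data.frame(
  genes = paste0("Gene", 1:9),
  conditions = c("Control", "Control", rep("Treatment", 7)),
  category = c("Category1", "Category1", rep("Category2", 5),
               "Category1", "Category2"),
  stringsAsFactors = FALSE
)

# Count the condition -> category pairs; the gene column plays no part here,
# just as it plays no part in the second block of links in the question
flows <- as.data.frame(
  table(from = example.data$conditions, to = example.data$category),
  stringsAsFactors = FALSE
)

# Drop pairs that never occur (Control -> Category2)
flows <- flows[flows$Freq > 0, ]
print(flows)
```

The single Treatment -> Category1 unit comes from Gene8, but nothing in the condition-to-category links records that, so the plot has no basis for drawing it next to Gene8's incoming flow.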