How do I filter out low-occurrence nodes from a graph?


I'm trying to make a graph using hashtags from Instagram. Each node is a hashtag and has edges to each hashtag it was paired with in a post. I want to filter out the nodes (hashtags) with a low number of occurrences.

I'm able to filter them out by manually setting a limit like 35 or 30, but I want to make it so this limit is calculated using some parameters from the graph.


There is 1 answer

ravenspoint On

You do not say which parameters of the graph should determine the limit, or how they should determine it.

My guess: you want to set the limit so that it eliminates a fixed percentage of the nodes. For example, you might want to eliminate the 10% of nodes that have the lowest occurrence counts.

  • LOOP over every hashtag
    • Count occurrences
    • Save in array of pairs ( hashtag, #occurrences )
  • Sort pair array in ascending order of #occurrences
  • LOOP over sorted pair array
    • Delete the node with this hashtag
    • IF required percentage of nodes have been deleted
      • STOP
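
The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions that are not in the question: the posts are available as lists of hashtag strings, the graph is kept as a plain dictionary of neighbour sets, and the names `build_graph`, `drop_rarest` and `fraction` are made up for the example. Instead of deleting inside a loop and stopping, it works out how many nodes to delete up front and slices the sorted list, which should give the same result.

```python
from collections import Counter
from itertools import combinations


def build_graph(posts):
    """Return (occurrence counts, adjacency sets) for an iterable of hashtag lists."""
    counts = Counter()
    adjacency = {}
    for tags in posts:
        unique = set(tags)  # count a hashtag at most once per post
        counts.update(unique)
        for tag in unique:
            adjacency.setdefault(tag, set())
        # Connect every pair of hashtags that appear in the same post.
        for a, b in combinations(unique, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return counts, adjacency


def drop_rarest(counts, adjacency, fraction):
    """Return a copy of the graph without the `fraction` of nodes that occur least."""
    n_delete = int(len(adjacency) * fraction)  # rounds down
    # Sort ascending by count; ties are broken by hashtag name (an arbitrary choice).
    ranked = sorted(adjacency, key=lambda tag: (counts[tag], tag))
    doomed = set(ranked[:n_delete])
    # Drop the doomed nodes and any edges that pointed at them.
    return {
        tag: neighbours - doomed
        for tag, neighbours in adjacency.items()
        if tag not in doomed
    }


# Hypothetical sample data, one list of hashtags per post.
posts = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["b", "c"], ["e", "a"]]
counts, adjacency = build_graph(posts)
pruned = drop_rarest(counts, adjacency, 0.4)  # remove the rarest 40% of nodes
```

One thing to be aware of: when several hashtags share the same count at the cutoff, a fixed percentage removes only some of them, and which ones depends on the tie-break. If that matters, you could instead read the count at the cutoff position and remove every node at or below it.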