Turn each value in array column into it's own row (Equivalent of pivot_longer for Dataframes.jl)

65 views Asked by At

Given a dataset like this:

 Row │ name     num_fruits  fruits                            
     │ String   Int64       Array…                            
─────┼────────────────────────────────────────────────────────
   1 │ Alice             2  ["cherry", "apple"]
   2 │ Bob               3  ["grape", "apple", "elderberry"]
   3 │ Charlie           5  ["apple", "apple", "elderberry",…

How would one turn each value in the array column "fruits" into it's own row, similar to pivot_longer in R? I tried stack, but it didn't do anything, except creating a column where every value was the string "fruits"

Is there something in Julia to make the data longer, or do I have to use a for loop + append?

Desired output:

10×3 DataFrame
 Row │ name     num_fruits  fruit      
     │ String   Int64       String     
─────┼─────────────────────────────────
   1 │ Alice             2  cherry
   2 │ Alice             2  apple
   3 │ Bob               3  grape
   4 │ Bob               3  apple
   5 │ Bob               3  elderberry
   6 │ Charlie           5  apple
   7 │ Charlie           5  apple
   8 │ Charlie           5  elderberry
   9 │ Charlie           5  cherry
  10 │ Charlie           5  apple

Code to reproduce the data:

using Random, DataFrames
Random.seed!(0)

df = DataFrame(name = ["Alice", "Bob", "Charlie"],
               num_fruits = [2, 3, 5])


fruits = ["apple", "banana", "cherry", "durian", "elderberry", "fig", "grape"]
df[!, :fruits] = [rand(fruits, n) for n in df.num_fruits]

stack(df, :fruits) # thing I tried
1

There are 1 answers

0
Bogumił Kamiński On BEST ANSWER

You want to flatten your data frame:

julia> flatten(df, :fruits)
10×3 DataFrame
 Row │ name     num_fruits  fruits
     │ String   Int64       String
─────┼─────────────────────────────────
   1 │ Alice             2  cherry
   2 │ Alice             2  apple
   3 │ Bob               3  grape
   4 │ Bob               3  apple
   5 │ Bob               3  elderberry
   6 │ Charlie           5  apple
   7 │ Charlie           5  apple
   8 │ Charlie           5  elderberry
   9 │ Charlie           5  cherry
  10 │ Charlie           5  apple

Note that:

  • stack stacks multiple columns into a one column
  • flatten flattens a single column containing iterable values into the same column

Note that the same is done by Iterators.flatten in Base Julia:

julia> collect(Iterators.flatten([1:2, 3:4, 5:6]))
6-element Vector{Int64}:
 1
 2
 3
 4
 5
 6