Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as book2

Question

Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as book2

53 views Asked by Catalyst At 25 June 2022 at 10:48

Below is the first dataframe where I want to remove the first 3 rows:

book1 <- structure(list(Instructions..xyz = c("Note: abc", "", "Set1", 
                                              "id", "632592651", "633322173", "634703802", "634927873", "635812953", 
                                              "636004739", "636101211", "636157799", "636263106", "636752420"
), X = c("", "", "", "title", "asdf", "cat", "dog", "mouse", 
         "elephant", "goose", "rat", "mice", "kitty", "kitten"), X.1 = c("", 
                                                                         "", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n", 
                                                                         "y"), X.2 = c("", "", "Set2", "id", "632592651", "633322173", 
                                                                                       "634703802", "634927873", "635812953", "636004739", "636101211", 
                                                                                       "636157799", "636263106", "636752420"), X.3 = c("", "", "", "title", 
                                                                                                                                       "asdf2", "cat2", "dog2", "mouse2", "elephant2", "goose2", "rat2", 
                                                                                                                                       "mice2", "kitty2", "kitten2"), X.4 = c("", "", "", "index", "0.664883807", 
                                                                                                                                                                              "0.20089779", "0.752228086", "0.124729276", "0.626285086", "0.134537909", 
                                                                                                                                                                              "0.612526768", "0.769622463", "0.682532524", "0.819015658")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                -14L))

I did book1 <- book1[-c(1:3),] but I'm not sure how to make id, title, hazard, id, title, index as the column name instead of Instructions..xyz, etc. See image below for desired output

Then for the second dataframe,

book2 <- structure(list(identity = c(632592651L, 633322173L, 634703802L, 
                                     634927873L, 635812953L, 636004739L, 636101211L, 636157799L, 636263106L, 
                                     636752420L, 636809222L, 2004722036L, 2004894388L, 2005045755L, 
                                     2005535472L, 2005630542L, 2005788781L, 2005809679L, 2005838317L, 
                                     2005866692L), text = c("asdf_xyz", "cat", "dog", "mouse", "elephant", 
                                                            "goose", "rat", "mice", "kitty", "kitten", "tiger_xyz", "lion", 
                                                            "leopard", "ostrich", "kangaroo", "platypus", "fish", "reptile", 
                                                            "mammals", "amphibians_xyz"), volume = c(1234L, 432L, 324L, 333L, 
                                                                                                     2223L, 412346L, 7456L, 3456L, 2345L, 2345L, 6L, 345L, 23L, 2L, 
                                                                                                     4778L, 234L, 8675L, 3459L, 8L, 9L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                              -20L))

I then rename column 1 and 2 in book2 so that it matches that of book1 by names(book2)[1:2] <- c('id','title') where I can later do inner_join. The desired output is shown in the image below by

library(dplyr)
book1 %>%
  inner_join(book2, by = c("id", "title"))

This is taking quite a few steps and wondering if there's a simplified version to this?

Original Q&A

There are 2 answers

Rui Barradas On 25 June 2022 at 11:18

Something like this?

# split the data by columns
book2a <- book1[-(1:4), 1:3]
book2b <- book1[-(1:4), 4:6]

# take care of names
names(book2a) <- book1[4, 1:3, drop = TRUE]
names(book2b) <- book1[4, 4:6, drop = TRUE]

# book2b needs processing
book2b$title <- sub("2", "", book2b$title)
book2b$index <- as.numeric(book2b$index)

# join both data sets and clean-up
book2 <- merge(book2a, book2b, all = TRUE)
rm(book2a, book2b)

book2
#>           id    title hazard     index
#> 1  632592651     asdf      y 0.6648838
#> 2  633322173      cat      y 0.2008978
#> 3  634703802      dog      y 0.7522281
#> 4  634927873    mouse      n 0.1247293
#> 5  635812953 elephant      n 0.6262851
#> 6  636004739    goose      y 0.1345379
#> 7  636101211      rat      y 0.6125268
#> 8  636157799     mice      n 0.7696225
#> 9  636263106    kitty      n 0.6825325
#> 10 636752420   kitten      y 0.8190157

^{Created on 2022-06-25 by the reprex package (v2.0.1)}

**Catalyst** · Accepted Answer · 2022-06-25T11:23:13+00:00

Found the solution to the first question

library(janitor)
book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)

I applied

names(book1)[4:5] <- c('id1','title1')

to obtain unique column name, then tried inner_join as proposed earlier but with error and found that book1$id is character where book2$id is int and so I did

book1$id <- as.integer(book1$id)

and finally it works with

library(tidyverse)
Yeah <- book1 %>%
  inner_join(book2, by = c("id", "title"))

Output below:

 id    title hazard       id1    title1       index volume
1 633322173      cat      y 633322173      cat2  0.20089779    432
2 634703802      dog      y 634703802      dog2 0.752228086    324
3 634927873    mouse      n 634927873    mouse2 0.124729276    333
4 635812953 elephant      n 635812953 elephant2 0.626285086   2223
5 636004739    goose      y 636004739    goose2 0.134537909 412346
6 636101211      rat      y 636101211      rat2 0.612526768   7456
7 636157799     mice      n 636157799     mice2 0.769622463   3456
8 636263106    kitty      n 636263106    kitty2 0.682532524   2345
9 636752420   kitten      y 636752420   kitten2 0.819015658   2345

Still wondering if there's a quicker way?

TechQA.

Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as book2

There are 2 answers

Related Questions in R

Related Questions in RENAME

Related Questions in ANTI-JOIN

Popular Questions

Trending Questions