Deleting rows in R conditionally

Question

Deleting rows in R conditionally

1.1k views Asked by Swanny At 15 June 2015 at 17:44

I have a data where the first column is a bunch of ID numbers (some repeat), and the second column is just a bunch of numbers. I need a way to keep each ID number only once based on the smallest number in the second column.

Row#   ID   Number
1      10     180
2      12     167
3      12     182
4      12     135
5      15     152
6      15     133

Ex: I only want to keep Row# 1, 4, and 6 here and delete the rest

Original Q&A

There are 2 answers

Veerendra Gadekar On 15 June 2015 at 18:13

Or

do.call(rbind, 
lapply(split(df1, df1$ID), function(x) subset(x, Number == min(Number))))

#ID Number
#10 10    180
#12 12    135
#15 15    133

**akrun** · Accepted Answer · 2015-06-15T17:45:56+00:00

For selecting the row that has the minimum 'Number' for each 'ID' group, we can use one of the aggregating by group function. A base R option is aggregate. With aggregate, we can either use the 'formula' method or specify a list of grouping elements/variables with the by argument. Using the formula method, we get the min value of 'Number' for each 'ID'.

aggregate(Number~ID, df1, FUN=min)

Or we can use a faster option with data.table. Here, we convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'ID', we get the min value of "Number".

library(data.table)
setDT(df1)[, list(Number=min(Number)), by = ID]

Or this can be also done with setorder to order the 'Number' column and use unique with by option to select the first non-duplicated 'ID' row. (from @David Arenburgs' comments)

 unique(setorder(setDT(df1), Number), by = "ID")

Or using dplyr, we group by 'ID' and get the subset rows with summarise.

library(dplyr)
df1 %>%
   group_by(ID) %>%
   summarise(Number= min(Number))

Or we can use sqldf syntax to get the subset of data.

library(sqldf)
sqldf('select ID,
        min(Number) as Number
        from df1 
        group by ID')

Update

If there are multiple columns and you want to get the row based on the minimum value of 'Number' for each 'ID', you can use which.min. Using .I will get the row index and that can be used for subsetting the rows.

setDT(df1)[df1[,  .I[which.min(Number)], by = ID]$V1]

Or with dplyr we use slice to filter out the rows that have the min value of 'Number' for each 'ID'

df1 %>% 
    group_by(ID) %>%
    slice(which.min(Number))

TechQA.

Deleting rows in R conditionally

There are 2 answers

Update

Related Questions in R

Related Questions in CONDITIONAL-STATEMENTS

Related Questions in DELETE-ROW

Popular Questions

Trending Questions