I have a data set where the first column is a bunch of ID numbers (some repeat), and the second column is just a bunch of numbers. I need a way to keep each ID number only once, keeping the row with the smallest number in the second column.
Row#   ID   Number
1      10     180
2      12     167
3      12     182
4      12     135
5      15     152
6      15     133
Ex: I only want to keep Row# 1, 4, and 6 here and delete the rest.
                        
To select the row that has the minimum 'Number' for each 'ID' group, we can use one of the aggregate-by-group functions. A base R option is `aggregate`. With `aggregate`, we can either use the 'formula' method or specify a `list` of grouping elements/variables with the `by` argument. Using the `formula` method, we get the `min` value of 'Number' for each 'ID'.

Or we can use a faster option with `data.table`. Here, we convert the 'data.frame' to 'data.table' (`setDT(df1)`) and, grouped by 'ID', get the `min` value of 'Number'.

Or this can also be done with `setorder` to `order` the 'Number' column and `unique` with the `by` option to select the first non-duplicated 'ID' row (from @David Arenburg's comments).

Or using `dplyr`, we group by 'ID' and get one summary row per group with `summarise`.

Or we can use `sqldf` syntax to get the subset of data.

Update
If there are multiple columns and you want to get the row based on the minimum value of 'Number' for each 'ID', you can use `which.min`. In `data.table`, using `.I` will get the row index, and that can be used for subsetting the rows.

Or with `dplyr`, we use `slice` to keep the rows that have the `min` value of 'Number' for each 'ID'.
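
The approaches described above can be sketched roughly as follows. This is only an illustration, assuming the question's data is stored in a data frame called `df1` (the name used in the text) with columns `Row`, `ID` and `Number`, and that the `data.table`, `dplyr` and `sqldf` packages are installed. The result names (`res_agg`, `res_dt`, and so on) are made up for the example.

```r
library(data.table)
library(dplyr)
library(sqldf)

# Sample data from the question; 'Row' stands in for the Row# column
df1 <- data.frame(Row    = 1:6,
                  ID     = c(10, 12, 12, 12, 15, 15),
                  Number = c(180, 167, 182, 135, 152, 133))

# 1. base R: aggregate with the formula method, min of 'Number' per 'ID'
res_agg <- aggregate(Number ~ ID, data = df1, FUN = min)

# 2. data.table: min of 'Number' grouped by 'ID'
#    as.data.table() makes a copy; setDT(df1) would convert by reference instead
dt1 <- as.data.table(df1)
res_dt <- dt1[, .(Number = min(Number)), by = ID]

# 3. data.table: order by 'Number', then keep the first row of each 'ID'
#    copy() keeps dt1 in its original row order for the later steps
res_unique <- unique(setorder(copy(dt1), Number), by = "ID")

# 4. dplyr: group by 'ID' and summarise to the minimum 'Number'
res_dplyr <- df1 %>%
  group_by(ID) %>%
  summarise(Number = min(Number))

# 5. sqldf: the same aggregation written as SQL
res_sql <- sqldf("select ID, min(Number) as Number from df1 group by ID")

# 6. Whole rows with data.table: .I gives the row index of each group's minimum
idx <- dt1[, .I[which.min(Number)], by = ID]$V1
res_rows <- dt1[idx]          # rows 1, 4 and 6 of the question's table

# 7. Whole rows with dplyr: slice at the position of the minimum in each group
res_slice <- df1 %>%
  group_by(ID) %>%
  slice(which.min(Number))
```

Options 1, 2, 4 and 5 return only `ID` and the minimum `Number`, while 3, 6 and 7 keep every column of the selected row. Note that `which.min` returns the first minimum, so ties within an ID yield a single row.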