How to recode one column conditional on another

32 views Asked by At

I currently have a dataframe in R with a variable for names ("Name") and one that notes whether an individual is female ("Female") (1, NA). A given individual likely has multiple rows because they responded to the survey multiple times. However, sometimes, the "Female" column is incorrect and will not list an individual who has previously responded to being Female (1), as (1) and instead will have an NA.

So for example, if my dataset looks like this:

Name <- c("Claire Smith", "John Doe", "Serena Williams", "Claire Smith", "Claire Smith", "Serena Williams")
Female <- c(1, NA, NA, NA, NA, 1)
df <- data.frame(Name, Female)

I want it to check across all rows so that it will recognize that "Claire Smith" and "Serena Williams" do have at least one "1" associated for their "Names" and propagate the rest of the "Female" columns for these names as "1". So I would want the code to do this:

Name <- c("Claire Smith", "John Doe", "Serena Williams", "Claire Smith", "Claire Smith", "Serena Williams")
Female <- c(1, NA, 1, 1, 1, 1)
df <- data.frame(Name, Female)

I'm sure this is quite an easy fix but I can't seem to figure it out. Thank you very much in advance.

3

There are 3 answers

3
MrFlick On BEST ANSWER

Using a bit of dplyr you can use

library(dplyr)

df %>% 
  mutate(Female=any(Female==1)+0, .by=Name)
#              Name Female
# 1    Claire Smith      1
# 2        John Doe     NA
# 3 Serena Williams      1
# 4    Claire Smith      1
# 5    Claire Smith      1
# 6 Serena Williams      1

You just check for any "1" value for the name and propagate that to the group. The any() function returns a TRUE/FALSE value so we use +0 to turn TRUE into 1.

0
Onyambu On
library(tidyverse)
group_by(df, Name)%>%
   fill(Female, .direction = 'updown')

# A tibble: 6 × 2
# Groups:   Name [3]
  Name            Female
  <chr>            <dbl>
1 Claire Smith         1
2 John Doe            NA
3 Serena Williams      1
4 Claire Smith         1
5 Claire Smith         1
6 Serena Williams      1
0
jpsmith On

A base R approach would be to use tapply and match:

xx <- tapply(df$Female, df$Name, sum, na.rm = TRUE)
df$Female2 <- xx[match(df$Name, names(xx))]

Output:

#              Name Female Female2
# 1    Claire Smith      1       1
# 2        John Doe     NA       0
# 3 Serena Williams     NA       1
# 4    Claire Smith     NA       1
# 5    Claire Smith     NA       1
# 6 Serena Williams      1       1

Note I made Female2 just for demonstration but could overwrite Female if desired.

Also, to preserve the NA instead of recoding males to 0, could do:

xx <- tapply(df$Female, df$Name, function(x) +any(x == 1))
df$Female2 <- xx[match(df$Name, names(xx))]

#              Name Female Female2
# 1    Claire Smith      1       1
# 2        John Doe     NA      NA
# 3 Serena Williams     NA       1
# 4    Claire Smith     NA       1
# 5    Claire Smith     NA       1
# 6 Serena Williams      1       1