Obtain counts within 3 variable factors

My real dataset is sensitive, so I created a mock one here for illustration.

df <- data.frame(
  Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
  Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
  Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)

 Year    Race    Ethnicity
1 2010   White     Hispanic
2 2010   White     Hispanic
3 2010   Asian Not Hispanic
4 2011   White     Hispanic
5 2011   Black Not Hispanic
6 2012   Black Not Hispanic
7 2013 Unknown      Unknown
8 2013 Unknown     Hispanic
9 2013   White Not Hispanic

In reality, my dataset covers 2010-2021, so 12 years in total. There are also six or seven racial categories and 3 possible values for ethnicity (Hispanic/Latino, not Hispanic/Latino, unknown).

I am trying to obtain a count for every combination of year, race, and ethnicity (for example, 2010 white Hispanic, 2010 white non-Hispanic, 2010 Asian Hispanic, 2010 Asian non-Hispanic, etc.). I am currently using this function to pull the counts:

library(dplyr)
# Count the rows matching one race (x), ethnicity (y), and year (z)
raceethfunc <- function(x, y, z) {
  df %>%
    filter(Race == x & Ethnicity == y & Year == z) %>%
    nrow()
}

H_white2010 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2010")
H_white2011 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2011")
H_white2012 <- raceethfunc(x = "White", y = "Hispanic or Latino", z = "2012")

Etc...

I have to do this for each year, race, and ethnicity, which means copying and pasting 200+ lines of code just to change the year in one line or the race in another. It is a very inefficient way of going about it.

I am new to coding, and to functions especially. I tried using a for() loop but could not work out how to get it to run. Any guidance on a loop, or on a more efficient way to go about this, would be greatly appreciated.

PS: This is my first post here, so if I am doing something incorrectly, please let me know how I can improve my future posts!

There are 2 answers

Grzegorz Sapijaszko On BEST ANSWER

Use group_by() and count() from the {dplyr} package to count every combination in one step:

df <- data.frame(
  Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
  Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
  Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)

df |>
  dplyr::group_by(Year, Race, Ethnicity) |>
  dplyr::count()
#> # A tibble: 8 × 4
#> # Groups:   Year, Race, Ethnicity [8]
#>   Year  Race    Ethnicity        n
#>   <chr> <chr>   <chr>        <int>
#> 1 2010  Asian   Not Hispanic     1
#> 2 2010  White   Hispanic         2
#> 3 2011  Black   Not Hispanic     1
#> 4 2011  White   Hispanic         1
#> 5 2012  Black   Not Hispanic     1
#> 6 2013  Unknown Hispanic         1
#> 7 2013  Unknown Unknown          1
#> 8 2013  White   Not Hispanic     1

Created on 2023-06-30 with reprex v2.0.2
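
As a possible follow-up, here is a minimal sketch of how the counts table could replace the individual raceethfunc() calls from the question. It assumes dplyr 1.0 or later and R 4.1 or later (for the |> pipe). The names counts and h_white_2010 are hypothetical, chosen only for illustration. count() accepts the grouping columns directly, so the separate group_by() step is optional, and a single combination can then be looked up with filter().

```r
library(dplyr)

df <- data.frame(
  Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
  Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
  Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)

# count() groups by the listed columns itself and returns an ungrouped result
counts <- df |> count(Year, Race, Ethnicity)

# Look up one combination instead of storing 200+ separate variables
# (h_white_2010 is a hypothetical name used only for this example)
h_white_2010 <- counts |>
  filter(Year == "2010", Race == "White", Ethnicity == "Hispanic") |>
  pull(n)

h_white_2010
#> [1] 2
```

Keeping all counts in one data frame means any combination can be filtered, joined, or plotted later without writing a new line of code per year or category.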

jkatam On

Alternatively, we can use add_count(), which keeps every original row and appends the size of that row's group as a new column. The name argument sets the name of the count column.

library(dplyr)

df %>% add_count(Year, Race, Ethnicity, name = 'n') 

Created on 2023-06-30 with reprex v2.0.2

  Year    Race    Ethnicity n
1 2010   White     Hispanic 2
2 2010   White     Hispanic 2
3 2010   Asian Not Hispanic 1
4 2011   White     Hispanic 1
5 2011   Black Not Hispanic 1
6 2012   Black Not Hispanic 1
7 2013 Unknown      Unknown 1
8 2013 Unknown     Hispanic 1
9 2013   White Not Hispanic 1
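
The dplyr approaches above only list combinations that actually occur in the data. If combinations with no rows (for example, 2010 Asian Hispanic) should appear with a count of 0, one option is base R's table(), which cross-tabulates every observed level of each column. This is a minimal sketch on the mock data; all_counts is a hypothetical name, and a category that never appears anywhere in the data would still be missing unless the column is first converted to a factor with explicit levels.

```r
df <- data.frame(
  Year = c("2010", "2010", "2010", "2011", "2011", "2012", "2013", "2013", "2013"),
  Race = c("White", "White", "Asian", "White", "Black", "Black", "Unknown", "Unknown", "White"),
  Ethnicity = c("Hispanic", "Hispanic", "Not Hispanic", "Hispanic", "Not Hispanic", "Not Hispanic", "Unknown", "Hispanic", "Not Hispanic")
)

# table() builds a Year x Race x Ethnicity cross-tabulation;
# as.data.frame() flattens it to one row per combination, zeros included
all_counts <- as.data.frame(table(df), responseName = "n")

# 4 years x 4 races x 3 ethnicities = 48 rows
nrow(all_counts)
#> [1] 48

# A combination that does not occur in the data is reported as 0
subset(all_counts, Year == "2010" & Race == "Asian" & Ethnicity == "Hispanic")$n
#> [1] 0
```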