How can column labels/descriptions be set in R using values from a dataframe?


I often work with SAS datasets which tend to come with descriptions/labels for each column, visible when using the View() function (they appear as subtext under each column name).

My question has two parts: 1) How can one set those labels manually? 2) Can those labels be set using values present in a tibble?

For example, let's say I have the following dataset (which is extremely similar to one I just actually received):

library(dplyr)

df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11)) 
SBJID       AGBL             LBCD            LBRSSTDU
Subject ID  Age at baseline  Parameter code  Result in standard units
1           54               HGB             10
2           23               HGB             12
3           18               HGB             9
4           29               HGB             14
5           31               HGB             11

I obviously don't want the first row to remain. I want to remove that row and use its values as the descriptions for the column headers (again, as one would see in a SAS-derived data frame).

Any suggestions?

Best answer, by jkatam

Please try the labelled package, as shown below.

manual approach

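library(dplyr)     # provides %>%, filter() and row_number()
library(labelled)

# set each label by hand, then drop the description row
df2 <- df %>%
  set_variable_labels(SBJID = "Subject ID",
                      AGBL = "Age at baseline",
                      LBCD = "Parameter code",
                      LBRSSTDU = "Result in standard units") %>%
  filter(row_number() != 1)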

Created on 2023-08-02 with reprex v2.0.2
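To check that labels were actually attached, or to set one at a time, something like the following may help. This is only a sketch on a cut-down version of the question's data (description row already dropped): it uses `var_label()` from labelled and the base R `"label"` attribute, which is where labelled stores variable labels and where haven keeps labels for imported SAS files.

```r
library(dplyr)
library(labelled)

# Cut-down toy data for illustration (description row already removed)
df_small <- tibble(SBJID = c("1", "2", "3"),
                   AGBL  = c("54", "23", "18"))

# Option A: set a single label with labelled's replacement function
var_label(df_small$SBJID) <- "Subject ID"

# Option B: set the "label" attribute directly with base R
attr(df_small$AGBL, "label") <- "Age at baseline"

# Inspect: var_label() on a data frame returns a named list of labels
labels <- var_label(df_small)
str(labels)
```

Either option should produce the subtext under the column name in View(), since both write the same attribute.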

tibble approach

library(dplyr)     # provides tibble(), %>%, filter() and row_number()
library(stringr)   # provides str_detect()
library(labelled)

# get the names of the variables
nam <- names(df)

# get the labels from the description row (the only row whose SBJID has no digit)
lab <- unlist(df[!str_detect(df$SBJID, '\\d'), ], use.names = FALSE)

# create a tibble with name and label
description <- tibble(name = nam, label = lab)

# build a named list: names are the variable names, values are the labels
var_labels <- setNames(as.list(description$label), description$name)

# set the labels with var_labels, then drop the description row
df_labelled <- df %>%
  set_variable_labels(.labels = var_labels, .strict = FALSE) %>%
  filter(row_number() != 1)
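One thing to watch: because the description row was stored in the data, every column is character, and it stays character after that row is dropped. A possible way to wrap the whole step in one function is sketched below. The helper name `promote_first_row_to_labels` is made up for this example and is not part of labelled; it assumes the descriptions are always in row 1 and that base R's `type.convert()` guesses the column types acceptably.

```r
library(dplyr)
library(labelled)

# Hypothetical helper (not part of labelled): use row 1 as variable labels
promote_first_row_to_labels <- function(data) {
  # Row 1 as a named list: names are column names, values are descriptions
  labels <- as.list(unlist(data[1, ]))
  data %>%
    slice(-1) %>%
    # Columns are all character because of the description row; re-guess types
    mutate(across(everything(), ~ type.convert(.x, as.is = TRUE))) %>%
    # Attach labels last, since type.convert() would drop the attribute
    set_variable_labels(.labels = labels)
}

# Same data as in the question
df <- tibble(SBJID = c("Subject ID", 1, 2, 3, 4, 5),
             AGBL = c("Age at baseline", 54, 23, 18, 29, 31),
             LBCD = c("Parameter code", rep("HGB", 5)),
             LBRSSTDU = c("Result in standard units", 10, 12, 9, 14, 11))

out <- promote_first_row_to_labels(df)
str(var_label(out))
```

The labels are attached after the type conversion on purpose: converting first and labelling second avoids losing the attribute along the way.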
