Let's say I have 3 columns of data in R:
- Type: 'A' 'B' 'C' 'D' 'E' 'F' 'G'
- Value: UT 30 45 50 62 70 72
- Efficiency: 70 72 80 88 90 92 98
I want to bin just the 'numerical' data in the 'Value' column in increments of 20 and display that on the x-axis, while leaving the 'text' value in place [so the x-axis will display: UT 30-50 50-70 70-90], and show 'Efficiency' on the y-axis, with color = Type.
Binning a numerical column seems straightforward: bins <- seq(30, 80, by = 20), then plotting. But having that 'UT' in there is giving me a real challenge.
I'm a noob; so far I've just played around with this:
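# create bins
bins <- seq(30, 90, by = 20)
# create plot (y must be the bare column name, not the string 'Efficiency', and a geom is needed to draw anything)
library(ggplot2)
ggplot(df, aes(x = cut(Value, breaks = bins, labels = sprintf("%d-%d", bins[-length(bins)], bins[-1])), y = Efficiency, color = Type)) +
  geom_point()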
Take a step back and consider the organization of your dataset. Columns in dataframes are vectors of a single type, so by including "UT" in that column, this is a character column:
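A minimal sketch of what that looks like, assuming the data frame is built directly from the values in the question (the name `df` and the construction are assumptions; your real data may be loaded differently):

```r
# Mixing "UT" with numbers in c() coerces every element to a string
df <- data.frame(
  Type       = c("A", "B", "C", "D", "E", "F", "G"),
  Value      = c("UT", 30, 45, 50, 62, 70, 72),
  Efficiency = c(70, 72, 80, 88, 90, 92, 98)
)

class(df$Value)       # "character"
class(df$Efficiency)  # "numeric"
```

Because `Value` is stored as text, `cut()` cannot bin it as-is; it expects a numeric vector.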
Your Value column appears to originate from numerical data, but you have that pesky UT. In this case, I think it's better to get more detailed with the dataframe so you can describe the situation more precisely: keep the numeric measurements in their own column and record UT separately.

With that change we're a bit closer to the fabled "tidy" dataset, where each row is an individual observation and each observation has characteristics that are well defined in separate variables / columns. You also get the convenience of accessing your Value measurements in a numerical vector. It's generally good to avoid casting numbers as strings when you need to write any sort of logic about their numerical properties.

Since you have a category "UT", your x-axis is really categorical rather than numeric. I suggest a barplot with "bins" created from these categories, rather than a histogram where you set the bin width, as histograms are generally meant for continuous numerical data.
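One way this could be put together is sketched below. The column names `is_UT` and `Bin` are hypothetical, chosen only for illustration, and the half-open intervals (`right = FALSE`) are an assumption about how boundary values such as 50 and 70 should be assigned:

```r
library(ggplot2)

# Numeric measurements stay numeric; the UT row gets NA plus a separate flag.
# `is_UT` is a made-up column name for this sketch.
df <- data.frame(
  Type       = c("A", "B", "C", "D", "E", "F", "G"),
  Value      = c(NA, 30, 45, 50, 62, 70, 72),
  is_UT      = c(TRUE, rep(FALSE, 6)),
  Efficiency = c(70, 72, 80, 88, 90, 92, 98)
)

bins <- seq(30, 90, by = 20)
bin_labels <- sprintf("%d-%d", bins[-length(bins)], bins[-1])

# Bin the numeric values into [30,50), [50,70), [70,90]; NA stays NA
df$Bin <- as.character(
  cut(df$Value, breaks = bins, labels = bin_labels,
      right = FALSE, include.lowest = TRUE)
)

# Give the UT row its own category and fix the order shown on the x-axis
df$Bin[df$is_UT] <- "UT"
df$Bin <- factor(df$Bin, levels = c("UT", bin_labels))

# Categorical x-axis, Efficiency on y, one dodged bar per Type
p <- ggplot(df, aes(x = Bin, y = Efficiency, fill = Type)) +
  geom_col(position = "dodge")
print(p)
```

If you would rather keep points coloured by Type, as in your original attempt, swapping `geom_col(position = "dodge")` for `geom_point()` and `fill` for `color` should work on the same `Bin` factor.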