R code for reading Excel in character format

61 views Asked by At

I am trying to open an Excel dataset in R studio, exactly how it looks in Excel, so I can learn how to manipulate it in different ways within R studio, rather than doing prep via Excel first. I am essentially trying a blind practical self teaching of R, with no prior coding knowledge and see this as a good way to figure my way around R. The issue I am having is that some of the format of the data has translated differently in R (for example, a filename extension of 8.7 in Excel coming through as 8.6666529999776429). I have tried to work around this, but it has just made all cells have the correct numbers but with the same number of decimal places, which is still not what I need, and I am now stumped as to how I need to adapt my open.xlsx read code to have it match exactly what Excel shows.

I have been able to read my Excel file into R studio using openxlsx package and with the following code:

read.xlsx('*filepath*.xlsx', sheet = "overview", startRow = 1, colNames = TRUE, rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE, sep.names = " ", namedRegion = NULL, na.strings = NA, fillMergedCells = FALSE, 

followed by:

data<- read.xlsx('*filepath*.xlsx', sheet = "overview", startRow = 1, colNames = TRUE, rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE, sep.names = " ", namedRegion = NULL, na.strings = NA, fillMergedCells = FALSE)

But when I look at the table it has imported the format of some numbers in the 'Filename extension' column incorrectly. I have added the following code:

#reframing the column for numeric setting in the same format as the source

column_name <- 'Filename extension'

#setting the Filename extension column as numeric

data[[column_name]] <- as.numeric(data[[column_name]])

But this has added decimal places where they are not needed as I want the filename numbers as they are presented in Excel. For example, the left column here shows Excel, and the right column shows what R presents.

Excel R Studio
1.0 1.00
2.1 2.10
3.2 3.20
10.9 10.90
11.10 11.10
12.11 12.11

I want to present the filenames correctly and have read that I need to change the column classes to 'character', after which I can then assign that particular column as above, and this should retain the numbers in that column exactly as they are in the excel sheet. I have tried several variations of code in which I have added colClasses arguments to the end of my read_xlsx code, but each time I get the error showing the colClasses as an unused argument. I've also added 'as.data.frame = TRUE alongside the colClasses and had the same response. What am I doing wrong? And will this colClasses addition actually get me to my desired end result? I am massively new to R studio and coding, so explanations in great detail would be fantastic as I'm getting most of my code from pages like this, or chatGPT, so it's very much a trial and error learning process for me.

0

There are 0 answers