First post on StackExchange, please forgive me if I format incorrectly!
If I specify the levels of a vector v in R, then call factor(v), not all of the levels show up. I'm trying to figure out why this is the case, because I need to see all levels (including "empty" levels) when I call factor for a project that I am working on.
A very simple replication of this:
x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')
Now if we call levels(x), it will output exactly what you'd expect:
> levels(x)
[1] "a" "b" "c" "d"
However, the levels change when calling factor(x):
> factor(x)
[1] a a b b c c
Levels: a b c
What happened to the 'd' level that I introduced? I know there is no data point associated with this level, but I don't see why the level should get removed when I call 'factor'. Unfortunately, I need to be able to reference all levels when I call 'factor', so is there any way to work around this?
When you first create x, its class is character. When you assign it levels, it gains a levels attribute, but it is still of class character, not a factor.
When you call factor on an object, it is converted to class factor, and as the ?factor documentation states, the default levels are the unique values of x, sorted. Any existing levels are not considered.
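A minimal sketch of this behaviour, reusing x from the question. It only inspects the object with base R functions; the comments describe what I would expect to see.

```r
x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')

# x is still a plain character vector that happens to carry a levels attribute
class(x)
is.factor(x)
attributes(x)

# factor() derives its default levels from the values present in x,
# so the unused 'd' does not appear
levels(factor(x))
```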
Even if we start with an object of class factor, calling factor on it "re-factors" it with the default levels, which are only the values that are present.
As for workarounds:
Don't call factor on things that are already factors unless you want to change the levels. It's not clear why you need to do this. Use is.factor() to test whether your object is a factor, and only call factor() on it if it isn't already.
If you really have to call factor on a factor object (or even a character object with a levels attribute) and want to preserve its levels, specify its old levels in the levels argument, e.g., x = factor(x, levels = levels(x)). Note that this won't work on an object without a levels attribute; as above, you probably want to use is.factor() to test your input and act accordingly.
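Both workarounds can be sketched roughly as follows. The helper name as_factor_keep_levels is hypothetical (it is not part of base R); it is just one way to combine the is.factor() test with the levels argument.

```r
# A genuine factor with an unused level 'd'
f <- factor(c('a', 'a', 'b', 'b', 'c', 'c'), levels = c('a', 'b', 'c', 'd'))
levels(factor(f))                      # re-factoring drops the unused level
levels(factor(f, levels = levels(f)))  # passing the old levels keeps it

# Hypothetical helper: convert to factor while preserving any existing levels
as_factor_keep_levels <- function(v) {
  if (is.factor(v)) return(v)  # already a factor: leave it alone
  lv <- levels(v)              # NULL when there is no levels attribute
  if (is.null(lv)) factor(v) else factor(v, levels = lv)
}

# The character vector from the question, with its levels attribute
x <- c('a', 'a', 'b', 'b', 'c', 'c')
levels(x) <- c('a', 'b', 'c', 'd')
table(as_factor_keep_levels(x))        # 'd' is listed with a count of 0
```

If the empty levels ever need to be removed again later, droplevels() does that explicitly.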