I'm using a Bradley-Terry model to model the outcome of tennis matches, and have encountered the following error. When I run:
library(BradleyTerry2)
matches <- read.csv("data/matches.csv")
model <- BTm(cbind(wins1, wins2), player1, player2, data = matches)
model
I get the error message:
Error in Diff(player1, player2, formula, id, data, separate.ability, refcat, :
'player1$..' and 'player2$..' must be factors with the same levels
The data frame 'matches' has this format (small reproducible example):
| player1 | player2 | wins1 | wins2 |
|---|---|---|---|
| Agassi | Federer | 0 | 6 |
| Agassi | Hewitt | 1 | 0 |
| Agassi | Roddick | 1 | 0 |
| Federer | Henman | 3 | 1 |
| Federer | Hewitt | 9 | 0 |
| Federer | Roddick | 5 | 0 |
| Henman | Hewitt | 0 | 2 |
| Henman | Roddick | 1 | 1 |
| Hewitt | Roddick | 3 | 2 |
... and so on. Any name that appears in player1 will appear in player2.
I don't understand why the factors player1 and player2 have different levels. I've tried converting them to factors with as.factor, but that didn't work. I also tried removing data=matches and passing matches$wins1 etc. as arguments to the BTm function, but that didn't work either. Now I'm a bit stuck, so any ideas are welcome! Thank you :)
Look at what you have in the 1st and 2nd columns.
Factors are internally coded as consecutive integers starting at 1. Calling `unclass` on each of them reveals their internal representation:

- `player1` has two values, 1 and 2;
- `player2` has two values, 1 and 2;
- in `player1` the level `"Rafael Nadal"` is the 1st, its value is 1;
- in `player2` the level `"Rafael Nadal"` is the 2nd, its value is 2.

This is because each of the columns is a factor on its own, with no relation to the other column.
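The mismatch can be reproduced with a short sketch. It uses the first rows of the table in the question rather than the answer's original data (which is not shown here), so the player names and the helper variables `p1`, `p2`, `federer_code_p1` and `federer_code_p2` are illustrative assumptions:

```r
# Illustrative data: the first four rows of the table in the question.
matches <- data.frame(
  player1 = c("Agassi", "Agassi", "Agassi", "Federer"),
  player2 = c("Federer", "Hewitt", "Roddick", "Henman"),
  wins1 = c(0, 1, 1, 3),
  wins2 = c(6, 0, 0, 1)
)

# Converting each column on its own gives each factor its own set of levels.
p1 <- factor(matches$player1)
p2 <- factor(matches$player2)

print(levels(p1))
print(levels(p2))

# unclass() exposes the integer codes behind each factor.
print(unclass(p1))
print(unclass(p2))

# The same name gets a different integer code in each factor.
federer_code_p1 <- which(levels(p1) == "Federer")
federer_code_p2 <- which(levels(p2) == "Federer")
print(c(player1 = federer_code_p1, player2 = federer_code_p2))
```

Because the two level sets differ, the same player is coded differently in each column, which is the condition the error message complains about.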
Created on 2023-11-14
The solution is to get all of the unique values of all columns and use those unique values as levels when creating the factor.
In the code that follows, the first instruction gets all unique values as character strings. The next instructions then create factor columns with those strings as their levels.
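A minimal sketch of that approach, assuming the nine rows shown in the question as the data (the real `matches.csv` is longer) and using `players` as a hypothetical name for the shared level vector:

```r
# Illustrative data: the nine rows of the table in the question.
matches <- data.frame(
  player1 = c("Agassi", "Agassi", "Agassi", "Federer", "Federer",
              "Federer", "Henman", "Henman", "Hewitt"),
  player2 = c("Federer", "Hewitt", "Roddick", "Henman", "Hewitt",
              "Roddick", "Hewitt", "Roddick", "Roddick"),
  wins1 = c(0, 1, 1, 3, 9, 5, 0, 1, 3),
  wins2 = c(6, 0, 0, 1, 0, 0, 2, 1, 2)
)

# All unique player names across both columns, as character strings.
players <- sort(unique(c(as.character(matches$player1),
                         as.character(matches$player2))))

# Rebuild both columns as factors that share the same levels.
matches$player1 <- factor(matches$player1, levels = players)
matches$player2 <- factor(matches$player2, levels = players)

# Both factors now have identical levels.
stopifnot(identical(levels(matches$player1), levels(matches$player2)))

# The call from the question, run only if BradleyTerry2 is installed.
if (requireNamespace("BradleyTerry2", quietly = TRUE)) {
  model <- BradleyTerry2::BTm(cbind(wins1, wins2), player1, player2,
                              data = matches)
  print(model)
}
```

Passing an explicit `levels` argument is what ties the two columns together: a name absent from one column still appears as a level of that factor, so each player keeps the same integer code in both.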
Data
Edit
In the meantime, the example data set in the question has changed. Except for the references to the players' names, the code above is still valid and solves the problem.