I'm working on a program to find a subset of items that maximizes the total value of those items while meeting several constraints on various properties of those items. I've implemented a recursive branch-and-bound algorithm that works on inputs of small sizes (input array of size ~15, subset of size ~6 or so), but if the input array size is >20 or so then the algorithm never finishes running. The target dataset I have is ~1000 items, so would need to greatly improve the efficiency.
In more detail, I have an input list of objects of this structure:
class Player {
public:
POSITION position;
std::unordered_map<CATEGORY, float> category_zscores;
std::string name;
int cost;
float total_zscore;
Player(const std::string& name,
const std::unordered_map<CATEGORY, float>& category_zscores,
POSITION position, const std::vector<POSITION>& positions, int cost)
: name(name),
category_zscores(category_zscores),
position(position),
positions(positions),
cost(cost) {
total_zscore = getTotalPlayerZScore();
}
private:
float getTotalPlayerZScore() const {
float total_zscore = 0.0;
for (const auto& category : category_zscores) {
total_zscore += category.second;
}
return total_zscore;
}
};
And have implemented the following branch-and-bound algorithm:
void findBestTeam(const std::vector<Player>& players, std::vector<Player>& current_team, std::vector<Player>& best_team, int index) {
// First verify that current_team passes constraints on total cost, team size, minimum positions and minimum zscore sums
if (!checkConstraints(current_team)) {
return;
}
// If current_team has met all constraints and is full roster size, then check if it's a new best team
if (current_team.size() == TEAM_SIZE_LIMIT && getTotalTeamZScore(current_team) > getTotalTeamZScore(best_team)) {
std::vector<Player> temp;
for (const auto& player : current_team) {
temp.push_back(player);
}
best_team = temp;
return;
}
// If reached end of players array then return
if (index == players.size()) {
return;
}
// Recursive call without including the current player
findBestTeam(players, current_team, best_team, index + 1);
// Recursive call including the current player
current_team.push_back(players.at(index));
findBestTeam(players, current_team, best_team, index + 1);
current_team.pop_back();
}
The constraints are somewhat complicated:
bool checkConstraints(std::vector<Player> team) {
if (team.empty()) {
return true;
}
if (team.size() > TEAM_SIZE_LIMIT) {
return false;
}
int cost = getTotalTeamCost(team);
int numOpenSpots = TEAM_SIZE_LIMIT - team.size();
// this check is strictly for pruning, to try and figure out early on if cost constraint cannot be met in this branch
if (cost > TEAM_COST_LIMIT || (TEAM_COST_LIMIT - cost) < numOpenSpots) {
return false;
}
int guardsNeeded = MIN_PLAYERS_PER_POSITION.at(POSITION::guard) - std::count_if(team.begin(), team.end(), [](const Player& player) {
return player.position == POSITION::guard;
});
int forwardNeeded = MIN_PLAYERS_PER_POSITION.at(POSITION::forward) - std::count_if(team.begin(), team.end(), [](const Player& player) {
return player.position == POSITION::forward;
});
int centersNeeded = MIN_PLAYERS_PER_POSITION.at(POSITION::center) - std::count_if(team.begin(), team.end(), [](const Player& player) {
return player.position == POSITION::center;
});
// this check is strictly for pruning, to try and figure out early on if position constraints cannot be met in this branch
if (guardsNeeded > numOpenSpots || forwardNeeded > numOpenSpots || centersNeeded > numOpenSpots) {
return false;
}
// this check is for position constraint, to ensure a full roster has met the minimum requirements at each position
if (team.size() == TEAM_SIZE_LIMIT) {
for (CATEGORY category : { /*fg,*/ ft, threes, points, /*rebounds,*/ assists, steals, blocks, /*tos*/ }) {
if (getTotalTeamZScoreForCategory(team, category) < MIN_ZSCORE_PER_CAT) {
return false;
}
}
}
return true;
}
How would I go about improving the efficiency of this algorithm? I've thought a bit about dynamic programming in this case, but I'm unsure if there are enough overlapping subproblems given the constraints that must be applied. Will branch-and-bound approach ever work for a problem like this? What class of algorithm could solve this efficiently?
Appreciate any advice, has been years since I've taken algorithm courses and this side project is my first foray into something like this :)