I have a training set of 12k samples; each sample is an expected result for my model.
For instance, I have 2 features and a label (F1 is a category and F2 is text):
F1,F2,LABEL
ALPHA, 114, ALPHA_114
ALPHA, 125, ALPHA_125
BETA, 213, BETA_213
I would like the input "ALPHA 113" to match ALPHA_114 and not BETA_213 (i.e., I want to correct the user's input).
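To make the matching concrete, here is a minimal sketch of the behaviour I want. The difflib string similarity is only a stand-in for whatever the trained model would actually do:

```python
from difflib import get_close_matches

labels = ["ALPHA_114", "ALPHA_125", "BETA_213"]

def correct(user_input: str) -> str:
    # Pick the known label closest to the (possibly wrong) user input.
    matches = get_close_matches(user_input, labels, n=1, cutoff=0.0)
    return matches[0]

print(correct("ALPHA 113"))  # -> ALPHA_114, not BETA_213
```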
Training on 40 samples took about 40 seconds. I then tried training on 120 samples, but after 360 seconds no model had been found.
How long do I need to train to learn from 12k samples?
I suspect the problem is that I have too many distinct labels.
First solution: I would split my training set into smaller sets that are more distant from each other, for example one set where F1 LIKE 'A*', another where F1 LIKE 'B*', and so on.
Then I would build a model for each set and merge these models into a single big model.
Is this approach correct? Is there a way to merge different models? Is there a smarter way?
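Here is a sketch of what I imagine the "merge" to look like for this first solution. I've used scikit-learn with a character n-gram vectorizer and a nearest-neighbour learner purely as placeholders; the point is the routing, not the particular learner:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy training set: (F1, F2, LABEL) rows as in the example above.
samples = [
    ("ALPHA", "114", "ALPHA_114"),
    ("ALPHA", "125", "ALPHA_125"),
    ("BETA", "213", "BETA_213"),
]

# One small model per first letter of F1 ('A*', 'B*', ...).
models = {}
for prefix in sorted({f1[0] for f1, _, _ in samples}):
    subset = [s for s in samples if s[0].startswith(prefix)]
    X = [f"{f1} {f2}" for f1, f2, _ in subset]
    y = [label for _, _, label in subset]
    model = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(2, 3)),
        KNeighborsClassifier(n_neighbors=1),
    )
    model.fit(X, y)  # each sub-model only sees its own subset's labels
    models[prefix] = model

def predict(f1: str, f2: str) -> str:
    # The "merged" model is really just a dispatch table:
    # route the query to the sub-model whose prefix matches F1.
    return models[f1[0]].predict([f"{f1} {f2}"])[0]

print(predict("ALPHA", "113"))  # -> ALPHA_114
```

In other words, what I imagine is a router over the sub-models rather than a single trained artifact.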
A second solution could be to create n random sets (300 sets of 40 samples?), learn from each set individually, and merge the resulting models. The questions are the same as above.
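And a sketch of this second idea, again with the same placeholder learner: train one model per random subset and "merge" them by majority vote at prediction time:

```python
import random
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def train_ensemble(samples, n_sets=300, set_size=40, seed=0):
    """Train one small model per random subset of (F1, F2, LABEL) rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_sets):
        subset = rng.sample(samples, min(set_size, len(samples)))
        X = [f"{f1} {f2}" for f1, f2, _ in subset]
        y = [label for _, _, label in subset]
        model = make_pipeline(
            CountVectorizer(analyzer="char", ngram_range=(2, 3)),
            KNeighborsClassifier(n_neighbors=1),
        )
        model.fit(X, y)
        models.append(model)
    return models

def predict(models, f1, f2):
    # "Merge" at prediction time: every sub-model votes, majority wins.
    votes = Counter(m.predict([f"{f1} {f2}"])[0] for m in models)
    return votes.most_common(1)[0][0]
```

One worry I already have here: with only 40 samples per set, most sub-models would never have seen the correct label at all, so I'm not sure the vote wouldn't be dominated by wrong guesses.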