I am solving a classification task with TimeSeriesDataSet from pytorch-forecasting.
Despite looking through the documentation, I have a few questions related to normalization.
1. How do I normalize certain input (x) columns per group, where the groups are defined by the group_id 'filename'? I tried using GroupNormalizer(). Does that work without calling .fit()? In its definition I see that the grouping category should be called 'group'. Is this something TimeSeriesDataSet handles internally?
2. Do I need to call .fit() after defining the NaNLabelEncoders and scalers? Some tutorials do, some don't. For example: categorical_encoders={'filename': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True).fit(dataset.filename), ...}
3. I have added the NaNLabelEncoder both in target_normalizer and in categorical_encoders. I assume I should only add it once? If so, in which of the two?
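To make the per-group question concrete, this is the behaviour I am hoping to get, written as a plain-Python sketch instead of with pytorch-forecasting: each real-valued column is standardized with the mean and standard deviation of its own filename group, and those statistics are fitted once on the training rows and then reused. The helpers fit_group_stats and transform_group and the toy rows are hypothetical, made up for illustration; this is not the GroupNormalizer implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_group_stats(rows, group_col, value_col):
    """Collect (mean, std) of value_col for every group in the training rows."""
    values_by_group = defaultdict(list)
    for row in rows:
        values_by_group[row[group_col]].append(row[value_col])
    stats = {}
    for group, values in values_by_group.items():
        std = pstdev(values)
        # Constant groups get std 1.0 so we never divide by zero.
        stats[group] = (mean(values), std if std > 0 else 1.0)
    return stats


def transform_group(rows, stats, group_col, value_col):
    """Return new rows with value_col standardized by its own group's statistics."""
    out = []
    for row in rows:
        mu, sigma = stats[row[group_col]]
        out.append({**row, value_col: (row[value_col] - mu) / sigma})
    return out


# Two hypothetical series on very different price scales.
train = [
    {"filename": "a.csv", "Close": 10.0},
    {"filename": "a.csv", "Close": 12.0},
    {"filename": "b.csv", "Close": 1000.0},
    {"filename": "b.csv", "Close": 1200.0},
]

stats = fit_group_stats(train, "filename", "Close")
scaled = transform_group(train, stats, "filename", "Close")
print([round(r["Close"], 2) for r in scaled])  # [-1.0, 1.0, -1.0, 1.0]
```

With a single global scaler instead, the mean and spread would be dominated by b.csv, and both a.csv prices would be squashed to nearly the same value, which is the effect I want to avoid.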
Here is my TimeSeriesDataSet definition:
import pytorch_forecasting
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data.encoders import GroupNormalizer

self.training_dataset = TimeSeriesDataSet(
    dataset[lambda x: x['idx'] <= training_cutoff],
    time_idx='idx',
    target="y",
    group_ids=["filename"],  # groups the different time series
    min_encoder_length=max_encoder_length,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    add_relative_time_idx=True,
    static_categoricals=["filename"],
    static_reals=[],
    time_varying_known_categoricals=[],  # if time shifted
    time_varying_known_reals=[
        'ATR', 'Open', 'High', 'Low', 'Close', 'Volume', 'CC', 'Close_pct', 'Volume_pct',
        'btc_close_pct', 'Upper_Band', 'Middle_Band', 'Lower_Band', 'vol_std',
        'Moving_Average_Indicator',
    ],  # add other variables later on
    time_varying_unknown_categoricals=['y'],
    time_varying_unknown_reals=[],  # continuous variables that change over time and are not known in the future
    # variable_groups=["filename"],
    categorical_encoders={
        'filename': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True).fit(dataset.filename),
        'y': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=False).fit(dataset.y),
    },  # how are NaNs processed? There should be none.
    # scalers={"Close": GroupNormalizer().fit(dataset.Close, dataset), "ATR": GroupNormalizer().fit(dataset.ATR, dataset), 'Volume': GroupNormalizer().fit(dataset.Volume, dataset), 'CC': GroupNormalizer().fit(dataset.CC, dataset), 'vol_std': GroupNormalizer().fit(dataset.vol_std, dataset)},
    scalers={
        "Close": GroupNormalizer(), "ATR": GroupNormalizer(), 'Volume': GroupNormalizer(),
        'CC': GroupNormalizer(), 'vol_std': GroupNormalizer(),
    },  # defaults to sklearn's StandardScaler()
    target_normalizer=pytorch_forecasting.data.encoders.NaNLabelEncoder(),
    # target_normalizer=NaNLabelEncoder(),
    add_target_scales=False,
    add_encoder_length=False,
    allow_missing_timesteps=False,  # do not allow missing idx values
    predict_mode=False,  # set to True to get only the last output
)
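Regarding whether .fit() has to be called up front: the toy encoder below is how I picture the difference between passing a fitted and an unfitted encoder. ToyNaNLabelEncoder and fit_if_needed are hypothetical plain-Python stand-ins, not pytorch-forecasting's NaNLabelEncoder or its internal logic; add_nan here only mimics the idea of reserving index 0 for values that were not seen during fitting.

```python
class ToyNaNLabelEncoder:
    """Hypothetical label encoder used only to illustrate fit-before-use."""

    def __init__(self, add_nan=False):
        self.add_nan = add_nan
        self.classes_ = None  # no mapping exists until fit() is called

    def fit(self, values):
        # With add_nan=True, index 0 is reserved for values not seen during fit.
        offset = 1 if self.add_nan else 0
        self.classes_ = {v: i + offset for i, v in enumerate(sorted(set(values)))}
        return self  # returning self allows ToyNaNLabelEncoder().fit(...) chaining

    def is_fitted(self):
        return self.classes_ is not None

    def transform(self, values):
        if not self.is_fitted():
            raise RuntimeError("encoder must be fitted before transform")
        if self.add_nan:
            return [self.classes_.get(v, 0) for v in values]  # unseen -> 0
        return [self.classes_[v] for v in values]  # unseen -> KeyError


def fit_if_needed(encoder, values):
    """Hypothetical helper: fit the encoder only if it arrives unfitted."""
    return encoder if encoder.is_fitted() else encoder.fit(values)


train_files = ["b.csv", "a.csv", "a.csv"]

# Route 1: fitted up front, as some tutorials do.
enc_a = fit_if_needed(ToyNaNLabelEncoder(add_nan=True).fit(train_files), train_files)
# Route 2: passed unfitted and fitted later by whoever consumes it.
enc_b = fit_if_needed(ToyNaNLabelEncoder(add_nan=True), train_files)

print(enc_a.transform(["a.csv", "b.csv", "c.csv"]))  # [1, 2, 0]
print(enc_b.transform(["a.csv", "b.csv", "c.csv"]))  # [1, 2, 0]
```

In this toy, both routes give the same mapping because both are fitted on the same values; they would only differ if the up-front fit used different data, such as the full dataset instead of the training slice.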