Normalizing x per group in TimeSeriesDataSet


I am solving a classification task with TimeSeriesDataSet from pytorch-forecasting.

Despite looking through the documentation, I have a few questions related to normalization.

  1. How do I normalize certain x columns per group (defined by the group ID 'filename')? I tried using GroupNormalizer(). Does that work without calling .fit()? In its definition I see that the grouping category should be called 'group'. Is this something that is handled internally by TimeSeriesDataSet?

  2. Do I need to call .fit() after defining the NaNLabelEncoders and scalers? Some tutorials do, some don't. E.g. categorical_encoders={'filename': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True).fit(dataset.filename),

  3. I have added the NaNLabelEncoder in both target_normalizer and categorical_encoders. I assume I should only specify it once? If so, in which of the two?
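
To make clear what I mean by "normalizing per group" in question 1, here is a minimal pandas sketch of the result I am after. It does not use pytorch-forecasting at all; the toy values and the helper name standardize_per_group are made up for illustration, and it assumes plain standardization (subtract the group mean, divide by the group standard deviation), which may not be exactly what GroupNormalizer does internally.

```python
import pandas as pd

# Toy data (hypothetical values): two files whose prices live on very different scales.
df = pd.DataFrame({
    "filename": ["a", "a", "a", "b", "b", "b"],
    "Close": [1.0, 2.0, 3.0, 100.0, 200.0, 300.0],
})


def standardize_per_group(frame, column, group_col="filename"):
    """Standardize `column` separately within each group of `group_col`."""
    grouped = frame.groupby(group_col)[column]
    mean = grouped.transform("mean")  # group mean, broadcast back to every row
    std = grouped.transform("std")    # group sample standard deviation (ddof=1)
    return (frame[column] - mean) / std


df["Close_norm"] = standardize_per_group(df, "Close")
print(df)
```

After this, both files end up on the same scale even though their raw Close values differ by two orders of magnitude. This is the behaviour I would like the scalers entries to give me for each listed column.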

Here is my TimeSeriesDataSet definition:

self.training_dataset = TimeSeriesDataSet(
    dataset[lambda x: x['idx'] <= training_cutoff],
    time_idx='idx', 
    target="y",
    group_ids=["filename"], # groups different time series
    min_encoder_length=max_encoder_length,  
    max_encoder_length=max_encoder_length,
    min_prediction_length=1, 
    max_prediction_length=max_prediction_length,
    add_relative_time_idx= True,
    static_categoricals=["filename"],
    static_reals=[],
    time_varying_known_categoricals=[],  # if time shifted. 
    time_varying_known_reals=['ATR', 'Open', 'High', 'Low', 'Close', 'Volume', 'CC', 'Close_pct' , 'Volume_pct', 'btc_close_pct', 'Upper_Band', 'Middle_Band', 'Lower_Band', 'vol_std', 'Moving_Average_Indicator'], ## add other variables later on
    time_varying_unknown_categoricals=['y'],
    time_varying_unknown_reals=[], # list of continuous variables that change over time and are not known in the future
    # variable_groups=["filename"],
    categorical_encoders={'filename': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True).fit(dataset.filename), 'y': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=False).fit(dataset.y)}, ## how are nans processed? there should be none.  
    # scalers= {"Close": GroupNormalizer().fit(dataset.Close, dataset), "ATR": GroupNormalizer().fit(dataset.ATR, dataset), 'Volume':GroupNormalizer().fit(dataset.Volume, dataset), 'CC':GroupNormalizer().fit(dataset.CC, dataset), 'vol_std':GroupNormalizer().fit(dataset.vol_std, dataset)}, #StandardScaler, Defaults to sklearn’s StandardScaler()
    scalers= {"Close": GroupNormalizer(), "ATR": GroupNormalizer(), 'Volume':GroupNormalizer(), 'CC':GroupNormalizer(), 'vol_std':GroupNormalizer()}, #StandardScaler, Defaults to sklearn’s StandardScaler()
    target_normalizer=pytorch_forecasting.data.encoders.NaNLabelEncoder(),
    # target_normalizer=NaNLabelEncoder(),
    add_target_scales=False, 
    add_encoder_length=False,
    allow_missing_timesteps=False, # does not allow idx missing
    predict_mode = False #To get only last output
)