How to use balanced sampler for torch Dataset/Dataloader


My simplified Dataset looks like:

from typing import List

import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self) -> None:
        super().__init__()
        self.images: torch.Tensor      # shape (n, w, h, c), n images kept in memory - specific use case
        self.labels: torch.Tensor      # shape (n, w, h, c), matching label maps kept in memory
        self.positive_idx: List[int]   # indices of positives: roughly 1 per 10000 negatives
        self.negative_idx: List[int]   # indices of negatives

    def __len__(self):
        return 10000  # fixed value for training

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
    

ds = MyDataset()
dl = DataLoader(ds, batch_size=100, shuffle=False, sampler=...)
# Weighted sampler? shuffle=False because I assume the sampler takes care of the shuffling.

What is the most "torch" way to balance the sampling for the DataLoader so that each batch is built from 10 positives + 90 random negatives in every epoch, duplicating the available positives when there are not enough of them?

For the purpose of this exercise I'm not implementing augmentation to increase the number of positive samples.
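For reference, here is a minimal sketch of the weighted-sampler alternative I'm alluding to above, using torch.utils.data.WeightedRandomSampler (assuming positive_idx / negative_idx are already populated and the class weights below); as far as I understand it only hits the 10/90 split in expectation rather than guaranteeing exactly 10 positives per batch:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

ds = MyDataset()

# Split the sampling mass: positives get 10%, negatives get 90% (assumed weights).
weights = torch.zeros(len(ds))
weights[ds.positive_idx] = 0.10 / max(len(ds.positive_idx), 1)
weights[ds.negative_idx] = 0.90 / len(ds.negative_idx)

# replacement=True lets the few positives be drawn (duplicated) multiple times.
sampler = WeightedRandomSampler(weights, num_samples=len(ds), replacement=True)
dl = DataLoader(ds, batch_size=100, sampler=sampler)  # shuffle must stay unset when a sampler is given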

1 Answer

Answered by jupyter:

I think you can implement a batch sampler that chooses which data points are yielded, i.e. which indices get passed to your dataset's __getitem__:

import random


class NegativeSampler:

    def __init__(self, positive_idx, negative_idx, n_batches, n_pos=10, n_neg_per_pos=9):
        self.positive_idx = positive_idx
        self.negative_idx = negative_idx
        self.n_batches = n_batches            # batches that make up one epoch
        self.n_pos = n_pos                    # positives per batch
        self.n_neg_per_pos = n_neg_per_pos    # negatives drawn per positive: 10 * (1 + 9) = 100

    def __len__(self):
        return self.n_batches

    def __iter__(self):
        # Each yielded list of indices becomes one batch; the DataLoader feeds
        # every index to your custom dataset's __getitem__(self, idx).
        for _ in range(self.n_batches):
            # random.choices samples with replacement, so the few positives are
            # duplicated automatically when there are fewer than n_pos of them.
            positive_idx_batch = random.choices(self.positive_idx, k=self.n_pos)
            negative_idx_batch = []

            for _pos_idx in positive_idx_batch:
                negative_idx_batch += random.sample(self.negative_idx, self.n_neg_per_pos)

            yield positive_idx_batch + negative_idx_batch
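
A sketch of how this would plug into the DataLoader (the constructor parameters come from the version above; n_batches=100 is only an assumption so that 100 batches of 100 samples match the 10000-sample epoch). It goes through batch_sampler=, in which case batch_size, shuffle and sampler have to stay at their defaults because the sampler already defines each batch:

ds = MyDataset()
sampler = NegativeSampler(ds.positive_idx, ds.negative_idx, n_batches=100)

# batch_size, shuffle and sampler must be left unset when batch_sampler is given.
dl = DataLoader(ds, batch_sampler=sampler)

for images, labels in dl:
    ...  # each batch holds 10 positive and 90 negative samples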