Updating old code with new torch vocab methods (stoi and itos methods changed)


I am trying to create a Japanese-English translation model following this Medium article: https://arusl.medium.com/japanese-english-language-translation-with-transformer-using-pytorch-243738146806 Everything runs perfectly until the second-to-last cell, where I get an error running the translate function. The error occurs specifically on this line:

tokens = [BOS_IDX] + [src_vocab.stoi[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]

The error: AttributeError: 'Vocab' object has no attribute 'stoi'. Since the article was written, the stoi attribute has been replaced by the method get_stoi() → Dict[str, int], according to the torchtext documentation (https://pytorch.org/text/stable/vocab.html). When I attempt to change the line to the following, however, I get the error "'Counter' object has no attribute 'get_stoi'":

tokens = [BOS_IDX] + [src_vocab.get_stoi()[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]

The same goes for the itos attribute and the get_itos() method. Any help making this work would be greatly appreciated, as I'm very dumbfounded at the moment.

A similar question was asked here ('Vocab' object has no attribute 'itos'), but I don't see how to implement the answer or make it work in this case.

Edit: This function seems suspect, as it builds the vocab directly from a Counter... is there a better way to do this? (I've sketched an attempt with the newer API below the function.)

from collections import Counter
from torchtext.vocab import Vocab

def build_vocab(sentences, tokenizer):
  counter = Counter()
  for sentence in sentences:
    # count every subword token produced by the SentencePiece tokenizer
    counter.update(tokenizer.encode(sentence, out_type=str))
  return Vocab(counter)  # old-style constructor used by the article
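
For reference, here is my attempt at rewriting this helper with the newer torchtext API (build_vocab_from_iterator). The special-token names are my assumption based on what the article appears to use, so treat this as an untested sketch:

from torchtext.vocab import build_vocab_from_iterator

def build_vocab_new(sentences, tokenizer):
  # yield one list of subword tokens per sentence
  def yield_tokens():
    for sentence in sentences:
      yield tokenizer.encode(sentence, out_type=str)
  vocab = build_vocab_from_iterator(yield_tokens(),
                                    specials=['<unk>', '<pad>', '<bos>', '<eos>'])
  vocab.set_default_index(vocab['<unk>'])  # map unseen tokens to <unk>
  return vocab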

Thank you!


There are 2 answers

Azury

Hmm... After reading the Medium article, I think you're encountering this error due to changes in the torchtext library's methods for vocabulary handling. The 'Vocab' object no longer has the 'stoi' attribute; it has been replaced with 'get_stoi()'. Similarly, 'itos' is now 'get_itos()'.

To fix the error, update your code:

Original code:

tokens = [BOS_IDX] + [src_vocab.stoi[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]

Updated code:

tokens = [BOS_IDX] + [src_vocab.get_stoi()[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]

Make the same change for 'itos' to 'get_itos()', if applicable.
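
For the decoding direction, here is a minimal sketch of the itos replacement (tgt_vocab and pred_indices are illustrative names, not from the article; get_itos() returns a list mapping index to token):

itos = tgt_vocab.get_itos()  # position i holds the token for index i
decoded_tokens = [itos[idx] for idx in pred_indices]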

dust as

Update the code to index the vocab directly:

tokens = [BOS_IDX] + [src_vocab[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]
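
Direct indexing works because the new-style Vocab supports string-to-index lookup via __getitem__. A minimal sketch, assuming the vocab was built with the new torchtext API and that '<unk>' is among its specials (an assumption, not confirmed by the article):

# make out-of-vocabulary tokens map to <unk> instead of raising a RuntimeError
src_vocab.set_default_index(src_vocab['<unk>'])
tokens = [BOS_IDX] + [src_vocab[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]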