How do I use a Torchtext model with its vocab and vectors in a Flask application?
My model is built with the following code, which uses the vocab and vectors built from the IMDB dataset.
import random
import spacy
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext import data, datasets

SEED = 1234  # assumed value; any fixed seed works
TEXT = data.Field(tokenize="spacy", include_lengths=True)
LABEL = data.LabelField(dtype=torch.float)

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
train_data, valid_data = train_data.split(random_state=random.seed(SEED))
TEXT.build_vocab(train_data, vectors="glove.6B.100d", unk_init=torch.Tensor.normal_)
LABEL.build_vocab(train_data)

BATCH_SIZE = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), batch_size=BATCH_SIZE, sort_within_batch=True, device=device)
class RNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim,
                 n_layers, bidirectional, dropout, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers,
                           bidirectional=bidirectional, dropout=dropout)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text, text_lengths):
        # text = [sent len, batch size]
        embedded = self.dropout(self.embedding(text))
        # embedded = [sent len, batch size, emb dim]
        # pack sequence so the LSTM skips the padding tokens
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths)
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        # unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)
        # output = [sent len, batch size, hid dim * num directions]
        # output over padding tokens are zero tensors
        # hidden = [num layers * num directions, batch size, hid dim]
        # cell = [num layers * num directions, batch size, hid dim]
        # concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:])
        # hidden states and apply dropout
        hidden = self.dropout(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))
        # hidden = [batch size, hid dim * num directions]
        return self.fc(hidden)
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
N_LAYERS = 2
BIDIRECTIONAL = True
DROPOUT = 0.5
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM,
            N_LAYERS, BIDIRECTIONAL, DROPOUT, PAD_IDX)
pretrained_embeddings = TEXT.vocab.vectors
print(pretrained_embeddings.shape)
model.embedding.weight.data.copy_(pretrained_embeddings)
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
model = model.to(device)
criterion = criterion.to(device)
*training and evaluation functions and such*
nlp = spacy.load("en")

def predict_sentiment(model, sentence):
    model.eval()
    tokenized = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    length = [len(indexed)]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)  # add a batch dimension: [sent len, 1]
    length_tensor = torch.LongTensor(length)
    prediction = torch.sigmoid(model(tensor, length_tensor))
    return prediction.item()
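Calling it on a raw string then gives the probability that the review is positive, for example:

predict_sentiment(model, "This film is great")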
Saving and loading are done like this:
torch.save(model.state_dict(),"Finished Models/Pytorch/LSTM_w_vectors.pt")
model.load_state_dict(torch.load("Finished Models/Pytorch/LSTM_w_vectors.pt"))
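Since predict_sentiment needs TEXT.vocab.stoi at inference time, I also considered saving the vocabulary itself with pickle, roughly like this (TEXT_vocab.pkl is just a placeholder name, assuming the torchtext Vocab object pickles cleanly):

import pickle

with open("Finished Models/Pytorch/TEXT_vocab.pkl", "wb") as f:
    pickle.dump(TEXT.vocab, f)  # vocabulary (stoi/itos/vectors) saved for the serving process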
How do I import/deploy this model? Do I need pickle and PAD_IDX, or does PyTorch itself have functionality for this?
I copied the RNN class and used load_state_dict, but that didn't work.
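For reference, the kind of Flask app I am aiming for is sketched below. Everything in it is illustrative: the module name sentiment_model, the file paths and the /predict route are placeholders, and it assumes the RNN class plus the hyperparameters above can be imported from the training code and that TEXT.vocab was pickled as shown earlier.

import pickle
import spacy
import torch
from flask import Flask, request, jsonify

# placeholder module: assumed to expose the RNN class and the hyperparameters above
from sentiment_model import (RNN, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM,
                             N_LAYERS, BIDIRECTIONAL, DROPOUT)

app = Flask(__name__)
device = torch.device("cpu")  # serve on CPU
nlp = spacy.load("en")

# restore the vocabulary saved during training
with open("Finished Models/Pytorch/TEXT_vocab.pkl", "rb") as f:
    vocab = pickle.load(f)

PAD_IDX = vocab.stoi["<pad>"]
model = RNN(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM,
            N_LAYERS, BIDIRECTIONAL, DROPOUT, PAD_IDX)
model.load_state_dict(torch.load("Finished Models/Pytorch/LSTM_w_vectors.pt",
                                 map_location=device))
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    sentence = request.get_json()["text"]
    tokens = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [vocab.stoi[t] for t in tokens]
    tensor = torch.LongTensor(indexed).unsqueeze(1).to(device)  # [sent len, 1]
    length = torch.LongTensor([len(indexed)])
    with torch.no_grad():
        prob = torch.sigmoid(model(tensor, length)).item()
    return jsonify({"positive_probability": prob})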