Dissecting deepspeech.pytorch Part 2

Architecture

Input Data

first_batch = next(iter(train_loader))
inputs, targets, input_percentages, target_sizes = first_batch
inputs.shape
> torch.Size([16, 1, 161, 457])
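
The four dimensions are (batch, channel, frequency, time): 16 utterances, a single channel, 161 frequency bins, and 457 frames. Every utterance is zero-padded to the longest one in the batch, and `input_percentages` records what fraction of those 457 frames each utterance actually fills. The sketch below assumes the repo's default 16 kHz sample rate and 20 ms window, and shows where 161 comes from and how the percentages turn back into frame counts. `n_freq_bins` and `frame_lengths` are hypothetical helper names, and the percentages in the example are made up.

```python
# Sketch: where the 161 and the per-utterance lengths come from.
# Assumes 16 kHz audio and a 20 ms STFT window (deepspeech.pytorch defaults).

def n_freq_bins(sample_rate: int = 16000, window_size: float = 0.02) -> int:
    """Number of one-sided STFT frequency bins for the given window."""
    n_fft = int(sample_rate * window_size)  # 320 samples per window
    return n_fft // 2 + 1                   # one-sided spectrum: 161 bins


def frame_lengths(input_percentages, max_frames):
    """Convert each utterance's fraction of the longest one into a frame count."""
    return [int(p * max_frames) for p in input_percentages]


print(n_freq_bins())                          # 161
print(frame_lengths([1.0, 0.5, 0.25], 457))   # [457, 228, 114]
```

Truncating with `int()` mirrors multiplying the percentages by the padded length and casting to an integer tensor.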

MaskConv

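# Inside DeepSpeech.__init__ (torch.nn imported as nn).
# Shapes assume the (16, 1, 161, 457) batch: (batch, channel, freq, time).
self.conv = MaskConv(nn.Sequential(
    # Conv 1: stride 2 in both axes, freq 161 -> 81, time 457 -> 229
    nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
    nn.BatchNorm2d(32),
    nn.Hardtanh(0, 20, inplace=True),  # clipped ReLU: clamps values to [0, 20]
    # Conv 2: stride 2 in freq only, freq 81 -> 41, time stays 229
    nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
    nn.BatchNorm2d(32),
    nn.Hardtanh(0, 20, inplace=True)
))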
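
To see what these two convolutions do to the batch, apply the standard Conv2d output-size formula to each axis. This is a pure-Python sketch (dilation assumed to be 1); `conv_out` and `conv_stack_out` are hypothetical helpers, and the (kernel, stride, padding) triples are copied from the `nn.Conv2d` calls above. Applying the time-axis version to each utterance's frame count gives the valid lengths that the mask needs after the convolutions.

```python
# Sketch: output size of the two MaskConv convolutions, axis by axis.
# Standard Conv2d size formula (dilation assumed to be 1):
#   out = (in + 2 * padding - kernel) // stride + 1

def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per layer, taken from the nn.Conv2d calls.
FREQ_LAYERS = [(41, 2, 20), (21, 2, 10)]  # dimension 2: frequency
TIME_LAYERS = [(11, 2, 5), (11, 1, 5)]    # dimension 3: time


def conv_stack_out(size, layers):
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


freq_bins = conv_stack_out(161, FREQ_LAYERS)  # 161 -> 81 -> 41
frames = conv_stack_out(457, TIME_LAYERS)     # 457 -> 229 -> 229
print((16, 32, freq_bins, frames))            # (16, 32, 41, 229)
print(32 * freq_bins)                         # 1312 features per time step
```

The last number matters later: once channels and frequency are flattened together, each time step carries 32 × 41 = 1312 features into the RNN.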

But why do we need the mask in the first place?
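
Because every utterance is zero-padded to the longest one in the batch, the padded frames start out as zeros but do not stay that way. A convolution kernel that straddles the boundary smears real signal into them, and BatchNorm shifts zeros to nonzero values. Masking resets every frame past an utterance's true length to zero after each layer, so padding cannot leak into later layers or the RNN. Below is a minimal sketch of that idea, modeled on what MaskConv does rather than copied from it. `SimpleMaskConv` is a hypothetical name, and it assumes the lengths passed in are already in output frames. That works for this stack because only the first convolution changes the time axis.

```python
import torch
import torch.nn as nn


class SimpleMaskConv(nn.Module):
    """Hypothetical re-implementation of the MaskConv idea (not the repo's code).

    Runs each module in `seq_module` in turn and, after every one, zeroes all
    time steps past each utterance's valid length.
    Input x: (batch, channels, freq, time); lengths: valid frames per utterance.
    """

    def __init__(self, seq_module: nn.Sequential):
        super().__init__()
        self.seq_module = seq_module

    def forward(self, x: torch.Tensor, lengths: torch.Tensor):
        for module in self.seq_module:
            x = module(x)
            # Time-step indices 0..T-1 compared against each utterance's length.
            steps = torch.arange(x.size(3), device=x.device)
            # mask[b, t] is True where frame t is padding for utterance b.
            mask = steps.unsqueeze(0) >= lengths.unsqueeze(1)
            # Broadcast (batch, time) -> (batch, 1, 1, time) and zero the padding.
            x = x.masked_fill(mask[:, None, None, :], 0)
        return x, lengths


# Usage: utterance 0 uses all 6 frames, utterance 1 only the first 3.
conv = SimpleMaskConv(nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1)))
x = torch.randn(2, 1, 5, 6)
out, _ = conv(x, torch.tensor([6, 3]))
print(out[1, :, :, 3:].abs().sum().item())  # 0.0: padded frames are zeroed
```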

RNN
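
Before the recurrent layers, the 4-D conv output has to become a sequence: channels and frequency are flattened into one feature vector per frame, and the result is made time-major, (T, N, H), the default layout for PyTorch RNNs. The sketch below assumes the (16, 32, 41, 229) conv output for this batch. The single bidirectional `nn.LSTM` with `hidden_size=256` is an illustrative assumption, not necessarily the repo's configuration. Summing the forward and backward outputs follows what, to my understanding, the repo's `BatchRNN` does.

```python
import torch
import torch.nn as nn

# Stand-in for the conv output: (batch, channels, freq, time).
x = torch.randn(16, 32, 41, 229)

# Collapse channels and frequency into one feature axis: (N, C*F, T).
n, c, f, t = x.size()
x = x.view(n, c * f, t)
# Make it time-major: (T, N, features) = (229, 16, 1312).
x = x.permute(2, 0, 1).contiguous()

# Illustrative bidirectional LSTM; hidden_size=256 is an assumption.
rnn = nn.LSTM(input_size=c * f, hidden_size=256, bidirectional=True)
out, _ = rnn(x)  # (T, N, 2 * hidden) = (229, 16, 512)

# Sum the forward and backward directions: (T, N, 2, H) -> (T, N, H).
out = out.reshape(t, n, 2, -1).sum(dim=2)
print(out.shape)  # torch.Size([229, 16, 256])
```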

Fully Connected
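
The fully connected head maps each frame's RNN output to a score per output character. Layers such as `BatchNorm1d` and `Linear` expect (batch, features), so the sequence is temporarily folded into (T*N, H), similar in spirit to the repo's `SequenceWise` wrapper. In this sketch, `FrameWise` is a hypothetical name, and the sizes (256 hidden units, 29 classes, e.g. a CTC blank, an apostrophe, 26 letters and a space) are assumptions. The final `log_softmax` reflects that PyTorch's `nn.CTCLoss` expects log-probabilities shaped (T, N, C).

```python
import torch
import torch.nn as nn


class FrameWise(nn.Module):
    """Hypothetical helper: apply a per-frame module to a (T, N, H) sequence.

    Folds time into the batch axis so layers that expect (batch, features),
    such as BatchNorm1d and Linear, treat every frame as one sample.
    """

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t, n = x.size(0), x.size(1)
        x = x.reshape(t * n, -1)   # (T*N, H)
        x = self.module(x)         # (T*N, num_classes)
        return x.view(t, n, -1)    # back to (T, N, num_classes)


hidden, num_classes = 256, 29      # illustrative sizes (assumptions)
fc = FrameWise(nn.Sequential(
    nn.BatchNorm1d(hidden),
    nn.Linear(hidden, num_classes, bias=False),
))

rnn_out = torch.randn(229, 16, hidden)  # stand-in for the RNN output
logits = fc(rnn_out)                    # (229, 16, 29)
# Per-frame log-probabilities over the characters, as CTC loss expects.
log_probs = logits.log_softmax(dim=-1)
print(log_probs.shape)  # torch.Size([229, 16, 29])
```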
