WaveGlow: a Flow-based Generative Network for Speech Synthesis
Ryan Prenger, Rafael Valle, and Bryan Catanzaro
In our recent [paper], we propose WaveGlow: a flow-based network capable of generating high-quality speech from mel-spectrograms. WaveGlow combines insights from [Glow] and [WaveNet] in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
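To make the "single cost function" concrete, here is a minimal sketch of how a flow-based model can compute an exact log-likelihood. It is not the WaveGlow implementation: it uses one toy affine coupling layer of the kind found in Glow-style flows, written in plain Python, and the function names and the `toy_scale_shift` stand-in for a neural network are assumptions made for illustration. The idea is that an invertible map sends audio `x` to a latent `z` with a simple Gaussian density, so the log-likelihood of `x` is the Gaussian log-density of `z` plus the log-determinant of the map's Jacobian.

```python
import math


def toy_scale_shift(x_a):
    """Hypothetical stand-in for a learned network: returns (log_s, t) from x_a."""
    log_s = [0.1 * v for v in x_a]
    t = [v - 1.0 for v in x_a]
    return log_s, t


def coupling_forward(x, scale_shift_fn):
    """Map x -> z with an affine coupling layer; return (z, log|det J|)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = scale_shift_fn(x_a)  # parameters depend only on the untouched half
    z_b = [math.exp(ls) * xb + tt for ls, xb, tt in zip(log_s, x_b, t)]
    # The Jacobian is triangular, so its log-determinant is the sum of log scales.
    return x_a + z_b, sum(log_s)


def coupling_inverse(z, scale_shift_fn):
    """Invert the coupling layer: recover x from z in one pass, no auto-regression."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = scale_shift_fn(z_a)
    x_b = [(zb - tt) * math.exp(-ls) for ls, zb, tt in zip(log_s, z_b, t)]
    return z_a + x_b


def log_likelihood(x, scale_shift_fn):
    """Exact log p(x) = log N(z; 0, I) + log|det J| (change of variables)."""
    z, log_det = coupling_forward(x, scale_shift_fn)
    log_pz = sum(-0.5 * (v * v + math.log(2.0 * math.pi)) for v in z)
    return log_pz + log_det


x = [0.5, -1.0, 2.0, 0.3]
z, log_det = coupling_forward(x, toy_scale_shift)
x_rec = coupling_inverse(z, toy_scale_shift)
print(log_likelihood(x, toy_scale_shift))
```

Training would maximize `log_likelihood` over the data, and synthesis would sample `z` from the Gaussian and apply the inverse. A real model stacks many such layers and conditions them on the mel-spectrogram, which this sketch omits.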
Our [PyTorch] implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU, and Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation.
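For a sense of scale, the synthesis rate can be compared against the playback rate of the audio. The short calculation below is only a sketch: the 22,050 Hz sampling rate is an assumption, since the text above does not state the rate of the generated audio, and `realtime_factor` is a hypothetical helper name.

```python
def realtime_factor(samples_generated_per_second, audio_sample_rate_hz):
    """How many seconds of audio are produced per second of compute."""
    return samples_generated_per_second / audio_sample_rate_hz


# ASSUMPTION: 22,050 Hz output audio; the text above does not specify this.
ASSUMED_SAMPLE_RATE_HZ = 22_050

factor = realtime_factor(500_000, ASSUMED_SAMPLE_RATE_HZ)
print(f"{factor:.1f}x faster than real time")  # prints "22.7x faster than real time"
```

Under that assumption, a rate of 500 kHz is a lower bound of roughly 22 seconds of speech per second of GPU time.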