Paper Review
-
A Quick Look at EMFORMER & AM-TRF (Paper Review, 2023. 5. 2. 15:29)
Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition (IEEE Xplore). This paper proposes an efficient memory transformer, Emformer, for low-latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce the computational complexity of self-attention.
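For a hands-on look, torchaudio ships an Emformer module based on this paper; below is a minimal usage sketch, with hyperparameter values that are illustrative rather than the paper's recipe.

import torch
from torchaudio.models import Emformer

# Minimal Emformer sketch; hyperparameters are illustrative.
emformer = Emformer(
    input_dim=80,            # e.g. 80-dim log-mel filterbank features
    num_heads=4,
    ffn_dim=1024,
    num_layers=4,
    segment_length=16,       # frames processed per streaming segment
    left_context_length=30,  # left-context frames attended per segment
)

frames = torch.rand(2, 160, 80)     # (batch, time, feature)
lengths = torch.tensor([160, 160])  # valid frames per utterance
output, out_lengths = emformer(frames, lengths)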
-
DiffWave: A Versatile Diffusion Model for Audio Synthesis (Paper Review, 2023. 3. 13. 10:01)
GitHub - lmnt-com/diffwave: DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. This is an unofficial implementation on GitHub, but the code is well organized, so it is a good reference to follow along with while studying the model.
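As background for the review: DiffWave is trained with the standard denoising-diffusion (epsilon-prediction) objective, conditioned on the mel spectrogram $c$; the exact loss norm and noise schedule follow the paper and the implementation above, so treat this as a sketch.

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,\epsilon,\,t}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t,\; c\big)\big\rVert^2\Big], \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$$

where $x_0$ is the clean waveform, $\epsilon \sim \mathcal{N}(0, I)$, and $\beta_s$ is the noise schedule.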
-
CycleGAN-VC3 and MaskCycleGAN-VC (Paper Review, 2022. 5. 11. 21:02)
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion (arxiv.org). Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speech without using a parallel corpus. Recently, the cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results on this task.
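The core mechanism shared across the CycleGAN-VC family is the cycle-consistency loss, which is what removes the need for a parallel corpus:

$$\mathcal{L}_{cyc} = \mathbb{E}_{x \sim P_X}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim P_Y}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big]$$

where $G_{X \to Y}$ and $G_{Y \to X}$ are the two generators; each input must be reconstructed after a round trip through both, so the mappings stay consistent without paired utterances.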
-
TRPO: Trust Region Policy Optimization (Paper Review, 2022. 5. 11. 12:45)
Trust Region Policy Optimization (proceedings.mlr.press). In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO).
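The "guaranteed monotonic improvement" claim comes from maximizing a surrogate objective inside a KL-divergence trust region; in its practical form:

$$\max_\theta\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s, a)\right] \quad \text{s.t.} \quad \mathbb{E}_s\big[D_{\mathrm{KL}}\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\big\Vert\, \pi_\theta(\cdot \mid s)\big)\big] \le \delta$$

where $A^{\pi_{\theta_{\text{old}}}}$ is the advantage under the old policy and $\delta$ bounds the average policy change per update.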
-
MUTE: Multitask Training with Text Data for End-to-End Speech Recognition (Paper Review, 2022. 5. 11. 12:43)
[2010.14318] Multitask Training with Text Data for End-to-End Speech Recognition (arxiv.org). We propose a multitask training method for attention-based end-to-end speech recognition models. We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data.
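One way to read "multitask training on both audio-text and text-only data" is as a weighted sum of the usual attention-based ASR loss and a decoder-only loss on unpaired text; the weight $\lambda$ below is an assumed hyperparameter for illustration, not a value from the paper:

$$\mathcal{L} = \mathcal{L}_{\text{ASR}}(x, y) + \lambda\, \mathcal{L}_{\text{text}}(y)$$

where $(x, y)$ are paired audio-transcript data and $y$ alone comes from the text-only corpus used to regularize the decoder.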