Paper Review
-
A Quick Look at EMFORMER & AM-TRF (Paper Review, 2023. 5. 2. 15:29)
Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition (IEEE Xplore). This paper proposes an efficient memory transformer, Emformer, for low-latency streaming speech recognition. In Emformer, the long-range history context is distilled into an augmented memory bank to reduce the computational complexity of self-attention.
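For a hands-on look, torchaudio ships an Emformer module based on this paper; below is a minimal usage sketch, with hyperparameter values that are illustrative rather than the paper's recipe.

import torch
from torchaudio.models import Emformer

# Minimal Emformer sketch; hyperparameters are illustrative.
emformer = Emformer(
    input_dim=80,            # e.g. 80-dim log-mel filterbank features
    num_heads=4,
    ffn_dim=1024,
    num_layers=4,
    segment_length=16,       # frames processed per streaming segment
    left_context_length=30,  # left-context frames attended per segment
)

frames = torch.rand(2, 160, 80)     # (batch, time, feature)
lengths = torch.tensor([160, 160])  # valid frames per utterance
output, out_lengths = emformer(frames, lengths)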
-
DiffWave: A Versatile Diffusion Model for Audio Synthesis (Paper Review, 2023. 3. 13. 10:01)
GitHub - lmnt-com/diffwave: DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. This is an unofficial implementation on GitHub, but the code is well organized, so it is a good reference to follow along with while studying the model.
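As background for the review: DiffWave is trained with the standard denoising-diffusion (epsilon-prediction) objective, conditioned on the mel spectrogram $c$; the exact loss norm and noise schedule follow the paper and the implementation above, so treat this as a sketch.

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\,\epsilon,\,t}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t,\; c\big)\big\rVert^2\Big], \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$$

where $x_0$ is the clean waveform, $\epsilon \sim \mathcal{N}(0, I)$, and $\beta_s$ is the noise schedule.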
-
CycleGAN-VC3 and MaskCycleGAN-VC (Paper Review, 2022. 5. 11. 21:02)
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion (arxiv.org). Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speech without using a parallel corpus. Recently, the cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results on this task.
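The core mechanism shared across the CycleGAN-VC family is the cycle-consistency loss, which is what removes the need for a parallel corpus:

$$\mathcal{L}_{cyc} = \mathbb{E}_{x \sim P_X}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim P_Y}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big]$$

where $G_{X \to Y}$ and $G_{Y \to X}$ are the two generators; each input must be reconstructed after a round trip through both, so the mappings stay consistent without paired utterances.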
-
TRPO: Trust Region Policy Optimization (Paper Review, 2022. 5. 11. 12:45)
Trust Region Policy Optimization (proceedings.mlr.press). In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO).
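The "guaranteed monotonic improvement" claim comes from maximizing a surrogate objective inside a KL-divergence trust region; in its practical form:

$$\max_\theta\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s, a)\right] \quad \text{s.t.} \quad \mathbb{E}_s\big[D_{\mathrm{KL}}\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\big\Vert\, \pi_\theta(\cdot \mid s)\big)\big] \le \delta$$

where $A^{\pi_{\theta_{\text{old}}}}$ is the advantage under the old policy and $\delta$ bounds the average policy change per update.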
-
MUTE: Multitask Training with Text Data for End-to-End Speech Recognition (Paper Review, 2022. 5. 11. 12:43)
[2010.14318] Multitask Training with Text Data for End-to-End Speech Recognition (arxiv.org). We propose a multitask training method for attention-based end-to-end speech recognition models. We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data.
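One way to read "multitask training on both audio-text and text-only data" is as a weighted sum of the usual attention-based ASR loss and a decoder-only loss on unpaired text; the weight $\lambda$ below is an assumed hyperparameter for illustration, not a value from the paper:

$$\mathcal{L} = \mathcal{L}_{\text{ASR}}(x, y) + \lambda\, \mathcal{L}_{\text{text}}(y)$$

where $(x, y)$ are paired audio-transcript data and $y$ alone comes from the text-only corpus used to regularize the decoder.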