-
EMFORMER & AM-TRF: A Quick Look (Paper Review, 2023. 5. 2. 15:29)
Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition (IEEE Xplore). This paper proposes an efficient memory transformer Emformer for low latency streaming speech recognition. In Emformer, the long-range history c..
-
DiffWave: A Versatile Diffusion Model for Audio Synthesis (Paper Review, 2023. 3. 13. 10:01)
GitHub - lmnt-com/diffwave: DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. This is an unofficial implementation on GitHub, but the code is well written, so it is worth consulting alongside the paper while studying the model.
-
Subjective Test (Uncategorized, 2022. 11. 18. 15:45)
You can take the survey via Google Forms.

1. Sound quality test: please rate the naturalness of the voice on a scale of 1 to 5 (5 = best). [Rating table for 60 audio samples]

2. Naturalness test: between A and B, please choose the sound that is more natural. [A/B table for 24 pairs]

3. Similarity test: for X, ..
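The three test formats above are standard subjective-listening protocols (MOS rating, A/B preference, similarity). As a minimal sketch of how the collected responses might be aggregated, here are two illustrative helpers; all names and the sample data are hypothetical and not tied to the actual Google Form:

```python
def mean_opinion_score(ratings):
    """Average the 1-5 naturalness ratings collected for one audio sample (MOS)."""
    return sum(ratings) / len(ratings)

def ab_preference(choices):
    """Fraction of listeners who preferred system A in an A/B preference test."""
    return sum(1 for c in choices if c == "A") / len(choices)

if __name__ == "__main__":
    # Four hypothetical listeners rated one sample, and chose between A and B once each.
    print(mean_opinion_score([4, 5, 3, 4]))       # 4.0
    print(ab_preference(["A", "B", "A", "A"]))    # 0.75
```

In practice MOS results are usually reported with a 95% confidence interval over listeners, but the point estimates above are the core of the computation.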
-
CycleGAN-VC3 and MaskCycleGAN-VC (Paper Review, 2022. 5. 11. 21:02)
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion (arxiv.org). Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have..
-
TRPO: Trust Region Policy Optimization (Paper Review, 2022. 5. 11. 12:45)
Trust Region Policy Optimization (proceedings.mlr.press). In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a pr..
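For context, the constrained optimization problem TRPO solves is commonly stated as follows (a sketch in standard notation, with θ_old the current policy parameters, A the advantage function, and δ the trust-region size):

```latex
\max_{\theta} \; \mathbb{E}_{s \sim \rho_{\theta_{\mathrm{old}}},\, a \sim \pi_{\theta_{\mathrm{old}}}}
  \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \, A_{\theta_{\mathrm{old}}}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \rho_{\theta_{\mathrm{old}}}}
  \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\middle\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
```

The KL constraint is what replaces the theoretically-justified but overly conservative penalty term mentioned in the abstract, making the method practical at scale.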
-
MUTE: Multitask Training with Text Data for End-to-End Speech Recognition (Paper Review, 2022. 5. 11. 12:43)
[2010.14318] Multitask Training with Text Data for End-to-End Speech Recognition (arxiv.org). We propose a multitask training method for attention-based end-to-end speech recognition models. We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data. Trained on th..