A noise-robust voice conversion algorithm based on joint dictionary optimization is proposed to convert noisy source speech into target speech effectively. The joint dictionary has two parts: a speech dictionary, which is optimized with a backward elimination algorithm, and a noise dictionary, which is introduced so that the joint dictionary can match the noisy speech. Experimental results show that backward elimination reduces the number of dictionary frames, and therefore the amount of computation, while preserving conversion quality. In low-SNR conditions and under multiple noise environments, the algorithm converts better than both the traditional NMF algorithm and NMF conversion combined with spectral-subtraction denoising. The proposed algorithm thus improves the robustness of the voice conversion system.
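The role of the noise dictionary can be illustrated with a minimal sketch of exemplar-based NMF conversion. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the Euclidean cost with Lee-Seung multiplicative updates, the absence of a sparsity penalty, and the use of frame-aligned (paired) source and target dictionaries. The backward elimination step is not shown.

```python
import numpy as np

def estimate_activations(X, W, n_iter=200, eps=1e-12, seed=0):
    """Non-negative H that reduces ||X - W H||_F with the dictionary W held fixed.

    Uses the Euclidean multiplicative update (an assumed choice of cost).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    WtX = W.T @ X
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtX / (WtW @ H + eps)
    return H

def convert_noisy(X_noisy, A_src, A_tgt, N_noise, n_iter=200):
    """Convert noisy source spectra with a joint [speech | noise] dictionary.

    X_noisy: (D, T) non-negative magnitude spectra of the noisy source speech
    A_src, A_tgt: (D, K) paired source/target speech dictionaries (hypothetical)
    N_noise: (D, J) noise dictionary (hypothetical)
    """
    W = np.hstack([A_src, N_noise])          # joint dictionary
    H = estimate_activations(X_noisy, W, n_iter=n_iter)
    H_speech = H[:A_src.shape[1]]            # drop the noise activations
    return A_tgt @ H_speech, H               # converted spectra, all activations
```

Because the noise columns absorb the noise energy during decomposition, only the speech activations are carried over to the target dictionary, which is why no separate denoising stage is needed in this formulation.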
For the case where only a limited corpus is available from the target speaker, this paper proposes a voice conversion algorithm that uses a unified tensor dictionary. First, parallel speech from N speakers is selected at random from the speech corpus to form the basis of the tensor dictionary. The selected speech is then aligned by multi-series dynamic time warping to generate N two-dimensional basic dictionaries, which together constitute the unified tensor dictionary. In the conversion stage, the dictionaries of the source and target speakers are each built as a linear combination of the N basic dictionaries, using speech from the two speakers. Experimental results show that with 14 basic speakers the algorithm achieves performance comparable to that of the traditional NMF-based method while requiring only a small target-speaker corpus, which greatly facilitates the application of voice conversion systems.
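The linear-combination step can be sketched as follows. The abstract does not say how the combination weights are estimated, so the alternating multiplicative updates below are purely an assumed choice, and the names `combine` and `fit_speaker_weights` are hypothetical. The tensor `B` holds the N basic dictionaries, each of size D x K with frame-aligned columns.

```python
import numpy as np

def combine(B, w):
    """Speaker dictionary as a weighted sum of the N basic dictionaries.

    B: (N, D, K) unified tensor dictionary, w: (N,) weights -> (D, K)
    """
    return np.tensordot(w, B, axes=1)

def fit_speaker_weights(X, B, n_iter=100, eps=1e-12, seed=0):
    """Fit non-negative weights w and activations H so that X ~ combine(B, w) @ H.

    Alternates Euclidean multiplicative updates for H and w (an assumed scheme).
    X: (D, T) non-negative spectra from a small amount of the speaker's speech.
    """
    rng = np.random.default_rng(seed)
    N, D, K = B.shape
    w = np.full(N, 1.0 / N)
    H = rng.random((K, X.shape[1])) + eps
    for _ in range(n_iter):
        W = combine(B, w)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        BH = B @ H                                   # (N, D, T), one model per basis
        model = combine(B, w) @ H                    # current reconstruction
        num = np.tensordot(BH, X, axes=([1, 2], [0, 1]))
        den = np.tensordot(BH, model, axes=([1, 2], [0, 1])) + eps
        w *= num / den
    return w, H
```

Under these assumptions, conversion would fit `w_src, H_src` on the source utterance and `w_tgt` on the limited target data, then form the converted spectra as `combine(B, w_tgt) @ H_src`, relying on the shared column alignment of the basic dictionaries.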
This paper proposes a voice conversion algorithm that uses compressed sensing to exploit the information between consecutive speech frames. Because the vector formed by concatenating several consecutive line spectral pairs (LSPs) is sparse in the discrete cosine transform domain, compressed sensing is used to extract a compressed vector from the concatenated LSPs, and this vector serves as the feature vector for training the conversion function. Evaluation results show that, with an appropriate number of speech frames, the approach improves performance by 3.21% on average over the conventional algorithm based on weighted frequency warping. The experiments also show that the performance of a voice conversion system can be improved by taking full advantage of inter-frame information, because this information helps the converted speech retain the more stable acoustic properties inherent across frames.
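The feature extraction described above can be sketched in a few lines. The frame-by-frame concatenation order, the Gaussian measurement matrix, and all function names are assumptions for illustration; the abstract does not specify them. The orthonormal DCT-II matrix is included only so that the sparsity of the concatenated vectors can be inspected.

```python
import numpy as np

def stack_frames(lsp, m):
    """Concatenate m consecutive LSP frames into one vector per position.

    lsp: (T, p) array of LSP frames -> (T - m + 1, m * p)
    """
    T, p = lsp.shape
    return np.stack([lsp[t:t + m].reshape(-1) for t in range(T - m + 1)])

def dct_matrix(L):
    """Orthonormal DCT-II matrix C of size (L, L), so that C @ C.T is the identity."""
    n = np.arange(L)
    C = np.sqrt(2.0 / L) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / L)
    C[0] /= np.sqrt(2.0)
    return C

def compress(stacked, n_meas, seed=0):
    """Project each concatenated vector onto n_meas random Gaussian measurements.

    Returns the compressed features (rows) and the measurement matrix Phi.
    """
    rng = np.random.default_rng(seed)
    L = stacked.shape[1]
    Phi = rng.standard_normal((n_meas, L)) / np.sqrt(n_meas)
    return stacked @ Phi.T, Phi
```

For a stacked matrix `S`, the DCT coefficients `S @ dct_matrix(S.shape[1]).T` show how concentrated the energy is, and the rows returned by `compress` would be the feature vectors passed to whatever conversion function is trained.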