The Art of Perfecting Singing Voices with AI

Evergreen Technologies
3 min read · Jul 30, 2023

Have you ever listened to a song and wished the singer had better vocal skills or intonation? I recently came across an interesting new AI technique that can potentially “beautify” and improve the singing quality of amateur vocalists.

Researchers from Zhejiang University in China have developed a new AI system called Neural Singing Voice Beautifier (NSVB) that can process an amateur singing recording and improve its intonation and overall aesthetic quality to sound more professional, while still retaining the original vocal timbre. This could have useful applications in music production and entertainment.

At the core of NSVB is a conditional variational autoencoder (CVAE) that generates mel-spectrograms from input conditions and latent variables. During training, the CVAE maximizes the evidence lower bound (ELBO) on the conditional log-likelihood of the mel-spectrogram:

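$$\log p_\theta(x \mid c) \ge \mathrm{ELBO}(\phi, \theta) \equiv \mathbb{E}_{z \sim q_\phi(z \mid x, c)}\left[\log p_\theta(x \mid z, c) - \log \frac{q_\phi(z \mid x, c)}{p(z)}\right]$$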

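where x is the input mel-spectrogram, c denotes the singing conditions, z is the latent tone variable, and φ and θ are the parameters of the encoder qφ and the decoder pθ, respectively. Taking the expectation of the second term gives the KL divergence between qφ(z|x, c) and the prior p(z), so maximizing the ELBO rewards accurate reconstruction while keeping the latent tone variable close to the prior.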
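To see the bound in action, here is a minimal sketch in Python (standard library only). It assumes the usual Gaussian CVAE choices, which this post does not spell out for NSVB: a Gaussian posterior qφ(z|x,c) = N(mu, sigma²) produced by some encoder, a standard-normal prior p(z), and a Gaussian decoder with unit variance. The one-dimensional `decoder_mean` is a hypothetical toy, not NSVB's network. The code estimates the ELBO by sampling z with the reparameterization trick and lets you compare the sampled KL term with its closed form.

```python
import math
import random


def log_normal(v, mean, std):
    """Log density of N(mean, std^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * std**2) - (v - mean) ** 2 / (2 * std**2)


def decoder_mean(z, c):
    """Hypothetical toy decoder: predicts a 1-D 'mel' value from latent z and condition c."""
    return 2.0 * z + c


def elbo_estimate(x, c, mu, sigma, dec_std=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_q[log p(x|z,c) - log(q(z|x,c) / p(z))].

    Assumes q(z|x,c) = N(mu, sigma^2) (from an encoder) and p(z) = N(0, 1).
    Returns (elbo, monte_carlo_kl).
    """
    rng = random.Random(seed)
    recon_total = 0.0
    kl_total = 0.0
    for _ in range(n_samples):
        z = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterized sample from q
        recon_total += log_normal(x, decoder_mean(z, c), dec_std)
        kl_total += log_normal(z, mu, sigma) - log_normal(z, 0.0, 1.0)
    recon = recon_total / n_samples
    kl = kl_total / n_samples
    return recon - kl, kl


def gaussian_kl(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1))."""
    return 0.5 * (mu**2 + sigma**2 - 1.0) - math.log(sigma)


if __name__ == "__main__":
    elbo, kl_mc = elbo_estimate(x=1.5, c=0.5, mu=0.4, sigma=0.6)
    print("ELBO estimate:", elbo)
    print("sampled KL:", kl_mc, "closed-form KL:", gaussian_kl(0.4, 0.6))
    # In this toy model x - c ~ N(0, 5), so log p(x|c) is available exactly.
    print("log p(x|c):", log_normal(1.5, 0.5, math.sqrt(5.0)))
```

In this toy model the exact posterior for x = 1.5 and c = 0.5 is N(0.4, 0.2), so with mu = 0.4 and sigma = sqrt(0.2) the estimate equals log p(x|c); any other choice of (mu, sigma) leaves the ELBO below it. The size of that gap is the KL divergence between qφ and the true posterior, which is what optimizing the encoder shrinks.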

NSVB splits singing beautification into two steps:

Pitch Correction using Shape-Aware Dynamic Time Warping (SADTW)
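The post does not describe how SADTW measures "shape", so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a professional reference recording of the same song is available and that both pitch curves are per-frame values in semitones (MIDI note numbers). Classic dynamic time warping aligns the amateur curve to the reference using a hypothetical cost that adds the absolute pitch gap to the gap between local slopes (a derivative-DTW-style stand-in for shape awareness). Each amateur frame then takes the pitch of the reference frames it is aligned to.

```python
def slopes(contour):
    """First differences of a pitch contour (0 for the first frame)."""
    return [0.0] + [b - a for a, b in zip(contour, contour[1:])]


def dtw_path(src, ref, shape_weight=1.0):
    """Align src to ref with DTW and return the warping path as (i, j) pairs.

    Hypothetical cost: |pitch gap| + shape_weight * |slope gap|.
    """
    ds, dr = slopes(src), slopes(ref)
    n, m = len(src), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(src[i - 1] - ref[j - 1]) + shape_weight * abs(ds[i - 1] - dr[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    return path[::-1]


def correct_pitch(amateur, reference, shape_weight=1.0):
    """Give each amateur frame the mean pitch of the reference frames aligned to it."""
    matched = [[] for _ in amateur]
    for i, j in dtw_path(amateur, reference, shape_weight):
        matched[i].append(reference[j])
    return [sum(v) / len(v) for v in matched]


if __name__ == "__main__":
    reference = [60, 60, 62, 62, 64, 64]  # in tune
    amateur = [59.5, 59.5, 59.5, 61.5, 61.5, 63.5, 63.5, 63.5]  # flat and slower
    print(correct_pitch(amateur, reference))
```

Comparing slopes as well as absolute pitch is meant to let the alignment follow the melody's contour even when the amateur is consistently sharp or flat; setting `shape_weight=0` reduces the cost to plain pitch-distance DTW.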

Decades of experience in collaborative Blog writing, Technical Advisory and Online Training. Read more about me @ https://evergreenllc2020.github.io/about.html