Popular voice assistants like Siri and Amazon Alexa have introduced automatic speech recognition (ASR) to the wider public. Though decades in the making, ASR models still struggle with consistency and reliability, especially in noisy environments.
According to TechXplore, researchers in China have developed a framework that improves ASR performance amid the chaos of everyday acoustic environments.
Researchers from the Hong Kong University of Science and Technology and WeBank proposed a new framework, phonetic-semantic pre-training (PSP), and demonstrated the robustness of their new model on synthetic, highly noisy speech datasets.
Their study was published in CAAI Artificial Intelligence Research on August 28.
“Robustness is a long-standing challenge for ASR,” said Xueyang Wu from the Hong Kong University of Science and Technology Department of Computer Science and Engineering. “We want to increase the robustness of the Chinese ASR system with a low cost.”
ASR uses machine learning and other artificial intelligence techniques to automatically transcribe speech into text for uses like voice-activated systems and transcription software.
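To illustrate one small piece of how such systems turn model output into text, here is a minimal sketch of CTC-style greedy decoding, a common final step in neural ASR pipelines: the acoustic model emits a label per audio frame, and the decoder collapses repeats and drops a special blank token. This is a generic, hypothetical example for illustration only, not the PSP framework from the study; the blank symbol and frame labels are assumptions.

```python
BLANK = "-"  # hypothetical blank token used by the acoustic model (assumption)

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive duplicate labels, then remove blanks.

    This mirrors the standard CTC greedy decoding rule: repeated
    frame-level predictions stand for one character unless separated
    by a blank.
    """
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:
            collapsed.append(label)
        prev = label
    return "".join(ch for ch in collapsed if ch != BLANK)

# Per-frame argmax labels from a hypothetical acoustic model:
frames = ["h", "h", "-", "e", "l", "l", "-", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # prints "hello"
```

Note how the blank between the two runs of "l" is what lets the decoder keep both letters of "hello" instead of merging them; this is why robustness matters, since noise that flips a few frame labels can change the transcript.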