Exploring Self-supervised Embeddings and Synthetic Data Augmentation for Robust Audio Deepfake Detection
- Martín-Doñas, Juan M.
- Álvarez, Aitor
- Rosello, Eros
- Gomez, Angel M.
- Peinado, Antonio M.
ISSN: 2958-1796
Year of publication: 2024
Pages: 2085-2089
Congress: Interspeech 2024, 1-5 September 2024, Kos, Greece
Type: Conference paper
Abstract
This work explores the performance of large self-supervised speech models as robust audio deepfake detectors. Despite the current trend of fine-tuning the upstream network, in this paper we revisit the use of pre-trained models as feature extractors to adapt specialized downstream audio deepfake classifiers. The goal is to preserve the general knowledge of the audio foundation model and use it to extract discriminative features that feed a simplified deepfake classifier. In addition, the generalization capabilities of the system are improved by augmenting the training corpora with additional synthetic data from different vocoder algorithms. This strategy is complemented by various data augmentations covering challenging acoustic conditions. Our proposal is evaluated on different benchmark datasets for audio deepfake and anti-spoofing tasks, showing state-of-the-art performance. Furthermore, we analyze which parts of the downstream classifier are relevant to achieving a robust system.
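The split described in the abstract, a frozen pre-trained feature extractor feeding a small trainable classifier, can be illustrated with a minimal sketch. This is not the authors' system: the fixed random projection below is a hypothetical stand-in for a real self-supervised model, and the mean pooling, logistic-regression head, and all names and sizes (FEAT_DIM, FRAME, extract_features, LinearClassifier) are assumptions chosen only to keep the example self-contained.

```python
import math
import random

random.seed(0)

FEAT_DIM = 8   # hypothetical embedding size; real SSL models are far larger
FRAME = 16     # hypothetical frame length in samples

# Stand-in for a frozen pre-trained model (assumption): a fixed random
# projection of each frame. These weights are never updated.
FROZEN_W = [[random.gauss(0, 1) for _ in range(FRAME)] for _ in range(FEAT_DIM)]


def extract_features(waveform):
    """Return one FEAT_DIM vector per non-overlapping frame (frozen extractor)."""
    frames = [waveform[i:i + FRAME]
              for i in range(0, len(waveform) - FRAME + 1, FRAME)]
    return [[math.tanh(sum(w * x for w, x in zip(row, frame)))
             for row in FROZEN_W]
            for frame in frames]


def mean_pool(frame_feats):
    """Average frame-level features into one utterance-level vector."""
    n = len(frame_feats)
    return [sum(f[d] for f in frame_feats) / n for d in range(FEAT_DIM)]


class LinearClassifier:
    """Downstream head (logistic regression): the only trainable part."""

    def __init__(self):
        self.w = [0.0] * FEAT_DIM
        self.b = 0.0

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(self, x, y, lr=0.1):
        # Binary cross-entropy gradient with respect to the head only;
        # FROZEN_W is not touched.
        err = self.predict_proba(x) - y
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * err
```

In use, each utterance would be passed through extract_features and mean_pool once, and only LinearClassifier.sgd_step would be called during training, so the extractor's weights stay exactly as they were pre-trained.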