Automatic spoken language identification (LID) using deep learning.

Overview

iLID

Motivation

We wanted to classify the spoken language within audio files, a process that usually serves as the first step for NLP or speech transcription.

We implemented two deep learning approaches, using the TensorFlow and Caffe frameworks for different model configurations.

Repo Structure

  • /data
    • Scripts to download training data from Voxforge and YouTube. For usage details, see below.
  • /Evaluation
    • Prediction scripts for single audio files or lists of files using Caffe.
  • /Preprocessing
    • Includes all scripts to convert a WAV audio file into spectrogram and mel-filter spectrogram images using a Spark Pipeline.
    • All scripts to create/extract the audio features
    • To convert a directory of WAV audio files using the Spark pipeline run: ./run.sh --inputPath {input_path} --outputPath {output_path} | tee sparkline.log -
  • /models
  • /tensorflow
    • All the code for setting up and training various models with TensorFlow.
    • Includes training and prediction scripts. See train.py and predict.py.
    • Configure your learning parameters in config.yaml.
    • Add or change networks under /tensorflow/networks/instances/.
  • /tools
    • Some handy scripts to clean filenames, normalize audio files and other stuff.
  • /webserver
    • A web demo to upload audio files for prediction.
    • See the included README
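
To illustrate what the /Preprocessing step produces, here is a minimal sketch of turning audio samples into a grayscale spectrogram image. It is not the repo's Spark pipeline: the function names (make_tone, spectrogram, to_grayscale), the frame size, the hop size, and the synthetic 1 kHz tone standing in for a WAV file are all assumptions chosen for illustration, and the mel-filter variant is not shown. It uses only the standard library, so it is slow but easy to run.

```python
import cmath
import math


def make_tone(freq_hz, sample_rate, n_samples):
    """Synthesize a sine tone as a stand-in for samples read from a WAV file."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum per frame (rows: time, columns: bins)."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def to_grayscale(spec):
    """Scale log-magnitudes to integer pixel values in the range 0..255."""
    logs = [[math.log10(v + 1e-10) for v in row] for row in spec]
    lo = min(min(row) for row in logs)
    hi = max(max(row) for row in logs)
    span = (hi - lo) or 1.0
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in logs]


sample_rate = 8000                      # assumed sample rate in Hz
samples = make_tone(1000, sample_rate, 512)
spec = spectrogram(samples)
image = to_grayscale(spec)

# Each bin covers sample_rate / frame_size = 125 Hz, so 1 kHz lands in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(image), len(image[0]), peak_bin)
```

For real data you would read the samples from a WAV file and most likely use an FFT from a numerical library instead of the naive DFT above.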

Requirements

  • Caffe
  • TensorFlow
  • Spark
  • Python 2.7
  • OpenCV 2.4+
  • youtube_dl
# Install additional Python requirements
pip install -r requirements.txt
pip install youtube_dl

Datasets

The scripts in /data download training data (audio samples) from various sources.

Voxforge

/data/voxforge/download-data.sh
/data/voxforge/extract_tgz.sh {path_to_german.tgz} german

YouTube

  • Downloads videos from various news channels on YouTube.
  • Configure channels/sources in youtube/sources.yml
python /data/youtube/download.py

Models

We trained models for two languages (English, German) and for four languages (English, German, French, Spanish).

Best Performing Models

The top-scoring networks were trained with 15,000 images per language, a batch size of 64, and a learning rate of 0.001 that was decayed to 0.0001 after 7,000 iterations.
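
As a rough sketch of the schedule described above, the following shows a step learning-rate function and the number of iterations one pass over the data would take. The function names are hypothetical, not taken from the repo. Two assumptions are made: iterations are counted from 0, so iteration 7000 is the first one to use the decayed rate, and all 15,000 images per language are used for training.

```python
import math


def learning_rate(iteration, base_lr=0.001, decayed_lr=0.0001, decay_after=7000):
    """Step schedule: base_lr before decay_after iterations, decayed_lr from then on."""
    return base_lr if iteration < decay_after else decayed_lr


def iterations_per_epoch(images_per_language, num_languages, batch_size=64):
    """Batches needed to see every image once, counting the last partial batch."""
    total_images = images_per_language * num_languages
    return int(math.ceil(total_images / float(batch_size)))


print(learning_rate(0))                 # 0.001
print(learning_rate(7000))              # 0.0001
print(iterations_per_epoch(15000, 2))   # 469 (EN/DE)
print(iterations_per_epoch(15000, 4))   # 938 (EN/DE/FR/ES)
```

Under these assumptions, the decay at 7,000 iterations falls after roughly 15 epochs for the two-language model and roughly 7.5 epochs for the four-language model.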

Shallow Network EN/DE

Shallow Network EN/DE/FR/ES

Training

# Caffe:
/models/{model_name}/training.sh
# TensorFlow:
python /tensorflow/train.py

Labels

0 English
1 German
2 French
3 Spanish
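
A minimal sketch of how these label indices could be mapped back to language names after prediction. It assumes the model outputs one score per class in label-index order; the names LABELS and predicted_language are hypothetical and not part of the repo.

```python
# Label indices as listed above.
LABELS = {0: "English", 1: "German", 2: "French", 3: "Spanish"}


def predicted_language(scores):
    """Return the language name whose class score is highest."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best_index]


print(predicted_language([0.1, 0.7, 0.15, 0.05]))  # German
```

A two-language (EN/DE) model would produce only two scores, which map to indices 0 and 1 of the same table.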

Training Data

For training, we used both the public Voxforge dataset and newsreel videos downloaded from YouTube. Check out the /data directory for download scripts.

Owner
Thomas Werkmeister