Automatic spoken language identification (LID) using deep learning.

Thomas Werkmeister

Last update: Apr 3, 2022

Overview

iLID

Motivation

We wanted to classify the language spoken in an audio file, a step that usually precedes NLP or speech transcription.

We implemented several model configurations with two deep learning frameworks, TensorFlow and Caffe.

Repo Structure

/data
- Scripts to download training data from Voxforge and Youtube. For usage details see below.
/Evaluation
- Prediction scripts for a single audio file or a list of files, using Caffe.
/Preprocessing
- Includes all scripts to convert a WAV audio file into spectrogram and mel-filter spectrogram images using a Spark Pipeline.
- All scripts to create/extract the audio features
- To convert a directory of WAV audio files using the Spark pipeline run: ./run.sh --inputPath {input_path} --outputPath {output_path} | tee sparkline.log -
/models
- All our Caffe models: Berlin_net, Topcoder, VGG_M
- Berlin_net: 3 convolutional layers with batch normalisation and 2 fully connected layers (shallow architecture)
- Topcoder_net: deep architecture inspired by Topcoder's spoken language identification challenge
- VGG_M: fine-tuning of VGG_M
/tensorflow
- All the code for setting up and training various models with TensorFlow.
- Includes training and prediction scripts. See train.py and predict.py.
- Configure your learning parameters in config.yaml.
- Add or change networks under /tensorflow/networks/instances/.
/tools
- Handy scripts to clean filenames, normalize audio files, and handle similar utility tasks.
/webserver
- A web demo to upload audio files for prediction.
- See the included README
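
The /Preprocessing entry above mentions converting WAV audio into spectrogram images. The repository's Spark pipeline is not reproduced here; the following is only a minimal, standard-library sketch of the underlying idea (windowed frames, a naive DFT, log magnitudes). The function names, the frame size of 64, the hop of 32, and the synthetic test tone are all assumptions chosen for illustration, not values taken from the project.

```python
import cmath
import math


def hann_window(size):
    """Return a symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1)) for n in range(size)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    size = len(frame)
    spectrum = []
    for k in range(size // 2 + 1):
        total = sum(
            frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)
        )
        spectrum.append(abs(total))
    return spectrum


def log_spectrogram(samples, frame_size=64, hop=32, floor=1e-10):
    """Split samples into overlapping windowed frames and return log magnitudes.

    Returns a list of frames; each frame holds frame_size // 2 + 1 values in
    decibels. Scaling these values to 0..255 would give one image column per frame.
    """
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = magnitude_spectrum(frame)
        frames.append([20.0 * math.log10(max(m, floor)) for m in magnitudes])
    return frames


# Hypothetical input: a synthetic 1 kHz tone at 8 kHz, standing in for decoded WAV samples.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(512)]
spectrogram = log_spectrogram(tone)

# The strongest bin of the first frame should sit at 1000 Hz * 64 / 8000 Hz.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
print("frames: {}, bins per frame: {}, peak bin: {}".format(
    len(spectrogram), len(spectrogram[0]), peak_bin))
```

A real pipeline would use an FFT library instead of the naive DFT, which is quadratic in the frame size and only practical for toy inputs like this one.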

Requirements

Caffe
TensorFlow
Spark
Python 2.7
OpenCV 2.4+
youtube_dl

// Install additional Python requirements
pip install -r requirements.txt
pip install youtube_dl

Datasets

The scripts in /data download training data (audio samples) from various sources.

Voxforge

Downloads audio samples for some languages from www.voxforge.org.

/data/voxforge/download-data.sh
/data/voxforge/extract_tgz.sh {path_to_german.tgz} german

Youtube

Downloads videos from various news channels on Youtube.
Configure channels/sources in youtube/sources.yml.

python /data/youtube/download.py

Models

We trained models for two languages (English, German) and for four languages (English, German, French, Spanish).

Best Performing Models

The top-scoring networks were trained with 15,000 images per language, a batch size of 64, and a learning rate of 0.001 that was decayed to 0.0001 after 7,000 iterations.
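
The learning-rate schedule described above can be sketched as a simple step function. This is an illustration only, not the project's training code: the function names are hypothetical, the exact iteration at which the switch happens (7,000 versus 7,001) is an assumption, and the passes-over-data figure assumes every one of the 15,000 images per language is used for training with no held-out split.

```python
def learning_rate(iteration, base_lr=0.001, decayed_lr=0.0001, decay_step=7000):
    """Step schedule: base_lr before decay_step iterations, decayed_lr from then on."""
    return base_lr if iteration < decay_step else decayed_lr


def passes_over_data(iterations, batch_size, images_per_language, num_languages):
    """Number of times the training set has been seen after the given iterations."""
    return iterations * batch_size / float(images_per_language * num_languages)


for step in (0, 6999, 7000, 10000):
    print("iteration {}: lr = {}".format(step, learning_rate(step)))

# Assumption: all images are used for training (no validation split).
for languages in (2, 4):
    print("{} languages: {:.2f} passes before the decay".format(
        languages, passes_over_data(7000, 64, 15000, languages)))
```

Under these assumptions the decay happens after roughly 15 passes over the two-language data and roughly 7.5 passes over the four-language data.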

Shallow Network EN/DE

Shallow Network EN/DE/FR/ES

Training

// Caffe:
/models/{model_name}/training.sh

// Tensorflow:
python /tensorflow/train.py

Labels

0 English
1 German
2 French
3 Spanish
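
The label indices above can be mapped back to language names when reading a model's output. The sketch below assumes the network returns one score per class in label order; the function name and the example scores are hypothetical and not taken from the project's predict.py.

```python
LABELS = {0: "English", 1: "German", 2: "French", 3: "Spanish"}


def predicted_language(scores):
    """Return the language name for the highest-scoring class index."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best_index]


# Hypothetical class scores for one spectrogram image from a four-language model.
example_scores = [0.05, 0.80, 0.10, 0.05]
print(predicted_language(example_scores))
```

A two-language model would produce only two scores, which map to English and German through the same table.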

Training Data

For training we used both the public Voxforge dataset and news videos downloaded from Youtube. See the /data directory for the download scripts.

Comments

ValueError: need more than 0 values to unpack

When I ran train.py in the tensorflow directory, I got this error. What does it mean? Is there anything else I need to do before running the training program?

opened by AispeakDK 1

umlaut

Hi Thomas, one question: did you get any insight regarding your question here? https://github.com/mozilla/TTS/issues/232#issuecomment-513465470 Would you mind sharing the solution you found for espeak-ng and the German umlauts (if you found one)?

Thank you!

opened by gsilos 0

IndexError in evaluation/predict.py (classifier.py)

Hello twerkmeister, sorry if this is trivial; it is my first experience with Caffe. I succeeded in preprocessing the audio files and training the Berlin_net model, but at the evaluation step I get:

Traceback (most recent call last):
  File "predict.py", line 47, in <module>
    predict(args.input, args.proto, args.model, args.output)
  File "predict.py", line 20, in predict
    raw_scale=255 # convert 0..255 values into range 0..1
  File "/home/sylvain/caffe/python/caffe/classifier.py", line 29, in __init__
    in_ = self.inputs[0]
IndexError: list index out of range

I have attached the complete log (output.log). Any help would be appreciated. Thanks.

opened by sgagnon-tootelo 1

Is there other advice on preprocessing audio files?

Hi, twerkmeister. I have been following your excellent work on iLID recently. The approach performs well when tested on a dataset of clean audio. However, when tested on audio recorded in natural settings, it does not perform as well. In your project, I have seen the loudness normalization step. Is there any other advice on preprocessing the audio to make it cleaner?

Many thanks.

opened by attitudechunfeng 3
