
GenreDetector app by SoftTeco

Our team continues to explore new technologies and innovative opportunities in development. This time, we want to share our experience of building the GenreDetector app. In essence, we trained a model to recognize music genres by uploading samples to the Create ML application.

First, 10 genres of music were used as a basis, plus one dissimilar class for recognizing background sounds (noise, crackling, humming, silence). The sound model was trained to recognize specific sounds (for example, the sound of a guitar, applause, sneezing, or drums), not an entire musical composition. Therefore, each training folder contains several sound files belonging to one genre. The more files there are, the higher the quality of the trained model.

Second, we dragged the samples folder into Create ML and clicked the play button so the model would start learning. After 5-10 minutes, our model was trained and ready to test: we fed it a music file and the model started listening. Sound recognition worked well in real time, and the predicted label matches the name of the folder that holds the samples for the current genre.
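The same training can also be scripted with the CreateML framework in a macOS playground. Here is a minimal sketch, assuming the samples live in one directory per genre; the paths are hypothetical, not the ones we used.

import CreateML
import Foundation

// Hypothetical layout: Sounds/Rock, Sounds/Jazz, ... with sample files inside.
let trainingData = MLSoundClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/Users/me/Sounds"))

let classifier = try MLSoundClassifier(trainingData: trainingData)

// Export the trained model so it can be dragged into an Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/Users/me/MySoundClassifier.mlmodel"))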

Here is an important note: if the results are not satisfactory, it’s better to select more suitable samples.

As a next step, we integrated the model: we dragged the .mlmodel file into the project itself, and Xcode generated a class with the same name as the model.

import CoreML

// Xcode generates the MySoundClassifier class from the .mlmodel file.
let instrumentClassifier = MySoundClassifier()
let model: MLModel = instrumentClassifier.model

Then, we connected the SoundAnalysis framework for sound analysis and recognition.
To work on a task like this, you also need the AVFoundation framework for working with media. We used AVAudioEngine, and with the help of this tool we implemented real-time recognition of sound from the device's microphone.

private func startCapturingAudio() {
    do {
        let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
        try analyzer.add(request, withObserver: resultsObserver)
    } catch {
        print("Unable to prepare request: \(error.localizedDescription)")
        return
    }

    engine.inputNode.installTap(onBus: 0,
                                bufferSize: 8192, // 8k buffer
                                format: inputFormat) { buffer, time in
        guard let channelDataValue = buffer.floatChannelData?.pointee else { return }
        let channelDataValueArray = stride(from: 0,
                                           to: Int(buffer.frameLength),
                                           by: buffer.stride).map { channelDataValue[$0] }

        // Root mean square of the samples in this buffer.
        let rms = sqrt(channelDataValueArray.map { $0 * $0 }.reduce(0, +) / Float(buffer.frameLength))
        // Convert RMS to decibels (0 dB is full scale).
        let avgPower = 20 * log10(rms)
        // Skip quiet buffers so background noise is not analyzed.
        if avgPower > -40.0 {
            self.analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
        }
    }
}
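For completeness, here is a minimal sketch of the setup this method relies on. The property names mirror the snippet above but are our assumptions, not the exact code of the project.

import AVFoundation
import SoundAnalysis

final class LiveGenreRecognizer {
    private let engine = AVAudioEngine()
    private lazy var inputFormat = engine.inputNode.outputFormat(forBus: 0)
    private lazy var analyzer = SNAudioStreamAnalyzer(format: inputFormat)
    private let resultsObserver = ReadOnlineResults() // observer class shown below
    private let soundClassifier = MySoundClassifier()

    // Buffers only start flowing into the tap once the engine is running.
    func startEngine() throws {
        engine.prepare()
        try engine.start()
    }
}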

Additionally, we converted the RMS value to decibels. Decibel values in iOS range from -160 dB, close to silence, to 0 dB, the maximum power. We only analyzed sound that exceeded -40 dB in order to weed out extraneous sounds.

The next task was to recognize a finished audio file; with the model and SoundAnalysis already in place, the analysis could begin immediately.

import SoundAnalysis

if FileManager.default.fileExists(atPath: fileUrl().path) {
    let audioFileURL = fileUrl()
    let audioFileAnalyzer = try? SNAudioFileAnalyzer(url: audioFileURL)
    let observer = ResultsObserver()
    try? audioFileAnalyzer?.add(SNClassifySoundRequest(mlModel: model), withObserver: observer)

    // Runs the classification over the entire file on the current thread.
    audioFileAnalyzer?.analyze()
}
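Note that analyze() blocks the calling thread until the whole file has been processed. SoundAnalysis also provides an asynchronous variant, sketched here:

// Asynchronous alternative: the completion handler receives `true`
// if the whole file was analyzed without being cancelled.
audioFileAnalyzer?.analyze { didReachEndOfFile in
    print("Analysis finished: \(didReachEndOfFile)")
}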

Finally, an observer was required to process the results. As soon as the analysis was completed, we calculated the percentage of each genre in the original sound recording.

For the finished sound file:

class ResultsObserver: NSObject, SNResultsObserving {
    // Raw counts of how many analysis windows were classified as each genre.
    var genres = [String: Double]()
    // Final per-genre percentages, filled in when the analysis completes.
    var genresData = [String: Double]()

    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Get the top classification.
        guard let result = result as? SNClassificationResult,
              let classification = result.classifications.first else { return }

        // Determine the time of this result.
        let formattedTime = String(format: "%.2f", result.timeRange.start.seconds)
        print("Analysis result for audio at time: \(formattedTime)")

        let confidence = classification.confidence * 100.0
        let percent = String(format: "%.2f%%", confidence)

        // Print the result as genre: percentage confidence.
        print("\(classification.identifier): \(percent) confidence.\n")
        genres[classification.identifier, default: 0] += 1
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The analysis failed: \(error.localizedDescription)")
    }

    func requestDidComplete(_ request: SNRequest) {
        genresData = [String: Double]()
        let count = genres.values.reduce(0, +)
        guard count > 0 else { return }

        // Convert raw window counts into per-genre percentages.
        for genre in genres {
            let genrePercent = (genre.value / count * 100).rounded(toPlaces: 1)
            genresData.updateValue(genrePercent, forKey: genre.key)
        }
    }
}

// Helper used above to round a Double to a fixed number of decimal places.
extension Double {
    func rounded(toPlaces places: Int) -> Double {
        let divisor = pow(10.0, Double(places))
        return (self * divisor).rounded() / divisor
    }
}
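Because the synchronous analyze() call returns only after requestDidComplete has fired, the accumulated percentages can be read straight from the observer. A small usage sketch, continuing the file-analysis snippet above:

// Print the detected genre mix, highest percentage first.
for (genre, percent) in observer.genresData.sorted(by: { $0.value > $1.value }) {
    print("\(genre): \(percent)%")
}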

For real-time recognition:

class ReadOnlineResults: NSObject, SNResultsObserving {
    weak var delegate: ReadOnlineResultsDelegate?

    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let classification = result.classifications.first else { return }

        // Only report classifications the model is reasonably sure about.
        let confidence = classification.confidence * 100.0
        if confidence > 70 {
            delegate?.determinedSoundType(identifier: classification.identifier,
                                          confidence: confidence)
        }
    }
}

protocol ReadOnlineResultsDelegate: AnyObject {
    func determinedSoundType(identifier: String, confidence: Double)
}

Then, we updated the label with the genre name:

extension ProcessRecordViewController: ReadOnlineResultsDelegate {
    func determinedSoundType(identifier: String, confidence: Double) {
        changeSoundTypeLabel(soundType: identifier)
    }
}
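SoundAnalysis delivers its callbacks on a background queue, so the label update must hop back to the main thread. Here is a minimal sketch of changeSoundTypeLabel, assuming a soundTypeLabel outlet (the outlet name is ours, not confirmed by the original project):

import UIKit

extension ProcessRecordViewController {
    func changeSoundTypeLabel(soundType: String) {
        // SNResultsObserving callbacks arrive off the main thread,
        // so UI work is dispatched back to the main queue.
        DispatchQueue.main.async {
            self.soundTypeLabel.text = soundType // hypothetical UILabel outlet
        }
    }
}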