Design, Improvement and Investigation of Neural Frequency Compression Method in Hearing Aid for Marathi Speech

Objectives: To design and evaluate a neural frequency compression method that improves speech intelligibility for Marathi-speaking hearing aid (HA) users. Methods/Statistical Analysis: The Recurrent Neural Network Frequency Compression (RNN-FC) algorithm has two main stages, classification and processing. The input speech is first segmented into discrete frames, and features are extracted from each frame in terms of signal-to-noise ratio (SNR), pitch, formant frequency and gain frequency spectral coefficient. Based on the extracted features, the sample data is classified into two categories, wanted and unwanted samples, for processing and for improvement of the SNR level. Findings: The extracted feature vectors and the training data rate are the key performance parameters of the RNN-FC method. RNN-FC was tested on Marathi-speaking HA users. In the regional Marathi language, 14-15 consonants are located in the 7-13.5 kHz frequency band. The proposed algorithm improves the classifier, with a minimum of 94% and a maximum of 96% sensitivity, specificity and accuracy. The results show an improved recognition rate for Marathi vowels, consonants and short words. Unwanted processing of vowels and consonants was reduced from 5.67% to 3.56%. Access to high-frequency speech content, measured as speech and consonant recognition ability, was enhanced for Marathi HA users. Application/Improvements: Frequency compression is widely adopted by researchers; high-frequency speech is compressed by a certain compression factor, which causes distortion at lower speech frequencies. This distortion results in a loss of information in the lower band of speech. The challenge is overcome by combining frequency compression with a neural network classifier (RNN-FC).


Introduction
Speech is a method of communication between people. Over the last 6-7 decades, speech enhancement, processing and recognition have been an attractive area of research. Speech is at the heart of human activity, as it helps people to collaborate more readily and effectively. Hearing loss strongly affects society; its causes include diseases that affect the middle or inner ear, ageing, and neglect at the early detection stage. A survey shows that 1.06% of India's total population suffers from hearing loss. In 2014, the National Sample Survey (NSS) showed that hearing impairment is the second major disability in India. High-frequency hearing loss accounts for about 9% of total disability in urban areas and 10.50% in rural areas. The degree of hearing impairment was determined by the level of a person's inability to hear. According to the NSS report, in Maharashtra people with high-frequency hearing impairment account for up to 2.291% of total disability; the percentage is higher in rural areas (2.310%) than in urban areas (2.236%). The survey also implies that about 0.4% of infants suffer from hearing loss. Hearing is the primary sense through which we first learn speech and our mother tongue. Hearing clearly from 6 to 8 months after birth is important for learning speech, language and auditory processing skills.

Hearing Aid and Current Technology
Hearing aids are widely used to restore communication capability in hearing-impaired persons. Persons with high-frequency hearing loss do not benefit significantly from a conventional HA. The limited bandwidth of the hearing device hampers listeners' access to high-frequency signals, which makes it difficult to identify consonants with a spectral peak above 5-7 kHz [1].
A frequency compression technique is commonly used, categorized as Linear Frequency Compression (LFC) or Non-Linear Frequency Compression (NLFC). LFC lowers all high-frequency components by the same constant compression factor (CF) [2]: the new frequencies are obtained by multiplying the original frequencies by the compression factor. In this method, high-frequency speech is lowered without preserving the low-frequency speech content, and vowels positioned after consonants are not preserved. In NLFC as originally proposed [3], by contrast, the high-frequency components of the original speech are compressed disproportionately more than the low-frequency components. NLFC has been developed and modified over the last few decades and is now widely used in numerous commercial hearing aid devices [4,5]. In the multi-band case, the input speech spectrum is sequentially segmented into a number of target bands, and the spectral samples in each band are compressed by a constant compression factor [6], which increases intra-speech spectral masking [7]. Wen-Hsuan implemented an NLFC approach for Mandarin speech [8], in which the high-frequency information carried by consonants is important for speech recognition. Mandarin has seven consonants located in the range of about 10-16 kHz. Xianbo Xiao implemented two frequency-lowering methods, FC and FT, and conducted weekly hearing tests to track their benefits; the experiment found improved speech intelligibility for the Chinese language [9].
Compression based on the auditory critical bandwidth using spectral segment mapping shows better results, improving the recognition score by up to 10-20%. An Extended-Bandwidth (EB) NLFC approach provides better results for the recognition of Mandarin words [10]. The effects of NLFC and EB-NLFC on the recognition of Mandarin words by listeners with high-frequency hearing loss have also been investigated [11]. Spectral subtraction is a method in which the speech spectrum is divided into contiguous, uniformly spaced frequency bands and spectral over-subtraction is applied in each band, which is useful for improving speech recognition [12].
Tobias Goehring and Federico Bolner proposed neural-network-based speech enhancement (NNSE), which estimates the frequency channels that contain more perceptually relevant information (higher signal-to-noise ratio) [13]. Speech enhancement using a neural network shows better results, although NNSE was tested only for noise-specific conditions.
In the proposed methodology, the input speech is segmented into a number of frames, and feature vectors are extracted from each frame using the NN. The signal-to-noise power of each frame is calculated using the spectral subtraction method, which gives the energy of each segmented frame. Frames with higher signal-to-noise power are considered for compression, while frames with lower SNR power are excluded from processing. The feature vector of each frame relates its information to the critical-band FFT filter, and neural network training plays a vital role in determining accuracy, specificity, sensitivity, and the false acceptance and false rejection rates [14]. The results of the above studies are promising but limited in extent. Existing methods were tested on English- and Mandarin-speaking HA users. Our proposed hybrid technique, frequency compression with a Feed Forward Neural Network, shows improvement in sensitivity, specificity and related measures.
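The difference between the two families of frequency lowering can be illustrated with a minimal Python sketch of the frequency mappings. This is an illustration under assumptions, not the mapping used in any cited device: the function names, the piecewise-linear form of the nonlinear mapping, the 4000 Hz cutoff and the ratio of 3 are all hypothetical choices made for the example.

```python
def linear_compress(f_in_hz, cf):
    """LFC sketch: every frequency is multiplied by the same factor cf (0 < cf <= 1)."""
    return f_in_hz * cf


def nonlinear_compress(f_in_hz, cutoff_hz, ratio):
    """NLFC sketch (assumed piecewise-linear form): frequencies at or below
    the cutoff are left unchanged; the distance above the cutoff is divided
    by `ratio`, so only the high band is compressed."""
    if f_in_hz <= cutoff_hz:
        return f_in_hz
    return cutoff_hz + (f_in_hz - cutoff_hz) / ratio


if __name__ == "__main__":
    # Hypothetical example values: cf = 0.5, cutoff = 4000 Hz, ratio = 3.
    for f in (500.0, 2000.0, 8000.0, 12000.0):
        print(f, linear_compress(f, 0.5), nonlinear_compress(f, 4000.0, 3.0))
```

In this sketch the linear mapping shifts a 2000 Hz component down to 1000 Hz, whereas the nonlinear mapping leaves it in place and lowers only the components above the cutoff.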
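The frame selection step described above can be sketched as follows. This is a simplified illustration, assuming a known noise power estimate and a time-domain power subtraction in place of a full spectral subtraction; the function names, the `threshold_db` parameter and the numerical floor are hypothetical and do not come from the original method.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def frame_snr_db(frame, noise_power, floor=1e-12):
    """Estimate SNR in dB by subtracting an assumed noise power estimate
    from the frame power (a power-domain stand-in for spectral subtraction)."""
    signal_power = max(frame_power(frame) - noise_power, floor)
    return 10.0 * math.log10(signal_power / max(noise_power, floor))


def select_frames(frames, noise_power, threshold_db):
    """Return the indices of frames whose estimated SNR is at or above
    the threshold; the remaining frames are left unprocessed."""
    return [i for i, fr in enumerate(frames)
            if frame_snr_db(fr, noise_power) >= threshold_db]
```

For example, `select_frames([[0.0] * 4, [1.0] * 4], 0.01, 10.0)` keeps only the second frame, since the first contains no signal energy above the noise estimate.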

Recurrent Neural Network and Frequency Compression Algorithm (RNN-FC)
The recurrent-neural-network-based frequency compressor (RNN-FC) was designed using MATLAB blocksets, as shown in Figure 1.
Speech received from the microphone is fed to a pre-emphasis stage, which raises the level (dB SPL) of the high-frequency bands and lowers the amplitudes of the lower bands (eq. (1)).
After pre-emphasis, the input speech is segmented into frames of 20 ms to 40 ms with 50% (±10%) overlap between consecutive frames. If the frame is too short, the resolution of narrow-band components is sacrificed, which degrades frequency resolution; if it is too long, the signal properties change within the frame, which degrades time resolution. Therefore the standard 25 ms frame length was selected [15]. Feature extraction is the most important step for speech intelligibility. The speech signal used for feature extraction is sampled at 16 kHz and quantized to 16 bits. Feature extraction was performed on each segment of the noisy signal, and the output was fed to the RNN. Feature extraction maps the high-dimensional signal information into a low-dimensional feature representation by a mapping or transformation. The transformation from the input signal space to the feature space is domain specific.
Linear predictive analysis is used to extract features from the speech; it represents the signal compactly. Linear prediction analysis is based on predicting the current sample as a linear combination of the past 'p' samples, where 'p' is the order of prediction. The predicted sample s(n) is represented in eq. (2). A Hamming window function is used for silence detection and pitch detection; the window function is expressed in eq. (3). The normalized error is denoted V(n), and the LPC gain coefficient is expressed in terms of E(n), the minimum mean squared prediction error.
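The pre-emphasis and framing steps can be sketched in Python as below. The first-order filter form y[n] = x[n] - alpha * x[n-1] and the coefficient alpha = 0.97 are common textbook choices assumed here for illustration; the source does not state the filter actually used. The function names are hypothetical. The 16 kHz sample rate, 25 ms frame length and 50% overlap follow the text.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis (assumed form): y[n] = x[n] - alpha * x[n-1].
    The first sample is passed through unchanged."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def split_frames(signal, sample_rate=16000, frame_ms=25, overlap=0.5):
    """Split a signal into fixed-length frames with fractional overlap.
    Trailing samples that do not fill a whole frame are dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames
```

With these settings each frame holds 400 samples and consecutive frames start 200 samples apart, so one second of speech yields 79 full frames.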
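As an illustration of linear predictive analysis, the sketch below estimates the predictor coefficients of one frame with the autocorrelation method and the Levinson-Durbin recursion, after applying a Hamming window w(n) = 0.54 - 0.46 cos(2πn/(N-1)). These are the standard textbook forms and are assumed here; the source does not say which LPC solution method was used, and the function names are hypothetical. A common convention, also an assumption, takes the LPC gain as the square root of the returned residual energy.

```python
import math


def hamming(n_samples):
    """Standard Hamming window of the given length."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Return (coeffs, err) where coeffs[k-1] = a_k in the predictor
    s_hat[n] = sum_{k=1..p} a_k * s[n-k], and err is the residual energy
    after the final Levinson-Durbin step."""
    w = hamming(len(frame))
    x = [s * wn for s, wn in zip(frame, w)]
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:  # silent frame: nothing to predict
        return a[1:], 0.0
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for step i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err
```

For a pure sinusoid of angular frequency θ, an order-2 predictor should come out close to a_1 = 2 cos θ and a_2 = -1, which gives a quick sanity check of the recursion.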

Key Performance Parameter
The following parameters determine the performance of the neural compression technique:
• Input/output function: keeps the processed speech at a certain dB SPL with the help of an added gain (G).
• Input/gain function: the gain is a variable parameter determined by the input dB SPL.
• Frequency response: the spectral shape of the processed speech is kept unchanged.
• Frequency and gain: set so as to maximize speech intelligibility.
• Loudness limiter: avoids uncomfortable loudness for the HA user.
The input speech frequency range is divided into six bands according to the octaves shown in Table 1.

Design of Frequency Compressor with NN Approach
The key role of the RNN is to transform the inputs into meaningful outputs. Figure 3 shows the recurrent neural network architecture; it consists of an input layer, 5 hidden layers and 1 linear output corresponding to the input. Back propagation was used to train the NN in full-batch mode over 500 epochs with a variable learning rate of 0.01-0.03 and weight 1.
Figure 3. General structure of multilayer neural network.
In a back-propagation network, the weights are adjusted so that the input patterns are classified correctly. The selected hidden layers and number of neurons are used in this neural network. In the back-propagation neural network, we first check whether the error value (E) lies within the threshold: if E > threshold value (Th), all weights are updated; otherwise the learning procedure is repeated.
The six octave bands are incorporated in the frequency compressor; the feature vector obtained from the trained neural network selects the relevant octave for further processing. Training of the neural network plays a key role in this methodology. A Marathi vowel and consonant data set is used for training and testing the neural network. Training and testing proportions range from 50% to 90%, with training increased and testing decreased in steps of 10%, up to a maximum training:testing ratio of 80:20.
Figure 4(a) shows the spectrogram of the Marathi short word "aaj" on a time scale of 1.2 s. It was segmented into a time-frequency frame unit of 0.64 s covering speech frequencies of 0-5000 Hz. The input word was processed by the frequency compression technique; the spectrogram of the frequency-compressed word is shown in Figure 4(b), where frequencies above 4000 Hz are compressed and the spectrum overlaps the low speech frequencies. Figure 4(c) shows the spectrogram of the Marathi word compressed by the neural compression technique. With this method, frequencies above 4000 Hz are compressed while the lower speech frequency content is preserved. The RNN-FC algorithm was developed in MATLAB (The MathWorks) and tested on regional Marathi-speaking hearing aid users.

Performance Parameters of System
Figure 5(a) shows the spectrum of the Marathi consonant "cha" spoken by a female speaker, with a pitch frequency range from 2 kHz to 12.5 kHz. Figure 5(b) shows the speech segmented using the RNN-FC method, and Figure 5(c) shows the difference between the original spectrum and the spectrum processed by the proposed method.
Sensitivity and specificity are statistical measures of the performance of the RNN-FC system; they describe the classification of vowels and consonants into processed and unprocessed. Sensitivity measures the proportion of positives, i.e. letters to be processed, that are correctly identified for compression. With this methodology, sensitivity rises from 94.23% to 96.552% (unwanted vowels and consonants processed fall from 5.67% to 3.56% of the training data). Specificity measures the proportion of negatives that are correctly identified; it changes from 5.26% to 31.25% (the share of vowels and consonants left unprocessed). The average accuracy of the proposed methodology is in the range of 68-76%. The Matthews correlation coefficient (MCC) is used as a measure of the quality of the classification of vowels and consonants. Training the NN from the lowest to the highest share of training data gives an MCC of 0.21-0.37.
The objective of this methodology was to design and test the usefulness of neural-network-based multi-band frequency compression in improving speech recognition by hearing aid listeners. HA listeners were selected with high-frequency hearing loss ranging between 2 and 4 kHz. The MATLAB-based algorithm was tested on a laptop with an Intel Core 2 processor, using earphones. A group of six listeners took part in the experiment; the recognition rate of the proposed method was compared with frequency compression and frequency transposition, as shown in Table 2. In Marathi, 2-3 groups of consonants have the same pronunciation and lip movement, which makes them difficult for HA users to distinguish.
Recognition of these consonant groups was tested using the RNN-FC method. Each of the 15 vowels (45 presentations in total across FC, FT and NN-FC) was played back in random order to measure the recognition score. The same procedure was followed for consonants and other words.
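The classifier measures discussed in this section follow from the four confusion-matrix counts. The sketch below uses the standard definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient; the function name and the example counts are hypothetical and are not the study's data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.
    tp/fn: positives correctly/incorrectly classified;
    tn/fp: negatives correctly/incorrectly classified."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(classification_metrics(tp=50, fp=10, tn=30, fn=10))
```

Because MCC uses all four counts, it can stay low even when sensitivity is high, which is one way to read a high sensitivity reported alongside a modest MCC.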

Conclusions and Outcomes
A neural-network-based frequency compression approach improves the classification of letters into processed and unprocessed. Training the neural network is a complex and time-consuming task, but it yields a remarkable improvement in classification and in the decision about what to process. In Marathi, many consonants lie in the same pitch frequency range, so distinguishing them for further processing is challenging. Using the proposed algorithm, these consonants were classified effectively. The approach avoids unwanted processing of Marathi letters, which helps to improve speech intelligibility for hearing aid users when recognizing vowels, consonants, words, short sentences and easily confused words.

Acknowledgement
This research work was carried out at the Priyadarshini Deaf Residential School, Shirpur, Dist. Dhule (India). We have obtained all ethical approvals from the appropriate ethics committee or institutional review board. If any issue arises hereafter, we will be solely responsible. Scientific responsibility is assumed by the authors. The Institute has given the researchers permission to use its data as part of their experimental study, and has no objection to publication of the experimental data in a conference or journal paper.