EEG is an electrophysiological signal that reflects unique cognitive and neurological information about a person; thus, EEG has the potential to serve as a robust biometric. EEG provides a high level of security since it is very difficult to reproduce the EEG patterns of a specific person. In addition, in the event that a person is subjected to a forced EEG reading against his or her will, the EEG biometric system may detect the person's stress and deny access. Deep learning (DL) models have achieved state-of-the-art results in a wide range of clinical applications and biometric identification problems, ranging from phone authentication to bank security systems. The use of DL models for biometric identification has been increasing in recent years, and within the field of EEG biometrics, DL models have been leveraged to improve the accuracy of identification systems. Most machine learning projects run on ordinary computers or Graphics Processing Units because of the computing power they offer. However, since the purpose of this project is to develop a portable Internet of Things (IoT) system, it is not feasible to use common computers. In this work, we use the Raspberry Pi as the core of the system, as it is compact, portable, and Wi-Fi capable. The Raspberry Pi is widely used for portable IoT applications because of its versatility, and it has been employed in a variety of projects, including monitoring the health status of patients, security systems, and testing devices, to name a few. It has also been used in projects based on EEG signals: for example, in [12], EEG is used to control a car, and, in [13], it is used to monitor the depth of anesthesia. The purpose of this work is to develop a first approximation to a fully functional portable system for subject identification that uses trained DL models to process the signals.
This paper is organized as follows: the data description is presented in Section 2; Section 3 describes the software implementation for subject identification using EEG signals; Section 4 shows the hardware implementation of the system; Section 5 reports the results; finally, the discussion and conclusions are drawn in Section 6.

BED is a dataset specifically designed to test EEG-based biometric approaches that use relatively inexpensive consumer-grade devices. The dataset includes EEG responses from 21 subjects to 12 different stimuli, broadly divided into four types, namely affective stimuli, cognitive stimuli, visual evoked potentials, and resting state. Each stimulus contains data across three chronologically disjoint sessions. Fourteen-channel EEG signals containing the channels AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4, as shown in Figure 1, were collected at a sampling rate of 256 Hz. The BED dataset includes the raw EEG recordings with no preprocessing and the log files of the experimental procedure in text format. The EEG recordings were segmented, structured, and annotated according to the presented stimuli, in Matlab format. The BED dataset also includes Autoregression Reflection Coefficients, Mel-Frequency Cepstral Coefficients, and spectral features extracted from each EEG segment. In this work, however, only raw EEG recordings without any manually extracted features were used. Out of the 12 different stimuli, we only used EEG signals recorded during the Rest-Closed stimulus, since closing one's eyes is an easy and natural action that could best be replicated in a real-world scenario for biometric applications. Presenting stimuli in the experimental procedure would require additional devices to show them to the individuals, which in turn would increase the complexity of the whole setup and hinder practical use in the real world.
The BED dataset was replayed through our real-time Raspberry Pi-based system as a proof of concept to test the feasibility of this work for independent practical use. The simulated data acquisition in our Raspberry Pi-based system is performed by an analog-to-digital converter that reads the analog output of a digital-to-analog converter fed with the original saved BED dataset. More details are given in Section 4.
The preprocessing of EEG signals is a crucial step in the DL pipeline because of its impact on the EEG analysis process. Without it, noisy data and artifacts can mask distinct features in the EEG signals, making it harder for the model to distinguish relevant EEG features and degrading its performance. In addition, we must pay attention to the quality of the preprocessing step, as it can itself introduce unwanted artifacts if the early stages of the pipeline are not properly addressed. For example, although ordinary average referencing improves the signal-to-noise ratio, noisy channels, on which the reference depends, can contaminate the results. Figure 2 shows the different steps of EEG signal preprocessing. The well-known PREP pipeline introduces specific functionality for referencing the data, removing line noise, and detecting bad channels in order to deal with noisy channel-reference interactions. The PREP pipeline also removes artifacts such as muscle movement, jaw clenching, and eye blinking. The pipeline consists of several steps. First, the signal is filtered using a 1 Hz high-pass filter, followed by line-noise removal using a notch filter at 60 Hz. Finally, the signal is robustly referenced with respect to an estimate of the true mean reference, thereby enabling the detection of faulty channels; these channels are then interpolated relative to the same reference. We then apply a 50 Hz low-pass filter and divide the EEG data into overlapping epochs with an overlap rate of 90 percent. Finally, we standardize the EEG signals for each channel using StandardScaler.
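The filtering, epoching, and standardization steps described above can be sketched as follows. This is a minimal illustration, not the paper's code: the filter orders, the one-second epoch length, and the function names are assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch
from sklearn.preprocessing import StandardScaler

FS = 256  # BED sampling rate in Hz


def preprocess(eeg, fs=FS, epoch_s=1.0, overlap=0.9):
    """eeg: (n_channels, n_samples) raw recording -> list of standardized epochs.

    Epoch length (epoch_s) and filter orders are illustrative assumptions.
    """
    b, a = butter(4, 1.0 / (fs / 2), btype="high")  # 1 Hz high-pass filter
    eeg = filtfilt(b, a, eeg, axis=1)
    b, a = iirnotch(60.0, Q=30.0, fs=fs)            # 60 Hz line-noise notch
    eeg = filtfilt(b, a, eeg, axis=1)
    b, a = butter(4, 50.0 / (fs / 2), btype="low")  # 50 Hz low-pass filter
    eeg = filtfilt(b, a, eeg, axis=1)

    # Overlapping epochs: 90 percent overlap means the window advances by 10
    # percent of its length each step.
    win = int(epoch_s * fs)
    step = max(1, int(win * (1.0 - overlap)))
    epochs = [eeg[:, i:i + win] for i in range(0, eeg.shape[1] - win + 1, step)]

    # Standardize each channel of each epoch (zero mean, unit variance).
    scaler = StandardScaler()
    return [scaler.fit_transform(e.T).T for e in epochs]
```

Robust referencing and bad-channel interpolation are omitted here, since in the paper they are handled by the PREP pipeline itself.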
Figure 3 depicts an EEG epoch before and after preprocessing. The deep learning models used for subject identification are based on ResNet, an extension of the neural network that adds direct skip connections to its internal residual blocks so that the gradient can flow directly through the lower layers; Inception, a convolutional neural network architecture that executes multiple operations with multiple filter sizes in parallel, avoiding the compromise of committing to a single filter size and allowing the network to automatically extract relevant features from the time series; and EEGNet, a compact convolutional neural network designed specifically for EEG, as it includes concepts and tools specific to EEG signals, such as feature extraction and optimal spatial filtering, to reduce the number of trainable parameters. These three DL models were chosen because they are state-of-the-art models that have achieved good results in various other applications. For example, ResNet and InceptionTime were designed for time series classification, while EEGNet was designed for EEG-based Brain-Computer Interfaces.
As a result, we wanted to examine the use of these models for the EEG biometrics application. Modifications to ResNet include adding additional residual blocks to determine whether a more complex model, capable of extracting more complex features, would perform better. For the Inception model, additional dropout layers and inception blocks were added, and the activation function was changed from the Rectified Linear Unit to the Exponential Linear Unit because the model was overfitting. For the EEGNet model, we fine-tuned the length of the temporal convolution in the first layer and the number of channels by trial and error. Other modifications, such as adding a GlobalAveragePooling2D layer, varying the dropout rate from 40 percent to 60 percent, and rearranging the order of layers, did not significantly improve model performance. In addition, in all our models, we added a callback function that reduces the learning rate based on the training loss. Specifically, we added a hyperparameter called patience, which is the number of epochs of non-decreasing loss values that the model runs before the learning rate is halved. Because we want to check the feasibility of EEG-based biometric identification over long periods of time, the presented models were trained using the first two of the three chronologically disjoint sessions in the BED dataset; the third session was used for testing the trained models. The models' hyperparameters, including learning rate, batch size, number of filters, kernel size, and number of epochs, were fine-tuned. The learning rate was set to 0.003 for the ResNet model and 0.009 for the Inception and EEGNet models. In addition, the number of epochs was modified: 150 for Inception and 400 for ResNet. Figures 4–6 briefly show the overall architectures of the modified ResNet-based, Inception-based, and EEGNet-based frameworks used in this work.
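The patience-based learning-rate schedule described above can be expressed compactly. In Keras this behavior corresponds to the `ReduceLROnPlateau` callback with `factor=0.5`; the self-contained class below is an illustration of the logic, not the paper's code, and its names and the `patience=5` default are assumptions.

```python
class ReduceLROnPlateau:
    """Halve the learning rate after `patience` consecutive epochs of
    non-decreasing training loss (illustrative re-implementation)."""

    def __init__(self, lr, patience=5, factor=0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")  # lowest loss seen so far
        self.wait = 0             # epochs since the loss last improved

    def step(self, loss):
        """Call once per epoch with the training loss; returns the current LR."""
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # e.g. halve the learning rate
                self.wait = 0
        return self.lr
```

For example, with an initial rate of 0.009 and `patience=2`, two stagnant epochs after the last improvement drop the rate to 0.0045.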
The models were evaluated using the third of the three chronologically disjoint sessions provided in the dataset. The trained model takes one EEG segment from the testing dataset at a time and outputs a prediction of which person it belongs to. Once all the EEG segments in the testing dataset have been predicted, we use the confusion matrix to evaluate model performance. The evaluation of the machine learning algorithms was carried out by comparing the most relevant indices for the prediction of the subjects. Accuracy is the ratio of correct predictions to the total number of instances evaluated. Precision measures the fraction of patterns predicted as a positive class that actually belong to it. Recall measures the fraction of positive patterns that are correctly classified. The F1 score is the harmonic mean of precision and recall. Finally, the Precision vs. Recall curves are obtained to provide a graphical representation of the DL models' performance. In this experimental work, data acquisition is simulated using data from an existing database, as mentioned in Section 2. Real-time EEG acquisition is left for future work, where we intend to implement the best deep learning model obtained from this study along with real-time signal acquisition for biometric application. Keeping in mind our goals for future work and the existing time constraints, we decided to simulate analog EEG input in this work, as described later in this section. The system includes a Digital-to-Analog Converter (DAC) in charge of converting the stored data to analog signals, mimicking real EEG acquisition scenarios. The DAC is the 12-bit MCP4725 chip with an Inter-Integrated Circuit (I2C) communication bus. Since the converter is a 12-bit converter, the range of digital values that can be converted is from 0 to 4095.
Therefore, before converting the EEG signals stored in the memory of the Raspberry Pi to analog signals, the data were transformed by scaling each value to the range from 0 to 4095. The design includes an electronic loop with an Analog-to-Digital Converter (ADC) that transforms the analog EEG signal from the subjects, in this case the DAC output, into a digital signal for processing by the machine learning algorithms. The ADC is the 10-bit MCP3008 chip with a Serial Peripheral Interface (SPI) bus. The analog data are converted to digital signals by the 10-bit architecture, so the digital data range is from 0 to 1023. As a result, we simulated the acquisition of EEG signals by means of an electronic loop between an ADC and a DAC using the data from the BED dataset. Figure 7 illustrates the complete hardware of the system and its connections. The acquired signals are processed by the system controller, a Raspberry Pi 4 Model B with 4 GB of RAM, which performs the tasks of capturing the EEG signals, preprocessing them using the pipeline, and processing them through the machine learning algorithms to identify the subject. The Raspberry Pi is a single-board computer based on Linux. The programming language used is Python, as it is the most suitable for machine learning applications owing to the large number of available libraries. Task management is carried out by threads that run in parallel: the first thread performs continuous EEG data acquisition through the ADC, storing the samples in a buffer, and the second thread is in charge of preprocessing and classifying, in real time, the samples stored in the buffer. The result of the subject identification can also be obtained on other edge devices, such as a PC or a server; here, the result is displayed on the terminal, indicating the identity of the subject.
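The scaling step at the start of this section, mapping stored EEG samples into the 12-bit code range of the MCP4725 DAC, amounts to min-max scaling. The sketch below is illustrative; the function name and the choice to map a constant signal to code 0 are assumptions.

```python
import numpy as np


def to_dac_codes(signal, bits=12):
    """Min-max scale a 1-D signal into integer codes in [0, 2**bits - 1]."""
    x = np.asarray(signal, dtype=float)
    full_scale = (1 << bits) - 1  # 4095 for the 12-bit MCP4725
    lo, hi = x.min(), x.max()
    if hi == lo:                  # constant signal: map everything to code 0
        return np.zeros(x.shape, dtype=int)
    return np.round((x - lo) / (hi - lo) * full_scale).astype(int)
```

The 10-bit MCP3008 readings on the other side of the loop span 0 to 1023, so the round trip loses 2 bits of resolution relative to the DAC codes.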