Eye-tracking based quantification of the safety of human-machine interfaces of complementary protective system functions

Complementary protective measures are of increasing importance with the rising degree of automation. As free robots become part of our daily life in industry, on shop floors and beyond, the overall safety of persons has to be ensured. However, assessing the reliability of complementary safety functions remains a challenge, particularly when humans are in the loop. The paper shows how to use the eye-tracking methodology to gain data for assessing the reliability of the human interaction with machine interfaces for complementary protective measures. The paper first identifies factors relevant for eye-tracking, then selects related eye-tracking test parameters and finally provides a systematic procedure to assess both, in particular regarding visibility and susceptibility. The methodology is applied and the parameter selection is validated. It is found that, in particular, the identified and measured parameters fixation count for the area of interest (AOI) and the associated average visit duration can be used to assess the factor perceptibility. The parameter deviation of fixation can be used to assess usability. Based on this, a full-scale eye-tracking assessment is proposed for the reliability of the interaction of humans with the machine interfaces of supplementary protective measures. In summary, the preliminary test run shows that eye-tracking technology is a promising method for measuring and quantifying human reliability when interacting with safety-related human-machine interfaces.


INTRODUCTION
Due to technological evolution, the human role changes from a part of the production chain towards the control system. In a similar way, humans ultimately have to control the ever-increasing number of automated systems (robots) that surround them and serve their purposes, in future even at home.
As the level of automation increases, control elements that include human-machine interfaces (HMI) become even more important. Such elements include complementary protective measures that ultimately allow autonomous systems to be stopped safely. It was shown that HMIs need to be considered in the assessment of the reliability of safety-related system functions in the same way as technical elements, as they are an integral part of the generalized sensor-logic-actor chain (Figure 1) (Zehetner, Weber, Häring, & Riedel, 2018).
In this context, the question arises which overall safety assessment approach or methods can be used. They need to be appropriate, reproducible, valid and acceptable to experts and practitioners. Quantified expert estimates and Human Reliability Analysis (HRA) methods of the first and second generation certainly support such a safety assessment process. However, their use for a specific human-machine interface is limited due to their focus on human error prediction within more generic human performance assessment in organizational contexts. Still, they can be expected to be used for a first quantitative estimate and to define test scenarios (Zehetner et al., 2018). Hence, an extended chain of methods is needed for the assessment of safety-related system functions with humans in the loop. In particular, eye-tracking technology is identified as one part of the extended method chain (Zehetner et al., 2018). This approach is substantiated by Jianjun Jia (2017), who notes that errors can also be caused by adaptive (unintended) capabilities and that bad interfaces lead to mishandling. In this context, the importance of HMIs is discussed by several authors (Anuar & Kim, 2014; Jia, Huang, & Zhang, 2014; Liu, Hwang, Hsieh, Max Liang, & Chuang, 2016).
Therefore, there is in particular a need to assess the design of human-machine interfaces with respect to their reliable use. To this end, the user perception has to be considered. Current methods, techniques and tools for evaluating and measuring user experience are mainly drawn from usability research (Fu Guo, Yi Ding, Weilin Liu, Chang Liu, & Xuefeng Zhang; van der Laan, Hooge, Ridder, Viergever, & Smeets, 2015). The aim of this type of research is to identify contributing factors in order to determine good and well-perceived interaction of user and interface. However, no strong focus is placed on reliability and the time needed for interaction.
Conventional methods, such as interviews, focus group approaches and questionnaires, have generally been unsuccessful, as described for instance in Guo et al. (2016). They depend on the willingness of the users, and on their often lacking capability or expertise, to describe the interaction with the test object. Nevertheless, the mentioned methods are seen at least as a part of an overall tool chain to ensure a broad basis for safety assessment.
The user's perception can be analyzed using eye-tracking technology. Several studies were conducted in which eye-tracking was used, cf. Peschel & Orquin (2013) or van der Laan et al. (2015). In the light of the above context and existing research, it is a promising method for the assessment of HMIs of safety-related system functions.
For this reason, this paper first gives in Section 2 a methodological overview of the eye-tracking technology and its application in research. This shows the various areas of similar applications and the parameters used for evaluation.
Based on this review, Section 3 presents the experimental design, focusing on the materials used and the implementation of the preliminary test run.
Section 4 identifies and discusses a measurable set of parameters regarding their use for assessing the reliability of HMIs. A preliminary test run is proposed, conducted and evaluated for promising candidate parameters. Section 4 finally selects a set of parameters for the reliability assessment of the interaction of humans with safety elements of supplementary protective measures of machinery.
In this way, the characteristics of the parameters for the described HMI context can be identified. The background is that the methodology consists of pre-selecting, from the literature, eye-tracking approaches and parameters expected to be suited to assessing the safety of supplementary protective measures of machinery, which is a new application.
Finally, conclusions are drawn regarding the suitability of the proposed eye-tracking approach to better assess the safety of supplementary protective measures in Section 5.

METHODOLOGY
The basis of this paper was a broad literature review. It was conducted, among others, via the online databases ScienceDirect®, Springer-Link and Google Scholar. Keywords covering the eye-tracking method, real test procedures and the safety context were used as search terms. In a first step, the use of eye-tracking technology in research was reviewed to identify similar applications and to infer the potential of the methodology and promising application modes.
For this purpose, eye-tracking parameters were divided into basic and combined parameters. Basic parameters (BP) represent the raw data as measured: time, fixation, saccade and pupil diameter. Combined parameters (CP) are combinations of the basic parameters.
These four basic parameters can be related to the so-called "areas of interest" (AOI). The term used in the literature varies from "area of interest" to "region of interest" to "interest area" to "LookZone". For the present paper, the term "area of interest" (AOI) is used.
An AOI is a specific area that is delimited during the assessment of the eye-tracking data after the test runs. Typically, several zones are defined before the experiment, but they can usually only be included in the evaluation after the data have been recorded. The objects to be assessed are defined as AOIs, and the quality of the evaluation depends on the quality of their definition. With a simple graphical representation of the areas, usually only a graphical representation of the output is possible. If, however, finer parameters such as the spatial resolution of the AOI are available, the raw data can also be evaluated in greater detail. An AOI is therefore not itself a directly measurable parameter.
Other parameters are analyzed with respect to these zones. Examples include the total fixation time on an AOI and the time the test person spends revisiting it.
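As a minimal sketch of how combined parameters relate to AOIs, the following Python example assigns fixations to rectangular AOIs and derives the fixation count, the first fixation duration, the total fixation duration and the number of visits for one AOI. The rectangular AOI representation, the names `AOI`, `Fixation`, `assign_aoi` and `aoi_parameters`, and all coordinates and durations are illustrative assumptions; they are not taken from the analysis software or the data used in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular area of interest in scene-image pixels."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class Fixation:
    """One fixation: onset time, duration and position (basic parameters)."""
    start_ms: float
    duration_ms: float
    x: float
    y: float


def assign_aoi(fix: Fixation, aois: Sequence[AOI]) -> Optional[str]:
    """Return the name of the first AOI containing the fixation, else None."""
    for aoi in aois:
        if aoi.contains(fix.x, fix.y):
            return aoi.name
    return None


def aoi_parameters(fixations: Sequence[Fixation], aois: Sequence[AOI], target: str) -> dict:
    """Derive combined parameters for one AOI from a list of fixations."""
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    durations = []
    visits = 0
    previous_on_target = False
    for fix in ordered:
        on_target = assign_aoi(fix, aois) == target
        if on_target:
            durations.append(fix.duration_ms)
            if not previous_on_target:
                visits += 1  # a new visit starts when the gaze enters the AOI
        previous_on_target = on_target
    return {
        "fixation_count": len(durations),
        "first_fixation_duration_ms": durations[0] if durations else None,
        "total_fixation_duration_ms": sum(durations),
        "visit_count": visits,
    }


# Invented example data: one AOI and four fixations.
aois = [AOI("emergency_stop", 100, 100, 200, 200)]
fixations = [
    Fixation(0, 200, 150, 150),    # on the AOI
    Fixation(250, 300, 160, 140),  # still the same visit
    Fixation(600, 150, 400, 400),  # outside every AOI
    Fixation(800, 100, 120, 180),  # revisit of the AOI
]
result = aoi_parameters(fixations, aois, "emergency_stop")
print(result)
```

The sketch treats consecutive fixations on the same AOI as one visit, so a revisit is counted whenever the gaze returns after at least one fixation elsewhere. Other visit definitions are possible and would change the visit count.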
The following paragraphs list measurable eye-tracking parameters documented in the literature that were identified as relevant for the reliability assessment of HMIs. They are classified into basic and combined parameters and are deduced from related fields and comparable applications. A detailed table with all parameters inspected and their classification according to the taxonomy just introduced is given in the Appendix to this paper.
Xiao et al. (2018) used salient object detection to identify the region of interest in an image and provided an efficient solution for its semantic understanding. It was shown that eye-tracking data can accurately track where in an image human interest lies. This approach was introduced to improve image processing. In their study, they propose a new saliency detection model that combines superpixel segmentation (i.e. pre-marking of potential areas of interest) with eye-tracking data. They use the basic parameter fixation and the combined parameter (first) fixation duration.
Bhavsar et al. (2017) used eye-tracking to develop proactive strategies to prevent human error. The aim was a better understanding of the situation awareness of control room operators. The results demonstrated that the proposed measurement of the combined parameter (first) fixation duration reliably identifies the situation awareness of the participants during various phases of abnormal situation management. In summary, Bhavsar et al. used the combined parameter (first) fixation duration. In a previous study, the same authors identified the existence of specific eye gaze patterns that reveal operators' cognitive processes. The paper further develops this cognitive engineering-based approach and proposes novel quantitative measures for operators' situation awareness. The proposed measures are based on eye gaze dynamics and are evaluated using experimental studies.
Dzeng et al. (2016) created a digital building construction site and designed a hazard-identification experiment involving four workplaces featuring obvious hazards (e.g. electrical shocks) and unobvious hazards (e.g. person falls or collapses). Eye-tracking was used to compare the fixation patterns of experienced and novice workers. They employed the basic parameters fixation and saccade and the combined parameters (first) fixation duration and scan path.
Ha et al. (2016) used eye movement data to evaluate operator thoughts in a nuclear power plant simulator. Using this method, about 80 percent of their considerations could be inferred correctly. The basic parameter fixation and the combined parameters (first) fixation duration and dwell time on object were used.
Kodapully et al. (2016) demonstrated that eye-tracking can serve as a reliable sensor of various cognitive tasks performed by the operator while managing process abnormalities. For their study, they measured eye gaze movements in the context of individually defined areas of interest (AOIs). They measured the time spent on each AOI (dwell time), the transitions between the AOIs, the AOI count and the entire length of the task. This corresponds to the basic parameter fixation and the combined parameters (first) fixation duration and scan path.
Park et al. (2016) used eye movement data to infer human behavioral intentions in the context of viewing pictures while operating under different intentions, which necessitated cognitive search and affective appraisal. The study selected the following basic parameters: fixation counts and duration as well as pupil diameter between nonspecific and specific task groups. In summary, the basic parameters fixation, saccade and pupil diameter and the combined parameters fixation count and fixation duration were analyzed.
Sharma et al. (2016) used eye-tracking to assess the behavior of the operator in a typical control room of a chemical plant. Experimental studies conducted on 72 participants reveal that fixation patterns contain signatures of the operators' learning and awareness within various situations. Implications of these findings for human error in process plant operations are discussed. The fixation pattern included the following set of parameters: gaze position, fixation, horizontal and vertical positions of both eyes, distance from the eye tracker and pupil diameter. In summary, they used the basic parameters fixation and pupil size and the combined parameter coordinates.
Khalighy et al. (2015) provided a methodology to quantify the qualities of visual aesthetics in product design by applying eye-tracking technology. They used the number of fixations, the standard deviation of duration of fixations and the common area of fixations. In summary, they used the combined parameters number of fixations, deviation of fixation and fixation density.
Tsai et al. (2012) used sequential analysis of fixated "LookZones" to compare the scan patterns between successful and unsuccessful problem solvers. These LookZones can be seen as areas of interest. The sequential analysis comprised fixation count, fixation duration and average fixation duration. This allowed the combined parameters (first) fixation duration and fixation density to be determined.
Underwood et al. (2002) found that novice drivers tend to restrict their search of the road on dual-carriageways, relative to the scanning observed in experienced drivers. The study determined whether the difference was the result of novices having limited mental capacity remaining besides maintaining vehicle control, or whether it resulted from an impoverished mental model of the events likely to occur on a dual-carriageway. The combined parameter (first) fixation duration was used.
Hyönä et al. (2002) used eye-tracking to identify reading strategies of adults. In total, eight eye movement parameters were computed for each sentence in two distinct text types. College students were recorded as they were reading two multiple-topic expository texts for the purpose of summarizing each text from memory. The frequency and duration of fixations were classified into four categories for each sentence in each text: forward fixations, re-inspections, look backs and look froms. With respect to the processing of a text's topic structure, the processing measures were computed for six types of sentences. The first five sentence types all had potentially unique relevance to the processing of the topical structure of an expository text. The sixth sentence type had no particular relevance and thus serves as a baseline for comparison with the first five sentence types. They used the basic parameter fixation and the combined parameter (first) fixation duration.
In summary, eye-tracking approaches have been applied in neighboring fields, and parameters were identified that should also be applicable to the quantitative assessment of the interaction of humans with safety elements of supplementary protective measures, see the table in the appendix. Section 3 next presents the equipment used for the test trials, and Section 4 lists the finally selected candidate parameters.

Materials
The test object was a DMG Mori DMU 50 eco at the mechanical workshop of Furtwangen University of Applied Sciences. The machine tool was equipped with two additional emergency stop buttons (Figure 2) on the front and rear side. Tobii Pro Glasses 2® (Tobii AB [publ], 2020a) were used as the measuring instrument to record the eye-tracking data. The sampling rate was set to 100 Hz. The accuracy of the measured values cannot be stated, as it depends on many individual factors. The manufacturer only gives values for optimal conditions (300 lux illumination, 1.5 m distance, < 15° viewing angle, black/white target/background). However, these values are not transferable to the test scenario carried out.
The subsequent evaluation was done with the software Tobii Pro Lab® (Tobii AB [publ], 2020b). Parameters that were not compiled by the software were derived manually from the exported raw data using Microsoft Excel®. Movements or fixations that could not be measured by the glasses were not recorded and appear in the raw data as an empty field.
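The handling of empty fields can be sketched as follows, assuming a simplified CSV export with the hypothetical columns `timestamp_ms`, `gaze_x` and `gaze_y`. The actual export format of the analysis software is not reproduced here, and the sample values are invented. The example counts the samples for which both gaze coordinates are present and reports the share of valid samples.

```python
import csv
import io

# Hypothetical raw-data excerpt at 100 Hz (10 ms between samples).
RAW_EXPORT = """timestamp_ms,gaze_x,gaze_y
0,512,384
10,514,380
20,,
30,,
40,520,390
"""


def valid_sample_ratio(csv_text: str) -> tuple:
    """Return (valid_samples, total_samples, ratio) for a raw export.

    A sample counts as valid only if both gaze coordinates are present.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    valid = 0
    for row in reader:
        total += 1
        if row["gaze_x"].strip() and row["gaze_y"].strip():
            valid += 1
    ratio = valid / total if total else 0.0
    return valid, total, ratio


valid, total, ratio = valid_sample_ratio(RAW_EXPORT)
print(f"{valid} of {total} samples valid ({ratio:.0%})")
```

Reporting the share of valid samples per test person is one simple way to document how much of a recording entered the evaluation.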

Testing method
The preliminary test run was carried out to verify the feasibility of the allocation of the proposed eye-tracking parameters and to validate the concept for a later round with a large number of test persons. Two different types of emergency stop buttons were tested: a standard emergency stop button and an emergency stop button with a protective shroud (Figure 2).
An inspection method following Duchowski (2007) was used as the basis for the preliminary test run. Among the different methods, the standard inspection was selected with the allocated factors (Figure 3). A combination of the "thinking aloud protocol" and performance measurement was used as the test method. The participants were therefore asked to verbalize their thoughts, feelings, and opinions while interacting with the test object as part of a user scenario. Due to the required experience and the methodical approach, test persons with an appropriate professional background and experience were selected (Duchowski, 2007).
Performance measurement according to ISO 9241 (ISO, 2002) provided in particular a usability evaluation approach based on the measured performance of pre-determined usability metrics. With this two-way approach, both conscious (c) and unconscious (uc) attention was recorded and evaluated.
The preliminary test run was conducted on Saturday, 6 April 2019, from 9 am to 1 pm in the central mechanics workshop of Furtwangen University of Applied Sciences, in a real-life setting. In order to create constant conditions, natural light was blocked by closed blinds. The only light source was the room illumination, which provided an illuminance of 1500 lux. In order to guarantee comparability of the results, care was taken to ensure that the test persons were in an apparently normal physical condition. Seven persons, all male and aged from 21 to 37, performed the test. None of the test subjects wore glasses. All subjects had a completed apprenticeship in a metalworking profession and at least 2 years of work experience. Their professions in metalworking ensured a suitable technical background and experience. It was assumed that the usual regular occupational medical examinations as required by law, which also concern eyesight, had taken place.
The test run was conducted in two steps (Figure 3). Before the test procedure was started, the eye-tracking glasses were calibrated on the subjects. For this purpose, the test person put on the glasses and the calibration process in the user software was started. The test person had to hold a calibration card at a distance of 1 meter and fixate it until the software signaled successful calibration.
In the first step of the test procedure, unconscious attention (uc) was measured. For this purpose, the test person was told in the phase "test uc briefing" (Figure 3) that a general assessment of a machine would be carried out. The subsequent tasks referred very generally to the design of the machine. This ensured that the test person could not deduce the test purpose from the questions. By querying the superordinate type of control elements, data sets on emergency stop buttons were also generated. In the subsequent evaluation, only the data records relevant to emergency stop buttons were evaluated.
After the uc test, the "test c briefing" followed. The test person was first informed about the purpose of the test procedure and the actual test object. With this knowledge, the test person was able to focus attention fully on the test objects in the second part of the test procedure. Several questions about the test object were asked, and the analysis evaluated the eye-tracking data recorded while the test person looked at the test object.
Data sets from video analysis and structured interviews were also obtained in this preliminary test round. This paper focuses on the use of the eye-tracking method and hence only uses selected information regarding the test persons.

Selection of suitable parameters
As shown in Section 2, eye-tracking is suitable for measuring and quantifying human reaction and behavior as well as potential interaction in various similar contexts. The proposed classification taxonomy of relevant methods can be applied, and promising parameters of the eye-tracking approaches can be identified and classified according to the taxonomy.
As the paper focuses on the use of eye-tracking to assess human reaction, behavior and interaction for the better assessment of the reliability of human-machine interfaces for safety-related functions, the parameters were allocated to the assessment factors (Figure 4). The Annex to this paper gives a broad overview of basic and combined parameters. In the following text, only the parameters that were used in the preliminary test run are shown. In Table 1, the measurable parameters accessible with eye-tracking are assigned to the corresponding factors with their quantitative characteristics.

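Table 1 (fragment): measurable eye-tracking parameters and their quantitative characteristics.
Deviation of fixations: evaluation in relation to all AOIs. Here no single assessment factor can be given.
Average visit duration on AOI: evaluation in relation to all AOIs.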
For the finally selected parameters listed in Table 1, a preliminary test run was conducted to determine the parameters and to evaluate their relevance for the factors.

Results of preliminary test run
The results of the preliminary test run are shown in Table 2. Both average and median values are given. As it was conducted as a preliminary test run with seven persons, no variances of the values are given.
The obtained numbers are consistent. For instance, the first fixation times are always shorter than the average fixation times. The total visit duration covers several visits.
Both the values of the first fixation duration and the values of the fixation duration for the AOI are less than 400 ms. According to Ha et al. (2016), this indicates that all objects are consciously and unconsciously well visible. This assumption is supported by Sharma et al. (2016) and their interpretation of the first fixation duration of AOIs.
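The comparison against the 400 ms bound can be sketched as follows. The per-person durations below are invented placeholders, not the measured values of Table 2, and `summarize` is a hypothetical helper. The sketch only illustrates how the average and median of seven test persons could be checked against the bound discussed above.

```python
from statistics import mean, median

VISIBILITY_THRESHOLD_MS = 400  # bound discussed in the text

# Placeholder first fixation durations (ms) for seven test persons.
first_fixation_ms = [180, 220, 150, 310, 240, 200, 170]


def summarize(values, threshold_ms=VISIBILITY_THRESHOLD_MS):
    """Return average, median and whether both lie below the threshold."""
    avg = mean(values)
    med = median(values)
    return {
        "average_ms": avg,
        "median_ms": med,
        "below_threshold": avg < threshold_ms and med < threshold_ms,
    }


summary = summarize(first_fixation_ms)
print(summary)
```

Requiring both the average and the median to stay below the bound is a design choice of this sketch. With only seven test persons, the median is less sensitive to a single long fixation than the average.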
In a direct comparison, the emergency stop button with protective shroud (right-hand side of Figure 2) seems to perform slightly better during unconscious perception than the standard pushbutton.
The rather small maximum pupil diameters determined correspond to the values reported in the literature in related contexts (see Table 2). Good visibility can be derived from this.
The fixation density is not evaluated within this preliminary test, as the exact automatic recognition of the fixations was not yet performed. However, the minor improvements needed for future test runs have been identified; they concern the automatic recognition of AOIs and the corresponding data transfer. Because the automatically transmitted fixation values are also required for the evaluation of the deviation of fixation, this parameter is not evaluated for the preliminary test run either.
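Once automatically recognized fixations are available, the deviation of fixation could be computed along the following lines. The sketch assumes, following the description of Khalighy et al. (2015) in Section 2, that the parameter is the standard deviation of the fixation durations. The function name and the sample values are hypothetical, and the use of the sample (rather than population) standard deviation is a further assumption.

```python
from statistics import stdev


def deviation_of_fixation(durations_ms):
    """Sample standard deviation of fixation durations in milliseconds.

    Returns None for fewer than two fixations, where it is undefined.
    """
    if len(durations_ms) < 2:
        return None
    return stdev(durations_ms)


durations = [200, 220, 180, 200]  # placeholder values, not measured data
print(deviation_of_fixation(durations))
```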
According to Bhavsar et al. (2017), a fixation count for the AOI of 3.63 and the corresponding average visit duration on the AOI of 1.47 s for the emergency stop button design during the test of type unconscious (uc, Figure 3) indicate good perceptibility. In comparison, a fixation count for the AOI of 6.50 and the corresponding average visit duration on the AOI of 4.15 s for the emergency stop button with protective shroud design during the test of type uc (Figure 3) indicate poor perceptibility.
Compared with the initial list of potential parameters (see Table 2), Table 3 lists part of the finally proposed parameters based on the test run as well as their quantitative results for an initial set of seven test persons. In addition, the following composed parameters are proposed for use within a test procedure: fixation density and deviation of fixation, see Table 2. They were shown to be within reach of the procedure. Fixation density contributes to the factor visibility, and deviation of fixation to the factor usability.
The tests showed that the parameters can be determined. Eye-tracking parameters were allocated to all measurement factors of interest, see Table 3. The two designs of Figure 2 resulted in part in significant changes of eye-tracking parameters, which could be interpreted. However, most other parameters changed only slightly.

CONCLUSIONS
In an increasingly technological workplace, the roles of humans shift from a part of the production chain towards integration into the control system. In this context HMIs of complementary protective system functions become more and more important and have to be included in the reliability assessment. This requires an extension of existing available methods. The eye-tracking technology has turned out to be a very promising additional method, as it does not depend on the conscious willingness of the test persons or any methodological pre-training.
In order to establish this approach, this paper reviews similar contexts in which eye-tracking technology has been used so far. Relevant parameters were selected from all potential parameters and were allocated to a previously developed taxonomy of factors. The parameters themselves were likewise classified throughout into basic and composed parameters.
The following parameters were considered relevant for reliable human operation of the supplementary protective measures: fixation duration, average fixation duration, pupil diameter, fixation density, fixation count and deviation of fixation. For these basic and combined parameters, characteristic values and interpretations were reviewed in the literature and applied to assess the numbers obtained from the experimental tests (see Table 3). Based on the test run, the parameters were in addition shown to be within reach of the procedure and to be relevant.
As a proof of concept, a preliminary test run was carried out to determine the parameters, demonstrating their suitability for the assessment of safety-related HMIs.
Regarding the results in Table 3, it was evident that the identified and measured parameters could be used for the evaluation of the factors identified as relevant for the assessment of the reliability of the human interaction with the HMIs. They were within the expected ranges and showed plausible changes with respect to the different supplementary protective measures. The parameters (first) fixation duration, fixation duration for the AOI and pupil diameter were found to assess the factor visibility (see also Table 3). The parameters fixation count for AOI and average visit duration on AOI are suitable for assessing the factor perceptibility. The parameter deviation of fixation assesses usability.
In summary, the preliminary test run showed that eye-tracking technology is a very promising method for the measurement and quantification of the reliability of safety-related HMIs.
In future work, beyond extending the approach to much larger sets of test persons, the learning effect could also be considered. For this purpose, the test procedure needs to be repeated at intervals. Furthermore, the data sets of the eye-tracking method could be compared with the data sets of the other methods reviewed in the introduction and be used for cross-comparisons.
A further quantitative analysis that could be conducted in parallel is video analysis. In particular, the actuation time is accessible and is one of the most important parameters in addition to those of the eye-tracking method. Actuation time is the time required for the actual interaction from disturbance signal to pushing the button. This includes the recognition of the situation, the finding of solutions as well as the realization of the action. Whereas the present eye-tracking approach covers the latter critical step, video analysis would allow embedding the whole process in the overall human task.
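As an illustration, the actuation time could be derived from annotated video frames as sketched below. The frame rate, the frame indices and the function name are assumptions made for this example only; no video data from the preliminary test run are reproduced.

```python
def actuation_time_s(signal_frame: int, press_frame: int, fps: float) -> float:
    """Time in seconds from the disturbance signal to the button press."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if press_frame < signal_frame:
        raise ValueError("button press cannot precede the disturbance signal")
    return (press_frame - signal_frame) / fps


# Hypothetical annotation: signal at frame 120, press at frame 195, 25 fps video.
print(actuation_time_s(120, 195, 25.0))
```

The temporal resolution of such a measurement is limited to one frame, i.e. 1/fps seconds, which should be kept in mind when comparing it with eye-tracking data sampled at a higher rate.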
Conflict of Interest: No conflict of interest is declared.