Machine learning in occupational safety and health: protocol for a systematic review

Industry 4.0 has shaped the way people look at the world and interact with it, especially in the work environment with respect to occupational safety and health (OSH). Machine learning (ML), as a branch of Artificial Intelligence (AI), can be used effectively to create expert systems that exhibit intelligent behavior, provide solutions to complicated problems, and process massive amounts of data. Therefore, a study is proposed to identify the best methodological practice in the light of ML. Alongside a review of previous investigations, this research aims to determine the ML approaches appropriate to OSH issues. In other words, it highlights specific ML methodologies that have been employed successfully in other areas. Bearing this objective in mind, one can identify an appropriate ML technique to solve a problem in the OSH domain. Accordingly, several questions were designed to conduct the research. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and its extension for protocols were used to draw the research outline. The chosen databases were SCOPUS, PubMed, Science Direct, Inspec, and Web of Science. A set of keywords related to the topic was defined, and both exclusion and inclusion criteria were determined. All of the eligible papers will be analyzed, and the extracted information will be recorded in an Excel form sheet. The results will be presented in narrative form. Additionally, tables summarizing the most important findings will be provided.


Background
Over the years, occupational health and safety practitioners, researchers, and managers have addressed physical, chemical, biological, ergonomic, and organizational issues in the work environment using traditional methods. However, with the rise of the Industry 4.0 era, this picture started to change. Industry 4.0 aims to improve workplaces through the digital revolution, including artificial intelligence (AI), the Internet of Things (IoT), and "smart" devices (Badri et al., 2018). AI is a branch of computer science that aims to turn computers into intelligent machines using algorithms (Eicker et al., 1991). The term was first introduced at Dartmouth College in 1956, during a scientific conference (Hsueh and Oliveira, 2019). The main AI applications employ machine learning (ML) mathematical methods (algorithms), which provide statistical (either linear or nonlinear) models to perform prediction, classification, and regression on the problems to be solved (Nicholson et al., 2019). Different ML methods have been integrated into everyday life and workplaces (Can et al., 2019; Moore, 2019). The most common applications are those embedded in physical objects such as sensors (Braun, 2009), robotic devices (Wallach and Marchant, 2019), or intelligent decision support systems (Lee et al., 2019). As an example for physical sensors, given the importance of monitoring and controlling particulate matter (PM) 2.5, Loh and Choi (2019) targeted multiple affordable sensors instead of high-end ones, employing AI methods to control the scattering of light. For robotic devices, Eder et al. (2014) studied techniques for settings in which robots work in parallel with operators, along with the dangers and the responsibilities for human safety. They concluded by recommending that safety considerations be incorporated into the design of robotic co-workers from a learning-algorithms perspective.
One study even addressed the potential effects of ML on automation and human-machine interactions (Kong et al., 2019), as the information generated and stored can also help measure, assess, and control health and safety issues effectively. In the same scope, hazard prediction in complex industrial environments can also be addressed by this field of study (Zhao et al., 2019).
ML methods have also been used for safety training (Cho et al., 2018), personal protective equipment improvement (Ellena et al., 2016), environmental health surveillance (Xie and Chang, 2019), and wellness and health promotion in different occupations and sectors such as construction, petroleum, firefighting, office work, and driving, among others (Jin et al., 2019). ML could improve occupational safety and health (OSH) measures and equipment on the one hand, and influence the quality of new methods used to measure, assess, and control hazards on the other (Moore, 2019).
Within this scope, in a recent study by Akanmu et al. (2020), a cyber-physical postural training environment was designed and equipped with ML methods to train workers to work with fewer ergonomic risks. The system involves wearable sensors and VIVE trackers that benefit from ML methods to train workers in a virtual reality environment while tracking their body kinematics. Similar works include the studies by Amrollahibuki et al. (2018) and Grabowski et al. (2014) on training mine workers. While new technologies and new employment patterns have shaped the world of work, it has become more critical than ever to identify emerging work-related occupational safety and health risks (Sarkar et al., 2017). Recognizing hazards and risks is crucial to managing them effectively and developing practical preventive actions (Howard, 2019). One example is the research by Albert et al. (2017), which includes both recognizing micro-level hazards of different types and evaluating the impact of the studied interventions. Novel ML applications in occupational health and safety raise important issues for understanding their limitations and possibilities and for assessing user interface effectiveness and usability. A review concerning the impact of, and best practices for, the application of ML to OSH is still missing from the literature. In light of that, this research is proposed to consider the main applications of ML in OSH, their potential, and their future challenges. The systematic review aims to address a list of relevant questions, offering a detailed analysis of the innovative and automated methods in this field and a clearer view for the stakeholders who generate and use ML environments.

Research questions
This protocol aims to define the research methodologies to conduct a Systematic Review within the scope of ML, and more specifically, in the field of OSH. Several research questions were raised to accomplish this primary objective:
1. What are the ML methods implemented in the occupational setting?
2. What is the scope of each ML implementation for a specific OSH issue?
3. In which occupational activities are ML methods being implemented?
4. How effective and usable are ML methods for improving OSH?
5. Are the ML methods improving traditional approaches?
6. What are the limitations of the ML methods?

METHODOLOGY
The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement (Page et al., 2021) was used to draw up the guidelines for the systematic review protocol, together with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) checklist (Shamseer et al., 2015).

Eligibility criteria
To be considered eligible, a study has to address at least one of the topics referred to in the list of relevant questions presented in the Objectives (Section 1.2). All papers that consider the effect of ML applied in the public health and clinical fields will be excluded. Study participants will comprise the general working population. Literature reviews will be excluded in the first phase. However, after the first set of eligible papers is found, their citations will be screened to identify other (grey) sources. Finally, only publications written in English will be considered.

Information sources
So as not to narrow the search, only generalist electronic databases will be considered in the first phase: IEEE, SCOPUS, PubMed, Science Direct, Inspec, and Web of Science. Additionally, to allow for the inclusion of as much evidence as possible, other sources will be searched for grey literature.

Search strategy
The keywords chosen for the primary search were "machine learning", "expert system", and "cognitive system", which will be sequentially combined with the main topics of interest: "occupational safety", "occupational health", and "work environment". The combinations will be as follows:
"machine learning" AND "occupational health"
"machine learning" AND "occupational safety"
"machine learning" AND "work environment"
"expert system" AND "occupational health"
"expert system" AND "occupational safety"
"expert system" AND "work environment"
"cognitive system" AND "occupational health"
"cognitive system" AND "occupational safety"
"cognitive system" AND "work environment"
In every database, the search will be carried out as follows: the combinations will be sought in the "Title+Abstract+Keywords" field (Scopus), the "Topic" field (Inspec, Science Direct, Web of Science), and the "Title" and "Abstract" fields (PubMed).
In the second stage, after the selected manuscripts have been screened, new keywords related to the subject will be identified. They will be used in new search combinations together with the keywords used previously.
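The nine combinations listed above are the Cartesian product of the three ML-related keywords and the three topics of interest. A minimal sketch of how they could be generated programmatically is given below; the function name `build_queries` is hypothetical, and the database-specific field restrictions are deliberately left out because their exact syntax differs per database.

```python
from itertools import product

# Search terms exactly as listed in the protocol.
ML_TERMS = ["machine learning", "expert system", "cognitive system"]
OSH_TERMS = ["occupational health", "occupational safety", "work environment"]


def build_queries(ml_terms, osh_terms):
    """Return one '"a" AND "b"' query string per (ML term, OSH term) pair."""
    return [f'"{ml}" AND "{osh}"' for ml, osh in product(ml_terms, osh_terms)]


queries = build_queries(ML_TERMS, OSH_TERMS)
for query in queries:
    print(query)
```

Generating the strings from two lists keeps the combinations consistent across databases and makes it straightforward to extend the search when new keywords are identified in later stages.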
Furthermore, references will also be scanned to find more articles that can provide complementary information. Finally, in the third stage of research, additional sources identified in the analyzed articles will be assessed. This procedure will be repeated in the newly selected papers until no more relevant information is obtained.
Table 1 (Appendix 1) will be used to record the number of articles remaining after the following criteria are applied: Year (published after 2010), Document type (article, article in press, and conference paper), Source type (journals and conference proceedings), and Language (English).
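The bookkeeping behind such a table can be sketched as a sequence of filters with a running count. This is only an illustration under stated assumptions: the record field names (`year`, `doc_type`, `source_type`, `language`) are hypothetical, the sample records are made up, the criteria are assumed to be applied one after another, and "after 2010" is read as strictly later than 2010.

```python
ALLOWED_DOC_TYPES = {"article", "article in press", "conference paper"}
ALLOWED_SOURCE_TYPES = {"journal", "conference proceedings"}


def apply_filters(records):
    """Apply the four criteria in order; return survivors and per-step counts."""
    steps = [
        ("year", lambda r: r["year"] > 2010),  # assumption: 2010 itself excluded
        ("document type", lambda r: r["doc_type"] in ALLOWED_DOC_TYPES),
        ("source type", lambda r: r["source_type"] in ALLOWED_SOURCE_TYPES),
        ("language", lambda r: r["language"] == "English"),
    ]
    counts = {"initial": len(records)}
    for name, keep in steps:
        records = [r for r in records if keep(r)]
        counts[name] = len(records)
    return records, counts


# Made-up records, for illustration only.
sample = [
    {"year": 2019, "doc_type": "article", "source_type": "journal", "language": "English"},
    {"year": 2008, "doc_type": "article", "source_type": "journal", "language": "English"},
    {"year": 2015, "doc_type": "review", "source_type": "journal", "language": "English"},
    {"year": 2020, "doc_type": "conference paper",
     "source_type": "conference proceedings", "language": "German"},
]
kept, counts = apply_filters(sample)
print(counts)
```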

Data management
An Excel-based file will be used to record, in table form, the total number of articles obtained throughout the search, as previously mentioned. All of the selected papers will be exported and screened to remove duplicates. All references will be managed using Mendeley software.
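Duplicate removal across databases can be illustrated with a short sketch. The matching rule used here (DOI when available, otherwise a normalized title) is an assumption for illustration and is not specified by the protocol; the field names and sample records, including the DOI strings, are hypothetical placeholders. In practice the reference manager's own duplicate detection may be used instead.

```python
import re


def record_key(record):
    """Build a comparison key: lowercased DOI if present, else normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation/whitespace runs to one space.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title)


def remove_duplicates(records):
    """Keep the first occurrence of each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records, for illustration only.
sample = [
    {"doi": "10.1000/ABC", "title": "ML for safety"},
    {"doi": "10.1000/abc", "title": "ML for Safety."},
    {"doi": None, "title": "Hazard prediction: a study"},
    {"doi": "", "title": "Hazard Prediction - A Study"},
]
unique = remove_duplicates(sample)
print(len(unique))
```

One limitation of this rule is that a record exported with a DOI from one database and without a DOI from another will not be matched, so a manual check remains necessary.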

Selection process
The selection of literature will occur in two screening phases, based on the defined eligibility criteria. In the first phase, titles and abstracts will be screened, and in the second phase, the full text of all potentially relevant literature will be reviewed. If any doubt arises during title and abstract screening, the paper will be kept as potentially relevant and reviewed in the full-text phase. In the second phase, any disagreement will be resolved by consensus through discussion between two reviewers. If the conflict remains, the paper will be passed to a third reviewer for final resolution.
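The decision rules of the two phases can be written down as a minimal sketch. The decision labels and function names are hypothetical, and the sketch only encodes the order of resolution described above (agreement, then consensus, then third reviewer); it does not model how the reviewers reach their individual decisions.

```python
from typing import Optional

INCLUDE, EXCLUDE, UNSURE = "include", "exclude", "unsure"


def screen_title_abstract(decision: str) -> bool:
    """Phase 1: keep the paper unless it is clearly excluded; doubt means keep."""
    return decision in (INCLUDE, UNSURE)


def resolve_full_text(first: str, second: str,
                      consensus: Optional[str] = None,
                      third: Optional[str] = None) -> str:
    """Phase 2: agreement stands; otherwise consensus, then the third reviewer."""
    if first == second:
        return first
    if consensus is not None:
        return consensus
    if third is not None:
        return third
    raise ValueError("unresolved disagreement: a third reviewer decision is needed")


print(screen_title_abstract(UNSURE))
print(resolve_full_text(INCLUDE, EXCLUDE, third=EXCLUDE))
```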

Data collection process
Qualitative data will be extracted from the papers included in the systematic review using a pre-structured table built by the authors. The aim will be to collect the data that accurately answer the research questions and meet the research objective. The table will be filled in with the combined results of three reviewers. If any disagreement is observed, the reviewers will try to resolve it through discussion; if the dispute persists, the fourth author's opinion will be sought.
This will help reduce bias and data extraction errors. The total number of screened studies, including those assessed against the exclusion and inclusion criteria, will be determined. Furthermore, the reasons for excluding studies during the selection process will be documented.

Data items
Tables will be used to record study statistics, including study characteristics (date of publication, country), target group, industry, study design, OSH field (musculoskeletal disorders, ergonomics, machinery, workplace safety, risk assessment), the hardware used, the ML taxonomy employed (e.g., artificial neural network (ANN), fuzzy logic), ML aims and scopes (classification, decision making, pattern recognition, prediction, analysis, regression), the requirements of these methods, and finally the limitations of the study.

Outcomes and prioritization
This systematic review aims to identify ML methods applied in OSH that involve the measurement of various parameters in different workplaces and target groups. The second objective is to analyze the reliability of the results provided by those methods. The multiple ML methods used in OSH, the methodological studies, and the list of ML methods that have been examined for OSH, with their possibilities and limitations, will be reported.
By conducting this systematic review, a summary of the available research investigating ML in OSH in different industries and for different phases of OSH, such as monitoring, assessing, and controlling, will be presented. The results obtained from this review will help researchers, stakeholders, managers, and experts identify new technologies in OSH. It will also help to recognize knowledge gaps in the current literature, thereby directing future research that will benefit and support the growth of OSH in industry.

Risk of bias in individual studies
The risk of bias in the included articles will be assessed by three independent reviewers, using the methodology suggested by Higgins et al. (2011), on a two-term scale: "high risk" or "low risk". Whenever it is impossible to discern between those categories, an "unclear risk" classification will be used. Each paper will be analyzed in light of its methodology and results in order to detect possible biases.

Data synthesis
The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement (Page et al., 2021) will guide the writing of this review. The systematic review form will be used to write this study, and data synthesis will be carried out by extracting information from the eligible articles collected in the form sheet. In the table, the significant results will be highlighted.

Protocol registration
This protocol is under registration in PROSPERO. All authors read and approved the final version.