The Remove-the-Mask Open-Source head and neck Surface-Guided radiation therapy system

Highlights • Open-source head and neck SGRT system developed to enable novel SGRT algorithm development.• Achieves a sub-degree and submillimetre surface tracking accuracy.• Allow easy access to raw surface images with high temporal resolution.• Easy to replicate and use in both laboratory and a clinical settings.


Introduction
Treatment with radiotherapy (RT) is indicated for the majority of head and neck (H&N) cancer patients [1,2].Radiotherapy drastically improves overall survival in H&N cancer patients; however, safe and effective H&N RT requires a higher accuracy than for most other treatment sites [1][2][3][4][5][6][7][8].This need for highly accurate radiation delivery stems from the necessity of generating steep dose gradients, and the radiation sensitivity of organs-at-risk (OARs) such as the brainstem, spinal cord, optical nerve, etc. and their proximity to the treated area [2,9].
To achieve the high level of precision needed, H&N RT treatments combine pre-treatment imaging to correct for day-to-day changes in internal anatomy and strict patient immobilisation with a thermoplastic mask [9][10][11].The use of such strict immobilisation causes up to 50 % of patients to experience claustrophobia and mask-related anxiety [12][13][14][15], which can be severe enough to reduce treatment adherence or trigger treatment refusal [16].Mask related anxiety can also lead to long-term mental health impacts such as symptoms of post-traumatic stress and recurrent nightmares [17,18].Immobilisation masks also partially negate the skin-sparing effect of megavoltage RT and increase skin dose by up to 18 % [19,20] on average for coplanar RT and up to 58 % for non-coplanar, non-isocentric beam delivery [21].Furthermore, the level of immobilisation that thermoplastic masks can achieve degrades with weight-loss induced changes in head anatomy [22][23][24].These changes can only be corrected through remaking the mask, a process that requires rescanning the patient and replanning, which negatively impact clinical workflow and can delay treatment.Ahn et al. found that 65 % of H&N patients in their study required a replanning at some point during their treatment, with anatomical changes leading to clinically significant tumour underdosing in 61 % of patients and to the spinal cord dose exceeding the safe threshold in 30 % of patients [25].
Open-face masks provide an alternative to thermoplastic masks that may reduce the incidence of mask-related anxiety and claustrophobia and provides similar level of immobilisation to closed masks [26], although patient motion exceeding clinical thresholds has been associated with their use [27].As a result, open-face masks are often combined with surface guided radiotherapy (SGRT) to accelerate patient positioning and provide motion monitoring in real-time without exposing the patient to harmful radiation [28].
While SGRT's usefulness in ensuring accurate treatment delivery for patients treated with open-face masks is undisputed, current SGRT systems struggle to accurately track non-rigid motion.Facial motion has been shown to reduce surface tracking accuracy [29,30] and changes in facial expression can even generate erroneous positioning correction for patients treated with an open-face mask [31].Additionally, ceilingmounted SGRT systems are subject to field-of-view obstructions by Carm linacs gantry and kV imagers, something that occurs most commonly in H&N treatment due to the treatment geometry [28].Furthermore, the shape of the human face lends itself to self-occlusion, a type of obstruction that occurs when a section of the imaged surface obstructs another section (e.g., the nose can hide the forehead or cheeks from view).Current SGRT systems solve this issue by using multiple surface imaging sensors located around the treatment room with overlapping field of view however, single sensor obstructions have been shown to reduce surface tracking accuracy [32] and the combination of self-obstruction and gantry obstruction can result in SGRT systems losing sight of the H&N during treatment delivery.The deployment of the kV imager, such as when acquiring a CBCT or for real-time Image-Guided Radiotherapy (IGRT) and large couch tilts increase the probability that a complete obstruction event will occur.
Solving those issues and creating a system that can accurately track H&N motion when faced with changes in facial expression during a RT treatment requires the development of novel surface tracking algorithms to handle surface deformation.In this study, we present the Remove the Mask (RtM) open-source H&N SGRT system.We aim to benchmark the accuracy of this system and to assess its accuracy and the factors impacting its accuracy for immobilisation-free H&N surface imaging in a radiotherapy setting.

Methods and materials
The RtM SGRT system's hardware consisted of two depth sensor cameras (Intel RealSense D415) used to image the surface of the patient.The sensors were positioned on each side of the patient, about 30 cm above the abdomen as shown in Fig. 1.This setup was chosen as it allowed the sensors to have an unimpeded view of the head, neck, and shoulders of the patient regardless of gantry angle and of whether the kV imager had been deployed on a C-arm linac.The sensors each contained 2 infrared cameras that were factory calibrated as a stereo depth camera, and a RGB camera that is factory calibrated to allow for colourisation of the surface data.Carfagni et al. have published a detailed description and characterisation of the image acquisition process of the Intel Real-Sense D415 sensor [33].
To control the surface imaging sensors and accomplish real-time surface tracking for the RtM system, we developed the Surface Guided Radiotherapy Toolkit in Python (SpyGRT) which we have made publicly available on GitHub under an MIT license [34].SpyGRT is a modular Python package allowed real-time surface data acquisition, processing and recording as well as in-depth data analysis of recorded data.Additionally, it also featured a calibration algorithm that allowed for verification that the frame of reference of the surface imaging system matched that of the treatment room.
The RtM SGRT system was used to capture, record and analyse the data in all experiment in this study.The data was captured at a frame rate of 30 fps however, due speed limits on data transfer and on writing to memory, the actual average frame rate of the recorded data was 15 fps.The average speed at which the RtM SGRT system was able to perform surface tracking was 4.5 fps.The quoted computational speed values were obtained when data capture and surface tracking were performed on a computer equipped with 32 GB DDR4 RAM and an NVIDIA RTX A5000 GPU.The surface motion data presented in this study was obtained during post-capture analysis to maximize the sampling rate of the measured motion.

Surface tracking algorithm
Like most SGRT systems, the RtM system tracks surfaces by performing a registration between a reference surface (which can either be acquired live, or imported from a prior session or the planning CT) and the most recently acquired surface image (the target surface).To increase the speed of the registration and remove non-patient points (such as the headrest) from the surface, all points outside of a pre-defined region-of-interest (ROI) are cropped from the target surface prior to the registration.
A feature of the RtM system is that the user-defined ROI is only used for the first frame in the tracking sequence.For each subsequent frame, the motion estimation for the previous frame is used to shift the ROI so that its location with respect to the patient is unchanged and therefore ensures that the target surface contains the same section of the patient's head in each frame.The motion estimation from the previous frame is used as the starting point for the surface registration of the current frame.
The registration algorithm used by the RtM system is the Open3D [35] implementation of the point-to-plane Iterative Closest Point (ICP) algorithm [36].This algorithm minimises the cost function shown in Eq. ( 1) where p and q are corresponding points on the target and reference surface respectively, T is the 6 degrees-of-freedom (DoF) rigid transformation between the target and reference surface, n p is the normal of the target surface at point p, and w pq is a weighting factor for the p, q corresponding pair and is calculated using the generalised adaptive loss function described by Barron [37].
The results of the registration are converted into a translation vector and a set of Euler angles which can be used for couch corrections.To improve tracking accuracy and preserve a high temporal resolution, the optimisation is performed in a 5-step rough-to-fine process.The first iteration of the registration is performed with the surfaces downsampled to 0.2 points per square centimeter, and the last iteration with 100 points per square centimeter.

Benchmarking the accuracy through phantom experiments
A phantom experiment was designed to test the accuracy of the RtM SGRT system using a UR16, a robotic arm capable of 6 DoF motion with < 0.2 mm and < 0.6 • accuracy and which has previously been validated for use in real-time tumour tracking QA [38,39].The phantom experiment aimed at quantifying the accuracy of the RtM SGRT system in a more controlled setting where all motion was rigid.A head phantom was secured to the robot and moved at fixed intervals of 1 mm from − 5 to 5 mm and 1 deg from − 5 deg to 5 deg one axis at a time for each translation and rotation axis.We also reproduced three real head and neck motions traces to evaluate the impact of surface deformation on the accuracy of our system.Fig. 2a shows the experimental setup.The tracking error for phantom experiments was defined as the difference between the motion measured by the RtM SGRT system and the motion trace given as an input to the UR16.

Evaluating accuracy for human participants: VISION study
The Visual Investigation of Surface Imaging for Oncological Needs (VISION) study is a human volunteer study approved by our institutional research ethics committee with the aim of acquiring surface images under similar conditions as that encountered in the clinic and to test the accuracy of the RtM SGRT system on human subjects.All participants in the VISION study provided informed consent prior to their participation.
During the study, participants were asked to lay down on a couch with their head resting on a neck support similar to those used in the clinic.Participants were asked to lay as still as possible for 10 min, and then asked to perform a series of head and facial motion: yawn, smile, slowly shake their head, frown, and swallow 5 times.Surface images were recorded throughout the entire study session with the RtM system, and the initial depth sensor temperature was measured using a temperature sensor embedded on the depth sensor chip.Initial sensor temperature was chosen for analysis as it indicates the extent to which the depth sensors were warmed up prior to the imaging session.The ground truth motion was the motion of 4 infrared reflectors located on a headgear worn by the participants and was captured with the OptiTrack (NaturalPoint, Corvallis, Or USA) commercial motion capture system which has a quoted accuracy of < 0.3 mm and < 0.1 deg.OptiTrack was independently validated using the UR16 (Universal Robots, Novi, MI, USA) robotic arm described in Section 2.3.The VISION study setup is shown in Fig. 2b.The tracking error when tracking human volunteer was defined as the difference between the motion measured by the RtM SGRT system and that measured by the OptiTrack system.
For each participant, the initial ROI was defined as the region from the boundary of the forehead and hairline to just below the chin.For subsequent frames, the ROI was adapted as described in Section 2.1.In this work, we present data from 10 VISION study participants, 5 male and 5 female, and compare the motion recorded by the RtM SGRT system with that recording by OptiTrack.
A multiple linear regression was performed to determine the impact of gender and initial sensor temperature on the error in the measured total displacement vector.A t-test was used to determine whether the coefficients of the regression were different from 0 and the statistical significance of those results.

Benchmarking of the accuracy of the RtM SGRT system
The tracking error (mean ± standard deviation) of the RtM system in tracking single axis translations and rotations was 0.0 ± 0.4 mm, 0.1 ± 0.3 mm and 0.3 ± 0.4 mm, and 0.1 ± 0.2 deg, 0.2 ± 0.2 deg and 0.0 ± 0.2 deg along the left-right (LR), superior-inferior (SI) and anteriorposterior (AP) axes respectively.For H&N motion traces reproduced on the UR16, the mean tracking error was − 0.1 ± 0.3 mm, − 0.2 ± 0.2 mm and 0.9 ± 0.4 mm, and 0.0 ± 0.1 deg, 0.1 ± 0.1 deg and 0.0 ± 0.1 deg for translations and rotations along the LR, SI and AP axes respectively.Fig. 3 shows one of the H&N motion traces reproduced on the UR16 and the motion measured by the RtM system.Fig. 4 shows the tracking error for that same motion trace.Error metrics and initial sensor temperature for all motion traces are shown in Fig. 5.

Accuracy of the RtM SGRT system in human subjects
The tracking error when tracking human volunteer was defined as the difference between the motion measured by the RtM SGRT system and that measured by the OptiTrack system.The surface tracking error of the RtM system on human subject was of − 0.1 ± 0.4 mm − 0.6 ± 0.6 mm and 0.3 ± 0.2 mm, and 0.0 ± 0.2 deg, 0.0 ± 0.1 deg and 0.0 ± 0.2 deg for translations and rotations along the LR, SI and AP axes respectively.Fig. 3 shows the recorded motion data for one of the participants.Fig. 4 shows the tracking error for that same motion trace.
The linear multiple regression showed no statistically significant correlation between gender and tracking accuracy (p = 0.37) but found that a low initial sensor temperature was correlated with worse tracking accuracy (p = 0.02).
We found that surface deformation from changes due to changing facial expression reduce the surface tracking accuracy.Fig. 6 compares the tracking error when a participant was instructed to perform forced facial motion and when the same head motion was reproduced on a rigid phantom.

Discussion
The RtM SGRT system is a novel H&N surface-guidance system that can be built with readily available consumer electronics and based on an open-source surface guidance code.In this study, the tracking accuracy of this system for both phantom and human subject was < 1 mm for translations and < 1 deg for rotations in both phantom measurements and in human subjects.This is in line with the published ESTRO guidelines on SGRT [32] and is comparable to what other studies have found for clinical SGRT systems [40,41].An important factor when evaluating the true accuracy of the RtM SGRT system is that the accuracy of the OptiTrack system which was used as ground truth is < 0.3 mm, which is similar in magnitude to the accuracy measured for the RtM SGRT system.As such, the true accuracy of the RtM SGRT system might differ slightly from the reported tracking error in this study.
The RtM SGRT system is a research H&N SGRT system that can be used to capture large datasets of surface images with high temporal resolution, and to facilitate the development, implementation, and testing of novel H&N SGRT techniques and algorithms.The couch mounted design of the RtM SGRT system bypasses the issue of occlusions from the gantry and kV imagers and allows easy installation and does not necessitate permanent modifications to the treatment vault which is ideal for a research system as it minimizes the impact on clinical practice when not in use.Furthermore, it also ensures that the tracking accuracy is not impacted by couch rotations, something which has been shown to reduce the accuracy of ceiling mounted SGRT systems [42].However, this design comes with the disadvantage of being unable to detect or verify couch motion since the SGRT sensors move with the couch.Additionally, the SGRT system would be dependant on the couch's own internal position monitoring to maintain isocenter alignment after couch corrections.
A major source of uncertainty in SGRT motion monitoring for H&N RT treatments with open-face masks is the reduced accuracy of surface tracking in the presence of non-rigid surface deformations induced by facial movements [30,31].A solution to this issue will require the development of SGRT algorithms which are more robust to non-rigid motion.C-RAD's Catalyst HD has implemented a non-rigid motion tracking [43], but previous studies have shown that surface deformations still had significant impacts on its tracking accuracy [31].

Y. Ben Bouchta et al.
Another limitation of this study is that we did not assess the motion of the internal anatomy given the observed surface motion.Given the presence of deformation in the volunteer data, it is possible that the measured surface motion was not representative of the internal motion.A similar effect As a result, the tracking error in this study represents the difference between the ground truth rigid motion and the observed motion rather than an error in the target position estimation, the more relevant quantity in a clinical setting.This effect would be particularly relevant to deeper target which are located further away from the imaged surface.
Similarly, since the point-to-plane registration outputs a 3D transformation rather than an independent translation and rotation, it is not fully feasible to differentiate between translational and rotational accuracy.Since the centre of origin of the SGRT system does not coincide with the centre of rotation of a person's head, a small error in rotation could cause a corresponding error in translation.The magnitude of this effect would increase with distance between the origin of the SGRT system and the centre of rotation.In its current phase of development, ease of adoption of the RtM SGRT system in a clinical setting is limited by the lack of a simple user interface and the lack of an easy to build couch mount for the surface imaging sensors.Future work will focus on improving ease of adoption and on using the RtM system to develop novel surface tracking algorithm that improve the robustness and accuracy of SGRT for H&N.We also plan to add functionalities that will allow the RtM SGRT system to be combined with fluoroscopic x-ray motion tracking modalities as recent studies have shown that using surface data in combination with x-ray imaging improves real-time tumour localisation accuracy compared to either modality alone [44].
The RtM system, a low-cost, adaptable and open-source SGRT system, lowers the barrier to entry to SGRT research and advances SGRT technology.To our knowledge, it is the only publicly available SGRT system that can capture surface images at high frequency in a clinical radiotherapy setting and has the flexibility to test new SGRT algorithms.In addition, while the focus of this work was on H&N RT applications, SpyGRT, the open-source software that was developed for the RtM SGRT system, does not depend on any H&N specific parameter, and can therefore be used to develop novel SGRT solutions and techniques for any RT treatment site.We believe that the RtM SGRT system will allow the radiation oncology community to develop SGRT applications to harness the full potential of surface imaging in improving RT treatments.While it is currently limited to rigid-motion tracking, we hope to develop the RtM system into a SGRT system capable of maintaining its accuracy in the presence of non-rigid facial deformation and that can enable safe and effective mask-free H&N RT when combined with real-time IGRT.
In conclusion, we have shown that the accuracy of the RtM SGRT meets both the AAPM and the ESTRO-ACROP international commissioning guidelines for SGRT systems.Furthermore, the RtM system is the only publicly available SGRT system that can capture the high temporal resolution surface data needed for the development of new SGRT techniques and algorithms while also allowing easy implementation and testing of novel surface tracking algorithms.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1 .
Fig. 1.Rendition of how the RtM SGRT system would be used in the clinic showing the depth sensors position, the lack of obstruction from a C-arm linac and a surface image of a phantom captured by the RtM SGRT system colourised by distance to the camera (left) and with real-world colour (right).A typical ROI is shown superimposed on the real-world coloured image.

Fig. 2 .
Fig. 2. Depiction of the experimental setup used for the phantom measurements (a) and of the VISION study set-up (b).

Fig. 3 .Fig. 4 .
Fig. 3. Sample translational (left) and rotational (right) motion trace of a VISION participant.The ground truth motion measured by OptiTrack is shown in black, the RtM measured motion is shown in red and the RtM measured motion when the motion trace was replicated with the head phantom on the UR16 is shown in blue.The grey box at the end of each plot shows the section of the recording when the volunteer was asked to perform different facial motion.(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5 .
Fig. 5. Boxplot of the tracking error of the RtM SGRT system for each rotational and translational axes for VISION partcipants and phantom traces with initial sensor temperature overlayed.The lower, middle and upper bound of each box represents the 25th, 50th and 75th percentile of the tracking error, respectively and the whiskers length is 1.5 times the interquartile range.The Phantom S box represents the tracking error when tracking single axis motion, whereas the Phantom 1-3 boxes represents the tracking error for the head phantom on the UR16 replicating a human subject motion trace.

Fig. 6 .
Fig.6.Close up of the grey shaded portion of each plot in Fig.4showing the translational (left) and rotational (right) tracking error of the RtM system when tracking a VISION participant (red) when tracking the head phantom while the motion trace was replicated on the UR16 (blue).The vertical line indicates where one facial motion ends and the next begin.The different motions where a yawn(I), a smile(II), a head shake (III), a frown (IV) and 5 successive swallows(V).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Ben Bouchta: Conceptualization, Methodology, Software, Investigation, Formal analysis, Writingoriginal draft, Project administration.Mark Gardner: Conceptualization, Writingreview & editing.Chandrima Sengupta: Investigation, Resources, Writingreview & editing.Julia Johnson: Visualization.Paul Keall: Supervision, Conceptualization, Funding acquisition, Writingreview & editing.