Purpose: This research aims to apply computer vision algorithms to the automated training of surgeons and the analysis of surgical footage.
Methods: Using pre-trained models, we created a dataset of 100 open surgery simulation videos annotated with 2D hand poses. We also assessed the ability of pose estimations to segment surgical videos into gesture and tool-usage segments, comparing them to physical sensors and I3D features. Furthermore, we introduced novel surgical skill proxies derived from domain experts' training advice.
Results: We achieved state-of-the-art gesture segmentation accuracy on the Open Surgery Simulation dataset. The introduced surgical skill proxies showed significant differences between novices and experts and produced actionable feedback for improvement.
Conclusion: Pose estimations achieved results comparable to physical sensors while remaining remote and markerless. Surgical skill proxies that rely on hand poses can therefore serve as a step toward automated training feedback.
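To illustrate the general idea of a pose-based skill proxy, the following is a minimal sketch, not the paper's actual proxies: it computes the total path length of a wrist keypoint trajectory, a common economy-of-motion measure. The function name and the example trajectories are hypothetical; in practice the (x, y) positions would come from a 2D hand pose estimator.

```python
import numpy as np

def path_length(wrist_xy: np.ndarray) -> float:
    """Total 2D distance travelled by a wrist keypoint across frames.

    wrist_xy: array of shape (n_frames, 2) with per-frame (x, y) positions,
    e.g. taken from a pose estimator's output. Shorter, smoother paths are
    a common proxy for economy of motion. (Illustrative only; not the
    proxies defined in this work.)
    """
    steps = np.diff(wrist_xy, axis=0)          # frame-to-frame displacements
    return float(np.linalg.norm(steps, axis=1).sum())

# Hypothetical trajectories: a direct path versus a jittery one.
direct = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
jittery = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])

print(path_length(direct))   # 2.0
print(path_length(jittery))  # ~2.83
```

Aggregated per video, such trajectory statistics can be compared between novices and experts and turned into concrete feedback (e.g. "reduce excess hand travel").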