Abstract: Vision-Language Models (VLMs) have advanced cross-modal understanding and generation, yet their domain adaptability remains limited. To address the lack of high-quality captions for fish ...
Abstract: Surgical phase recognition (SPR) is essential for surgical workflow analysis and provides immediate guidance during procedures. Existing methods aggregate frame-level information into a ...