Gaze-Based Video Summarization

Implemented, from scratch, scripts for the paper "Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization". Tested the results on videos from CMU recorded with AR glasses.

Topics Covered:

  1. Basics of CLIP Models and Multimodal ML.
  2. Backbone - MRCNN.
  3. Optimization Techniques - Submodular Optimization, Mutual Information.
  4. Clustering Algorithms - KMeans, Greedy, Temporal Star clustering.
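
To make the submodular optimization topic above concrete, here is a minimal sketch of greedy selection under a simple budget. It is not the paper's method and not code from this repository: the facility-location objective, the plain cardinality budget, and the names `cosine`, `facility_location`, and `greedy_summary` are all assumptions chosen for illustration. The toy feature vectors stand in for per-segment descriptors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def facility_location(selected, sim):
    """Coverage score: every segment is credited with its best
    similarity to any selected segment (a monotone submodular function)."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(features, budget):
    """Pick up to `budget` segment indices by repeatedly adding the
    segment with the largest marginal gain in coverage.
    Hypothetical helper: illustrative only, not the paper's objective."""
    n = len(features)
    # Pairwise similarities, clipped at zero so the objective stays monotone.
    sim = [[max(0.0, cosine(features[i], features[j])) for j in range(n)]
           for i in range(n)]
    selected, current = [], 0.0
    for _ in range(min(budget, n)):
        best_gain, best_idx = 0.0, None
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(selected + [j], sim) - current
            if gain > best_gain:
                best_gain, best_idx = gain, j
        if best_idx is None:  # no remaining segment improves coverage
            break
        selected.append(best_idx)
        current += best_gain
    return sorted(selected)


# Toy input: segments 0-1 look alike, and segments 2-3 look alike.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
summary = greedy_summary(features, budget=2)
print(summary)
```

With a budget of two, the greedy pass picks one segment from each look-alike pair, because a second segment from an already covered pair adds almost no marginal gain. For monotone submodular objectives under a cardinality constraint, this greedy rule carries the classic (1 - 1/e) approximation guarantee.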

Programming Language: Python (PyTorch)