Visual Learning and Recognition (Deep Learning for Computer Vision)

● Trained image classification models on Fashion-MNIST (multi-class) and PASCAL VOC 2007 (multi-label).
● Worked on weakly supervised object detection with a Robust AlexNet backbone on the PASCAL VOC dataset, obtaining mAP comparable to that of supervised networks.
● Deployed GAN architectures, including LSGAN and WGAN-GP, on the CUB-200-2011 dataset to generate realistic images of birds.
● Worked on open-ended Visual Question Answering on the MS COCO VQA dataset using a simple bag-of-words baseline with a GoogLeNet feature extractor, as well as co-attention networks.
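
The mAP figure mentioned above can be illustrated with a minimal sketch. It assumes per-class confidence scores and binary ground-truth labels, as in the classification setting, and uses the 11-point interpolated average precision associated with PASCAL VOC 2007. The function names are hypothetical and this is not the course's evaluation code; for detection, predictions must first be matched to ground-truth boxes by IoU, which is omitted here.

```python
def average_precision_11pt(scores, labels):
    """11-point interpolated AP for one class.

    scores: confidence per sample (higher = more likely positive)
    labels: 1 for a positive sample, 0 for a negative one
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0  # assumption: a class with no positives contributes 0

    # Rank samples by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        tp += labels[i]
        precisions.append(tp / rank)
        recalls.append(tp / total_pos)

    # Average the best precision reachable at recall >= 0.0, 0.1, ..., 1.0.
    ap = 0.0
    for k in range(11):
        threshold = k / 10
        reachable = [p for p, r in zip(precisions, recalls) if r >= threshold]
        ap += max(reachable) if reachable else 0.0
    return ap / 11


def mean_average_precision(per_class):
    """per_class: list of (scores, labels) pairs, one pair per class."""
    aps = [average_precision_11pt(s, l) for s, l in per_class]
    return sum(aps) / len(aps)
```

For example, a class whose positives are all ranked above its negatives scores an AP of 1.0, and the mAP is the unweighted mean of the per-class values.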

Learning Outcomes:

  1. Visualizing and Understanding Neural Nets.
  2. Basics of Image Segmentation, Object Detection and 3D Image Understanding.
  3. Generative Models - GANs, Autoencoders and VAEs.
  4. Few-Shot and Transfer Learning.
  5. Action Recognition and Videos.
  6. Combining Vision and Language models - VQA.

Programming Language: Python (PyTorch)