Program Description

  • The Programme aims to teach the fundamentals of deep learning and applications of deep learning models for AI tasks related to text processing and image and video processing.  
  • The Programme will help develop capabilities to build deep learning models for real-world problems.

Learning Format



Duration: 150 hours across 10 months (includes 10 hours of examination)

Program Fee

₹2,00,000 + GST (inclusive of tax)


Education Qualification

Graduates holding a B.E., B.Tech., MCA, or MSc (CS or IT) degree as of the Programme start date, i.e., the Technical Orientation date

Suggested Prerequisites

Candidates should have a basic understanding of mathematical topics such as Linear Algebra, Calculus, Probability and Statistics, and should be familiar with programming in either C or Python.

Lead Faculty

Programme Coordinator - Prof. C. Chandra Sekhar
Professor, Department of Computer Science & Engineering, IITM

Prof. Sekhar’s expertise and research interests include speech recognition, neural networks, kernel methods, machine learning, deep learning and metric learning. He is the author of many research papers published in peer-reviewed national and international journals and conferences. In 2016, he received the Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching from IITM.

Programme Faculty – Dr. Dileep A.D.
Associate Professor, School of Computing and Electrical Engineering, IIT Mandi

Dr. Dileep received his M.Tech and PhD in computer science and engineering from IIT Madras. His research interests include pattern recognition, kernel methods for pattern analysis, machine learning for speech technology, computer vision, and cloud and telecom networks. In 2020, he received the Teaching Honour Roll Award for Excellence in Teaching for the academic year 2019-20 at IIT Mandi.

Learning Schedule

Motivation for Programme, Overview of Programme, Expected Outcomes of Programme

Function approximation (Regression), Classification, Clustering, Ranking, Information retrieval
Text processing applications - Text classification, Parts-of-speech tagging, Named entity recognition, Text summarization, Text question answering, Machine translation
Image and video processing applications - Image classification, Image annotation, Image captioning, Video classification, Video captioning, Visual question answering, Visual commonsense reasoning
Speech processing applications - Speech recognition, Speaker recognition, Speech emotion recognition, Spoken language recognition, Text-to-speech synthesis, Speech-to-speech translation
Data representation - Feature extraction, Representation learning, Embeddings

Supervised learning, Unsupervised learning, Semi-supervised learning, Active learning, Self-supervised learning, Transfer learning, Domain adaptation - Zero-shot, One-shot and Few-shot learning; Federated learning

Review of basics of mathematical topics
Linear Algebra: Vectors and matrices, Inner product of vectors, Matrix multiplication by a vector, Eigenvalues and eigenvectors of a matrix, Singular value decomposition of a matrix.
Calculus: Differentiation with one variable, Differentiation with multiple variables, Differentiation of a vector and a matrix, Unconstrained optimization problem solving.
Probability and Statistics: Random variables, First order and second order statistics, Independent variables, Uncorrelated variables, Sum and product laws of probability, Probability distributions, Bayes rule.

Linear model for regression, Supervised learning, Parameter estimation - Maximum likelihood method, Overfitting, Regularization, Ridge regression, Lasso

K-nearest neighbours classifier, Bayes classifier, Normal density function, Maximum likelihood estimation, Gaussian mixture model, Naïve Bayes classifier, Decision surfaces, Dimension reduction methods - Principal component analysis, Linear discriminant analysis

McCulloch-Pitts neuron, Perceptron learning rule, Perceptron convergence theorem, Sigmoidal neuron, Softmax function, Multilayer feedforward neural network, Error backpropagation method, Gradient descent method, Stochastic gradient descent method, Stopping criteria, Logistic regression-based classifier

Deep feedforward neural networks (DFNNs), Optimization methods: Generalized delta rule, AdaGrad, RMSProp, Adadelta, Adam, Second order methods; Regularization methods: Dropout, DropConnect; Batch normalization

Basic CNN architecture, Rectified Linear Unit (ReLU), 2-D Deep CNNs: LeNet, AlexNet, VGGNet, GoogLeNet, ResNet; Image classification using 2-D CNNs; 3-D CNN for video classification; 1-D CNN for text and audio processing; Vector of Locally Aggregated Descriptors (VLAD) method for aggregation - NetVLAD

Architecture of an RNN, Unfolding an RNN, Backpropagation through time, Vanishing and exploding gradient problems in RNNs, Long short-term memory (LSTM) units, Gated recurrent units, Bidirectional RNNs, Deep RNNs

Encoder-decoder paradigm, Image and video captioning models, Machine translation, Text processing models, Representation of words: Word2Vec, GloVe

Attention-based models, Scaled dot-product attention, Multi-head attention (MHA), Self-attention MHA, Cross-attention MHA, Position encoding, Encoder and decoder modules in a transformer, Sequence-to-sequence mapping using transformer, Machine translation using transformer model, Vision transformer for image classification, Video captioning using transformer model, Bidirectional encoder representations from transformers (BERT) model for text processing, Pre-training a BERT model, Fine-tuning a BERT model for text processing tasks, Vision-and-Language BERT (ViLBERT) for image and video processing tasks, Text and visual question answering and reasoning using transformer models

Image generation models, Architecture and training of a GAN, Deep convolutional GAN, CycleGAN, Conditional GAN, Super-resolution GAN, Applications of GANs for image processing

Introduction to reinforcement learning, Markov decision processes, Policy gradients, Temporal difference learning, Q-learning, Deep reinforcement learning - Deep policy gradient, Deep Q-learning; Text processing using deep reinforcement learning - Text classification, Text summarization
