Research


Research Interests

I am mainly interested in low-complexity, resource-constrained machine learning.
Below is a list of my projects. For a more up-to-date and extensive list of projects, please check out my CV.


Accumulation Bit-Width Scaling for Ultra-Low Precision Training of Deep Neural Networks

In the summer and fall of 2018, I interned at IBM T.J. Watson. My fantastic IBM colleagues and I worked together on several interesting problems, including reduced-precision training of deep neural networks and distributed learning algorithms. In particular, within the scope of deep learning with reduced-precision floating-point arithmetic, we achieved one breakthrough: a theoretical framework that predicts the accumulation bit-width (in the mantissa sense) required by all three deep learning GEMMs. A paper on this topic was published in ICLR 2019 (posted on OpenReview and arXiv - my own PDF and Poster).
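As a rough, standalone illustration of why the accumulation bit-width matters (this is my own simplification, not the framework from the paper, and the function names are hypothetical), the sketch below accumulates a long dot product while rounding the running sum to a reduced mantissa after every addition. When the mantissa is too short, small addends are swamped by the large running sum and the result drifts away from the exact value:

```python
import numpy as np

def round_to_mantissa(x, m_bits):
    """Round x to roughly m_bits of mantissa (a simplified model of a
    reduced-precision floating-point accumulator)."""
    if x == 0.0:
        return 0.0
    e = np.floor(np.log2(abs(x)))
    step = 2.0 ** (e - m_bits)
    return np.round(x / step) * step

def accumulate_reduced(products, m_bits):
    """Accumulate a list of products, rounding the running sum after each add."""
    acc = 0.0
    for p in products:
        acc = round_to_mantissa(acc + p, m_bits)
    return acc

rng = np.random.default_rng(0)
products = rng.uniform(0.0, 1e-2, size=4096)  # many small partial products
exact = products.sum()
for m in (8, 12, 16, 23):
    approx = accumulate_reduced(products, m)
    print(f"mantissa bits = {m:2d}  relative error = {abs(approx - exact) / exact:.2e}")
```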


Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm

Many works (including our own) have focused on reducing the complexity of neural networks at inference time via precision optimization. However, the much harder problem of reduced-precision training remains largely unresolved. In this work, I analyzed and determined precision requirements for training neural networks when all tensors, including back-propagated signals and weight accumulators, are quantized to a fixed-point format. This project culminated in a paper published in ICLR 2019 (posted on OpenReview and arXiv - my own PDF and Poster). The code needed to generate these results is available on the Codes page of this website, which will redirect you to my GitHub profile.
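To make "per-tensor fixed-point quantization" concrete, here is a minimal sketch (my own illustrative code, not the project's actual implementation; the function name is hypothetical) that quantizes a tensor using a single fixed-point format chosen from its dynamic range. In the training problem above, a quantizer of this kind must also be applied to back-propagated gradients and weight accumulators, which is where choosing the bit-widths becomes difficult:

```python
import numpy as np

def fixed_point_quantize(tensor, total_bits):
    """Quantize a tensor to a single signed fixed-point format.

    The integer length is picked to cover the tensor's dynamic range and the
    remaining bits are fractional (an illustrative per-tensor scheme)."""
    max_abs = float(np.max(np.abs(tensor))) + 1e-12
    int_bits = int(np.ceil(np.log2(max_abs))) + 1   # sign + integer part
    frac_bits = total_bits - int_bits
    step = 2.0 ** (-frac_bits)
    q = np.round(tensor / step) * step
    bound = 2.0 ** (int_bits - 1) - step            # saturate symmetrically
    return np.clip(q, -bound, bound)

x = np.random.randn(4, 4).astype(np.float32)
print(fixed_point_quantize(x, total_bits=8))
```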


Numerical Precision of Deep Neural Networks

Much of the research in the machine learning community focuses on enhancing accuracy and functionality with minimal consideration of energy costs. Only in 2015-2016 did papers on determining the precision requirements of neural networks start to appear at machine learning conferences. In this work, I obtained theoretical guarantees on the minimum precision requirements of neural networks. The goal of the project is to determine precision assignments of weights and activations in an analytically sound manner while reducing the need to run lengthy simulations. A paper on the topic was published in ICML 2017 (PMLR version + personal version with supplementary material in the same PDF). A follow-up work on this topic, with a fine-grained analysis and improved empirical results, was published in ICASSP 2018 (PDF). The code needed to generate these results and evaluate the costs is available on the Codes page of this website, which will redirect you to my GitHub profile.
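For background (this is the standard uniform-quantization noise model, not the specific bounds derived in the papers above): a B-bit signed quantizer over [-1, 1) has step size and noise variance

```latex
\Delta = 2^{-(B-1)}, \qquad
\sigma_q^2 = \frac{\Delta^2}{12} = \frac{2^{-2(B-1)}}{12},
```

so each additional bit reduces the quantization noise power by a factor of four (about 6 dB); the analytical question is how much of this noise the network's accuracy can tolerate at each weight and activation.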


Gradient Based Learning with Binary Activations

In the summer of 2017, I interned at the IBM T.J. Watson Research Center. The internship focused on training deep neural networks with limited precision. Specifically, I worked on a method to apply gradient-based learning to binary-activated networks. This work was published in ICASSP 2018 (PDF).
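The difficulty is that a binary activation has zero gradient almost everywhere. A common workaround is the straight-through estimator, sketched below (this is the generic technique, shown for illustration, and not necessarily the exact method in the paper):

```python
import numpy as np

def binary_activation_forward(x):
    """Binarize pre-activations to {-1, +1}."""
    return np.where(x >= 0.0, 1.0, -1.0)

def binary_activation_backward(x, grad_out):
    """Straight-through estimator: pass the gradient through where |x| <= 1,
    treating the binarization as identity inside that window."""
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-1.7, -0.3, 0.2, 2.5])
g = np.ones_like(x)
print(binary_activation_forward(x))       # [-1. -1.  1.  1.]
print(binary_activation_backward(x, g))   # [0. 1. 1. 0.]
```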


Fixed-point Hyperplane Classifiers

This project is the precursor to the one above on the numerical precision of deep neural networks. This research seeks to bring rigor to the design of fixed-point learning systems, which is currently done by trial and error. Specifically, we characterized the trade-off between precision and accuracy for support vector machines (SVMs). We derived several bounds that analytically predict the precision requirements of a fixed-point SVM, both for classification and for training with stochastic gradient descent (SGD). A paper about this topic appeared in ICASSP 2017 - check it out here. The extended version is published as a JETCAS paper.
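As a toy illustration of the trade-off being characterized (my own sketch under simplifying assumptions, not the bounds from the paper), the following trains a linear SVM with SGD while keeping the weight vector on a fixed-point grid; with too few fractional bits, the updates round away and training stalls:

```python
import numpy as np

def quantize(w, frac_bits):
    """Round to a fixed-point grid with the given number of fractional bits."""
    step = 2.0 ** (-frac_bits)
    return np.round(w / step) * step

def fixed_point_svm_sgd(X, y, frac_bits, lr=0.1, lam=1e-3, epochs=10):
    """Linear SVM trained with SGD on the hinge loss, with the weight vector
    kept on a fixed-point grid after every update (illustrative sketch)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (w @ xi)
            grad = lam * w - (yi * xi if margin < 1 else 0.0)
            w = quantize(w - lr * grad, frac_bits)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
for b in (2, 4, 8):
    w = fixed_point_svm_sgd(X, y, frac_bits=b)
    acc = np.mean(np.sign(X @ w) == y)
    print(f"fractional bits = {b}  training accuracy = {acc:.2f}")
```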


PredictiveNet

This project proposes a simple but highly efficient architectural idea to reduce the computational cost of convolutional neural networks (CNNs). The idea, pitched by my collaborator Yingyan Lin, is to decompose the computation at each convolutional layer into MSB and LSB parts. If the result of the MSB part of some output is negative, the overall output is itself highly likely to be negative. In such a case, the remaining LSB processing is bypassed (clock-gated). My contribution to this project was an analytical validation of the technique. A paper about this project was accepted at ISCAS 2017 and can be found here.
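A rough sketch of the idea (my own simplification for illustration; the variable names and the 8-bit/4-bit split are hypothetical choices, not the paper's exact configuration): compute only the MSB portion of a ReLU-followed dot product first, and skip the LSB portion whenever the MSB partial sum predicts a negative output:

```python
import numpy as np

def split_msb_lsb(x_q, total_bits, msb_bits):
    """Split integer fixed-point data so that x_q = (msb << shift) + lsb,
    with 0 <= lsb < 2**shift."""
    shift = total_bits - msb_bits
    msb = x_q >> shift                  # arithmetic shift keeps the sign
    lsb = x_q - (msb << shift)
    return msb, lsb

def predictive_relu_dot(w_q, a_q, total_bits=8, msb_bits=4):
    """Evaluate ReLU(w . a) by computing the MSB part of w first.
    If the MSB-only partial sum is negative, predict a zero output and skip
    the LSB stage (the case that saves computation); the prediction can
    occasionally be wrong, since the skipped LSB term is nonnegative here."""
    w_msb, w_lsb = split_msb_lsb(w_q, total_bits, msb_bits)
    shift = total_bits - msb_bits
    partial = (w_msb << shift) @ a_q    # MSB-only estimate of the dot product
    if partial < 0:
        return 0, True                  # LSB stage skipped
    full = partial + w_lsb @ a_q        # refine with the LSB contribution
    return max(int(full), 0), False

rng = np.random.default_rng(0)
w_q = rng.integers(-128, 128, size=64)  # 8-bit weights
a_q = rng.integers(0, 128, size=64)     # nonnegative (post-ReLU) activations
out, skipped = predictive_relu_dot(w_q, a_q)
print(out, "(LSB stage skipped)" if skipped else "(LSB stage computed)")
```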


Compute Sensor

This is the work of my previous collaborator, Dr. Sai Zhang. The idea is to bring computation to the bitlines and cross-bitlines of a sensor array using mixed-signal techniques. My contribution was setting up the algorithm and validation dataset, as well as post-layout verification. Check out our arXiv paper on the topic.