Projects
2021
- Viral Tweets Prediction Challenge (ongoing competition): Applying machine learning techniques, including Random Forests and PCA, to predict which tweets go viral based on user and tweet data. Currently ranked #19 on the leaderboard.
- Hot Spot and Hot Cell Analysis: Identified the 50 most significant hot zones within New York City by performing Hot Zone and Hot Cell Analysis on NYC taxi data using the Spark SQL module of Apache Spark. Read data from HDFS and performed spatial operations and computations using GeoSpark APIs.
- Lazy Text Predict: Programmed the first prototype of lazy-text-predict, an open-source library for easily training and evaluating multiple deep-learning-based text classification methods.
Currently developing the continuous integration and testing pipeline for future development.
2020
- Data Driven Disaster Response: Designed an interactive D3.js-based dashboard for visualizing a city's social media data to aid disaster response during a natural disaster.
Categorized the social media messages into resource categories using statistical metrics and Latent Dirichlet Allocation (LDA) and applied rule-based sentiment analysis using NLTK.
Developed a set of interconnected visualizations, including line charts, pie charts, and heat maps, to view the frequency of a resource need or a particular emotion in any part of the city at any time.
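The categorization and sentiment steps above can be sketched in miniature. This sketch is a simplified stand-in: the project used LDA and NLTK's rule-based analyzer, whereas here the resource keywords and sentiment lexicon are small hypothetical examples chosen only to show the shape of the pipeline.

```python
# Toy resource categorization and rule-based sentiment scoring.
# The keyword map and lexicon below are hypothetical examples, not the
# project's actual LDA topics or NLTK lexicon.
LEXICON = {"stranded": -2.0, "help": -1.0, "safe": 2.0, "thanks": 1.5}
CATEGORIES = {"water": "water", "food": "food", "shelter": "shelter", "medicine": "medical"}

def categorize(message):
    """Map a social media message to resource categories via keyword matching."""
    tokens = message.lower().split()
    return {CATEGORIES[t] for t in tokens if t in CATEGORIES}

def sentiment(message):
    """Average lexicon score over matched tokens; 0.0 means neutral."""
    scores = [LEXICON[t] for t in message.lower().split() if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

In the dashboard, outputs like these would be aggregated by location and time window before being rendered by the D3.js views.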
- Meal Detection using CGM data: Trained a Random Forest model on Continuous Glucose Monitoring (CGM) time-series sensor data to automatically classify when a diabetic patient eats a meal, achieving an accuracy of 94%.
Preprocessed the data and extracted meaningful temporal and frequency-based features from the CGM time-series data.
Applied K-means and DBSCAN algorithms to bin the CGM data into clusters for further analysis by doctors.
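The feature-extraction step for the CGM windows can be sketched as follows. The specific features shown (glucose range, maximum rate of change, dominant FFT magnitude) are illustrative assumptions, not the project's exact feature set; the resulting vectors would feed the Random Forest classifier.

```python
import numpy as np

def cgm_features(window):
    """Extract simple temporal and frequency features from one CGM window.

    These features are illustrative stand-ins for the project's engineered
    feature set: glucose range, max rate of change between consecutive
    readings, and the strongest non-DC FFT magnitude.
    """
    window = np.asarray(window, dtype=float)
    diffs = np.diff(window)
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    return {
        "range": float(window.max() - window.min()),
        "max_rate": float(diffs.max()) if diffs.size else 0.0,
        "dominant_freq_mag": float(spectrum[1:].max()) if spectrum.size > 1 else 0.0,
    }
```

A meal typically shows up as a sharp rise in glucose, so features like `max_rate` tend to separate meal windows from non-meal windows.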
- Polling accuracy for US elections: Created two D3 visualizations that frame the same polling dataset with opposite narratives, arguing for and against trusting election polls. The idea was to stage a debate: one visualization argues the affirmative position, while the other argues the negative.
- Burrito Visualization: Uses minimal iconic representations of burrito ingredients to show the user which particular rating is being viewed. Made as a minimal artistic expression, the goal is to communicate certain aspects of the data effectively and creatively, as opposed to supporting the kind of in-depth analysis a domain expert might perform.
- Tweet Sentiment Extraction (Kaggle competition): Fine-tuned several BERT models and other NLP transformers, such as XLNet and RoBERTa, in a question-answering format to extract the substring of a given tweet that expresses the tweet's labeled emotion.
Achieved a Jaccard similarity score of 0.687, better than around 600 other submissions.
Programmed in Python and utilized PyTorch, transformers, Pandas, and Jupyter notebooks.
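The Jaccard score reported above compares the predicted substring against the ground-truth substring at the word level. A minimal sketch, assuming the standard word-set formulation:

```python
def jaccard(str1, str2):
    """Word-level Jaccard similarity between two strings:
    |intersection| / |union| of their lowercased word sets."""
    a, b = set(str1.lower().split()), set(str2.lower().split())
    if not a and not b:
        return 1.0  # two empty spans are considered identical
    return len(a & b) / len(a | b)
```

Averaging this score over all test tweets yields the leaderboard metric, so a score of 0.687 means the extracted spans overlap heavily with the annotated ones on average.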
- Model Integration with Mixture-of-Experts: Researched and studied various methods for integrating machine learning models based on their similarity.
Programmed a proof-of-concept model that integrates three separate Convolutional Neural Networks (CNNs), the experts, each trained to classify a different subset of classes within the MNIST dataset. This showcases the feasibility of a dynamic machine learning system that can be expanded or contracted based on the use case.
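The combination step of such a system can be sketched as a gating function that scatters each expert's local class probabilities into the global label space. This is a hypothetical stand-in for the trained CNNs and gate, illustrating only the dispatch/merge mechanics:

```python
import numpy as np

# Toy mixture-of-experts combiner: each "expert" scores only its own
# subset of MNIST digits, and gate weights blend their outputs.
# The class partition and weights here are hypothetical examples.
N_CLASSES = 10
EXPERT_CLASSES = [(0, 1, 2, 3), (4, 5, 6), (7, 8, 9)]

def combine(expert_probs, gate_weights):
    """Scatter each expert's local probabilities into the global class
    space, weight by the gate, and renormalize to a distribution."""
    full = np.zeros(N_CLASSES)
    for probs, classes, w in zip(expert_probs, EXPERT_CLASSES, gate_weights):
        for p, c in zip(probs, classes):
            full[c] += w * p
    return full / full.sum()
```

Adding or removing an expert only changes `EXPERT_CLASSES` and the gate's output size, which is what makes the system expandable or contractible.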
- Redesign of venezias.com: Analyzed multiple design flaws in the user interface (UI) and overall functionality of the venezias.com website using heuristic evaluation and cognitive walkthrough.
Redesigned several elements of the homepage, menu page, and the coupons page to make the UI reliable and easier for the user to navigate.
Utilized the Axure RP tool to create the prototype.
2019
- Semantic Text Similarity (STS) on biomedical text: Preprocessed the clinical text to remove stop words, punctuation, etc., and utilized various pre-trained word2vec models to extract token embeddings and create a single vector representation for each sentence.
Fine-tuned multiple Bidirectional Encoder Representations from Transformers (BERT) models on the given STS dataset and extracted vector representation for each sentence.
Engineered several similarity features based on the extracted sentence vectors and applied gradient boosting regression to achieve a Pearson correlation greater than 0.84 between the ground truth and the model’s predictions.
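The similarity features fed to the gradient boosting regressor can be sketched for a pair of sentence vectors. The particular features below (cosine, Euclidean distance, mean absolute difference) are illustrative assumptions rather than the project's exact feature set:

```python
import numpy as np

def similarity_features(u, v):
    """Pairwise features for two sentence vectors; in the project, rows
    of such features were the input to gradient boosting regression."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return {
        "cosine": cos,
        "euclidean": float(np.linalg.norm(u - v)),
        "abs_diff_mean": float(np.abs(u - v).mean()),
    }
```

Building one feature row per sentence pair turns the semantic-similarity task into ordinary tabular regression against the human-annotated similarity scores.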
- Machine Learning Algorithms: Implemented several machine learning algorithms, including Logistic Regression, Naive Bayes, K-means, and a Convolutional Neural Network.
- Text-to-face generation: Investigated and summarized various methods for generating facial images from text descriptions. Collected a dataset of text descriptions for hundreds of images from the Labeled Faces in the Wild dataset and utilized word2vec to create text embeddings. Programmed a Keras implementation of StackGAN (a variant of Generative Adversarial Networks) and trained it to generate facial images using the collected dataset.
2018
- Crop yield prediction based on temperature and rainfall for India: Predicted temperature and rainfall for a set of Indian districts using a Recurrent Neural Network (RNN) and its variant, Long Short-Term Memory (LSTM), and selected the method with the least mean absolute error.
Used the rainfall and temperature predictions to further predict the yield of various crops in Indian districts using several methods: Linear Regression, Random Forests, K-Nearest Neighbors (KNN), and a Feed-Forward Network; a comparative analysis showed Random Forests gave the least error.
Used Pandas, NumPy, Scikit-learn, Keras, and Matplotlib in Jupyter notebooks for implementation.
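The model-selection step described above, picking whichever method yields the least mean absolute error, can be sketched generically. The method names and predictions in the usage are hypothetical placeholders:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between ground truth and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.abs(y_true - y_pred).mean())

def best_method(y_true, predictions):
    """Return the name of the method with the least MAE, mirroring the
    comparative-analysis step; `predictions` maps method name -> preds."""
    return min(predictions, key=lambda name: mean_absolute_error(y_true, predictions[name]))
```

The same helper serves both stages: choosing between RNN and LSTM for the weather forecasts, and choosing among the four regressors for the yield predictions.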
- Secure Soldier Monitoring System: Built a compact health and location monitoring system for soldiers on a battlefield using Raspberry Pi, Arduino, and sensors to capture body temperature, heart rate, and GPS coordinates, along with a panic button and an LCD to display messages. Re-engineered a blockchain prototype in Python to store the AES-encrypted data transmitted from the monitoring system via GSM in an immutable and trustworthy fashion. Accepted for publication in the 2019 International Conference on Signal Processing and Communication.
- LuaRocks API: Refactored the core functionality of LuaRocks commands for listing, uninstalling, and showing details of packages, searching for and installing rocks from the web, opening documentation, linting the rockspec, selecting a rock tree, etc., to modularize them. Programmed a complete Application Programming Interface (API) that provides access to LuaRocks functionality using object-oriented design principles.
- LuaRocks GUI: Designed a responsive and interactive web-based GUI using HTML, CSS, Bulma, and Vue.js to give access to LuaRocks functionality. Interfaced the GUI with the LuaRocks API in the backend using CGILua.
- Neural network compiler: Programmed a basic neural network compiler to define layers, activation functions, etc. Built with Python Lex-Yacc and Keras.
2017
- PyDetection and BliStick: Built a Python library based on the "OpenDetection" C++ library to standardize code written specifically for object detection. Developed an app called "BliStick" that uses the PyDetection library (served via a server) to help visually challenged people identify familiar/friendly faces and humanoid figures.
- 2-level visual encryption: Built a Python tool for basic image encryption. Encryption converts the image to string format, applies AES, and then applies a modified (2, n) share encryption scheme.
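The second level, share encryption, can be sketched with a simplified 2-of-2 XOR scheme: either share alone is indistinguishable from random noise, and combining both recovers the AES ciphertext. This is a minimal stand-in for the modified (2, n) scheme, and the AES step is assumed to have already produced `ciphertext`:

```python
import os

def make_shares(ciphertext):
    """Split bytes into two XOR shares. Each share alone is uniformly
    random; XOR-ing both shares recovers the original bytes."""
    share1 = os.urandom(len(ciphertext))
    share2 = bytes(a ^ b for a, b in zip(ciphertext, share1))
    return share1, share2

def combine_shares(share1, share2):
    """Recover the original bytes by XOR-ing the two shares."""
    return bytes(a ^ b for a, b in zip(share1, share2))
```

A full (2, n) scheme would generate n shares such that any two suffice for recovery; the 2-of-2 case above shows only the core XOR mechanics.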
- Anomaly detection on Intel lab data: Applied Simple Moving Average (SMA), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and LSTM to detect anomalous readings from the dataset's various sensors. Used Pandas, NumPy, scikit-learn, Keras, and Matplotlib.
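The SMA approach can be sketched directly: flag any reading that deviates too far from the trailing moving average. The window size and threshold below are hypothetical; in practice they would be calibrated per sensor:

```python
import numpy as np

def sma_anomalies(readings, window=3, threshold=2.0):
    """Return indices where a reading deviates from the trailing
    simple moving average of the previous `window` readings by more
    than `threshold` (in the sensor's own units)."""
    readings = np.asarray(readings, dtype=float)
    flags = []
    for i in range(window, len(readings)):
        sma = readings[i - window:i].mean()
        if abs(readings[i] - sma) > threshold:
            flags.append(i)
    return flags
```

DBSCAN and LSTM-based detectors complement this baseline by catching density outliers and sequence-level anomalies that a simple moving average misses.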