About

Learn more about me


Leveraging Expertise in AI, Machine Learning and Deep Learning



  • Age: 25
  • Phone: +1 (838) 231-6344
  • Location: Albany, New York, United States
  • Degree: Master of Science in Data Science
  • Email: vinayvaida@gmail.com

I’m Vinay Vaida, a research-driven Data Scientist (M.S. in Data Science, SUNY Albany) who turns massive, messy datasets into business-critical results. At the New York State Department of Health, I built an anomaly-detection pipeline that scans 238M+ records and blocks bad data in real time, and I deployed a deep-learning classifier that identifies pathogens in patient samples, cutting DNA-sequence turnaround by 95%. I package these models in Docker/Airflow on GCP and surface insights through Tableau, reducing reporting effort by 80% and accelerating leadership decisions. I’m eager to bring this end-to-end ML and cloud-engineering expertise to a forward-thinking team. Let’s discuss how I can deliver the same impact for you. Reach me at vinayvaida@gmail.com or connect on LinkedIn.


Skills

Python 100%
R 90%
C 100%
SQL 90%
Power BI / Tableau 95%
Machine Learning 90%
AWS 75%
Microsoft Azure 80%
Data Mining 95%
Regression Modeling 90%
Statistical Modeling 100%
Time Series Analysis 100%
Machine Vision 90%
HTML / CSS 100%
Problem-Solving 100%

Interests

Software Development

AI

Transformers

GPT

Visualization

Algorithms

Image Processing

Machine Vision

Resume

View My Resume

Summary

Vinay Vaida

I’m Vinay Vaida, a data scientist at the New York State Department of Health who turns complex medical data into fast, practical answers for public-health teams. In the past year, I built an automated system that scans more than 200 million health records and instantly flags mistakes, saving staff hours of manual checks. I also trained an AI model that reads ambulance and hospital notes and spots potential opioid-overdose cases with 95% accuracy, giving officials an early warning to act. Most recently, I created a deep-learning tool that identifies harmful skin-fungus DNA in seconds, so doctors can start treatment sooner. I package all these solutions in easy-to-use dashboards and cloud apps so that decision-makers see clear insights instead of raw numbers, helping New York respond faster and more effectively to public-health threats.

Education

Master of Science in Data Science

May 2024

University at Albany, SUNY

GPA: 3.7/4

Coursework: Topological Data Analysis, Machine Learning, Deep Learning, Applied Statistics using R, Linear Algebra, Optimization

Professional Experience

Research Scientist II

January 2024 - June 2025

New York State Department of Health, Albany

  • Division of Science and Technology
  • Predictive Modeling: Developed and implemented a K-means clustering model to detect anomalies in file submission patterns of the Universal Public Health Node (UPHN) database, improving monitoring by identifying irregular behavior based on submission time, submitter, department, and payload size.
  • Data Exploration and Interpretation: Conducted rigorous data mining and feature engineering on the 238-million-row UPHN database, uncovering inconsistent monthly file submission trends that improved data-driven decision-making.
  • Data Visualization: Forecasted incorrect file submissions from 2016 to 2023 with 94% accuracy using time series models. Created Tableau visualizations to illustrate submission trends, enabling stakeholders to interpret anomalies and make informed, data-driven decisions.
  • Patient Claims Analysis: Analyzed over 2M patient claims with SAS PROC SQL across payers, facilities, prescriptions, and medication expenses, uncovering equity gaps in medication access and reimbursement trends by income, race, gender, and disability to inform policy and access decisions via interactive dashboards.
  • Wadsworth Clinical Research Center
  • AI-Driven Pathogen Detection: Designed and deployed an AI model that applies NLP techniques to DNA sequence text data to identify harmful fungi, classifying 386+ samples in under 3 seconds with 0.98 precision and improving outbreak tracking across eight Northeast U.S. states.
  • Automated Clinical Pipeline: Engineered a Streamlit-based cloud tool on GCP using Docker and Python with a drag-and-drop GUI for automated DNA analysis, enabling non-technical users to run complex workflows easily and cutting processing time by 95%.
  • Data Warehousing: Developed a Python pipeline integrating 2.95M records from multiple sources to automate monthly updates for NYS DOH's Candida auris Tableau dashboard, enabling real-time case monitoring across Northeast U.S. states.
  • MycoSNP Pipeline: Streamlined weekly GCP pipeline runs (Java and Python) on patient samples, applying K-means and hierarchical clustering to trace Candida auris transmission and support real-time outbreak reporting with the CDC.

Data Analyst

February 2021 - July 2022

Cognizant Technology Solutions, Hyderabad

  • Data Mining and Business Analytics: Mined and analyzed structured and unstructured data from company databases, including product shipment logs, manufacturer costs, and wholesale pricing, to drive optimization of marketing techniques and product development.
  • Model Selection and Evaluation: Collaborated in an Agile setting with a cross-functional team of 7 on model selection, testing, and evaluation. Assessed the impact of variables and features on model performance to guide feature selection, parameter tuning, pricing strategy refinements, and A/B testing.
  • Custom Algorithm Development: Applied regression and multivariate data analysis using Python, SQL, and R to develop custom models and algorithms that optimize product profitability, enhancing data-driven decision-making by 20%.
  • Time Series Forecasting: Built and executed 3+ time series forecasting models to estimate future sales and shipment volumes from historical sales and logistics data spanning thousands of rows.

Portfolio

My Works

UberData Insights

UberData Insights: Analyzing Uber Data with Mage Pipeline and BigQuery

Analyzing Uber Data on Google Cloud Platform

  • Data Modeling: Fact and Dimension Tables
  • Mage Pipeline
  • Data Analysis
  • BigQuery
Crime Analysis in Louisville, KY

  • R
  • Feature Engineering
  • Time Series Analysis
  • Serious and Violent Crime Model
  • SARIMA Forecasting

TweetFlow

TweetFlow - Automating Twitter Data Pipeline with Apache Airflow

  • Amazon Web Services
  • Twitter API
  • Data Pipeline
  • Directed Acyclic Graph
Predictive Modeling of House Prices

House Price Prediction

  • Python
  • Statistical Analysis
  • Feature Engineering
  • Linear Regression
Airline Tweets Sentiment Analysis

  • Python
  • Data Augmentation
  • Logistic Regression
  • SVM
Solar Radiation Prediction

  • Linear Regression
  • K-means Clustering
  • Principal Component Analysis
  • Decision Tree
Face Mask Detection

  • Data Augmentation
  • TensorFlow
  • Convolutional Neural Network
  • OpenCV
Financial Forecasting and Trend Analysis

  • Financial Analysis of Stocks
  • Trend Analysis
  • Time series forecasting with ARIMA
Sentiment Analysis of Elon Musk Tweets

  • Python
  • Text Processing
  • Natural Language Processing
Power BI Dashboards

  • Facebook Ad Campaign Analysis
  • Financial Customer Analysis
  • Road Accident Analysis
  • Sales Performance Dashboard
Trip Package Automation

  • Java
  • Web Development
  • Automation
  • Apache POI
Calculate Trip Cost

  • Java
  • Automating Web Browser
  • TestNG
  • Jenkins

Contact

Contact Me

Address

New York, United States

Mobile

+1 (838) 231-6344
