Atharva K

About

I'm an aspiring software professional with a diverse technical skill set spanning cloud platforms, machine learning, and software development. My cloud background includes the AWS Cloud Foundations certification, comprehensive training through the AWS Graduate Academy, and working knowledge of Azure cloud services gained through several Azure courses. Although I am new to professional software development, I've built a solid foundation in Python programming, machine learning concepts, and data analysis. Through self-study and hands-on projects, I've developed the skills needed to contribute effectively while continuing to learn and grow. I'm now eager to bring my technical knowledge and enthusiasm for innovation to a professional role where I can make meaningful contributions to challenging projects.

ML / AI

CLOUD

Microsoft Azure Services | Amazon Web Services | Docker | Git

Data Science

Projects

View some of my latest projects

Handwritten Equation Solver using HOG & SVM

Developed a machine learning pipeline that deciphers handwritten numerical equations and computes their results. The system combines image segmentation techniques such as adaptive thresholding and morphological operations to isolate individual characters. For feature extraction, it uses Histogram of Oriented Gradients (HOG) to capture shape and texture details, and these features are then classified using a Support Vector Machine (SVM) with a polynomial kernel. I employed grid search to systematically tune the hyperparameters of both the HOG feature extraction and the SVM classifier.
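A minimal sketch of the idea behind HOG, not the project's actual code: for one image cell, gradient orientations are binned into a histogram weighted by gradient magnitude. The function name, the 9-bin default, and the toy image are assumptions made for illustration; a full HOG descriptor also normalizes histograms across blocks of cells, which is omitted here.

```python
import math

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations (0 to 180 degrees) for one cell,
    weighted by gradient magnitude. `cell` is a 2D list of grayscale values."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Skip the border pixels so central differences stay inside the cell.
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Toy input (hypothetical): a vertical edge, dark on the left, bright on the right.
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
hist = cell_orientation_histogram(edge)
```

For this toy edge, all gradient energy falls into the first bin (horizontal gradient direction), which is the kind of shape cue a classifier such as an SVM can then separate on.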

Machine Learning

NLP

Computer Vision

Image Segmentation

HOG (Histogram of Oriented Gradients)

SVM (Support Vector Machines)

Hyperparameter Tuning

Data Science

PyTorch

Feature Engineering

Polynomial kernel

GitHub Repository

AWS Hosted Portfolio

Built and deployed a personal portfolio application hosted on AWS, leveraging an end-to-end cloud infrastructure for scalability and reliability. The deployment includes AWS CloudFront as a Content Delivery Network (CDN) for faster global content delivery and EC2 instances for hosting the application. A custom VPC configuration ensures secure network isolation, while tailored security group policies manage access control. The site uses a custom domain from Namecheap, with SSL/TLS encryption configured through AWS Certificate Manager to ensure secure communication.

Django

CDN

AWS CloudFront

EC2

VPC

SSL/TLS

Domain Integration

Automated Deployment Pipeline

DNS Management

Bootstrap 5

Back-End Development

Cloud Architecture

GitHub Repository

AI Financial Analyst

Developed an AI financial chatbot that leverages a fine-tuned LLaMA-3 8B model to provide personalized investment recommendations and financial insights. The system integrates real-time market data from Yahoo Finance and NewsAPI, combining quantitative metrics with sentiment analysis of financial news to deliver actionable advice. The model was fine-tuned using QLoRA (Quantized Low-Rank Adaptation). Key features include contextual financial Q&A, sentiment evaluation, and trend analysis.
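To illustrate why low-rank adaptation keeps fine-tuning cheap, here is a small sketch in plain Python, not the project's training code. It shows the low-rank update y = W0 x + (alpha / rank) * B (A x), where the base weights W0 stay frozen and only the small matrices A and B are trained. The layer size, rank, and alpha below are hypothetical values chosen for illustration; in QLoRA the frozen base weights are additionally stored in quantized form, which this sketch does not model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, w0, a, b, alpha, rank):
    """y = W0 x + (alpha / rank) * B (A x); W0 is frozen, A and B are trainable."""
    base = matvec(w0, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y0 + scale * u for y0, u in zip(base, update)]

def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for a full weight matrix versus its low-rank adapter."""
    return d_out * d_in, rank * (d_out + d_in)

# Hypothetical layer size and rank, for illustration only.
full, adapter = lora_param_counts(4096, 4096, 16)

# Tiny worked example: identity base weights plus a rank-1 adapter.
y = lora_forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[1.0], [2.0]], alpha=2.0, rank=1)
```

With these illustrative numbers the adapter holds 131,072 trainable values against 16,777,216 in the full matrix, under 1% of the layer.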

LLM

Large Language Model

QLoRA

yfinance

Hugging Face Transformers

PEFT

API integration

GitHub Repository

Volatility Forecasting Using GARCH Model

Created a financial forecasting model to predict market volatility using the GARCH(1,1) framework on 13 years of S&P 500 historical data. The model achieved an RMSE of 0.0058 on its volatility forecasts. Implemented a rolling-window backtesting framework to validate the model's performance and compute Value at Risk (VaR) at a 95% confidence level, yielding a VaR of 0.0205.
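The GARCH(1,1) recursion and a parametric VaR can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's code: the parameter values and returns are hypothetical, the parameters are taken as given instead of being estimated, and the VaR assumes zero-mean normally distributed returns.

```python
from statistics import NormalDist

def garch11_variance_path(returns, omega, alpha, beta):
    """Conditional variances: sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    # Start from the unconditional variance, which requires alpha + beta < 1.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2  # the last element is the one-step-ahead forecast

def parametric_var(sigma, confidence=0.95):
    """One-period VaR as a positive loss fraction, assuming zero-mean normal returns."""
    return NormalDist().inv_cdf(confidence) * sigma

# Hypothetical parameters and daily returns, chosen only for illustration.
omega, alpha, beta = 0.000002, 0.10, 0.85
returns = [0.004, -0.012, 0.007, -0.003]
variances = garch11_variance_path(returns, omega, alpha, beta)
var_95 = parametric_var(variances[-1] ** 0.5)
```

In a rolling-window backtest, this forecast step would be repeated after re-estimating the parameters on each window and comparing the forecast with the volatility realized in the following period.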

Machine Learning

Financial Risk Modeling

Volatility Analysis

GARCH

Time Series Analysis

Rolling-Window Backtesting

Risk Assessment

Quantitative Finance

Python

Statistical Modeling

GitHub Repository

Evaluate Student Summaries

Engineered an NLP pipeline to predict the quality of student summaries, achieving an MSE of 0.21 using a Random Forest model. The system extracts linguistic features by analyzing patterns, syntactic structures, and semantic coherence to capture the nuances of written content. In addition, a chi-square statistical analysis module was incorporated to pinpoint the discriminative vocabulary between high- and low-quality summaries, providing actionable insights for educational feedback.
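As a rough sketch of how a chi-square test can surface discriminative vocabulary (an illustration, not the project's module), each term is scored from a 2x2 table of term presence against quality label. The function names, the whitespace tokenization, and the four toy documents are assumptions made for this example.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table:
    a = high-quality docs containing the term, b = low-quality docs containing it,
    c = high-quality docs without it, d = low-quality docs without it."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term appears in every document or in none: no signal
    return n * (a * d - b * c) ** 2 / denom

def term_scores(docs, labels):
    """Score each vocabulary term; labels are 1 (high quality) or 0 (low quality)."""
    token_sets = [set(doc.lower().split()) for doc in docs]
    n_high = sum(labels)
    n_low = len(labels) - n_high
    scores = {}
    for term in set().union(*token_sets):
        a = sum(1 for toks, y in zip(token_sets, labels) if y == 1 and term in toks)
        b = sum(1 for toks, y in zip(token_sets, labels) if y == 0 and term in toks)
        scores[term] = chi_square_2x2(a, b, n_high - a, n_low - b)
    return scores

# Hypothetical toy corpus, for illustration only.
docs = ["the author argues clearly", "the author explains evidence",
        "the stuff happens", "the stuff is things"]
labels = [1, 1, 0, 0]
scores = term_scores(docs, labels)
```

Terms that occur in only one class ("author", "stuff") receive the highest scores here, while a term shared by every document ("the") scores zero.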

Machine Learning

NLP

Feature Engineering

Python

Feature Extraction

Chi-Square Analysis

Linguistic Analysis

GitHub Repository

Bim-Viewer

Developed a concise 3D BIM visualization engine that loads and renders Industry Foundation Classes (IFC) files with precision. Utilizing a modular Python codebase, the application integrates the PyQt5 GUI framework with an OpenGL rendering pipeline and leverages IfcOpenShell for accurate BIM file interpretation. The viewer features an interactive navigation system with smooth pan, zoom, and rotation capabilities enabled by PyQt5 event handling, allowing users to seamlessly explore and analyze detailed building models.

BIM Visualization

IfcOpenShell

OpenGL

PyQt5

Python

GitHub Repository

Movie-Data-Analysis

This project involves the development of a comprehensive movie data analysis dashboard, focusing on trends and insights from movies released between 2021 and 2023. Data was collected using the OMDB API and stored in a PostgreSQL database, with SQLAlchemy used for database integration. The analysis pipeline utilized Pandas for data cleaning and transformation, followed by detailed exploratory data analysis (EDA) to uncover patterns in movie ratings, view counts, and title word frequencies.
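The title word-frequency step of such an analysis can be sketched with the standard library alone. This is an illustrative example, not the project's code: the stopword list, the tokenization pattern, and the movie titles are all made up for the sketch.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the project).
STOPWORDS = frozenset({"the", "of", "a", "and", "in"})

def title_word_frequencies(titles, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across a list of titles."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts

# Invented titles, used only to exercise the function.
titles = ["The Last Harbor", "Harbor of Glass", "A Night in Glass City"]
freq = title_word_frequencies(titles)
```

The resulting counter can be passed to `freq.most_common(n)` to obtain the top words for a bar chart.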

Python

Pandas

Data Visualization

Exploratory Data Analysis (EDA)

SQLAlchemy

PostgreSQL

Database integration

API integration

Seaborn

Matplotlib

Statistical Analysis

Trend Analysis

GitHub Repository

Dynamic Embedding Model for Retrieval-Augmented Generation (RAG)

This project builds a dynamic embedding pipeline that classifies incoming queries with a Random Forest model trained on a custom dataset and then selects the most suitable domain-specific embedding model. It re-embeds both queries and documents, indexes them in an in-memory vector database using ChromaDB, and uses cosine similarity for retrieval. The approach ensures that context-rich prompts are constructed for language model generation, improving semantic relevance and retrieval accuracy in a retrieval-augmented generation (RAG) system.
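The retrieval step can be illustrated with a minimal cosine-similarity ranking in plain Python. This is a sketch, not the project's implementation: it stands in for what a vector database does internally, and the document IDs and 3-dimensional vectors are hypothetical (real embedding models produce far larger vectors).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical embeddings, for illustration only.
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], doc_vecs)
```

In the pipeline described above, the query and documents would first be embedded with the model chosen by the classifier, so that both sides of the comparison live in the same vector space.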

Machine Learning

NLP

PyTorch

LLM

Hugging Face Transformers

Python

RAG

Embeddings

ChromaDB

Dynamic Embedding

GitHub Repository

Hi! I'm Atharva Kulkarni

Computer Engineering Graduate at Stony Brook University, NY
