The following is a selection of projects I’ve worked on over the past few years, meant to give you an idea of my interests as an ML engineer and the sort of work I’m passionate about. At the moment, I’m looking for full-time work as a mid-career ML engineer, outside of the defense industry, where I would help build data-driven software alongside other talented engineers.
Topic Search (2022)
Technologies: Python, SpaCy, Pandas, Apache Kafka, Protobuf, Docker Compose
I was contracted by a stealth-mode social media startup to build out a topic
modeling service that would power a rewards system which would help monitor and
encourage discussion related to current events, food, helpful advice, and humor.
I curated a labelled dataset out of 6 years of Reddit data and then used SpaCy to build a supervised topic classification pipeline that could be deployed in their production environment.
To test my codebase, I replicated all of the infrastructure they were using in production on my own server. This involved setting up Confluent’s Schema Registry and Apache Kafka, as well as using GitLab’s CI/CD to automatically generate the protobuf Python packages used to pass data between my code and the rest of the platform. These packages were hosted in GitLab’s Package Registry.
In the end, I achieved a validation ROC AUC above 0.93 across all topics and was able to run inference on more than 44,000 median-sized posts per second on an e2-micro instance, effectively keeping the entire service within the free quota provided to a standard account on Google Cloud Platform.
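For readers unfamiliar with the metric, the sketch below shows one common way to compute a one-vs-rest ROC AUC per topic from classifier scores. It is an illustration only, not the evaluation code from the project: the function names, the topic labels, and the toy scores are all hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_topic_auc(true_topics, topic_scores):
    """Score each topic against all others (one-vs-rest)."""
    topics = list(topic_scores[0].keys())
    return {
        topic: roc_auc(
            [1 if t == topic else 0 for t in true_topics],
            [scores[topic] for scores in topic_scores],
        )
        for topic in topics
    }


# Hypothetical validation set: the true topic of each post and the
# per-topic scores a classifier might assign to it.
TRUE_TOPICS = ["food", "humor", "food", "advice"]
TOPIC_SCORES = [
    {"food": 0.9, "humor": 0.1, "advice": 0.2},
    {"food": 0.3, "humor": 0.8, "advice": 0.1},
    {"food": 0.6, "humor": 0.2, "advice": 0.4},
    {"food": 0.7, "humor": 0.3, "advice": 0.5},
]
aucs = per_topic_auc(TRUE_TOPICS, TOPIC_SCORES)
```

The pairwise loop is quadratic in the number of examples, so for a real validation set a rank-based implementation from an established library would be the more practical choice.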
MotionX (2020-2022)
Technologies: C++17, OpenCV, FFmpeg, OpenVINO, DeepStream SDK
While working as a contractor at Cloudastructure, I was tasked with leading
the rewrite of the code they deployed at the edge. I designed a computer vision
pipeline that could do motion sensing similar to the functionality provided by
the Motion project. This was eventually extended to include a low-latency
notification system that could send SMS and email notifications when people
were detected on the camera feed. All of this was deployed on small form factor
(SFF) PCs that did not have dedicated GPUs.
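As a rough illustration of the general idea behind this kind of motion sensing, here is a minimal frame-differencing sketch. It is written in plain Python for readability and is not the C++/OpenCV pipeline described above; the function names, thresholds, and toy frames are assumptions made for the example.

```python
def motion_fraction(prev_frame, frame, pixel_threshold=25):
    """Fraction of pixels whose grayscale value changed by more than pixel_threshold."""
    changed = 0
    total = 0
    for prev_row, row in zip(prev_frame, frame):
        for prev_px, px in zip(prev_row, row):
            total += 1
            if abs(px - prev_px) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def detect_motion(frames, pixel_threshold=25, area_threshold=0.02):
    """Yield (frame_index, fraction) for frames that differ enough from the previous one."""
    prev = None
    for index, frame in enumerate(frames):
        if prev is not None:
            fraction = motion_fraction(prev, frame, pixel_threshold)
            if fraction >= area_threshold:
                yield index, fraction
        prev = frame


# Hypothetical 4x4 grayscale frames: a bright 2x2 block appears in the third frame.
still = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in still]
for y in (1, 2):
    for x in (1, 2):
        moved[y][x] = 200
events = list(detect_motion([still, still, moved]))
```

Two thresholds are used on purpose: the per-pixel threshold suppresses sensor noise, while the area threshold ignores changes too small to matter. Both values here are placeholders that would need tuning per camera.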
I also led the research into transitioning our edge platform to the Nvidia Jetson platform, and went so far as to demonstrate a proof-of-concept powered by the DeepStream SDK.
Koi No Yokan (2021)
Technologies: Python, NumPy, Pandas, FastAPI
Koi No Yokan is a search engine for anime. I’ve collected a corpus of reviews
covering 43,020 different shows and movies, and made it possible to search
for shows using a simple TF-IDF search.
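To show what a simple TF-IDF search looks like in practice, here is a minimal, dependency-free sketch that ranks documents by cosine similarity to the query. It is not the code behind Koi No Yokan: the tokenizer, the smoothed IDF formula, the function names, and the three-review toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table for {title: text}."""
    tokenized = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    # Smoothed IDF so terms present in every document keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[title] = {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, top_k=3):
    """Rank documents against the query, dropping those with no term overlap."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return []
    q_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scored = [(title, cosine(q_vec, vec)) for title, vec in vectors.items()]
    scored = [(title, score) for title, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical toy corpus standing in for the real review data.
REVIEWS = {
    "Show A": "a slow romance about two pianists in tokyo",
    "Show B": "giant robots fight in space and a pilot grows up",
    "Show C": "a cooking comedy set in a tiny tokyo restaurant",
}
VECTORS, IDF = build_index(REVIEWS)
results = search("space robots", VECTORS, IDF)
```

With this toy corpus, only "Show B" shares terms with the query "space robots", so it is the single result returned. A corpus of tens of thousands of titles would call for sparse matrices rather than per-document dictionaries.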
Then, I took it a step further and built a recommender system driven by the public data on MyAnimeList. By providing the username of an account on the website, I collect the publicly available watch list of that account and perform a custom-written query on a computed graph of recommendations over their corpus of shows.
This work was heavily inspired by a previous search engine I developed for the corpus of arXiv preprint metadata, which was powered by non-negative matrix factorization. More on that in this blog post.