What I’m Working On
Here’s how I’ve been spending my time recently. I’m working to grow as a data engineer, learn more about ML, and improve my coding skills.
LLMs: Application to Production (Databricks Academy Course)
I’m currently working through this course. It covers the basics of language models, tokenization, and word embeddings before proceeding to labs focused on finding and leveraging open source models from Hugging Face. Finally, it covers building, fine-tuning, deploying, and monitoring LLM-based applications.
Advanced Data Engineering with Databricks (Databricks Academy Course)
I recently earned the Databricks Certified Data Engineer Professional certification. This course, along with my experience building data pipelines on Databricks over the past couple of years, prepared me for that exam. The content covers Databricks tools and features, but it also covers data engineering principles and processes more generally.
Optimizing Apache Spark on Databricks (Databricks Instructor-Led Training)
I took this course in February of 2023 with about 15 coworkers. It covered the five most common Spark performance problems: skew, spill, shuffles, storage, and serialization. I’m actively looking for opportunities to recognize and fix these problems in my work.
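As a rough illustration of the first of those problems, skew, here is a minimal pure-Python sketch. It is not Spark code and it is not taken from the course; the helper names, the partition count, and the sample keys are all made up, and `zlib.crc32` is only a stand-in for whatever hash a real partitioner uses. It shows how a single hot key makes one partition far larger than the rest.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Assign a key to a partition with a deterministic hash (stand-in for a real partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(key, num_partitions) for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size; 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical data: one "hot" key accounts for 90% of the records.
keys = ["hot_customer"] * 9_000 + [f"customer_{i}" for i in range(1_000)]
sizes = partition_sizes(keys, num_partitions=8)
print(sizes)
print(f"skew ratio: {skew_ratio(sizes):.1f}")
```

Because every record with the same key hashes to the same partition, the task that processes the hot key's partition does most of the work while the others sit idle.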
Machine Learning with Python (MITx Online Graduate Course)
I took this course in the summer of 2022. It began by covering linear classifiers and moved on to regression, neural networks, clustering, and reinforcement learning. Here are the links to the course on edX and my repo.
Social Media API using FastAPI (FreeCodeCamp.org Tutorial)
This is a pretty comprehensive tutorial that covers setting up a PostgreSQL database, developing the API, and deploying to multiple environments. I created a website to highlight some of what I’ve learned from it and to practice building a documentation site. You can also visit my GitHub repo.
SnakeSay (Real Python Code Conversation)
I heard this code conversation promoted on the Real Python Podcast. I learned that pyproject.toml is the current standard for Python packaging config and that pip installing in editable mode is the best way to ensure your relative and absolute imports work correctly while developing your project. These concepts were discussed as we built a simple command line tool that prints a little snake saying your desired message! Here is the link to my repo.
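For a sense of what a tool like this looks like, here is a minimal sketch of my own. It is not the code from the Real Python conversation or from my repo; the function names and the ASCII art are made up for illustration.

```python
import argparse

# Hypothetical ASCII art; the real tool's snake looks different.
SNAKE = "\n".join([
    "   \\",
    "    \\   ~~~~~:>",
])


def bubble(message: str) -> str:
    """Wrap the message in a simple one-line speech bubble."""
    border = "-" * (len(message) + 2)
    return f" {border}\n< {message} >\n {border}"


def snakesay(message: str) -> str:
    """Return the speech bubble followed by the snake."""
    return f"{bubble(message)}\n{SNAKE}"


def main(argv=None) -> int:
    """Parse the message from the command line and print the snake."""
    parser = argparse.ArgumentParser(
        prog="snakesay", description="Print a snake saying your message."
    )
    parser.add_argument("message", nargs="+", help="words for the snake to say")
    args = parser.parse_args(argv)
    print(snakesay(" ".join(args.message)))
    return 0


# Example call with an explicit argument list instead of sys.argv.
main(["Hello", "from", "the", "snake!"])
```

In a packaged project, I would assume a `[project.scripts]` entry in pyproject.toml pointing at `main`, so that running `python -m pip install -e .` from the project root exposes a `snakesay` command while the source stays editable in place.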
Advent of Code (Coding Challenge)
2022 was the first year I participated in the Advent of Code. My coworker started a private leaderboard, which motivated me to get involved! I earned gold stars through Day 18, except for Days 16 and 17, where I got silver stars. I’d like to go back and complete the remaining days, although I might need a little help from Reddit as they get more difficult! Here’s the link to my repo.
Google Foobar (Coding Challenge)
In the summer of 2022, I received an invite to the Google Foobar challenge. I completed 6 challenges over 3 levels, but my time expired as I was working through level 4. It was still a lot of fun, and I’m glad I invested the time. Here’s the link to my repo.