Projects

Platform Starter Kit

Terraform + Kubernetes + ArgoCD + OPA Gatekeeper + Veeam Kasten

Problem: Platform teams provisioning Kubernetes clusters manually end up with snowflake environments, inconsistent RBAC, no backup governance, and configuration drift between dev and prod.

Approach: Built a production-ready starter kit that provisions AKS clusters via Terraform, with ArgoCD handling all post-provisioning configuration through GitOps. Every change, from namespace onboarding to backup policy, flows through Git. OPA/Gatekeeper enforces guardrails at the admission layer so teams can self-serve without bypassing security.

Terraform modules for AKS with configurable node pools, networking, RBAC, and remote state on Azure Storage; separate dev/prod environment configs
ArgoCD GitOps delivery with automated sync, self-heal to revert configuration drift, and prune to delete resources removed from Git
OPA/Gatekeeper policy requiring StatefulSets to carry a backup label, enforcing backup governance at admission
Kasten K10 backup with daily snapshots, 7/4/3 GFS retention, and automated export to Azure Blob via managed identity
Pod Security Standards (restricted), default-deny NetworkPolicies, Gateway API routing, Kustomize-based team onboarding with namespace isolation, quotas, and LimitRanges
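
The backup-label guardrail can be illustrated with a small Python sketch. Gatekeeper policies are normally written in Rego, so this is not the kit's actual policy; it only mirrors the decision logic. The label key `backup-policy` and the function name are hypothetical, since the label used in the project is not stated here.

```python
from typing import Any

# Hypothetical label key; the real key used by the project is not given.
REQUIRED_LABEL = "backup-policy"


def backup_label_violations(
    obj: dict[str, Any], required_label: str = REQUIRED_LABEL
) -> list[str]:
    """Return violation messages for a Kubernetes object (empty list = admitted)."""
    if obj.get("kind") != "StatefulSet":
        return []  # the rule only targets StatefulSets

    metadata = obj.get("metadata") or {}
    labels = metadata.get("labels") or {}

    # A missing key or an empty value both count as "no backup label".
    if labels.get(required_label):
        return []

    name = metadata.get("name", "<unnamed>")
    return [f"StatefulSet {name} is missing required label '{required_label}'"]
```

A StatefulSet manifest without the label yields one violation message, while Deployments and correctly labeled StatefulSets pass with an empty list.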

Terraform, Kubernetes, ArgoCD, OPA Gatekeeper, Helm, Kasten K10, Kustomize

UpDog Monitor

GitHub

FastAPI + Prometheus + Grafana + React/TypeScript

Problem: Simple uptime checks tell you a service is down but not whether you're burning through your error budget. Teams need SLO-driven monitoring with real-time visibility into availability and latency against defined targets.

Approach: Built a full-stack monitoring platform with an SLO engine that computes availability (99.5% target) and latency (p95 < 500 ms) from Prometheus metrics. Error-budget burn-rate tracking surfaces problems before an SLO is breached, not after. The entire stack deploys to Kubernetes, with CI that includes container image scanning.
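
As a rough illustration of burn-rate tracking, the sketch below uses the common definition: observed error ratio divided by the error budget (1 minus the SLO target). The function name and inputs are assumptions for illustration and do not come from the project's code; only the 99.5% target is taken from the description above.

```python
def burn_rate(failed: int, total: int, slo_target: float = 0.995) -> float:
    """Error-budget burn rate over one window.

    1.0 means the budget is being spent exactly at the sustainable pace;
    values above 1.0 mean it will run out before the SLO period ends.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned

    error_ratio = failed / total   # fraction of failed checks
    budget = 1.0 - slo_target      # allowed failure fraction (0.5% here)
    return error_ratio / budget
```

With the 99.5% target, 5 failed checks out of 1,000 gives a burn rate of about 1, and 20 out of 1,000 gives about 4, which is the kind of signal that can trigger an alert well before the availability target itself is missed.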

FastAPI backend with PostgreSQL, React/TypeScript frontend, and background worker performing health checks on configurable intervals
Custom Prometheus instrumentation (histograms, counters, gauges) with Grafana dashboards provisioned via JSON
Alerting rules for SLO breaches and burn rate using histogram_quantile and rate queries
CI/CD via GitHub Actions: lint, test, build, push to GHCR, and Trivy container image scanning
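
For readers unfamiliar with how a p95 is derived from histogram buckets, here is a simplified Python sketch of the linear interpolation that Prometheus's `histogram_quantile` performs. It is an approximation for illustration, not the project's alerting code; the bucket layout in the usage note is invented, and edge cases are handled more loosely than in Prometheus itself.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with the overflow bucket whose bound is math.inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("last bucket must have an upper bound of +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Quantile lies in the overflow bucket: report the highest
                # finite bound, since no upper limit is known.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

Given hypothetical buckets `[(0.1, 50), (0.25, 80), (0.5, 96), (1.0, 100), (math.inf, 100)]` in seconds, the estimated p95 falls inside the 0.25 to 0.5 bucket, which would satisfy a p95 < 500 ms target.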

FastAPI, React, TypeScript, Prometheus, Grafana, Docker, SLO/SLI, GitHub Actions

Other Projects

Roll Call

GitHub Live Site

Full-Stack Web Application

Community web app for organizing and discovering tabletop gaming events. Features include user authentication, event management, and group discovery.

Full-Stack Authentication