Projects

Platform Starter Kit

Terraform + Kubernetes + ArgoCD + OPA Gatekeeper + Veeam Kasten

Problem: Platform teams provisioning Kubernetes clusters manually end up with snowflake environments, inconsistent RBAC, no backup governance, and configuration drift between dev and prod.

Approach: Built a production-ready starter kit that provisions AKS clusters via Terraform with ArgoCD handling all post-cluster configuration as GitOps. Every change, from namespace onboarding to backup policy, flows through Git. OPA/Gatekeeper enforces guardrails at the admission layer so teams can self-serve without bypassing security.

  • Terraform modules for AKS with configurable node pools, networking, RBAC, and remote state on Azure Storage; separate dev/prod environment configs
  • ArgoCD GitOps delivery with automated sync, self-heal to revert drift, and prune to delete resources removed from Git
  • OPA/Gatekeeper policy requiring StatefulSets to carry a backup label, enforcing backup governance at admission
  • Kasten K10 backup with daily snapshots, 7/4/3 GFS (grandfather-father-son) retention, and automated export to Azure Blob via managed identity
  • Pod Security Standards (restricted), default-deny NetworkPolicies, Gateway API routing, Kustomize-based team onboarding with namespace isolation, quotas, and LimitRanges
Terraform, Kubernetes, ArgoCD, OPA Gatekeeper, Helm, Kasten K10, Kustomize
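
The backup-label guardrail can be illustrated with a small sketch of the admission logic. This is not the project's actual Gatekeeper policy (Gatekeeper constraints are written in Rego); it is a Python rendering of the same rule for readability. The label key `backup-policy` and the function name `check_backup_label` are assumptions made for illustration, since the source only says "a backup label".

```python
from typing import Any

# Hypothetical label key: the project description only mentions "a backup label".
REQUIRED_LABEL = "backup-policy"


def check_backup_label(obj: dict[str, Any], required_label: str = REQUIRED_LABEL) -> list[str]:
    """Return violation messages for a Kubernetes object; an empty list means admit."""
    if obj.get("kind") != "StatefulSet":
        return []  # the rule only targets StatefulSets

    metadata = obj.get("metadata") or {}
    labels = metadata.get("labels") or {}

    # A present, non-empty label value satisfies the rule.
    if labels.get(required_label):
        return []

    name = metadata.get("name", "<unnamed>")
    return [f"StatefulSet {name} must carry the '{required_label}' label"]


if __name__ == "__main__":
    unlabeled = {"kind": "StatefulSet", "metadata": {"name": "postgres"}}
    for message in check_backup_label(unlabeled):
        print("DENY:", message)
```

Rejecting at admission means an unlabeled StatefulSet never reaches the cluster, so backup coverage does not depend on someone auditing workloads after the fact.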

UpDog Monitor

FastAPI + Prometheus + Grafana + React/TypeScript

Problem: Simple uptime checks tell you a service is down but not whether you're burning through your error budget. Teams need SLO-driven monitoring with real-time visibility into availability and latency against defined targets.

Approach: Built a full-stack monitoring platform with an SLO engine that computes availability (99.5% target) and latency (p95 < 500ms) from Prometheus metrics. Error budget burn rate tracking surfaces problems before an SLO is breached, not after. The entire stack deploys to Kubernetes, with a CI pipeline that includes container image scanning.
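
As a rough sketch of the arithmetic behind burn rate tracking, assuming the 99.5% availability target above: the error budget is the allowed failure ratio (0.5%), and the burn rate is the observed failure ratio divided by that budget. The function names and check counts below are hypothetical and do not come from the project's code.

```python
def error_budget(slo_target: float) -> float:
    """Allowed failure ratio for an SLO target, e.g. 0.995 gives about 0.005."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float = 0.995) -> float:
    """Observed error ratio divided by the error budget.

    1.0 means the budget is being spent exactly at the sustainable pace;
    values above 1.0 mean it will run out before the SLO window ends.
    """
    if total <= 0:
        return 0.0  # no checks in the window, so nothing was burned
    return (failed / total) / error_budget(slo_target)


def budget_remaining(failed: int, total: int, slo_target: float = 0.995) -> float:
    """Fraction of the error budget left, given counts over the whole SLO window."""
    return 1.0 - burn_rate(failed, total, slo_target)


if __name__ == "__main__":
    # Hypothetical window: 10 failed health checks out of 1000.
    rate = burn_rate(failed=10, total=1000)
    print(f"burn rate: {rate:.2f}")
```

With 10 failures in 1000 checks the error ratio is 1%, twice the 0.5% budget, so the burn rate comes out at roughly 2. Alerting on a sustained burn rate above 1 is what lets a problem surface while some budget is still left.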

  • FastAPI backend with PostgreSQL, React/TypeScript frontend, and background worker performing health checks on configurable intervals
  • Custom Prometheus instrumentation (histograms, counters, and gauges) with Grafana dashboards provisioned via JSON
  • Alerting rules for SLO breaches and burn rate using histogram_quantile and rate queries
  • CI/CD via GitHub Actions: lint, test, build, push to GHCR, and Trivy container image scanning
FastAPI, React, TypeScript, Prometheus, Grafana, Docker, SLO/SLI, GitHub Actions
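
To show what a p95 latency figure derived from histogram buckets means, here is a minimal Python approximation of the linear interpolation that `histogram_quantile` performs on classic cumulative buckets. It is an illustrative sketch, not the project's code or the Prometheus implementation, and the bucket boundaries and counts are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The bucket list must include a +Inf bucket holding the total count.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or ordered[-1][0] != math.inf:
        raise ValueError("buckets must include a +Inf upper bound")

    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if bound == math.inf:
                # The quantile lies beyond the last finite bucket, so report
                # the highest finite bound as the best available estimate.
                return ordered[-2][0] if len(ordered) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket <= 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    # Hypothetical latency buckets in seconds with cumulative counts.
    latency = [(0.1, 50), (0.25, 80), (0.5, 96), (1.0, 99), (math.inf, 100)]
    p95 = histogram_quantile(0.95, latency)
    print(f"p95 = {p95 * 1000:.0f} ms, within 500 ms target: {p95 < 0.5}")
```

Because the estimate is interpolated within a bucket, its accuracy depends on bucket boundaries, so placing a boundary at the 500 ms target keeps the pass/fail decision exact.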

Other Projects

Roll Call

Full-Stack Web Application

  • Community web app for organizing and discovering tabletop gaming events, with user authentication, event management, and group discovery.
Full-Stack, Authentication
Blog