Available · H1B · US-based

Raman Srivastava

Senior DevOps & Cloud Engineer. I build infrastructure that holds up — 99.99% uptime, 45% faster deployments, $2M+ cost savings. Enterprise Kubernetes on Azure & AWS.

99.99% Uptime
45% Faster Deployments
$2M+ Annual Savings
150M+ Records / Week
Azure AKS · AWS EKS · Terraform · GitOps · ArgoCD · AZ-400 · AWS SAA · 5+ Years

Where I've Built

AssetMark · Senior DevOps / Cloud Engineer
FEB 2024 — PRESENT
$17B wealth management platform · United States
Architected enterprise AKS platform using Terraform & GitOps (ArgoCD). Eliminated configuration drift across 15+ engineering teams; reduced cluster provisioning from hours to under 15 minutes with modular Terraform and self-service infrastructure.
Reduced deployment time by 45% via multi-stage CI/CD pipelines (GitHub Actions, Azure Pipelines) with progressive delivery — canary, blue-green, and A/B deployments via Argo Rollouts. Achieved 98%+ release success rate across 40+ microservices with automated rollback on failure.
Cut security vulnerabilities by 40% by embedding Checkmarx (SAST), SonarQube, Veracode (DAST), and OPA/Kyverno (policy-as-code) into CI/CD. Implemented zero-trust Istio mTLS, automated Key Vault secret rotation, and Azure Policy for compliance (SOC 2, PCI-DSS).
Sustained 99.99% platform uptime with Dynatrace, Prometheus, Grafana, ELK Stack, and OpenTelemetry. Established SLO/SLI error-budget policies and reduced MTTR by 40%. Built a chaos engineering framework (LitmusChaos) for pre-release resilience validation.
Engineered zero-downtime cluster upgrades with HPA/KEDA autoscaling, optimized node pools, and taints/tolerations. Reduced monthly compute costs by 18% (~$2M annual savings) through right-sizing, spot instances, and FinOps dashboards.
Architected enterprise event-driven platform on AKS using Apache Kafka for real-time data streaming. Tuned Kafka clusters (partition optimization, consumer group autoscaling) achieving 35% throughput increase and 20% latency reduction.
Automated PostgreSQL provisioning, schema migrations (Alembic), backup validation, and failover testing via Terraform. Built Python/Shell automation for drift detection, cost analysis, and resource cleanup, saving 6+ engineering hours weekly.
Delivered AI-powered exception analysis platform (RemediAI) — LangGraph AI agents, Azure Service Bus, PostgreSQL, FastAPI, React dashboard. Reduced .NET exception debugging time via intelligent stack trace analysis with LLM-based remediation suggestions.
Azure AKS · Terraform · ArgoCD · Istio · Kafka · Prometheus · Grafana · Azure DevOps · GitHub Actions · Python · PostgreSQL · Dynatrace · Checkmarx · KEDA · LangGraph
Genpact · DevOps / Cloud Engineer
FEB 2019 — JAN 2022
Pharmaceutical enterprise · India
Led full legacy-to-AWS cloud migration of pharmaceutical data platform — migrated 50+ applications using Amazon EMR, AWS Glue, Redshift, S3, and VPC. Reduced operational costs by 40% while scaling to 150M+ records/week at 98% data accuracy.
Architected containerized big data platform on Amazon EKS with auto-scaling, multi-AZ HA, and disaster recovery for pharmaceutical analytics. Designed scalable ingestion pipelines from 80+ data sources processing 100+ TB into S3 data lake and Redshift warehouse.
Built 80+ automated data quality checks using Python, AWS Lambda, and AWS Glue — null detection, sign validation, schema validation, outlier detection. Prevented $2M+ in quarterly losses from bad incentive payouts.
Implemented end-to-end CI/CD pipelines with GitLab CI, GitHub Actions, and Jenkins for automated build/test/deploy of data workflows. Cut release cycles by 40%, increased deployment frequency 3x across teams.
Enhanced pipeline observability with CloudWatch dashboards, SNS alerting, custom log aggregation, and Glue job metrics. Automated SSIS-based ETL workloads using SQL Agent, Jenkins, and containerized execution on EC2/ECS, improving team productivity by 15%.
Led L2/L3 on-call production support for AWS workloads: incident logging, root cause analysis, hot-fix coordination, and SLA-driven recovery for pharmaceutical analytics serving 15+ business teams.
AWS EKS · AWS Glue · EMR · Redshift · S3 · Terraform · GitLab CI · Jenkins · Python · Apache Spark · CloudWatch · Lambda

Open Source & Side Projects

AKS Enterprise Platform
Production-grade multi-tenant AKS with Terraform modules, ArgoCD GitOps, Istio service mesh, Azure Policy, and full observability stack.
Terraform · AKS · ArgoCD · Istio · Prometheus
🤖
RemediAI
AI-powered .NET exception analysis platform using LangGraph agents, Azure Service Bus, and PostgreSQL on AKS. LLM-based stack trace remediation.
Python · LangGraph · Azure · AKS · FastAPI
📊
Real-Time Stock Market ETL
Streaming data pipeline: Kafka ingestion, AWS Glue transformation, S3/Athena querying, Power BI visualization. Full real-time data engineering stack.
Kafka · AWS Glue · Athena · Power BI · Terraform
🏛
Three-Tier Architecture on EKS
12-microservice three-tier deployment on AWS EKS demonstrating separation of the web, API, and database layers, with Terraform IaC.
AWS EKS · Terraform · Kubernetes · Docker
🏗
Terraform Modules
Reusable multi-cloud Terraform modules: VPC/VNet, AKS/EKS clusters, databases, monitoring, security baselines. Production-tested.
Terraform · Azure · AWS · Helm
🚀
Progressive Delivery Pipeline
Canary, blue-green, and A/B deployments with Argo Rollouts, Istio traffic shifting, Prometheus analysis, and automated rollback.
Argo Rollouts · Istio · Prometheus · ArgoCD
📈
SRE Observability Platform
Full-stack observability: Prometheus/Grafana, ELK, OpenTelemetry, Dynatrace with SLO-driven alerting and PagerDuty integration.
Prometheus · Grafana · ELK · OpenTelemetry · Dynatrace
🏠
ML Price Prediction
End-to-end ML pipeline on AWS: data preprocessing, feature engineering, scikit-learn modeling, and visualization dashboards.
Python · Spark · scikit-learn · AWS S3
🔥
Chaos Engineering
LitmusChaos experiments for pod, network, node, and DNS failure injection on AKS/EKS with resilience scoring.
LitmusChaos · AKS · Prometheus · Grafana
🔒
SonarQube + Jenkins DevSecOps
Integrated SAST pipeline with SonarQube, Jenkins, and automated quality gates for continuous code security scanning.
Jenkins · SonarQube · Checkmarx · Docker
Stock Market Kafka Streaming
Apache Kafka real-time streaming pipeline with producers, consumers, stream processing, and storage integration.
Kafka · Python · Stream Processing
🎤
Text to Voice
Python text-to-speech application with multiple voice engines and output format support for accessibility use cases.
Python · TTS · CLI
🎮
Server Survival
Cloud architecture tower-defense game: defend servers from cyber threats. A cloud gamification project.
JavaScript · HTML5 · Cloud Concepts
📄
Resume Maker
Web-based resume builder with customizable templates, export to PDF, and ATS-optimized formatting engine.
JavaScript · HTML/CSS · PDF
🌐
Portfolio Site
Personal portfolio and resume site hosted on GitHub Pages. Dark-themed, responsive, with ATS resume downloads.
HTML/CSS · JS · GitHub Pages

Technologies I Work With

☁️ Cloud Platforms

Azure AKS · AWS EKS · GCP GKE · Azure DevOps · Azure Monitor · AWS Lambda · Redshift · Key Vault · APIM · Cosmos DB · CloudFront

⚙️ Infrastructure as Code

Terraform · Helm · ArgoCD · Ansible · ARM Templates · Bicep · CloudFormation · Kustomize · Terragrunt

🚀 CI/CD & GitOps

GitHub Actions · GitLab CI · Jenkins · Azure Pipelines · Argo Rollouts · Flux · Canary · Blue-Green · Spinnaker

🔒 DevSecOps

Checkmarx · SonarQube · Veracode · Snyk · Trivy · OPA · Kyverno · Istio mTLS · Zero Trust · Azure Policy · SOC 2

📊 Observability & SRE

Prometheus · Grafana · Dynatrace · ELK Stack · OpenTelemetry · Jaeger · Tempo · CloudWatch · PagerDuty · SLO/SLI · Chaos Engineering

🗄️ Data & Streaming

Apache Kafka · Databricks · PySpark · AWS Glue · EMR · Airflow · Kinesis · Athena · Redshift

💻 Languages

Python · Bash · PowerShell · Go · SQL · YAML · HCL

🗃️ Databases

PostgreSQL · Redis · Cosmos DB · DynamoDB · RDS · Azure SQL · Elasticsearch

Verified Credentials

🔷
DevOps Engineer Expert
Microsoft AZ-400 · Expires Apr 2027
🔷
Azure Developer Associate
Microsoft AZ-204
🟠
Solutions Architect Associate
AWS SAA · Expires Apr 2027
🔴
Lakehouse Fundamentals
Databricks

Let's build something reliable.

Open to Senior DevOps, SRE, Cloud, and Platform Engineering roles. H1B transfer welcome.
