Position: Site Reliability Engineer (SRE)
Location: Richmond, VA or Plano, TX
Work Model: Hybrid, 3 days onsite per week
Duration: Long-term contract
Job Summary:
We are seeking an experienced Site Reliability Engineer (SRE) to support cloud-native platforms and production systems in a large enterprise environment. The role focuses on ensuring the high availability, reliability, performance, and scalability of mission-critical applications running on AWS.
Strong Preference: Former Capital One engineers. Candidates must be able to provide verifiable Capital One credentials and be eligible for rehire.
Key Responsibilities:
- Design, build, and maintain highly reliable, scalable, and resilient systems in AWS
- Monitor system health, performance, and availability using SRE best practices
- Implement automation to reduce manual operational work
- Troubleshoot production incidents and perform root cause analysis (RCA)
- Develop and maintain scripts and tools to improve system reliability and efficiency
- Partner with application development, platform, and infrastructure teams
- Support on-call rotations and incident response as required
- Enforce operational excellence, security, and compliance standards
Required Skills & Qualifications:
- Former Capital One experience HIGHLY preferred
- Must provide credentials for rehire eligibility verification
- Strong hands-on experience with AWS (EC2, EKS, Lambda, CloudWatch, IAM, etc.)
- Python scripting experience strongly preferred
- Bash or other shell scripting experience will also be considered
- Experience with Linux-based systems and troubleshooting
- Understanding of SRE concepts: SLIs, SLOs, error budgets, monitoring, and alerting
- Experience supporting production environments at scale
Preferred Qualifications:
- Experience with CI/CD pipelines
- Infrastructure as Code (Terraform, CloudFormation)
- Containerization and orchestration (Docker, Kubernetes)
- Observability tools (Prometheus, Grafana, Datadog, CloudWatch)
- Experience working in highly regulated enterprise environments

