Principal DevOps/SRE Engineer

  • Posted on: Oct 28, 2024
  • Tekshapers Inc
  • Troy, Michigan
  • Salary: Not Available
  • Full-time

Job Title: Principal DevOps/SRE Engineer
Job Type: Full-time
Job Location: Troy, Michigan, United States
Remote: Yes

Job Description:

Employment type: Full-time

Job Title: Principal DevOps/SRE Engineer, Application-Centric Observability

Job Location: 100% Remote (we consider candidates from the US as well as Canada)

Duration: Full-time

Experience level: 11-15+ years

Mandatory Skills: DevOps/SRE, observability tools, Datadog, metrics collection, alerting, dashboard creation, SLI/SLO frameworks, Python, Bash, etc.

No Client interview (L1, L2 & Final round)

Note: Candidates need to work CST hours.

Position Overview:

The Principal DevOps/SRE Engineer will lead the development and implementation of a cutting-edge observability framework, focusing on application-centric monitoring and insights. This role is responsible for ensuring the performance, reliability, and scalability of business-critical applications through enhanced visibility into application-level metrics, leveraging modern observability tools like Datadog. The ideal candidate will collaborate closely with development teams to deliver robust monitoring solutions that go beyond infrastructure and provide deep insights into application behavior and performance.

Responsibilities:

  • Design and Implement Observability Framework: Develop and implement an end-to-end observability framework that extends beyond infrastructure to focus on application-specific metrics. Ensure comprehensive visibility into the performance of key business applications.
  • Datadog Integration and Enhancement: Leverage Datadog to instrument application-level monitoring, integrating golden signals and SLIs/SLOs for performance, availability, and reliability.
  • Develop SLI/SLO Blueprints: Create and maintain SLI/SLO blueprints for key business applications, defining and measuring golden signals (latency, traffic, errors, saturation) to ensure optimal system health.
  • System Performance Optimization: Proactively monitor and assess application performance, identifying areas for improvement. Collaborate with development and SRE teams to implement performance optimization measures.
  • Dashboard and Visualization: Develop centralized dashboards with drill-down capabilities, providing real-time visibility into the health of applications and enabling quick identification of performance issues.
  • Business Journey Mapping: Work closely with business and engineering teams to map out critical business journeys and ensure that observability systems capture relevant metrics for each journey.
  • Gap Analysis and Continuous Improvement: Perform baseline measurements, identify gaps in existing monitoring systems, and work to close those gaps by integrating additional telemetry data.
  • Incident Response and Alerting: Define and implement alerting mechanisms based on SLI/SLO thresholds. Ensure the observability system can trigger appropriate alerts and escalations in case of performance degradation.
  • Collaboration with Development Teams: Work alongside development and data engineering teams to embed observability practices into the SDLC, ensuring that monitoring is an integral part of the application architecture from the ground up.
  • Knowledge Sharing: Provide training and guidance to teams on best practices for application observability, ensuring consistent adoption of tools and methodologies across the organization.

Qualifications:

  • 11-15 years of hands-on experience in DevOps/SRE, with a strong focus on observability for large-scale, high-performance applications.
  • Expertise in using and enhancing observability tools like Datadog, including deep experience with metrics collection, alerting, and dashboard creation.
  • Proven ability to create and implement SLI/SLO frameworks to track application performance, availability, and reliability.
  • Strong understanding of monitoring application health across various services, containers, and microservices architectures.
  • Experience in business journey mapping and ensuring observability captures relevant metrics at every stage of the user experience.
  • Expertise in root cause analysis and providing insights into system performance through observability data.
  • Proficiency in programming/scripting languages (e.g., Python, Bash) for automation and tool integration.
  • Proven track record of driving performance improvements and maintaining system health through proactive monitoring and alerting.

Preferred Skills:

  • Hands-on experience with cloud-native applications running on AWS or other cloud platforms.
  • Familiarity with CI/CD pipelines and integrating observability tools into the development lifecycle.
  • AWS certifications or equivalent experience with cloud infrastructure monitoring.
  • Strong knowledge of modern infrastructure stacks, including Kubernetes, Docker, and serverless architectures.
  • Experience working in agile environments, collaborating closely with product, development, and operations teams.

Tekshapers is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.

Position Details

Posted: Oct 28, 2024
Employment: Full-time
Salary: Not Available
Snaprecruit ID: SD-CIE-8b19f219e77c7bda41585d7844302b3596791a00f09731ac77a08a0eabef6b76
City: Troy
Job Origin: CIEPAL_ORGANIC_FEED
