Job Title: Operations Engineer/Platform Engineer
Job Description:
We are seeking a highly skilled and adaptable Platform and Operations Engineer to join our team. This hybrid role combines the responsibilities of a traditional operations engineer with those of a platform engineer, ensuring that our data platforms and infrastructure are reliable, efficient, and secure. The ideal candidate will excel in system monitoring and maintenance, incident response, automation, deployments, and collaboration with cross-functional stakeholders. You should thrive in complex and ambiguous environments, demonstrating exceptional problem-solving skills and clear communication.
Schedule & Location:
Monday through Friday, 9 a.m. to 5 p.m., plus availability as needed for escalations
Remote, with occasional travel to the customer location required (NYC metro area candidates)
Key Responsibilities:
System Monitoring and Maintenance:
Monitor and maintain the health, performance, and availability of our data platforms and infrastructure.
Perform routine maintenance, updates, and optimizations of systems and processes.
Incident Response:
Diagnose, troubleshoot, and resolve complex technical issues.
Proactively implement solutions to prevent recurrence and improve system resilience.
Automation and Optimization:
Develop and implement automation scripts and tools to optimize workflows.
Ensure repeatable, scalable, and efficient operations across the platform.
Deployments and Configurations:
Manage deployments, configurations, and upgrades for platform services.
Ensure seamless integration between data storage, processing, and streaming solutions.
Collaboration with Stakeholders:
Partner with developers, data engineers, and business teams to align platform capabilities with organizational goals.
Serve as a liaison between technical and non-technical teams to facilitate effective communication.
Security and Compliance:
Implement security best practices and monitor for vulnerabilities.
Ensure systems and processes meet organizational and regulatory compliance requirements.
Documentation:
Create and maintain detailed documentation of system configurations, troubleshooting procedures, and operational workflows.
Key Skills and Qualifications:
Strong problem-solving skills and the ability to thrive in complex, ambiguous environments.
Excellent communication skills with a collaborative mindset.
Hands-on experience with core Hadoop components, including HDFS, MapReduce, and YARN.
Expertise in data streaming solutions such as Apache Kafka.
Knowledge of data storage solutions like HBase, Hive, and Kudu.
Experience with data processing frameworks such as Apache Spark or Flink.
Familiarity with DevOps tools such as Azure DevOps Pipelines and Ansible.
Proficiency in Linux-based operating systems, including shell scripting and tooling.
Basic programming knowledge and familiarity with common programming strategies and frameworks.
Experience with Cloudera platforms is highly desirable.