Job Title - Data Engineer, Distributed Systems
Job Type - Contract
Visa - USC Only
Location - Remote or Austin, Texas
SUMMARY
Tavant is a niche software services firm with over 3,200 associates worldwide. We're looking for a top-notch Data Engineer to lead the design, development, and optimization of our distributed data store, built on Apache Druid, for one of our premier Cupertino-based multinational technology clients.
You will represent Tavant in our client engagement and work with a cross-functional team composed of Data Engineers, Administrators, Platform Engineers, and Architects. You will apply sound data-modeling principles to optimize storage and retrieval and develop solutions on top of distributed data stores.
If you're passionate about data engineering and have a keen design acumen in building scalable systems, we encourage you to apply! We're building an incredible consulting team and believe you'll find a highly energized environment to share your expertise and help boost engineering innovation.
Our client's Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. This group powers advertising operations in News, in the App Store, and on other platforms.
KEY RESPONSIBILITIES
Design and Development:
Design and implement data ingestion pipelines to load data into Apache Druid.
Develop and optimize Druid schemas and data models for efficient querying and performance.
Design and implement data aggregations against event-level data in a distributed database (preferably Druid).
Migrate existing data stores to Druid.
Optimization:
Implement best practices for data ingestion, storage, and querying to improve system performance.
Optimize query performance and resource utilization.
KEY QUALIFICATIONS
5+ years of experience in data engineering or analytics with distributed data systems, e.g., Druid, Snowflake, Redshift, CockroachDB, or Pinot (Druid experience preferred).
Strong understanding of distributed data store architecture, data ingestion methods, scaling mechanisms and querying capabilities.
Proficiency in SQL and experience with data modeling and ETL processes.
Hands-on application coding experience in Java, Python, or similar languages.
A fast learner of new technologies.
Experience with performance tuning and optimization of data systems.
Excellent problem-solving skills and the ability to work both independently and in a team.
Familiarity with other big data technologies (e.g., Hadoop, Spark, Kafka) is a plus.
Highly beneficial:
Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and containerization (e.g., Docker, Kubernetes).
Knowledge of programming languages such as Scala.
Experience generating reports against distributed data systems with SQL and visualization tools.