Big Data Service Reliability Engineer

Multiple Locations
Required Experience: 0 year(s)
Employment Type: Full-Time
Salary Range: Not available
Posted 4 days ago
Job Description

Contribute as a core team member in the design, development, testing, and support of data analytics systems. This position requires evaluation, implementation and management of software tools and practices to mitigate risks and introduce operational efficiencies.

This role focuses primarily on systems engineering and support within the Infrastructure and Operations (I&O) portion of I.T.

Manage the infrastructure of multi-tenant data analytics systems built on technologies such as Hadoop, MapR, Informatica, and other data-related technologies.
Build, automate, and operate analytics environments across on-premises, hybrid, and public cloud deployments.
Implement and administer sound operational practices to ensure optimum performance, rapid time-to-execution, system reliability, data durability and recoverability, instrumentation, etc.
Apply capacity planning methodologies and estimate requirements for decreasing or increasing cluster capacity.
Perform and automate all Big Data infrastructure/environment builds, including design, security, cluster configuration, performance tuning, and ongoing monitoring.
Analyze and recommend platform improvements (portability, instrumentation, orchestration, etc.) that help PVH lower costs and improve workflow and system performance.
Perform high-level, day-to-day operational maintenance, support, and upgrades for the Big Data clusters.
Research and recommend automated approaches for system administration tasks.
Create key performance metrics that measure the utilization, performance, and overall health of the platforms.
Collaborate with product managers, lead engineers, and data scientists on all facets of the Big Data ecosystem.
Must be proficient in software-driven orchestration and automation tools such as Satellite, Terraform, Chef, or Ansible.



At least 3 years of experience managing a multi-tenant production Hadoop or other data analysis environment.
A deep understanding of Hadoop internals, design principles, cluster connectivity, security, and the factors that affect distributed system performance.
Proven experience identifying and resolving hardware- and software-related issues.
Knowledge of best practices related to security, performance, and disaster recovery.
Experience with at least two of the following languages or frameworks: SQL, Python, Java, Scala, Spark, or Bash.

Bachelor's degree preferred
Technical certifications considered a plus
Experience working with public cloud services (IaaS, PaaS) as well as private cloud services (OpenStack, vSphere, KVM).
Experience as a DBA or Systems Admin.
Experience with change management procedures.
Strong knowledge of configuration management and automation tools, such as Ansible, for non-trivial installations.
Good understanding of OS concepts, process management and resource scheduling.
Understanding of the basics of networking, CPU, memory, and storage.
Ability to work in a team environment in a collaborative manner.
Ability to handle a wide range of loosely defined, complex situations that require creativity and originality, and where guidance and counsel may be unavailable.
Must be able to demonstrate the following technical, functional, leadership, and business core competencies:
Agile Practices
Emerging Technologies
Programming Languages and Frameworks
Programming/Software Development
SDLC Methodologies and Practices
System/Platform Domain Knowledge

Company Overview