About the project
A leading HR tech solution in the DACH region (Germany, Austria, Switzerland).
Your responsibilities
Design, implement, and maintain highly available and scalable infrastructure, ensuring system reliability and performance.
Develop and maintain monitoring, alerting, and incident response systems to proactively identify and address potential issues.
Collaborate with development teams to define and implement automated continuous integration/continuous deployment (CI/CD) pipelines.
Automate manual processes and tasks through scripting and infrastructure-as-code (IaC) using tools such as Ansible, Terraform, or Puppet.
Conduct performance analysis, capacity planning, and optimization of systems to ensure efficient resource utilization.
Perform regular security audits and vulnerability assessments, and implement appropriate security measures across the infrastructure.
Troubleshoot and resolve complex system and application issues in a timely manner.
Collaborate with development teams to design and implement disaster recovery and backup strategies.
Participate in on-call rotations and respond to incidents, working towards quick resolution and root cause analysis.
Stay up to date with the latest industry trends, tools, and best practices in site reliability engineering.
Continuously identify areas for improvement in system reliability, scalability, and performance, then propose and implement solutions.
Collaborate with cross-functional teams to ensure smooth integration of new services and features.
Document system configurations, processes, and procedures to maintain a comprehensive knowledge base.
Our requirements
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field (or equivalent work experience).
Proven experience as a Site Reliability Engineer or in a similar role.
Strong knowledge of Linux/Unix systems and system administration.
Proficiency in scripting languages such as Bash, Python, or PowerShell.
Experience with cloud platforms such as AWS, Azure, or Google Cloud.
Strong understanding of networking concepts and security best practices.
Experience with configuration management tools (e.g., Ansible, Puppet, or Chef).
Familiarity with infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
Strong understanding of monitoring and alerting tools (e.g., Prometheus, Grafana, or Nagios).
Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI/CD, or CircleCI).
Knowledge of relational and NoSQL databases and their integration into scalable architectures.
Excellent problem-solving and troubleshooting skills.
Strong communication and collaboration abilities, with the capacity to work effectively in a cross-functional team.
Ability to handle multiple tasks and prioritize work in a fast-paced environment.