Site Reliability Engineer/DevOps Engineer wanted
Description & Responsibilities
The Digital Transformation Team is looking for an expert in Site Reliability/Production Engineering to oversee and support the development of the various digital platforms coordinated by the Team.
You will be responsible for:
- Managing the life cycle of the infrastructure services of application platforms (development, production and decommissioning)
- Planning platform monitoring and identifying relevant metrics to guarantee a high level of infrastructure reliability
- Designing and implementing cloud-based infrastructures based on stakeholder requirements
- Producing technical specifications for the design of cloud infrastructures
- Automating processes to improve the scalability and reliability of the application platforms
- Carrying out security hardening of the cloud infrastructures
- Identifying and prioritizing the technical debt to be eliminated
- Identifying and proposing alternative technologies to build more scalable implementations
- Coordinating activities to solve complex technical issues
- Providing reliable resource plans for infrastructure development
- Developing automated tests to validate the source code
- Collaborating with colleagues and stakeholders to develop and maintain disaster recovery processes
- Writing postmortems and technical reports on issues and malfunctions
- Promoting and sharing the DevOps culture within the Team and across the public sector community
We’re looking for a talented professional who is passionate about developing and managing complex IT infrastructures, with a proven track record in the development of digital platforms and a strong scientific and technical background.
Key Qualifications
- Good knowledge of Linux, IT security practices and networking fundamentals
- Practical experience in the public cloud field (Google Cloud, Azure or AWS)
- Practical experience with open-source cloud technologies, specifically OpenStack
- Work experience in agile environments
- Solid experience in coding and scripting (Python/Bash)
- Experience with container orchestration technologies such as Kubernetes or Docker Swarm
- Experience with modern logging systems such as Elasticsearch, Graylog or Fluentd
- Solid experience in monitoring with technologies such as Graphite or Prometheus
- Familiarity with the principles and philosophy of DevOps, with a strong aptitude for reducing the operational overhead of systems through automated processes
- Experience in the design and development of scalable and robust software architectures
- Work experience with standard project management tools (Gantt charts) and with agile methodologies (Scrum or Kanban)
- Motivated, innovation-oriented, curious and open-minded attitude
Education
- MS in Computer Science or a related field with at least 3 years of experience in the IT industry in Site Reliability/Production Engineering or, in the absence of a degree, 5+ years of experience in the IT industry in Site Reliability/Production Engineering
- Proficiency in English