Job description
This role can be based either in the Dublin office or fully remote (within Ireland). If you’re passionate about system health, observability, auto-scaling, applying best practices in incident management, and often find yourself being the main coordinator during an outage or other system issue, then this could be the role for you.
How you will make a difference:
- Be an evangelist for best practices in incident management and the benefits of an SRE mindset across Engineering and Operations;
- Assist engineering teams in developing monitoring dashboards for all production systems;
- Establish a new framework for incident management within the organisation;
- Participate in the improvements happening across the organisation as it moves towards high-performance capabilities;
- Play a role in the production release process, ensuring the definition of done has been met;
- Contribute to system architecture and design sessions to ensure that all system improvements adhere to SRE best practices.
This role would be a good fit for someone who:
- Cares deeply about system availability, performance, resilience, and reliability;
- Is a naturally curious person, constantly learning;
- Has experience managing the operations of large distributed IT systems;
- Has set up monitoring and alerting systems for a large production system;
- Enjoys leading communications during a production issue;
- Is passionate about SRE best practices.
The skills you will bring:
- 4+ years’ previous experience in site reliability
- Experience in 24/7 monitoring of distributed systems
- Experience with New Relic/Graylog/Nagios
- Knowledge of microservices architecture in a cloud-based environment (AWS or similar)
- Knowledge of mobile technologies – iOS/Android
- Ambitious and driven with a desire to progress technically
- Demonstrable ability to troubleshoot technical issues
- Must be able to work in a process-driven environment, but show initiative when there are process gaps
- Bachelor’s degree or equivalent relevant experience
- Excellent written and verbal communication skills
- Good understanding of Information Security controls
- Good knowledge of CI/CD deployment strategies
- Demonstrable understanding of networking topologies
- Firewall – Cisco ASA