Job details
- Work location: Montreal (On-site)
- Job type: Permanent, full-time
Site Reliability Engineer
Location
Montreal, Canada (Day 1 onboarding onsite; in-office presence required 3x/week)
Job description
The Reliability and Production Engineering (RPE) team is seeking talented individuals with a passion for production support and real-time problem solving. This role is part of our growing Site Reliability Engineering (SRE) capabilities within the RPE organization, supporting the Technology transformation. Successful candidates will thrive in a dynamic, fast-paced environment that values collaboration, ingenuity, and adaptability.
As a Site Reliability Engineer, you will focus on improving system service availability, observability, scalability, performance, and resilience by applying sound software engineering principles and leveraging modern tooling.
Key responsibilities
- Troubleshoot issues across the entire technology stack: hardware, software, applications, and networks.
- Collaborate with engineering and development teams to design, build, and maintain reliable systems.
- Identify and implement automation opportunities for deployment, management, and visibility of services.
- Proactively assess and mitigate system reliability risks.
- Participate in global and regional support coverage, including occasional weekend on-call rotations.
- Represent the RPE organization in design reviews and operational readiness exercises.
Qualifications & Skills
Required
- Strong troubleshooting and debugging skills, with the ability to identify root causes.
- Excellent communication and interpersonal skills; ability to present technical problems to non-technical audiences.
- Solid Linux system administration experience.
- Basic scripting skills (Python, Bash, Perl, Ruby).
- Hands-on experience with enterprise monitoring tools (AppDynamics, Grafana, Splunk, Dynatrace).
- Familiarity with automation/configuration/release management tools (e.g., Ansible, GitHub).
- Awareness of modern software and systems architectures (cloud, microservices, load balancing, databases, caching, distributed systems).
Preferred
- Practical experience supporting large-scale systems.
- Strong analytical and problem-solving skills with a sense of ownership and accountability.
- Ability to work effectively in a team-oriented environment.
EEO Employer: Minorities / Females / Disabled / Veterans / Gender Identity / Sexual Orientation