BRD - Groupe Societe Generale
Site Reliability Engineer
Aug. 2022
Bucharest, Romania
Monitoring
• Monitor and take steps to improve the overall application stack performance and stability.
• Improve the monitoring framework by adding new integrations and monitoring services.
• Maintain and monitor the deployment and orchestration of the servers, Docker containers, databases, and general backend infrastructure for non-prod environments.
• Detect capacity issues and work closely with the BI & Infrastructure architecture teams on mitigation.
Issue resolution
• Troubleshoot complex, cross-platform issues across the entire application stack (OS, networking, database, application).
• Work closely with the Operational team, in a consulting role, to resolve incidents in the production environment.
• Act as the single point of contact for technical issues encountered in the non-production environments.
• Work closely with the Service Desk team to build a troubleshooting knowledge base for end users.
Reliability
• Apply automation and software to any tasks or parts of the system that would benefit from it.
• Document your system knowledge as you acquire it, create runbooks, and ensure critical system information is readily available to those who need it.
• Keep up to date on security, and proactively identify, diagnose, and solve complex security issues in all environments, in cooperation with our IT security team.
• Maintain backup and redundancy strategies.
• Conduct system analysis and configuration management, and develop improvements to the solution’s performance.
• Qualify and deploy application changes and upgrades on the non-production environments.
• Advise on trade-offs between new features, stability, and operability.