The Site Reliability Engineering team is responsible for our Cloud Management Platform, which provides the core tools and systems (such as automation, monitoring, ITSM and IDAM) that enable UKCloud to build and operate our multi-cloud environments efficiently and securely. The team consists of circa 20 IT specialists with particular skills in integration and automation.
Operational and line management of the Automation & Service Reliability team.
Lifecycle management of the core components of the Cloud Management Platform, including reactive support of internal users of the CMP and proactive support such as preventative maintenance and continuous service improvement/optimisation of existing core services.
Providing project support for the adoption of new or replacement CMP systems or tools – from requirements, through design and engineering into core service.
Ensuring appropriate documentation for design, configuration and ongoing support is available and maintained, so that 24/7 operations teams (external to ASR) can take on more of the reactive and proactive support requirements.
As a member of the Office of the CTO (OCTO) Team, contribute to the development and implementation of an Automation & Service Reliability plan, including recommendations for technical training and wider team development.
Working with internal stakeholders such as the Customer Services team, the Platform Operations team, Project Managers and the Software Engineering team to ensure their expectations are consistently met and that they are bought into the continued improvement of the ASR function and the Cloud Management Platform.
Contribute to the creation of technical standards within UKCloud to drive consistency, ease of automation and efficient scalability.
Providing regular fact-based (metric-driven) reports/scorecards on the performance of the ASR function to the CTO and wider business.
Maintain procedural and audit readiness of the function, including policies, processes and education.
Experienced and successful leader of Infrastructure teams.
A proven track record of delivering infrastructure and systems automation programmes.
A good understanding of core datacentre technologies (services, servers, networks and storage), gained through an operational support background.
Experience of successfully executing and implementing a vision and strategy.
Strong operational support experience of Linux Operating Systems and virtualised platforms.
Demonstrable skill in at least one of the following scripting or automation tools: Ansible, Bash, Python.
Leadership of technical teams and the ability to lead by example
About the Company
UKCloud provides an unbeatable, secure UK public cloud, focused solely on serving the UK Public Sector. We are committed to assurance and security while delivering flexible, agile and value-based cloud hosting to our customers.
Formed in 2012, UKCloud is based in Farnborough (Hampshire) and Corsham (Wiltshire). We have a team of 250+ people and we continue to grow! We are looking for people who want a rewarding career in a business that truly invests in you as an individual.
Competitive salary plus 10% bonus
25 days' holiday increasing to 30 days with length of service, half a day's birthday leave, charity day
Access to free parking
Active social and charity events
Cycle to work scheme
Friday breakfasts, fruit and soft drinks
UKCloud is an equal opportunities employer and positively encourages applications from suitably qualified and eligible applicants. Applicants must be eligible to work and live in the UK and will be required to undergo and maintain appropriate UK government security clearance.