Cloud Native Infrastructure Senior Engineer
UKCloud is dedicated to helping the UK Public Sector by delivering more choice and flexibility through safe and trusted cloud technology. We own and operate UK sovereign, industry-leading, multi-cloud platforms which are secure, assured and located at the Government’s Crown Campus. Our team of multi-cloud experts is dedicated to helping our customers gain value from the agility and cost savings of a multi-cloud strategy and Making Transformation Happen: cheaper, faster, safer.
The Cloud Native Infrastructure Senior Engineer works as part of the Cloud Native Infrastructure team to deliver exceptional services to our customers as they manage, maintain and operate our Digital application portfolio. The role operates alongside other functional teams including Availability Management, Capacity Planning, Lifecycle Management, Transition to Support, Security and Problem Management.
- Taking ownership of live incidents to drive rapid and safe resolution and following up to ensure repeats are mitigated or eliminated
- Assisting the Principal and Technical Authority with design and development of the platform including evaluation of new products, solutions and services to high standards
- Delivering secure, performant and scalable solutions that enable UKCloud to achieve business goals
- Actively engage with other project and technical teams to improve cross-working/team knowledge
- Input into functional requirements, ensuring these are clearly stated and testable. Ensure non-functional requirements are addressed
- Write supporting documentation for design, configuration and on-going support
- Maintain technical knowledge of current technical trends and standards
- Provide Subject Matter Expert level code and application support when needed, including leadership of technical investigations
- Maintain oversight of day-to-day operations of the Cloud Native platform, ensuring capacity, monitoring, patch management and availability are optimised, current and enabled for future demand
- Produce reports, documentation, processes and implement regular tasks as required to support the day-to-day running and maintenance of the Cloud Native platform
- Working with the Principal to ensure tasks are correctly prioritised and planned to meet the department’s strategic goals
- Representing the team at cross functional meetings (CAB, DSR, Scrum planning, Service Acceptance etc.) and delivering actioned items in a timely manner
- Provide OpenStack Subject Matter Expert support when needed, including leadership of technical investigations
- Provide input into both high & low level design (HLD & LLD) documentation when required.
- Work alongside the Technical Authority, suppliers and vendors for procurement of hardware.
- Delivery of enhancements and innovations to agreed Roadmap
- Improvements of Automation and proactivity in capability
- Customer feedback from Customer Surveys, Account Directors and Technical Account Managers
Essential experience includes:
- Minimum of 5 years operating, maintaining and managing cloud infrastructure at scale, ideally OpenStack
- Minimum of 5 years of Linux System administration
- Minimum of 2 years working to Agile methodologies
- Demonstrable skills around Infrastructure as code applied with configuration management, automation and orchestration tools
- Programming, patterns and constructs. Our main language is Python, but a fundamental understanding and an ability to learn are crucial.
Desirable experience would include:
- Relevant RHCE certification
- Fundamental understanding of Docker/Containers
- Demonstrable skills with shell scripting and the typical variety of Linux command-line tools (sed, awk, grep, etc.), and with source control systems, CI/CD and unit testing. We make use of the Atlassian tool set, including Bitbucket and Bamboo.
- Production experience with Open Source database technologies (Galera MySQL, MongoDB, Redis)
- Understanding of how to monitor microservice based applications.
- Fundamental understanding of software-defined networking, particularly Open vSwitch.
- Understanding of infrastructure as code, applied with configuration management, automation and orchestration tools such as TripleO, Heat, Ansible and Puppet.
- Fundamental understanding of the architecture of big data services
- Experience of software-defined storage concepts, in particular Ceph, including pools, placement groups and CRUSH maps
- Strong written and oral communication. Systematic and organised.
- Ability to prioritise workloads, work under pressure and produce high-quality work
- Ability to present technical issues and solutions to a variety of audiences
- Ability to work effectively in a team and across multi-functional teams
- Embrace and champion the use of automation across the platform
- Strong coaching and mentoring of junior engineers
Our technology stack includes Ceph, OpenStack, TripleO, Open vSwitch, Python, Galera MySQL Cluster, MongoDB, Ansible, Puppet, Linux (Red Hat/CentOS), Kubernetes (Containers/Docker), OpenShift, Cloud Foundry and Big Data (Hadoop/Elasticsearch).
Employees holding this position are required to achieve and maintain an appropriate security clearance, as detailed in the header of this job description, as a condition of employment with UKCloud Ltd.
UKCloud Ltd is an equal opportunities employer. Applications from individuals are encouraged regardless of age, disability, sex, gender, sexual orientation, race and religion.
Information Security Management System
This position is within the scope of the UKCloud Information Security Management System (ISMS), and the post holder is responsible for complying with applicable requirements of the UKCloud Information Security Policy, Information Security Manual, and all other information security policies, processes and documentation including UKCloud SyOPs (UKC-AAA-11).
Information Security: Asset/Control/Risk Ownership
This position may be responsible for the ownership of:
- Information Assets or Supporting Assets
- Control Objectives or Controls
- Management of identified risks
- Suppliers – specifically their information security responsibilities
IT System Access
Employees holding this position will automatically be provided with access to the systems and data that have been specified within the UKCloud IT System Access Matrix (UKC-GEN-46).