What is the service?
Hadoop in the Cloud is UKCloud’s highly secure PaaS implementation of Hadoop. It provides a cloud-based solution to help organisations address the challenges of big data storage and processing.
Why deliver Hadoop as a cloud service?
The service enables organisations to explore a highly connected, secure and stable solution that’s optimised for big data, from proof of concept through to production workloads, while minimising the investment, time and risk associated with buying, provisioning, configuring and maintaining Hadoop infrastructure, platforms and licences.
How is Hadoop in the Cloud billed?
The service is a true cloud service, billed by the hour based on the storage consumed, with no upfront cost, minimum commitment or early exit fees.
Does UKCloud offer a free trial?
We offer a 30-day free trial so that you can test and evaluate our service without commitment. Your trial provides you with a live environment on the UKCloud platform to test our services and verify whether they are suited to your needs.
Where is the service hosted?
The service is delivered by a UK company from two tier 3 UK data centres separated by more than 100km, and securely connected by high-bandwidth, low-latency dedicated connectivity.
Does my data leave the United Kingdom?
As the service is delivered from UK data centres by a UK company, the data does not leave the UK when at rest.
What modules constitute the Hadoop Core Platform?
The Hadoop core platform consists of the following modules and associated supporting services:
- Hadoop Common. The common utilities that support the other Hadoop modules
- Hadoop Distributed File System (HDFS™). A distributed file system that provides high-throughput access to application data
- Hadoop YARN. A framework for job scheduling and cluster resource management
- Hadoop MapReduce v2. A YARN-based system for parallel processing of large datasets
Together, the modules in the Hadoop core platform support data ingress and egress and run native MapReduce v2 applications.
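To illustrate the programming model that MapReduce v2 runs on YARN, here is a minimal, single-process Python sketch of a word count. It only simulates the map, shuffle and reduce phases in memory; it does not use any Hadoop API, and the function names are hypothetical. On a real cluster, each phase runs as distributed tasks that YARN schedules close to the HDFS blocks holding the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all intermediate values by key, ordered by key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine every value emitted for a single key
    return key, sum(values)


def word_count(records: List[str]) -> Dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop in the cloud"]))
```

Running the script prints `{'big': 1, 'cloud': 1, 'data': 1, 'hadoop': 2, 'in': 1, 'on': 1, 'the': 1}`. Because mappers and reducers only see their own records or keys, Hadoop can run many of them in parallel across worker nodes.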
Can I use Hadoop in the Cloud in the UKCloud Elevated (previously IL3) domain?
Yes, Hadoop in the Cloud is available in both the OFFICIAL Assured and Elevated domains.
Is the service Pan Government Accredited?
UKCloud’s existing PGA still applies to the infrastructure underpinning our services, but since the move to the Government Security Classification Policy (GSCP), we are no longer able to seek PGA for new services such as Hadoop in the Cloud.
We are now required to self-assert our services, with customers then responsible for assessing and selecting the most appropriate cloud services which meet their individual security requirements.
To provide confidence that the service still meets the highest level of information assurance, we continue to conduct independent testing and validation of our platform and make the findings available to our customers and partners. This enables their Senior Information Risk Owners (SIROs) to make an informed decision about self-asserting any service they choose to consume.
Which Hadoop distributions does Hadoop in the Cloud support?
The service currently supports Hortonworks® HDP and Cloudera Enterprise. We will continue to review support for additional distributions according to market demand.
How did UKCloud define its Hadoop Core Platform?
To deliver a quality service, we identified the boundaries of Hadoop so that we could clearly delineate between UKCloud-provided and -supported services, and customer or third-party services. We’ve adopted the industry definition of Hadoop as set out by the Apache Software Foundation (http://hadoop.apache.org/).
Can I use Hadoop in the Cloud over closed networks such as PSN or N3?
The service is accredited for use over PSN. Connectivity to the N3 network will be considered when an appropriate sponsor submits a requirement.
What is the underlying storage technology for the service?
We designed our platform to be optimised specifically for Hadoop, in line with best practices established by VMware and the Hadoop community.
Unlike some Hadoop cloud service providers, we give each node VM exclusive access to a physical drive attached directly to the host, helping to increase both performance and security.
How do you ensure my data remains secure in a multi-tenant environment?
Hadoop in the Cloud was designed with data security as a priority. Each Hadoop cluster is deployed as its own entity and within its own virtualised environment from a storage, processing and management perspective. This, coupled with all HDFS data being stored on a physical drive exclusive to a single tenant’s virtual node, helps ensure the highest level of data security and assurance.
Will UKCloud manage rolling point Hadoop releases?
We will monitor all minor, major and security patch releases and test them on our own platforms. We won’t apply updates automatically; instead, we will provide our test results, update packages and blueprints so that customers can apply patches at their own discretion.
What is the HDFS data replication factor for Hadoop in the Cloud?
UKCloud has fixed the HDFS data replication factor at three. This factor is in line with established Hadoop practice and helps keep costs for the service to a minimum.
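Because every HDFS block is stored three times, the raw capacity a cluster needs is roughly three times the size of the data loaded into it. The Python sketch below does that arithmetic for capacity planning. It is illustrative only: it ignores HDFS metadata and temporary job output, and it does not assume whether billing is based on the data you load or on the replicated total.

```python
REPLICATION_FACTOR = 3  # HDFS replication factor fixed for this service


def raw_storage_tb(logical_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    """Raw HDFS capacity consumed by `logical_tb` of user data.

    Every HDFS block is stored `replication` times, so raw usage scales
    linearly with the replication factor (small metadata overhead ignored).
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_tb * replication


def replicas_that_can_be_lost(replication: int = REPLICATION_FACTOR) -> int:
    """Replicas of one block that can be lost while a readable copy remains."""
    return replication - 1


if __name__ == "__main__":
    for logical in (1.0, 10.0, 50.0):
        print(f"{logical:>5.1f} TB of data -> {raw_storage_tb(logical):>6.1f} TB raw")
    print("replicas that can be lost per block:", replicas_that_can_be_lost())
```

The same factor explains the resilience described under backup: with three copies, a block stays readable even if the nodes holding two of them fail.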
How large can I grow my Hadoop cluster?
UKCloud is confident that our Hadoop in the Cloud service is capable of operating at a scale more than large enough to deal with the majority of Hadoop use cases and production workloads.
Does UKCloud offer any scheduled automated backup for Hadoop in the Cloud?
There is no scheduled automated backup for this service as Hadoop’s storage engine, HDFS, is engineered with infrastructure failure in mind. That means localised component failures are tolerated within the infrastructure via data replication, eliminating single points of failure (including physical host failure or disk failure).
Hadoop v2.4.1 and later allow HDFS snapshots to be created manually; these snapshots can then be stored offline using our Cloud Storage.
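As a sketch of the manual snapshot workflow, the Python helper below builds the standard HDFS shell commands `hdfs dfsadmin -allowSnapshot` and `hdfs dfs -createSnapshot`, and by default only prints them. The directory `/data/sales` and the snapshot name are hypothetical, and the sketch does not cover copying a snapshot to Cloud Storage, whose interface is not described here.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_commands(directory: str, name: Optional[str] = None) -> List[List[str]]:
    """Build the HDFS CLI calls that enable and then take a snapshot.

    `hdfs dfsadmin -allowSnapshot` needs HDFS superuser rights and only has
    to be run once per directory; `hdfs dfs -createSnapshot` can then be
    repeated as often as required.
    """
    if name is None:
        # Timestamped snapshot name in UTC, e.g. s20240101-120000
        name = datetime.now(timezone.utc).strftime("s%Y%m%d-%H%M%S")
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        ["hdfs", "dfs", "-createSnapshot", directory, name],
    ]


def snapshot_path(directory: str, name: str) -> str:
    """Read-only path under which HDFS exposes a snapshot's contents."""
    return f"{directory.rstrip('/')}/.snapshot/{name}"


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical directory; the default dry run only prints the commands
    run(snapshot_commands("/data/sales", "nightly"))
    print(snapshot_path("/data/sales", "nightly"))
```

Once created, a snapshot is read-only and appears under the directory’s `.snapshot` subdirectory, which is the path to copy from when moving the data offline.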
Is your Hadoop in the Cloud service extensible to offer additional analytics and visualisation tools?
We have engineered the service to enable customers to provision their own analytics, business intelligence and visualisation tools on our Compute service line with full, reduced-latency connectivity to Hadoop in the Cloud.
Does Hadoop in the Cloud support active/active replication of my cluster between your two data centres?
Currently, the service supports only a single active cluster, hosted in either of our data centres.
Active/passive clusters could be configured using our low-latency dedicated connectivity to enable synchronous replication, but the customer or partner would be responsible for supporting this configuration.
Third-party tools for active/active Hadoop clusters are available, but we would not be responsible for the design, implementation, testing or support of these tools.
What are Velocity Packs?
To maximise cost flexibility, UKCloud offers three Hadoop cluster types, known as Velocity Packs, for customers to choose from based on their initial Hadoop data requirements and the projected velocity of their future data ingest.
Can I mix and match different Velocity Packs?
It is currently not possible to mix and match node types within a single cluster (for example, a low-velocity cluster can scale out only with low-velocity worker nodes).
How is Hadoop in the Cloud supported?
We manage and support the Hadoop core platform using our dedicated support team based in the UK. Support is available via helpdesk ticket, phone or email.
Will my cluster’s performance increase as I deploy more worker nodes?
Generally, yes. Owing to the way Hadoop places and queries data, the more worker nodes the cluster can spread its data across, the more of each job can run in parallel, and the faster performance becomes.
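A rough way to see why more worker nodes help is to count how many "waves" of map tasks a full scan of the data needs. The Python sketch below assumes one map task per HDFS block, the Hadoop 2.x default block size of 128 MB, and a hypothetical fixed number of concurrent containers per node. Real jobs are also bounded by shuffle, network and reducer time, so the gains are not unlimited.

```python
import math

BLOCK_SIZE_MB = 128  # Hadoop 2.x default HDFS block size (dfs.blocksize)


def map_waves(data_gb: float, worker_nodes: int, containers_per_node: int,
              block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of rounds of map tasks needed to read `data_gb` once.

    Simplifying assumptions: one map task per HDFS block, every container
    runs one map task at a time, and all tasks take the same time.
    """
    if worker_nodes < 1 or containers_per_node < 1:
        raise ValueError("need at least one node and one container per node")
    blocks = math.ceil(data_gb * 1024 / block_size_mb)
    slots = worker_nodes * containers_per_node
    return math.ceil(blocks / slots)


if __name__ == "__main__":
    # 100 GB = 800 blocks of 128 MB; containers_per_node=8 is hypothetical
    for nodes in (4, 8, 16):
        print(nodes, "nodes ->", map_waves(100, nodes, containers_per_node=8), "waves")
```

With these values, 4, 8 and 16 nodes need 25, 13 and 7 waves respectively: doubling the node count roughly halves the map phase until other bottlenecks dominate.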
How do you ensure the performance and resilience of Hadoop in a virtualised environment?
We’ve used VMware vSphere Big Data Extensions and Hadoop Virtualization Extensions (HVE) to create rack, host and node awareness within our virtual data centre, helping to ensure the best placement of nodes from a performance and resilience perspective.
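The kind of placement decision that rack and host awareness enables can be sketched in Python. The rack rules follow the default HDFS block placement policy; the extra rule that replicas never share a physical host approximates the node-group awareness that HVE adds for virtualised clusters. The topology and node names are hypothetical, and this is not UKCloud’s implementation.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str
    host: str  # physical host running this VM (HVE's "node group" level)


def place_replicas(writer: DataNode, nodes: List[DataNode],
                   rng: random.Random) -> List[DataNode]:
    """Pick three DataNodes for a new HDFS block.

    1st replica: the writer's own node.
    2nd replica: a node on a different (remote) rack.
    3rd replica: another node on that remote rack, on a different physical host.
    Assumes each physical host belongs to exactly one rack.
    """
    def has_peer_host(n: DataNode) -> bool:
        # The remote rack must offer a second physical host for the 3rd replica
        return any(m.rack == n.rack and m.host != n.host for m in nodes)

    remote = [n for n in nodes if n.rack != writer.rack and has_peer_host(n)]
    if not remote:
        raise ValueError("need a remote rack with DataNodes on two physical hosts")
    second = rng.choice(remote)
    same_rack = [n for n in nodes if n.rack == second.rack and n.host != second.host]
    third = rng.choice(same_rack)
    return [writer, second, third]


if __name__ == "__main__":
    cluster = [
        DataNode("vm1", "rack1", "hostA"), DataNode("vm2", "rack1", "hostA"),
        DataNode("vm3", "rack1", "hostB"), DataNode("vm4", "rack2", "hostC"),
        DataNode("vm5", "rack2", "hostC"), DataNode("vm6", "rack2", "hostD"),
    ]
    chosen = place_replicas(cluster[0], cluster, random.Random(42))
    print([(n.name, n.rack, n.host) for n in chosen])
```

Keeping replicas on separate physical hosts matters in a virtualised environment because several node VMs can share one host; without that rule, a single host failure could remove more than one copy of a block.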