This IBM® RedpaperTM publication provides guidance on building an enterprise-grade data lake by using IBM SpectrumTM Scale and Hortonworks Data Platform for performing in-place Hadoop or Spark-based analytics. It covers the benefits of the integrated solution, and gives guidance about the types of deployment models and considerations during the implementation of these models. Hortonworks Data Platform (HDP) is a leading Hadoop and Spark distribution. HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation. IBM Spectrum ScaleTM is flexible and scalable software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale to form large data lakes and content repositories to perform high-performance computing (HPC) and analytics workloads. It can scale performance and capacity both without bottlenecks.
Storage systems must provide reliable and convenient data access to all authorized users while simultaneously preventing threats coming from outside or even inside the enterprise. Security threats come in many forms, from unauthorized access to data, data tampering, denial of service, and obtaining privileged access to systems. According to the Storage Network Industry Association (SNIA), data security in the context of storage systems is responsible for safeguarding the data against theft, prevention of unauthorized disclosure of data, prevention of data tampering, and accidental corruption. This process ensures accountability, authenticity, business continuity, and regulatory compliance. Security for storage systems can be classified as follows: Data storage (data at rest, which includes data durability and immutability) Access to data Movement of data (data in flight) Management of data IBM® Spectrum Scale is a software-defined storage system for high performance, large-scale workloads on-premises or in the cloud. IBM SpectrumTM Scale addresses all four aspects of security by securing data at rest (protecting data at rest with snapshots, and backups and immutability features) and securing data in flight (providing secure management of data, and secure access to data by using authentication and authorization across multiple supported access protocols). These protocols include POSIX, NFS, SMB, Hadoop, and Object (REST). For automated data management, it is equipped with powerful information lifecycle management (ILM) tools that can help administer unstructured data by providing the correct security for the correct data. This IBM RedpaperTM publication details the various aspects of security in IBM Spectrum ScaleTM, including the following items: Security of data in transit Security of data at rest Authentication Authorization Hadoop security Immutability Secure administration Audit logging Security for transparent cloud tiering (TCT) Security for OpenStack drivers Unless stated otherwise, the functions that are mentioned in this paper are available in IBM Spectrum Scale V4.2.1 or later releases.
When technology workloads process healthcare data, it is important to understand Health Insurance Portability and Accountability Act (HIPAA) compliance and what it means for the technology infrastructure in general and storage in particular. HIPAA is US legislation that was signed into law in 1996. HIPAA was enacted to protect health insurance coverage, but was later extended to ensure protection and privacy of electronic health records and transactions. In simple terms, it was instituted to modernize the exchange of healthcare information and how the Personally Identifiable Information (PII) that is maintained by the healthcare and healthcare-related industries are safeguarded. From a technology perspective, one of the core requirements of HIPAA is the protection of Electronic Protected Health Information (ePHIPer through physical, technical, and administrative defenses. From a non-compliance perspective, the Health Information Technology for Economic and Clinical Health Act (HITECH) added protections to HIPAA and increased penalties $100 USD - $50,000 USD per violation. Today, HIPAA-compliant solutions are a norm in the healthcare industry worldwide. This IBM® Redpaper publication describes HIPPA compliance requirements for storage and how security enhanced software-defined storage is designed to help meet those requirements. We correlate how Software Defined IBM Spectrum® Scale security features address the safeguards that are specified by the HIPAA Security Rule.
Advancing the science of medicine by targeting a disease more precisely with treatment specific to each patient relies on access to that patient's genomics information and the ability to process massive amounts of genomics data quickly. Although genomics data is becoming a critical source for precision medicine, it is expected to create an expanding data ecosystem. Therefore, hospitals, genome centers, medical research centers, and other clinical institutes need to explore new methods of storing, accessing, securing, managing, sharing, and analyzing significant amounts of data. Healthcare and life sciences organizations that are running data-intensive genomics workloads on an IT infrastructure that lacks scalability, flexibility, performance, management, and cognitive capabilities also need to modernize and transform their infrastructure to support current and future requirements. IBM® offers an integrated solution for genomics that is based on composable infrastructure. This solution enables administrators to build an IT environment in a way that disaggregates the underlying compute, storage, and network resources. Such a composable building block based solution for genomics addresses the most complex data management aspect and allows organizations to store, access, manage, and share huge volumes of genome sequencing data. IBM SpectrumTM Scale is software-defined storage that is used to manage storage and provide massive scale, a global namespace, and high-performance data access with many enterprise features. IBM Spectrum ScaleTM is used in clustered environments, provides unified access to data via file protocols (POSIX, NFS, and SMB) and object protocols (Swift and S3), and supports analytic workloads via HDFS connectors. Deploying IBM Spectrum Scale and IBM Elastic StorageTM Server (IBM ESS) as a composable storage building block in a Genomics Next Generation Sequencing deployment offers key benefits of performance, scalability, analytics, and collaboration via multiple protocols. This IBM RedpaperTM publication describes a composable solution with detailed architecture definitions for storage, compute, and networking services for genomics next generation sequencing that enable solution architects to benefit from tried-and-tested deployments, to quickly plan and design an end-to-end infrastructure deployment. The preferred practices and fully tested recommendations described in this paper are derived from running GATK Best Practices work flow from the Broad Institute. The scenarios provide all that is required, including ready-to-use configuration and tuning templates for the different building blocks (compute, network, and storage), that can enable simpler deployment and that can enlarge the level of assurance over the performance for genomics workloads. The solution is designed to be elastic in nature, and the disaggregation of the building blocks allows IT administrators to easily and optimally configure the solution with maximum flexibility. The intended audience for this paper is technical decision makers, IT architects, deployment engineers, and administrators who are working in the healthcare domain and who are working on genomics-based workloads.
This IBM® RedguideTM publication describes big data and analytics deployments that are built on IBM Spectrum ScaleTM. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance that is required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of the use of IBM Spectrum Scale as an alternative to HDFS.
IBM® Spectrum Scale is a proven, scalable, high-performance data and file management solution. It provides world-class storage management with extreme scalability, flash accelerated performance, automatic policy-based storage that has tiers of flash through disk to tape. It also provides support for various protocols, such as NFS, SMB, Object, HDFS, and iSCSI. Containers can leverage the performance, information lifecycle management (ILM), scalability, and multisite data management to give the full flexibility on storage as they experience on the runtime. Container adoption is increasing in all industries, and they sprawl across multiple nodes on a cluster. The effective management of containers is necessary because their number will probably reach a far greater number than virtual machines today. Kubernetes is the standard container management platform currently being used. Data management is of ultimate importance, and often is forgotten because the first workloads containerized are ephemeral. For data management, many drivers with different specifications were available. A specification named Container Storage Interface (CSI) was created and is now adopted by all major Container Orchestrator Systems available. Although other container orchestration systems exist, Kubernetes became the standard framework for container management. It is a very flexible open source platform used as the base for most cloud providers and software companies' container orchestration systems. Red Hat OpenShift is one of the most reliable enterprise-grade container orchestration systems based on Kubernetes, designed and optimized to easily deploy web applications and services. OpenShift enables developers to focus on the code, while the platform takes care of all of the complex IT operations and processes. This IBM Redbooks® publication describes how the CSI Driver for IBM file storage enables IBM Spectrum® Scale to be used as persistent storage for stateful applications running in Kubernetes clusters. Through the Container Storage Interface Driver for IBM file storage, Kubernetes persistent volumes (PVs) can be provisioned from IBM Spectrum Scale. Therefore, the containers can be used with stateful microservices, such as database applications (MongoDB, PostgreSQL, and so on).
High-speed I/O workloads are moving away from the SAN to Ethernet and IBM® Spectrum Scale is pushing the network limits. The IBM Spectrum® Scale team discovered that many infrastructure Ethernet networks that were used for years to support various applications are not designed to provide a high-performance data path concurrently to many clients from many servers. IBM Spectrum Scale is not the first product to use Ethernet for storage access. Technologies, such as Fibre Channel over Ethernet (FCoE), scale out NAS, and IP connected storage (iSCSI and others) use Ethernet though IBM Spectrum Scale as the leader in parallel I/O performance, which provides the best performance and value when used on a high-performance network. This IBM Redpaper publication is based on lessons that were learned in the field by deploying IBM Spectrum Scale on Ethernet and InfiniBand networks. This IBM Redpaper® publication answers several questions, such as, "How can I prepare my network for high performance storage?", "How do I know when I am ready?", and "How can I tell what is wrong?" when deploying IBM Spectrum Scale and IBM Elastic Storage® Server (ESS). This document can help IT architects get the design correct from the beginning of the process. It also can help the IBM Spectrum Scale administrator work effectively with the networking team to quickly resolve issues.
This IBM® Redbooks® publication provides information to help you with the sizing, configuration, and monitoring of hybrid cloud solutions using the transparent cloud tiering (TCT) functionality of IBM SpectrumTM Scale. IBM Spectrum ScaleTM is a scalable data, file, and object management solution that provides a global namespace for large data sets and several enterprise features. The IBM Spectrum Scale feature called transparent cloud tiering allows cloud object storage providers, such as IBM CloudTM Object Storage, IBM Cloud, and Amazon S3, to be used as a storage tier for IBM Spectrum Scale. Transparent cloud tiering can help cut storage capital and operating costs by moving data that does not require local performance to an on-premise or off-premise cloud object storage provider. Transparent cloud tiering reduces the complexity of cloud object storage by making data transfers transparent to the user or application. This capability can help you adapt to a hybrid cloud deployment model where active data remains directly accessible to your applications and inactive data is placed in the correct cloud (private or public) automatically through IBM Spectrum Scale policies. This publication is intended for IT architects, IT administrators, storage administrators, and those wanting to learn more about sizing, configuration, and monitoring of hybrid cloud solutions using IBM Spectrum Scale and transparent cloud tiering.
Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management (SIEM) software for deep inspection, detection, and prioritization of threats has become a necessity for any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, when combined with the log analysis, deep inspection, and detection of threats that are provided by IBM QRadar®, help reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale File Audit Logging can be integrated with IBM QRadar. Using IBM QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data that is stored on IBM Spectrum Scale. When the threats are identified, you can quickly act on them to mitigate or reduce the impact of incidents. We further demonstrate how the threat detection by IBM QRadar can proactively trigger data snapshots or cyber resiliency workflow in IBM Spectrum Scale to protect the data during threat. This third edition has added the section "Ransomware threat detection", where we describe a ransomware attack scenario within an environment to leverage IBM Spectrum Scale File Audit logs integration with IBM QRadar. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. This paper assumes a basic understanding of IBM Spectrum Scale and IBM QRadar and their administration.
Having the appropriate storage for hosting business critical data and the proper analytic software for deep inspection of that data is becoming necessary to get deeper insights into the data so that users can categorize which data qualifies for compliance. This IBM® RedpaperTM publication explains why the storage features of IBM SpectrumTM Scale, when combined with the data analysis and categorization features of IBM StoredIQ®, provide an excellent platform for hosting unstructured business data that is subject to regulatory compliance guidelines, such as General Data Protection Regulation (GDPR). In this paper, we describe how IBM StoredIQ can be used to identify files that are stored in an IBM Spectrum ScaleTM file system that include personal information, such as phone numbers. These files can be secured in another file system partition by encrypting those files by using IBM Spectrum Scale functions. Encrypting files prevents unauthorized access to those files because only users that can access the encryption key can decrypt those files. This paper is intended for chief technology officers, solution, and security architects and systems administrators.
There is a growing insider security risk to organizations. Human error, privilege misuse, and cyberespionage are considered the top insider threats. One of the most dangerous internal security threats is the privileged user with access to critical data, which is the "crown jewels" of the organization. This data is on storage, so storage administration has critical privilege access that can cause major security breaches and jeopardize the safety of sensitive assets. Organizations must maintain tight control over whom they grant privileged identity status to for storage administration. Extra storage administration access must be shared with support and services teams when required. There also is a need to audit critical resource access that is required by compliance to standards and regulations. IBM® SecurityTM Verify Privilege Vault On-Premises (Verify Privilege Vault), formerly known as IBM SecurityTM Secret Server, is the next-generation privileged account management that integrates with IBM Storage to ensure that access to IBM Storage administration sessions is secure and monitored in real time with required recording for audit and compliance. Privilege access to storage administration sessions is centrally managed, and each session can be timebound with remote monitoring. You also can use remote termination and an approval workflow for the session. In this IBM Redpaper, we demonstrate the integration of IBM Spectrum® Scale and IBM Elastic Storage® Server (IBM ESS) with Verify Privilege Vault, and show how to use privileged access management (PAM) for secure storage administration. This paper is targeted at storage and security administrators, storage and security architects, and chief information security officers.
The role of the IT solutions is to enforce the correct handling of personal data using processes developed by the establishment. Each element of the solution stack must address the objectives as appropriate to the data that it handles. Typically, personal data exists either in the form of structured data (like databases) or unstructured data (like files, text, documents, and so on.). This IBM Redbooks publication specifically deals with unstructured data and storage systems used to host unstructured data. For unstructured data storage in particular, some key attributes enable the overall solution to support compliance with the EU General Data Protection Regulation (GDPR). Because personal data subject to GDPR is commonly stored in an unstructured data format, a scale out file system like IBM Spectrum Scale provides essential functions to support GDPR requirements. This paper highlights some of the key compliance requirements and explains how IBM Spectrum Scale helps to address them.
This will help us customize your experience to showcase the most relevant content to your age group
Please select from below
Login
Not registered?
Sign up
Already registered?
Success – Your message will goes here
We'd love to hear from you!
Thank you for visiting our website. Would you like to provide feedback on how we could improve your experience?
This site does not use any third party cookies with one exception — it uses cookies from Google to deliver its services and to analyze traffic.Learn More.