Doctors without Borders and Journalists without Borders were both in the news last year, but 2015 could add a new nomadic concept: data without borders.
That’s what IBM Fellow and Storage CTO Vincent Hsu predicts. As data volumes grow and Big Data analysis involving data from multiple sources moves from test to production, it will be increasingly impractical for users to move data to a single location. So data will have to exist in multiple locations, and the analysis will have to come to the data.
This presents some mind-boggling new complexities, as analysis tools need to evolve to harmonize simultaneous operations across multiple locations. For example, information stored in the data warehouse, on social media sites and in transaction processing systems will need to be coordinated in order to accurately measure the impact of a new marketing strategy on public perception and sales. The analysis processes at each site will need to be aware of each other to combine results and produce reliable predictive analysis.
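One way to picture analysis that visits the data is for each site to compute a small summary locally and send only that summary to a coordinator. The sketch below is an illustration under that assumption, not a description of any IBM product; the names `PartialStats`, `summarize_locally`, `combine` and `mean` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialStats:
    """Summary computed at one site; only this leaves the site, not the raw data."""
    count: int
    total: float


def summarize_locally(values: Iterable[float]) -> PartialStats:
    """Runs where the data lives (warehouse, social feed, transaction system)."""
    values = list(values)
    return PartialStats(count=len(values), total=float(sum(values)))


def combine(partials: Iterable[PartialStats]) -> PartialStats:
    """Runs at the coordinator; merges per-site summaries into one."""
    partials = list(partials)
    return PartialStats(
        count=sum(p.count for p in partials),
        total=sum(p.total for p in partials),
    )


def mean(stats: PartialStats) -> float:
    """Global mean derived from the merged summary."""
    if stats.count == 0:
        raise ValueError("no data at any site")
    return stats.total / stats.count
```

Counts and sums merge exactly, so the combined mean equals the mean that a single central copy of the data would give. Statistics such as medians do not merge this simply, which is part of the complexity the paragraph above describes.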
Data volume is only one challenge dictating a “data without borders” approach. Data variety is another. Traditional structured data such as company financial results will remain important, but unstructured text and multimedia data as well as real-time Internet-of-things (IoT) feeds will also grow in value. Users and service providers will search for new questions to ask and new ways to get value from the data.
Mobile is another driver, Hsu says. Business end-users ranging from C-suite and line-of-business executives to company analysts will consume analytic information on various devices both inside and outside the office. The storage systems that serve them will need to access data on mobile devices as well as on internal and external arrays and user desktops. They will need to automatically remove the storage inefficiencies of duplicate files in a system, reduce instances of human error and encourage greater collaboration.
This leads to a second prediction: storage architectures that connect storage on premises with storage in the external cloud into a single logical entity will become increasingly important. This will help to drive the adoption of hybrid clouds and software-led storage. Those two technologies will allow companies to keep their financial and other core data in house while connecting it directly with Internet-based data such as social media and IoT in a single logical database that spans multiple data types and physical locations. As software-led storage that spans the hybrid cloud environment gains momentum, intelligent, policy-driven data tiering will also grow in popularity.
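Policy-driven tiering can be thought of as a small set of rules that map each data set to a storage location. The following is a minimal sketch under assumed rules; the tier names, the `core` flag and the age thresholds are all hypothetical and are not taken from Hsu's remarks or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    days_since_access: int
    core: bool  # assumed flag: financial or other core data that must stay in house


def choose_tier(ds: DataSet, hot_days: int = 7, warm_days: int = 90) -> str:
    """Return a hypothetical tier name for a data set based on simple policy rules."""
    if ds.days_since_access <= hot_days:
        # Recently used data goes to the fastest tier, wherever it is allowed to live.
        return "on_prem_flash" if ds.core else "cloud_flash"
    if ds.core:
        # Core data never leaves the premises, even when it goes cold.
        return "on_prem_archive"
    if ds.days_since_access <= warm_days:
        return "cloud_object"
    return "cloud_archive"
```

For example, `choose_tier(DataSet("ledger", 400, core=True))` keeps an old ledger in house, while a stale social media extract with `core=False` would move to the cloud archive tier.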
The huge growth in data will also drive changes in storage media. Flash is taking over the high-performance storage space, but companies are also going to have to archive increasing volumes of data both for compliance and research purposes. And in some cases, at least, that data will still need to be easily retrievable. This might include very large volumes of IoT data, such as feedback from hundreds of sensors monitoring airliners in flight.
These data volumes will in many cases make even SATA disk impractical. IBM is already seeing growth in the sale of tape systems designed specifically to meet this need, Hsu said. For example, one petrochemical industry client is equipping drilling platforms with sensors connected to tape systems. These platforms are often in isolated locations with little Internet connectivity, so the company collects all the IoT data on tape and then flies the tapes to a central location for analysis.
This strategy is likely to become common as IoT grows. An airplane, for instance, can produce terabytes of data on a single flight. It will be impossible to send all that data live across the Internet, even if a good network connection is available. A more practical approach is to capture the data on a tape drive onboard, then have the tape changed between flights and shipped to a central location for detailed analysis. Add a front-end inline analysis capability that identifies out-of-range readings that might indicate a developing problem and sends only that data over the network immediately, in flight. Such a system can increase flight safety and maintenance efficiency by letting the airline have the right person and replacement parts waiting when the plane lands. Longer term, it can also help the airline save money by optimizing strategies for fuel savings and maintenance.
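The inline analysis step amounts to a filter that sits in front of the archive: every reading is recorded locally, and only readings outside an expected range are flagged for immediate transmission. The sketch below illustrates that idea; the `Reading` type, the `triage` function, the sensor name and the limits are hypothetical and are not taken from any real avionics system.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Reading(NamedTuple):
    sensor: str
    value: float


def triage(
    readings: Iterable[Reading],
    limits: Dict[str, Tuple[float, float]],
) -> Tuple[List[Reading], List[Reading]]:
    """Split a stream into (archive, urgent).

    archive: every reading, destined for the onboard tape.
    urgent:  only readings outside the sensor's (low, high) limits,
             destined for immediate transmission over the network.
    """
    archive: List[Reading] = []
    urgent: List[Reading] = []
    for reading in readings:
        archive.append(reading)  # nothing is dropped from the archive
        low, high = limits[reading.sensor]
        if reading.value < low or reading.value > high:
            urgent.append(reading)
    return archive, urgent
```

With assumed limits of 40.0 to 120.0 for a sensor called `oil_temp_c`, a reading of 131.5 would land in both lists, while a reading of 85.0 would go to the archive only.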
Hsu says that, further out, the need for active archives of multi-petabyte databases will also drive adoption of inexpensive three-dimensional flash storage.
Backup strategies will change, too. The day of the nightly data backup is over, done in by the huge volumes of data that now need to be protected as well as the need for 24-by-7 operations that effectively shut the backup window. Instead, seamless hybrid cloud storage systems that are cloud- and application-aware and that incorporate integrated endpoint management will support continuous backup to an active archive in the cloud. These systems will leave virtually no gaps, supporting very low recovery point objectives (RPOs) and recovery time objectives (RTOs) with minimal disruption during recoveries.
And those archives will be active rather than passive, supporting background analysis, development/test and other processes that are not appropriate for the production database. The size of the data sets and the expense of archiving dictate that the company derive business value by making it easier to combine current and archival data for analysis.