The days of using spreadsheets to manage a company’s data are long gone.
The big data revolution has brought profound changes to how companies collect, store, manage, and analyze their data.
Advances in data warehousing have empowered companies to take millions of rows of disparate bits of information and generate on-demand, real-time insights to help make smarter, data-driven decisions.
But what’s next?
We reached out to 7 of the industry’s top thought leaders to get their predictions on where data warehousing is headed and how those changes will impact businesses.
Prediction: Enterprise Data Warehouses will continue to have a place in analytics, but the times they are a-changin’, with pressure on architecture, development and operations.
Enterprise Data Warehouses (EDWs) will continue to adjust their standing due to Hadoop. EDWs will face strong competition from the rising “data lake” architecture based on Hadoop. Data lakes provide cost savings on software and storage. Newer organizations will adopt this strategy for economic reasons. Cloudera, MapR and, to an extent, Hortonworks are embracing this approach. Enterprises with existing installations will ask hard questions about integrating Hadoop into them. However, data lakes specifically and Hadoop in general have the downside of “time to implementation”. Teradata has the “lead” on adapting to Hadoop. SAP and IBM are working to adjust their strategies.
EDWs will face HUGE changes from the world of data warehouse automation. Just as we no longer “hand code” ETL scripts, I foresee 2015 as the year when data modeling and database administration are productized to speed up “time to implementation”. Platforms such as WhereScape, Kalido and Datical are pushing the envelope on how database structures are designed and implemented. As EDWs continue to evolve with development, test and production environments, it will be critical to reduce errors and speed migration of database schemas.
John L Myers
Managing Research Director BI
Prediction: Go as you grow
Just like anything else in enterprise computing, “data warehousing” will fade away as something you don’t have to think or worry about. It will just be there, similar to electricity, enabling amazing things such as finding insights, telling stories, making decisions. For two reasons:
First, as powerful services to make sense of data are moving entirely into the cloud, the need to architect, build, maintain and upgrade your own data warehouse or rent it in one or a few central locations will fade away. It’s already being replaced by something much more elastic and seamless that grows as you go. The data warehouse of the future is a fluid, living system that brings resources online as your needs evolve. A modern front-end hides all the complexities while your service will choose the data sets you want to query and bring them together as needed. Go big, and the resources at your disposal will be augmented, from a few thousand rows to petabytes. A powerful and flexible array of cloud services will handle this task, freeing business people to work with the data. In short, the entire web will be your data warehouse.
Second, the old siloed model and even the notion of creating “data lakes” for later analysis both miss the point. Teams or organizations will increasingly answer their questions in a browser, even on a tablet. No installs, no native apps required. To do that, they need to connect to dozens of data sources on the fly, often as they stream in. Tying yourself down with expensive hardware and software bets or being locked in to a particular vendor means losing valuable time and budget. And given that we don’t know which types of data sources will be more important than others, the way forward is a virtual model that transforms the entire web into a data mesh, making connections as we need them. If BI becomes a social, collaborative endeavor, so should your data.
Co-founder and CEO
Prediction: Enterprises will build “operational data warehouses” to combine data from multiple sources in real time and go beyond dashboards and reports to actually use their data in day-to-day operations.
Today’s data warehouses are not moving at the speed of the business. It takes forever to integrate a new data source into your data warehouse. You have to figure out what reports you’re going to want so you can pre-define data dimensions for aggregation. You have to figure out a schema that can accommodate all the data you’re going to include. You have to set up ETL to translate your operational data into that analytic schema, and you have to maintain separate technology stacks at the operational, analytic, and archive tiers. This kind of traditional data warehouse is resistant to change.
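To make that rigidity concrete, here is a minimal sketch of the traditional pattern using Python's built-in sqlite3 module. The table names, the `run_etl` function and the sample orders are all hypothetical and chosen only for illustration; no vendor's actual tooling is implied. The point is that the analytic schema and its one dimension are fixed before any data arrives, so a field the schema did not anticipate is simply lost.

```python
import sqlite3

def run_etl(operational_rows):
    """Load operational records into a predefined analytic schema and
    return the one report the schema was designed for (sales by region)."""
    conn = sqlite3.connect(":memory:")
    # The schema and its single dimension are decided up front.
    conn.executescript("""
        CREATE TABLE dim_region (
            region_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE
        );
        CREATE TABLE fact_sales (
            region_id INTEGER REFERENCES dim_region(region_id),
            amount    REAL
        );
    """)
    for row in operational_rows:
        # Transform step: map the operational "region" string to a dimension key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_region (name) VALUES (?)",
            (row["region"],),
        )
        region_id = conn.execute(
            "SELECT region_id FROM dim_region WHERE name = ?",
            (row["region"],),
        ).fetchone()[0]
        # Any field the schema did not anticipate (here "channel") is dropped.
        conn.execute(
            "INSERT INTO fact_sales (region_id, amount) VALUES (?, ?)",
            (region_id, row["amount"]),
        )
    report = conn.execute("""
        SELECT d.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS d ON d.region_id = f.region_id
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()
    conn.close()
    return report

# Hypothetical operational data; "channel" has no home in the schema above.
orders = [
    {"region": "east", "amount": 10.0, "channel": "web"},
    {"region": "east", "amount": 5.0, "channel": "store"},
    {"region": "west", "amount": 7.5, "channel": "web"},
]
totals = run_etl(orders)
```

Reporting on "channel" in this sketch would mean changing the schema, the transform and the report together, which is the kind of change cost the paragraph above describes.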
There are three trends driving the move to a more agile model. First is the trend towards wanting to move faster and accommodate more data quickly. Waiting months to develop a schema and build the required ETL is no longer acceptable. Second is the trend towards discovery-based analytics, driven by the consumer experience with search technologies. Business analysts today want a search-based paradigm that allows them to formulate new questions to ask the data based on the results of the question they just asked a few seconds ago, and they want the results in real-time so they can figure out the question they want to ask next.
Third is the trend towards operationalizing the data from the data warehouse. This means building data services that can combine data from multiple sources and provide that data securely and performantly to an operational process so that process can complete in real time. Fraud detection, eligibility for benefits, and customer onboarding are all examples of use cases that used to be performed offline but now need to be performed online in real-time.
Data Warehousing has never really been about warehousing your data. It’s always been about getting value out of it. Enterprises want more agility, and they’re finding that new technologies like NoSQL can deliver more value on a greater variety of data faster than ever before.
VP of Engineering
Prediction: The data warehouse will become an integrated enterprise processing engine, fusing multistructured data and analytics, while incorporating multiple procedural and scripting languages to ensure multiple user communities have immediate access to relevant insight.
Teradata believes the future of data warehousing is here today and it continues to evolve in a powerful and positive direction. Today’s data warehouse is not what you think! It has capabilities that weren’t imagined even just a few years ago.
The data warehouse used to be a single platform, whereas today, it’s a logical data warehouse that consists of multiple systems, each with their own analytic strengths, working together transparently to solve business problems. The data warehouse used to be accessed through SQL only and primarily by business users, but now also supports languages such as R, Java, Perl, Ruby, and Python running in parallel inside the database and used not only by business users but also by data scientists and application developers.
The data warehouse used to contain only structured data, which was stored in row format. Today it also stores multi-structured data such as XML, JSON, and weblogs natively and includes methods to operate on this data along with all the other data in the warehouse. Data can also be stored in a combination of row and column formats, giving companies flexibility, performance, and storage efficiency.
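As a rough illustration of the row versus column distinction, the sketch below holds the same records both ways in plain Python. It is not how Teradata or any other product stores data internally; the `to_columns` helper and the sample records are hypothetical, and the helper assumes every row has the same keys.

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into a dictionary of column lists.
    Assumes every row carries the same keys."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Row format: each record is kept together, which suits fetching one whole record.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "east", "amount": 5.0},
    {"id": 3, "region": "west", "amount": 7.5},
]

# Column format: each attribute is kept together, which suits scanning one attribute.
columns = to_columns(rows)

# An aggregate touches only the "amount" column and never reads "id" or "region".
total_amount = sum(columns["amount"])
```

A row layout tends to favor reading or writing whole records, while a column layout tends to favor aggregates over a few attributes, which is why combining the two can offer flexibility.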
Today’s warehouse also comes with deployment flexibility, giving companies the choice of on-premises and/or cloud, public and/or private.
The future of the data warehouse is here today, and it is evolving rapidly. Teradata continues to develop these powerful capabilities and more at a market-leading pace, ensuring access to big data analytics for all users within the organization. Naturally, our main priority is enabling real business value.
Director of Product Marketing
Prediction: Processing data and analytics in the cloud will become a requirement.
Cloud has gone from a sandbox for experiments to a critical piece of enterprise infrastructure. Even late adopters caught napping have awakened to cloud’s potential: it’s increasingly hard to find a business that doesn’t use cloud applications or cloud infrastructure. It’s become almost impossible to be a credible CIO without a strategy for where and how to leverage the cloud. That has resulted in an increasing amount of data residing and being generated in the cloud; most estimates agree that the share of data in the cloud will rapidly surpass (or already has surpassed) on-premises data.
To date, data warehousing and analytics in the cloud have been pretty rare. But we’re approaching an inflection point, driven by evolution in data and in cloud offerings, that will make it a necessity to have a cloud-based solution for data warehousing and analytics. Cloud-based solutions will be critical to helping organizations expand access to data and analytics as well as increase their agility with their data. Taking advantage of the flexibility and cost model of the cloud, these solutions will offer performance on demand and native understanding of diverse data to support a wide range of analytics, without the management overhead and cost of traditional on-premises offerings.
Vice President of Products and Marketing
Prediction: Big Data projects will start with data warehouse optimization and ultimately lead to managing data as an asset.
Data warehouses are reaching their capacity much too quickly, as the demand for more data and more types of data is forcing IT organizations into very costly upgrades. Further compounding the problem, many organizations do not have a strategy for managing the lifecycle of their data. It is not uncommon for much of the data in a data warehouse to be unused or infrequently used, or for too much compute capacity to be consumed by extract-load-transform (ELT) processing. Therefore, we’re seeing many organizations adopt new Big Data technologies, such as Hadoop, to offload data from traditional databases and warehouses.
Where companies struggle to make the transition to enterprise Hadoop implementations is in their ability to find the resource skills to staff their Big Data projects. In other words, they can scale their data storage and processing with Hadoop technology but can’t scale the resource skills to do the actual work on Hadoop. The best way to scale resource skills for Big Data projects is to utilize tools that existing developers and analysts are already familiar with and that work with new technologies, such as Hadoop. This is why Informatica’s data integration and data quality tools run on Hadoop and why we see other vendors providing SQL (Structured Query Language) capabilities on Hadoop. Most data analysts and data engineers are familiar with SQL, and there are more than 100,000 trained Informatica developers around the world to build data pipelines on Hadoop.
Once organizations have Hadoop in place, they can then build out a “data lake” to manage and process new types of data, such as machine log files, social media or longer time horizons (e.g., years vs. months) of transaction data. Then, the problem shifts to data governance and managing data as an asset across the company. This, in turn, requires data security, data lineage, master data management, search and more self-service tools so business users can access and analyze data in a more agile fashion. This is why we are also seeing the emergence of Chief Data Officers (CDOs) so that companies have the discipline to manage their data and grow their businesses.
Senior Director, Big Data Product Marketing
Prediction: The traditional data warehouse market has progressed into an important transitional stage.
The concept of the centralized data warehouse was previously focused on the best use of existing and available technology. In late 2008, Gartner noted the beginning of a new concept which we now refer to as the “logical data warehouse”. At the core, data consistency and integration are not wholly dependent upon physical transformation and storage: you don’t have to build a centralized repository. For thirty years, physical stores of integrated data were the best and possibly the only effective optimization technique to achieve reliably consistent performance from analytic systems. But the “Achilles’ heel” of these systems was always the slow requirements “wrangling” that was required to determine how business analysts wanted to use the data before it could be integrated and deployed.
From 2014 forward, the data warehouse will evolve into three separate technology areas. Optimization technology will focus on determining the stability of data usage by business analysts. Well-established, consistent and stable use cases, such as key performance metrics, dashboards and other results that are easily deployed via easily connected portals, will continue to be deployed in some type of data repository. It doesn’t matter if it is on disk, SSD, Flash or within in-memory databases; it’s still a repository with integrated and transformed data. Use cases that are confident in their sources but not so sure about the analytic models will use data virtualization approaches, either as an independent tier or from the BI platform or even DBMS solutions (both of which can already perform this function). But with the final class of use cases, where even the data source formats and contents can be questioned and must be evaluated (so-called unstructured and big data assets at this time), the business analysts must be supported for data access using various programming languages and given the ability to use distributed platforms, such as Hadoop, Search and other non-standard information analysis tools (e.g. video and image analytics).
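The three classes of use case described above can be read as a simple decision rule. The sketch below is one possible reading, not a Gartner framework or a product feature; the `Approach` enum, the `choose_approach` function and its two yes/no inputs are hypothetical simplifications of what would in practice be a judgment call.

```python
from enum import Enum

class Approach(Enum):
    REPOSITORY = "physical repository of integrated, transformed data"
    VIRTUALIZATION = "data virtualization tier"
    DISTRIBUTED = "programmatic access on distributed platforms"

def choose_approach(sources_trusted: bool, usage_stable: bool) -> Approach:
    """Map a use case to one of the three technology areas.

    sources_trusted: the formats and contents of the data sources are settled.
    usage_stable:    analysts' use of the data (the analytic model) is settled.
    """
    if not sources_trusted:
        # Even the source formats are in question: explore with code on
        # distributed platforms before committing to any structure.
        return Approach.DISTRIBUTED
    if usage_stable:
        # Stable, well-understood usage such as KPIs and dashboards.
        return Approach.REPOSITORY
    # Sources are trusted, but the analytic model is still changing.
    return Approach.VIRTUALIZATION

# Hypothetical use cases
dashboard = choose_approach(sources_trusted=True, usage_stable=True)
new_model = choose_approach(sources_trusted=True, usage_stable=False)
video_feed = choose_approach(sources_trusted=False, usage_stable=False)
```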
This means from now on, the ability to access and manipulate data with a “best fit engineering” approach through a coordinated control center becomes far more important. New competitors enter the market. Communications vendors like Huawei, Cisco, Verizon and Orange will enable distributed data integration and transformation. Established integration vendors like Informatica and IBM will compete with data warehouse platform vendors like Teradata who are introducing their own “command center” type of software for coordinating this new world.
The data warehouse becomes what it always was meant to be—a dynamic data integration and transformation engine that delivers consistent performance via a coordinated information semantic where some of it is virtual, some is physical, some is distributed.
Research VP, Information Management