Industry Insights From a Data and Analytics CTO

What kinds of trends are you seeing in the data and analytics world?

The most significant movement is the adoption of the cloud in earnest. Customers are using cloud-based resources more and more for storage and compute power, and I don’t see this slowing down or ever turning back. We’ve been in the middle of the cloud movement for several years, and we’ve guided many customers on their journey to the cloud. During that time, we’ve seen a pretty even split within our customer base between organizations using AWS and those using Azure, with some use of Google Cloud Platform. This seems to contradict industry statistics that show much stronger adoption of AWS, but I contend the numbers even out when you focus on data and analytics use cases.

Beyond the not-so-surprising increase in interest in cloud Infrastructure-as-a-Service (IaaS), we’re also seeing much stronger interest in and adoption of Platform- and Software-as-a-Service (PaaS and SaaS) cloud offerings. For example, users of more turnkey products like Snowflake are mostly insulated from the lower-level underpinnings like operating systems, file storage strategies, the intricacies of distributed computing and data backups, and all of the associated maintenance, patches, and updates. These are things you often must manage directly when using lower-level IaaS components of AWS and Azure.

Why the trend toward the cloud?

There are OpEx vs. CapEx reasons why accounting departments and budget owners prefer the subscription pricing models that come with cloud offerings, but above and beyond that, organizations like the flexibility to start small (quickly spin up an environment, prototype a solution, and build a proof of concept) and then either cut bait (i.e., abandon the disposable cloud resources) or quickly scale to the moon. Businesses like the ability to take low-cost risks, and subscription pricing models coupled with disposable cloud resources make this possible.

How does the cloud play a factor in the current work-from-home climate?

Right now, more than ever, having nebulous resources somewhere in the cloud is very attractive. Unless you’re Amazon, Microsoft, or Google, you don’t need to have people working in a physical server room on premises doing server maintenance. Your workforce, including end users and admins of all flavors, is able to work remotely from anywhere using cloud resources. While remote work wasn’t historically a primary driver of cloud adoption, it is very much top of mind now. Even companies that resisted remote work in the past have discovered that they can be productive with a remote workforce, sometimes even more so than before. Remote access to server resources is a must for everyone who uses them, whether full-time employees or project consultants.

If the cloud movement was already in motion, how are companies’ behaviors going to change in the face of the COVID-19 response regarding data and analytics?

In many ways, we in the data and analytics industry have been down this road before in the not-so-distant past. During the Global Financial Crisis of 2008-2009, there were two main themes we saw regarding data and analytics: (1) deal with a known and trusted entity, and (2) do more with less.

This notion of “do more with less” is simple on the surface—run more efficiently. Here’s a very basic example: Increase market share with fewer marketing and sales resources.

However, even with this simple example, the analytical approach can be multifaceted.

Descriptive analytics: Looking backward, what types of customers responded to what types of campaigns?
Predictive analytics: What is the expected response rate to the upcoming campaign? What customer profile is most likely to respond?
Prescriptive analytics: What specifically should I offer to a customer once they have responded to a campaign?

A decade ago, most organizations relied on descriptive analytics to do more with less.

This time around, organizations will more often employ sophisticated types of analytics (i.e., predictive and prescriptive) that rely on artificial intelligence (AI), machine learning (ML), or some flavor of data science, for three main reasons:

Easier access to disposable, low-risk, but very powerful compute resources at their fingertips.
Several AI and ML tools are now available that didn’t exist a decade ago.
AI and ML are now mainstream thinking, and resistance to cloud resources has waned.

Continuing with this thought, I believe companies will dare to step out on a limb to do more innovation this time around, versus sticking to safe “do more with less” initiatives.

It should also be pointed out that companies have more data available for analysis now. As companies have modernized their internal systems over the past decade, they have generated and collected more and more data. They have become more data-savvy and understand that capturing and storing data for analytics is not just a nice-to-have; it’s a critical part of being competitive.

This “more data” theme extends beyond the virtual walls of an organization; companies now also have better access to external data with which they can augment and enhance their internal data. Since current events are top of mind, here’s a basic example: When reports about infections, exposures, and testing first started rolling out, the reports showed simple raw, absolute counts. Shortly thereafter, we started seeing these numbers reported as rates. In order to calculate a rate, you need a denominator, and in this example, that’s the population of a geographic entity. And while this is a super simple example, it’s an example of an analysis that would have been extremely difficult to perform a decade ago.
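To make the rate example concrete, here is a minimal sketch in Python of augmenting internal counts with an external population figure. The region names and all numbers are hypothetical and chosen only for illustration, and the per-100,000 convention is an assumption rather than something prescribed above.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical figures for illustration only; not real reporting data.
case_counts = {"Region A": 500, "Region B": 500}            # internal data
populations = {"Region A": 1_000_000, "Region B": 50_000}   # external data

# Join the two sources on region and compute a comparable rate.
rates = {
    region: rate_per_100k(case_counts[region], populations[region])
    for region in case_counts
}

for region, rate in rates.items():
    print(f"{region}: {rate:.1f} per 100,000")
```

The two hypothetical regions report the same raw count but very different rates, which is the kind of comparison that only becomes possible once the external denominator is available.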

How does the increased availability of data impact the general public’s view of COVID-19?

The general public is becoming more data-driven in their day-to-day lives. A wide-scale quantum leap in data literacy is a good thing, albeit one driven by something not so positive.

News consumers have, en masse, become familiar with and adopted the idea of looking at outbreak projection graphs. This means they have, knowingly or not, embraced the notion of predictive analytics and what-if scenarios. While this notion may seem trite to a seasoned data analyst, it was new to the vast majority of the public, who now expect this kind of analysis.

How are you seeing businesses today use advanced analytics?

Modern front-end analytical tools have blurred the lines between traditional and advanced analytics, and that’s a very good thing. Industry pundits sometimes call this “augmented analytics,” and while I agree with this notion and like that term, I don’t think that end users really care what specific type of analytics is employed as long as they can make decisions based upon what they’re looking at.

The lines are becoming blurry on the back end, too. For example, database vendors are exposing powerful ML and AI algorithms via SQL.

I’m super excited about the lines getting blurred up and down the stack because ultimately, this means it’s easier to empower users with better and better analytics.

These advances mean that advanced analytics are no longer reserved for industries that have historically been on the bleeding edge of analytics, and advanced analytics are no longer reserved for data scientists. We’re now at a point where companies know they have to squeeze as much out of their data as possible in order to remain competitive. They know that if they don’t unlock insights buried in their data via enhanced analytics of any flavor, one of their competitors will.

While lots of companies are ready to make the jump to “true” data science, the data scientists are often hesitant because they don’t trust the data. What’s the disconnect?

Data prep, which leads to good, clean, reliable, and analyzable data, is arguably more important than the models themselves, but it is often the least favorite part of a data scientist’s job. Naturally, they’d rather be building models because that is usually more fun. On top of that, sophisticated data prep is typically not part of a data scientist’s formal training, and they simply don’t have the years of practice building data stores (and the data models and semantic layers on top of said data stores) that traditional ETL developers and data modelers have.

Therefore, it is incumbent upon an organization to ensure that data scientists (and all analysts, for that matter) begin their analysis working from a strong data platform and foundation. In order to get a strong data foundation, you need to understand the sources of data across the organization, know the best way of storing the data (a data warehouse vs. a data lake), and anticipate what types of analyses may be performed. Analysts and data scientists should be spending their time analyzing data, not organizing and wrestling with it.

But we mustn’t stop there. Analysts and data scientists should be relieved of the burden of explaining and justifying source data. Organizations must also create an environment with management-level buy-in and collaboration on an overarching data strategy. This helps prevent situations where a data scientist has built the perfect model with perfect data, but nobody adopts the model because they don’t trust the source data.

These are reasons why one of our most important offerings is our data assessment and the subsequent implementation of data strategy. We first take inventory of what customers already have in place, guide stakeholders across the enterprise to audit and get buy-in on what data sources are important, agree on definitions, and then help organize all of the data in a data warehouse and/or a data lake. And because there will be multiple representations of the same data coming from multiple source systems, we often act as a neutral third party to broker conversations and help establish buy-in on which data source is “more correct” in certain situations. A strong data governance program helps businesses when more and more data sources enter their data warehouse or data lake.

Why do so many data science projects fail, even among companies with a strong data foundation?

It is estimated that 85% of all data science projects fail. We can safely guess that the failure rate is much lower when a strong data foundation is in place, but the failure rate is obviously greater than 0%.

A big factor is poor project management: more specifically, poor definition of goals and insufficient collaboration with stakeholders. Many organizations improperly assume that they can send a data scientist to work in a vacuum to do their magic, and the data scientist will emerge with a perfect model that will be universally accepted and adopted. As with any type of project, stakeholders (the people who understand and drive the business) must work with the people implementing the project from start to finish.

Another factor is that people assume that data will always be predictive, or at least “predictive enough.” Predictive data is said to have “signal,” and the reality is that many datasets have no signal or a very weak one. Technically speaking, this presents a problem because data scientists often do not discover that the signal is weak until after several months of work. Culturally speaking, this presents a problem because organizations may not be ready to trust model results that have “low confidence,” or maybe they didn’t define an acceptable confidence level at the initiation of the project.

These factors are why an upfront data science readiness assessment can radically increase a data science project’s likelihood of success. This type of assessment helps ensure organizations are ready before initiating a data science project, and it has three main components:

Project definition: Establish the goal(s) and value of the proposed data science project.
Technical readiness: Profile and quantify the quality of input data and its signal.
Cultural readiness: Ensure that stakeholders understand (and accept!) that data science yields results with varying degrees of uncertainty/confidence (this is especially important for stakeholders who may be used to looking at reports that are 100% precise), then define an acceptable level of confidence so the outcomes will be trusted throughout the organization.
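As a rough illustration of the technical readiness step, the sketch below profiles one input column for missing values and estimates its linear relationship with an outcome. Everything here is an assumption for illustration: the column names and toy values are hypothetical, and Pearson correlation is only one crude proxy for “signal” (it misses non-linear relationships), not a description of how any particular assessment is performed.

```python
from math import sqrt


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: prior purchases vs. whether a customer responded.
prior_purchases = [0, 1, None, 3, 4, 5, None, 7]
responded = [0, 0, 0, 1, 0, 1, 1, 1]

quality = missing_rate(prior_purchases)      # share of missing inputs
signal = pearson(prior_purchases, responded)  # crude linear signal estimate

print(f"missing rate: {quality:.0%}, correlation with outcome: {signal:.2f}")
```

A quick profile like this will not replace months of modeling, but it can flag a column that is mostly empty or apparently unrelated to the outcome before a project is committed to.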

But be aware that an assessment alone does not ensure success! Based upon the findings of an assessment, organizations must have the discipline to postpone a data science project until they have remedied any technical or cultural gaps that were identified.

Closing remarks and some food for thought

Companies are being forced to reassess strategies and business models, but I am optimistic about their ability to quickly adapt to change. I have seen many companies use their data assets and cloud resources to expertly navigate new and unexpected challenges and expand business in untapped markets. I believe companies that take this opportunity to take more calculated risks with AI and machine learning will flourish and set themselves up for many future successes. I look forward to what will come from this ingenuity!

Full interview available for download.
