Last updated on April 14, 2020
Industry Insights From A Data And Analytics CTO
By Patrick Vinton
In this Q&A session, Analytics8 CTO Patrick Vinton discusses industry trends and shares insights into how companies are, and should be, facing topical challenges. Discussion topics include the urgency of cloud adoption, how COVID-19 is affecting business behaviors, how to face these challenges, and opportunities stemming from the blurred lines between traditional and advanced analytics.
What kinds of trends are you seeing in the data and analytics world?
The most significant movement is truly adopting the cloud in earnest. Customers are using cloud-based resources more and more for storage and compute power, and I don’t see this slowing down or ever turning back. We’ve been in the middle of the cloud movement for several years, and we’ve guided many customers on their journey to the cloud. During our tenure in this world, we’ve seen a pretty even split within our customer base of organizations using AWS and Azure, with some use of Google Cloud Platform. This seems to contradict industry statistics that show much stronger adoption of AWS, but I contend the numbers even out when you focus on data and analytics use cases. Outside the not-so-surprising aforementioned increase in cloud Infrastructure-as-a-Service (IaaS) interest, we’re also seeing much stronger interest in, and adoption of, Platform- and Software-as-a-Service (PaaS and SaaS) cloud offerings. For example, users of more turnkey products like Snowflake are mostly insulated from the lower-level underpinnings like underlying operating systems, file storage strategies, the intricacies of distributed computing and data backups, and all of the maintenance, patches, and updates therein, which are things you often must directly manage when using lower-level IaaS components of AWS and Azure.
Why the trend toward the cloud?
There are OpEx vs CapEx reasons why accounting departments and budget owners prefer subscription pricing models as part of cloud offerings, but above and beyond that, organizations like the flexibility to start small (quickly spin up an environment, prototype a solution, and build a proof of concept) and then either cut bait (i.e., abandon the disposable cloud resources) or quickly scale to the moon. Businesses like the ability to take low-cost risks, and subscription pricing models coupled with disposable cloud resources make this possible.
How does the cloud play a factor in the current work-from-home climate?
Right now, more than ever, having nebulous resources somewhere in the cloud is very attractive. Unless you’re Amazon/Microsoft/Google, you don’t need to have people working in a physical server room on premises doing server maintenance. Your workforce, including end users and admins of all flavors, is able to work remotely anywhere using cloud resources. While remote work wasn’t historically a primary driver to cloud, it is very much top of mind now. Even companies that resisted remote work in the past have discovered that they can be productive with a remote workforce, sometimes even more so than before. Remote accessibility to server resources is a must, whether your people are full-time employees or project consultants.
If the cloud movement was already in motion, how are companies’ behaviors going to change in the face of the COVID-19 response regarding data and analytics?
In many ways, we in the data and analytics industry have been down this road before in the not-so-distant past. During the Global Financial Crisis of 2008-2009, there were two main themes we saw regarding data and analytics: (1) deal with a known and trusted entity, and (2) do more with less. This notion of “do more with less” is simple on the surface: run more efficiently. Here’s a very basic example: Increase market share with fewer marketing and sales resources. However, even with this simple example, the analytical approach can be multifaceted.
- Descriptive analytics: Looking backward, what types of customers responded to what types of campaigns?
- Predictive analytics: What is the expected response rate to the upcoming campaign? What customer profile is most likely to respond?
- Prescriptive analytics: What specifically should I offer to a customer once they have responded to a campaign?
- Easier access to disposable, low-risk, but very powerful compute resources at their fingertips.
- Several AI and ML tools are now available that didn’t exist a decade ago.
- AI and ML are now mainstream thinking, and resistance to cloud resources has waned.
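As a rough illustration of the descriptive and predictive questions in the marketing example above, here is a minimal Python sketch. The campaign history, segment names, and function names are all hypothetical, and the "prediction" is nothing more than historical response rates applied to planned send volumes; a real project would likely use a proper model. The prescriptive question is left out because it depends on offer data this toy example does not have.

```python
from collections import defaultdict

# Hypothetical campaign history: (customer_segment, campaign_type, responded)
history = [
    ("small_business", "email", True),
    ("small_business", "email", False),
    ("small_business", "webinar", True),
    ("enterprise", "email", False),
    ("enterprise", "email", False),
    ("enterprise", "webinar", True),
    ("enterprise", "webinar", True),
    ("enterprise", "webinar", False),
]


def response_rates(records):
    """Descriptive: historical response rate per (segment, campaign type)."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, campaign, did_respond in records:
        sent[(segment, campaign)] += 1
        responded[(segment, campaign)] += int(did_respond)
    return {key: responded[key] / sent[key] for key in sent}


def expected_responses(rates, planned_sends):
    """Naive predictive: apply historical rates to planned send volumes.

    Combinations with no history contribute 0.0, which is an assumption
    of this sketch rather than a recommended default.
    """
    return sum(rates.get(key, 0.0) * count for key, count in planned_sends.items())


rates = response_rates(history)

# Hypothetical plan for an upcoming campaign: number of contacts per combination.
planned = {
    ("small_business", "email"): 100,
    ("enterprise", "webinar"): 30,
}
forecast = expected_responses(rates, planned)

for key, rate in sorted(rates.items()):
    print(key, f"{rate:.0%}")
print("Expected responses:", round(forecast, 1))
```

Even a crude baseline like this makes the "do more with less" trade-off concrete: it shows where historical response was strongest before any sales or marketing resources are committed.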
How does the increased availability of data impact the general public’s view of COVID-19?
The general public is becoming more data-driven in their day-to-day lives. A wide-scale quantum leap in data literacy is a good thing, albeit driven by something not so positive. The average news consumer has become familiar with, and adopted en masse, the idea of looking at outbreak projection graphs. This means they have, knowingly or not, embraced the notion of predictive analytics and what-if scenarios. While this notion may seem trite to a seasoned data analyst, it wasn’t basic for the vast majority of the public, who now expect this kind of analysis.
How are you seeing businesses today use advanced analytics?
Modern front-end analytical tools have blurred the lines between traditional and advanced analytics, and that’s a very good thing. Industry pundits sometimes call this “augmented analytics,” and while I agree with this notion and like that term, I don’t think that end users really care what specific type of analytics is employed as long as they can make decisions based upon what they’re looking at. The lines are becoming blurry on the backend, too. For example, database vendors are exposing powerful ML and AI algorithms via SQL. I’m super excited about the lines getting blurred up and down the stack because ultimately, this means it’s easier to empower users with better and better analytics. These advances mean that advanced analytics are no longer reserved for industries that have historically been on the bleeding edge of analytics, and advanced analytics are no longer reserved for data scientists. We’re now at a point where companies know they have to squeeze as much out of their data as possible in order to remain competitive. They know that if they don’t unlock insights buried in their data via enhanced analytics of any flavor, one of their competitors will.
While lots of companies are ready to make the jump to “true” data science, the data scientists are often hesitant because they don’t trust the data. What’s the disconnect?
Data prep, which leads to good, clean, reliable, and analyzable data, is arguably more important than the models themselves, but data prep is often the least favorite part of a data scientist’s job. Naturally, they’d rather be building models because that is usually more fun. On top of that, sophisticated data prep is typically not part of a data scientist’s formal training, and they simply don’t have the years of practice building data stores (and the data models and semantic layers on top of said data stores) that the traditional ETL developers and data modelers would. Therefore, it is incumbent upon an organization to ensure that data scientists, and all analysts for that matter, begin their analysis working from a strong data platform and foundation. In order to get a strong data foundation, you need to understand the sources of data across the organization, know the best way of storing the data (a data warehouse vs. a data lake), and know what types of analyses may be performed. Analysts and data scientists should be spending their time analyzing data, not organizing and wrestling with it.
Analysts and data scientists should be removed from the burden of explaining and justifying source data.
Why do so many data science projects fail, even among companies with a strong data foundation?
It is estimated that 85% of all data science projects fail. We can safely guess that the failure rate is much lower when a strong data foundation is in place, but the failure rate is obviously greater than 0%. A big factor is poor project management, and more specifically, poor definition of goals and insufficient collaboration with stakeholders. Many organizations improperly assume that they can send a data scientist to work in a vacuum to do their magic, and the data scientist will emerge with a perfect model that will be universally accepted and adopted. As with any type of project, stakeholders (the people who understand and drive the business) must work with the people implementing the project from start to finish. Another factor is that people assume that data will always be predictive, or at least “predictive enough.” Predictive data is said to have “signal,” and the reality is that many datasets have no or very weak signal. Technically speaking, this presents a problem because data scientists often do not discover weak signals until after several months of work. Culturally speaking, this presents a problem because organizations may not be ready to trust model results that have “low confidence,” or maybe they didn’t define an acceptable confidence level at the initiation of the project. These factors are why an upfront data science readiness assessment can radically increase a data science project’s likelihood of success. This type of assessment helps ensure organizations are ready before initiating a data science project, and it has three main components:
- Project definition: Establish the goal(s) and value of the proposed data science project.
- Technical readiness: Profile and quantify the quality of input data and its signal.
- Cultural readiness: Ensure that stakeholders understand (and accept!) that data science yields results with varying degrees of uncertainty/confidence (this is especially important for stakeholders who may be used to looking at reports that are 100% precise), then define an acceptable level of confidence so the outcomes will be trusted throughout the organization.
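To make the technical readiness step a little more concrete, here is a minimal Python sketch of profiling input data for quality and signal. It is an illustration under stated assumptions, not the assessment method described in the interview: the column names, sample values, and thresholds (`min_abs_corr`, `max_missing`) are hypothetical, and Pearson correlation is only a crude stand-in for "signal" because it captures linear, single-column relationships and nothing more.

```python
import math


def missing_rate(values):
    """Share of entries that are None (a basic data-quality measure)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation of paired numeric lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def profile(columns, target, min_abs_corr=0.3, max_missing=0.2):
    """Report missing rate and a crude signal score for each input column.

    The thresholds are placeholders; stakeholders would agree on
    acceptable values at the start of a project.
    """
    report = {}
    for name, values in columns.items():
        # Score signal only on rows where the column has a value.
        pairs = [(v, t) for v, t in zip(values, target) if v is not None]
        if len(pairs) > 1:
            corr = pearson([v for v, _ in pairs], [t for _, t in pairs])
        else:
            corr = 0.0
        miss = missing_rate(values)
        report[name] = {
            "missing_rate": miss,
            "abs_corr": abs(corr),
            "usable": miss <= max_missing and abs(corr) >= min_abs_corr,
        }
    return report


# Hypothetical data: did the customer respond (1) or not (0)?
target = [1, 0, 1, 0, 1, 0]
columns = {
    "past_purchases": [5, 1, 4, 0, 6, 2],
    "account_age_years": [3, 3, None, None, 3, 3],
}

report = profile(columns, target)
for name, stats in report.items():
    print(name, stats)
```

Running a check like this before modeling begins is one way to surface weak or missing signal in days rather than months, and the explicit thresholds give stakeholders something specific to agree on up front.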
Closing remarks and some food for thought
Companies are being forced to reassess strategies and business models, but I am optimistic about their ability to quickly adapt to change. I have seen many companies use their data assets and cloud resources to expertly navigate new and unexpected challenges and expand business in untapped markets. I believe companies that take this opportunity to take more calculated risks with AI and machine learning will flourish and set themselves up for many future successes. I look forward to what will come from this ingenuity!
Full interview available for download.
Key Takeaways
- Cloud adoption continues to accelerate, with organizations embracing IaaS, PaaS, and SaaS for flexibility, scalability, and reduced infrastructure management.
- The work-from-home shift has underscored the importance of remote access and cloud-based environments for both efficiency and continuity.
- Economic pressures, like those seen during the COVID-19 pandemic, are pushing businesses toward predictive and prescriptive analytics to “do more with less.”
- Public data literacy has increased, with people becoming more familiar with concepts like predictive analytics through exposure to COVID-19 data reporting.
- Modern analytics tools are blurring the lines between traditional reporting and advanced techniques like machine learning, enabling broader adoption.
- Clean, reliable data is foundational to any analytics or data science initiative, yet data preparation remains an undervalued and undertrained skill set.
- Many data science projects fail due to unclear goals, poor stakeholder collaboration, or weak signal in the data, highlighting the need for readiness assessments.
- Organizations that invest in data strategy, governance, and cultural alignment are better positioned to succeed with AI and advanced analytics initiatives.
