In this Q&A session, Analytics8 CTO Patrick Vinton discusses industry trends and shares insights into how companies are, and should be, facing topical challenges. Discussion topics include the urgency of cloud adoption, how COVID-19 is affecting business behavior, how to face these challenges, the opportunities stemming from the blurred lines between traditional and advanced analytics, and more.
The most significant movement is truly adopting the cloud in earnest. Customers are using cloud-based resources more and more for storage and compute power, and I don’t see this slowing down or ever turning back. We’ve been in the middle of the cloud movement for several years, and we’ve guided many customers on their journey to the cloud. During our tenure in this world, we’ve seen a pretty even split within our customer base of organizations using AWS and Azure with some use of Google Cloud Platform. This seems to contradict industry statistics that show much stronger adoption of AWS, but I contend the numbers even out when you focus on data and analytics use cases.
Outside of the not-so-surprising increase in cloud Infrastructure-as-a-Service (IaaS) interest mentioned above, we're also seeing much stronger interest in and adoption of Platform- and Software-as-a-Service (PaaS and SaaS) cloud offerings. For example, users of more turnkey products like Snowflake are largely insulated from lower-level underpinnings such as operating systems, file storage strategies, the intricacies of distributed computing, and data backups, along with all of the maintenance, patching, and updates therein – things you often must manage directly when using lower-level IaaS components of AWS and Azure.
There are OpEx vs CapEx reasons why accounting departments and budget owners prefer subscription pricing models as part of cloud offerings, but above and beyond that, organizations like the flexibility to start small – quickly spin up an environment, prototype a solution, build a proof of concept – and then either cut bait (i.e. abandon the disposable cloud resources) or quickly scale to the moon. Businesses like the ability to take low-cost risks, and subscription pricing models coupled with disposable cloud resources make this possible.
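To make that "spin up, prototype, then cut bait or scale" idea concrete, here is a minimal sketch in Python using the snowflake-connector-python package. It creates a small, auto-suspending Snowflake warehouse for a prototype, scales it up with one statement, and then throws it away. The account, credentials, and warehouse name are placeholders, not real values; this is an illustration of the pattern, not a production script.

```python
# Minimal sketch: a disposable, pay-per-use compute environment in Snowflake.
# Account, user, password, and warehouse names below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",    # placeholder
    user="my_user",          # placeholder
    password="my_password",  # placeholder (use key-pair auth or SSO in practice)
)
cur = conn.cursor()

# Spin up a small warehouse that suspends itself after 60 seconds of inactivity,
# so the prototype costs nothing while idle. No servers to rack, patch, or back up.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS prototype_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE
""")

# If the proof of concept works, scale up with a single statement...
cur.execute("ALTER WAREHOUSE prototype_wh SET WAREHOUSE_SIZE = 'LARGE'")

# ...or cut bait and throw the disposable resource away.
cur.execute("DROP WAREHOUSE IF EXISTS prototype_wh")

cur.close()
conn.close()
```

The specific vendor is not the point; the same start-small, scale-or-abandon pattern applies to comparable PaaS offerings on AWS, Azure, and Google Cloud Platform.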
Right now, more than ever, having nebulous resources somewhere in the cloud is very attractive. Unless you're Amazon, Microsoft, or Google, you don't need people working in a physical server room on premises doing server maintenance. Your workforce, including end users and admins of all flavors, can work remotely from anywhere using cloud resources. While remote work wasn't historically a primary driver to the cloud, it is very much top of mind now. Even companies that resisted remote work in the past have discovered that they can be productive with a remote workforce, sometimes even more so than before. Remote access to server resources is a must for everyone who needs it, whether they are full-time employees or project consultants.
In many ways, we in the data and analytics industry have been down this road before in the not-so-distant past. During the Global Financial Crisis of 2008-2009, there were two main themes we saw regarding data and analytics: (1) deal with a known and trusted entity and (2) do more with less.
This notion of "do more with less" is simple on the surface – run more efficiently. Here's a very basic example: increase market share with fewer marketing and sales resources.
However, even with this simple example, the analytical approach can be multifaceted.
A decade ago, most organizations relied on descriptive analytics to do more with less.
This time around, organizations will more frequently employ more sophisticated types of analytics (i.e. predictive and prescriptive) that rely on artificial intelligence (AI), machine learning (ML), or some flavor of data science for three main reasons.
Continuing with this thought, I believe companies will dare to step out on a limb to do more innovation this time around versus sticking to safe “do more with less” initiatives.
It should also be pointed out that companies have more data available for analysis now. As companies have modernized their internal systems over the past decade, they have generated and collected more and more data. They have become more data-savvy and understand that capturing and storing data for analytics is not just a nice-to-have; it’s a critical part of being competitive.
This “more data” theme extends beyond the virtual walls of an organization; companies now also have better access to external data with which they can augment and enhance their internal data. Since current events are top of mind, here’s a basic example: When reports about infections, exposures, and testing first started rolling out, the reports showed simple raw, absolute counts. Shortly thereafter, we started seeing these numbers reported as rates. In order to calculate a rate, you need a denominator, and in this example, that’s the population of a geographic entity. And while this is a super simple example, it’s an example of an analysis that would have been extremely difficult to perform a decade ago.
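As a rough sketch of that idea, augmenting internal counts with an external population dataset to get a rate is now a few lines of code. The regions and numbers below are made up purely for illustration.

```python
# Sketch: turning raw counts into per-capita rates by joining an external
# population dataset. All numbers are illustrative, not real statistics.
import pandas as pd

# Internal/reported data: raw, absolute counts per region.
cases = pd.DataFrame({
    "region": ["County A", "County B", "County C"],
    "case_count": [1200, 450, 3100],
})

# External data: population per region (e.g., from a public census source).
population = pd.DataFrame({
    "region": ["County A", "County B", "County C"],
    "population": [250_000, 40_000, 1_100_000],
})

# The join supplies the denominator; the rate makes regions comparable.
rates = cases.merge(population, on="region")
rates["cases_per_100k"] = rates["case_count"] / rates["population"] * 100_000

print(rates.sort_values("cases_per_100k", ascending=False))
```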
The general public is becoming more data-driven in their day-to-day lives. A wide-scale, quantum leap in data literacy is a good thing, albeit one driven by something not so positive.
The average news consumer has become familiar with, and adopted en masse, the idea of looking at outbreak projection graphs. This means they – knowingly or not – have embraced the notion of predictive analytics and what-if scenarios. While this may seem basic to a seasoned data analyst, it wasn't for the vast majority of the public, who now expect this kind of analysis.
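To make the "what-if scenario" idea concrete, here is a minimal sketch of the kind of projection those graphs imply: the same starting count projected forward under a few assumed growth rates. Every number here is an arbitrary assumption for illustration, not an epidemiological estimate.

```python
# Sketch: a simple what-if projection under different assumed daily growth rates.
# All numbers are arbitrary illustrations, not epidemiological estimates.
current_count = 10_000
days_ahead = 14
scenarios = {"optimistic": 0.01, "baseline": 0.05, "pessimistic": 0.10}

for name, daily_growth in scenarios.items():
    projected = current_count * (1 + daily_growth) ** days_ahead
    print(f"{name:>12}: ~{projected:,.0f} cases in {days_ahead} days")
```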
Modern front-end analytical tools have blurred the lines between traditional and advanced analytics, and that's a very good thing. Industry pundits sometimes call this "augmented analytics," and while I agree with this notion and like that term, I don't think end users really care what specific type of analytics is employed as long as they can make decisions based upon what they're looking at.
The lines are becoming blurry on the backend too. For example, database vendors are exposing powerful ML and AI algorithms via SQL.
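BigQuery ML is one concrete example of this pattern; other warehouse vendors have comparable features. As a hedged sketch (the dataset, table, and column names are placeholders I've made up), training and scoring a model can look like ordinary SQL submitted from a thin Python client:

```python
# Sketch: in-database machine learning expressed as SQL (BigQuery ML syntax).
# Dataset, table, and column names are placeholders for illustration.
from google.cloud import bigquery

client = bigquery.Client()

# Train a churn model directly in the warehouse; the data never leaves it.
client.query("""
    CREATE OR REPLACE MODEL analytics.churn_model
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM analytics.customers
""").result()

# Score new customers through the same SQL interface.
predictions = client.query("""
    SELECT customer_id, predicted_churned
    FROM ML.PREDICT(
        MODEL analytics.churn_model,
        (SELECT customer_id, tenure_months, monthly_spend, support_tickets
         FROM analytics.new_customers))
""").result()

for row in predictions:
    print(row.customer_id, row.predicted_churned)
```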
I’m super excited about the lines getting blurred up and down the stack because ultimately this means it’s easier to empower users with better and better analytics.
These advances mean that advanced analytics is no longer reserved for industries that have historically been on the bleeding edge of analytics, and it is no longer reserved for data scientists. We're now at a point where companies know they have to squeeze as much out of their data as possible in order to remain competitive. They know that if they don't unlock the insights buried in their data via enhanced analytics of any flavor, one of their competitors will.
Data prep – leading to good, clean, reliable, and analyzable data – is arguably more important than the models themselves, but data prep is often the least favorite part of a data scientist's job. Naturally, they'd rather be building models because that is usually more fun. On top of that, sophisticated data prep is typically not part of a data scientist's formal training, and they simply don't have the years of practice building data stores (and the data models and semantic layers on top of those data stores) that traditional ETL developers and data modelers do.
Therefore, it is prudent and incumbent upon an organization to ensure that data scientists – and all analysts, for that matter – begin their analysis working from a strong data platform and foundation. To get a strong data foundation, you need to understand the sources of data across the organization, know the best way to store the data (data warehouse vs. data lake), and anticipate what types of analyses may be performed. Analysts and data scientists should be spending their time analyzing data, not organizing and wrestling with it.
Analysts and data scientists should also be relieved of the burden of explaining and justifying source data.
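As a small, hedged illustration of the upstream work this implies, here is the kind of routine cleansing and conforming that should live once in the governed data platform rather than in every analyst's notebook. The column names and business rules are hypothetical.

```python
# Sketch: routine data prep that belongs in the data platform, not in each
# analyst's notebook. Column names and rules are hypothetical.
import pandas as pd

def prepare_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize, deduplicate, and conform a raw orders extract."""
    df = raw.copy()

    # Standardize types and formats.
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["customer_id"] = df["customer_id"].astype(str).str.strip().str.upper()

    # Drop exact duplicates and records missing the business keys.
    df = df.drop_duplicates().dropna(subset=["order_id", "customer_id"])

    # Apply a conformed business rule once, centrally.
    df["is_large_order"] = df["order_amount"] >= 1_000

    return df
```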
It is estimated that 85% of all data science projects fail. We can safely assume the failure rate is much lower when a strong data foundation is in place, but it is obviously still greater than zero.
A big factor is poor project management — more specifically, poor definition of goals and insufficient collaboration with stakeholders. Many organizations improperly assume that they can send a data scientist to work in a vacuum to do their magic, and the data scientist will emerge with a perfect model that will be universally accepted and adopted. As with any type of project, stakeholders — the people who understand and drive the business — must work with the people implementing the project from start to finish.
Another factor is that people assume data will always be predictive – or be "predictive enough." Predictive data is said to have "signal," and the reality is that many datasets have no signal or very weak signal. Technically speaking, this presents a problem because data scientists often do not discover weak signal until after several months of work. Culturally speaking, this presents a problem because organizations may not be ready to trust the results of a model that has "low confidence," or maybe they didn't define an acceptable confidence level at the initiation of the project.
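One lightweight way to surface weak signal early, before months of modeling are sunk into it, is to compare a quick cross-validated model against a naive baseline; if the two are indistinguishable, the data may simply not be predictive. Here is a minimal sketch, assuming a feature matrix X and label vector y have already been assembled.

```python
# Sketch: a quick check for predictive signal before committing to a long project.
# Assumes a feature matrix X and label vector y have already been assembled.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def signal_check(X, y, cv=5):
    baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=cv)
    model = cross_val_score(RandomForestClassifier(n_estimators=200), X, y, cv=cv)
    print(f"baseline accuracy: {np.mean(baseline):.3f}")
    print(f"model accuracy:    {np.mean(model):.3f}")
    # If the model barely beats the baseline, the dataset may have little signal,
    # and that finding should feed the go/no-go and confidence-level discussion.
    return np.mean(model) - np.mean(baseline)
```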
These factors are why an upfront data science readiness assessment can radically increase a data science project's likelihood of success. This type of assessment helps ensure organizations are ready before initiating a data science project, and it has three main components.
But be aware that an assessment alone does not ensure success! Based upon the findings of an assessment, organizations must have the discipline to postpone a data science project until they have remedied any technical or cultural gaps that were identified.
Companies are being forced to re-assess strategies and business models, but I am optimistic about their ability to quickly adapt to change. I have seen many companies use their data assets and cloud resources to expertly navigate new and unexpected challenges and expand business in untapped markets. I believe companies who take this opportunity to take more calculated risks with AI and machine learning will flourish and set themselves up for many future successes. I look forward to what will come from this ingenuity!
To thrive with your data, your people, processes, and technology must all be data-focused. This may sound daunting, but we can help you get there. Sign up to meet with one of our analytics experts who will review your data struggles and help map out steps to achieve data-driven decision making.
In one hour, get practical advice that you can use to initiate or continue your move of data and analytics workloads to the cloud.
During your free one-hour cloud strategy session, we will: