Although data science is a great tool to help understand trends and anomalies, as well as to make business decisions, sometimes you need to say no to the data science project and take some necessary steps before moving forward.

More and more companies are turning to data science to help solve business problems, whether that means detecting fraud, optimizing a supply chain, or predicting customer churn. And while there is no doubt that data science can bring huge benefits, it can also end in failure without proper planning and preparation, and without proper consideration of whether to start the project at all.

In this blog, we will define what data science is and what qualifies as a data science project, what types of questions you should ask and answer before starting, and when to say no to a data science project.

What is Data Science?

Data science is defined as the field of applying advanced analytics techniques—such as machine learning and predictive analytics—and scientific principles to extract meaningful and actionable insights from big data and to use them for strategic decision-making.

What Qualifies as a Data Science Project?

A data science project isn’t simply machine learning or making predictions. To draw valuable information from large, complex data, it requires data prep such as cleansing and aggregation; data manipulation such as feature engineering or transformation; and the development of statistical and machine learning models for advanced analytics, typically in a language such as Python or R. Data science projects cover the entire end-to-end process of developing advanced analytics, going beyond a review of what has already happened.
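
To make these stages concrete, here is a minimal sketch in plain Python of cleansing, aggregation, and feature engineering on a tiny set of order records. The records, the field names (`customer`, `amount`), and the helper functions are hypothetical and chosen only for illustration; a real project would typically use a dedicated data library and far more data.

```python
from collections import defaultdict

# Hypothetical raw order records; field names are illustrative only.
raw_orders = [
    {"customer": " Acme Corp ", "amount": 120.0},
    {"customer": "acme corp", "amount": 80.0},
    {"customer": "Globex", "amount": None},  # missing value
    {"customer": "Globex", "amount": 50.0},
]

def clean(orders):
    """Cleansing: drop rows with missing amounts and standardize names."""
    return [
        {"customer": o["customer"].strip().lower(), "amount": o["amount"]}
        for o in orders
        if o["amount"] is not None
    ]

def aggregate(orders):
    """Aggregation: collect order amounts per customer."""
    by_customer = defaultdict(list)
    for o in orders:
        by_customer[o["customer"]].append(o["amount"])
    return dict(by_customer)

def engineer_features(by_customer):
    """Feature engineering: derive order count and average order value."""
    return {
        name: {
            "order_count": len(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
        }
        for name, amounts in by_customer.items()
    }

features = engineer_features(aggregate(clean(raw_orders)))
```

The derived features (order count and average order value per customer) are the kind of inputs a statistical or machine learning model would then be trained on.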

What Questions Should Be Asked and Answered Before Starting a Data Science Project?

It’s easy to get swallowed up in a data science project, especially if you jump in headfirst without taking the time to identify the key players and ask some basic questions:

Does your data science project have a stakeholder and a champion?

First, it is important that your data science project has both a stakeholder (or several) and a champion. The stakeholder is the person who presents the business case for the project, and the champion is the person who really sells it to the rest of the organization. You cannot have a successful data science project without buy-in from end users and sign-off from the C-suite or whoever makes the financial decision to proceed.

How analytically mature is your organization?

The next step is to look at what you’re already doing with data and at the level of analytics maturity within your organization. Are you currently analyzing what has already happened (descriptive analytics) with tools like Power BI, Qlik, or Tableau? If so, you may be ready for advanced analytics. If not, you’ll want to first analyze what has happened, and then possibly look at why it happened (diagnostic analytics), as that might be enough information to solve your business problem sooner and with fewer resources. It can also provide insight into useful features for predicting what will happen in the future (predictive analytics). You don’t want to run before you can crawl, especially with analytics, as doing so will slow you down and likely lead to a failed project.

If you’re ready to explore more advanced analytics, then the next step is to ask and answer the following questions:

  • Have you identified a real problem that you want to solve or a specific target to predict? It is critical to have a goal that you’re trying to achieve and that others will find value in and adopt. Your analysis or prediction will be flawed if you do not know why you are doing it or how others will need to use it.

    Talk to stakeholders, both business users and IT, to field questions, identify the real problem you are looking to solve, and set clear expectations. One of the main reasons data science efforts fail is that they are not aligned to any clear goals or objectives within the organization.

    In addition to bringing key stakeholders into the initial conversations, make sure to identify important KPIs. These may be KPIs that are already being analyzed and could be better predicted through machine learning or automation, or the priority may be finding the levers to pull to reach a target KPI. The important thing to note is that if you focus on needs the organization does not actually have, you will not get the necessary buy-in.
  • Is your data organized? After you’ve identified your goal and confirmed the value of the project, you need to determine whether your data is organized. You don’t necessarily need a fully operational data warehouse (at least not initially), but you do need the ability to pull together relevant datasets that can be used in analysis. For this to be sustainable, you also need a means of acquiring new data of the same kind for future analysis.

    An important consideration is whether someone has taken the time to collect the data you need and store it in a usable format. Once your data is accessible, you will need to ensure that it is clean (no errors, no nulls, consistent formatting of names, and so on). You will also need to perform data manipulation, such as normalizing values or converting categorical values into numeric ones. Without organized data, you risk incorrect, biased, or generally faulty results. The chance of any machine learning model performing well on improperly prepared data is very slim.
  • Do you have enough data to get adequate samples for predictions? Not having the right data can be a problem, but so is not having enough data. Most data scientists will tell you that they would rather have more data than a perfect machine learning model. This is because the more data you have to train and test models against, the more consistent your results will be. If your dataset has only a small number of records, it is unlikely to be truly informative of any future state. And while “small” is relative to the work you are doing, predictions made from a limited sample will very rarely be as good as those made from a larger one. So, before you jump into machine learning, make sure that you have collected enough data.
  • Is your data relevant and of the right quality? You need to profile your data to ensure it is relevant to your specific project and of sufficient quality to build a model. Exploratory data analysis (“EDA”) is essential to understand the nature of the data and how the different fields (features) might be informative of a particular target. Identifying basic summary statistics like mean, median, and percentiles can help you determine the right path for your model.

    You can also get a sense of the variance in your data; a feature needs some variance to be informative. Knowing whether you have an imbalanced target population can help establish whether additional manipulation is needed in some cases, such as with classification models. This is also a good time to do some rigorous data cleansing (should the dataset call for it) to ensure your results are consistent and usable. One final key step in ensuring quality data is feature engineering, which generates additional variables from combinations of existing ones. Developing additional features can boost the power of your dataset even more than fine-tuning a final model.
  • If you were to build out a model, would you see adoption by end users (is the problem something they care about)? Although this is something you should identify at the beginning of a data science project, you should keep asking this question as you go, especially if you identify new problems in the process. Data science is a team sport. If everyone isn’t playing in sync, you’re going to lose.

You don’t need to answer all these questions right off the bat; it can be done in phases. But it’s important to understand how your ability to address each question will help determine your readiness for, as well as the success of, your data science project.
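
Several of the questions above (enough data, data quality, variance, and target balance) can be given a first rough answer with a quick profile of the dataset. The sketch below uses only Python’s standard library; the column values, the labels, and the function names `profile_feature` and `class_balance` are assumptions made for illustration, not a prescribed method.

```python
import statistics
from collections import Counter

def profile_feature(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    return {
        "records": len(values),
        "missing_share": (len(values) - len(present)) / len(values),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "q1": q1,
        "q3": q3,
        # Near-zero variance suggests the column carries little information.
        "variance": statistics.pvariance(present),
    }

def class_balance(labels):
    """Share of each target label; a lopsided split signals imbalance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

# Hypothetical data: customer tenure in months and a churn target.
tenure_months = [2, 4, 4, None, 10]
churn_labels = ["stay"] * 9 + ["churn"]

tenure_profile = profile_feature(tenure_months)
target_shares = class_balance(churn_labels)
```

A profile like this does not decide anything by itself. What counts as too few records, too many missing values, or too lopsided a target depends on the problem, so the output is best read alongside the subject matter experts and stakeholders.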

What Are Clear Red Flags of When to Say No to a Data Science Project?

Although data science is a great tool to help your organization solve business problems and predict business outcomes, it’s not always the right tool to use. There are times you need to say “no” to a data science project, including when:

  • You are not ready: You don’t want to put the cart before the horse. Despite your best intentions of wanting to start a data science project, if your data is disorganized or you’re too early in the analytics maturity model, you need to say no to the project. You will not get the results you’re seeking, and this will ultimately cost you more time, effort, and money than it’s worth.

    What to do:
    Always start by looking at where you are in the analytics maturity model, and then from there, assess your data. Is your organization already data literate? Is your data organized the right way and is it relevant to the problem you are looking to solve? Do you have good quality data and enough of it?
  • You have confirmation bias: You don’t want to start a data science project just to prove someone’s preconceived answer—right or wrong. All stakeholders need to understand and accept that the answers may not be what they expect and could potentially require a business disruption.

    What to do:
    Approach data science with the goal of evidence-based decision-making, not decision-based evidence-making.
  • You won’t get user adoption: Even if you can get great insights out of your data science project, it won’t mean anything if you don’t get user adoption.

    What to do:
    First align your targets and goals internally to ensure adoption, and put a defined communication plan in place with resources behind it. This is a good opportunity to show the art of the possible with diagnostic analytics first and gain buy-in from users before jumping into machine learning and predictive analytics. Make sure to develop a resource plan that includes stakeholders, and keep communication open throughout the project.
  • You don’t have the required subject matter expertise: You cannot have a successful data science project without the right subject matter expert to inform your process. This could be someone who knows enough about the problem you are looking to solve, or really any experts in your organization. Having this expertise is just as important as having great data engineers and people who know how to build machine learning models and operationalize them.

    What to do:
    Make sure you have tapped the right stakeholders internally to participate in the project. Externally, look for an analytics consultant who can help guide the project. Make sure the consultant has know-how not just in machine learning and predictive analytics, but also experience working on data science projects within your industry, and can translate requirements between business and tech teams.
  • You are asked to do something illegal, illicit, or just ethically questionable: Data science should not be used for the wrong reasons, or to introduce biased beliefs into business solutions or practices. Avoid anything that promotes discrimination, reinforces human biases, lacks transparency, or compromises privacy.

    What to do: Just say no.
Matt Levy is a Managing Consultant at Analytics8. Practicing what we call “ethical data science,” Matt specializes in making sure that our customers avoid bias when building machine learning models so that their projects bring real value to their organization. Matt wrote his master’s capstone thesis on fantasy golf analysis, and is a consistent winner of A8 fantasy sports competitions.