Analytics8 discusses some common needs for data science applications in higher education and provides tips on how to start your data science project.

How Can Advanced Analytics Help Higher Ed Institutions?

Advanced analytics techniques help institutions of higher learning in a growing number of ways. Colleges and universities hold a great deal of critical data, such as prospective student application data, fundraising dollars, department budgets, ranking metrics, and student body performance and satisfaction. Between student, faculty, and donor expectations and the reality of remaining competitive in the marketplace, the stakes are high in this industry. It’s time for higher ed institutions to put their data to work, and we have some tips on where to begin.


Use Case: Improve Graduation Rates

Achieving a high graduation rate while decreasing dropouts is a primary goal of every educational institution.

Tactics to improve student completion rates come in many forms and with varying price tags, but using data science to identify high-risk students is a technique that appeals to all institutions.

Use Data Science to Identify High Risk Students

Predictive models are a great way to help colleges identify high-risk students before their situation becomes difficult to repair. With traditional analytics techniques, it can be cumbersome for analysts to spot at-risk students because of data volume and the number of different factors that contribute to a student’s success. This is why many colleges are turning to data science to proactively help those students succeed.

A predictive model can examine all performance, demographic, and student activity data to systematically uncover any correlations with students not succeeding. Data points that could be examined include student demographics, grades, financial aid, and even data tracked through student ID cards, like frequency of library or dining hall visits and use of campus health services. Analyzing data from across different disciplines can help schools identify more specific problems, such as a difficult professor or a combination of adversities.
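As a rough first pass at uncovering such correlations, the sketch below computes a Pearson correlation between each candidate data point and a non-completion flag, then ranks the data points by strength of association. It is a screening step, not a predictive model in itself. The field names (gpa, library_visits, aid_gap, did_not_complete) and all values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

# Hypothetical records: one dict per student; did_not_complete is 1 or 0.
students = [
    {"gpa": 3.6, "library_visits": 22, "aid_gap": 0,    "did_not_complete": 0},
    {"gpa": 2.1, "library_visits": 3,  "aid_gap": 4000, "did_not_complete": 1},
    {"gpa": 3.2, "library_visits": 15, "aid_gap": 500,  "did_not_complete": 0},
    {"gpa": 1.9, "library_visits": 5,  "aid_gap": 3500, "did_not_complete": 1},
    {"gpa": 3.8, "library_visits": 30, "aid_gap": 0,    "did_not_complete": 0},
    {"gpa": 2.4, "library_visits": 8,  "aid_gap": 2500, "did_not_complete": 1},
]

def rank_features(records, target):
    """Return (feature, correlation) pairs, strongest association first."""
    features = [key for key in records[0] if key != target]
    ys = [r[target] for r in records]
    scored = [(f, pearson([r[f] for r in records], ys)) for f in features]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

ranking = rank_features(students, "did_not_complete")
for feature, corr in ranking:
    print(f"{feature}: {corr:+.2f}")
```

A negative value means the data point tends to be lower among students who did not complete. Correlation alone does not establish cause, so results like these are best treated as leads for further analysis.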

Schools can also use a predictive model to understand which learning programs (tutoring, online class discussion portals, learning management systems, etc.) are being used and how they are associated with good or bad performance.

Once a school has clear insight into specific factors and characteristics that indicate probability of graduation, they can put data-driven measures in place to promote success and offer recommendations to struggling students before they fail.
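To make the idea of a graduation-risk model concrete, here is a minimal sketch of a logistic regression trained by gradient descent using only the Python standard library. It assumes two hypothetical features (GPA and library visits per month) and a tiny made-up history. A real project would use far more data and features, and likely an established library or one of the tools named later in this article.

```python
from math import exp

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def fit_scaler(rows):
    """Return per-column means and standard deviations (1.0 if constant)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Standardize one row so every feature is on a comparable scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical history: [gpa, library visits per month]; 1 = did not graduate.
history = [[3.7, 20], [3.4, 18], [3.1, 12], [2.9, 15], [3.8, 25],
           [2.2, 4], [2.0, 6], [2.5, 3], [1.8, 2], [2.7, 5]]
outcomes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

means, stds = fit_scaler(history)
weights, bias = train_logistic([scale(r, means, stds) for r in history], outcomes)

def risk_score(student):
    """Estimated probability (0..1) that a student does not graduate."""
    x = scale(student, means, stds)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

print(round(risk_score([2.1, 4]), 2), round(risk_score([3.6, 21]), 2))
```

Students whose score crosses a threshold the school chooses could then be routed to advisors before they fail, which is the proactive step described above.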

Use Case: Increase Fundraising through Associated Alumni

As budgets tighten, endowments shrink, and institutional needs expand, fundraising has become a vital source of funding for institutions.

Use Data Science to Conduct More Efficient Fundraising Campaigns

Some schools have started using predictive models to conduct more efficient fundraising campaigns that target individuals who are more likely to donate. The model takes information about prospective donors, such as their history and interactions with the institution, and applies data mining techniques to make predictions about likely donors, first-time donors, and large-sum donors. Fundraising staff can then execute more targeted campaigns to prospects, saving time and money spent on fundraising efforts.
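One simple baseline for this kind of targeting, shown here only as a sketch, is to group past campaign records into segments, measure the donation rate in each segment, and rank new prospects by the rate of the segment they fall into. The segment definition, field layout, prospect names, and all records below are hypothetical. A production model would use many more attributes and a proper learning algorithm.

```python
from collections import defaultdict

# Hypothetical past-campaign records: (attended_reunion, prior_gifts, donated)
history = [
    (True, 2, 1), (True, 3, 1), (True, 2, 0),
    (True, 0, 1), (True, 0, 0),
    (False, 1, 0), (False, 1, 1),
    (False, 0, 0), (False, 0, 0), (False, 0, 0),
]

def segment(attended_reunion, prior_gifts):
    """Bucket a person by reunion attendance and giving history."""
    if prior_gifts >= 2:
        giving = "repeat"
    elif prior_gifts == 1:
        giving = "single"
    else:
        giving = "never"
    return (attended_reunion, giving)

def donation_rates(records):
    """Share of people in each segment who donated in past campaigns."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [donors, total]
    for attended, gifts, donated in records:
        key = segment(attended, gifts)
        counts[key][0] += donated
        counts[key][1] += 1
    return {key: donors / total for key, (donors, total) in counts.items()}

def rank_prospects(prospects, rates, default=0.0):
    """Order prospects by the historical donation rate of their segment."""
    scored = [(name, rates.get(segment(attended, gifts), default))
              for name, attended, gifts in prospects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

rates = donation_rates(history)
prospects = [("prospect_a", True, 4), ("prospect_b", False, 0), ("prospect_c", False, 1)]
ranked = rank_prospects(prospects, rates)
for name, rate in ranked:
    print(name, round(rate, 2))
```

Fundraising staff could then start outreach at the top of the ranked list. Segments with very few historical records produce unreliable rates, so a minimum segment size is worth enforcing in practice.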

One success story from Brown University details how the school used data mining to examine attributes such as age, income, college majors, home values, and class reunion attendance to raise $1.6 billion.

A different approach is to focus on how deeply associated graduates are with the institution. Some studies show that high association, the degree to which a graduate’s sense of self is attributed to their alma mater, is a common trait among donors. A graduate’s alma mater association could be measured by their interactions with the school on social media, volunteer time, engagement with newsletters, or attendance at alma mater-sponsored events.
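One way such an association measure might be assembled is as a weighted composite of the signals listed above, each scaled to a range of 0 to 1. The sketch below assumes hypothetical weights and caps. In practice these would be tuned or learned from donation history rather than set by hand.

```python
# Hypothetical weights: how much each signal contributes (they sum to 1.0).
WEIGHTS = {
    "social_interactions": 0.2,
    "volunteer_hours": 0.3,
    "newsletter_opens": 0.2,
    "events_attended": 0.3,
}

# Hypothetical yearly caps used to scale each signal into the range 0..1.
CAPS = {
    "social_interactions": 50,
    "volunteer_hours": 40,
    "newsletter_opens": 12,
    "events_attended": 5,
}

def association_score(alum):
    """Weighted 0..1 score; missing signals count as zero."""
    score = 0.0
    for signal, weight in WEIGHTS.items():
        scaled = min(alum.get(signal, 0) / CAPS[signal], 1.0)
        score += weight * scaled
    return score

# Hypothetical graduate with moderate online activity and strong event attendance.
example = {
    "social_interactions": 25,
    "volunteer_hours": 10,
    "newsletter_opens": 6,
    "events_attended": 5,
}
print(round(association_score(example), 3))
```

Capping each signal keeps one very active channel from dominating the score, which is a design choice rather than a requirement.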

Though machine learning approaches may differ between schools (some value demographic data, while others look at alumni association), the data is key to effective and efficient fundraising. Institutions that use advanced analytics on robust sets of data to identify prospective donors are outperforming those that use traditional and less sophisticated methods.

Use Case: Measure Student Engagement

Student engagement is increasingly viewed as one of the keys to addressing problems such as low achievement, boredom, alienation, and high dropout rates.

Leading institutions are looking to predictive technology to anticipate when, and for whom, low engagement is going to be an issue. Common metrics for this machine learning scenario include grades, dropped classes, in-class and extracurricular activities, tuition payment history, visits to the library, and trips to the medical center. This data can be supplemented with surveys, needs assessments, and analysis of students who are known to exhibit low engagement.

After identifying risk factors, administrators can create a plan to proactively address poor engagement, such as one-on-one student-professor meetings, working with a tutor, or having a mentor. Each remediation effort can then provide additional perspective and data to continually refine the machine learning model.
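Before a full machine learning model is in place, a simple engagement index can illustrate how several of these metrics might be combined. The sketch below standardizes each metric as a z-score, adds the z-scores together (subtracting metrics where a higher value is worse), and flags students below a threshold. The metric names, student IDs, data, and the threshold of -1.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def engagement_index(students, higher_is_better, lower_is_better):
    """Sum of signed z-scores per student; lower means less engaged."""
    index = {s["id"]: 0.0 for s in students}
    for metric in higher_is_better + lower_is_better:
        sign = 1.0 if metric in higher_is_better else -1.0
        zs = z_scores([s[metric] for s in students])
        for s, z in zip(students, zs):
            index[s["id"]] += sign * z
    return index

def flag_low_engagement(index, threshold=-1.0):
    """Return IDs of students whose index falls below the threshold."""
    return sorted(sid for sid, score in index.items() if score < threshold)

# Hypothetical term data for five students.
students = [
    {"id": "s1", "gpa": 3.5, "dropped_classes": 0, "activities": 3, "library_visits": 12},
    {"id": "s2", "gpa": 3.2, "dropped_classes": 0, "activities": 2, "library_visits": 10},
    {"id": "s3", "gpa": 3.0, "dropped_classes": 1, "activities": 2, "library_visits": 8},
    {"id": "s4", "gpa": 1.9, "dropped_classes": 3, "activities": 0, "library_visits": 1},
    {"id": "s5", "gpa": 3.4, "dropped_classes": 0, "activities": 3, "library_visits": 14},
]

index = engagement_index(
    students,
    higher_is_better=["gpa", "activities", "library_visits"],
    lower_is_better=["dropped_classes"],
)
flagged = flag_low_engagement(index)
print(flagged)
```

Outcomes from each remediation effort could be recorded alongside these metrics, giving a later model labeled examples to learn from.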

How to Approach a Data Science Project

A data science project doesn’t begin with installing tools and software. A lot of groundwork should be laid before setting your data scientists loose. Here’s a suggested approach for this type of project:

Get Data Science Ready

  1. Determine the objective of the initiative. For example, “improve on-time graduation rates by 2%” or “increase ROI of fundraising investments by 15%.” Outline clear objectives to prevent scope creep.
  2. Prepare and profile the data. Identify, clean, and organize the data that is needed for machine learning models. Each data element is considered a “feature.” It may or may not have an association with the predicted output, but don’t discount any of your data elements, because unexpected elements can help uncover insight. Data types could include:
  • Graduation data: on-time graduates, delayed graduates and drop outs
  • Education related data: professors, classes and majors
  • Activity related data: extracurricular participation, on or off campus living
  • Demographic data: hometown, high school, family income, scholarships
  3. Ensure the quality of the data is good. Good data should be granular, accurate, and complete. If it is not, choose different data or conduct additional feature engineering.
  4. Discover valuable use cases for data science. Determine the use cases where data science could bring the most value and achieve your defined objectives. Data analytics consultants with data science experience can help you quickly identify these use cases.
  5. Determine analysis type. Determine the type of algorithm that will be most effective for solving the problem (regression or classification). If you are predicting which students are likely to experience a delayed graduation or choose to drop out, this is a binary classification problem. Another option is regression, such as linear regression, which could be used to predict when students will graduate. Then, determine whether this analysis will be ongoing or one-time.
  6. Prioritize efforts. Evaluate projects in terms of value, difficulty, and cost, and prioritize them.
  7. Consolidate the data into a workspace for analysis, with special attention to the appropriate data architecture for efficient data processing.
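The data profiling, data quality, and analysis type steps above can be sketched in a few lines. The example below reports how complete each field is, lists fields that fall below a chosen completeness level, and derives both a binary classification target (delayed or dropped out) and a regression target (semesters to degree) from the same record. The field names, the 90% minimum, and the eight-semester definition of on-time graduation are assumptions for illustration.

```python
def profile_completeness(records, fields):
    """Return the share of records with a usable value for each field."""
    total = len(records)
    report = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = present / total if total else 0.0
    return report

def fields_below(report, minimum=0.9):
    """List fields whose completeness is under the chosen minimum."""
    return sorted(f for f, share in report.items() if share < minimum)

def make_targets(record, on_time_semesters=8):
    """Derive (classification_label, regression_target) for one student.

    The label is 1 for a delayed graduation or a dropout, else 0.
    The regression target is semesters to degree, or None for a dropout.
    """
    semesters = record.get("semesters_to_degree")  # None means dropped out
    label = 1 if semesters is None or semesters > on_time_semesters else 0
    return label, semesters

# Hypothetical student records with some missing demographic data.
records = [
    {"student_id": "s1", "major": "Biology", "family_income": 52000, "semesters_to_degree": 8},
    {"student_id": "s2", "major": "History", "family_income": None, "semesters_to_degree": 10},
    {"student_id": "s3", "major": "Physics", "family_income": "", "semesters_to_degree": None},
    {"student_id": "s4", "major": "Nursing", "family_income": 71000, "semesters_to_degree": 8},
]

report = profile_completeness(records, ["major", "family_income"])
print(report, fields_below(report))
print([make_targets(r) for r in records])
```

A field flagged here is a candidate for the remedies named in the quality step: choosing different data or doing additional feature engineering.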


  1. Build, train, and evaluate the models. A traditional approach could leverage R or Python, but other options include commercial off-the-shelf tools such as Azure ML, Google ML, AWS ML, Compellon, or DataRobot that automate a large portion of the process. Verify output against known historical results.
  2. Operationalize, improve, and maintain. If factors like tutors, class schedule, and changing majors are highly correlated with negative or positive results, start considering changes to implement. For ongoing analysis, the model should be retrained as the input data evolves over time. As your models mature, so should your datasets. It’s important to continue to augment your data to answer more business questions and explore more possibilities.
  3. Be responsible. While this scenario may be inspiring, we must remember to proceed with caution with the information we glean from data science methods. It is important to continue to respect student privacy policies and not use insights at the expense of the students.
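Verifying model output against known historical results, as the first step above recommends, often comes down to counting where predictions and actual outcomes agree. The sketch below computes a confusion matrix along with precision and recall for a binary at-risk prediction. The actual and predicted lists are made-up values, where 1 means the student did not graduate on time.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }

def precision_recall(counts):
    """Precision: flagged students who were truly at risk.
    Recall: at-risk students the model managed to flag."""
    flagged = counts["tp"] + counts["fp"]
    at_risk = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / at_risk if at_risk else 0.0
    return precision, recall

# Hypothetical held-out history: known outcomes vs. model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Low recall means at-risk students are being missed, while low precision means support resources are being spent on students who did not need them. Which matters more depends on the objective set at the start of the project.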

Dave Williams is our Managing Director of Customer Success. He has a passion for questioning the status quo, helping our clients make smart decisions, and building business-focused solutions. He is the father of 3 young adults and enjoys exploring the mountains near his new hometown outside Salt Lake City.