Improving College Success Through Predictive Modeling

John Jay College of Criminal Justice | New York City, NY


  • Build a predictive model using John Jay’s data to identify students who are at risk of dropping out, to aid John Jay’s efforts to provide better and more timely support to these students and reduce dropout rates.
  • Explore historical student data, using statistical analysis and machine learning, to understand and identify factors that may affect students’ decisions to drop out after having completed 90 credits (75 percent of the credits needed to earn a degree) in order to design effective interventions and improve graduation rates.
  • Develop a prototype for an application that uses student data to produce dropout risk scores for each student.


  • Built predictive models to assist in identifying students who are likely or not likely to graduate within four semesters of completing 90 credits or more of coursework.
  • Developed a prototype for a tool that generates dropout risk scores for students and gives John Jay the ability to make more informed assessments of students at risk, identify those who have the greatest likelihood of not graduating and prioritize interventions.
  • Provided insights into the factors that may contribute to student dropout that could help John Jay design more effective interventions to better support students and improve graduation rates.


College completion rates in the U.S. continue to be a concern, as nearly half of all students entering college are at risk of leaving without earning a degree, according to a recent report by the National Student Clearinghouse (NSC).

Founded in 1964, John Jay College of Criminal Justice – a senior college of The City University of New York – has evolved into the preeminent international leader in educating for justice in its many dimensions. The College offers a rich liberal arts and professional curriculum that prepares students to serve the public interest as ethical leaders and engaged citizens. The College’s community of over 15,000 students is the most diverse among the City University of New York’s senior colleges and produces leaders, scholars and heroes in policing and beyond, including forensic science, law, fire and emergency management, social work, teaching, private security, forensic psychology and corrections.

John Jay wanted to see how existing student data, coupled with data science and machine learning, may be used to help address challenges around completion rates at their college. Could they identify students who are likely to drop out and in need of support? What are some of the factors that may influence a student’s decision to leave school before earning their degree? John Jay looked to DataKind to help answer these questions and develop models and tools to support their efforts to improve graduation rates.

What Happened

In a DataKind project sponsored by the MasterCard Center for Inclusive Growth, and supported by the Robin Hood Foundation, DataKind’s data science team analyzed more than 10 years of historical student data in an effort to help John Jay College of Criminal Justice better understand and identify students who are at a higher risk of dropping out of college or not graduating in a timely manner. More specifically, John Jay wanted to learn how the characteristics and behaviors of these students may differ from those of students who graduate, and to develop tools that could help its administration team determine which students may be at risk of not graduating and pinpoint areas where support is most needed.

The team was able to obtain data from the Institutional Research Database (IRDB), a CUNY-wide system intended to support research and reporting within CUNY at large, that includes student information such as basic demographics, grade point average (GPA), number of credits earned, admissions related scores, financial aid, declared majors and more.

The team decided to focus on students who had already completed 90 credits or more of coursework, meaning they had 75 percent of the number of credits needed to graduate. Approximately 750 John Jay students who had completed 90 credits or more by the end of the Fall 2016 semester failed to register the following semester. The college wanted to better understand this particular group of at-risk students and what may be affecting their decision to leave school at this stage in their college career, when they are so close to earning a degree, and to see how it can improve services to promote continued enrollment and, ultimately, college completion.

Performing exploratory analysis of the data, the team looked to reveal the dimensions along which students who fail to graduate in a timely manner, or drop out, differ from students who graduate on time. From the initial dive, the team found that factors like high school grades or college admissions test scores, such as the Scholastic Assessment Test (SAT) or the New York State Regents Examinations, don’t seem to have an impact on whether or not a student graduates. However, the analysis did show correlations between graduation and factors such as a student’s GPA, the number of classes a student drops and/or fails, whether a student is enrolled full or part time, choice of major, the average number of credits passed in a term, the number of times a student changes majors and the amount of time it takes to complete 90 credits of coursework. For example, a student who has an average GPA, is completing an average of only 10 credits a term, and has failed two courses in the past will have a higher probability of not graduating.
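This kind of exploratory check can be done by correlating each candidate factor with the graduation outcome. The sketch below uses a tiny made-up sample; the column names are illustrative and not the actual IRDB schema.

```python
import pandas as pd

# Hypothetical, de-identified sample of student records (made-up values).
students = pd.DataFrame({
    "gpa":              [3.6, 2.4, 3.1, 2.0, 3.8, 2.7, 3.3, 2.2],
    "credits_per_term": [15, 10, 13, 9, 15, 11, 14, 10],
    "courses_failed":   [0, 2, 0, 3, 0, 1, 0, 2],
    "graduated":        [1, 0, 1, 0, 1, 0, 1, 0],  # 1 = graduated on time
})

# Pearson correlation of each candidate factor with the graduation outcome.
correlations = students.corr()["graduated"].drop("graduated")
print(correlations.sort_values(ascending=False))
```

In a real analysis the same pattern would run over the full historical dataset, with factors such as enrollment status and major handled as categorical variables rather than simple numeric correlations.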

The team also worked to develop models and tools that would allow John Jay to identify students at risk for not graduating. More than 20 different modeling approaches, algorithms, and combinations of models were tested. In the end, the team created two sets of models using machine learning, each designed to predict the likelihood that a student will graduate within four semesters after completing a minimum of 90 credits of coursework.
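The write-up doesn’t name the algorithms ultimately chosen, but a minimal sketch of one such approach is shown below: a gradient-boosted classifier trained on synthetic stand-in features, then evaluated separately on graduates and non-graduates, mirroring the two per-class accuracy figures reported later. All feature names and data here are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for IRDB-derived features (hypothetical).
gpa = rng.uniform(2.0, 4.0, n)
credits_per_term = rng.uniform(6, 16, n)
courses_failed = rng.poisson(1.0, n)
X = np.column_stack([gpa, credits_per_term, courses_failed])

# Synthetic label: graduation odds rise with GPA and credit load, fall with failures.
logit = 2.0 * (gpa - 3.0) + 0.3 * (credits_per_term - 11) - 0.8 * courses_failed
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = graduated

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Accuracy computed per class, as in the separately reported rates for
# students who do and do not graduate.
predictions = model.predict(X_test)
grad_recall = recall_score(y_test, predictions, pos_label=1)
nongrad_recall = recall_score(y_test, predictions, pos_label=0)
```

Reporting the two classes separately matters here: a model that simply predicted “will graduate” for everyone could look accurate overall while missing exactly the at-risk students the college cares about.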

The first model can be used to assess a student’s likelihood of dropping out at the time they complete 90 credits, and the second can be applied to determine a student’s risk at various points, or semesters, beyond the 90-credit mark. With the second model, John Jay can use current student information to generate and update a student’s probability of not graduating as the student progresses past 90 credits, since it accounts for changes in a student’s performance, enrollment status and other factors. Both models performed with high accuracy, correctly predicting those students who will graduate with 80 to 90 percent accuracy and correctly predicting those students who will not graduate with about 70 to 80 percent accuracy.

A prototype for an application that provides risk scores for students was also created. For each student, given a set of characteristics and behaviors, the application generates the probability of that student not graduating and produces a dropout risk score. The tool not only gives John Jay’s advisors improved capabilities to assess students and quickly determine those in greater need, but also allows them to sort by risk score to identify the students most at risk and better prioritize interventions. Furthermore, it provides information about which factors are driving the prediction for each individual student. John Jay advisors use the tool as an aid to prioritize interventions, rather than to dictate mandatory interventions; this human oversight is critical to ensuring ethical use of the algorithm’s outputs.
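The advisor-facing ranking described above can be sketched as follows. The student IDs, probabilities, and factor labels are made up for illustration; in the prototype these would come from the models and the IRDB data.

```python
# Model-produced dropout probabilities with the top driving factor for each
# student (hypothetical values and labels).
risk_scores = {
    "S1001": {"p_dropout": 0.12, "top_factor": "high GPA, full-time enrollment"},
    "S1002": {"p_dropout": 0.67, "top_factor": "two failed courses"},
    "S1003": {"p_dropout": 0.41, "top_factor": "part-time enrollment"},
}

# Sort so the highest-risk students appear first, for intervention triage.
ranked = sorted(risk_scores.items(),
                key=lambda item: item[1]["p_dropout"], reverse=True)

for student_id, info in ranked:
    print(f"{student_id}: {info['p_dropout']:.0%} risk ({info['top_factor']})")
```

Surfacing the driving factor next to each score supports the human-in-the-loop use the article describes: the advisor sees why a student was flagged, not just the number.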

Next Steps

The predictive models and tools generated by the DataKind team will provide John Jay advisors with greater insight and the support they need to better assess and identify currently enrolled students who are at risk of failing to graduate. The tools can also help advisors equitably prioritize those with more immediate needs and inform the design of more effective interventions to prevent student dropout and improve graduation rates.

John Jay plans to introduce the prototype application into their systems so that advisors can begin to examine the results of the model in the Spring of 2018 and offer their expertise to help in refining and finalizing the model. They hope to be able to use the final model to pinpoint students who may be at risk of dropping out or failing during the Fall 2018 semester and consider potential interventions.

As with all technology, there are limitations, and the models developed here are no exception. Based on the predicted risk scores generated, advisors may have insights into student outcomes that are not reflected in the data used to develop the model – this added information from experts in the field could prove useful in informing the design of potential interventions and in determining other data that would be beneficial to add to a future model. John Jay plans to begin testing interventions in the Fall of 2018.
