New tool helps balance data privacy and utility

An SOE team is using AI to quantify the accuracy of conclusions from privacy-enhanced data in the social and behavioral sciences.
Pictured: Mark Hempstead, Furkan Sarikaya, Shaohua Lu, and Johes Bater

Improving student learning and teaching techniques can help make the classroom a more inclusive and effective place for all, and is increasingly important in an age of advanced technology that is rapidly changing how students learn and instructors teach. Research from the behavioral and social sciences can significantly inform how classroom environments develop in the coming years. Progress in these areas relies heavily on qualitative data (e.g., surveys and observations), but balancing the privacy, accuracy, and utility of conclusions drawn from such data remains a challenge.

A team of SOE students and faculty recently developed Conclusion Based Utility Evaluation (CBUE), a tool that uses AI to analyze the accuracy of conclusions drawn from privacy-enhanced data. Their research, titled “CBUE: Conclusion Based Utility Evaluation for Differentially Private Categorical Data,” was recently accepted for publication at the 2026 Institute of Electrical and Electronics Engineers (IEEE) Symposium on Security and Privacy. The method could be applied to a wide range of datasets to advance both data protection and usability, helping to inform policy and identify issues in learning environments.

The research team included first author Furkan Sarikaya (EG25), who earned an M.S. in computer engineering and is now a Ph.D. student in electrical and computer engineering; M.S. student in data science Shaohua Lu (EG25); Assistant Professor Johes Bater of the Department of Computer Science; and Professor Mark Hempstead of the Department of Electrical and Computer Engineering.

Evaluating conclusions from the “noise” 

In the social and behavioral sciences, scientists study learning through experiments in which participants are tested and observed. Participants share personal information such as medical and student records, but conclusions from this research must not contain information that could identify an individual participant.  

A common way to maintain privacy in conclusions based on qualitative data is to add “noise”: small random alterations that mask the dataset and protect individuals' identifiable information. However, this often comes at the cost of making the data less useful or less accurate.
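As an illustration, one standard noise mechanism for categorical survey answers is randomized response (an assumption for illustration; the paper's exact mechanism may differ). Each respondent keeps their true answer with high probability and otherwise reports a random alternative, so any single record is deniable:

```python
import math
import random

def randomized_response(true_answer, categories, epsilon=1.0):
    """Report the true category with probability e^eps / (e^eps + k - 1);
    otherwise report one of the other k - 1 categories uniformly at random.
    Larger epsilon means less noise (weaker privacy, higher accuracy)."""
    k = len(categories)
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_truth:
        return true_answer
    return random.choice([c for c in categories if c != true_answer])

# Simulate a privacy-enhanced survey column.
categories = ["agree", "neutral", "disagree"]
noisy_answers = [randomized_response("agree", categories, epsilon=2.0)
                 for _ in range(1000)]
```

The smaller the privacy parameter epsilon, the noisier the released answers and the harder it becomes to recover the original distribution, which is exactly the accuracy loss the article describes.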

CBUE uses AI to evaluate the accuracy and utility of human-drawn conclusions after noise has been added. This new tool can help ensure that data doesn't become skewed and remains useful.  

“By using AI to mimic the reasoning of human researchers, CBUE provides a new lens to quantify the utility of privacy-enhanced data,” said Hempstead. “It highlights when conclusions remain accurate, when they shift, and what that means for real-world decision making. More importantly, it can be repeated thousands of times, which would otherwise require many hours and teams of human researchers.”

CBUE uses a large language model (LLM) to draw conclusions from noisy data, compare them to the “ground truth” conclusions, measure the magnitude of error, and generate a utility score. The team tested the method on two learning sciences datasets, showcasing its potential to be applied across the field. They found that CBUE offers a more comprehensive analysis and quantification of the privacy-utility tradeoff than other common statistical methods.
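A minimal sketch of that pipeline might look like the following (not the authors' implementation; `draw_conclusion` is a hypothetical stand-in for the LLM query, stubbed with a simple rule so the example runs):

```python
def draw_conclusion(counts):
    """Hypothetical stand-in for an LLM query: conclude which category
    of response was most common in the data."""
    return max(counts, key=counts.get)

def utility_score(true_counts, noisy_counts):
    """Score 1.0 when the conclusion drawn from noisy data matches the
    ground-truth conclusion; otherwise penalize by how far the noisy
    count of the true winner drifted (a simple magnitude-of-error proxy)."""
    truth = draw_conclusion(true_counts)
    noisy = draw_conclusion(noisy_counts)
    if truth == noisy:
        return 1.0
    total = sum(true_counts.values())
    drift = abs(true_counts[truth] - noisy_counts[truth]) / total
    return max(0.0, 1.0 - drift)

# Ground truth vs. the same data after privacy noise.
true_counts = {"confused": 40, "engaged": 35, "bored": 25}
noisy_counts = {"confused": 38, "engaged": 37, "bored": 25}
print(utility_score(true_counts, noisy_counts))  # conclusion survives -> 1.0
```

The key idea is that utility is judged at the level of the conclusion a researcher would draw, not the raw numbers: here the noise perturbed the counts but not the takeaway, so the score stays high.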

Informing how to navigate student challenges  

This research is part of a larger effort, funded by a National Science Foundation (NSF) Growing Convergence Research grant, to explore ambiguity, uncertainty, and confusion in science, technology, engineering, and mathematics (STEM) education. The goal of the work is to understand how to help students engage with challenges in STEM learning environments. With deep strengths in pedagogy, STEM education, AI, data science, and cognitive science, Tufts is uniquely suited to drive this research.

CBUE could play a helpful role in protecting data, encouraging more people to participate in behavioral and social studies while ensuring that research conclusions remain accurate and meaningful enough to inform the development of classroom environments.

This May, Hempstead will present CBUE at the IEEE Symposium on Security and Privacy. Hempstead’s research group, the Tufts Computer Architecture Lab, advances high performance computing, energy efficiency, machine learning, and more. 

With the growing importance of the social and behavioral sciences in understanding student challenges and informing teaching and learning strategies, CBUE offers a new way to balance the privacy and utility of the qualitative research conclusions essential to driving this work forward.