Collaboration in multimodal data sets

Collaborative work leads to a new representation of multimodal data sets that is both informationally and computationally efficient.
Associate Professor Shuchin Aeron

Associate Professor Shuchin Aeron of the Department of Electrical and Computer Engineering partnered with faculty from the Departments of Mathematics at Tufts University and North Carolina State University to develop promising new methods for analyzing the high-dimensional data sets found in problems ranging from facial recognition and geophysics to video processing and medical tomography. One of the central computational challenges of Big Data is efficiently and accurately extracting the most relevant information required to recognize a face or reconstruct an image from a large and noisy set of data. While well-established methods for this type of compression exist for Small Data cases, their generalization to larger problems is by no means obvious.

This is where the work of Aeron and collaborators is relevant. Combining ideas from multilinear algebra developed by Professor Misha Kilmer of the Tufts Department of Mathematics with random sampling methods, Aeron and colleagues have developed a randomized tensor singular value decomposition (tSVD). The algorithm approximates the pieces of Kilmer’s original tSVD most needed for solving a given machine learning problem, but it is cheaper to compute, is highly parallelizable, and, in the case of facial recognition, is as accurate as the original tSVD method.
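For readers who want a concrete picture, the following is a minimal NumPy sketch of the general idea, not the authors' code. It assumes the standard formulation of the t-product, in which a Fourier transform along the third dimension turns the tensor product into independent matrix products on each frontal slice, and it applies a basic Gaussian random projection to each of those slices. The function names (`t_product`, `randomized_tsvd`, `reconstruct`) and the `oversample` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def t_product(A, B):
    """t-product of A (n1 x n2 x n3) with B (n2 x n4 x n3)."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices in the Fourier domain.
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)
    # Inputs are real, so the imaginary part is only round-off.
    return np.fft.ifft(C_hat, axis=2).real


def randomized_tsvd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` tSVD factors of A, kept in the Fourier domain.

    Illustrative sketch: `oversample` extra random directions are sampled
    and then discarded after the small SVD.
    """
    n1, n2, n3 = A.shape
    rng = np.random.default_rng(seed)
    ell = min(rank + oversample, n1, n2)
    W = rng.standard_normal((n2, ell, n3))  # random test tensor

    A_hat = np.fft.fft(A, axis=2)
    W_hat = np.fft.fft(W, axis=2)

    U_hat = np.empty((n1, rank, n3), dtype=complex)
    S_hat = np.empty((rank, n3))
    V_hat = np.empty((n2, rank, n3), dtype=complex)

    # Each slice is independent, which is what makes the method parallelizable.
    for k in range(n3):
        A_k = A_hat[:, :, k]
        Y = A_k @ W_hat[:, :, k]          # sample the range of the slice
        Q, _ = np.linalg.qr(Y)            # orthonormal basis for that range
        B = Q.conj().T @ A_k              # small (ell x n2) projected slice
        Ub, s, Vh = np.linalg.svd(B, full_matrices=False)
        U_hat[:, :, k] = (Q @ Ub)[:, :rank]
        S_hat[:, k] = s[:rank]
        V_hat[:, :, k] = Vh.conj().T[:, :rank]
    return U_hat, S_hat, V_hat


def reconstruct(U_hat, S_hat, V_hat):
    """Rebuild the approximating tensor from Fourier-domain factors."""
    A_hat = np.einsum("irk,rk,jrk->ijk", U_hat, S_hat, V_hat.conj())
    return np.fft.ifft(A_hat, axis=2).real


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic tensor with low tubal rank (3), built for the demo only.
    A = t_product(rng.standard_normal((30, 3, 8)),
                  rng.standard_normal((3, 20, 8)))
    factors = randomized_tsvd(A, rank=3)
    err = np.linalg.norm(reconstruct(*factors) - A) / np.linalg.norm(A)
    print("relative error:", err)
```

The expensive decomposition is only ever applied to the small projected slice `B`, which is where the savings over a full decomposition of every slice would come from in this sketch. The published algorithm may differ in its details.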

Moving forward, Aeron and collaborators are interested in extending these ideas to problems where the high-dimensional data are structured in ways that can be exploited mathematically to reduce computation, for example by combining the randomized tSVD with traditional data-compression methods based on Hadamard or Fourier transforms. The group also hopes to design more sustainable methods of compression that can accommodate problems where new data are being streamed into an existing data set.

The research, conducted by Jiani Zhang, Misha Kilmer, Shuchin Aeron, and Arvind K. Saibaba, is available in a paper titled “A randomized tensor singular value decomposition based on the t-product,” published this spring in Numerical Linear Algebra with Applications.