Computation, Vol. 12, Pages 24: Data Augmentation for Regression Machine Learning Problems in High Dimensions

Computation doi: 10.3390/computation12020024

Authors: Clara Guilhaumon, Nicolas Hascoët, Francisco Chinesta, Marc Lavarde, Fatima Daim

Machine learning approaches are currently used to understand or model complex physical systems. In general, a substantial number of samples must be collected to build a model that gives reliable results. However, collecting large amounts of data is often time-consuming or expensive. Moreover, problems of industrial interest tend to be increasingly complex and to depend on a large number of parameters. Because of the curse of dimensionality, high-dimensional problems intrinsically require large amounts of data. This is why new approaches based on smart sampling techniques, such as active learning methods, have been investigated to minimize the number of samples needed to train the model. Here, we propose a technique based on a combination of the Fisher information matrix and sparse proper generalized decomposition that enables the definition of a new active learning informativeness criterion in high dimensions. We provide examples demonstrating the performance of this technique on a theoretical 5D polynomial function and on an industrial crash simulation application. The results show that the proposed strategy outperforms the usual ones.
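To make the idea of a Fisher-information-based informativeness criterion concrete, here is a minimal sketch in Python. It is not the authors' method: the sparse proper generalized decomposition is replaced by a simple, hypothetical polynomial basis (`features`), the model is assumed linear in its parameters with Gaussian noise (so the Fisher information is proportional to Phi^T Phi), and the greedy D-optimal selection rule, the `ridge` regularizer, and all function names are assumptions made for illustration only.

```python
import numpy as np


def features(x):
    """Hypothetical basis: constant, linear and squared terms per input dimension."""
    x = np.atleast_2d(x)
    return np.hstack([np.ones((x.shape[0], 1)), x, x**2])


def select_next(candidates, fisher):
    """Index of the candidate that most increases det(fisher) (greedy D-optimality).

    By the matrix determinant lemma,
    det(F + p p^T) = det(F) * (1 + p^T F^{-1} p),
    so it is enough to maximise p^T F^{-1} p over the candidates.
    """
    phi = features(candidates)
    gain = np.einsum("ij,ij->i", phi, np.linalg.solve(fisher, phi.T).T)
    return int(np.argmax(gain))


def active_sampling(pool, n_init, n_queries, ridge=1e-6, seed=0):
    """Start from a random design, then add the most informative pool points."""
    rng = np.random.default_rng(seed)
    chosen = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    for _ in range(n_queries):
        phi = features(pool[chosen])
        # Fisher information of a linear Gaussian model, up to the noise variance;
        # the small ridge term (an assumption) keeps the matrix invertible.
        fisher = phi.T @ phi + ridge * np.eye(phi.shape[1])
        remaining = np.setdiff1d(np.arange(len(pool)), chosen)
        best = select_next(pool[remaining], fisher)
        chosen.append(int(remaining[best]))
    return chosen


# Usage: pick 10 additional samples from a random pool of 5D candidate points.
pool = np.random.default_rng(1).uniform(-1.0, 1.0, size=(200, 5))
idx = active_sampling(pool, n_init=5, n_queries=10)
```

The selected indices in `idx` are the pool points one would label next (for example, by running the expensive simulation). In a real high-dimensional setting the basis would have to stay sparse, which is the role the abstract assigns to the sparse proper generalized decomposition.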
