Abstract:
Missing data arises in many situations and poses challenges for data analysis. It may seriously compromise inference if not handled appropriately. Kernel machines, of which the support vector machine is the best-known example, have appealing advantages, such as computational ease and robustness to distributional assumptions. In this research, we develop new kernel machines to solve inferential problems under three different types of missing data.
The first type of missing data concerns missing responses. We develop two new kernel machines, which can be used for both regression and classification. The first proposed kernel machine uses only the complete cases and therefore relies on restrictive assumptions. The second is a doubly robust kernel machine that overcomes these limitations: it remains valid when either the missing mechanism or the conditional distribution of the response is misspecified, provided the other is correctly specified. The second type of missing data concerns missing covariates. We develop a family of doubly robust kernel machines for classification, assuming that the data are missing at random. We construct a novel convex augmented loss function using inverse probability weighting, multiple imputation, and surrogacy. The third type of missing data concerns a special case of missing responses in multiple instance learning, where only one summarized response of a group (bag) is observed. We cast the multiple instance problem as a classification problem with nonignorable missing responses and develop three versions of the EM algorithm, using linear, kernel machine, and neural network classifiers, to accommodate different levels of data complexity.
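
To make the inverse probability weighting idea mentioned above concrete, the following is a minimal sketch, not the estimator developed in this research. It assumes a squared loss, an RBF kernel, a one-dimensional covariate, responses missing at random, and a known observation probability; the names `rbf_kernel`, `fit_ipw_kernel_ridge`, `propensity`, `lam`, and `gamma` are hypothetical and chosen only for illustration. Each observed case is weighted by the inverse of its observation probability, so the weighted complete cases stand in for the full sample.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix between two 1-D arrays of covariate values."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-gamma * d2)

def fit_ipw_kernel_ridge(x, y, observed, propensity, lam=1e-2, gamma=1.0):
    """Kernel ridge regression on observed cases, weighted by 1 / propensity.

    Minimizes sum_i w_i (y_i - f(x_i))^2 + lam * ||f||^2 over the observed
    cases, with w_i = 1 / propensity_i (assumed known here for simplicity).
    """
    x_obs = x[observed]
    y_obs = y[observed]
    w = 1.0 / propensity[observed]
    K = rbf_kernel(x_obs, x_obs, gamma)
    # Stationarity condition of the weighted objective: (W K + lam I) alpha = W y
    alpha = np.linalg.solve(w[:, None] * K + lam * np.eye(len(x_obs)), w * y_obs)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        return rbf_kernel(x_new, x_obs, gamma) @ alpha

    return predict

# Simulated data: responses are more likely to be missing for small x.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-3.0, 3.0, size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-(1.0 + x)))  # P(response observed | x)
observed = rng.uniform(size=n) < propensity
y_with_missing = np.where(observed, y, np.nan)  # unobserved responses are NaN

predict = fit_ipw_kernel_ridge(x, y_with_missing, observed, propensity,
                               lam=0.1, gamma=1.0)
grid = np.linspace(-2.5, 2.5, 50)
fitted = predict(grid)
```

In practice the observation probability is unknown and must be estimated, which is where misspecification can enter; the doubly robust constructions described above are intended to guard against exactly that.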
