Low-rank Singular Value Thresholding for Recovering Missing Air Quality Data

Yangwen Yu, James J.Q. Yu, Victor O.K. Li, and Jacqueline C.K. Lam
Proc. IEEE International Conference on Big Data, Boston, MA, Dec. 2017, Pages 508-513

With growing awareness of the harmful impacts of urban air pollution, air quality monitoring stations have been deployed in many metropolitan areas. These stations provide continuous air quality monitoring data to the public. However, because of sampling device failures and data processing errors, missing values in air quality measurements are common. Data integrity therefore becomes a critical challenge when such data are used for public services. In this paper, we investigate the mathematical properties of air quality measurements and attempt to recover the missing data. First, we empirically study the low-rank property of the measurements. Second, we formulate a low-rank matrix completion optimization problem to reconstruct the missing air quality data. The problem is transformed using duality theory, and singular value thresholding (SVT) is employed to develop suboptimal solutions. Third, to evaluate the performance of our methodology, we conduct a series of case studies covering different missing data patterns. The simulation results demonstrate that the proposed methodology can effectively recover missing air quality data and outperform the existing interpolation method. Finally, we investigate the parameter sensitivity of SVT. Our study can serve as a guideline for implementing missing data recovery in real-world systems.
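The abstract does not spell out the exact SVT formulation, so the following is only a minimal sketch of the standard SVT iteration of Cai, Candès and Shen for matrix completion, assuming that is the variant the approach builds on. Each iteration soft-thresholds the singular values of a dual matrix `Y` by `tau` to obtain the estimate `X`, then takes a dual ascent step using the residual on the observed entries only. The function name `svt_complete`, the default step size `delta = 1.2 / (observed fraction)`, the "kicking" start and the relative-residual stopping rule are illustrative assumptions, not the authors' settings. Missing entries may be stored as NaN, because only entries where `mask` is True are read.

```python
import numpy as np


def svt_complete(M, mask, tau, delta=None, max_iter=1000, tol=1e-4):
    """Hypothetical sketch: fill missing entries of M by singular value thresholding.

    M     : 2-D array (e.g. stations x time steps); entries where mask is False are ignored.
    mask  : boolean array, True where a measurement was observed.
    tau   : singular value threshold (no default; it must be tuned).
    delta : step size; assumed default 1.2 / (fraction of observed entries).
    """
    M = np.asarray(M, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    P_M = np.where(mask, M, 0.0)            # P_Omega(M): observed values, zeros elsewhere
    if delta is None:
        delta = 1.2 * mask.size / mask.sum()
    norm_obs = np.linalg.norm(P_M)          # Frobenius norm of the observed data

    # "Kicking" start: scale Y so its largest singular value already exceeds tau,
    # skipping the initial iterations in which X would remain all zeros.
    k0 = int(np.ceil(tau / (delta * np.linalg.norm(P_M, 2))))
    Y = k0 * delta * P_M
    X = np.zeros_like(P_M)

    for _ in range(max_iter):
        # Shrinkage step: X = D_tau(Y), soft-thresholding the singular values of Y.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Residual restricted to the observed entries (NaNs at missing entries are discarded).
        R = np.where(mask, M - X, 0.0)
        if np.linalg.norm(R) <= tol * norm_obs:
            break
        # Dual (Uzawa) ascent step on the observed entries.
        Y += delta * R
    return X
```

In this sketch, a larger `tau` pushes the result toward lower-rank completions but slows convergence, while `delta` trades speed against stability; how sensitive recovery is to such parameters is what the final study in the abstract examines. The low-rank property itself can be inspected by checking how quickly `np.linalg.svd(X, compute_uv=False)` decays.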