Data-Driven Prediction of Particulate Pollution Episodes from Low-Cost Sensor Networks in a Coastal Industrial City: Lessons from Gijón (Spain)
Air pollution remains a pressing global challenge, particularly in urban and industrial areas where complex emission patterns amplify exposure risks. While regulatory monitoring networks provide high-quality observations, their spatial sparsity may limit their capacity to capture local and short-lived events. Low-cost sensors (LCS) are therefore a valuable complement, enabling dense networks, rapid deployment, and community-led data acquisition. Gijón provides a paradigmatic case study for this type of problem, being a coastal industrial city with strong topographic gradients and frequent thermal inversion episodes [1]. To investigate the link between thermal inversions and particulate pollution, we designed and deployed a network of sensor nodes measuring PM levels and atmospheric conditions, each node integrating particle and meteorological probes with communication capabilities. The devices were installed across the city following the “Valley–Mountain” conceptual model. This contribution presents ongoing research in Gijón (Spain) aimed at characterising and predicting particulate episodes (PM2.5 and PM10) linked to atmospheric stability, using a hybrid strategy that combines LCS, reference instrumentation, and numerical methods (machine learning, regression, and autoregressive models). In addition, three years of official AEMET records were integrated into the dataset. Correlation analyses revealed a coupling between inversion strength and PM levels, particularly for PM2.5, whereas PM10 exhibited higher sensitivity to short-range emission sources and wind-driven resuspension. Neural networks were then trained to predict PM levels, and several configurations were analysed by varying their hyperparameters. The best results were obtained with a three-layer network trained with the Adam optimizer. Neural network models outperformed linear regression [2] and autoregressive methods, with 1-hour PM2.5 forecasts achieving R² ≈ 0.66 and RMSE ≈ 4–5 µg m⁻³.
Predictability weakened at longer horizons and for PM10. This project highlights a scalable methodology for resource-constrained environments: leveraging low-cost devices, open-source pipelines, and machine learning to enhance situational awareness in areas where regulatory monitoring is insufficient. The work points toward future opportunities in calibration-as-a-service models, real-time anomaly detection, community dashboards, and international capacity building.
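As an illustration of how forecast skill can be scored at several horizons, the following minimal sketch fits a direct one-lag linear forecast and reports R² and RMSE on a held-out tail of the series. It is not the authors' pipeline: the data are synthetic, the one-lag model is only a stand-in for an autoregressive baseline (no neural network is trained here), and all function names and parameter values (`synthetic_pm25`, `evaluate`, the 70/30 split) are assumptions made for the example.

```python
import math
import random


def make_lagged(series, horizon):
    """Pair each value x[t] with the target x[t + horizon]."""
    return series[:-horizon], series[horizon:]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the series."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(series, horizon, train_fraction=0.7):
    """Fit on the first part of the series, score on the remaining tail."""
    xs, ys = make_lagged(series, horizon)
    split = int(len(xs) * train_fraction)
    a, b = fit_line(xs[:split], ys[:split])
    preds = [a + b * x for x in xs[split:]]
    return r2(ys[split:], preds), rmse(ys[split:], preds)


def synthetic_pm25(n_hours=2000, seed=0):
    """Hypothetical hourly series: a persistent signal plus Gaussian noise."""
    rng = random.Random(seed)
    values, level = [], 12.0
    for _ in range(n_hours):
        level = 12.0 + 0.9 * (level - 12.0) + rng.gauss(0.0, 2.0)
        values.append(max(level, 0.0))  # concentrations cannot be negative
    return values


if __name__ == "__main__":
    pm25 = synthetic_pm25()
    for h in (1, 6, 24):
        score, err = evaluate(pm25, h)
        print(f"horizon={h:>2} h  R2={score:.2f}  RMSE={err:.2f}")
```

Splitting chronologically rather than at random keeps future observations out of the training set, which matters for autocorrelated series such as hourly PM concentrations. On this synthetic series the scores degrade as the horizon grows, the same qualitative behaviour described above for longer forecast horizons.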
