Research title: A machine learning prediction of PM2.5 (µg/m³) in Seoul
Date: 2020.02.14
Venue: LW Convention
Research background and purpose
As quality of life has improved, people have begun to care more about environmental factors that affect their health. In the same vein, particulate matter (PM, µg/m³) has emerged as one of the most serious regional problems in Korea, and many studies of PM have been reported. In 2018, the damage cost of fine dust in Korea reached 3.3 billion dollars. The government therefore needs PM reduction policies that minimize this damage, prioritized according to the variables that matter most. Accordingly, the purpose of this study is to select a PM prediction model and to identify which variables most strongly affect PM, so that the results can serve as an important reference when designing PM reduction policies.
Research method
The model is based on machine learning (ML) algorithms. Linear Regression, Decision Tree Regression, Support Vector Regression and Random Forest Regression (an ensemble of decision trees trained by bootstrap aggregation, or bagging) are used to predict PM2.5 in Seoul. The dependent variable is the monthly PM2.5 concentration in Seoul; a monthly resolution is used because PM2.5 levels in Seoul fluctuate by season (high in spring and winter, low in summer and autumn).
Many reports state that PM concentrations in Korea are affected by environmental factors in China, so the independent variables are divided into two groups: Korean and Chinese. The Chinese variables are PM2.5 concentrations in Shandong, Hebei and Jiangsu. These regions show high PM2.5 concentrations from November to the following April and lower concentrations from May to October, a pattern similar to that of PM2.5 in Seoul. The Korean variables focus on the monthly number of cars and the monthly ratio of west wind. The number of cars is divided into 3 sectors (official, business and personal vehicles), and each sector is divided into 4 types (passenger cars, vans, trucks and special vehicles). Fuel consumption changes only slightly over time, whereas the seasonal variation in the west-wind ratio is similar to that of PM2.5 in Seoul.
All data cover January 2015 to October 2019, giving 58 monthly samples. The data set is split into 70% for training and 30% for testing, and each fitted model is evaluated using the root mean square error (RMSE).
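The data layout and evaluation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the column names produced by `vehicle_feature_names` are hypothetical labels for the 3 × 4 vehicle categories, and the hypothetical helper `random_split` assumes a shuffled random split, since the abstract does not say whether the split was random or chronological. The test size is rounded up, so 58 months yield 40 training and 18 test samples; scikit-learn's `train_test_split` rounds a fractional `test_size` the same way.

```python
import itertools
import math
import random

# Hypothetical column names for the vehicle counts: 3 ownership sectors x 4 vehicle types.
SECTORS = ["official", "business", "personal"]
VEHICLE_TYPES = ["passenger_car", "van", "truck", "special"]


def vehicle_feature_names():
    """Return one column name per (sector, vehicle type) pair, 12 in total."""
    return [f"cars_{s}_{v}" for s, v in itertools.product(SECTORS, VEHICLE_TYPES)]


def monthly_index(start_year, start_month, end_year, end_month):
    """List (year, month) pairs from the start month through the end month, inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def random_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of rows and hold out ceil(test_ratio * n) of them for testing.

    Assumption: a random (not chronological) split, as the abstract does not specify.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_ratio * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred) ** 2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    months = monthly_index(2015, 1, 2019, 10)
    train, test = random_split(months)
    print(len(months), len(train), len(test))
    print(len(vehicle_feature_names()))
```

For example, `monthly_index(2015, 1, 2019, 10)` returns 58 `(year, month)` pairs, and `random_split` on that list returns 40 training and 18 test months. For strongly seasonal time series, a chronological split (train on earlier months, test on later ones) is a common alternative, at the cost of an even smaller training window.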
Research results
Each model was fitted in order to select the best model for predicting PM2.5 in Seoul and to measure variable importance. The results show that Random Forest Regression is the best model for predicting PM2.5, and that the variables ‘Yangzhou city’, ‘ratio of west wind’ and ‘month’ have the highest priority in the prediction. However, this study has the following limitations. The sample is small because the Chinese government provides PM2.5 data for some cities only from January 2015. The 58 available samples are not enough to train machine learning models properly, which causes insufficient training and a high RMSE for every model, even after hyperparameter tuning. In follow-up research, changing the train-test split ratio and adding new variables to the model are expected to ease the sample-size problem and improve model performance.
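The abstract does not state how variable importance was measured. scikit-learn's `RandomForestRegressor` reports impurity-based importances through `feature_importances_`, and permutation importance (`sklearn.inspection.permutation_importance`) is a model-agnostic alternative. The minimal sketch below implements permutation importance in plain Python, assuming the fitted model is available as a hypothetical `predict` callable that maps a list of feature rows to a list of predictions; each feature's score is the mean increase in RMSE when that feature's column is shuffled.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, feature_names, n_repeats=10, seed=0):
    """Mean increase in RMSE when one feature column is shuffled, sorted high to low.

    predict: hypothetical callable mapping a list of rows to a list of predictions.
    X: list of rows (lists of floats); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    scores = {}
    for j, name in enumerate(feature_names):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        scores[name] = sum(increases) / n_repeats
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Synthetic toy data (not the study's data): the target depends only on x_signal.
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * row[0] for row in X]

    def toy_predict(rows):
        return [2.0 * row[0] for row in rows]

    print(permutation_importance(toy_predict, X, y, ["x_signal", "x_noise"]))
```

In the toy run, shuffling `x_noise` leaves the predictions unchanged, so its score is exactly 0, while `x_signal` ranks first. With only 18 test months, scores on real data will be noisy, which is why the sketch averages over several shuffles.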