Please use this identifier to cite or link to this item: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4699
Title: A Model for the Estimation of Land Prices in Colombo District using Web Scraped Data
Authors: Naotunna, R.A.G.
Issue Date: 22-Jun-2023
Abstract: Sri Lankans have shown keen interest in real estate investment, especially in the Colombo district, because land does not depreciate over time like most tangible assets and because these investments represent a significant outflow of money from an investor's overall wealth. However, land in Sri Lanka is currently valued according to the experience and judgment of individual valuation officers, which can be highly subjective and questionable because the way features are analyzed and a value is assigned varies from person to person. To address this issue, this research develops a machine learning model to estimate land prices in the Colombo district using web-scraped data. To this end, web advertisements for land for sale in the Colombo district, posted on ikman.lk over a three-month period, were scraped to obtain land-related data. These data were combined with land price determinant data obtained from other web sources to form a dataset of 3725 records covering 43 land price determinants. In building the dataset, data were also collected on the sub-categorical levels of each price determinant, as this could add value and make the dataset more meaningful. The dataset was used to fit five machine learning algorithms: Multiple Linear Regression, Random Forests Regression, Support Vector Regression, Extra Trees Regression and Extreme Gradient Boosting. The performance of each model was improved incrementally through feature reduction and hyper-parameter optimization. For feature reduction, two approaches were evaluated, a wrapper method (Recursive Feature Elimination) and a filter method (SelectKBest), and the approach that gave the best results was selected.
Of the five machine learning algorithms used, the hyper-parameter-optimized Random Forests regression model outperformed the other linear, nonlinear, tree-based and ensemble models. The model performed exceptionally well on unseen data, with an R² of 90.24% and MAPE, MAE and RMSE values of 17.88%, 0.098065 and 0.313154, respectively.
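For readers unfamiliar with the four evaluation metrics reported above, the following is a minimal sketch of how R², MAPE, MAE and RMSE are conventionally computed from paired true and predicted values. It is not the author's code: the function name `regression_metrics` and the toy numbers are hypothetical and chosen only for illustration, and the sketch makes no assumption about the scale or transformation of the prices used in the thesis.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAPE (%), MAE and RMSE for paired observations.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")

    return {
        "r2": 1 - ss_res / ss_tot,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Toy values (invented for illustration, not data from the thesis).
toy_true = [10.0, 20.0, 30.0, 40.0]
toy_pred = [12.0, 18.0, 33.0, 39.0]
toy_metrics = regression_metrics(toy_true, toy_pred)
print(toy_metrics)
```

R² and MAPE are scale-free, while MAE and RMSE are expressed in the units of the target variable, so the latter two can only be interpreted alongside the scale on which the model was trained.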
URI: https://dl.ucsc.cmb.ac.lk/jspui/handle/123456789/4699
Appears in Collections: 2022

Files in This Item:
File: 2019 BA 020.pdf
Size: 1.57 MB
Format: Adobe PDF


Items in UCSC Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.