Study Area
This study focuses on the city of Edmonton in the Canadian province of Alberta. Edmonton is the second-largest city in Alberta and the fifth-largest city in Canada, with a population of over a million. The city is known for its diversified economy, which includes the agriculture, technology, and oil and gas sectors, and for an active real estate market with a wide range of residential and commercial properties. The study covers all Edmonton residential properties for which the following variables are available: garage availability, neighborhood, location, assessment class, distance to the city center, distance to green spaces, and distance to the LRT (Light Rail Transit) station. The objective is to build a model that accurately predicts the assessed value of residential properties in Edmonton from these characteristics.
Data Collection
The data were obtained from Edmonton's official website for the current calendar year and comprise more than 4.5 million (45 lakh) observations and 21 variables. The data were cleaned, processed, and scaled before being used for model building.
Data Cleaning
After the data were gathered, the main task was to clean them for analysis. In this step, correlation analysis was used to identify the variables most useful for the analysis, and the less important variables, such as suite, house number, assessment percentage, and point location, were removed from the dataset. Missing values were retained for some variables because they could be useful for prediction.
Additional variables expected to be useful for prediction were then derived and added to the dataset: distance from the city center, distance from green space, and distance from public transport (LRT station), each computed as a Euclidean distance.
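The derived distance variables described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the study's own code (the analysis was carried out in RStudio). The coordinates, the names `CITY_CENTER`, `LRT_STATIONS`, `GREEN_SPACES`, and `distance_features`, and the choice of taking the nearest station or green space are all assumptions made for the example. It also assumes coordinates have already been projected to planar units such as metres, since a Euclidean distance on raw latitude and longitude is only a rough approximation.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points in the same planar units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest_distance(point, candidates):
    """Distance from `point` to the closest of several candidate locations."""
    return min(euclidean_distance(point, c) for c in candidates)


# Hypothetical projected coordinates in metres (illustrative, not study data).
CITY_CENTER = (0.0, 0.0)
LRT_STATIONS = [(1200.0, -500.0), (-3000.0, 4000.0)]
GREEN_SPACES = [(300.0, 400.0), (2500.0, 2500.0)]


def distance_features(property_xy):
    """Build the three distance variables for one property."""
    return {
        "dist_center": euclidean_distance(property_xy, CITY_CENTER),
        "dist_green": nearest_distance(property_xy, GREEN_SPACES),
        "dist_lrt": nearest_distance(property_xy, LRT_STATIONS),
    }


features = distance_features((600.0, 800.0))
print(features)
```

In practice the same function would be applied to every property row, and the resulting columns appended to the dataset before scaling.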
Methods Utilized
1) Multiple Linear Regression: Multiple linear regression (MLR), also referred to as multiple regression, is a statistical technique that combines several explanatory variables to predict a response variable. Its aim is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
The main benefit of this algorithm was that it allowed a set of independent variables that may be related to property values to be brought together in a single model for Edmonton property assessment. These variables included the property's neighborhood or zone, among others.
2) Ridge Regression: Ridge regression has several potential advantages for Edmonton property value assessment. For instance, it helps identify variables that may be influencing changes in property values over time while accounting for the effects of multicollinearity. Ridge regression can also be used to evaluate the influence of policies or interventions on property values while adjusting for other variables that may be correlated with the policy or intervention.
3) LASSO: The goal of this technique was to minimize the sum of squared errors between the predicted and actual values of the dependent variable while adding a penalty term proportional to the absolute values of the regression coefficients. This penalty shrinks the coefficients of the less significant variables and can set some of them exactly to zero, effectively removing those variables from the model.
4) Neural Networks: Neural networks were used to predict property values from a set of independent variables and to model the factors that affect property values in the context of Edmonton property assessment.
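To make the difference between the least-squares, ridge, and LASSO fits described above concrete, the sketch below works through the simplest possible case: a single centered predictor and no intercept, where all three coefficients have a closed form. It is an illustration in standard-library Python under those assumptions, not the models fitted in the study. The function names and the toy data are hypothetical, and the penalties are written as `lam * b**2` for ridge and `lam * abs(b)` for LASSO, each added to the sum of squared errors.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ols_coef(x, y):
    """Minimizer of sum((y - b*x)^2)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


def ridge_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * |b|."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam / 2.0) / sxx


# Toy centered data (illustrative only): y is exactly 2 * x.
x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]

for lam in (0.0, 2.0, 8.0):
    print(lam, ols_coef(x, y), ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

The ridge coefficient shrinks smoothly as `lam` grows but never reaches zero, whereas the LASSO coefficient becomes exactly zero once `lam / 2` reaches the size of the predictor's cross-product with the response. With many predictors the same mechanism is what removes the less significant variables from a LASSO model.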
Statistical Analysis
RStudio was utilized for data cleaning, data processing and statistical analysis.
During exploratory data analysis (EDA), correlation analysis was performed using the correlation coefficient to describe the relationships between the Assessed Value and the other variables. The data were also scaled using the Standard Scaler algorithm, which puts the variables on a comparable scale (zero mean and unit variance). All four methods were then applied to the scaled data, and the analysis revealed several points of interest.
The most important variables were selected, scaled, and used for model building.
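The two preprocessing steps described above, standard scaling and correlation analysis, can be sketched as follows. This is a minimal standard-library Python illustration rather than the study's RStudio code. The variable names and sample values are hypothetical, the correlation coefficient is assumed to be Pearson's, and the scaler is assumed to divide by the population standard deviation.

```python
import math
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample rows (illustrative only, not values from the dataset).
assessed_value = [250000.0, 310000.0, 420000.0, 380000.0]
dist_center = [12.0, 9.5, 4.0, 6.0]

r_raw = pearson(assessed_value, dist_center)
r_scaled = pearson(standardize(assessed_value), standardize(dist_center))
print(r_raw, r_scaled)
```

The correlation is the same before and after scaling, because standardization changes only the units of each variable and not its linear relationship with the others. Scaling matters mainly for the penalized and neural network models, where coefficients or weights on differently scaled inputs would otherwise not be comparable.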
Disclaimer: The study design, data analysis, and subsequent results, discussion, and conclusions presented on this website were created as part of a project for the University of Alberta's Ren R 690 course. All data and findings are regarded as preliminary and should not be interpreted outside the parameters of the assignment.