Analysis of robust measures in Random Forest Regression (RFR) is an extensive empirical study of a new method, Robust Random Forest Regression (RRFR). The application and analysis of this tree-based method have not yet been addressed and may provide additional insight into modeling complex data. Our approach is based on RFR, with two major differences: the introduction of a robust prediction and a robust error statistic. The current methodology uses the node mean for prediction and the mean squared error (MSE) to derive the in-node and overall error. Herein, we introduce and assess the use of the node median for prediction and the mean absolute deviation (MAD) to derive the in-node and overall error. Extensive research has shown that the median is a better estimate of the center of a distribution in the presence of large or unbounded outliers, because the median depends only on the ordered, central value(s) of the data and is therefore largely unaffected by such outliers. We have shown that RRFR performs well under extreme conditions, such as datasets that include unbounded outliers or heteroscedasticity.
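To illustrate the contrast between the two pairs of node statistics, the following is a minimal sketch and not the implementation used in the paper. The function names and the sample node values are hypothetical, and MAD is assumed here to mean the mean absolute deviation taken about the node median.

```python
from statistics import mean, median


def node_mean_mse(y):
    """Standard RFR-style node statistics: mean prediction and MSE."""
    pred = mean(y)
    err = mean((v - pred) ** 2 for v in y)
    return pred, err


def node_median_mad(y):
    """Robust node statistics: median prediction and mean absolute
    deviation, taken about the median (an assumption of this sketch)."""
    pred = median(y)
    err = mean(abs(v - pred) for v in y)
    return pred, err


# Hypothetical responses falling in one terminal node, with one large outlier.
node = [1.0, 2.0, 3.0, 4.0, 1000.0]

mean_pred, mse = node_mean_mse(node)
median_pred, mad = node_median_mad(node)

print(f"mean prediction   = {mean_pred}, MSE = {mse}")
print(f"median prediction = {median_pred}, MAD = {mad}")
```

In this toy node the single outlier pulls the mean far from the bulk of the observations, while the median stays at the central ordered value. The squared error also grows with the square of the outlier's distance, whereas the absolute deviation grows only linearly.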
| File | Size | Date | Attached by |
|---|---|---|---|
| RandomForrestRegression.pdf | 378.17 kB | 03:32, 11 Jun 2008 | Admin |