===== Random Forest Visualization =====
==== Facts ====
{{ :assignment:natalia.png?300|}}
^ Type           | master project|
^ Place          | internal      |
^ Supervisors    | Michel Westenberg |
^ Student        | Natalia Kuznetsova |
^ Thesis         | [[http://alexandria.tue.nl/extra1/afstversl/wsk-i/Kuznetsova_2014.pdf|download]]|
^ start/end date |  - 9/8/2014   |


==== Abstract ====
Classification is the process of assigning a class label to an observation based on its
properties or attributes. A classification algorithm is applied to a data set, producing a
model. By studying the model, insights into the structure of the data set can be gained. The
insights that a model can offer depend on the type of model. In this work, a Random Forest
model is used for the analysis of data, and the model itself is explored by means
of visualization. The results include this report and the prototype of a visual
analysis tool.

The tool, named ReFINE for Random Forest INspEctor, consists of several visualizations of a Random Forest model. ReFINE visualizes the components of a Random Forest, its trees, as well as its special features: the proximity measure, variable importance, interactions, and prototypes. Each of these aspects is presented with a different visualization technique. All visualizations are linked to show the connections between them and to allow a user to discover patterns in data sets. The effectiveness of the approach is validated with various data sets, including both generated and real data.

As a result, ReFINE allows users to investigate a data set: its most important variables, their
split points, the connections between instances, and their distribution.
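
The proximity measure mentioned above is commonly defined as the fraction of trees in which two instances end up in the same leaf. The following is a minimal sketch of that definition, not the implementation used in ReFINE; the function name ''proximity_matrix'' and the leaf table are hypothetical and chosen only for illustration.

```python
def proximity_matrix(leaves):
    """Return pairwise proximities for a table of leaf assignments.

    leaves[i][t] is the index of the leaf that instance i reaches in tree t.
    The proximity of two instances is the fraction of trees in which
    both instances reach the same leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 instances, 3 trees (made-up values).
leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [4, 5, 6],
]
prox = proximity_matrix(leaves)
```

In this example, instances 0 and 1 share a leaf in two of the three trees, so their proximity is 2/3, while instance 3 shares no leaf with any other instance. With scikit-learn, a leaf table of this shape could be obtained from ''RandomForestClassifier.apply''; whether the thesis uses that library is not stated here.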