We pushed through because it seemed likely that the machine learning community would find these visualizations as useful as we do. This article demonstrates the results of this work, details the specific choices we made for the visualizations, and outlines the tools and techniques used in the implementation. In Edward Tufte's seminar I learned that you can pack a lot of information into a rich diagram, as long as it's not an arbitrary mishmash; the human eye can resolve lots of details. That is definitely the case with this decision tree visualization software.

Any path from the root of the decision tree to a specific leaf predictor passes through a series of (internal) decision nodes. Construction stops when some stopping criterion is reached, such as having fewer than five observations in the node. Each leaf in the decision tree is responsible for making a specific prediction.

Here's a sample visualization for a tiny decision tree (click to enlarge):

    clf = tree.DecisionTreeClassifier(max_depth=3)
    clf.fit(wine.data, wine.target)
    viz = dtreeviz(clf, wine.data, wine.target, target_name='wine', ...)
    viz.view()  # pop up window to display image

Note that the height in the Y axis of the stacked histogram is the total number of samples from all classes; multiple class counts are stacked on top of each other. For example, combining the histograms of nodes 9 and 12 yields the histogram of node 8.

Scikit uses the same visualization approach for decision tree regressors. As before, our decision nodes show the feature space distribution, this time using a feature versus target value scatterplot. The regressor examples use the Boston housing data (boston = load_boston(); X_train = boston.data). As you can see, each AGE feature axis uses the same range, rather than zooming in, to make it easier to compare decision nodes. The box plot also doesn't show the distribution of target values nearly as well as a strip plot.

It is also uncommon for libraries to support visualizing a specific feature vector as it weaves down through a tree's decision nodes; we could only find one image showing this. But those bar charts are hard to interpret because they have no horizontal axis.

[Image: SAS visualization (best image quality we could find with numeric features).]
[Image: A visualization of just the path from the root to a decision tree leaf.]

We used color to highlight an important dimension (target category) because humans quickly and easily pick out color differences. To highlight the decision-making process, we have to highlight the comparison operation. Among the key design elements:

- The decision nodes show how the feature space is split.
- The split points for decision nodes are shown visually (as a wedge) in the distribution.
- The leaf size is proportional to the number of samples in that leaf.
- All colors were handpicked from colorblind-safe palettes, one handpicked palette per number of target categories (2 through 10).
- We use a gray rather than black for text because it's easier on the eyes.
- We draw outlines of bars in bar charts and slices in pie charts.

Because we wanted scalable, vector graphics, we tried importing SVG images initially, but we could not get graphviz to accept those files (nor PDF). Unfortunately, that meant we had to specify the actual size we wanted for the overall tree using an HTML table in graphviz, with width and height parameters on the table's tags. We added a topng() method so users of Jupyter notebooks can use Image(viz.topng()) to get inline images. To produce the final image, we run the dot command with arguments such as:

    cmd = ["dot", "-Tpng", "-o", filename, dotfilename]
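As a rough sketch of that step, here's how such a command can be executed from Python with the standard subprocess module; the helper name and error handling below are our illustration, not necessarily the library's actual code:

    import subprocess

    def render_dot_to_png(dotfilename, filename):
        # Convert a graphviz .dot file into a PNG image by shelling out to dot.
        # A minimal sketch; dtreeviz's real invocation may pass different flags.
        cmd = ["dot", "-Tpng", "-o", filename, dotfilename]
        # check=True raises CalledProcessError if dot exits with a nonzero status
        subprocess.run(cmd, capture_output=True, check=True)
        return filename

    render_dot_to_png("boston.dot", "boston.png")  # hypothetical filenames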
We use matplotlib to generate the decision and leaf nodes and, to get the images into a graphviz/dot image, we use HTML graphviz labels and then reference the generated images via img tags. Generated filenames include a number such as 94806, which is the process ID; it helps isolate multiple instances of dtreeviz running on the same machine. The generated dot file also contains edge definitions such as:

    LSTAT3 -> leaf4 [penwidth=0.3 color="#444443" label=<>]
    LSTAT3 -> leaf5 [penwidth=0.3 color="#444443" label=<>]

Then we noticed that graphviz does not properly handle text in HTML labels when generating SVG. Occasionally, we used slightly different parameters on the dot command, so we just directly call run() for flexibility. We also use the run() function to execute the pdf2svg (PDF to SVG conversion) tool, as described in the next section. On the user-facing side, the regressor call passes feature_names=boston.feature_names, and the image is written with:

    viz.save("boston.svg")  # suffix determines the generated image format

We are definitely not visualization aficionados, but for this specific problem we banged on it until we got effective diagrams. (You might know Terence as the creator of the ANTLR parser generator.) All visual elements had to be motivated. In this section, we collect the various decision tree visualizations we could find and compare them to the visualizations made by our dtreeviz library. For example, we couldn't find a library that visualizes how decision nodes split up the feature space.

Categorical variables must be one-hot encoded, binned, label encoded, etc. To train a decision node, the model examines a subset of the training observations (or the full training set at the root). Training of a decision node chooses feature xi and a split value within xi's range of values (feature space) to group samples with similar target values into two buckets. For regression, similarity in a leaf means a low variance among target values and, for classification, it means that most or all targets are of a single class.

To highlight how decision nodes carve up the feature space, we trained a regressor and classifier with a single (AGE) feature (code to generate images). The prediction leaves are not very pure because training a model on just a single variable leads to a poor model, but this restricted example demonstrates how decision trees carve up feature space. As we descend through decision nodes, the sample AGE values are boxed into narrower and narrower regions. As with the regressor, the feature space of a left child is everything to the left of the parent's split point in the same feature space; similarly for the right child. We force the horizontal axis range to be the same for all PEG decision nodes so that decision nodes lower in the tree clearly box in narrower regions that are more and more pure. We use a pie chart for classifier leaves, despite their bad reputation. We scrunched the leaf target values into a strip plot, and we set a low alpha for all scatterplot dots so that increased target value density corresponds to darker color.

Here are two sample trees showing test vectors (click on images to expand). The test vector x, with feature names and values, appears below the leaf predictor node (or to the right in left-to-right orientation). For decision nodes along the path to the leaf predictor node, we show an orange wedge at position xi in the horizontal feature space.
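The routing that these wedges depict is just a chain of threshold comparisons, one per decision node. Here is a conceptual sketch using our own toy node structure (not the library's internal representation) to show how a test vector is walked down to a leaf:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        # Internal decision node when feature/split are set; leaf when prediction is set.
        feature: Optional[int] = None        # index i of feature xi tested at this node
        split: Optional[float] = None        # split value chosen during training
        left: Optional["Node"] = None        # subtree for samples with x[feature] < split
        right: Optional["Node"] = None       # subtree for samples with x[feature] >= split
        prediction: Optional[float] = None   # leaf prediction (mean target or majority class)

    def predict_path(root, x):
        """Return the list of nodes visited by test vector x, ending at a leaf."""
        path, node = [], root
        while node.prediction is None:       # descend until we reach a leaf
            path.append(node)
            node = node.left if x[node.feature] < node.split else node.right
        path.append(node)
        return path

This walk yields exactly the highlighted path from the root to the leaf predictor node.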
This makes the comparison easy to see: if the orange wedge (the test vector's xi) is to the left of the black wedge (the split point), go left; otherwise go right.

Here's a classifier tree trained on the USER KNOWLEDGE data, again with a single feature (PEG) and with nodes labeled for discussion purposes. Ignoring color, the histogram shows the PEG feature space distribution.

On the implementation side, the library builds a ShadowDecTree from the scikit-learn model, the training data, and the feature and class names:

    shadow_tree = ShadowDecTree(tree_model, X_train, y_train, feature_names, class_names)

Let's start with the default scikit visualization of a decision tree on the well-known Iris data set (click on images to enlarge).
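For readers who want to reproduce that baseline themselves, here is a minimal sketch using scikit-learn's standard export_graphviz path; the parameters (such as max_depth and the output filename) are illustrative, not taken from the article's code:

    from sklearn import tree
    from sklearn.datasets import load_iris
    import graphviz

    iris = load_iris()
    clf = tree.DecisionTreeClassifier(max_depth=3)   # small depth just to keep the tree tiny
    clf.fit(iris.data, iris.target)

    # export_graphviz emits dot source; graphviz.Source renders it to an image file
    dot_source = tree.export_graphviz(clf, out_file=None,
                                      feature_names=iris.feature_names,
                                      class_names=iris.target_names,
                                      filled=True)
    graphviz.Source(dot_source).render("iris_default", format="png")

Note that dtreeviz also drives graphviz/dot under the hood, as described above, but fills each node with a matplotlib-generated plot rather than plain text.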