Title: | Tree Diagram |
---|---|
Description: | Visualizing cuts for either axis-aligned or non-axis-aligned tree methods (e.g. decision tree, random tessellation process). |
Authors: | Wen Tian (Wendy) Wang, Lloyd Elliott |
Maintainer: | Wen Tian (Wendy) Wang <[email protected]> |
License: | BSD_2_clause + file LICENCE |
Version: | 0.1.1 |
Built: | 2024-11-07 04:42:06 UTC |
Source: | https://github.com/teemteemwang/treediagram |
A dataset of 9 quantitative predictors and a binary variable indicating the presence or absence of breast cancer.
data(cancer)
A data frame with 116 rows and 10 variables:
Age (years)
BMI (kg/m2)
Glucose (mg/dL)
Insulin (µU/mL)
Optimal cut-off value for the homeostasis model assessment (HOMA) index of insulin
Leptin (ng/mL)
Adiponectin (µg/mL)
Resistin (ng/mL)
MCP-1 (pg/dL)
Indicates the presence or absence of breast cancer. 1 = Healthy controls; 2 = Patients
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra#
Patrício, M., Pereira, J., Crisóstomo, J., Matafome, P., Gomes, M., Seiça, R., & Caramelo, F. (2018). Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer, 18(1).
treeDiagram
generates a tree diagram for any tree-based method and saves it automatically to the user's current working directory. The classifying tree acts on the points provided in the 'data' argument. The tree is specified by 'treedat' (for example, a decision tree, or a character string in the extended Newick format described below). Both arguments must be provided. The output tree diagram also shows tree information (e.g. node number, variable and value used at a split, minimum and maximum values of the split variable) for the first three nodes.
treeDiagram(data,treedat,cat_var,filename,pic_height=10,pic_width=10)
data |
A non-empty data frame. Variable names should not contain special characters such as ">", "<", or "=". |
treedat |
A character string (in the extended Newick format described below) or the first object returned by tree() |
cat_var |
A character string indicating the name of the predictor (classification) variable used in the tree method. |
filename |
A character string giving the name of the output plot. |
pic_height |
A positive real number. This argument is optional; the default is 10. |
pic_width |
A positive real number. This argument is optional; the default is 10. |
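The constraints on 'data' and 'cat_var' stated on this page can be checked before calling the function. The following is a minimal sketch in base R; `check_tree_inputs` is a hypothetical helper written for illustration and is not part of the package, and the toy data frames are made up.

```r
# Hypothetical helper (not part of the package): returns a character
# vector describing every constraint that the inputs violate.
check_tree_inputs <- function(data, cat_var) {
  if (!is.data.frame(data) || nrow(data) == 0 || ncol(data) == 0) {
    return("data must be a non-empty data frame")
  }
  problems <- character(0)
  # Variable names must not contain ">", "<" or "="
  bad <- grepl("[<>=]", names(data))
  if (any(bad)) {
    problems <- c(problems, paste("special characters in:",
                                  paste(names(data)[bad], collapse = ", ")))
  }
  # The classification variable must exist and be a binary factor
  if (!cat_var %in% names(data)) {
    problems <- c(problems, paste("column not found:", cat_var))
  } else if (!is.factor(data[[cat_var]]) || nlevels(data[[cat_var]]) != 2) {
    problems <- c(problems, paste(cat_var, "must be a factor with two levels"))
  }
  problems
}

# Made-up data that satisfies the constraints
toy <- data.frame(Age = c(48, 83, 82, 68),
                  Classification = factor(c(1, 1, 2, 2)))
check_tree_inputs(toy, "Classification")

# Made-up data that violates two of them
bad <- data.frame(x = 1:2, y = 1:2)
names(bad) <- c("Age>50", "Classification")
check_tree_inputs(bad, "Classification")
```

An empty result means no problem was found; each element otherwise names one violated constraint.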
treeDiagram
visualizes hierarchical clustering by projecting each tree split onto a one-dimensional density plot of the partitioned data. These projections are then arranged through rotation and translation to indicate the topology of the tree, forming a single plot with a fractal-like organisation. The tree diagram allows users to assess the quality of each cut in terms of linear separability, the depth and balance of the tree, and the distribution and classification of the partitioned data in each cut.
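The idea of projecting a single split onto a one-dimensional density plot can be illustrated in base R. This is only a sketch of the concept on synthetic data, not the package's own drawing code; the data frame `toy` and the cut value are assumptions chosen for illustration.

```r
set.seed(1)
# Synthetic stand-in data (not the cancer dataset)
toy <- data.frame(
  Glucose = c(rnorm(50, mean = 85, sd = 8), rnorm(50, mean = 105, sd = 12)),
  Classification = factor(rep(c(1, 2), each = 50))
)
cut_value <- 91.5  # illustrative split value

# Partition the points at the split and tabulate classes on each side
side <- ifelse(toy$Glucose < cut_value, "left", "right")
counts <- table(side, toy$Classification)

# One kernel density estimate of the split variable per class
dens <- lapply(split(toy$Glucose, toy$Classification), density)

# Draw both densities with the cut marked (only in an interactive session)
if (interactive()) {
  plot(dens[[1]], main = "Glucose split", xlab = "Glucose",
       xlim = range(toy$Glucose))
  lines(dens[[2]], lty = 2)
  abline(v = cut_value)
}
```

A cut separates the classes well when the two densities overlap little around the vertical line.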
The predictor variable must be a binary factor variable.
The extended Newick format is revised to follow these rules:
If a parent node has two child nodes, use the form "variable = value".
If a parent node has only one child node, use the form "var < value" if the child node is on the left; otherwise use "var > value" for a child node on the right.
The parent node is always at the end of the brackets.
A tip node is written in the form "var = value".
The tree is read from left to right.
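To make the rules concrete, the sketch below takes the Newick string from the Examples section of this page and splits it into its "variable operator value" labels using base R only. `split_labels` is a hypothetical helper for illustration; it does not reproduce how the package parses the string.

```r
# String taken from the Examples section of this page
newick <- paste0(
  "(((BMI=25.745,BMI=29.722)Resistin=13.248)Age>44.5,",
  "(((((Age=70)Adiponectin<9.3482)BMI<32.275)Glucose<",
  "111)Leptin>7.93315)Age>48.5)Glucose=91.5;")

# Hypothetical helper: extract every label of the form
# variable, one of "<", ">", "=", then a numeric value.
split_labels <- function(x) {
  pattern <- "[A-Za-z_.][A-Za-z0-9_.]*[<>=][0-9.]+"
  labels <- regmatches(x, gregexpr(pattern, x))[[1]]
  data.frame(
    variable = sub("[<>=].*$", "", labels),
    operator = sub("^[^<>=]*([<>=]).*$", "\\1", labels),
    value    = as.numeric(sub("^.*[<>=]", "", labels)),
    stringsAsFactors = FALSE
  )
}

nodes <- split_labels(newick)

# Basic well-formedness checks: balanced brackets and a closing ";"
chars <- strsplit(newick, "")[[1]]
balanced <- sum(chars == "(") == sum(chars == ")")
terminated <- grepl(";$", newick)

# The parent is written at the end of its brackets, so the last label
# in the string is the root split.
root <- nodes[nrow(nodes), ]
```

Labels with "=" are either tips or parents with two children, while "<" and ">" mark parents with a single child on the left or right.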
The required packages, including "ggplot2", "ape", "cowplot", "tree", and "stringr", will be installed automatically if the user has not yet installed them.
Wen Tian (Wendy) Wang ([email protected]), Lloyd T. Elliott
## Not run:
# read breast cancer data from UCI database website
cancer <- read.csv(
  url("https://archive.ics.uci.edu/ml/machine-learning-databases/00451/dataR2.csv"))
# make sure predictor variable is factored
cancer$Classification <- factor(cancer$Classification)
summary(cancer) # optional step giving an overview of the dataset

# set working directory (this step is optional)
# setwd("~/user location")

# e.g.1 : draw tree diagram according to the first object returned by 'tree()'
# create decision tree
library(tree)
t_cancer <- tree(Classification ~ ., data=cancer)
# plot tree diagram and save to your working directory
treeDiagram(cancer,t_cancer[[1]],"Classification","tree diagram for tree()")

# e.g.2 : draw tree diagram giving a newick format file of a tree
# newick format string of a decision tree for breast cancer
breast_cancer <- paste0(
  "(((BMI=25.745,BMI=29.722)Resistin=13.248)Age>44.5,",
  "(((((Age=70)Adiponectin<9.3482)BMI<32.275)Glucose<",
  "111)Leptin>7.93315)Age>48.5)Glucose=91.5;")
# plot tree diagram and save to your working directory
treeDiagram(cancer,breast_cancer,"Classification","tree diagram for newicks format file")
## End(Not run)