Difference between revisions of "Orange: Box Plot"

From OnnoWiki

Revision as of 10:25, 22 January 2020

Source: https://docs.biolab.si//3/visual-programming/widgets/visualize/boxplot.html

Shows distribution of attribute values.

Inputs

   Data: input dataset

Outputs

   Selected Data: instances selected from the plot
   Data: data with an additional column showing whether a point is selected

The Box Plot widget shows the distributions of attribute values. It is good practice to check any new data with this widget to quickly discover anomalies such as duplicated values (e.g. gray and grey), outliers, and the like.
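The kind of duplicated value mentioned above (gray versus grey) can also be checked for in a script. The following is only a rough sketch and not part of the widget: the helper name `near_duplicate_labels`, the similarity threshold of 0.7 and the sample colours are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_labels(labels, threshold=0.7):
    """Return pairs of distinct labels whose spelling is suspiciously similar.

    Hypothetical helper: the threshold is an arbitrary illustrative choice.
    """
    distinct = sorted(set(labels))
    return [
        (a, b)
        for a, b in combinations(distinct, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


# Made-up categorical column with one spelling variant.
colours = ["gray", "grey", "blue", "gray", "blue"]
pairs = near_duplicate_labels(colours)
print(pairs)
```

Pairs reported this way are only candidates; whether two labels really denote the same value still has to be decided by looking at the data.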

BoxPlot-Continuous-stamped.png
   Select the variable you want to plot. Tick Order by relevance to order variables by Chi2 or ANOVA over the selected subgroup.
   Choose Subgroups to see box plots displayed by a discrete subgroup.
   When instances are grouped by a subgroup, you can change the display mode. Annotated boxes will display the end values, the mean and the median, while Compare medians and Compare means will compare the selected statistic between subgroups.
BoxPlot-Continuous-small.png
   The mean (the dark blue vertical line). The thin blue line represents the standard deviation.
   Values of the first (25%) and the third (75%) quartile. The blue highlighted area represents the values between the first and the third quartile.
   The median (yellow vertical line).
   If Send automatically is ticked, changes are communicated automatically. Alternatively, press Send.
   Access help, save image or produce a report.
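
The statistics annotated in the plot (mean, standard deviation, quartiles and median) can be reproduced outside the widget. The sketch below uses only the Python standard library; the function name `box_plot_summary` and the sample values are made up, and the quartile interpolation method is an assumption that may differ slightly from the one Orange uses.

```python
import statistics


def box_plot_summary(values):
    """Return the summary statistics a box plot typically annotates.

    Hypothetical helper for illustration, not Orange's own code.
    """
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Made-up numeric column.
ages = [29, 34, 41, 45, 52, 58, 63, 67, 77]
summary = box_plot_summary(ages)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The range between `q1` and `q3` corresponds to the highlighted box, and `min` and `max` correspond to the end values shown by annotated boxes.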

For discrete attributes, the bars represent the number of instances with each particular attribute value. The plot shows the number of different animal types in the Zoo dataset: there are 41 mammals, 13 fish, 20 birds and so on.
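
For a discrete attribute, the bar lengths are simply value counts. As a minimal sketch with made-up labels (not the actual Zoo data), the counts could be computed like this; `value_counts` is a hypothetical helper name.

```python
from collections import Counter


def value_counts(values):
    """Count instances per attribute value, most frequent first."""
    return Counter(values).most_common()


# Made-up sample of animal types; the real Zoo dataset is much larger.
animal_types = ["mammal", "bird", "mammal", "fish", "bird", "mammal"]
counts = value_counts(animal_types)
for value, count in counts:
    print(f"{value}: {count}")
```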

BoxPlot-Discrete.png

Example

The Box Plot widget is most commonly used immediately after the File widget to observe the statistical properties of a dataset. In the first example, we have used heart-disease data to inspect our variables.

BoxPlot-Example1.png

Box Plot is also useful for finding the properties of a specific subset of data, for instance a set of instances manually selected in another widget (e.g. Scatter Plot), or instances belonging to some cluster or a classification tree node. Let us now use zoo data and create a typical clustering workflow with Distances and Hierarchical Clustering.

Now define the threshold for cluster selection (click on the ruler at the top). Connect Box Plot to Hierarchical Clustering, tick Order by relevance and select Cluster as a subgroup. This will order attributes by how well they define the selected subgroup, in our case a cluster. It seems that our clusters indeed correspond very well with the animal type!

BoxPlot-Example2.png
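
The idea behind ordering by relevance can be illustrated for numeric attributes with a one-way ANOVA F statistic: attributes whose values differ most between subgroups score highest. This is a sketch under stated assumptions, not the widget's implementation. The names `anova_f` and `order_by_relevance` and the toy data are hypothetical, and the Chi2 scoring mentioned for the widget is not covered here.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def order_by_relevance(columns, labels):
    """Sort attribute names by decreasing F statistic over the subgroups."""
    def split(values):
        by_label = {}
        for value, label in zip(values, labels):
            by_label.setdefault(label, []).append(value)
        return list(by_label.values())

    scores = {name: anova_f(split(values)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: attribute "a" separates the two clusters, "b" barely does.
labels = ["C1", "C1", "C1", "C2", "C2", "C2"]
columns = {
    "b": [1.0, 5.0, 9.0, 2.0, 6.0, 10.0],
    "a": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
}
ranked = order_by_relevance(columns, labels)
print(ranked)
```

Note that the sketch divides by the within-group variation, so it assumes at least one subgroup has values that are not all identical.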


References

Interesting Links