Data visualization

keywords: ALFA, visualization

OUTLINE

  1. Line plot
  2. Pie chart
  3. Histogram
  4. Bar plot
  5. Box plot
  6. Violin plot
  7. Scatter plot
  8. Multi-scatter plot
  9. Relationship heatmap
  10. Correlation map

Note: All figures are editable. Once a figure is plotted, a menu appears on the right side of the GUI where you can manually modify the plot parameters.

Command Syntax: You can use any command as long as it contains enough information to unambiguously identify the user's intent. For example, to create a pie chart, any of the following commands will work:

plot pie chart
pie
pie plot
pie graph

We illustrate each of the visualization functions on the following example dataset:

Label Column1 Column2
Group1 250 54
Group1 350 27
Group1 234 25
Group1 223 36
Group1 210 45
Group2 45 220
Group2 35 300
Group2 25 400
Group2 33 620
Group2 27 270

1. Line plot

A line plot plots one variable against another, with consecutive points connected by line segments. The figure below shows a line plot of Column1.

Figure 1

2. Pie chart

A pie chart visualizes the proportions of different quantities. The arc length (or angle) of each slice is directly proportional to the quantity it represents. For a categorical variable, a pie chart can show the number of observations in each category.

For our example dataset, consider Column1. If we divide the data into two groups (values greater than 30 and values less than 30), the pie chart looks as follows:

Figure 2
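The slice sizes in this example can be checked by hand. The following is a minimal Python sketch of that arithmetic, not ALFA's own implementation; the variable names are illustrative only.

```python
# Column1 of the example dataset.
column1 = [250, 350, 234, 223, 210, 45, 35, 25, 33, 27]
threshold = 30  # split point used in the text

# Count the observations that fall in each of the two groups.
counts = {
    "greater than 30": sum(1 for v in column1 if v > threshold),
    "less than 30": sum(1 for v in column1 if v < threshold),
}

# Each slice's angle is proportional to its share of the total.
total = sum(counts.values())
angles = {name: 360 * n / total for name, n in counts.items()}

for name in counts:
    print(f"{name}: {counts[name]} values, {angles[name]:.0f} degrees")
```

Eight of the ten values are greater than 30, so that slice spans 288 degrees and the other slice spans 72 degrees.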

3. Histogram

A histogram is used to visualize the distribution of data points for different variables in the dataset.

The figure below shows the histogram for our dataset with Label as our reference/ground_truth:

Figure 3
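To make the idea of a histogram concrete, here is a small Python sketch that counts Column1 values per bin for each label. The binning (four bins of width 100 covering 0 to 400) is an assumption chosen for illustration; ALFA may choose its bins differently.

```python
def histogram(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Assumes every value lies between low and high.
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == high lands in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Column1 of the example dataset, split by Label.
column1_by_label = {
    "Group1": [250, 350, 234, 223, 210],
    "Group2": [45, 35, 25, 33, 27],
}

# Assumed binning: four bins of width 100 covering 0 to 400.
hist = {
    label: histogram(values, 0, 400, 4)
    for label, values in column1_by_label.items()
}
print(hist)
```

With these bins, all Group2 values fall into the first bin, while Group1 occupies the two highest bins.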

4. Bar plot

A bar plot is used to visualize and compare the distribution of different variables in terms of their mean and standard deviation. A bar plot can also be used to visualize the distribution of the same variable across multiple categories by setting the categorical variable as your reference.

The figure below shows the bar plot for our dataset with Label as our reference/ground_truth:

Figure 4
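The quantities behind such a bar plot are the per-group mean and standard deviation. The sketch below computes them in Python for the example dataset. It assumes the sample standard deviation (`statistics.stdev`); whether ALFA uses the sample or the population formula is not stated here, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Example dataset as (Label, Column1, Column2) rows.
rows = [
    ("Group1", 250, 54), ("Group1", 350, 27), ("Group1", 234, 25),
    ("Group1", 223, 36), ("Group1", 210, 45),
    ("Group2", 45, 220), ("Group2", 35, 300), ("Group2", 25, 400),
    ("Group2", 33, 620), ("Group2", 27, 270),
]

def summarize(rows, column_index):
    """Return {label: (mean, sample standard deviation)} for one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row[column_index])
    return {label: (mean(vals), stdev(vals)) for label, vals in groups.items()}

column1_summary = summarize(rows, 1)  # bar heights and error bars for Column1
column2_summary = summarize(rows, 2)  # bar heights and error bars for Column2

for label, (m, s) in column1_summary.items():
    print(f"Column1, {label}: mean={m:.1f}, std={s:.1f}")
```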

5. Box plot

A box plot is used to visualize and compare the distribution of different variables in terms of their median and the interquartile range.

The figure below shows the box plot for our dataset with Label as our reference/ground_truth:

Figure 5
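The median and interquartile range that a box plot summarizes can be computed as in the following Python sketch. Quartile conventions differ between tools; this example assumes the "inclusive" method of `statistics.quantiles`, which may not match the convention ALFA uses. The helper name `box_stats` is hypothetical.

```python
from statistics import median, quantiles

def box_stats(values):
    """Return the median, quartiles and interquartile range of values."""
    # Assumed convention: inclusive quartiles (data treated as the full population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": q3 - q1}

# Column1 of the example dataset, split by Label.
group1 = box_stats([250, 350, 234, 223, 210])
group2 = box_stats([45, 35, 25, 33, 27])

print("Group1:", group1)
print("Group2:", group2)
```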

6. Violin plot

A violin plot is used to visualize and compare the full shape of the distribution of different variables, rather than only summary statistics.

The figure below shows the violin plot for our dataset with Label as our reference/ground_truth:

Figure 6
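Violin plots are commonly drawn from a kernel density estimate of the data. The Python sketch below shows one such estimate with a Gaussian kernel. The bandwidth of 30 and the evaluation grid are assumptions made for illustration, and this is not necessarily how ALFA computes the shape.

```python
import math

def gaussian_kde(values, x, bandwidth):
    """Estimate the density at x as an average of Gaussian bumps, one per value."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

# Column1 values for Group1 in the example dataset.
group1_column1 = [250, 350, 234, 223, 210]
bandwidth = 30.0  # assumed smoothing width

# Evaluate the density on a coarse grid; the violin's width at each
# position is proportional to this density.
density = {x: gaussian_kde(group1_column1, x, bandwidth) for x in range(150, 451, 50)}

for x, d in density.items():
    print(f"x={x}: density={d:.5f}")
```

The density is highest around 250, where four of the five Group1 values cluster, and much lower near the single value of 350.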

7. Scatter plot

A scatter plot is used to visualize the relationship between two variables. The scatter plot can be further modified by adding a trend line or data distribution plots to the figure. The ground truth can also be visualized by adding a color scale to the points on the scatter plot.

The figure below shows the scatter plot for our dataset with Label as our reference/ground_truth:

Figure 7
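As an illustration of a trend line, the Python sketch below fits a straight line to Column1 and Column2 by ordinary least squares. The choice of a linear least-squares fit is an assumption; the kind of trend line ALFA draws is not specified here. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Example dataset.
column1 = [250, 350, 234, 223, 210, 45, 35, 25, 33, 27]
column2 = [54, 27, 25, 36, 45, 220, 300, 400, 620, 270]

slope, intercept = fit_line(column1, column2)
print(f"Column2 = {slope:.3f} * Column1 + {intercept:.1f}")
```

The slope is negative because rows with large Column1 values (Group1) have small Column2 values, and vice versa.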

8. Multi-scatter plot

A multi-scatter plot is used to visualize the pairwise relationships between multiple variables, with all the features plotted against one another.

The figure below shows the multi-scatter plot for our dataset with Label as our reference/ground_truth:

Figure 8

9. Relationship heatmap

A relationship heatmap is similar to a scatter plot, except that it displays the relationship in the form of a heatmap instead of individual points. It is only used to visualize the relationship between categorical variables.
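One way to picture such a heatmap is as a table of co-occurrence counts between two categorical variables, with each cell's color reflecting its count. The Python sketch below builds such a table. It assumes that cell values are counts, and because the example dataset has only one categorical column, it derives a hypothetical second one by splitting Column1 at 30.

```python
from collections import Counter

# Label and Column1 of the example dataset.
labels = ["Group1"] * 5 + ["Group2"] * 5
column1 = [250, 350, 234, 223, 210, 45, 35, 25, 33, 27]

# Hypothetical second categorical variable derived from Column1
# (no value in the dataset is exactly 30).
size = ["greater than 30" if v > 30 else "less than 30" for v in column1]

# Count how often each pair of categories occurs together.
pair_counts = Counter(zip(labels, size))

row_names = ["Group1", "Group2"]
col_names = ["greater than 30", "less than 30"]
matrix = [[pair_counts[(r, c)] for c in col_names] for r in row_names]

for name, row in zip(row_names, matrix):
    print(name, row)
```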

10. Correlation map

A correlation map visualizes the correlation coefficients between multiple variables and displays them in the form of a color-coded matrix. An example correlation map for our dataset is shown below:

Figure 10
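The matrix behind such a figure can be sketched in Python as follows. This example assumes the Pearson correlation coefficient; which coefficient ALFA uses is not stated here. The helper name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Numeric columns of the example dataset.
columns = {
    "Column1": [250, 350, 234, 223, 210, 45, 35, 25, 33, 27],
    "Column2": [54, 27, 25, 36, 45, 220, 300, 400, 620, 270],
}

# One coefficient for every pair of columns; the diagonal is 1.
names = list(columns)
matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(name, [round(r, 3) for r in row])
```

The off-diagonal entries are negative, reflecting that Column1 is large where Column2 is small in this dataset.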