In this post, I explore how to visualize a continuous variable by combining a histogram (showing the frequency of values within specific intervals) with a density plot (illustrating the estimated probability distribution).
The geom_histogram() function from the ggplot2 package is used to create a histogram. For example, let’s plot the distribution of Sepal.Length from the iris dataset.
library(ggplot2)
theme_set(theme_bw())
ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(fill = "orange")
To add a vertical line to show the mean value of Sepal.
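The histogram-plus-density combination described at the start of this post could be sketched roughly as follows. This is only an illustration, not necessarily the approach the full post takes: the bin count, the colours, and the object name `p` are my own assumptions, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Rescale the histogram from counts to density so that it shares
# a y-axis with the kernel density curve drawn on top of it.
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 20,            # assumed bin count, tune to taste
                 fill = "orange",
                 colour = "white") +
  geom_density(colour = "steelblue")   # kernel density estimate overlay

print(p)
```

Mapping `y` to density matters here: with the default count scale, the density curve would sit almost flat along the bottom of the plot.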
When dealing with numerical data, the most common way to explore patterns and relationships between variables, and to draw conclusions about how they relate to one another, is to plot the data points in a scatterplot. A scatterplot uses dots (markers) to represent values of two numeric variables; the position of each dot gives the (x, y) coordinates of an individual data point.
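As a minimal sketch of such a scatterplot, the example below plots two numeric columns of the built-in iris dataset with ggplot2. The choice of dataset, variables, and the object name `scatter` are assumptions made purely for illustration.

```r
library(ggplot2)

# Each point is one flower: x = sepal length, y = petal length.
scatter <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(alpha = 0.7) +   # slight transparency helps with overlapping points
  labs(x = "Sepal length (cm)", y = "Petal length (cm)")

print(scatter)
```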
A box plot is an effective way to visualize the distribution of data, giving an overall picture of how the data look. It is also one of the best tools for identifying outliers, so you can check whether an association found in your analysis might be explained by unusual observations. From a box plot we can read the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum of a continuous variable.
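The quantities a box plot summarizes can also be computed directly, which may help when reading the plot. The sketch below assumes the built-in iris dataset and uses hypothetical variable names (`x`, `five_num`, `outliers`, `box`); it relies on base R's `quantile()` and `boxplot.stats()` plus ggplot2's `geom_boxplot()`.

```r
library(ggplot2)

x <- iris$Sepal.Length

# Minimum, lower quartile, median, upper quartile, maximum.
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)

# Points beyond 1.5 * IQR from the box, i.e. those drawn as outliers.
outliers <- boxplot.stats(x)$out
print(outliers)

# One box per species, to compare distributions side by side.
box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(fill = "orange")

print(box)
```

Note that when outliers are present, the whiskers of the default box plot stop at the most extreme non-outlying points, so the true minimum and maximum are then shown as individual dots.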