R

Exploring Key Social, Economic, Environmental, and Institutional Indicators

This dashboard allows users to explore and compare key social, economic, environmental, and institutional indicators for different countries.

Attacks on Healthcare Facilities

I developed this dashboard as part of my studies at MIT Emerging Talent.

Combining Histogram and Density Plot

In this post, I explore visualizing continuous variables by combining a histogram (showing the frequency of values within specific intervals) and a density plot (illustrating the estimated probability distribution).

Analyzing Multiple Response Questions

In this post you will learn how to analyze multiple response questions in R.

Read Data from Multiple Excel Sheets and Convert them to Individual Data Frames

In this post, I explore different ways of reading data from multiple Excel sheets and converting them into individual data frames in R using the lapply() and purrr::map() functions.

WB Dashboard

This dashboard visualizes economy- and growth-related indicators from the World Bank (WB).

A Shiny app to compare datasets

The purpose of developing this app is to compare two versions of the same dataset.

Making Predictions with Linear Regression

In this post you will learn how to build a linear regression, interpret the result, test its assumptions, and use the regression equation for predictions.

Using Survey Weight

In R, the `survey` package makes it possible to work with survey weights.

Combining Multiple Plots using Patchwork

In this post, I explore different ways to combine multiple ggplot plots using the patchwork package to produce publication-ready plots.

ggplot2: geom_histogram & facet_wrap with different vertical lines on each facet

The geom_histogram() function from the ggplot2 package is used to create a histogram plot. For example, let’s plot the distribution of Sepal.Length from the iris data: `library(ggplot2)`, `theme_set(theme_bw())`, `ggplot(iris, aes(Sepal.Length)) + geom_histogram(fill = "orange")`. To add a vertical line to show the mean value of Sepal.

Exploring Relationship Between Variables | scatter-plot

When dealing with numerical data, the most common way to graphically explore the patterns and relationships between variables, and to draw conclusions about how variables relate to one another, is to plot the data points in a scatterplot. A scatterplot uses dots or markers to represent values for two numeric variables, where the position of each dot gives the (x, y) coordinates of an individual data point.

Detecting Duplicates (base R vs. dplyr)

In this post, I provide an overview of the duplicated() function from base R and the distinct() function from the dplyr package for detecting and removing duplicates.
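As a rough illustration of the base R side, the sketch below flags and drops repeated rows with duplicated(). The data frame `df` and its columns are made-up toy data, not taken from the post.

```r
# Hypothetical toy data: the third row repeats the second
df <- data.frame(id = c(1, 2, 2, 3), score = c(10, 20, 20, 30))

# duplicated() returns TRUE for every row that repeats an earlier row
dup_flags <- duplicated(df)

# Keep only the first occurrence of each row
deduped <- df[!dup_flags, ]

print(dup_flags)
print(deduped)
```

With dplyr installed, `dplyr::distinct(df)` is the usual counterpart for dropping repeated rows.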

Methods for Transforming Data to Normal Distribution

In this post, I try to cover the most common methods of transforming a skewed distribution into a normal distribution, and the foundational step that you must consider before deciding which method to apply.

Skewness | Definition and its Importance in Data Science

In this post, you will learn the main concept of skewness, how to calculate it in R and by hand, and its importance in the field of data analytics.
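A minimal sketch of a "by hand" calculation, assuming the moment-based definition of skewness (third central moment divided by the second central moment raised to the power 1.5). Several other definitions exist, and the post may use a different one. The helper name `moment_skewness` and the sample vectors are hypothetical.

```r
# Hypothetical helper: moment-based skewness (one of several definitions)
moment_skewness <- function(v) {
  centered <- v - mean(v)
  m2 <- mean(centered^2)  # second central moment
  m3 <- mean(centered^3)  # third central moment
  m3 / m2^1.5
}

right_skewed <- c(1, 2, 2, 3, 3, 3, 4, 10)  # long tail to the right
symmetric <- c(1, 2, 3, 4, 5)

skew_right <- moment_skewness(right_skewed)
skew_sym <- moment_skewness(symmetric)

print(skew_right)  # positive for a right-skewed sample
print(skew_sym)    # zero for a symmetric sample
```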

Analysis of the Relationship Between Two Quantitative Variables | Correlation

When studying, for example, the relationship between GDP and life expectancy, you might want to know whether any relationship exists between the two indicators, whether it is positive or negative, and how strong the association is. These questions can be answered by computing the correlation coefficient between the two indicators. Depending on the type of data, different correlation methods exist. In this post, you will learn about the Pearson and Spearman correlation coefficients.
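Both coefficients are available through base R's cor() via its `method` argument. The sketch below uses invented vectors `x` and `y` (not GDP or life expectancy data) with a monotonic but nonlinear relationship, which is where the two coefficients differ.

```r
# Hypothetical data: y increases with x, but not linearly
x <- c(1, 2, 3, 4, 5)
y <- c(1, 4, 9, 16, 25)

# Pearson measures the strength of the linear association
pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so it measures monotonic association
spearman <- cor(x, y, method = "spearman")

print(pearson)
print(spearman)
```

Here the ranks of `x` and `y` agree perfectly, so the Spearman coefficient is 1, while the Pearson coefficient is slightly below 1 because the relationship is curved.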

Measures of Dispersion

As the name suggests, measures of dispersion show the extent of variability and scattering of the data points. The main idea is to understand how the data are spread and how much the data points vary from the average value. There are two main types of measures of dispersion: (1) absolute measures of dispersion and (2) relative measures of dispersion.
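As a small sketch of the two types, the code below computes a few absolute measures (range, variance, standard deviation, interquartile range) and one relative measure (the coefficient of variation) in base R. The vector `x` is made-up data, and the choice of measures is an assumption about what the post covers.

```r
# Hypothetical sample
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Absolute measures: expressed in the units of the data (or squared units)
x_range <- diff(range(x))  # maximum minus minimum
x_var <- var(x)            # sample variance (n - 1 denominator)
x_sd <- sd(x)              # sample standard deviation
x_iqr <- IQR(x)            # interquartile range

# Relative measure: unit-free, so it can be compared across variables
x_cv <- x_sd / mean(x)     # coefficient of variation

print(c(range = x_range, var = x_var, sd = x_sd, iqr = x_iqr, cv = x_cv))
```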

What is Central Tendency?

The most common measures of central tendency are the mean, median, and mode. Each of these measures shows the tendency of the data to cluster around a middle value using a different approach.
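A minimal sketch of the three measures in R. Base R provides mean() and median(), but its mode() function reports an object's storage mode rather than the statistical mode, so the helper `stat_mode` below is a hypothetical stand-in; the vector `x` is invented data.

```r
# Hypothetical helper: the most frequent value(s) in a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

# Hypothetical sample with one large value
x <- c(1, 2, 2, 3, 4, 4, 4, 10)

x_mean <- mean(x)      # arithmetic average, pulled up by the value 10
x_median <- median(x)  # middle value of the sorted data
x_mode <- stat_mode(x) # most frequent value

print(c(mean = x_mean, median = x_median, mode = x_mode))
```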

Visualizing Distribution of Data | box-plot

The box plot is an effective way to visually represent the distribution of data, giving you an overall idea of how the data look. It is also one of the best tools for identifying outliers, so you can check whether an association found in your analysis could be explained by unusual observations. From a box plot we can read the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum of a continuous variable.
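The quantities a box plot displays can also be computed directly. The sketch below uses base R's quantile() and boxplot.stats() on an invented vector `x`; note that boxplot.stats() flags points beyond 1.5 times the hinge spread by default, which is one common outlier convention and not the only one.

```r
# Hypothetical sample with one unusually large value
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Minimum, lower quartile, median, upper quartile, maximum
five_numbers <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Points that boxplot() would draw as individual outliers
outliers <- boxplot.stats(x)$out

print(five_numbers)
print(outliers)
```

Calling `boxplot(x)` on the same vector draws the corresponding plot.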

Rename Data Frame Columns

In this post you will learn how to rename columns of a data frame in R.