Andy Grogan-Kaylor
January 16, 2024
library(ggplot2) # graphing library
load("social-service-agency.RData") # simulated data
ggplot2 is a powerful graphing library that can make beautiful graphs. ggplot2 can also help us to understand ideas of an underlying “grammar of graphics”.
However, ggplot2 can be difficult to learn. I think that one way to better understand ggplot2 is to see how the library can be applied to a concrete example: comparing program outcomes.
In this example, program is a factor and mental health at time 2 is numeric.
The mental health variables are scaled to have an average of 100. Lower numbers indicate lower mental health, while higher numbers indicate higher mental health.
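Since the data file itself is not reproduced here, a small stand-in can make the variable types concrete. The sketch below simulates a hypothetical clients data frame and checks that program is a factor and that mental_health_T2 is numeric with a mean near 100. The program names, the sample size, and the standard deviation are assumptions for illustration, not the values stored in social-service-agency.RData.

```r
set.seed(1234) # make the simulation reproducible

# Hypothetical stand-in for the `clients` data; names and sizes are assumptions
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

str(clients)                          # one factor column, one numeric column
is.factor(clients$program)            # program is a factor
is.numeric(clients$mental_health_T2)  # mental health at time 2 is numeric
mean(clients$mental_health_T2)        # should be close to 100
```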
There is a lot of code below. This is where we are setting up the grammatical logic of the graphing approach.
Devoting some time to setting up the initial logic of the plot will pay dividends in terms of exploring multiple geometries later on.
Note that I am adding optional scale_... and theme... arguments just to make the graphs look a little nicer, but these are not an essential part of the code.
myplot1 <- ggplot(clients, # the data I am using
                  aes(x = program, # x is program
                      y = mental_health_T2, # y is mental health
                      color = program, # color is also program
                      fill = program)) + # fill is also program
  labs(y = "mental health at time 2") + # labels
  scale_color_viridis_d() + # beautiful colors
  scale_fill_viridis_d() + # beautiful fills
  theme_minimal() + # minimal theme
  theme(axis.text.x = element_text(size = rel(.75))) # smaller labels
Now that we have devoted a lot of code to setting up the grammar of the graph, it is a relatively simple matter to try out different geometries. The geometries show the average value.
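As a hedged illustration of trying out geometries that show the average, the sketch below adds a bar and then a point layer to the same base plot. It assumes a simulated stand-in for clients (hypothetical program names and sizes) and uses stat_summary(fun = mean, ...) to compute the mean within each program. The exact calls used for the original graphs are not shown in the text, so this is one possible approach rather than the author's code.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# Same grammar as myplot1, without the optional scale_ and theme_ layers
myplot1 <- ggplot(clients,
                  aes(x = program,
                      y = mental_health_T2,
                      color = program,
                      fill = program)) +
  labs(y = "mental health at time 2")

# Bar chart of the mean of y within each program
bar_plot <- myplot1 + stat_summary(fun = mean, geom = "bar")

# Point chart of the same means; only the geometry changes
point_plot <- myplot1 + stat_summary(fun = mean, geom = "point", size = 5)

bar_plot
point_plot
```

Only the last line of each plot differs, which is the payoff of setting up the grammar once.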
The segments connecting the x axis with the points require their own geometry, which has its own aesthetic.
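One way to sketch such a "lollipop" chart is to precompute the program means and give geom_segment() its own aes() with xend, y, and yend. This swaps in a summary data frame (the hypothetical means object below) in place of whatever the original code used, and it again relies on a simulated stand-in for clients.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# One row per program, holding the mean of mental_health_T2
means <- aggregate(mental_health_T2 ~ program, data = clients, FUN = mean)

lollipop <- ggplot(means,
                   aes(x = program,
                       y = mental_health_T2,
                       color = program)) +
  # the segment has its own aesthetic: it runs from y = 0 up to the mean
  geom_segment(aes(xend = program, y = 0, yend = mental_health_T2)) +
  # the point sits at the mean, using the aesthetic inherited from ggplot()
  geom_point(size = 5) +
  labs(y = "mental health at time 2")

lollipop
```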
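An extra element of the aesthetic, the group, is required for lines: because x is a factor, ggplot2 otherwise treats each program as its own group and has no points to connect.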
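A minimal sketch of this idea, assuming the extra element is group = 1 (the usual ggplot2 idiom for joining points across a discrete x axis) and using a simulated stand-in for clients:

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# Same grammar as myplot1, without the optional scale_ and theme_ layers
myplot1 <- ggplot(clients,
                  aes(x = program,
                      y = mental_health_T2,
                      color = program,
                      fill = program)) +
  labs(y = "mental health at time 2")

# group = 1 puts all programs in one group so the means can be joined
line_plot <- myplot1 +
  stat_summary(fun = mean, geom = "line", aes(group = 1))

line_plot
```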
A line chart is likely not an appropriate way to show these program outcomes, as a line chart is more appropriate when the x axis represents some kind of time trend.
Now that we have devoted a lot of code to setting up the grammar of the graph, it is a relatively simple matter to try out different geometries. The geometries show the distribution of all values.
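To illustrate geometries that show every value instead of a single average, the sketch below adds a boxplot, a violin, and jittered points to the same base plot. The choice of these three geoms is an assumption, as is the simulated stand-in for clients. The text does not list which geometries were used.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# Same grammar as myplot1, without the optional scale_ and theme_ layers
myplot1 <- ggplot(clients,
                  aes(x = program,
                      y = mental_health_T2,
                      color = program,
                      fill = program)) +
  labs(y = "mental health at time 2")

# Boxplot: median, quartiles, and outliers for each program
box_plot <- myplot1 + geom_boxplot(alpha = 0.5)

# Violin: a mirrored density estimate for each program
violin_plot <- myplot1 + geom_violin(alpha = 0.5)

# Jitter: every client as a point, spread horizontally to reduce overlap
jitter_plot <- myplot1 + geom_jitter(width = 0.2, height = 0)

box_plot
violin_plot
jitter_plot
```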
Again, there is a lot of code below. This is where we are setting up the grammatical logic of the graphing approach.
myplot2 <- ggplot(clients, # the data I am using
                  aes(x = mental_health_T2, # x is mental health
                      fill = program)) + # fill is program
  facet_wrap(~program) + # facet on this variable
  labs(x = "mental health at time 2") + # labels
  scale_color_viridis_d() + # beautiful colors
  scale_fill_viridis_d() + # beautiful fills
  theme_bw() # bw theme makes facets more clear
However, now that we have devoted a lot of code to setting up the grammar of the graph, it is again a relatively simple matter to try out different geometries.
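With the faceted grammar in place, a one-variable geometry can be added in a single line. The sketch below tries a histogram and a density curve. Both geoms are assumptions about what might be explored here, and clients is again a simulated stand-in with hypothetical program names.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# Same grammar as myplot2, without the optional scale_ layers
myplot2 <- ggplot(clients,
                  aes(x = mental_health_T2,
                      fill = program)) +
  facet_wrap(~program) +
  labs(x = "mental health at time 2") +
  theme_bw()

# Histogram: counts of clients in 30 bins, one panel per program
hist_plot <- myplot2 + geom_histogram(bins = 30)

# Density: a smoothed estimate of the same distribution
density_plot <- myplot2 + geom_density()

hist_plot
density_plot
```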
One last time, there is a lot of code below. This is where we are setting up the grammatical logic of the graphing approach.
And now that we have devoted a lot of code to setting up the grammar of the graph, it is again a relatively simple matter to try out different geometries. [1]
[1] It is important to use (alpha = ...) to create transparency with these geoms.
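As a hedged sketch of the transparency point, the example below overlays density curves for all programs in one panel. Which geoms the note refers to is not shown in the text, so overlapping densities are used here purely as an assumption, along with a simulated stand-in for clients. Without alpha, the curve drawn last would hide the ones beneath it.

```r
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the `clients` data (assumed names and sizes)
clients <- data.frame(
  program = factor(sample(c("Program A", "Program B", "Program C"),
                          size = 300, replace = TRUE)),
  mental_health_T2 = rnorm(300, mean = 100, sd = 10)
)

# alpha = 0.5 makes each filled curve half transparent,
# so overlapping programs remain visible
overlap_plot <- ggplot(clients,
                       aes(x = mental_health_T2,
                           fill = program)) +
  geom_density(alpha = 0.5) +
  labs(x = "mental health at time 2")

overlap_plot
```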