%%{
init: {
'theme': 'base',
'look': 'handDrawn',
'themeVariables': {
'primaryColor': '#FFCB05',
'primaryTextColor': '#000000',
'primaryBorderColor': '#00274C',
'lineColor': '#00274C',
'secondaryColor': '#00274C',
'secondaryTextColor': '#000000',
'tertiaryColor': '#F2F2F2',
'tertiaryBorderColor': '#00274C'
}
}
}%%
block-beta
columns 3
A["have a <br>question"] space B["get <br> raw data"]
space space space
D["process and <br>clean data"] space C["select or <br>keep variables"]
space space space
E["analyze <br>data"] space F["visualize <br>data"]
space space space
space space G["make <br>conclusions"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
Workflows
So many projects have the same workflow.
Thinking In Code
In my teaching, I often emphasize the importance of thinking in code, whether that code is Stata, or R, or some other program.
.do Files
.do files are Stata’s way of storing code. I’ve provided an example .do file below.
******************************
* Penguin Analysis
* Demonstration Do File
******************************
* So many projects have the same, or similar, workflow.
* DO YOUR THINKING IN CODE!!!
* have a question ->
* get data ->
* process and clean data ->
* analyze data ->
* visualize data ->
* make conclusions
/* do files are useful to preserve
a record of your work. They help
to keep an audit trail of the
decisions that you have made. */
/* do files thus serve as a way of creating an
automated, replicable and documented workflow
as well as finding and minimizing errors */
* A `*` character at the beginning of a line makes that line a comment
/* You can also use asterisk slash to denote multiple lines of comment */
******************************
* get data
******************************
* a good workflow habit is to
* always--or at least frequently--
* work from your raw data.
* i.e. run your script so you are always--
* or at least often--opening your raw data,
* cleaning the data, creating new variables,
* and then running analyses.
clear all // clear the workspace
* get data from web
use "penguins.dta", clear
******************************
* take a look at the data
******************************
* NB if you have a lot of variables, the commands below will produce a lot of (too much) output
* you may need to `describe` or `codebook` specific variables
* you might want to only `keep` certain variables so that you have a smaller working data set
describe // describe the variables
codebook // full descriptions of all the variables; produces a lot of output
******************************
* descriptive statistics
******************************
summarize // descriptive statistics for all variables
summarize body_mass_g // descriptive statistics for this variable
tabulate species // tabulate this categorical variable
* dtable is a useful new command
* for producing tables of descriptive statistics
* be sure to denote indicator variables with an `i.`
dtable culmen_length_mm body_mass_g i.species
******************************
* data wrangling
******************************
* find variables of interest
lookfor mass // look for a variable w a particular keyword
* sometimes it is useful to `keep` only the variables in which you have an interest
* to reduce the size of the data set
* recode variables
generate big_penguin = body_mass_g > 4000 // create a big penguin variable
tabulate big_penguin
******************************
* ANOVA
******************************
oneway body_mass_g species, tabulate
******************************
* regression
******************************
regress culmen_length_mm body_mass_g
est store M1 // store regression estimates
regress culmen_length_mm body_mass_g i.species
est store M2 // store regression estimates
* /// indicates that a command spans multiple lines
etable, estimates(M1 M2) /// nicely formatted table of regression estimates
cstat(_r_b) /// beta's only
showstars showstarsnote // show stars and note
******************************
* graph
******************************
graph bar body_mass_g, over(species) // bar graph
twoway scatter culmen_length_mm body_mass_g // scatterplot