use simulated_multilevel_data.dta // use data
2 Descriptive Statistics
2.1 Descriptive Statistics
We use summarize for continuous variables, and tabulate for categorical variables.
summarize outcome warmth physical_punishment HDI
tabulate identity
tabulate intervention
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
outcome | 3,000 52.43327 6.530996 29.60798 74.83553
warmth | 3,000 3.521667 1.888399 0 7
physical_p~t | 3,000 2.478667 1.360942 0 5
HDI | 3,000 64.76667 17.24562 33 87
hypothetica |
l identity |
group |
variable | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,507 50.23 50.23
2 | 1,493 49.77 100.00
------------+-----------------------------------
Total | 3,000 100.00
recieved |
interventio |
n | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,547 51.57 51.57
1 | 1,453 48.43 100.00
------------+-----------------------------------
Total | 3,000 100.00
library(haven) # read data in Stata format
df <- read_dta("simulated_multilevel_data.dta")
R's descriptive statistics functions rely heavily on whether a variable is numeric or a factor. Below, I convert three variables to factors (factor) before using summary [1] to generate descriptive statistics.
df$country <- factor(df$country)
df$identity <- factor(df$identity)
df$intervention <- factor(df$intervention)
summary(df)
country HDI family id identity
1 : 100 Min. :33.00 Min. : 1.00 Length:3000 1:1507
2 : 100 1st Qu.:53.00 1st Qu.: 25.75 Class :character 2:1493
3 : 100 Median :70.00 Median : 50.50 Mode :character
4 : 100 Mean :64.77 Mean : 50.50
5 : 100 3rd Qu.:81.00 3rd Qu.: 75.25
6 : 100 Max. :87.00 Max. :100.00
(Other):2400
intervention physical_punishment warmth outcome
0:1547 Min. :0.000 Min. :0.000 Min. :29.61
1:1453 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:48.02
Median :2.000 Median :4.000 Median :52.45
Mean :2.479 Mean :3.522 Mean :52.43
3rd Qu.:3.000 3rd Qu.:5.000 3rd Qu.:56.86
Max. :5.000 Max. :7.000 Max. :74.84
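To see why the factor conversion matters, here is a minimal sketch using a small made-up vector (toy_codes is hypothetical and not part of the simulated data). It suggests how summary treats the same values differently depending on type: quantiles and a mean for a numeric vector, counts per level for a factor.

```r
# Toy vector of group codes; hypothetical values, not from the simulated data
toy_codes <- c(1, 2, 2, 1, 2)

# Numeric: summary() reports min, quartiles, mean, and max
numeric_summary <- summary(toy_codes)
print(numeric_summary)

# Factor: summary() reports the number of observations in each level
factor_summary <- summary(factor(toy_codes))
print(factor_summary)
```

For a grouping variable such as identity, the mean of the codes is rarely meaningful, which is why the factor version is usually the more useful summary.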
using Tables, MixedModels, MixedModelsExtras, StatFiles, DataFrames, CategoricalArrays, DataFramesMeta
df = DataFrame(load("simulated_multilevel_data.dta"))
Similarly to R, Julia relies on the idea of variable type. I use @transform! to convert the appropriate variables to categorical variables.
@transform!(df, :country = categorical(:country))
@transform!(df, :identity = categorical(:identity))
@transform!(df, :intervention = categorical(:intervention))
describe(df) # descriptive statistics
9×7 DataFrame
Row │ variable mean min median max nmissing eltype ⋯
│ Symbol Union… Any Union… Any Int64 Union ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ country 1.0 30.0 0 Union{ ⋯
2 │ HDI 64.7667 33.0 70.0 87.0 0 Union{
3 │ family 50.5 1.0 50.5 100.0 0 Union{
4 │ id 1.1 9.99 0 Union{
5 │ identity 1.0 2.0 0 Union{ ⋯
6 │ intervention 0.0 1.0 0 Union{
7 │ physical_punishment 2.47867 0.0 2.0 5.0 0 Union{
8 │ warmth 3.52167 0.0 4.0 7.0 0 Union{
9 │ outcome 52.4333 29.608 52.449 74.8355 0 Union{ ⋯
1 column omitted
2.2 Interpretation
Examining descriptive statistics is an important first step in any analysis, and should come before more sophisticated analyses, such as multilevel models.
The descriptive statistics for these data give a first sense of how each variable is distributed.
outcome has a mean of approximately 52 and ranges from approximately 30 to 75.
warmth and physical punishment are both variables that represent the number of times that parents use each of these forms of discipline in a week. The average of the former is about 3.5, while the average of the latter is about 2.5.
HDI, the Human Development Index, has an average of about 65, and a wide range.
identity is a categorical variable for a hypothetical identity group, and has values of 1 and 2.
intervention is also a categorical variable, and has values of 0 and 1.
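For categorical variables, R's summary reports only counts, whereas Stata's tabulate also reports percentages and cumulative percentages. The following is a minimal sketch of one way to produce a comparable table in base R. The vector toy_identity is hypothetical and is not taken from the simulated data.

```r
# Hypothetical group codes; not taken from the simulated data
toy_identity <- factor(c(1, 2, 2, 1, 2, 2, 1, 2))

freq <- table(toy_identity)     # counts per level
pct  <- 100 * prop.table(freq)  # percent per level
cum  <- cumsum(pct)             # cumulative percent

# Combine into one table with a row per level
print(cbind(Freq = freq, Percent = pct, Cum = cum))
```

The same three steps could be applied to a factor column in the real data frame, for example df$identity, assuming the data have been loaded and converted as described earlier.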
[1] skimr is an excellent new alternative library for generating descriptive statistics in R.
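As a rough illustration of the skimr alternative, the sketch below calls skim on a small made-up data frame. It assumes the skimr package is installed, and toy_df is a hypothetical stand-in for the simulated data, so the guard simply skips the call if the package is unavailable.

```r
# Hypothetical toy data frame standing in for the simulated data
toy_df <- data.frame(
  warmth = c(3, 5, 2, 4),
  intervention = factor(c(0, 1, 1, 0))
)

# Assumption: skimr is installed; skip quietly if it is not
if (requireNamespace("skimr", quietly = TRUE)) {
  # skim() summarizes each column according to its type
  print(skimr::skim(toy_df))
}
```

As with summary, converting grouping variables to factors first means they are summarized as categories rather than as numbers.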