Appendix B — Reshaping Data in Stata
B.1 Introduction
Data can be reshaped from wide format to long format, and back again. Almost any software that is capable of estimating multilevel models is capable of reshaping data. The Stata command for reshaping data is reshape.
Below, I detail the procedure for reshaping data in Stata. Here is a sample of the longitudinal data set used in this document.
These data are in long format (see Table 7.4).
Every individual in the data has multiple rows. Every row of the data is a person-timepoint.
country | HDI | family | id | group | t | physical_punishment | warmth | outcome |
---|---|---|---|---|---|---|---|---|
1 | 69 | 1 | 1.1 | 2 | 1 | 2 | 3 | 59.18 |
1 | 69 | 1 | 1.1 | 2 | 2 | 2 | 2 | 58.29 |
1 | 69 | 1 | 1.1 | 2 | 3 | 3 | 3 | 60.58 |
1 | 69 | 2 | 1.2 | 2 | 1 | 4 | 0 | 61.54 |
1 | 69 | 2 | 1.2 | 2 | 2 | 4 | 0 | 55.96 |
1 | 69 | 2 | 1.2 | 2 | 3 | 4 | 2 | 56.19 |
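One way to confirm the person-timepoint structure is sketched below. It assumes the long-format data are already loaded in memory with the variable names shown in the sample; n_rows is a hypothetical helper variable introduced only for this check.

```stata
* Assumes the long-format data are already in memory.
* Each id-by-t combination should identify exactly one row;
* isid stops with an error if it does not.
isid id t

* Count the rows contributed by each individual (n_rows is a hypothetical name).
bysort id: generate n_rows = _N
tabulate n_rows

* Show the first six rows, with a separator line between individuals.
list id t physical_punishment warmth outcome in 1/6, sepby(id)
```

If every individual was observed at all three timepoints, tabulate should show a single value of n_rows.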
B.2 Data Management
- Because reshaping dramatically changes the structure of your data, it is a good idea to have your raw data saved in a location where it will not be changed, and can be retrieved again if the reshape command does not work correctly, or if you simply want to modify your reshaping workflow.
- Usually we want to work with only a subset of the data, so keep only the variables in which you are interested. In Stata, the command to keep only variables of interest would be
  keep y x z t
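The two points above can be sketched as a short do-file fragment. The file names raw_data.dta and analysis_data.dta are hypothetical placeholders, and the variable list is taken from the sample data shown in this appendix.

```stata
* File names below are hypothetical; substitute your own paths.
use "raw_data.dta", clear

* Keep only the variables needed for the analysis.
keep country HDI family id group t physical_punishment warmth outcome

* Save under a new name so that raw_data.dta is never overwritten.
save "analysis_data.dta", replace
```

An alternative while experimenting is to type preserve before a reshape and restore afterwards, which rolls the data in memory back to their earlier state.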
B.3 Reshaping Data From Long To Wide
While it is not often that we want to reshape data from long to wide, I do so here for illustrative purposes. The Stata command for reshaping the data to wide format is:
reshape wide physical_punishment warmth outcome, i(id) j(t)
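Notice that I list only the variables that vary over time (the time-varying variables). Stata assumes that any variable not listed does not vary over time (is time invariant). In the command, i(id) names the variable that identifies each individual, and j(t) names the variable that holds the timepoint.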
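Before running the reshape wide command above, it may help to confirm that the unlisted variables really are constant within each individual. The following is a minimal sketch, assuming the long-format data are in memory and that country, HDI, family, and group are the variables meant to be time invariant.

```stata
* Assumes the long-format data are in memory.
* For each variable treated as time invariant, sort within id and check that
* the first and last values agree, which means the variable is constant within id.
foreach v of varlist country HDI family group {
    bysort id (`v'): assert `v'[1] == `v'[_N]
}
```

If an unlisted variable does vary within id, reshape itself stops with an error, and typing reshape error afterwards lists the problem observations.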
The data are now in wide format (See Table 7.5).
Every individual in the data set has a single row of data. Every row in the data set is an individual.
id | physical_punishment1 | warmth1 | outcome1 | physical_punishment2 | warmth2 | outcome2 | physical_punishment3 | warmth3 | outcome3 | country | HDI | family | group |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.1 | 2 | 3 | 59.18 | 2 | 2 | 58.29 | 3 | 3 | 60.58 | 1 | 69 | 1 | 2 |
1.10 | 3 | 1 | 52.09 | 3 | 2 | 52.99 | 2 | 1 | 64.37 | 1 | 69 | 10 | 2 |
1.100 | 1 | 4 | 49.3 | 0 | 4 | 64 | 2 | 4 | 57.34 | 1 | 69 | 100 | 2 |
1.11 | 2 | 3 | 61.99 | 2 | 5 | 55.91 | 2 | 4 | 65.44 | 1 | 69 | 11 | 2 |
1.12 | 3 | 4 | 47.45 | 3 | 4 | 46.42 | 5 | 6 | 48.35 | 1 | 69 | 12 | 1 |
1.13 | 5 | 3 | 61.11 | 3 | 4 | 56.99 | 3 | 4 | 50.63 | 1 | 69 | 13 | 1 |
B.4 Reshaping Data From Wide To Long
Usually, we are more interested in reshaping data from wide to long, and that is what I do now.
Notice again that I only list variables that vary over time, or are time varying. As before, Stata assumes that variables that are not listed do not vary over time, or are time invariant.
Notice also that our time-varying data are in the stub-time format, e.g. warmth1, warmth2, physical_punishment1, physical_punishment2, etc. Because the variables are named in this way, Stata knows to use the stub (e.g. warmth) as the variable name, and the numeric value (e.g. 1, 2, 3) as the timepoint.
The command is:
reshape long physical_punishment warmth outcome, i(id) j(t)
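To see how the stub naming drives this command, here is a small self-contained toy example. It types in the warmth values for the two individuals shown in the long-format sample, and it assumes that id is stored as a string, which is a guess based on identifiers such as 1.10 and 1.100.

```stata
* Toy wide data for two individuals; warmth values come from the sample table.
* Storing id as a string (str5) is an assumption made for this illustration.
clear
input str5 id warmth1 warmth2 warmth3
"1.1" 3 2 3
"1.2" 0 0 2
end

* warmth is the stub; the suffixes 1, 2, 3 become the values of t.
reshape long warmth, i(id) j(t)

* Each individual now has three rows, one per timepoint.
list, sepby(id)
```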
The id variable, whatever it is named, has to uniquely identify the observations. A useful Stata command here is isid, e.g. isid id. If your id variable is not unique, it is often due to missing values. In that case, drop if id == . usually solves the problem (assuming that your id variable is indeed named id, and not something else).
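The check described above might look like the sketch below, run on the wide-format data before reshaping. It assumes the identifier is named id. It uses missing(id) rather than id == . because missing() works for both numeric and string variables, which matters if id is stored as a string.

```stata
* Assumes the wide-format data are in memory and the identifier is named id.
* Report how many copies of each id value exist.
duplicates report id

* Drop rows with a missing identifier; missing() handles numeric and string variables.
drop if missing(id)

* Stop with an error unless id now uniquely identifies the rows.
isid id
```

If isid still fails after the missing values are dropped, the duplicates report output shows how many surplus copies remain to be investigated.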
If we use this command, we are back to the original format of the data set.
country | HDI | family | id | group | t | physical_punishment | warmth | outcome |
---|---|---|---|---|---|---|---|---|
1 | 69 | 1 | 1.1 | 2 | 1 | 2 | 3 | 59.18 |
1 | 69 | 1 | 1.1 | 2 | 2 | 2 | 2 | 58.29 |
1 | 69 | 1 | 1.1 | 2 | 3 | 3 | 3 | 60.58 |
1 | 69 | 2 | 1.2 | 2 | 1 | 4 | 0 | 61.54 |
1 | 69 | 2 | 1.2 | 2 | 2 | 4 | 0 | 55.96 |
1 | 69 | 2 | 1.2 | 2 | 3 | 4 | 2 | 56.19 |
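Once a data set has been reshaped, Stata stores the i(), j(), and variable list with the data, so the shorthand forms below can be used to move back and forth. This is a sketch that assumes one of the full reshape commands shown earlier has already been run on the data in memory.

```stata
* Assumes a full reshape command has already been run on these data.
* Return to wide format, reusing the stored i(), j(), and variable list.
reshape wide

* Return to long format again.
reshape long
```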