Data can be reshaped from wide format to long format, and back again. Almost any software that is capable of estimating multilevel models is capable of reshaping data. The Stata command for reshaping data is reshape.
Below, I detail the procedure for reshaping data in Stata. Here is a sample of the longitudinal data set used in this book.
Every individual in the data has multiple rows. Every row of the data is a person-timepoint.
Table A.1: Data in Long Format

country  HDI  family  id   identity  intervention  t
-------  ---  ------  ---  --------  ------------  -
1        69   1       1.1  2         1             1
1        69   1       1.1  2         1             2
1        69   1       1.1  2         1             3
1        69   2       1.2  2         2             1
1        69   2       1.2  2         2             2
1        69   2       1.2  2         2             3

Table continues below

physical_punishment  warmth  outcome
-------------------  ------  -------
3                    3       58.47
3                    4       56.06
1                    2       59.77
2                    1       51.1
3                    0       54.31
3                    1       50.79
A.2 Data Management
Because reshaping dramatically changes the structure of your data, it is a good idea to keep your raw data saved in a location where it will not be changed. You can then retrieve it if the reshape command does not work correctly, or if you simply want to modify your reshaping workflow.
Usually we want to work with only a subset of the data, so keep only the variables in which you are interested. In Stata, the command to keep only variables of interest would be keep y x z t.
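As a minimal sketch of this workflow, the do-file fragment below saves an untouched copy of the data before subsetting. The toy data, the extra variable w, the id variable, and the file name raw_copy.dta are all hypothetical; only the variable names y, x, z, and t come from the text above. I assume here that you would also want to keep the id variable, since reshape needs it later.

```stata
* Hypothetical toy data standing in for a raw data file
clear
input id t y x z w
1 1 10 2 0 5
1 2 12 3 0 6
2 1  9 1 1 7
2 2 11 2 1 8
end

* Save an untouched copy first (raw_copy.dta is a made-up file name)
save "raw_copy.dta", replace

* Keep only the variables of interest; id is kept as well (my assumption)
keep id y x z t
describe

* If something goes wrong later, reload the untouched copy
use "raw_copy.dta", clear
```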
A.3 Reshaping Data From Long To Wide
While it is not often that we want to reshape data from long to wide, I do so here for illustrative purposes. The Stata command for reshaping the data to wide format is:
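A minimal sketch of such a command, assuming that the identifier is id, the time variable is t, and the time-varying variables are physical_punishment, warmth, and outcome, as in Table A.1. The toy data are typed in by hand from that table so that the fragment runs on its own; id is entered as a string, which is an assumption on my part.

```stata
* Toy data copied from Table A.1 (id stored as a string: an assumption)
clear
input str3 id t physical_punishment warmth outcome country
"1.1" 1 3 3 58.47 1
"1.1" 2 3 4 56.06 1
"1.1" 3 1 2 59.77 1
"1.2" 1 2 1 51.10 1
"1.2" 2 3 0 54.31 1
"1.2" 3 3 1 50.79 1
end

* Long to wide: list only the time-varying variables.
* i() names the identifier and j() names the time variable.
reshape wide physical_punishment warmth outcome, i(id) j(t)

* One row per id, with columns such as warmth1, warmth2, warmth3
list
```

Here country is not listed in the reshape command, so Stata carries it along as a single time-invariant column.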
Notice that I list only the variables that vary over time (the time-varying variables). Stata assumes that any variable not listed does not vary over time (is time invariant).
Every individual in the data set has a single row of data. Every row in the data set is an individual.
Table A.2: Data in Wide Format

id     physical_punishment1  warmth1  outcome1  physical_punishment2
-----  --------------------  -------  --------  --------------------
1.1    2                     3        59.18     2
1.10   3                     1        52.09     3
1.100  1                     4        49.3      0
1.11   2                     3        61.99     2
1.12   3                     4        47.45     3
1.13   5                     3        61.11     3

Table continues below

warmth2  outcome2  physical_punishment3  warmth3  outcome3  country  HDI
-------  --------  --------------------  -------  --------  -------  ---
2        58.29     3                     3        60.58     1        69
2        52.99     2                     1        64.37     1        69
4        64        2                     4        57.34     1        69
5        55.91     2                     4        65.44     1        69
4        46.42     5                     6        48.35     1        69
4        56.99     3                     4        50.63     1        69

Table continues below

family  group
------  -----
1       2
10      2
100     2
11      2
12      1
13      1
A.4 Reshaping Data From Wide To Long
Usually, we are more interested in reshaping data from wide to long, and that is what I do now.
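A minimal sketch of this step, assuming the same identifier (id) and time variable (t) as before. The toy data are a trimmed subset of Table A.2, restricted to two individuals and two timepoints to keep the fragment short; storing id as a string is again my assumption.

```stata
* Trimmed subset of Table A.2 (two ids, two timepoints)
clear
input str5 id physical_punishment1 warmth1 outcome1 physical_punishment2 warmth2 outcome2 country
"1.1"  2 3 59.18 2 2 58.29 1
"1.10" 3 1 52.09 3 2 52.99 1
end

* Wide to long: list the stubs of the time-varying variables.
* Stata strips the numeric suffix and stores it in the j() variable t.
reshape long physical_punishment warmth outcome, i(id) j(t)

* One row per person-timepoint
list, sepby(id)
```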
Notice again that I list only the variables that vary over time (the time-varying variables). As before, Stata assumes that any variable not listed does not vary over time (is time invariant).
Notice also that our time-varying data are in the stub-time format, e.g. warmth1, warmth2, physical_punishment1, physical_punishment2, etc. Because the variables are named in this way, Stata knows to use the stub (e.g. warmth) as the variable name, and the numeric value (e.g. 1, 2, 3) as the timepoint.
The id variable, whatever it is named, has to uniquely identify the observations. A useful Stata command here is isid, e.g. isid id. If your id variable is not unique, it is often due to missing values. drop if id == . usually solves the problem (assuming that your id variable is indeed named id, and not something else).
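The check described above can be sketched as follows, using hypothetical toy data in which two rows have a missing id. I use drop if missing(id) rather than drop if id == . because missing() also handles string identifiers, for which the missing value is an empty string.

```stata
* Hypothetical toy data: two rows have a missing id
clear
input id x
1 5
2 6
. 7
. 8
end

* isid exits with an error when id does not uniquely identify rows;
* capture traps that error so the do-file keeps running
capture isid id
display "isid return code before cleaning: " _rc

* Drop rows with a missing identifier (works for numeric or string ids)
drop if missing(id)

* Now isid runs silently, confirming that id is unique
isid id
```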
Once we run the reshape long command, the data are back in their original long format.