Appendix A — Reshaping Data in Stata

A.1 Introduction

Data can be reshaped from wide format to long format, and back again. Almost any software that is capable of estimating multilevel models is capable of reshaping data. The Stata command for reshaping data is reshape.

Below, I detail the procedure for reshaping data in Stata. Here is a sample of the longitudinal data set used in this book.

These data are in long format (see Table 6.4).

Every individual in the data has multiple rows. Every row of the data is a person-timepoint.

Table A.1: Data in Long Format
country  HDI  family  id   identity  intervention  t  physical_punishment  warmth  outcome
1        69   1       1.1  2         1             1  3                    3       58.47
1        69   1       1.1  2         1             2  3                    4       56.06
1        69   1       1.1  2         1             3  1                    2       59.77
1        69   2       1.2  2         2             1  2                    1       51.1
1        69   2       1.2  2         2             2  3                    0       54.31
1        69   2       1.2  2         2             3  3                    1       50.79
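
One way to confirm the person-timepoint structure is to check that id and t together identify each row. The following is a minimal sketch that retypes a few columns of Table A.1 by hand; storing id as a string and the helper variable n_timepoints are assumptions made for illustration.

```stata
clear
* Toy long-format data: three timepoints for each of two individuals.
input str5 id t warmth
"1.1" 1 3
"1.1" 2 4
"1.1" 3 2
"1.2" 1 1
"1.2" 2 0
"1.2" 3 1
end

* In long format, id alone does not identify rows, but id and t together do.
* isid stops with an error if the combination is not unique.
isid id t

* Count the rows (timepoints) that each individual contributes.
bysort id: generate n_timepoints = _N
tabulate n_timepoints
```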

A.2 Data Management

  1. Because reshaping dramatically changes the structure of your data, it is a good idea to keep your raw data saved in a location where it will not be changed, so that it can be retrieved if the reshape command does not work correctly, or if you simply want to modify your reshaping workflow.
  2. Usually you will want to work with only a subset of your data, so keep only the variables in which you are interested. In Stata, the command to keep only variables of interest would be, e.g., keep id t y x z. Be sure to retain the id variable and the time variable, because reshape needs both.
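
The two steps above might look like the following sketch. The toy data, the extra variable notes_code, and the use of a temporary file in place of a permanent raw-data file are all assumptions made so that the example runs on its own.

```stata
clear
* Toy data standing in for a raw file; notes_code is a hypothetical
* variable that is not needed for the analysis.
input str5 id t warmth outcome notes_code
"1.1" 1 3 58.47 9
"1.1" 2 4 56.06 9
"1.2" 1 1 51.10 7
"1.2" 2 0 54.31 7
end

* Step 1: store an untouched copy of the raw data before changing anything.
* A tempfile stands in here for a permanent raw-data file.
tempfile raw
save `raw'

* Step 2: keep only the variables of interest, including id and t.
keep id t warmth outcome
describe
```

If a later step goes wrong, the untouched copy can be reloaded with use and the clear option, and the workflow can be started again.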

A.3 Reshaping Data From Long To Wide

While it is not often that we want to reshape data from long to wide, I do so here for illustrative purposes. The Stata command for reshaping the data to wide format is:


reshape wide physical_punishment warmth outcome, i(id) j(t)

Notice that I list only the variables that vary over time (the time-varying variables). Stata treats any variable that is not listed as time invariant, and reshape wide stops with an error if an unlisted variable is not constant within id.
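
To see the command at work on a small scale, here is a minimal sketch that retypes part of Table A.1 by hand. Storing id as a string and carrying only country as a time-invariant variable are assumptions made to keep the example short.

```stata
clear
* Toy long-format data: one row per person-timepoint.
input str5 id t physical_punishment warmth outcome country
"1.1" 1 3 3 58.47 1
"1.1" 2 3 4 56.06 1
"1.1" 3 1 2 59.77 1
"1.2" 1 2 1 51.10 1
"1.2" 2 3 0 54.31 1
"1.2" 3 3 1 50.79 1
end

* country is not listed, so it must be constant within id.
reshape wide physical_punishment warmth outcome, i(id) j(t)

* One row per individual; each listed variable now has a suffix for t.
list id warmth1 warmth2 warmth3 country
```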

The data are now in wide format (see Table 6.5).

Every individual in the data set has a single row of data. Every row in the data set is an individual.

Table A.2: Data in Wide Format
id     physical_punishment1  warmth1  outcome1  physical_punishment2
1.1    2                     3        59.18     2
1.10   3                     1        52.09     3
1.100  1                     4        49.3      0
1.11   2                     3        61.99     2
1.12   3                     4        47.45     3
1.13   5                     3        61.11     3

Table continues below
id     warmth2  outcome2  physical_punishment3  warmth3  outcome3  country  HDI
1.1    2        58.29     3                     3        60.58     1        69
1.10   2        52.99     2                     1        64.37     1        69
1.100  4        64        2                     4        57.34     1        69
1.11   5        55.91     2                     4        65.44     1        69
1.12   4        46.42     5                     6        48.35     1        69
1.13   4        56.99     3                     4        50.63     1        69

Table continues below
id     family  group
1.1    1       2
1.10   10      2
1.100  100     2
1.11   11      2
1.12   12      1
1.13   13      1

A.4 Reshaping Data From Wide To Long

Usually, we are more interested in reshaping data from wide to long, and that is what I do now.

Notice again that I list only the time-varying variables. As before, Stata treats any variable that is not listed as time invariant.

Notice also that our time-varying variables are named in stub-time format, e.g. warmth1, warmth2, physical_punishment1, physical_punishment2, etc. Because the variables are named in this way, Stata knows to use the stub (e.g. warmth) as the variable name, and the numeric suffix (e.g. 1, 2, 3) as the timepoint.

The command is:


reshape long physical_punishment warmth outcome, i(id) j(t)
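
The stub-time naming can be seen in a small, self-contained sketch. The wide data below are typed in by hand from the first two individuals in Table A.1, with id stored as a string; both choices are assumptions made for illustration, and only two stubs are used to keep the example short.

```stata
clear
* Toy wide-format data: one row per individual, stub + timepoint names.
input str5 id warmth1 warmth2 warmth3 outcome1 outcome2 outcome3 country
"1.1" 3 4 2 58.47 56.06 59.77 1
"1.2" 1 0 1 51.10 54.31 50.79 1
end

* List only the stubs; Stata strips the numeric suffix and stores it in t.
reshape long warmth outcome, i(id) j(t)

* One row per person-timepoint; country is copied onto every row.
list, sepby(id)
```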

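The id variable, whatever it is named, has to uniquely identify the observations in the wide data. A useful Stata command here is isid, e.g. isid id, which stops with an error if id does not uniquely identify the observations. If your id variable is not unique, it is often due to missing values. For a numeric id, drop if id == . usually solves the problem (assuming that your id variable is indeed named id, and not something else); drop if missing(id) works for both numeric and string id variables.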
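
The check and the fix described above could be sketched as follows. The toy data, including the two rows with a missing id, and the scalar name rc_before are hypothetical; id is assumed to be a string here, which is why missing(id) is used rather than id == .

```stata
clear
* Toy wide data in which two rows have a missing (empty) id.
input str5 id warmth1 warmth2
"1.1" 3 4
"1.2" 1 0
""    2 2
""    5 1
end

* capture traps the error so that the return code can be inspected.
capture isid id
scalar rc_before = _rc
display "isid return code before cleaning: " rc_before

* Show how many rows share each id value.
duplicates report id

* Drop the rows with a missing id, then confirm that id is now unique.
drop if missing(id)
isid id
```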

If we use this command, we are back to the original format of the data set.

Table A.3: Data in Long Format
country  HDI  family  id   identity  intervention  t  physical_punishment  warmth  outcome
1        69   1       1.1  2         1             1  3                    3       58.47
1        69   1       1.1  2         1             2  3                    4       56.06
1        69   1       1.1  2         1             3  1                    2       59.77
1        69   2       1.2  2         2             1  2                    1       51.1
1        69   2       1.2  2         2             2  3                    0       54.31
1        69   2       1.2  2         2             3  3                    1       50.79
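
The claim that reshaping to wide and back to long recovers the original data can be checked directly. The following is a hedged sketch on toy data typed in from Table A.1; the temporary file and the use of cf to compare the two versions are illustrative choices, not part of the procedure described above.

```stata
clear
* Toy long-format data, sorted by id and t.
input str5 id t warmth outcome
"1.1" 1 3 58.47
"1.1" 2 4 56.06
"1.1" 3 2 59.77
"1.2" 1 1 51.10
"1.2" 2 0 54.31
"1.2" 3 1 50.79
end
sort id t

* Store a copy of the starting data for later comparison.
tempfile original
save `original'

* Round trip: long to wide, then wide back to long.
reshape wide warmth outcome, i(id) j(t)
reshape long warmth outcome, i(id) j(t)

* cf compares the data in memory with the saved copy, variable by variable,
* and stops with an error if any values differ.
sort id t
cf _all using `original'
```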