National Health and Nutrition Examination Survey (NHANES) is a large ongoing study conducted by the US CDC to assess the health and nutritional status of the non-institutionalized US population. The study is conducted in two-year waves with approximately 10,000 participants per wave. More information can be found on their website.
The version of the NHANES data that we used in the book can be found here: single-level, multi-level.
The data were collected from NHANES 2011-2012 and NHANES 2013-2014 cohorts. Each participant recruited in these waves was asked to wear a wrist-worn physical activity monitor (Actigraph GT3X+) for seven consecutive days. The raw 80hz tri-axial acceleration data were extracted from the device, processed, and released on the NHANES website in different units (2011-2012, 2013-2014).
The data we used were extracted from the NHANES website and were
organized into an analyzable format following a similar pipeline
described in the rnhanesdata
package. Here is the link to the
github repo of the rnhanesdata
package, which contains the
code for data cleaning and vignettes for data analysis. Note that the
rnhanesdata
package was designed to clean the data
collected from NHANES 2003-2004 and NHANES 2005-2006. Therefore, the
data cleaning steps we performed here were similar but not identical to
those described in the vignette.
Two NHANES data sets are shared with this book. The single-level
version is primarily used in the book. For a participant with multiple
days of minute-level accelerometry data, the data was compressed by take
the average at each minute across available days, resulting in 1,440
observations per participant. The minute-level data were stored in the
MIMS
matrix of the data frame. The standard deviation of
the intensity values at each minute across available days was also
calculated and stored in the MIMS_sd
matrix. This dataset
has 12610 rows and 13 columns.
The uncompressed physical activity data were stored in the
MIMS
column of the multi-level version of the dataset. Each
row represents data collected from one eligible day of one participant.
Therefore, there are multiple rows for each participant. This dataset
has 79910 rows and 6 columns. It is mainly used in the Multilevel
Functional Data Analysis chapter of the book as well as in data
illustrations.