Overview

The National Health and Nutrition Examination Survey (NHANES) is a large, ongoing study conducted by the US Centers for Disease Control and Prevention (CDC) to assess the health and nutritional status of the non-institutionalized US population. The study is conducted in two-year waves, with approximately 10,000 participants per wave. More information can be found on the NHANES website.

Data Sources and Cleaning

The versions of the NHANES data used in the book can be found here: single-level, multi-level.

The data were collected from the NHANES 2011-2012 and NHANES 2013-2014 cohorts. Each participant recruited in these waves was asked to wear a wrist-worn physical activity monitor (ActiGraph GT3X+) for seven consecutive days. The raw 80 Hz tri-axial acceleration data were extracted from the device, processed, and released on the NHANES website in different units (2011-2012, 2013-2014).

The data we used were extracted from the NHANES website and organized into an analyzable format, following a pipeline similar to the one described in the rnhanesdata package. The GitHub repository of the rnhanesdata package contains the code for data cleaning and vignettes for data analysis. Note that the rnhanesdata package was designed to clean the data collected from NHANES 2003-2004 and NHANES 2005-2006. Therefore, the data cleaning steps we performed were similar but not identical to those described in the vignette.

Data Description

Two NHANES datasets are shared with this book. The single-level version is the one primarily used in the book. For a participant with multiple days of minute-level accelerometry data, the data were compressed by taking the average at each minute across available days, resulting in 1,440 observations per participant (one for each minute of the day). The minute-level averages are stored in the MIMS matrix of the data frame. The standard deviation of the intensity values at each minute across available days was also calculated and is stored in the MIMS_sd matrix. This dataset has 12,610 rows and 13 columns.
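The compression described above can be illustrated with a minimal sketch. It is not the code used to build the shared datasets. The participant identifiers, the toy values, and the compress function are hypothetical, and only 4 minutes per day are used instead of 1,440 to keep the example short. The sketch also assumes that every day is complete and that the sample standard deviation is wanted, which is undefined for a participant with a single available day.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical multi-day records: (participant_id, day, minute-level values).
# Real daily profiles have 1,440 values; 4 are used here for illustration.
records = [
    ("A", 1, [0.0, 2.0, 4.0, 6.0]),
    ("A", 2, [2.0, 4.0, 6.0, 8.0]),
    ("B", 1, [1.0, 1.0, 1.0, 1.0]),
]


def compress(records):
    """Collapse day-level profiles into one mean and one SD profile per participant."""
    by_participant = defaultdict(list)
    for participant_id, _day, values in records:
        by_participant[participant_id].append(values)

    summary = {}
    for participant_id, days in by_participant.items():
        # zip(*days) groups the values recorded at the same minute across days.
        per_minute = list(zip(*days))
        means = [mean(minute) for minute in per_minute]
        # Assumption: sample SD, which needs at least two days; NaN otherwise.
        sds = [
            stdev(minute) if len(minute) > 1 else float("nan")
            for minute in per_minute
        ]
        summary[participant_id] = {"MIMS": means, "MIMS_sd": sds}
    return summary


summary = compress(records)
print(summary["A"]["MIMS"])  # [1.0, 3.0, 5.0, 7.0]
```

Each participant ends up with one averaged profile and one standard deviation profile, mirroring the MIMS and MIMS_sd matrices, regardless of how many days contributed.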

The uncompressed physical activity data are stored in the MIMS column of the multi-level version of the dataset. Each row represents the data collected on one eligible day from one participant, so there are multiple rows for each participant. This dataset has 79,910 rows and 6 columns. It is mainly used in the Multilevel Functional Data Analysis chapter of the book, as well as in data illustrations.