Increasing access to the NHANES 1988-2018 surveys & mortality linkage data via a user-friendly Stata program

Junming Gong, Mu Jin, Sohyeon Kwon, and Xueer Zhang

Background:

The National Health and Nutrition Examination Survey (NHANES) was developed to assess the health and nutritional status of the general population of the United States. The survey covers several components, including demographic characteristics, physical examinations, and laboratory data. The survey sample is a population-representative sample selected through a complex survey design.

Through this project, we aimed to bring Stata work into the open-science community.

capture program drop nhanes

program define nhanes

        preserve

                qui {

                        if 0 { //background:r(mean)

                                // 1. Stata/BE or IC
                                // 2. r(k) < 2048
                                // 3. exam.DAT: r(k) == 2368
                                // 4. inaccessible to jhustata
                                // 5. program to grant access

                        }

                        if 1 { //methods:$keepvars

                                timer on 1

                                global github https://raw.githubusercontent.com/
                                global jhustata jhustata/book/main/
                                global keepvars HSAGEIR BMPHT BMPWT HAZA8AK1 CEP GHP HAB1

                                timer off 1

                        }

                        if 2 { //results:.dofiles

                                timer on 2

                                clear

                                do ${github}${jhustata}nh3mort.do

                                if c(edition_real) == "BE" | c(edition_real) == "IC" {

                                        clear

                                        do ${github}${jhustata}nhanes-alpha-if2.do

                                }

                                else {

                                        clear

                                        do ${github}${jhustata}nhanes-alpha-if0.do

                                }

                                timer off 2

                        }

                        if 3 { //conclusions:queueing

                                timer on 3

                                timer on 31
                                clear
                                do adult.do
                                rename *, lower
                                save adult.dta, replace
                                timer off 31

                                timer on 32
                                clear
                                do exam.do
                                rename *, lower
                                save exam.dta, replace
                                timer off 32

                                timer on 33
                                clear
                                do lab.do
                                rename *, lower
                                save lab.dta, replace
                                timer off 33

                                timer off 3

                        }

                        if 4 { //acknowledge:linkage

                                timer on 4

                                use adult, clear
                                merge 1:1 seqn using exam, nogen
                                merge 1:1 seqn using lab, nogen
                                merge 1:1 seqn using nh3mort, nogen keep(matched)

                                timer off 4

                        }

                        if 5 { //dataset4class:

                                timer on 5

                                compress
                                lab dat "NHANES 1988-1994, survey & mortality"
                                save "nh3andmort.dta", replace

                                timer off 5

                        }

                        if 6 { //survivalanalysis:

                                timer on 6

                                lookfor mort
                                codebook mortstat
                                lookfor follow
                                g years=permth_exm/12

                                lookfor health
                                codebook hab1
                                global subgroup: var lab hab1

                                stset years, fail(mortstat)

                                #delimit ;
                                sts graph if inrange(hab1,1,5),
                                   by(hab1)
                                   fail
                                   ti("Mortality in NHANES III",pos(11))
                                   subti("by self report: ${subgroup}",pos(11))
                                   yti("%",orientation(horizontal))
                                   xti("Years")
                                   per(100)
                                   ylab(0(20)80,
                                       format(%3.0f)
                                       angle(360)
                                   )
                                   legend(on
                                       lab(1 "Excellent")
                                       lab(2 "Good")
                                       lab(3 "Fair")
                                       lab(4 "Bad")
                                       lab(5 "Poor")
                                       ring(0)
                                       pos(11)
                                       col(1)
                                       order(5 4 3 2 1)
                                   )
                                   note("Source: RDC/NCHS/CDC/DHHS")
                                ;
                                #delimit cr

                                graph export nh3andmort.png, replace

                                stcox i.hab1 if inrange(hab1,1,5)

                                timer off 6

                        }

                        noi timer list

                }

        restore

end

. nhanes
   1:      0.01 /        6 =       0.0023
   2:   9643.33 /        5 =    1928.6658
   3:   3519.88 /        5 =     703.9752
   4:     27.87 /        5 =       5.5738
   5:    202.89 /        5 =      40.5782
   6:     24.20 /        5 =       4.8400
  21:      0.00 /        1 =       0.0000
  31:    633.15 /        5 =     126.6294
  32:   2316.87 /        5 =     463.3748
  33:    569.86 /        5 =     113.9710
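
Each row of the `timer list` output above reads as: timer number, total elapsed seconds, number of on/off cycles, and their ratio (mean seconds per cycle). Counts above 1 suggest that the timers accumulated across several calls to `nhanes`. A minimal sketch, assuming per-run timings are wanted, clears the timers first and then reads the stored results; the `sleep` call is only a placeholder for real work.

```stata
* Reset all timers so the totals reflect this run only
timer clear

timer on 1
sleep 200              // placeholder for real work (200 ms)
timer off 1

* timer list leaves total seconds in r(t1) and the cycle count in r(nt1)
quietly timer list
local total = r(t1)
local runs  = r(nt1)
display "total: `total' s over `runs' run(s)"
display "mean:  " `total'/`runs' " s per run"
```

Placing `timer clear` at the top of a program is one option; leaving it out keeps a running average across calls, which is what the output above appears to show.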

Methods:

For Stata/BE or IC users, the current program outputs an NHANES dataset with 22 pre-specified variables. We ran the program in Stata/SE, so the number of variables was not restricted.
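
One way to work within a variable cap is to read only the needed columns from a saved file with `use varlist using`. The sketch below illustrates the idea on a toy file; the file contents and the variable `extra` are hypothetical stand-ins, and the four variables kept are chosen for illustration rather than taken from the 22-variable list.

```stata
* Toy stand-in for a wide file: 3 observations, 5 variables (hypothetical data)
clear
set obs 3
generate seqn       = _n
generate hab1       = _n
generate mortstat   = 0
generate permth_exm = 120
generate extra      = 1
tempfile wide
save `wide'

* Load only the listed variables; the others are never read into memory
use seqn hab1 mortstat permth_exm using `wide', clear
describe, short
display "vars loaded: `c(k)'"
```

With the real data, the same pattern would read `use seqn hab1 mortstat permth_exm using nh3andmort, clear`, assuming `nh3andmort.dta` is in the working directory.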

We created a two-way plot presenting mortality in the United States from 1995 to 2017. To examine general health status by age group during 2017-2018, we also created a simple two-way scatter plot of the mean general health score by age group. This additional plot gives more information about the recent health status of the US population, which may be used to predict mortality in future studies.

General health status was measured as a score (0: poor, 1: fair, 2: good, 3: very good, 4: excellent).
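
The score coding and the group means used in the scatter plot can be sketched as follows. The variable names `health_s`, `age_g`, and `mean_health_s` match those in the plotting code, but the toy data, the label name `health_lbl`, and the use of `egen` are assumptions; how `merged1.dta` was actually built is not shown here.

```stata
* Hypothetical toy data: six respondents in two age groups
clear
input age_g health_s
1 4
1 3
1 2
2 2
2 1
2 0
end

* Attach the value labels described in the text
label define health_lbl 0 "Poor" 1 "Fair" 2 "Good" 3 "Very good" 4 "Excellent"
label values health_s health_lbl

* Mean score within each age group, repeated on every row of that group
egen mean_health_s = mean(health_s), by(age_g)
list, sepby(age_g) noobs
```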

Results:

In the first (mortality) plot, mortality during 1988-2018 was higher among people who self-reported their health as poor or bad.

In the second (scatter) plot, the self-reported health score was higher in the younger age groups in 2017-2018.

. set scheme s2color

. nhanes

. use nh3andmort, clear
(NHANES 1988-1994, survey & mortality)

. di "obs: `c(N)' & vars: `c(k)'"      
obs: 19599 & vars: 3643

. use merged1.dta, clear

. 
. #delimit ;
delimiter now ;
. twoway 
>     scatter health_s age_g, 
>         col(white) 
>         mcolor(%20) 
>         jitter(3) || 
>     dot mean_health_s age_g, 
>         col(blue) 
>         msize(2) 
>         legend(off) 
>         xlab(1 "10s" 
>              2 "20s" 
>              3 "30s" 
>              4 "40s" 
>              5 "50s" 
>              6 "60s") 
>         yline(90, lcol(red) lp(dash)) 
>         xti("Age groups") 
>         yti("General Health Status") 
>         title("General Health Score by Age in 2017-2018 NHANES") 
>         ylabel(0(1)5);

. #delimit cr
delimiter now cr
. graph export twoway_1.png, replace
file twoway_1.png saved as PNG format

Conclusions:

Poorer self-reported health was associated with higher mortality throughout 1988-2018. Younger age groups reported better general health status.

Acknowledgments:

We initially published our Stata output in a Jupyter Book hosted on GitHub. All the .html content of the book was produced in a Python environment; however, Stata .html output will gradually replace the Python-based output of the book as we truly become advanced Stata users!

The VS Code terminal is our tool of choice for committing and pushing our git content to GitHub, and we have established a seamless process for updating our publication.

References:

  1. https://jhustata.github.io/book/jjj.html
  2. https://jupyterbook.org/en/stable/start/your-first-book.html
  3. https://www.stata.com/stata-news/news36-1/spotlight-markdown/
  4. https://wwwn.cdc.gov/nchs/data/nhanes3/1a/adult.sas
  5. https://jhustata.github.io/class700/intro.html
  6. https://www.jhsph.edu/courses/course/37447/2022/340.700.71/advanced-stata-programming
  7. Muzaale AD. Databases for surgical health services research: National Health and Nutrition Examination Survey. Surgery. 2019 May;165(5):873-875.
  8. https://www.ssc.wisc.edu/~hemken/Stataworkshops/dyndoc%20review/Review.html
  9. https://towardsdatascience.com/write-markdown-latex-in-the-jupyter-notebook-10985edb91fd