Genetic Regulators of Sputum Mucin Concentration and Their Associations With COPD Phenotypes

Eric Van Buren, Giorgia Radicioni, Sarah Lester, Wanda K. O’Neal, Hong Dang, Silva Kasela, Suresh Garudadri, Jeffrey L. Curtis, MeiLan K. Han, Jerry A. Krishnan, Emily S. Wan, Edwin K. Silverman, Annette Hastie, Victor E. Ortega, Tuuli Lappalainen, Martijn C. Nawijn, Maarten van den Berge, Stephanie A. Christenson, Yun Li, Michael H. Cho, Mehmet Kesimer, Samir N. P. Kelada


Hyper-secretion and/or hyper-concentration of mucus is a defining feature of multiple obstructive lung diseases, including chronic obstructive pulmonary disease (COPD). Mucus itself is composed of a mixture of water, ions, salt and proteins, of which the gel-forming mucins, MUC5AC and MUC5B, are the most abundant. Recent studies have linked the concentrations of these proteins in sputum to COPD phenotypes, including chronic bronchitis (CB) and acute exacerbations (AE). We sought to determine whether common genetic variants influence sputum mucin concentrations and whether these variants are also associated with COPD phenotypes, specifically CB and AE.


Chronic obstructive pulmonary disease (COPD) is a smoking-related disease that affects more than 200 million people and is the fourth leading cause of death worldwide [1,2]. The disease is characterized by the presence of emphysema and/or chronic bronchitis (CB). Chronic mucus hyper-secretion is a defining phenotype of CB and is associated with airway obstruction due to mucus plugs [3], acute exacerbations (AE) [4], and accelerated loss of lung function over time [5,6].

Materials and methods

The primary analyses presented here are based on study participants in SPIROMICS (ClinicalTrials.gov Identifier: NCT01969344), and a schematic of the SPIROMICS datasets used here is shown in S1 Fig. The study design has been described previously [31]. SPIROMICS participants were genotyped using the Illumina OmniExpress Human Exome Beadchip [32]. Quality control included testing for sex concordance and removing SNPs with high genotype missing rates (>5%) and/or violations of Hardy-Weinberg equilibrium at p < 1 × 10⁻⁶. Genotype imputation was performed using the Michigan Imputation Server [33] with haplotypes from Phase 3 of the 1000 Genomes Project [34].
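The SNP-level filters described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their analysis code is in R and SAS). The function names and the genotype encoding (0/1/2 allele counts, `None` for a missing call) are assumptions made for illustration. The sketch also assumes a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, whereas the study may have used an exact test.

```python
import math


def missing_rate(genotypes):
    """Fraction of missing calls; genotypes are 0/1/2 allele counts or None."""
    return sum(g is None for g in genotypes) / len(genotypes)


def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0  # nothing to test; the missingness filter handles this SNP
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (2 * observed[2] + observed[1]) / (2 * n)  # frequency of the counted allele
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is undefined, so do not flag it
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, max_missing=0.05, hwe_alpha=1e-6):
    """Keep a SNP if its missing rate is <= 5% and its HWE p-value is >= 1e-6."""
    return (missing_rate(genotypes) <= max_missing
            and hwe_chi2_pvalue(genotypes) >= hwe_alpha)
```

In this sketch, a SNP with genotype counts in exact Hardy-Weinberg proportions passes, while a SNP with no heterozygotes among 100 balanced homozygotes, or with 10% missing calls, is removed.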


We then examined associations between rs140324259 and clinical phenotypes in the larger SPIROMICS population for which genotype and clinical data, but not sputum mucin concentration data, were available (n≈1,250). In this sample, rs140324259 was associated with CB at baseline (p = 0.02, Table 2). Consistent with results in the smaller subset of subjects described above, the C allele was associated with increased risk of CB (odds ratio (OR) = 1.42; 95% confidence interval (CI): 1.10–1.80). The effect of rs140324259 on AE in the larger SPIROMICS sample with clinical data was examined using both retrospectively and prospectively ascertained data. rs140324259 was not significantly associated with AE in the year prior to enrollment (S7 Table).
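For readers less familiar with how an odds ratio and its confidence interval relate to a regression coefficient, the following Python sketch shows the standard Wald construction on the log-odds scale. The function name and the input values are hypothetical, and the text does not state which interval method was used, so this is an illustration of the general technique rather than a reproduction of the reported estimate.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to an OR and Wald CI.

    beta: per-allele log-odds estimate from a logistic regression (hypothetical input)
    se:   standard error of beta (hypothetical input)
    z:    normal quantile for the interval (1.96 gives an approximate 95% CI)
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, chosen only for illustration.
or_est, ci_low, ci_high = odds_ratio_with_ci(beta=math.log(2.0), se=0.1)
print(f"OR = {or_est:.2f}; 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of the two CI bounds, and an interval that excludes 1 corresponds to a coefficient that differs from 0 at the matching significance level.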


Surprisingly, rs140324259 does not appear to be an eQTL for MUC5B, though we note that our sample size for eQTL analysis was not large and that the tagSNP we used is not in very high LD with rs140324259. One nearby variant, rs11604917, is intriguing given that it potentially disrupts binding of the transcription factor RBP-J, a key player in the Notch signaling pathway that determines ciliated vs. secretory cell fate in murine airways [19]. This could suggest that the MUC5B pQTL is a function of cell type composition of the airway epithelium, an idea supported by the lack of an association with gene expression. However, this variant is in low LD with rs140324259, and the association of rs11604917 with CB-related phenotypes in the UK Biobank was not nearly as strong as for rs140324259, arguing against a causal role for rs11604917. 


In summary, we identified pQTL for MUC5AC and MUC5B in sputum, demonstrating that common genetic variants influence these biomarkers. The lead MUC5B pQTL, rs140324259, was associated with CB and prospectively ascertained AE in SPIROMICS and was also associated with CB-related phenotypes in the UK Biobank. Additional studies are needed to determine how this variant influences MUC5B concentration in sputum and to further evaluate whether rs140324259 may be a biomarker of CB and AE susceptibility in COPD in other populations.


The authors thank Dr. Wesley Crouse for helpful discussions related to mediation analysis. The authors are grateful to SPIROMICS participants, participating physicians, investigators and staff for making this research possible. More information about the study and how to access SPIROMICS data is available at The authors would like to acknowledge the University of North Carolina at Chapel Hill BioSpecimen Processing Facility for sample processing, storage, and sample disbursements.

Citation: Van Buren E, Radicioni G, Lester S, O’Neal WK, Dang H, Kasela S, et al. (2023) Genetic regulators of sputum mucin concentration and their associations with COPD phenotypes. PLoS Genet 19(6): e1010445.

Editor: Gregory S. Barsh, HudsonAlpha Institute for Biotechnology, UNITED STATES

Received: September 27, 2022; Accepted: April 26, 2023; Published: June 23, 2023

Copyright: © 2023 Van Buren et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Whole genome genotype data are provided in dbGaP under the following accession numbers: phs001927.v1.p1 (SPIROMICS) and phs000951.v5.p5 (COPDGene). More information about the study and how to access SPIROMICS data is available at SPIROMICS subject-level data used here and the corresponding data dictionaries are provided in Supporting Files 3-10. We note, however, that not all subjects consented to sharing their genotype data publicly. Thus, the data we share represent all of the data that can be shared according to the consents provided. The distributions of demographic, exposure, and clinical variables (age, sex, smoking status, disease severity) were not significantly different between subjects who consented to genotype data sharing and those who did not. Summary statistics from genome-wide scans have been deposited in the GWAS catalog under accession numbers GCST90269923, GCST90269924, GCST90269925, GCST90269926, GCST90269927, GCST90269928, GCST90269929, GCST90269930, and GCST90269931. Our analysis code (R and SAS) is provided in Supporting Files 11-12.

Funding: Y.L. was partially supported by NIH grants R01GM105765 and U01DA052713. S.K. and T.L. were supported by NIH grant R01HL142028. MK was supported by R01HL110906. SPIROMICS was supported by contracts from the NIH/NHLBI (HHSN268200900013C, HHSN268200900014C, HHSN268200900015C, HHSN268200900016C, HHSN268200900017C, HHSN268200900018C, HHSN268200900019C, HHSN268200900020C), grants from the NIH/NHLBI (U01HL137880 and U24HL141762), and supplemented by contributions made through the Foundation for the NIH and the COPD Foundation from AstraZeneca/MedImmune; Bayer; Bellerophon Therapeutics; Boehringer-Ingelheim Pharmaceuticals, Inc.; Chiesi Farmaceutici S.p.A.; Forest Research Institute, Inc.; GlaxoSmithKline; Grifols Therapeutics, Inc.; Ikaria, Inc.; Novartis Pharmaceuticals Corporation; Nycomed GmbH; ProterixBio; Regeneron Pharmaceuticals, Inc.; Sanofi; Sunovion; Takeda Pharmaceutical Company; and Theravance Biopharma and Mylan. The COPDGene project described was supported by NHLBI Grants U01HL089856 and U01HL089897. The COPDGene project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion. A full listing of COPDGene investigators can be found at: Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Genome sequencing for “NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene)” (phs000951.v4.p4) was performed at the Northwest Genomics Center (3R01HL089856-08S1) and Broad Institute Genomics Platform (HHSN268201500014C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). 
Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL120393; U01HL120393; contract HHSN268201800001I). S.N.P.K. was partially supported by NIH grant R01 HL122711. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: EKS received grant support from GlaxoSmithKline and Bayer. ESW received honoraria from Pri-Med. JAK reports a grant from U.S. National Institutes of Health (NIH) paid to the institution during the conduct of the study. He also reports personal fees from GlaxoSmithKline consulting on antibodies for acute COVID-19; personal fees from AstraZeneca consulting on antibodies for severe asthma, paid to the institution; personal fees from CereVu Medical consultant for medical device for dyspnea, paid to the institution; and personal fees from BData consultant for severe asthma registry outside the submitted work. JLC reports grants from NIH/NHLBI and the COPD Foundation for the current work and paid to his institution; grants from NIH/NHLBI, the COPD Foundation, NIH/NIAID, and the Departments of Veterans Affairs and of Defense, outside the current work and paid to his institution; and consulting fees from CSL Behring, LLC, AstraZeneca, and Novartis Corp., outside the current work and paid to his institution. MCN has received grant support from GlaxoSmithKline. MHC has received grant support from GlaxoSmithKline and Bayer, consulting fees from AstraZeneca, and speaking fees from Illumina. MK received consulting fees from Arrowhead Pharmaceuticals and has a patent pending for mucin measurements in chronic bronchitis. MKH reports personal fees from GlaxoSmithKline, AstraZeneca, Boehringer Ingelheim, Cipla, Chiesi, Novartis, Pulmonx, Teva, Verona, Merck, Mylan, Sanofi, DevPro, Aerogen, Polarian, Regeneron, Amgen, UpToDate, Altesa Biopharma, Medscape, NACE, MDBriefcase, Integrity and Medwiz. She has received either in kind research support or funds paid to the institution from the NIH, Novartis, Sunovion, Nuvaira, Sanofi, AstraZeneca, Boehringer Ingelheim, Gala Therapeutics, Biodesix, the COPD Foundation and the American Lung Association. 
She has participated in Data Safety Monitoring Boards for Novartis and Medtronic with funds paid to the institution. She has received stock options from Meissa Vaccines and Altesa Biopharma. MVDB reports research grants paid to their institution by GlaxoSmithKline, Novartis, AstraZeneca, Roche and Genentech. SAC received consulting fees from Sanofi/Regeneron, GlaxoSmithKline, AstraZeneca, Glenmark Pharmaceuticals, and Amgen; honoraria from Sanofi/Regeneron, MJH Holdings LLC: Physicians' Education Resource, Sunovion, UpToDate, and Wolters Kluwer Health; travel support from AstraZeneca and GlaxoSmithKline; and sits on the data safety board or advisory boards of Sanofi/Regeneron, AstraZeneca, and GlaxoSmithKline. TL has stock options from Variant Bio. VEO is on the Data Safety Monitoring Board or Advisory Board for Sanofi and Regeneron. ATH, EVB, GR, HD, SG, SL, SK, SNPK, WKO, and YL have no competing interests to declare.
