Published on in Vol 10, No 5 (2021): May

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/27065, first published .
Using Computational Methods to Improve Integrated Disease Management for Asthma and Chronic Obstructive Pulmonary Disease: Protocol for a Secondary Analysis

Protocol

1Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States

2Department of Pediatrics, University of Utah, Salt Lake City, UT, United States

3College of Nursing, University of Utah, Salt Lake City, UT, United States

4Care Transformation and Information Systems, Intermountain Healthcare, West Valley City, UT, United States

5Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States

Corresponding Author:

Gang Luo, DPhil

Department of Biomedical Informatics and Medical Education

University of Washington

UW Medicine South Lake Union

850 Republican Street, Building C, Box 358047

Seattle, WA, 98195

United States

Phone: 1 206 221 4596

Fax: 1 206 221 2671

Email: gangluo@cs.wisc.edu


Background: Asthma and chronic obstructive pulmonary disease (COPD) impose a heavy burden on health care. Approximately one-fourth of patients with asthma and patients with COPD are prone to exacerbations, which can be greatly reduced by preventive care via integrated disease management that has a limited service capacity. To do this well, a predictive model for proneness to exacerbation is required, but no such model exists. It would be suboptimal to build such models using the current model building approach for asthma and COPD, which has 2 gaps due to rarely factoring in temporal features showing early health changes and general directions. First, existing models for other asthma and COPD outcomes rarely use more advanced temporal features, such as the slope of the number of days to albuterol refill, and are inaccurate. Second, existing models seldom show the reason a patient is deemed high risk and the potential interventions to reduce the risk, forcing already busy clinicians to spend more time on chart review and causing them to overlook suitable interventions. Regular automatic explanation methods cannot handle temporal data and thus do not address this issue well.

Objective: To enable more patients with asthma and patients with COPD to obtain suitable and timely care to avoid exacerbations, we aim to implement comprehensible computational methods to accurately predict proneness to exacerbation and recommend customized interventions.

Methods: We will use temporal features to accurately predict proneness to exacerbation, automatically find modifiable temporal risk factors for every high-risk patient, and assess the impact of actionable warnings on clinicians’ decisions to use integrated disease management to prevent proneness to exacerbation.

Results: We have obtained most of the clinical and administrative data of patients with asthma from 3 prominent American health care systems. We are retrieving other clinical and administrative data, mostly of patients with COPD, needed for the study. We intend to complete the study in 6 years.

Conclusions: Our results will help make asthma and COPD care more proactive, effective, and efficient, improving outcomes and saving resources.

International Registered Report Identifier (IRRID): PRR1-10.2196/27065

JMIR Res Protoc 2021;10(5):e27065

doi:10.2196/27065


The Gap in Identifying Patients With Exacerbation-Prone Asthma and Patients With Exacerbation-Prone Chronic Obstructive Pulmonary Disease for Preventive Care

Management of Asthma and Chronic Obstructive Pulmonary Disease

In the United States, 9.6% of children and 8% of adults have asthma, leading to 1.8 million emergency department visits, 493,000 inpatient stays, US $56 billion in cost, and 3630 deaths every year [1-4]. Approximately 6.5% of adults have chronic obstructive pulmonary disease (COPD), the third leading cause of death, leading to 1.5 million emergency department visits, 0.7 million inpatient stays, and US $32 billion in cost every year [5]. One main goal in managing patients with asthma and patients with COPD is to reduce exacerbations, which expend approximately 40% to 75% of their total care cost [6-8] and accelerate their lung function decline [9]. Approximately one-fourth of patients with asthma and patients with COPD are prone to exacerbation [10-14], meaning that a patient has (1) ≥2 systemic corticosteroid orders in a year or (2) ≥1 emergency department visit or inpatient stay for asthma or COPD with systemic corticosteroid treatment in a year (Figure 1) [10,13,15]. These patients incur approximately two-thirds of all exacerbations [12,13,16] and experience a low quality of life; sleep disturbance; limitations of daily activities impacting independence, relationships, family life, socialization, and career; anxiety; distress; missed work with lost earnings; missed school; high care costs; high hospital use; intubation; and death [10,17-19]. Even a brief use of systemic corticosteroids to treat exacerbations can greatly increase the risk of venous thromboembolism, sepsis, and fracture [20,21].

Figure 1. Determining when a patient with asthma or chronic obstructive pulmonary disease becomes prone to exacerbation. COPD: chronic obstructive pulmonary disease.

Many health care systems and health plans use predictive models as the best method [22] to identify high-risk patients for preventive care to improve outcomes and save resources [23-25]. For instance, this is the case with health plans in 9 of the 12 American metropolitan communities mentioned in the study by Mays et al [26]. However, no model exists to predict proneness to exacerbation, which only partly correlates with disease severity [16]. Exacerbation-prone patients are currently identified after exacerbations occur, making it too late to apply integrated disease management (IDM) for preventing exacerbations. IDM is defined as “a group of coherent interventions, designed to prevent or manage 1 or more chronic conditions using a community wide, systematic and structured multidisciplinary approach potentially employing multiple treatment modalities” [27]. IDM typically has several components, such as self-management education, skills training, care management, and structured follow-up [28,29]. Although IDM has a limited service capacity [29-33], it can lower hospital use by up to 40%; cut costs by up to 31%; greatly reduce symptoms; and enhance treatment adherence, patient satisfaction, and quality of life by 30%-60% [26,28-32,34-42]. Neither patient registries nor dashboards can identify exacerbation-prone patients before exacerbations occur; thus, neither enables IDM to be applied in a timely manner. A patient registry tracks a given patient cohort but cannot make predictions. Although many attributes are often needed to achieve high prediction accuracy [43-45], a dashboard tracks only a few attributes. To have prediction capability, a dashboard needs to be supported by a predictive model in the backend. Models for proneness to exacerbation are needed to guide the use of IDM and to prevent exacerbations.
This cannot be done well with the current model building approach for other asthma and COPD outcomes, which has 2 major gaps due to the limited use of temporal features showing early health changes and general directions [46-94]. Each temporal feature is an independent variable computed on one or more longitudinal attributes, such as the slope of pulmonary function last year, the slope of BMI last year, the number of days in the previous week during which the sulfur dioxide level was ≥4 parts per million, and whether the patient’s filling frequency of oral corticosteroid prescription increased over time. Although this study focuses on exacerbation-prone asthma and COPD as use cases, the proposed computing techniques and software can be harnessed to forecast outcomes of other diseases such as congestive heart failure and diabetes, with temporal features such as the slopes of cardiac function and blood glucose level over time.
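To make the notion of a temporal feature concrete, the following minimal Python sketch computes two of the example features named above: a slope (here, of the number of days between consecutive albuterol refills) and a count of days on which a pollutant level met a threshold. The function names, the use of ordinary least squares against refill order, and the sample values are illustrative assumptions and are not taken from the study.

```python
from datetime import date


def ols_slope(xs, ys):
    """Return the least-squares slope of ys on xs, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def refill_gap_slope(fill_dates):
    """Slope of the days between consecutive refills, per refill (assumed definition)."""
    dates = sorted(fill_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return ols_slope(list(range(len(gaps))), gaps)


def days_at_or_above(daily_values, threshold):
    """Number of days whose recorded value is at or above the threshold."""
    return sum(1 for value in daily_values if value >= threshold)


# Hypothetical data: refill gaps of 30, 40, and 50 days, and one week of daily readings.
fills = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 3, 11), date(2020, 4, 30)]
so2_last_week = [1.0, 4.0, 5.2, 0.3, 4.0, 2.0, 6.1]

slope = refill_gap_slope(fills)
high_days = days_at_or_above(so2_last_week, 4.0)
```

A positive slope here means that refills are becoming less frequent over time; whether that direction signals higher or lower risk is a modeling question that the sketch does not address.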

Gap 1: Low Prediction Accuracy

Existing models for predicting an individual asthma or COPD patient’s health outcomes typically have low accuracy [46-94]. The systematic review by Loymans et al [52] and our review [43] showed that for forecasting hospital use (emergency department visits and inpatient stays) for asthma in patients with asthma, each previous model, excluding the models of Zein et al [58], has an area under the receiver operating characteristic curve (AUC) within 0.61-0.81, a sensitivity within 25%-49%, and a positive predictive value within 4%-22% [46-57]. The models of Zein et al [58] and our recent new models [43-45] have similarly higher accuracy but are still not good enough for aligning preventive care with the patients needing it the most. The case with COPD is similar [59-94].

Existing models for predicting asthma and COPD outcomes typically have low accuracy for several reasons:

  1. Existing models use elementary temporal features such as the count of inpatient stays and ever intubated last year, but they rarely use more advanced temporal features such as the slope of the number of days to albuterol refill showing general directions. Many highly predictive temporal features are yet to be identified or are unused. In 2018, Google used all of the attributes in the electronic medical record along with long short-term memory (LSTM) [95,96], one type of deep neural network, to discover temporal features automatically from longitudinal data [97]. This raised the AUC by approximately 10% for projecting each of long hospital stay, in-hospital mortality, and unanticipated readmissions in 30 days [97]. Several other studies [98-100] obtained similar results for various clinical prediction tasks. This matches recent progress in areas such as video classification, speech recognition, and natural language processing, where the temporal features that LSTM automatically discovered from data beat both those that experts provided and those that other temporal and sequential pattern mining methods [101-104] mined from data. The LSTM model of Xiang et al for predicting asthma outcome [57] had a low AUC of 0.7 because it used only 3 types of attributes and mostly inpatient data without much outpatient data, not because LSTM is ineffective.
  2. Although >100 potential risk factors for poor outcomes in asthma and COPD are known [50-52,105-112], a typical previous model uses only a few (eg, ≤17) [46-57,59-93]. None of the published models adopt all established risk factors contained in contemporary electronic medical records [113].
  3. Weather and air quality variables impact asthma and COPD outcomes [114-117], but they are seldom used in existing models.

Gap 2: No Information Given on the Reason Why a Patient Is Deemed High Risk and the Potential Interventions to Reduce the Risk

To provide preventive care well, clinicians need to know the reason a patient is deemed high risk and the potential interventions to reduce the risk. Sophisticated predictive models, including the bulk of machine learning models such as LSTM, are black boxes and provide no such information, although explanation is critical for users’ acceptance, satisfaction, trust, and decision correctness [118-121]. Often, a patient’s clinical records include numerous variables on many pages recorded over multiple years [122]. As the model gives no explanation, already occupied clinicians need to expend extra time on chart review to identify the reasons. This is difficult and time consuming. In fact, the black-box issue has been a major reason for the slow adoption of machine learning in clinical practice, despite machine learning often producing the highest prediction accuracy among all predictive modeling methods [33,123-127].

A clinician can develop a care plan using subjective, variable clinical judgment. However, this care plan often misses some suitable interventions because of the following reasons:

  1. Large practice variation, frequently 1.6- to 5.6-fold, exists across facilities, clinicians, and regions [128-135].
  2. A patient can become high risk for many reasons, each shown by a risk pattern given by a feature combination, for example, the sulfur dioxide level was ≥4 parts per million for ≥4 days in the previous week and the number of days to albuterol refill rose over 12 months. Many features and feature combinations exist. A clinician, being human, can typically process ≤9 information items at once [136] and can easily miss some key reasons for which the patient is high risk. Outcomes can degrade if suitable interventions are not used. Regular automatic explanation methods [137-140] cannot handle longitudinal data and thus do not address this issue well.

Our Proposed Solutions

To enable more patients with asthma and patients with COPD to obtain suitable and timely care to prevent exacerbations, we will (1) use temporal features to develop the first set of models to accurately predict exacerbation-prone asthma and COPD, (2) automate finding modifiable temporal risk factors for every high-risk patient, and (3) assess the impact of actionable warnings on clinicians’ decisions to use IDM to prevent proneness to exacerbation.

Innovation

We will develop new techniques to automate the extraction of temporal features from longitudinal data and explain machine learning predictions on longitudinal data. We will improve preventive care, notably for asthma and COPD, by steering it to the patients who need it, more precisely and in a more timely manner than the current risk modeling methods allow:

  1. To the best of our knowledge, this study will construct the first set of models to predict which patients with asthma and which patients with COPD will be prone to exacerbation. Currently, these patients are found after exacerbations occur, making it too late to apply IDM for preventing exacerbations. This is a major public health issue [29,31,32]. Our models can improve IDM and guide its use to avert exacerbations. Compared with the current model building method for other asthma and COPD outcomes that often produces low accuracy, our model building method will lead to more accurate predictions.
  2. To the best of our knowledge, this will be the first study to extract comprehensible and predictive temporal features semiautomatically from longitudinal data without needing any manually prespecified pattern template, which is required by many sequential and temporal pattern mining methods [102-104]. This helps raise the model accuracy and reduce the effort required to construct clinically usable models. At present, clinicians usually have to manually identify such features to construct such models. However, this is time consuming and difficult. Previous models for asthma and COPD rarely use more advanced temporal features, such as slope [46-94]. In addition, although current deep neural network methods can automatically discover temporal features, the discovered features are hidden in neurons and are often incomprehensible, making it difficult to explain the predictions [137,138].
  3. To the best of our knowledge, this will be the first study to automate giving rule-formed explanations for machine learning predictions directly on longitudinal data. Clinicians need explanations to understand the predictions and decide on IDM enrollment and interventions. Rule-formed explanations are easier to comprehend and can better hint at actionable interventions than other forms of automatic explanations. Most automatic explanation methods [137,138] for machine learning predictions cannot deal with longitudinal data. Our previous automatic explanation method [140-142] is no exception. It has 5 hyperparameters whose effective values vary by modeling problem and data set. A computing expert often requires several months of laborious trials to find these values for a data set. We will improve our previous method to deal with longitudinal data and to select hyperparameter values automatically and efficiently, so that health care researchers with limited computing expertise can use our method with low overhead.
  4. To the best of our knowledge, this will be the first study to automate finding modifiable temporal risk factors and recommending interventions on the basis of objective data, making IDM more efficient and effective. At present, clinicians rely on subjective, variable judgment to create care plans manually and overlook some suitable interventions for high-risk patients.
  5. To the best of our knowledge, this will be the first study to assess the impact of actionable warnings on clinicians’ decisions to use IDM to prevent proneness to exacerbation.

Computing Resources

We will conduct all experiments on a password-protected and encrypted computer cluster hosted at the University of Washington Medicine (UWM). With appropriate authorization and using their university computers, all research team members and test participants at UWM can remotely access this computer cluster.

Data Sets

All data that will be used in this study are structured. We will use clinical and administrative data stored in the enterprise data warehouses of 3 prominent American health care systems: UWM, Kaiser Permanente Southern California (KPSC), and Intermountain Healthcare (IH). We will use >200 clinical and administrative variables listed in our papers’ [43-45] appendices, with differing names of the same concept in distinct electronic medical record systems already manually matched by us. These variables cover a wide range of aspects, such as patient demographics, encounters, medications, laboratory tests, diagnoses, procedures, vital signs, and allergies. We can form the temporal features of most variables, which are longitudinal with timestamps.

In Utah, IH is the largest health care system, with 24 hospitals and 215 clinics. As in our previous work on asthma outcome prediction [43-45], an IH data analyst will run Oracle database queries to retrieve a deidentified IH data set (eg, shift dates, replace identifiers, and replace ages that are ≥90 years) and use Secure Shell (SSH) to encrypt it and transfer it to the password-protected and encrypted computer cluster, where we will perform analysis. The IH data set covers patient encounters from 2005 to 2020. For the previous 5 years, data for children cover >5000 pediatric patients with asthma (aged <18 years) per year. Data for adults cover >14,000 adult patients with asthma (aged ≥18 years) and >6000 adult patients with COPD per year. IH expends many resources on data integrity and accuracy. Owing to its large size and variable richness [143], the data set offers many advantages for exploring the proposed methods.
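The three deidentification steps named above (shifting dates, replacing identifiers, and replacing ages that are ≥90 years) can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the record layout, the salted SHA-256 pseudonym, the caller-supplied shift, and capping ages at 90. The actual procedure used by the data analysts is not described in this protocol.

```python
import hashlib
from datetime import date, timedelta

AGE_CAP = 90  # assumed replacement value for ages that are >= 90 years


def deidentify(record, salt, shift_days):
    """Return a copy of one encounter record with the 3 example steps applied.

    record: dict with hypothetical keys "patient_id", "encounter_date", "age".
    salt: secret string kept by the data holder (hypothetical).
    shift_days: date offset; a real pipeline would keep it constant per patient
    so that intervals between a patient's encounters are preserved.
    """
    digest = hashlib.sha256((salt + record["patient_id"]).encode("utf-8")).hexdigest()
    return {
        "patient_id": digest[:16],  # replace the identifier with a pseudonym
        "encounter_date": record["encounter_date"] + timedelta(days=shift_days),
        "age": min(record["age"], AGE_CAP),  # replace ages that are >= 90 years
    }


raw = {"patient_id": "MRN-0001", "encounter_date": date(2019, 3, 14), "age": 93}
clean = deidentify(raw, salt="hypothetical-secret", shift_days=-17)
```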

UWM and KPSC have similar strengths. In Washington, UWM is the largest academic health care system, with 4 hospitals and 12 clinics for adults. A UWM data analyst will execute SQL Server database queries to retrieve a deidentified UWM data set (eg, shift dates, replace identifiers, and replace ages that are ≥90 years) and use SSH to encrypt it and transfer it to the password-protected and encrypted computer cluster. The UWM data set covers adult patient encounters from 2011 to 2020. For the previous 5 years, data cover >12,000 adult patients with asthma and >5000 adult patients with COPD per year.

In Southern California, KPSC is the largest integrated health care system, with 15 hospitals and 231 clinics [144]. A KPSC data analyst will run database queries to retrieve a deidentified KPSC data set (eg, shift dates, replace identifiers, and replace ages that are ≥90 years) and use SSH to encrypt it and transfer it to the password-protected and encrypted computer cluster. The KPSC data set covers patient encounters from 2009 to 2020. For the previous 5 years, data for children cover >77,000 pediatric patients with asthma per year. Data for adults cover >172,000 adult patients with asthma and >78,000 adult patients with COPD per year.

In addition to the clinical and administrative data, we will adopt 11 weather and air quality variables that we have downloaded from public sources [145,146]: daily mean particulate matter ≤2.5 μm in diameter, daily maximum 8-hour carbon monoxide, daily mean particulate matter ≤10 μm in diameter, daily maximum 8-hour ozone, daily maximum 1-hour nitrogen dioxide, daily maximum 1-hour sulfur dioxide, hourly mean precipitation, hourly mean relative humidity, hourly mean wind speed, hourly mean temperature, and hourly mean dew point. These variables were recorded over 16 years (2005-2020) by monitoring stations located in the areas covered by IH, UWM, and KPSC.

The following discussion focuses on asthma. Whenever we refer to asthma, the same applies to COPD.

Aim 1: Use Temporal Features to Accurately Predict Exacerbation-Prone Asthma and COPD

We will extract comprehensible and predictive temporal features semiautomatically from patient, weather, and air quality data and construct models to predict proneness to exacerbation. Each feature uses ≥1 raw variable. There is an almost infinite number of possible features. Traits of pediatric patients’ parents and other factors could also impact patient outcomes. Our goal is not to test all possible useful features and obtain the theoretically maximum possible prediction accuracy. Instead, we intend to show that temporal features can be used to improve prediction accuracy and IDM. We will create a separate model for every disease and health care system pair. This study will focus on associations, as is sufficient for decision support for IDM and common with predictive modeling.

Data Preprocessing

All data sets will be converted into the Observational Medical Outcomes Partnership (OMOP) common data model format [147] and its linked standardized terminologies [148]. Much of the UWM data are already in this format. IH and KPSC have provided their data in an internal normalized format that is similar to this format. We will expand the data model to include patient, weather, and air quality variables that the original data model misses but exist in our data sets. We will use the method described in our paper [149] to choose the most pertinent laboratory tests. To reduce the number of features, we will use the Agency for Healthcare Research and Quality Clinical Classifications Software system [150,151] to merge diseases, use the Berenson-Eggers Type of Service system [152] to merge procedures, and use the Hierarchical Ingredient Code 3 system [153] to merge drugs. We will adopt the method used in our previous work [43-45] to identify, correct, or delete invalid values. To deal with missing values, we will test various imputation techniques [154,155], such as the last observation carried forward, replacement with mean values, and replacement with median values, and use the technique that works the best.
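As an illustration of the three imputation techniques named above, the sketch below applies each to one longitudinal series in which missing values are represented by None. The representation and function names are assumptions; the choice among techniques would in practice be made by comparing downstream model performance, which the sketch does not do.

```python
from statistics import mean, median


def last_observation_carried_forward(values):
    """Fill each missing value with the most recent observed one.

    Leading missing values stay None because nothing precedes them.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_with_statistic(values, statistic):
    """Fill missing values with a statistic (eg, mean or median) of the observed ones."""
    observed = [value for value in values if value is not None]
    if not observed:
        return list(values)
    fill = statistic(observed)
    return [fill if value is None else value for value in values]


# Hypothetical series of one laboratory value over 6 time points.
series = [None, 2.0, None, 4.0, None, 9.0]

locf_series = last_observation_carried_forward(series)
mean_series = fill_with_statistic(series, mean)
median_series = fill_with_statistic(series, median)
```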

The patient, weather, and air quality variables will be used. The patient variables will cover standard variables studied in the clinical predictive modeling literature [128,129,154], such as diagnoses, and >100 known potential risk factors for poor asthma outcomes listed in our papers [43-45,156]. One such risk factor is the frequency of nighttime awakening recorded on the validated Asthma Control Test questionnaire [157] in the electronic medical record system. For weather and air quality variables, we will perform inverse distance weighting spatial interpolation [158] to compute their daily average values at the patient’s residence zip code from their values at local monitoring stations, as we and others did before for asthma outcome prediction [159-161].
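Inverse distance weighting estimates the value at a target location as a weighted average of station readings, with each weight proportional to 1/distance^p. The sketch below is a minimal version under simplifying assumptions that are not taken from the cited method: planar coordinates (eg, kilometers on a local grid) rather than geographic distances, a power of 2, and a single representative point for the zip code.

```python
import math


def inverse_distance_weighting(target, stations, power=2.0):
    """Interpolate a value at target from station readings.

    target: (x, y) location, eg, a zip code centroid (assumed representation).
    stations: list of ((x, y), value) pairs for one variable on one day.
    power: distance exponent; 2 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        raise ValueError("at least one station reading is required")
    return weighted_sum / weight_total


# Hypothetical daily ozone readings at 2 stations, 1 and 2 distance units from the target.
stations = [((0.0, 0.0), 10.0), ((3.0, 0.0), 40.0)]
estimate = inverse_distance_weighting((1.0, 0.0), stations)
```

In this example, the nearer station receives 4 times the weight of the farther one, so the estimate lies much closer to its reading.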

Asthma and COPD Cases and Outcomes

We will implement and test our method using (1) pediatric asthma, (2) adult asthma, and (3) COPD. We will use our previous method [44] adapted from the literature [47,162,163] to identify patients with asthma. We deem a patient to have asthma in a given year if the patient has ≥1 asthma diagnosis code (International Classification of Diseases, Ninth Revision [ICD-9] 493.x or International Classification of Diseases, Tenth Revision [ICD-10] J45 and J46.x) in the year. The outcome is whether the patient became prone to exacerbation (ie, had either ≥2 systemic corticosteroid orders or ≥1 emergency department visit or inpatient stay with a principal diagnosis of asthma and systemic corticosteroid treatment) in the following year [10,15].
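The case definition and outcome above reduce to two simple checks, sketched below. The prefix matching of diagnosis codes, the function names, and the assumption that the counts have already been restricted to the correct year and to encounters meeting the stated criteria are illustrative choices, not details given in the protocol.

```python
ASTHMA_CODE_PREFIXES = ("493", "J45", "J46")  # ICD-9 493.x; ICD-10 J45 and J46.x


def has_asthma_diagnosis(codes_in_year):
    """True if at least 1 diagnosis code recorded in the year is an asthma code."""
    return any(code.strip().upper().startswith(ASTHMA_CODE_PREFIXES)
               for code in codes_in_year)


def is_exacerbation_prone(steroid_orders, acute_visits_with_steroid):
    """Apply the outcome definition to counts from the following year.

    steroid_orders: number of systemic corticosteroid orders.
    acute_visits_with_steroid: number of emergency department visits or inpatient
    stays with a principal diagnosis of asthma and systemic corticosteroid treatment.
    """
    return steroid_orders >= 2 or acute_visits_with_steroid >= 1


def label_patient(codes_in_index_year, steroid_orders_next_year, acute_visits_next_year):
    """Return the outcome label, or None if the patient is not an asthma case."""
    if not has_asthma_diagnosis(codes_in_index_year):
        return None
    return is_exacerbation_prone(steroid_orders_next_year, acute_visits_next_year)


label = label_patient(["J45.909", "E11.9"], steroid_orders_next_year=1,
                      acute_visits_next_year=1)
```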

We will use our previous method [164] adapted from the literature [165-168] to identify patients with COPD. As shown in Figure 2, we deem a patient to have COPD if the patient is aged ≥40 years and fulfills any of the following 4 conditions:

  1. An outpatient visit diagnosis code of COPD (ICD-9: 491.22, 491.21, 491.9, 491.8, 493.2x, 492.8, and 496; ICD-10: J42, J41.8, J44.x, and J43.x), followed by ≥1 prescription of long-acting muscarinic antagonists (aclidinium, glycopyrrolate, tiotropium, and umeclidinium) within 6 months
  2. ≥1 emergency department or ≥2 outpatient visit diagnosis codes of COPD (ICD-9: 491.22, 491.21, 491.9, 491.8, 493.2x, 492.8, and 496; ICD-10: J42, J41.8, J44.x, and J43.x)
  3. ≥1 inpatient stay discharge with a principal diagnosis code of COPD (ICD-9: 491.22, 491.21, 491.9, 491.8, 493.2x, 492.8, and 496; ICD-10: J42, J41.8, J44.x, and J43.x)
  4. ≥1 inpatient stay discharge with a principal diagnosis code of respiratory failure (ICD-9: 518.82, 518.81, 799.1, and 518.84; ICD-10: J96.0x, J80, J96.9x, J96.2x, and R09.2) and a secondary diagnosis code of acute COPD exacerbation (ICD-9: 491.22, 491.21, 493.22, and 493.21; ICD-10: J44.1 and J44.0) [164].
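The 4 conditions above can be sketched as a single rule-based check. The code lists are copied from the conditions, with "x" suffixes handled by prefix matching. Several details are assumptions made only for illustration: the visit record layout, treating any diagnosis on an outpatient or emergency department visit as a qualifying code, approximating "within 6 months" as 183 days, and evaluating age once for the patient.

```python
from datetime import date, timedelta

COPD = ("491.22", "491.21", "491.9", "491.8", "493.2", "492.8", "496",
        "J42", "J41.8", "J44", "J43")
RESPIRATORY_FAILURE = ("518.82", "518.81", "799.1", "518.84",
                       "J96.0", "J80", "J96.9", "J96.2", "R09.2")
ACUTE_COPD_EXACERBATION = ("491.22", "491.21", "493.22", "493.21", "J44.1", "J44.0")
LAMAS = {"aclidinium", "glycopyrrolate", "tiotropium", "umeclidinium"}


def _hit(code, prefixes):
    """True if the code starts with any of the listed code prefixes."""
    return code.strip().upper().startswith(prefixes)


def _any_dx(visit, prefixes):
    """True if the principal or any secondary diagnosis matches."""
    return any(_hit(code, prefixes) for code in [visit["principal"], *visit["secondary"]])


def has_copd(age, visits, prescriptions, window_days=183):
    """visits: dicts with "setting" ("outpatient", "ed", or "inpatient"), "date",
    "principal", and "secondary"; prescriptions: (date, drug name) pairs."""
    if age < 40:
        return False
    outpatient = [v for v in visits if v["setting"] == "outpatient" and _any_dx(v, COPD)]
    emergency = [v for v in visits if v["setting"] == "ed" and _any_dx(v, COPD)]
    inpatient = [v for v in visits if v["setting"] == "inpatient"]
    lama_dates = [d for d, drug in prescriptions if drug.lower() in LAMAS]
    window = timedelta(days=window_days)

    rule1 = any(v["date"] <= d <= v["date"] + window
                for v in outpatient for d in lama_dates)
    rule2 = len(emergency) >= 1 or len(outpatient) >= 2
    rule3 = any(_hit(v["principal"], COPD) for v in inpatient)
    rule4 = any(_hit(v["principal"], RESPIRATORY_FAILURE)
                and any(_hit(code, ACUTE_COPD_EXACERBATION) for code in v["secondary"])
                for v in inpatient)
    return rule1 or rule2 or rule3 or rule4


# Hypothetical patient: 1 outpatient COPD visit followed by tiotropium 30 days later.
visit = {"setting": "outpatient", "date": date(2019, 1, 10),
         "principal": "J44.9", "secondary": []}
meets_definition = has_copd(65, [visit], [(date(2019, 2, 9), "Tiotropium")])
```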

Figure 2. Determining when a patient starts to have chronic obstructive pulmonary disease. COPD: chronic obstructive pulmonary disease.

The outcome is whether the patient became prone to exacerbation (ie, had either ≥2 systemic corticosteroid orders or ≥1 emergency department visit or inpatient stay with a principal diagnosis of COPD and systemic corticosteroid treatment) in the following year [13].

Extracting Temporal Features

We will adopt the method described in our design paper [149] to extract comprehensible and predictive temporal features semiautomatically from longitudinal data. In aim 1, we will use the extracted features to construct the final predictive models. In aim 2, we will use the extracted features to automate finding modifiable temporal risk factors for every high-risk patient. The main idea of our temporal feature extraction method is to build a so-called multi-component LSTM deep neural network model on longitudinal data, use a so-called exclusive group Lasso (least absolute shrinkage and selection operator) regularization method to restrict the number of attributes used in each component LSTM network, and then perform visualization to identify comprehensible temporal features from certain cell vector elements in each component LSTM network. The final step of using visualization to identify temporal features and providing their definitions involves humans and is semiautomatic. All other steps are automatic. Our temporal feature extraction method is general and can be used for many clinical applications. Our method has never been implemented in computer code. In addition, some of its technical details are not provided in our design paper [149]. In this study, we will fill in all of the missing technical details and code and test this method.
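To give a sense of how a regularizer can restrict the number of attributes used in each component network, the sketch below computes one generic penalty that combines the exclusive Lasso and group Lasso ideas. It is an assumption-laden illustration, not the formulation in the design paper [149], whose technical details are not reproduced here: the weights feeding one attribute into one component form a group measured by its L2 norm, the group norms within a component are summed, and the squared sums are added across components.

```python
import math


def exclusive_group_lasso_penalty(weights):
    """Generic exclusive-group-Lasso-style penalty (illustrative form only).

    weights[c][a] is the list of input weights connecting attribute a to
    component network c (hypothetical layout). Summing group norms within a
    component encourages few attributes per component; squaring the sum keeps
    every component in use instead of letting some shrink to zero.
    """
    total = 0.0
    for component in weights:
        sum_of_group_norms = sum(
            math.sqrt(sum(w * w for w in attribute_weights))
            for attribute_weights in component
        )
        total += sum_of_group_norms ** 2
    return total


# Hypothetical weights: 2 component networks, 2 attributes, 2 weights per attribute.
weights = [
    [[3.0, 4.0], [0.0, 0.0]],  # component 1 uses only attribute 1
    [[1.0, 0.0], [0.0, 2.0]],  # component 2 uses both attributes
]
penalty = exclusive_group_lasso_penalty(weights)
```

In training, such a penalty would be multiplied by a tuning constant and added to the prediction loss; that step and the LSTM itself are omitted here.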

The Final Predictive Models in Aim 1

We will use the extracted temporal features, such as the slope of the number of days to albuterol refill, to transform longitudinal data into tabular data, producing 1 column per temporal feature, and add static features. We will place no artificial upper or lower bound and use as many features as needed (likely several dozen to several hundred features based on our previous experience [43-45]). Our data are relatively balanced [10-14]. We will harness Weka [169], a major open-source machine learning toolkit, to create the final models in aim 1. As aim 2 shows, these models are suitable for automatic explanations. Weka implements many classic machine learning algorithms and feature selection techniques. We will adopt supervised algorithms and our previous method [170] to automate selection of the machine learning algorithm, feature selection technique, and hyperparameter values out of all applicable ones. When needed, we will manually perform fine-tuning.
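The transformation from longitudinal to tabular data can be pictured as applying each temporal feature definition to a patient's series and writing the result into its own column next to the static features. The sketch below assumes a hypothetical in-memory layout and two simple stand-in feature definitions; it is independent of Weka and of the actual extracted features.

```python
def to_row(patient, temporal_features, static_keys):
    """Build one tabular row: static features plus 1 column per temporal feature.

    patient: dict with "static" (name -> value) and "series" (name -> ordered values).
    temporal_features: column name -> (source variable, function of its series).
    """
    row = {key: patient["static"][key] for key in static_keys}
    for column, (variable, compute) in temporal_features.items():
        row[column] = compute(patient["series"].get(variable, []))
    return row


def change_first_to_last(series):
    """Last value minus first value, or None with fewer than 2 observations."""
    return series[-1] - series[0] if len(series) >= 2 else None


# Hypothetical feature definitions and patient.
temporal_features = {
    "ed_visit_count_last_year": ("ed_visit_dates", len),
    "bmi_change_last_year": ("bmi", change_first_to_last),
}
patient = {
    "static": {"sex": "F", "age": 34},
    "series": {"ed_visit_dates": ["2019-02-01", "2019-09-15"], "bmi": [27.5, 28.0, 29.5]},
}
row = to_row(patient, temporal_features, static_keys=["sex", "age"])
```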

We will use past data up to the prediction time point to construct 5 sets of models, 1 set for each of 5 combinations: pediatric asthma at IH and KPSC and adult asthma at IH, UWM, and KPSC. UWM has rather incomplete data on many of its patients, partly because most of its patients are referred from elsewhere. To reduce the impact of incomplete data on model performance, we will harness our previous constraint-based method [164,171] to identify the patients apt to get most of their care from UWM, and we will construct models for them. As mentioned earlier, we will also implement and test our method on COPD.

Evaluating Model Performance and Power Analysis

The discussion below focuses on IH data. The cases with UWM and KPSC data are analogous. As we need to calculate outcomes in the following year, we effectively have 15 years of IH data over the previous 16 years. We will train and test the models in a standard way. On the data of the first 14 years, we will perform stratified 10-fold cross validation [169] to train models and gauge their performance. On the data of the 15th year, we will appraise the performance of the best models, reflecting future use in practice. We will use the standard performance metric AUC [169] to choose the best model and record its AUC. We will show the model’s accuracy, sensitivity, specificity, and positive and negative predictive values when the cutoff threshold of binary classification varies from the top 1% to the top 50% of patients with asthma with the highest predicted risk. To find the variables essential for achieving high model performance, we will use backward elimination [154] to remove features as long as the AUC drops by ≤0.002. We will compare the variables essential for achieving high model performance on IH data with those on UWM and KPSC data. We will explicitly check the predictive power of gender. We will use the variables appearing in both the UWM and IH data sets to construct the best model on IH data and compare its performance on UWM data with that on IH data. We will use the variables appearing in both the KPSC and IH data sets to construct the best model on IH data and compare its performance on KPSC data with that on IH data.
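The performance measures above can be illustrated with a small sketch that computes the AUC from predicted risks and then flags the top fraction of patients by predicted risk to obtain the threshold-dependent measures. The function names, the pairwise AUC computation (suitable only for small examples), and rounding the flagged count up are assumptions for illustration; in practice, a toolkit implementation would be used.

```python
import math


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else None


def metrics_at_top_fraction(scores, labels, fraction):
    """Flag the top fraction of patients by predicted risk and score the flags."""
    n = len(scores)
    n_flagged = max(1, math.ceil(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = n_flagged - tp
    fn = sum(labels) - tp
    tn = n - tp - fp - fn
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical predicted risks and observed outcomes for 10 patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

model_auc = auc(scores, labels)
top_20 = metrics_at_top_fraction(scores, labels, 0.2)
top_50 = metrics_at_top_fraction(scores, labels, 0.5)
```

Moving the cutoff from the top 20% to the top 50% in this toy example raises sensitivity and lowers the positive predictive value, which is the trade-off that reporting across the 1% to 50% range is meant to expose.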

We will test, once for adults and once for children, the hypothesis that adopting our techniques could enhance model performance. To do this, we will compare the AUCs of 2 predictive models built using the attributes in our data set and the best machine learning algorithm. The first model will harness all the features essential for achieving high model performance. The second model will be built in the same way as our recent model for predicting hospital use for asthma [44], an outcome related to proneness to exacerbation. We anticipate that the second model will have an AUC close to our recent model’s AUC of 0.86. Our hypotheses are as follows:

  1. Null hypothesis: the second model has the same AUC as the first model.
  2. Alternative hypothesis: the second model has a smaller AUC than the first model.

The categorical outcome variable of proneness to exacerbation has 2 values (classes). According to the standard method developed by Obuchowski and McClish [172] for AUC-related sample size computation, using a 2-sided Z test at a significance level of 0.05 and assuming for both classes a Pearson correlation coefficient of 0.6 between the 2 models’ predictions, a sample size of 464 instances per class provides 90% power to identify an AUC difference of 0.05 between the 2 models. The 15th year’s IH data cover >5000 children with asthma and >14,000 adults with asthma, offering sufficient power to test our hypothesis. If the real correlation coefficient is different from the assumed one by no more than a moderate degree, the conclusion would remain valid.

Sensitivity Analysis

IH, UWM, and KPSC each recorded many variables. Another health care system could record fewer variables. We will test various variable combinations and assess the performance of the corresponding modified models. This will help us ensure generalizability and identify critical variables. If a health care system does not record a particular critical variable, the assessed performance numbers can suggest alternative variables with minimal degradation of model performance. On the basis of our clinical experts’ judgment, we will merge variables apt to co-occur, such as the variables appearing in a lab test panel, into groups. We will form and publish a table listing possible combinations of variables by groups, accompanied by the performance numbers and the trained parameters of the corresponding predictive models. A health care system interested in deploying the model can use the table to assess the expected model performance for their data environment and determine the variables to be recorded. The table will contain a distinct column for each of IH, UWM, and KPSC. Many variables recorded by IH, UWM, and KPSC and used in this study are common and recorded by many other health care systems. Hence, these health care systems already have all the variables appearing in each of many rows in the table.
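
As a rough illustration of how such a table could be enumerated and consulted, the sketch below lists every combination of variable groups and then filters the rows that a given health care system can use. The group names, variable names, and function names are hypothetical placeholders; the real groups would come from the clinical experts.

```python
from itertools import combinations

# Hypothetical variable groups (placeholders, not the study's actual groups).
VARIABLE_GROUPS = {
    "demographics": ["age", "sex"],
    "lab_panel": ["eosinophil_count", "ige_level"],
    "medications": ["albuterol_refills", "controller_refills"],
}


def group_combinations(groups):
    """Return every non-empty combination of variable groups as a tuple of names."""
    names = sorted(groups)
    rows = []
    for size in range(1, len(names) + 1):
        rows.extend(combinations(names, size))
    return rows


def usable_rows(rows, recorded_variables, groups):
    """Keep the rows whose groups are fully covered by the recorded variables."""
    recorded = set(recorded_variables)
    return [
        row for row in rows
        if all(set(groups[name]) <= recorded for name in row)
    ]


rows = group_combinations(VARIABLE_GROUPS)

# A hypothetical health care system that records no lab panel variables.
site_variables = ["age", "sex", "albuterol_refills", "controller_refills"]
for row in usable_rows(rows, site_variables, VARIABLE_GROUPS):
    print(row)
```

In the published table, each such row would additionally carry the performance numbers and trained parameters of the corresponding model.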

Aim 2: Automate Finding Modifiable Temporal Risk Factors for Every High-Risk Patient

Overview of Aim 2

For patients with predicted risk over a fixed bar, such as the 75th percentile, we will automate explaining warnings, finding modifiable temporal risk factors, and recommending customized interventions. This will help clinicians make decisions regarding IDM enrollment and develop customized care plans. To create the new function, we will enhance our previous method [140] of automatically explaining machine learning predictions with no loss of model performance. Our previous method cannot deal with longitudinal data, has hard-to-tune hyperparameters, and has not been previously used for COPD or IDM.

Explanation Method

As aim 1 shows, we will use temporal features to transform longitudinal data into tabular data, producing one column per temporal feature. Our previous automatic explanation method [140] can then be used. Each patient is labeled as either high risk or not high risk. Our method mines from past data association rules tied to high risk. One example rule is as follows: the sulfur dioxide level was ≥4 parts per million for ≥4 days in the previous week AND the number of days to albuterol refill rose over the previous 12 months → the patient is high risk. The second item on the left-hand side of the rule is a modifiable temporal risk factor. Three interventions for it are to (1) assess the patient on asthma triggers and ensure that the patient avoids them; (2) evaluate compliance with asthma controller medications and prescribe, modify, or increase the doses of the medications if necessary; and (3) create a new asthma action plan to use more aggressive interventions when the patient is in the yellow zone [173]. Our paper [149] presented multiple interventions for several other temporal risk factors. Through discussion and consensus, our clinical team will examine the mined rules and remove those that make little or no clinical sense. For each rule left, our clinical team will identify the modifiable temporal risk factors in the rule and provide zero or more evidence-based interventions from the literature addressing the reason that the rule provides. The rules are used to provide explanations instead of predictions.
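
To make the transformation from longitudinal to tabular data concrete, the following minimal sketch computes one temporal feature, the slope of the number of days to albuterol refill, as the ordinary least-squares slope over a patient's refill history. The column name, patient identifier, and refill values are invented for illustration, and a plain least-squares fit is only one possible way to define the slope.

```python
def least_squares_slope(times, values):
    """Slope of the ordinary least-squares line through the points (times, values)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all observation times are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical refill history over 12 months:
# month of each refill and days elapsed since the previous refill.
refill_month = [1, 3, 5, 8, 12]
days_to_refill = [30, 45, 50, 70, 95]

# One row of the tabular data set, with one column per temporal feature.
patient_row = {
    "patient_id": "P001",  # made-up identifier
    "days_to_refill_slope_12m": least_squares_slope(refill_month, days_to_refill),
}
print(patient_row)
```

A positive slope means that the number of days to refill rose over the period, which is the kind of condition that can appear on the left-hand side of a rule.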

At prediction time, for each patient whom our most accurate model (initially resulting from aim 1) marks as high risk, we will identify and present all association rules that are tied to high risk and whose left-hand side conditions are fulfilled by the patient, and we will show the rules’ linked interventions as our recommendations. Every rule presents a reason why the patient is predicted to be at high risk. Users of the automatic explanation function could provide input to facilitate the identification and removal of unreasonable rules [174].
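
The rule lookup at prediction time can be sketched as follows. This is an illustrative data structure only, not the implementation of the method in [140]: the feature names are assumptions, the first rule paraphrases the example rule given earlier, and the second rule is entirely hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Rule:
    """An association rule tied to high risk, plus its linked interventions."""
    description: str
    conditions: List[Callable[[Features], bool]]
    interventions: List[str] = field(default_factory=list)

    def is_fulfilled_by(self, features: Features) -> bool:
        # Every left-hand side condition must hold for the patient.
        return all(condition(features) for condition in self.conditions)


RULES = [
    Rule(
        description=("sulfur dioxide >= 4 ppm on >= 4 days in the previous week AND "
                     "days to albuterol refill rose over the previous 12 months"),
        conditions=[
            lambda f: f["so2_days_ge_4ppm_last_week"] >= 4,
            lambda f: f["days_to_refill_slope_12m"] > 0,
        ],
        interventions=[
            "Assess the patient on asthma triggers and ensure they are avoided",
            "Evaluate compliance with asthma controller medications",
            "Create a new asthma action plan",
        ],
    ),
    Rule(  # hypothetical rule with no linked intervention
        description="at least 2 emergency department visits in the previous year",
        conditions=[lambda f: f["ed_visits_last_year"] >= 2],
    ),
]


def explain(features, flagged_high_risk, rules):
    """Return the rules that explain a high-risk prediction.

    The rules explain the model's prediction; they do not make it.
    """
    if not flagged_high_risk:
        return []
    return [rule for rule in rules if rule.is_fulfilled_by(features)]


# Hypothetical feature values for one patient flagged by the predictive model.
patient = {
    "so2_days_ge_4ppm_last_week": 5,
    "days_to_refill_slope_12m": 5.8,
    "ed_visits_last_year": 0,
}
for rule in explain(patient, flagged_high_risk=True, rules=RULES):
    print("Reason:", rule.description)
    for intervention in rule.interventions:
        print("  Suggested intervention:", intervention)
```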

Automatically and Efficiently Selecting Hyperparameter Values

Our previous automatic explanation method [140-142] uses 5 hyperparameters. Their effective values differ according to the modeling problem and data set. In our previous work [140-142], for each data set, a computing expert took several months to perform many trials to laboriously find these values. To reduce this overhead and to allow health care researchers with no extensive computing background to use our method, we will extend the progressive sampling-based approach, which we previously developed for expediting automatic machine learning model selection [170], to automatically and efficiently select the values of the 5 hyperparameters. On average, our progressive sampling-based approach performs the search process 2 orders of magnitude faster than the modern Auto-Weka automatic selection approach [170,175]. Our approach generalizes to many clinical applications.
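
The general idea of progressive sampling can be conveyed with a generic sketch: candidate settings are scored on a small random sample, the weaker ones are dropped, and the sample grows for the survivors. This is not the algorithm of [170]; the halving schedule, the function names, and the toy problem (choosing a single threshold rather than 5 hyperparameters) are all assumptions made for illustration.

```python
import random


def progressive_sampling_search(candidates, evaluate, data,
                                start_size=100, keep_fraction=0.5, seed=0):
    """Score candidates on growing random samples, dropping the weaker ones each round."""
    rng = random.Random(seed)
    survivors = list(candidates)
    size = min(start_size, len(data))
    while len(survivors) > 1:
        sample = rng.sample(data, size)
        ranked = sorted(survivors, key=lambda c: evaluate(c, sample), reverse=True)
        if size == len(data):
            # The whole data set has been used: return the top-ranked candidate.
            return ranked[0]
        survivors = ranked[:max(1, int(len(ranked) * keep_fraction))]
        size = min(2 * size, len(data))
    return survivors[0]


# Toy problem: find the threshold that best separates the labels.
toy_data = [(x, x >= 600) for x in range(1000)]
toy_candidates = [100, 200, 300, 400, 500, 600, 700, 800, 900]


def toy_accuracy(threshold, sample):
    """Fraction of sampled points that the threshold classifies correctly."""
    return sum((x >= threshold) == label for x, label in sample) / len(sample)


best = progressive_sampling_search(toy_candidates, toy_accuracy, toy_data)
print("selected threshold:", best)
```

Most candidates are discarded after being evaluated on only a small part of the data, which is where the time savings of this family of approaches come from.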

We will also develop our techniques on COPD.

Aim 3: Assess the Impact of Actionable Warnings on Clinicians’ Decisions to Use IDM to Prevent Proneness to Exacerbation

Goal of Aim 3

To prepare for future clinical use, in a UWM test setting, we will assess the impact of actionable warnings on clinicians’ decisions to use IDM in patients with asthma to prevent proneness to exacerbation. We will also assess UWM physicians’ (primary care doctors, pulmonologists, and allergists) and nurses’ subjective opinions of automatic explanations.

Recruiting Subjects

As a UWM operational project, we are building asthma outcome prediction models and have access to approximately 700 physicians and approximately 1700 nurses managing adult patients with asthma. Through personal contact and advertising in their email lists, we will recruit 20 test participants (10 physicians and 10 nurses) with purposeful sampling to guarantee sufficient variability in their work experience [176]. Every test participant will give consent before participation and be current on UWM’s policy training on information security and privacy. To protect privacy, every test participant will receive a pseudonym linking their responses. Upon task completion, each physician will receive US $2300 as compensation for participation and for approximately 20 hours of work. Each nurse will receive US $1200 as compensation for participation and for approximately 20 hours of work.

Procedures

Using the 15th year’s (2019) IH data, we will randomly select 800 IH adult patients with asthma and automatically explain the predictions of the best performing IH model formed in aim 1. Using patients outside the UWM can help ensure that no test participant knows the outcome of any of these patients in the following year. We will present a distinct subset of 40 patients to each test participant and proceed in the following 4 steps:

  1. Step 1: For each patient, we will display to the test participant the 2005-2019 deidentified patient data in reverse chronological order, as in the electronic medical records, and ask the test participant to write down the IDM enrollment decision (yes or no) and any interventions that the test participant plans to adopt on the patient.
  2. Step 2: For each patient, we will display to the test participant the 2005-2019 deidentified patient data, the prediction, the automatic explanations, and the interventions connected to them. We will ask the test participant to write down their IDM enrollment decision (yes or no) on the patient after seeing the prediction and the explanations, the linked interventions they agree with, those they disagree with, and the interventions that they come up with in step 1 but whose concepts are missed by the linked interventions.
  3. Step 3: Perceived usefulness is closely linked to future use intentions and actual function use [177,178]. Using the classic Technology Acceptance Model satisfaction questionnaire [179], we will survey the test participant to know their perceived ease of use and usefulness of automatic explanations.
  4. Step 4: We will conduct a focus group with 10 randomly chosen test participants to assess what helps them use or prevents them from using the automatic explanations in clinical practice and why they agree or disagree with the automatically recommended interventions.
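
The random selection of 800 patients and their allocation into 20 distinct subsets of 40 can be sketched as below. The patient identifiers, pseudonyms, seed, and function name are hypothetical; the only properties taken from the procedure are the counts and the requirement that no patient is shown to more than one test participant.

```python
import random


def assign_patients(patient_ids, participants, per_participant, seed=2019):
    """Randomly draw distinct, non-overlapping patient subsets, one per test participant."""
    needed = len(participants) * per_participant
    if needed > len(patient_ids):
        raise ValueError("not enough patients for the requested allocation")
    rng = random.Random(seed)
    # Sampling without replacement guarantees that the subsets are disjoint.
    chosen = rng.sample(patient_ids, needed)
    return {
        participant: chosen[k * per_participant:(k + 1) * per_participant]
        for k, participant in enumerate(participants)
    }


# Hypothetical identifiers for the adult patients and pseudonyms for the participants.
adult_patient_ids = [f"IH{n:05d}" for n in range(14000)]
pseudonyms = [f"T{k:02d}" for k in range(1, 21)]

allocation = assign_patients(adult_patient_ids, pseudonyms, per_participant=40)
print(len(allocation), "participants,", sum(len(v) for v in allocation.values()), "patients")
```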

Quantitative and Qualitative Analyses

Quantitative Analyses

We will provide descriptive statistics for each quantitative outcome measure, including the mean and SD of each of the following: (1) the number of times that a test participant changes their IDM enrollment decision on a patient after seeing the prediction and the explanations, (2) the number of linked interventions for a patient a test participant agrees with, (3) the number of linked interventions for a patient a test participant disagrees with, (4) the number of interventions that a test participant comes up with for a patient in step 1 but whose concepts are missed by the linked interventions, and (5) the rating of every item in the Technology Acceptance Model satisfaction questionnaire. We will test the hypothesis that giving actionable warnings will improve clinicians’ decision to use IDM to prevent proneness to exacerbation, that is, the degree of IDM enrollment decision matching whether the patient will become prone to exacerbation in the following year. Our hypothesis is as follows:

  1. Null hypothesis: The degree of IDM enrollment decision matching whether the patient will become prone to exacerbation in the following year in step 2 is the same as that in step 1.
  2. Alternative hypothesis: The degree of IDM enrollment decision matching whether the patient will become prone to exacerbation in the following year in step 2 is larger than that in step 1.

We will fit a random effect logistic model that accounts for the correlation among the outcomes related to the same test participant.
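
One possible specification of such a model is given here only as a sketch, because the exact form is not fixed in this protocol. The notation is an assumption: $Y_{ijs} = 1$ if the IDM enrollment decision of test participant $i$ for patient $j$ in step $s$ matches whether the patient becomes prone to exacerbation in the following year, and $u_i$ is a participant-level random intercept.

```latex
\operatorname{logit} \Pr(Y_{ijs} = 1) = \beta_0 + \beta_1 \, \mathbb{1}[s = 2] + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this parameterization, the null hypothesis corresponds to $\beta_1 = 0$ and the alternative hypothesis to $\beta_1 > 0$.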

Power Analysis for the Quantitative Analyses

Assuming a modest intraclass correlation of 0.1 within the same test participant on the outcome, a sample size of 40 patients per test participant for the 20 test participants is equivalent to a total of 163 independent patients after factoring in the clustering effect (800 patients divided by a design effect of 1 + (40 − 1) × 0.1 = 4.9). We will have, at a 2-sided significance level of .05, 80% power to detect a 9.7% increase in the chances of improving clinicians’ decisions to use IDM with actionable warnings. If the real correlation is different from the assumed one by no more than a moderate degree, a similar conclusion would hold.
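
The clustering adjustment can be explored with a short calculation. The sketch assumes the usual design-effect formula for equal-sized clusters, 1 + (m − 1) × ICC, where m is the number of patients per test participant; the function name is made up for illustration.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations equivalent to a clustered sample."""
    design_effect = 1 + (cluster_size - 1) * icc
    return n_clusters * cluster_size / design_effect


# 20 test participants, 40 patients each, intraclass correlation of 0.1.
n_eff = effective_sample_size(n_clusters=20, cluster_size=40, icc=0.1)
print("ICC 0.10:", round(n_eff))

# How the effective sample size moves if the real correlation differs moderately.
for icc in (0.05, 0.15):
    print(f"ICC {icc:.2f}:", round(effective_sample_size(20, 40, icc)))
```

Varying the intraclass correlation in this way shows how sensitive the effective sample size is to the assumed value.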

Qualitative Analyses

Using the inductive method described in Patton et al [176,180], test participants’ comments recorded in text during the focus group will be loaded into ATLAS.ti qualitative analysis software (ATLAS.ti Scientific Software Development GmbH) [181]. Three people from our research team will highlight the quotations independently. Through discussion and negotiated consensus in multiple iterations, these people will review quotations, categorize quotations into precodes, merge codes into categories, and synthesize categories to identify general themes.

Exploring for Other Diseases

Preventive care is also widely adopted for patients with heart diseases and diabetes. To explore what will be needed to generalize our techniques to predict outcomes of these diseases in the future, we will conduct 2 phases of focus groups, each phase with a distinct set of 6 UWM clinical experts on these diseases, and add more phases if these 2 phases do not reach saturation.

As stated immediately before aim 1, the discussion above concentrates on asthma. Whenever we refer to asthma, the same applies to COPD and will be implemented and tested on COPD in aims 1 and 2 but not in aim 3.

Ethics Approval

We have received approval from the UWM institutional review board for this study and are applying for approval from IH and KPSC.


We have downloaded 2005-2020 weather and air quality data from public sources [145,146]. For the clinical and administrative data, GL at UWM has obtained the 2005-2018 data of patients with asthma from IH [44], the 2009-2018 data of patients with asthma from KPSC [45], and the 2011-2018 data of patients with asthma from UWM [43]. We are retrieving the other clinical and administrative data, mostly of patients with COPD, from IH, UWM, and KPSC. We intend to complete the study in 6 years.


Using Our Results in Clinical Practice

IH, UWM, KPSC, and many other health care systems use IDM and use inaccurate predictive models with AUC<0.8 and sensitivity ≤49% for preventive care via care management [22,24-26,46-57,59-93]. Similar to our recent work of using IH, UWM, and KPSC data to greatly increase prediction accuracy for hospital use for asthma [43-45] related to exacerbation proneness, we expect our models predicting exacerbation proneness to be more accurate than those inaccurate models, benefit many patients, and have practical value. We will automate explaining warnings and recommending interventions to help clinicians review structured data in patient clinical records faster and create customized care plans based on objective data. After our methods find patients with the greatest predicted risks and offer explanations, clinicians will review patient clinical records, look at factors such as social dimensions [182], and make IDM enrollment and intervention decisions. As feature patterns linked to high risk and patient status keep changing, our techniques can be used continuously to move patients out of and into IDM and to discover new feature patterns.

In addition to making the predictive model more accurate, using temporal features showing early health changes and general directions could also boost warning timeliness. If a patient will be admitted to the hospital for COPD or asthma and the model would not predict this until 1 week before the hospital admission, intervening at that time could be too late to avoid the admission. If the model uses suitable temporal features and runs continuously, this patient could be found several weeks earlier, when health decline just begins and preventing hospital admission is likely.

Generalizability

Predictive models vary by diseases and other factors and could be dissimilar to each other. However, our proposed methods and software for extracting temporal features and automatically explaining machine learning predictions are general and do not rely on any special property of a specific health care system, disease, or patient cohort. Given a new data set with a different disease, set of variables, patient cohort, or prediction target, one can plug in our software to extract temporal features and to automatically explain machine learning predictions. Besides being used for patients with asthma and patients with COPD, preventive care is also widely adopted for patients with heart disease and patients with diabetes [128], where our techniques could be harnessed, for example, to predict hospital use. Our sensitivity analysis results in aim 1 can be used to identify critical variables and determine how to generalize a predictive model to a health care system recording a different set of variables from IH, UWM, and KPSC.

We will use data retrieved from 3 health care systems, UWM, IH, and KPSC, to demonstrate our techniques on patients with asthma and patients with COPD. These systems include an academic system that has most of its patients referred from elsewhere (UWM) and 2 integrated systems (IH and KPSC), and together they comprise 42 hospitals and 458 clinics. Spreading across 3 large geographic areas, these heterogeneous facilities range from tertiary care hospitals in large cities served by subspecialists to community rural and urban clinics served by general practitioners and family physicians with limited resources. These health care systems use 4 distinct electronic medical record systems: KPSC uses Epic; UWM uses Epic and Cerner; and IH uses Health Evolution through Logical Processing, Health Evolution through Logical Processing 2, and Cerner. Variations in health care system type, patient population, geographic location, cultural background, staff composition, electronic medical record system, and scope of services enable us to identify factors that generalize to other facilities nationwide. The OMOP common data model [147] and its linked standardized terminologies [148] standardize administrative and clinical variables from ≥10 major American health care systems [183,184]. Our models will be based on OMOP and apply to these health care systems using OMOP.

With appropriate extension, our techniques can be adopted for miscellaneous diseases and decision support applications and can improve clinical machine learning. For example, our techniques can enhance the prediction accuracy of other outcomes such as no-shows [185], hospital use [186], and treatment adherence [187]. This will enable us to target resources, such as telephone reminders to reduce no-shows [185], home visits by nurses and care management to reduce hospital use [186], and interventions to boost treatment adherence [187].

We can use the features extracted by our temporal feature extraction method to create a feature library to ease feature reuse [188]. This will help reduce the effort required to create predictive models for other modeling projects.

Significance Thresholds

In both the Evaluating Model Performance and Power Analysis and Quantitative and Qualitative Analyses sections, we use the widely adopted significance level of .05 to perform power analysis. The statistics community has debated a lot about the P value and its dichotomization [189-191]. Setting a threshold for the P value is essential for power analysis and sample size estimation [189]. In addition, to the best of our knowledge, no consensus has been reached on what the best alternative is if P values and statistical significance are not used [189]. Following the advice given by Amrhein et al [191], after obtaining the results of this study, we will report the actual P values, treat them as continuous measures of evidence against the null hypotheses rather than as parts of binary decision rules, and acknowledge that multiple independent studies are needed to provide stronger support for or against our hypotheses.

Conclusions

Our results will help make IDM for asthma and COPD more proactive, effective, and efficient, improving outcomes and saving resources. Future studies will evaluate our methods for heart diseases, diabetes, and other diseases; deploy our methods at UWM, KPSC, and IH for IDM for asthma and COPD; and test the performance against the current IDM practice.

Acknowledgments

The authors thank Peter Tarczy-Hornoch and Siyang Zeng for helpful discussions. GL, BS, XS, SH, CK, and FN were partially supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number R01HL142503. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Authors' Contributions

GL was mainly responsible for this study. He conceptualized and designed the study, performed the literature review, and wrote the paper. FN offered feedback on study design and medical issues, participated in performing the literature review, and revised the paper. BS offered feedback on study design and medical issues and revised the paper. XS took part in conceptualizing and writing the statistical analysis sections. CK took part in retrieving the KPSC data set of patients with asthma and interpreting its detected peculiarities. SH took part in retrieving the IH data set and interpreting its detected peculiarities. All authors read and approved the final manuscript.

Conflicts of Interest

None declared.

  1. Asthma. Centers for Disease Control and Prevention. 2021.   URL: http://www.cdc.gov/nchs/fastats/asthma.htm [accessed 2021-04-28]
  2. Akinbami LJ, Moorman JE, Liu X. Asthma prevalence, health care use, and mortality: United States, 2005-2009. Natl Health Stat Report 2011 Jan 12(32):1-14 [FREE Full text] [Medline]
  3. Akinbami LJ, Moorman JE, Bailey C, Zahran HS, King M, Johnson CA, et al. Trends in asthma prevalence, health care use, and mortality in the United States, 2001-2010. NCHS Data Brief 2012 May(94):1-8 [FREE Full text] [Medline]
  4. Asthma in the US. Centers for Disease Control and Prevention. 2021.   URL: http://www.cdc.gov/vitalsigns/asthma [accessed 2021-04-28]
  5. Ford ES, Murphy LB, Khavjou O, Giles WH, Holt JB, Croft JB. Total and state-specific medical and absenteeism costs of COPD among adults aged ≥ 18 years in the United States for 2010 and projections through 2020. Chest 2015 Jan;147(1):31-45. [CrossRef] [Medline]
  6. Dougherty RH, Fahy JV. Acute exacerbations of asthma: epidemiology, biology and the exacerbation-prone phenotype. Clin Exp Allergy 2009 Feb;39(2):193-202 [FREE Full text] [CrossRef] [Medline]
  7. Andersson F, Borg S, Jansson SA, Jonsson AC, Ericsson A, Prütz C, et al. The costs of exacerbations in chronic obstructive pulmonary disease (COPD). Respir Med 2002 Sep;96(9):700-708 [FREE Full text] [CrossRef] [Medline]
  8. Toy EL, Gallagher KF, Stanley EL, Swensen AR, Duh MS. The economic impact of exacerbations of chronic obstructive pulmonary disease and exacerbation definition: a review. COPD 2010 Jun;7(3):214-228. [CrossRef] [Medline]
  9. Donaldson GC, Seemungal TA, Bhowmik A, Wedzicha JA. Relationship between exacerbation frequency and lung function decline in chronic obstructive pulmonary disease. Thorax 2002 Oct;57(10):847-852 [FREE Full text] [Medline]
  10. Denlinger LC, Heymann P, Lutter R, Gern JE. Exacerbation-prone asthma. J Allergy Clin Immunol Pract 2020 Feb;8(2):474-482 [FREE Full text] [CrossRef] [Medline]
  11. Denlinger LC, Phillips BR, Ramratnam S, Ross K, Bhakta NR, Cardet JC, National Heart, Lung, and Blood Institute’s Severe Asthma Research Program-3 Investigators. Inflammatory and comorbid features of patients with severe asthma and frequent exacerbations. Am J Respir Crit Care Med 2017 Feb 01;195(3):302-313 [FREE Full text] [CrossRef] [Medline]
  12. Peters MC, Mauger D, Ross KR, Phillips B, Gaston B, Cardet JC, et al. Evidence for exacerbation-prone asthma and predictive biomarkers of exacerbation frequency. Am J Respir Crit Care Med 2020 Oct 01;202(7):973-982. [CrossRef] [Medline]
  13. Punekar YS, Shukla A, Müllerova H. COPD management costs according to the frequency of COPD exacerbations in UK primary care. Int J Chron Obstruct Pulmon Dis 2014;9:65-73 [FREE Full text] [CrossRef] [Medline]
  14. Soler-Cataluña JJ, Rodriguez-Roisin R. Frequent chronic obstructive pulmonary disease exacerbators: how much real, how much fictitious? COPD 2010 Aug;7(4):276-284. [CrossRef] [Medline]
  15. Sprio AE, Carriero V, Levra S, Botto C, Bertolini F, Di Stefano A, et al. Clinical characterization of the frequent exacerbator phenotype in asthma. J Clin Med 2020 Jul 14;9(7):2226 [FREE Full text] [CrossRef] [Medline]
  16. Loymans RJ, Sterk PJ. Exacerbation-prone asthma: a separate bioclinical phenotype? Am J Respir Crit Care Med 2017 Feb 01;195(3):275-277. [CrossRef] [Medline]
  17. Foster JM, McDonald VM, Guo M, Reddel HK. "I have lost in every facet of my life": the hidden burden of severe asthma. Eur Respir J 2017 Dec;50(3):1700765. [CrossRef] [Medline]
  18. Doll H, Miravitlles M. Health-related QOL in acute exacerbations of chronic bronchitis and chronic obstructive pulmonary disease: a review of the literature. Pharmacoeconomics 2005;23(4):345-363. [CrossRef] [Medline]
  19. Nicolson P, Anderson P. The patient's burden: physical and psychological effects of acute exacerbations of chronic bronchitis. J Antimicrob Chemother 2000 Mar;45:25-32. [CrossRef] [Medline]
  20. Waljee AK, Rogers MA, Lin P, Singal AG, Stein JD, Marks RM, et al. Short term use of oral corticosteroids and related harms among adults in the United States: population based cohort study. Br Med J 2017 Apr 12;357:j1415 [FREE Full text] [CrossRef] [Medline]
  21. Diagnosis and management of difficult-to-treat and severe asthma in adolescent and adult patients. Global Initiative for Asthma. 2019.   URL: https://ginasthma.org/severeasthma/ [accessed 2021-04-28]
  22. Curry N, Billings J, Darin B, Dixon J, Williams M, Wennberg D. Predictive risk project literature review. London: King’s Fund; 2005.   URL: http://www.kingsfund.org.uk/sites/files/kf/field/field_document/predictive-risk-literature-review-june2005.pdf [accessed 2021-04-28]
  23. Vogeli C, Shields AE, Lee TA, Gibson TB, Marder WD, Weiss KB, et al. Multiple chronic conditions: prevalence, health consequences, and implications for quality, care management, and costs. J Gen Intern Med 2007 Dec;22 Suppl 3:391-395 [FREE Full text] [CrossRef] [Medline]
  24. Nelson L. Lessons from Medicare's demonstration projects on disease management and care coordination. 2012.   URL: https://www.cbo.gov/sites/default/files/112th-congress-2011-2012/workingpaper/WP2012-01_Nelson_Medicare_DMCC_Demonstrations_1.pdf [accessed 2021-04-28]
  25. Caloyeras JP, Liu H, Exum E, Broderick M, Mattke S. Managing manifest diseases, but not health risks, saved PepsiCo money over seven years. Health Aff (Millwood) 2014 Jan;33(1):124-131. [CrossRef] [Medline]
  26. Mays GP, Claxton G, White J. Managed care rebound? Recent changes in health plans' cost containment strategies. Health Aff (Millwood) 2004;Suppl Web Exclusives:W4-427-W4-436 [FREE Full text] [CrossRef] [Medline]
  27. Peytremann-Bridevaux I, Burnand B. Disease management: a proposal for a new definition. Int J Integr Care 2009 May 27;9:e16 [FREE Full text] [CrossRef] [Medline]
  28. Nici L, ZuWallack R, American Thoracic Society Subcommittee on Integrated Care of the COPD Patient. An official American Thoracic Society workshop report: the integrated care of the COPD patient. Proc Am Thorac Soc 2012 Mar;9(1):9-18. [CrossRef] [Medline]
  29. Ferrone M, Masciantonio MG, Malus N, Stitt L, O'Callahan T, Roberts Z, Primary Care Innovation Collaborative. The impact of integrated disease management in high-risk COPD patients in primary care. NPJ Prim Care Respir Med 2019 Mar 28;29(1):8 [FREE Full text] [CrossRef] [Medline]
  30. Lemmens KM, Nieboer AP, Huijsman R. A systematic review of integrated use of disease management interventions in asthma and COPD. Respir Med 2009 May;103(5):670-691 [FREE Full text] [CrossRef] [Medline]
  31. Colorado Region. Asthma disease management program. Perm J 2000;4(2):48-56 [FREE Full text] [Medline]
  32. Jain VV, Allison R, Beck SJ, Jain R, Mills PK, McCurley JW, et al. Impact of an integrated disease management program in reducing exacerbations in patients with severe asthma and COPD. Respir Med 2014 Dec;108(12):1794-1800 [FREE Full text] [CrossRef] [Medline]
  33. Axelrod RC, Vogel D. Predictive modeling in health plans. Disease Manage Health Outcomes 2003;11(12):779-787. [CrossRef]
  34. Rice KL, Dewan N, Bloomfield HE, Grill J, Schult TM, Nelson DB, et al. Disease management program for chronic obstructive pulmonary disease: a randomized controlled trial. Am J Respir Crit Care Med 2010 Oct 1;182(7):890-896. [CrossRef] [Medline]
  35. Bandurska E, Damps-Konstańska I, Popowski P, Jędrzejczyk T, Janowiak P, Świętnicka K, et al. Impact of integrated care model (ICM) on direct medical costs in management of advanced chronic obstructive pulmonary disease (COPD). Med Sci Monit 2017 Jun 12;23:2850-2862 [FREE Full text] [CrossRef] [Medline]
  36. Levine SH, Adams J, Attaway K, Dorr DA, Leung M, Popescu P, et al. Predicting the financial risks of seriously ill patients. California HealthCare Foundation. 2011.   URL: http://www.chcf.org/publications/2011/12/predictive-financial-risks [accessed 2021-04-28]
  37. Rubin RJ, Dietrich KA, Hawk AD. Clinical and economic impact of implementing a comprehensive diabetes management program in managed care. J Clin Endocrinol Metab 1998 Aug;83(8):2635-2642. [CrossRef] [Medline]
  38. Greineder DK, Loane KC, Parks P. A randomized controlled trial of a pediatric asthma outreach program. J Allergy Clin Immunol 1999 Mar;103(3 Pt 1):436-440. [Medline]
  39. Kelly CS, Morrow AL, Shults J, Nakas N, Strope GL, Adelman RD. Outcomes evaluation of a comprehensive intervention program for asthmatic children enrolled in Medicaid. Pediatrics 2000 May;105(5):1029-1035. [Medline]
  40. Axelrod RC, Zimbro KS, Chetney RR, Sabol J, Ainsworth VJ. A disease management program utilizing life coaches for children with asthma. J Clin Outcomes Manag 2001;8(6):38-42 [FREE Full text]
  41. Dorr DA, Wilcox AB, Brunker CP, Burdon RE, Donnelly SM. The effect of technology-supported, multidisease care management on the mortality and hospitalization of seniors. J Am Geriatr Soc 2008 Dec;56(12):2195-2202. [CrossRef] [Medline]
  42. Beaulieu N, Cutler DM, Ho K, Isham G, Lindquist T, Nelson A, et al. The business case for diabetes disease management for managed care organizations. Forum Health Econ Policy 2006;9(1):1-37. [CrossRef]
  43. Tong Y, Messinger AI, Wilcox AB, Mooney SD, Davidson GH, Suri P, et al. Forecasting future asthma hospital encounters of patients with asthma in an academic health care system: predictive model development and secondary analysis study. J Med Internet Res 2021 Apr 16;23(4):e22796 [FREE Full text] [CrossRef] [Medline]
  44. Luo G, He S, Stone BL, Nkoy FL, Johnson MD. Developing a model to predict hospital encounters for asthma in asthmatic patients: secondary analysis. JMIR Med Inform 2020 Jan 21;8(1):e16080 [FREE Full text] [CrossRef] [Medline]
  45. Luo G, Nau CL, Crawford WW, Schatz M, Zeiger RS, Rozema E, et al. Developing a predictive model for asthma-related hospital encounters in patients with asthma in a large, integrated health care system: secondary analysis. JMIR Med Inform 2020 Nov 09;8(11):e22689 [FREE Full text] [CrossRef] [Medline]
  46. Schatz M, Nakahiro R, Jones CH, Roth RM, Joshua A, Petitti D. Asthma population management: development and validation of a practical 3-level risk stratification scheme. Am J Manag Care 2004 Jan;10(1):25-32 [FREE Full text] [Medline]
  47. Schatz M, Cook EF, Joshua A, Petitti D. Risk factors for asthma hospitalizations in a managed care organization: development of a clinical prediction rule. Am J Manag Care 2003 Aug;9(8):538-547 [FREE Full text] [Medline]
  48. Lieu TA, Quesenberry CP, Sorel ME, Mendoza GR, Leong AB. Computer-based models to identify high-risk children with asthma. Am J Respir Crit Care Med 1998 Apr;157(4 Pt 1):1173-1180. [CrossRef] [Medline]
  49. Lieu TA, Capra AM, Quesenberry CP, Mendoza GR, Mazar M. Computer-based models to identify high-risk adults with asthma: is the glass half empty of half full? J Asthma 1999 Jun;36(4):359-370. [Medline]
  50. Forno E, Fuhlbrigge A, Soto-Quirós ME, Avila L, Raby BA, Brehm J, et al. Risk factors and predictive clinical scores for asthma exacerbations in childhood. Chest 2010 Nov;138(5):1156-1165 [FREE Full text] [CrossRef] [Medline]
  51. Miller MK, Lee JH, Blanc PD, Pasta DJ, Gujrathi S, Barron H, et al. TENOR risk score predicts healthcare in adults with severe or difficult-to-treat asthma. Eur Respir J 2006 Dec;28(6):1145-1155 [FREE Full text] [CrossRef] [Medline]
  52. Loymans RJ, Debray TP, Honkoop PJ, Termeer EH, Snoeck-Stroband JB, Schermer TR, et al. Exacerbations in adults with asthma: a systematic review and external validation of prediction models. J Allergy Clin Immunol Pract 2018;6(6):1942-1952. [CrossRef] [Medline]
  53. Loymans RJ, Honkoop PJ, Termeer EH, Snoeck-Stroband JB, Assendelft WJ, Schermer TR, et al. Identifying patients at risk for severe exacerbations of asthma: development and external validation of a multivariable prediction model. Thorax 2016 Sep;71(9):838-846. [CrossRef] [Medline]
  54. Eisner MD, Yegin A, Trzaskoma B. Severity of asthma score predicts clinical outcomes in patients with moderate to severe persistent asthma. Chest 2012 Jan;141(1):58-65. [CrossRef] [Medline]
  55. Sato R, Tomita K, Sano H, Ichihashi H, Yamagata S, Sano A, et al. The strategy for predicting future exacerbation of asthma using a combination of the Asthma Control Test and lung function test. J Asthma 2009 Sep;46(7):677-682. [CrossRef] [Medline]
  56. Yurk RA, Diette GB, Skinner EA, Dominici F, Clark RD, Steinwachs DM, et al. Predicting patient-reported asthma outcomes for adults in managed care. Am J Manag Care 2004 May;10(5):321-328 [FREE Full text] [Medline]
  57. Xiang Y, Ji H, Zhou Y, Li F, Du J, Rasmy L, et al. Asthma exacerbation prediction and risk factor analysis based on a time-sensitive, attentive neural network: retrospective cohort study. J Med Internet Res 2020 Jul 31;22(7):e16981 [FREE Full text] [CrossRef] [Medline]
  58. Zein JG, Wu CP, Attaway AH, Zhang P, Nazha A. Novel machine learning can predict acute asthma exacerbation. Chest 2021 May 01;159(5):1747-1757. [CrossRef] [Medline]
  59. Guerra B, Gaveikaite V, Bianchi C, Puhan MA. Prediction models for exacerbations in patients with COPD. Eur Respir Rev 2017 Jan;26(143):160061 [FREE Full text] [CrossRef] [Medline]
  60. Bellou V, Belbasis L, Konstantinidis AK, Tzoulaki I, Evangelou E. Prognostic models for outcome prediction in patients with chronic obstructive pulmonary disease: systematic review and critical appraisal. Br Med J 2019 Oct 04;367:l5358 [FREE Full text] [CrossRef] [Medline]
  61. Alcázar B, García-Polo C, Herrejón A, Ruiz LA, de Miguel J, Ros JA, et al. Factors associated with hospital admission for exacerbation of chronic obstructive pulmonary disease. Arch Bronconeumol 2012 Mar;48(3):70-76. [CrossRef] [Medline]
  62. Tavakoli H, Chen W, Sin DD, FitzGerald JM, Sadatsafavi M. Predicting severe chronic obstructive pulmonary disease exacerbations. Developing a population surveillance approach with administrative data. Ann Am Thorac Soc 2020 Sep;17(9):1069-1076. [CrossRef] [Medline]
  63. Orchard P, Agakova A, Pinnock H, Burton CD, Sarran C, Agakov F, et al. Improving prediction of risk of hospital admission in chronic obstructive pulmonary disease: application of machine learning to telemonitoring data. J Med Internet Res 2018 Sep 21;20(9):e263 [FREE Full text] [CrossRef] [Medline]
  64. Yii AC, Loh CH, Tiew PY, Xu H, Taha AA, Koh J, et al. A clinical prediction model for hospitalized COPD exacerbations based on "treatable traits". Int J Chron Obstruct Pulmon Dis 2019;14:719-728 [FREE Full text] [CrossRef] [Medline]
  65. Annavarapu S, Goldfarb S, Gelb M, Moretz C, Renda A, Kaila S. Development and validation of a predictive model to identify patients at risk of severe COPD exacerbations using administrative claims data. Int J Chron Obstruct Pulmon Dis 2018;13:2121-2130 [FREE Full text] [CrossRef] [Medline]
  66. Adibi A, Sin DD, Safari A, Johnson KM, Aaron SD, FitzGerald JM, et al. The Acute COPD Exacerbation Prediction Tool (ACCEPT): a modelling study. Lancet Respir Med 2020 Oct;8(10):1013-1021. [CrossRef] [Medline]
  67. Stanford RH, Nag A, Mapel DW, Lee TA, Rosiello R, Schatz M, et al. Claims-based risk model for first severe COPD exacerbation. Am J Manag Care 2018 Feb 01;24(2):45-53 [FREE Full text] [Medline]
  68. Stanford RH, Nag A, Mapel DW, Lee TA, Rosiello R, Vekeman F, et al. Validation of a new risk measure for chronic obstructive pulmonary disease exacerbation using health insurance claims data. Ann Am Thorac Soc 2016 Jul;13(7):1067-1075. [CrossRef] [Medline]
  69. Stanford RH, Korrer S, Brekke L, Reinsch T, Bengtson LG. Validation and assessment of the COPD treatment ratio as a predictor of severe exacerbations. Chronic Obstr Pulm Dis 2020 Jan;7(1):38-48 [FREE Full text] [CrossRef] [Medline]
  70. Stanford RH, Lau MS, Li Y, Stemkowski S. External validation of a COPD risk measure in a commercial and Medicare population: the COPD treatment ratio. J Manag Care Spec Pharm 2019 Jan;25(1):58-69. [CrossRef] [Medline]
  71. Suetomo M, Kawayama T, Kinoshita T, Takenaka S, Matsuoka M, Matsunaga K, et al. COPD assessment tests scores are associated with exacerbated chronic obstructive pulmonary disease in Japanese patients. Respir Investig 2014 Sep;52(5):288-295. [CrossRef] [Medline]
  72. Faganello MM, Tanni SE, Sanchez FF, Pelegrino NR, Lucheta PA, Godoy I. BODE index and GOLD staging as predictors of 1-year exacerbation risk in chronic obstructive pulmonary disease. Am J Med Sci 2010 Jan;339(1):10-14. [CrossRef] [Medline]
  73. Bertens LC, Reitsma JB, Moons KG, van Mourik Y, Lammers JW, Broekhuizen BD, et al. Development and validation of a model to predict the risk of exacerbations in chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis 2013;8:493-499 [FREE Full text] [CrossRef] [Medline]
  74. Lee SD, Huang MS, Kang J, Lin CH, Park MJ, Oh YM, Investigators of the Predictive Ability of CAT in Acute Exacerbations of COPD (PACE) Study. The COPD assessment test (CAT) assists prediction of COPD exacerbations in high-risk patients. Respir Med 2014 Apr;108(4):600-608 [FREE Full text] [CrossRef] [Medline]
  75. Thomsen M, Ingebrigtsen TS, Marott JL, Dahl M, Lange P, Vestbo J, et al. Inflammatory biomarkers and exacerbations in chronic obstructive pulmonary disease. J Am Med Assoc 2013 Jun 12;309(22):2353-2361. [CrossRef] [Medline]
  76. Fan VS, Curtis JR, Tu SP, McDonell MB, Fihn SD, Ambulatory Care Quality Improvement Project Investigators. Using quality of life to predict hospitalization and mortality in patients with obstructive lung diseases. Chest 2002 Aug;122(2):429-436. [CrossRef] [Medline]
  77. Moy ML, Teylan M, Danilack VA, Gagnon DR, Garshick E. An index of daily step count and systemic inflammation predicts clinical outcomes in chronic obstructive pulmonary disease. Ann Am Thorac Soc 2014 Feb;11(2):149-157. [CrossRef] [Medline]
  78. Miravitlles M, Guerrero T, Mayordomo C, Sánchez-Agudo L, Nicolau F, Segú JL. Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory COPD patients: a multiple logistic regression analysis. The EOLO Study Group. Respiration 2000;67(5):495-501. [CrossRef] [Medline]
  79. Niewoehner DE, Lokhnygina Y, Rice K, Kuschner WG, Sharafkhaneh A, Sarosi GA, et al. Risk indexes for exacerbations and hospitalizations due to COPD. Chest 2007 Jan;131(1):20-28. [CrossRef] [Medline]
  80. Marin JM, Carrizo SJ, Casanova C, Martinez-Camblor P, Soriano JB, Agusti AG, et al. Prediction of risk of COPD exacerbations by the BODE index. Respir Med 2009 Mar;103(3):373-378 [FREE Full text] [CrossRef] [Medline]
  81. Make BJ, Eriksson G, Calverley PM, Jenkins CR, Postma DS, Peterson S, et al. A score to predict short-term risk of COPD exacerbations (SCOPEX). Int J Chron Obstruct Pulmon Dis 2015;10:201-209 [FREE Full text] [CrossRef] [Medline]
  82. Montserrat-Capdevila J, Godoy P, Marsal JR, Barbé F. Predictive model of hospital admission for COPD exacerbation. Respir Care 2015 Sep;60(9):1288-1294 [FREE Full text] [CrossRef] [Medline]
  83. Kerkhof M, Freeman D, Jones R, Chisholm A, Price DB, Respiratory Effectiveness Group. Predicting frequent COPD exacerbations using primary care data. Int J Chron Obstruct Pulmon Dis 2015;10:2439-2450 [FREE Full text] [CrossRef] [Medline]
  84. Samp JC, Joo MJ, Schumock GT, Calip GS, Pickard AS, Lee TA. Predicting acute exacerbations in chronic obstructive pulmonary disease. J Manag Care Spec Pharm 2018 Mar;24(3):265-279. [CrossRef] [Medline]
  85. Briggs A, Spencer M, Wang H, Mannino D, Sin DD. Development and validation of a prognostic index for health outcomes in chronic obstructive pulmonary disease. Arch Intern Med 2008 Jan 14;168(1):71-79. [CrossRef] [Medline]
  86. Lange P, Marott JL, Vestbo J, Olsen KR, Ingebrigtsen TS, Dahl M, et al. Prediction of the clinical course of chronic obstructive pulmonary disease, using the new GOLD classification: a study of the general population. Am J Respir Crit Care Med 2012 Nov 15;186(10):975-981. [CrossRef] [Medline]
  87. Austin PC, Stanbrook MB, Anderson GM, Newman A, Gershon AS. Comparative ability of comorbidity classification methods for administrative data to predict outcomes in patients with chronic obstructive pulmonary disease. Ann Epidemiol 2012 Dec;22(12):881-887 [FREE Full text] [CrossRef] [Medline]
  88. Abascal-Bolado B, Novotny PJ, Sloan JA, Karpman C, Dulohery MM, Benzo RP. Forecasting COPD hospitalization in the clinic: optimizing the chronic respiratory questionnaire. Int J Chron Obstruct Pulmon Dis 2015;10:2295-2301 [FREE Full text] [CrossRef] [Medline]
  89. Blanco-Aparicio M, Vázquez I, Pita-Fernández S, Pértega-Diaz S, Verea-Hernando H. Utility of brief questionnaires of health-related quality of life (Airways Questionnaire 20 and Clinical COPD Questionnaire) to predict exacerbations in patients with asthma and COPD. Health Qual Life Outcomes 2013 May 27;11:85 [FREE Full text] [CrossRef] [Medline]
  90. Chen X, Wang Q, Hu Y, Zhang L, Xiong W, Xu Y, et al. A nomogram for predicting severe exacerbations in stable COPD patients. Int J Chron Obstruct Pulmon Dis 2020;15:379-388 [FREE Full text] [CrossRef] [Medline]
  91. Yoo JW, Hong Y, Seo JB, Chae EJ, Ra SW, Lee JH, et al. Comparison of clinico-physiologic and CT imaging risk factors for COPD exacerbation. J Korean Med Sci 2011 Dec;26(12):1606-1612 [FREE Full text] [CrossRef] [Medline]
  92. Jones RC, Price D, Chavannes NH, Lee AJ, Hyland ME, Ställberg B, UNLOCK Group of the IPCRG. Multi-component assessment of chronic obstructive pulmonary disease: an evaluation of the ADO and DOSE indices and the global obstructive lung disease categories in international primary care data sets. NPJ Prim Care Respir Med 2016 Apr 07;26:16010 [FREE Full text] [CrossRef] [Medline]
  93. Jones RC, Donaldson GC, Chavannes NH, Kida K, Dickson-Spillmann M, Harding S, et al. Derivation and validation of a composite index of severity in chronic obstructive pulmonary disease: the DOSE Index. Am J Respir Crit Care Med 2009 Dec 15;180(12):1189-1195. [CrossRef] [Medline]
  94. Ställberg B, Lisspers K, Larsson K, Janson C, Müller M, Łuczko M, et al. Predicting hospitalization due to COPD exacerbations in Swedish primary care patients using machine learning - based on the ARCTIC study. Int J Chron Obstruct Pulmon Dis 2021;16:677-688 [FREE Full text] [CrossRef] [Medline]
  95. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997 Nov 15;9(8):1735-1780. [CrossRef] [Medline]
  96. Gers FA, Schmidhuber J, Cummins FA. Learning to forget: continual prediction with LSTM. Neural Comput 2000 Oct;12(10):2451-2471. [CrossRef] [Medline]
  97. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 2018 May 8;1:18 [FREE Full text] [CrossRef] [Medline]
  98. Lipton ZC, Kale DC, Elkan C, Wetzel RC. Learning to diagnose with LSTM recurrent neural networks. In: Proceedings of the International Conference on Learning Representations. 2016 Presented at: International Conference on Learning Representations; May 2-4, 2016; San Juan, Puerto Rico p. 1-18   URL: https://arxiv.org/abs/1511.03677
  99. Kam HJ, Kim HY. Learning representations for the early detection of sepsis with deep neural networks. Comput Biol Med 2017 Dec 01;89:248-255. [CrossRef] [Medline]
  100. Razavian N, Marcus J, Sontag D. Multi-task prediction of disease onsets from longitudinal laboratory tests. In: Proceedings of the Machine Learning in Health Care Conference. 2016 Presented at: Machine Learning in Health Care Conference; August 19-20, 2016; Los Angeles, CA p. 73-100   URL: http://proceedings.mlr.press/v56/Razavian16.html
  101. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, MA: MIT Press; 2016.
  102. Combi C, Keravnou-Papailiou E, Shahar Y. Temporal Information Systems in Medicine. New York, NY: Springer; 2010.
  103. Moskovitch R, Shahar Y, Wang F, Hripcsak G. Temporal biomedical data analytics. J Biomed Inform 2019 Feb;90:103092 [FREE Full text] [CrossRef] [Medline]
  104. Dong G, Duan L, Nummenmaa J, Zhang P. Feature generation and feature engineering for sequences. In: Dong G, Liu H, editors. Feature Engineering for Machine Learning and Data Analytics. Boca Raton, FL: CRC Press; 2018:145-166.
  105. Puranik S, Forno E, Bush A, Celedón JC. Predicting severe asthma exacerbations in children. Am J Respir Crit Care Med 2017 Dec 01;195(7):854-859 [FREE Full text] [CrossRef] [Medline]
  106. Buelo A, McLean S, Julious S, Flores-Kim J, Bush A, Henderson J, ARC Group. At-risk children with asthma (ARC): a systematic review. Thorax 2018 Dec;73(9):813-824 [FREE Full text] [CrossRef] [Medline]
  107. Greenberg S. Asthma exacerbations: predisposing factors and prediction rules. Curr Opin Allergy Clin Immunol 2013 Jun;13(3):225-236. [CrossRef] [Medline]
  108. Fleming L. Asthma exacerbation prediction: recent insights. Curr Opin Allergy Clin Immunol 2018 Dec;18(2):117-123. [CrossRef] [Medline]
  109. Ledford DK, Lockey RF. Asthma and comorbidities. Curr Opin Allergy Clin Immunol 2013 Feb;13(1):78-86. [CrossRef] [Medline]
  110. Blakey JD, Price DB, Pizzichini E, Popov TA, Dimitrov BD, Postma DS, et al. Identifying risk of future asthma attacks using UK medical record data: a respiratory effectiveness group initiative. J Allergy Clin Immunol Pract 2017;5(4):1015-1024. [CrossRef] [Medline]
  111. Das LT, Abramson EL, Stone AE, Kondrich JE, Kern LM, Grinspan ZM. Predicting frequent emergency department visits among children with asthma using EHR data. Pediatr Pulmonol 2017 Jul;52(7):880-890. [CrossRef] [Medline]
  112. Bahadori K, FitzGerald JM. Risk factors of hospitalization and readmission of patients with COPD exacerbation - systematic review. Int J Chron Obstruct Pulmon Dis 2007;2(3):241-251 [FREE Full text] [Medline]
  113. Evans RS. Electronic health records: then, now, and in the future. Yearb Med Inform 2016 May 20;Suppl 1:48-61. [CrossRef] [Medline]
  114. Schatz M. Predictors of asthma control: what can we modify? Curr Opin Allergy Clin Immunol 2012 Jun;12(3):263-268. [CrossRef] [Medline]
  115. Dick S, Doust E, Cowie H, Ayres JG, Turner S. Associations between environmental exposures and asthma control and exacerbations in young children: a systematic review. BMJ Open 2014;4(2):e003827 [FREE Full text] [CrossRef] [Medline]
  116. Li J, Sun S, Tang R, Qiu H, Huang Q, Mason TG, et al. Major air pollutants and risk of COPD exacerbations: a systematic review and meta-analysis. Int J Chron Obstruct Pulmon Dis 2016;11:3079-3091 [FREE Full text] [CrossRef] [Medline]
  117. Hansel NN, McCormack MC, Kim V. The effects of air pollution and temperature on COPD. COPD 2016 Dec;13(3):372-379 [FREE Full text] [CrossRef] [Medline]
  118. Teach RL, Shortliffe EH. An analysis of physician attitudes regarding computer-based clinical consultation systems. Comput Biomed Res 1981 Dec;14(6):542-558. [CrossRef] [Medline]
  119. Ye LR, Johnson PE. The impact of explanation facilities on user acceptance of expert systems advice. MIS Q 1995 Jun;19(2):157-172. [CrossRef]
  120. Biran O, McKeown KR. Human-centric justification of machine learning predictions. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. 2017 Presented at: Twenty-Sixth International Joint Conference on Artificial Intelligence; August 19-25, 2017; Melbourne, Australia p. 1461-1467. [CrossRef]
  121. Kim B, Koyejo O, Khanna R. Examples are not enough, learn to criticize! Criticism for interpretability. In: Proceedings of 2016 Annual Conference on Neural Information Processing Systems. 2016 Presented at: NIPS'16; December 5-10, 2016; Barcelona, Spain p. 2280-2288   URL: https://papers.nips.cc/paper/2016/file/5680522b8e2bb01943234bce7bf84534-Paper.pdf
  122. Halamka JD. Early experiences with big data at an academic medical center. Health Aff (Millwood) 2014 Jul;33(7):1132-1138. [CrossRef] [Medline]
  123. Asadi H, Dowling R, Yan B, Mitchell P. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy. PLoS One 2014;9(2):e88225 [FREE Full text] [CrossRef] [Medline]
  124. Hale AT, Stonko DP, Brown A, Lim J, Voce DJ, Gannon SR, et al. Machine-learning analysis outperforms conventional statistical models and CT classification systems in predicting 6-month outcomes in pediatric patients sustaining traumatic brain injury. Neurosurg Focus 2018 Nov 01;45(5):E2. [CrossRef] [Medline]
  125. Bazoukis G, Stavrakis S, Zhou J, Bollepalli SC, Tse G, Zhang Q, et al. Machine learning versus conventional clinical methods in guiding management of heart failure patients-a systematic review. Heart Fail Rev 2021 Jan;26(1):23-34 [FREE Full text] [CrossRef] [Medline]
  126. Singal AG, Mukherjee A, Elmunzer BJ, Higgins PD, Lok AS, Zhu J, et al. Machine learning algorithms outperform conventional regression models in predicting development of hepatocellular carcinoma. Am J Gastroenterol 2013 Nov;108(11):1723-1730 [FREE Full text] [CrossRef] [Medline]
  127. Triantafyllidis AK, Tsanas A. Applications of machine learning in real-life digital health interventions: review of the literature. J Med Internet Res 2019 Apr 05;21(4):e12286 [FREE Full text] [CrossRef] [Medline]
  128. Duncan I. Healthcare Risk Adjustment and Predictive Modeling, 2nd Ed. Winsted, CT: ACTEX Publications Inc; 2018.
  129. Iezzoni LI. Risk Adjustment for Measuring Health Care Outcomes, 4th Ed. Chicago, IL: Health Administration Press; 2012.
  130. Luo G. A roadmap for designing a personalized search tool for individual healthcare providers. J Med Syst 2014 Feb;38(2):6. [CrossRef] [Medline]
  131. James BC, Savitz LA. How Intermountain trimmed health care costs through robust quality improvement efforts. Health Aff (Millwood) 2011 Jun;30(6):1185-1191 [FREE Full text] [CrossRef] [Medline]
  132. The Dartmouth Atlas of Health Care. 2021.   URL: http://www.dartmouthatlas.org/data/topic/topic.aspx?cat=21, [accessed 2021-04-27]
  133. Advancing physician performance measurement: using administrative data to assess physician quality and efficiency. Pacific Business Group on Health. 2005.   URL: http://www.pbgh.org/storage/documents/reports/PBGHP3Report_09-01-05final.pdf [accessed 2015-08-11] [WebCite Cache]
  134. Gifford E, Foster EM. Provider-level effects on psychiatric inpatient length of stay for youth with mental health and substance abuse disorders. Med Care 2008 Mar;46(3):240-246. [CrossRef] [Medline]
  135. Kramer TL, Daniels AS, Zieman GL, Williams C, Dewan NA. Psychiatric practice variations in the diagnosis and treatment of major depression. Psychiatr Serv 2000 Mar;51(3):336-340. [Medline]
  136. Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956 Mar;63(2):81-97. [Medline]
  137. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv 2019 Jan 23;51(5):93. [CrossRef]
  138. Molnar C. Interpretable Machine Learning. Morrisville, NC: lulu.com; 2020.
  139. Luo G. Automatically explaining machine learning prediction results: a demonstration on type 2 diabetes risk prediction. Health Inf Sci Syst 2016;4:2 [FREE Full text] [CrossRef] [Medline]
  140. Luo G, Johnson MD, Nkoy FL, He S, Stone BL. Automatically explaining machine learning prediction results on asthma hospital visits in patients with asthma: secondary analysis. JMIR Med Inform 2020 Dec 31;8(12):e21965 [FREE Full text] [CrossRef] [Medline]
  141. Tong Y, Messinger AI, Luo G. Testing the generalizability of an automated method for explaining machine learning predictions on asthma patients' asthma hospital visits to an academic healthcare system. IEEE Access 2020;8:195971-195979 [FREE Full text] [CrossRef] [Medline]
  142. Luo G, Nau CL, Crawford WW, Schatz M, Zeiger RS, Koebnick C. Generalizability of an automatic explanation method for machine learning prediction results on asthma-related hospital visits in patients with asthma: quantitative analysis. J Med Internet Res 2021 Apr 15;23(4):e24153 [FREE Full text] [CrossRef] [Medline]
  143. Evans RS, Lloyd JF, Pierce LA. Clinical use of an enterprise data warehouse. AMIA Annu Symp Proc 2012;2012:189-198 [FREE Full text] [Medline]
  144. Koebnick C, Langer-Gould AM, Gould MK, Chao CR, Iyer RL, Smith N, et al. Sociodemographic characteristics of members of a large, integrated health care system: comparison with US Census Bureau data. Perm J 2012;16(3):37-41 [FREE Full text] [Medline]
  145. Air data: air quality data collected at outdoor monitors across the US. United States Environmental Protection Agency. 2021.   URL: https://www.epa.gov/outdoor-air-quality-data [accessed 2021-04-28]
  146. MesoWest homepage. 2021.   URL: https://mesowest.utah.edu [accessed 2021-04-28]
  147. Observational Health Data Sciences and Informatics data standardization homepage. 2021.   URL: https://www.ohdsi.org/data-standardization [accessed 2021-04-28]
  148. Observational Health Data Sciences and Informatics standardized vocabularies homepage. 2021.   URL: https://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:sidebar [accessed 2021-04-28]
  149. Luo G. A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling. Glob Transit 2019;1:61-82 [FREE Full text] [CrossRef] [Medline]
  150. Clinical Classifications Software (CCS) for ICD-9-CM. Agency for Healthcare Research and Quality. 2017.   URL: https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp [accessed 2021-04-28]
  151. Clinical Classifications Software Refined (CCSR). Agency for Healthcare Research and Quality. 2021.   URL: https://www.hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp [accessed 2021-04-28]
  152. BETOS 2.0 classification code assignments 2019. Urban Institute. 2019.   URL: https://datacatalog.urban.org/dataset/betos-20-classification-code-assignments-2019 [accessed 2021-04-28]
  153. Drug classification. ProVantage Health Systems Inc. 2021.   URL: https://reference.pivotrock.net/HealthCareTraining/Drugs/RXC.html [accessed 2021-04-28]
  154. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating, 2nd Ed. New York, NY: Springer; 2019.
  155. Pyle D. Data Preparation for Data Mining. San Francisco, CA: Morgan Kaufmann; 1999.
  156. Luo G, Stone BL, Johnson MD, Tarczy-Hornoch P, Wilcox AB, Mooney SD, et al. Automating construction of machine learning models with clinical big data: proposal rationale and methods. JMIR Res Protoc 2017 Aug 29;6(8):e175 [FREE Full text] [CrossRef] [Medline]
  157. Nathan RA, Sorkness CA, Kosinski M, Schatz M, Li JT, Marcus P, et al. Development of the Asthma Control Test: a survey for assessing asthma control. J Allergy Clin Immunol 2004 Jan;113(1):59-65. [CrossRef] [Medline]
  158. Bivand RS, Pebesma E, Gómez-Rubio V. Applied Spatial Data Analysis with R, 2nd Ed. New York, NY: Springer; 2013.
  159. Luo G, Stone BL, Fassl B, Maloney CG, Gesteland PH, Yerram SR, et al. Predicting asthma control deterioration in children. BMC Med Inform Decis Mak 2015 Oct 14;15(1):84 [FREE Full text] [CrossRef] [Medline]
  160. Tolbert PE, Mulholland JA, MacIntosh DL, Xu F, Daniels D, Devine OJ, et al. Air quality and pediatric emergency room visits for asthma in Atlanta, Georgia, USA. Am J Epidemiol 2000 Apr 15;151(8):798-810. [CrossRef] [Medline]
  161. Leibel S, Nguyen M, Brick W, Parker J, Ilango S, Aguilera R, et al. Increase in pediatric respiratory visits associated with Santa Ana wind-driven wildfire smoke and PM 2.5 levels in San Diego county. Ann Am Thorac Soc 2020 Mar;17(3):313-320. [CrossRef] [Medline]
  162. Desai JR, Wu P, Nichols GA, Lieu TA, O'Connor PJ. Diabetes and asthma case identification, validation, and representativeness when using electronic health data to construct registries for comparative effectiveness and epidemiologic research. Med Care 2012 Jul;50 Suppl:30-35. [CrossRef] [Medline]
  163. Wakefield DB, Cloutier MM. Modifications to HEDIS and CSTE algorithms improve case recognition of pediatric asthma. Pediatr Pulmonol 2006 Oct;41(10):962-971. [CrossRef] [Medline]
  164. Tong Y, Liao ZC, Tarczy-Hornoch P, Luo G. Evaluating the performance stability of a constraint-based method to pinpoint patients apt to obtain care mostly within a given healthcare system: secondary analysis. 2021.   URL: http://pages.cs.wisc.edu/~gangluo/identify_chronic_disease_patients.pdf [accessed 2021-04-28]
  165. Lindenauer PK, Grosso LM, Wang C, Wang Y, Krishnan JA, Lee TA, et al. Development, validation, and results of a risk-standardized measure of hospital 30-day mortality for patients with exacerbation of chronic obstructive pulmonary disease. J Hosp Med 2013 Aug;8(8):428-435. [CrossRef] [Medline]
  166. NQF #1891 Hospital 30-day, all-cause, risk-standardized readmission rate (RSRR) following chronic obstructive pulmonary disease (COPD) hospitalization. National Quality Forum. 2012.   URL: http:/​/www.​qualityforum.org/​Projects/​n-r/​Pulmonary_Endorsement_Maintenance/​1891_30_Day_RSRR_COPD.​aspx [accessed 2021-04-28]
  167. Nguyen HQ, Chu L, Amy Liu IL, Lee JS, Suh D, Korotzer B, et al. Associations between physical activity and 30-day readmission risk in chronic obstructive pulmonary disease. Ann Am Thorac Soc 2014 Jun;11(5):695-705. [CrossRef] [Medline]
  168. Cooke CR, Joo MJ, Anderson SM, Lee TA, Udris EM, Johnson E, et al. The validity of using ICD-9 codes and pharmacy records to identify patients with chronic obstructive pulmonary disease. BMC Health Serv Res 2011 Feb 16;11:37 [FREE Full text] [CrossRef] [Medline]
  169. Witten IH, Frank E, Hall MA, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques, 4th Ed. Burlington, MA: Morgan Kaufmann; 2016.
  170. Zeng X, Luo G. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection. Health Inf Sci Syst 2017 Dec;5(1):2 [FREE Full text] [CrossRef] [Medline]
  171. Luo G, Tarczy-Hornoch P, Wilcox AB, Lee ES. Identifying patients who are likely to receive most of their care from a specific health care system: demonstration via secondary analysis. JMIR Med Inform 2018 Nov 05;6(4):e12241 [FREE Full text] [CrossRef] [Medline]
  172. Obuchowski NA, McClish DK. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat Med 1997 Jul 15;16(13):1529-1542. [Medline]
  173. Asthma action plans. Centers for Disease Control and Prevention. 2020.   URL: https://www.cdc.gov/asthma/actionplan.html [accessed 2021-05-08]
  174. Luo G, Sward K. A roadmap for optimizing asthma care management via computational approaches. JMIR Med Inform 2017 Sep 26;5(3):e32 [FREE Full text] [CrossRef] [Medline]
  175. Thornton C, Hutter F, Hoos H, Leyton-Brown K. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013 Presented at: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 11-14, 2013; Chicago, IL p. 847-855. [CrossRef]
  176. Patton MQ. Qualitative Research & Evaluation Methods: Integrating Theory and Practice, 4th Ed. Thousand Oaks, CA: SAGE Publications; 2014.
  177. Davis FD, Venkatesh V. Toward preprototype user acceptance testing of new information systems: implications for software project management. IEEE Trans Eng Manage 2004 Feb;51(1):31-46. [CrossRef]
  178. Davis FD. User acceptance of information technology: system characteristics, user perceptions and behavioral impacts. Int J Man Mach Stud 1993 Mar;38(3):475-487. [CrossRef]
  179. Davis FD. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q 1989 Sep;13(3):319-340. [CrossRef]
  180. Thomas DR. A general inductive approach for analyzing qualitative evaluation data. Am J Eval 2016 Jun 30;27(2):237-246. [CrossRef]
  181. ATLAS.ti qualitative data analysis software. 2021.   URL: http://www.atlasti.com/index.html [accessed 2021-04-28]
  182. Duncan I. Managing and Evaluating Healthcare Intervention Programs, 2nd Ed. Winsted, CT: ACTEX Publications; 2014.
  183. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015;216:574-578 [FREE Full text] [Medline]
  184. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc 2012;19(1):54-60 [FREE Full text] [CrossRef] [Medline]
  185. Chariatte V, Berchtold A, Akré C, Michaud PA, Suris JC. Missed appointments in an outpatient clinic for adolescents, an approach to predict the risk of missing. J Adolesc Health 2008 Jul;43(1):38-45. [CrossRef] [Medline]
  186. Luo G, Stone BL, Sakaguchi F, Sheng X, Murtaugh MA. Using computational approaches to improve risk-stratified patient management: rationale and methods. JMIR Res Protoc 2015;4(4):e128 [FREE Full text] [CrossRef] [Medline]
  187. Kumamaru H, Lee MP, Choudhry NK, Dong YH, Krumme AA, Khan N, et al. Using previous medication adherence to predict future adherence. J Manag Care Spec Pharm 2018 Nov;24(11):1146-1155. [CrossRef] [Medline]
  188. Gupta P, Malhotra P, Vig L, Shroff G. Transfer learning for clinical time series analysis using recurrent neural networks. In: Proceedings of the KDD Workshop on Machine Learning for Medicine and Healthcare. 2018 Presented at: KDD Workshop on Machine Learning for Medicine and Healthcare; August 20, 2018; London, United Kingdom p. 1-4   URL: https://arxiv.org/abs/1807.01705
  189. Andrade C. The P value and statistical significance: misunderstandings, explanations, challenges, and alternatives. Indian J Psychol Med 2019;41(3):210-215 [FREE Full text] [CrossRef] [Medline]
  190. Leo GD, Sardanelli F. Statistical significance: P value, 0.05 threshold, and applications to radiomics-reasons for a conservative approach. Eur Radiol Exp 2020 Mar 11;4(1):18 [FREE Full text] [CrossRef] [Medline]
  191. Amrhein V, Korner-Nievergelt F, Roth T. The earth is flat (P>0.05): significance thresholds and the crisis of unreplicable research. PeerJ 2017;5:e3544 [FREE Full text] [CrossRef] [Medline]


Abbreviations

AUC: area under the receiver operating characteristic curve
COPD: chronic obstructive pulmonary disease
ICD-9: International Classification of Diseases, Ninth Revision
ICD-10: International Classification of Diseases, Tenth Revision
IDM: integrated disease management
IH: Intermountain Healthcare
KPSC: Kaiser Permanente Southern California
LSTM: long short-term memory
OMOP: Observational Medical Outcomes Partnership
SSH: Secure Shell
UWM: University of Washington Medicine


Edited by G Eysenbach; submitted 16.01.21; peer-reviewed by A Rovetta, H Tibble; comments to author 05.04.21; revised version received 12.04.21; accepted 19.04.21; published 18.05.21

Copyright

©Gang Luo, Bryan L Stone, Xiaoming Sheng, Shan He, Corinna Koebnick, Flory L Nkoy. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 18.05.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.