
Health Serv Res. 2004 Dec; 39(6 Pt 1): 1733–1750.

An Algorithm for the Use of Medicare Claims Data to Identify Women with Incident Breast Cancer

Abstract

Objective

To develop and validate a clinically informed algorithm that uses solely Medicare claims to identify, with a high positive predictive value, incident breast cancer cases.

Data Source

Population-based Surveillance, Epidemiology, and End Results (SEER) Tumor Registry data linked to Medicare claims, and Medicare claims from a 5 percent random sample of beneficiaries in SEER areas.

Study Design

An algorithm was developed using claims from 1995 breast cancer patients from the SEER-Medicare database, as well as 1995 claims from Medicare control subjects. The algorithm was validated on claims from breast cancer subjects and controls from 1994. The algorithm development process used both clinical insight and logistic regression methods.

Data Extraction

Training set: Claims from 7,700 SEER-Medicare breast cancer subjects diagnosed in 1995, and 124,884 controls. Validation set: Claims from 7,607 SEER-Medicare breast cancer subjects diagnosed in 1994, and 120,317 controls.

Principal Findings

A four-step prediction algorithm was developed and validated. It has a positive predictive value of 89 to 93 percent, and a sensitivity of 80 percent for identifying incident breast cancer. The sensitivity is 82–87 percent for stage I or II, and lower for other stages. The sensitivity is 82–83 percent for women who underwent either breast-conserving surgery or mastectomy, and is similar across geographic sites. A cohort identified with this algorithm will have 89–93 percent incident breast cancer cases, 1.5–6 percent cancer-free cases, and 4–5 percent prevalent breast cancer cases.

Conclusions

This algorithm has better operating characteristics than previously proposed algorithms. The ability to examine national patterns of breast cancer care using Medicare claims data would open new avenues for the assessment of quality of care.

Keywords: Breast neoplasms, incidence, sensitivity and specificity, registries, Medicare

The quality of cancer care in the United States is known to be variable, and factors determining quality of cancer care have been insufficiently studied (Hewitt and Simone 1999). The development of methods for using existing databases to study the quality of cancer care would be a major advance (Hewitt and Simone 2000). Methods that allow the use of Medicare administrative databases to study cancer quality of care would be especially helpful because about 60 percent of persons diagnosed with cancer are aged 65 and older (Hewitt and Simone 2000), and Medicare claims data represent a nearly population-based source of data.

With respect to breast cancer specifically, several challenges have been identified in the use of Medicare claims in studying the care provided. The use of inpatient Medicare claims to identify incident breast cancer cases offers excellent specificity but poor sensitivity because 30–40 percent of initial breast cancer operations are done on an outpatient basis (Warren et al. 1999; Warren et al. 1996). Inpatient records are also more likely to identify patients undergoing mastectomy for initial therapy than those undergoing breast-conserving surgery (Warren et al. 1996; Cooper et al. 2000). Compared to inpatient data alone, the use of combined inpatient, outpatient, and physician claims increases sensitivity to 80–90 percent (Freeman et al. 2000; Cooper et al. 1999), but decreases specificity (Warren et al. 1999; Freeman et al. 2000). Because only a small percentage of the female Medicare population develops breast cancer in a given year, even small decreases in specificity lead to large decreases in the positive predictive value (PPV) (Freeman et al. 2000).
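The effect is easy to see with Bayes' theorem. The sketch below treats the population as just two groups (incident cases and everyone else) and computes the PPV for a fixed sensitivity across a few specificities. The 0.005 prevalence matches the breast cancer incidence weight used in the PPV computation later in the article; the 80 percent sensitivity and the specificity values are illustrative choices, not study results.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem (cases vs. non-cases)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalence = 0.005   # annual incidence weight, as in the article's PPV formula
    sensitivity = 0.80   # illustrative value
    for specificity in (0.9999, 0.999, 0.995, 0.99):
        value = ppv(sensitivity, specificity, prevalence)
        print(f"specificity {specificity:.2%}: PPV {value:.1%}")
```

With these inputs the PPV is about 80 percent at 99.9 percent specificity but below 30 percent at 99 percent specificity, which is why a fraction of a percentage point of specificity matters so much in this setting.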

Our major goal in the development of this algorithm was to identify a cohort of incident breast cancer patients whose surgical, medical, and follow-up care could be studied over time. Inherent in this goal was a requirement for a high positive predictive value (PPV), ensuring that a high percentage of the cohort was made up of true breast cancer patients. The requirement for a high PPV was considered more important than the algorithm's sensitivity, particularly for the small percentage (6–7 percent) of women not undergoing initial surgical therapy. However, we also considered important the consistency of the algorithm's sensitivity across subgroups defined by geographic location, age, and type of initial surgery undergone (breast-conserving surgery [BCS] or mastectomy).

The prior work of the other investigators cited had demonstrated fairly clearly that a relatively simple algorithm (generally consisting of the identification of a claim with a coincident breast cancer diagnosis and operative procedure) would not allow us to achieve our goal. Our strategy was to combine clinical rationale and statistical analysis in developing the four-step algorithm presented herein.

Methods

Sources of Data

The key data source for this study was the linked SEER-Medicare database (SEER-Medicare Linked Database 2003). This database links data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) tumor registries and the Centers for Medicare and Medicaid Services (CMS) Medicare claims data. The population-based SEER registries cumulatively represent about 14 percent of the U.S. population and include information on incident cancer patients, such as demographics, month and year of diagnosis, extent of disease, and initial treatment undergone. The Medicare files required for this study include the Medicare Provider Analysis and Review (MEDPAR) file, which contains inpatient hospital claims; the Outpatient file, which contains claims from institutional outpatient providers, including hospital ambulatory surgery centers; the Carrier Claims file (previously known as the Part B Physician/Supplier File), which contains inpatient and outpatient claims from noninstitutional providers such as physicians, as well as stand-alone ambulatory surgical centers; and the Denominator file, which contains beneficiary demographic information and Medicare entitlement and enrollment information. Nearly 94 percent of the SEER registry patients aged 65 and older were successfully linked with their Medicare claims (Potosky et al. 1993). An additional data source was a 5 percent random sample of Medicare beneficiaries residing in the SEER geographic areas, including an indicator for whether the individual linked to the SEER database. When SEER subjects are removed from this sample, it approximates a population-based random sample of cancer-free control subjects residing in SEER areas. This study was approved by the Medical College of Wisconsin Human Subjects Research Review Committee.

Training and Validation Datasets

Training Set: Incident Breast Cancer Cases. A cohort of women aged 65 or older at the time of diagnosis of breast cancer in 1995 (according to SEER) was developed. Cases were excluded if the diagnosis was made only at autopsy or by death certificate. For the period from January 1995 to March 1996, subjects were required to be eligible for Medicare Parts A and B, not enrolled in a Medicare HMO, and alive. Eligibility through the first quarter of 1996 was required to capture Medicare treatment information for patients who were diagnosed near the end of 1995 but treated early in 1996. These criteria resulted in a cohort of 7,700 women, whose 1995 Medicare claims comprised the training set for incident breast cancer cases.

Training Set: Cancer-Free Subjects. From the 5 percent random sample of Medicare beneficiaries who resided in SEER areas but who did not link to the SEER registry for an incident cancer between 1973 and 1995, a cancer-free cohort was developed. These 71,752 women were required to meet the same eligibility criteria as the breast cancer cases with respect to Medicare eligibility and survival. The 1995 claims of these women comprised the "cancer-free" training set.

Training Set: Other Cancer Cases. Again using the 5 percent random sample and the same eligibility criteria, a cohort was constructed of 4,501 women who were diagnosed with a cancer other than breast cancer between 1973 and 1995. The 1995 claims of these women comprised the "other cancer" training set.

Prevalent Breast Cancer Cases. For the purposes of this article, we use the term prevalence to indicate cancer cases diagnosed prior to the index year, not including the incident cases diagnosed in the index year. From the SEER-Medicare linked data, a cohort was developed of 48,631 women who had breast cancer between 1973 and 1994, according to SEER, and who were alive, eligible for Medicare Parts A and B, and not in an HMO during 1995. The 1995 claims for these women were not used to train the algorithm per se, but were used to assess the impact of prevalent cases on the algorithm's specificity.

Validation Sets. Using the same selection criteria as described above for the year 1995, four analogous sets of claims were constructed for calendar year 1994. When evaluating a predictive algorithm, it is important that the validation set be independent of the training set. We defined the training sets to consist of claims from 1995, while the validation sets consisted of independent claims from 1994. We recognized the possibility that some of the individuals generating the claims for the 1995 training set might also have generated claims for the 1994 validation set, particularly among the cancer-free and other cancer groups. In assessing the frequency of such overlap, though, we determined that only 5.5 percent of the individuals whose claims were part of the algorithm's training for steps 2 or 3 (described below under "Algorithm Development") also contributed claims to the validation set at those steps.

Algorithm Development

The algorithm was developed using the 1995 training sets. In constructing the algorithm, consideration was given not only to the presence or absence of breast cancer diagnosis or procedure codes, but also to other related codes (such as historical codes and radiation codes) that might improve the prediction of a case. In addition, variables were evaluated indicating whether a code was in a primary or secondary position on a given claim, and how frequently the code occurred. The algorithm development effort involved an iterative interaction between clinical insight and statistical analysis. The codes actually used in the algorithm are summarized in Table 1.

Table 1

Claims Codes Used as Possible Predictors of Breast Cancer

Diagnosis or Procedure	ICD-9-CM Codes	CPT/HCPCS Codes (procedures only)
Breast cancer*	174–174.9
Carcinoma-in-situ (breast)*	233.0
History of breast cancer	V10.3
Tumor in breast of uncertain nature	238.3, 239.3
Secondary cancer to breast	198.2, 198.81
Other cancer	140–173.9
	175–195.8
	197–199.1 (excluding 198.2, 198.81)
	200–208.91
	230–234.9 (excluding 233.0, 232.5)
	235–239.9 (excluding 238.3, 239.3)
Biopsy†	85.1–85.19	19000, 19001, 19100, 19101, 19110, 19112
Lumpectomy†,§	85.20–85.21	19120, 19125, 19126
Partial mastectomy†,§	85.22–85.23	19160, 19162
Lymph node dissection†,§	40.3	38740, 38745, 38525
Mastectomy†,§	85.33–85.48	19180–19255
Radiation therapy	92.2–92.29	77400–77499
		77520–77525
		77750–77799

A four-step algorithm was developed (Figure 1). The input to the algorithm consists of Medicare claims of all women aged 65 and older who were alive and eligible for Medicare Part A and Part B in some index year, including claims for the following three months.

Figure 1

Schematic Representation of a Four-Step Algorithm for Identifying Incident Breast Cancer Cases from Medicare Claims

The initial cohort on which the algorithm operates consists of women aged 65 and older on January 1 of the index year who are alive, eligible for Medicare Parts A and B, and not in an HMO from January 1 of the index year through March 31 of the following year. Further details of each step are provided in the text.

Step 1. Referred to as the "screen," this step requires that a potential case have both a breast cancer diagnosis code and a breast cancer procedure code (not necessarily on the same claim) anywhere in the MEDPAR, Outpatient, or Carrier Claims records (see Table 1). Only subjects satisfying this screening step are retained for further consideration.
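A minimal sketch of the step 1 screen follows. The `Claim` record, its field names, and the decimal-point code format are hypothetical conveniences for illustration, not the layout of the MEDPAR, Outpatient, or Carrier files. The code sets come from Table 1, assuming that the breast cancer diagnosis rows (174–174.9 and 233.0) and the procedure rows other than radiation therapy are the ones that count for the screen; the table's footnotes, which would settle this, are not reproduced in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Claim:
    """Hypothetical claim record; field names are illustrative, not CMS layouts."""
    source: str                                     # "MEDPAR", "OUTPATIENT", or "CARRIER"
    service_date: date
    diagnoses: list = field(default_factory=list)   # ICD-9-CM diagnosis codes
    icd9_procs: list = field(default_factory=list)  # ICD-9-CM procedure codes
    cpt_codes: list = field(default_factory=list)   # CPT/HCPCS codes


# Assumed screen code sets, taken from Table 1.
BREAST_CA_DX = [(174.0, 174.9), (233.0, 233.0)]
BREAST_PROC_ICD9 = [(85.1, 85.19), (85.20, 85.23), (40.3, 40.3), (85.33, 85.48)]
BREAST_PROC_CPT = (
    {"19000", "19001", "19100", "19101", "19110", "19112",
     "19120", "19125", "19126", "19160", "19162",
     "38740", "38745", "38525"}
    | {str(c) for c in range(19180, 19256)}         # 19180-19255 inclusive
)


def _numeric(code: str) -> Optional[float]:
    """Numeric value of an ICD-9-CM code, or None for V/E codes."""
    try:
        return float(code)
    except ValueError:
        return None


def in_ranges(code: str, ranges) -> bool:
    """True if the code falls in any inclusive (low, high) range."""
    value = _numeric(code)
    return value is not None and any(lo <= value <= hi for lo, hi in ranges)


def passes_screen(claims) -> bool:
    """Step 1: a breast cancer diagnosis and a breast procedure anywhere in the
    claims, not necessarily on the same claim."""
    has_dx = any(in_ranges(dx, BREAST_CA_DX) for c in claims for dx in c.diagnoses)
    has_proc = any(
        any(in_ranges(p, BREAST_PROC_ICD9) for p in c.icd9_procs)
        or any(code in BREAST_PROC_CPT for code in c.cpt_codes)
        for c in claims
    )
    return has_dx and has_proc


if __name__ == "__main__":
    claims = [
        Claim("CARRIER", date(1995, 3, 2), diagnoses=["174.4"]),
        Claim("OUTPATIENT", date(1995, 3, 9), diagnoses=["611.72"], cpt_codes=["19120"]),
    ]
    print(passes_screen(claims))  # True: diagnosis and procedure on different claims
```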

Step 2. This step directly includes subjects with a high likelihood of being a case. To be classified as a case based on this step, the subject must meet both of the following criteria:

A mastectomy claim in any file, or a lumpectomy or partial mastectomy claim in any file followed by at least one Outpatient or Carrier claim for radiotherapy with a breast cancer diagnosis.
At least two Outpatient or Carrier claims on different dates containing breast cancer as the primary diagnosis.

Subjects who pass step 2 are classified as possible incident cases and proceed to step 4. Subjects who are not classified as cases at step 2 go to step 3.
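The two step 2 criteria can be sketched the same way. This is again a hypothetical, self-contained illustration: the `Claim` record is invented, "followed by" is read as a radiotherapy claim dated on or after the lumpectomy or partial mastectomy claim, and "breast cancer" is taken to mean the 174–174.9 codes only. Each of those readings is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Claim:
    """Hypothetical claim record; field names are illustrative, not CMS layouts."""
    source: str                                     # "MEDPAR", "OUTPATIENT", or "CARRIER"
    service_date: date
    diagnoses: list = field(default_factory=list)   # ICD-9-CM; diagnoses[0] is primary
    icd9_procs: list = field(default_factory=list)  # ICD-9-CM procedure codes
    cpt_codes: list = field(default_factory=list)   # CPT/HCPCS codes


def in_ranges(code: str, ranges) -> bool:
    """True if a numeric code falls in any inclusive (low, high) range."""
    try:
        value = float(code)
    except ValueError:  # V codes and alphanumeric HCPCS codes never match here
        return False
    return any(lo <= value <= hi for lo, hi in ranges)


# (ICD-9-CM procedure ranges, CPT ranges), taken from Table 1.
MASTECTOMY = ([(85.33, 85.48)], [(19180, 19255)])
BCS = ([(85.20, 85.23)], [(19120, 19120), (19125, 19126), (19160, 19160), (19162, 19162)])
RADIATION = ([(92.2, 92.29)], [(77400, 77499), (77520, 77525), (77750, 77799)])
BREAST_CA_DX = [(174.0, 174.9)]  # assumption: in situ (233.0) not counted here
AMBULATORY = {"OUTPATIENT", "CARRIER"}


def has_proc(claim: Claim, code_sets) -> bool:
    icd_ranges, cpt_ranges = code_sets
    return (any(in_ranges(p, icd_ranges) for p in claim.icd9_procs)
            or any(in_ranges(c, cpt_ranges) for c in claim.cpt_codes))


def high_likelihood(claims) -> bool:
    """Step 2: both criteria must hold for a direct rule-in."""
    amb = [c for c in claims if c.source in AMBULATORY]

    # Criterion 1: any mastectomy, or BCS followed (on/after, assumed) by an
    # Outpatient/Carrier radiotherapy claim carrying a breast cancer diagnosis.
    mastectomy = any(has_proc(c, MASTECTOMY) for c in claims)
    bcs_dates = [c.service_date for c in claims if has_proc(c, BCS)]
    rt_dates = [c.service_date for c in amb
                if has_proc(c, RADIATION)
                and any(in_ranges(d, BREAST_CA_DX) for d in c.diagnoses)]
    bcs_then_rt = any(rt >= s for s in bcs_dates for rt in rt_dates)

    # Criterion 2: at least two Outpatient/Carrier claims, on different dates,
    # with breast cancer as the primary (first-listed) diagnosis.
    primary_dates = {c.service_date for c in amb
                     if c.diagnoses and in_ranges(c.diagnoses[0], BREAST_CA_DX)}

    return (mastectomy or bcs_then_rt) and len(primary_dates) >= 2
```

For example, a partial mastectomy claim followed a month later by a Carrier radiotherapy claim, both with a primary 174.9 diagnosis, satisfies both criteria; a mastectomy that appears only in MEDPAR does not, because criterion 2 counts only Outpatient and Carrier claims.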

Step 3. This step of the algorithm applies to all potential cases that passed the screen (step 1) but were not directly included at step 2. In practice, this step differentiates primary breast cancer cases from women undergoing lumpectomy or partial mastectomy for benign disease or for another cancer that had metastasized to the breast. Some such patients without primary breast cancer have claims with erroneous primary breast cancer diagnosis codes and therefore pass the step 1 screen. To develop step 3 of the algorithm, logistic regression methods were employed. A model was developed predicting an incident breast cancer case from about 50 indicator variables representing the presence of various billing codes (diagnostic or procedural) or combinations of codes that were thought clinically to have possible usefulness. Complete details regarding this model are omitted in the interest of space, and because such details do not assist in assessing the performance of the final algorithm. The final parsimonious logistic model included only the following four dichotomous factors:

Surgery. This variable is positive (i.e., set to a value of 1 in the regression equation) if one or more lumpectomy, partial mastectomy, or mastectomy codes are found in any file (see Table 1). Otherwise, the variable is negative (set to a value of 0).
Single Claim. This variable is positive (i.e., set to a value of 1) if a woman with a lumpectomy or partial mastectomy claim in any file had only one month in which a claim contained a primary breast cancer or breast carcinoma-in-situ diagnosis. Otherwise, this variable is negative (i.e., set to 0).
Other Cancer. This variable is positive (i.e., set to 1) if an "other cancer" code is found as a primary diagnosis in one or more claims from any file. Otherwise, this variable is set to 0.
Secondary Cancer to Breast. This variable is positive (i.e., set to 1) if a code for secondary cancer to breast is found in one or more Outpatient or Carrier claims. Otherwise, this variable is set to 0.

Because all the factors in the model are binary variables, it is not necessary for a user of the algorithm to use the regression equation to classify a case as positive or negative. Once the values of the four variables have been determined, subjects can be ruled in if they have one of three combinations of the variables. These combinations are (1) the "surgery" variable is positive and the other three variables are negative, (2) the "surgery" variable is positive, the "other cancer" variable is positive, and the other two variables are negative, or (3) the "surgery" variable is positive, the "secondary cancer to breast" variable is positive, and the other two variables are negative. With all other combinations, the subject is declared not to be a breast cancer case.
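Because the four step 3 factors are binary, the rule-in logic reduces to a lookup over 16 possible combinations. The sketch below encodes the three rule-in combinations listed above and checks them against an equivalent single Boolean expression (surgery present, not a single-month claim pattern, and not both the "other cancer" and "secondary cancer to breast" flags). The function names are illustrative, and deriving the four flags from claims is not shown.

```python
from itertools import product

# The three rule-in patterns for
# (surgery, single_claim, other_cancer, secondary_to_breast) listed in the text.
RULE_IN = {
    (True, False, False, False),
    (True, False, True, False),
    (True, False, False, True),
}


def step3_rule_in(surgery, single_claim, other_cancer, secondary_to_breast):
    """Direct lookup of the three rule-in combinations."""
    return (surgery, single_claim, other_cancer, secondary_to_breast) in RULE_IN


def step3_compact(surgery, single_claim, other_cancer, secondary_to_breast):
    """Equivalent Boolean form of the same rule."""
    return surgery and not single_claim and not (other_cancer and secondary_to_breast)


if __name__ == "__main__":
    combos = list(product([False, True], repeat=4))
    assert all(step3_rule_in(*c) == step3_compact(*c) for c in combos)
    print(sum(step3_rule_in(*c) for c in combos), "of", len(combos), "combinations rule in")
```

The compact form makes explicit that a woman with no lumpectomy, partial mastectomy, or mastectomy code, or with only one month of primary breast cancer claims, is never ruled in at this step.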

Step 4. This step of the algorithm removes prevalent breast cancer cases. It uses three prior years of claims for subjects classified as a case in step 2 or step 3. Such subjects are removed if they have claims in prior years 1992–1994 (1991–1993 for the validation cohort) that were either positive for step 1 (the screening step) of the algorithm or contained a diagnosis of prior history of breast cancer. Women younger than age 68 at diagnosis did not have three full years of claims for review, but as many years as were available were used. This strategy results in the removal of most prior cases, but also of a number of incident cases (Table 2, rows 6 and 7).
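Step 4 can be sketched as a look-back filter. In this hypothetical version, each prior year is summarized by whether its claims passed the step 1 screen and by the set of diagnosis codes seen that year; years with no Medicare claims are simply missing, which matches the handling of women younger than 68 described above. The `PriorYear` structure and the `lookback` parameter are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PriorYear:
    """Hypothetical per-year summary of a subject's claims before the index year."""
    year: int
    screen_positive: bool                                    # step 1 applied to that year
    diagnoses: frozenset = field(default_factory=frozenset)  # ICD-9-CM diagnosis codes


HISTORY_OF_BREAST_CANCER = "V10.3"  # from Table 1


def step4_is_incident(index_year: int, prior_years, lookback: int = 3) -> bool:
    """Keep a step 2/3 case as incident unless a year in the look-back window
    was screen positive or carried a history-of-breast-cancer code.

    Years without Medicare claims are simply absent from `prior_years`, so
    women with a shorter Medicare history are checked over the years they have.
    """
    window = range(index_year - lookback, index_year)  # e.g., 1992-1994 for 1995
    return not any(
        py.year in window
        and (py.screen_positive or HISTORY_OF_BREAST_CANCER in py.diagnoses)
        for py in prior_years
    )


if __name__ == "__main__":
    history = [PriorYear(1993, screen_positive=False, diagnoses=frozenset({"V10.3"}))]
    print(step4_is_incident(1995, history))  # False: removed as a prevalent case
```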

Table 2

Performance of Algorithm for Years 1994, 1995

	1994				1995

	Breast Cancer	Other Cancer	Cancer Free	Prior Breast Cancer	Breast Cancer	Other Cancer	Cancer Free	Prior Breast Cancer
Cohort size	7,607	4,360	72,106	43,851	7,700	4,501	71,752	48,631
Step 1: Screen positive	7,244	15	74	1,777	7,363	21	71	1,953
Step 2: High likelihood	5,693	1	19	564	5,837	1	17	624
Step 2: Not high likelihood	1,551	14	55	1,213	1,526	20	54	1,329
Step 3: Number positive	796	3	7	615	830	4	9	618
Incident or prevalent cases (positive after step 2 or 3)	6,489	4	26	1,179	6,667	5	26	1,242
Step 4: Incident breast cancer cases	6,094	3	20	287	6,180	3	24	318
Number correctly (incorrectly) classified, GS=SEER	6,094	4,357	72,086	43,564	6,180	4,498	71,728	48,313
	(1,513)	(3)	(20)	(287)	(1,520)	(3)	(24)	(318)
% Correctly classified, GS=SEER	80.11	99.93	99.97	99.35	80.26	99.93	99.97	99.35
Number correctly (incorrectly) classified, GS=SEER+HL	6,094	4,358	72,101	43,564	6,180	4,499	71,744	48,313
	(1,513)	(2)	(5)	(287)	(1,520)	(2)	(8)	(318)
% Correctly classified, GS=SEER+HL	80.11	99.95	99.99	99.35	80.26	99.96	99.99	99.35

Determination of Incident Breast Cancer Cases

The initial approach was to consider SEER the "gold standard" for defining an incident breast cancer case. However, while conducting this study, we determined that subjects meeting the criteria from step 2 above had a very high likelihood of being a case according to SEER. A small number of cancer-free control subjects also met the step 2 high-likelihood criteria. After manual inspection of the claims for these subjects, the likelihood of their being incident breast cancer cases seemed extremely high (based on the pattern and number of claims with a breast cancer diagnosis over time, radiotherapy claims, etc.). Therefore, two gold standards for defining an incident case of breast cancer are reported in the results section. The first is termed the "SEER" gold standard, which is defined solely by whether a subject linked to the SEER registry in that year as a breast cancer case. The second gold standard is termed the "SEER plus High Likelihood" gold standard and consists of cases identified by a SEER registry as well as control subjects identified by the two criteria of step 2 above. Our belief is that these cases might be in the group of about 6 percent of SEER subjects who did not successfully link with Medicare files (Potosky et al. 1993), or they might be due to a patient moving into a SEER area shortly after breast cancer diagnosis.

Computation of PPV

The estimates of sensitivity and specificity were converted to an estimate of the positive predictive value (PPV) using Bayes' theorem, as

$$\mathrm{PPV} = \frac{\pi_{B}\,\Pr(+ \mid B)}{\pi_{B}\,\Pr(+ \mid B) + \pi_{O}\,\Pr(+ \mid O) + \pi_{PB}\,\Pr(+ \mid PB) + \pi_{N}\,\Pr(+ \mid N)}$$

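where π_B, π_O, π_PB, and π_N represent the incidence of breast cancer, the incidence and prevalence of other cancers, prior breast cancer, and no cancer, respectively, in the study population, and Pr(+ | ·) is the probability that the algorithm classifies a member of each group as an incident case; Pr(+ | B) is the sensitivity, and 1 − Pr(+ | ·) is the specificity for each of the other three groups. Based on SEER data, these proportions were estimated to be 0.005, 0.07, 0.03, and 0.895, respectively. Confidence intervals for the PPVs were estimated using Fieller's method (Fieller 1940; Steffens 1971).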
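The formula can be checked against the tables. The sketch below estimates each Pr(+ | ·) as the number of algorithm positives in a group divided by that group's cohort size, using the 1994 counts from Table 2 and the weights given above. Up to rounding, it reproduces the 1994 PPVs in Table 3 (89.05 percent for SEER, 93.24 percent for SEER plus High Likelihood) and the cohort composition rows, since each group's share of the selected cohort is its term divided by the denominator. The Fieller confidence intervals are not computed here.

```python
# Population weights given in the text (estimated from SEER data).
WEIGHTS = {"breast": 0.005, "other": 0.07, "prior_breast": 0.03, "none": 0.895}


def ppv_and_composition(positives, cohort_sizes, weights=WEIGHTS):
    """Bayes'-theorem PPV from per-group counts.

    Pr(+ | group) is estimated as positives / cohort size. Each group's share
    of an algorithm-selected cohort is its weighted term over the denominator.
    Returns (PPV in percent, {group: percent of the selected cohort}).
    """
    terms = {g: weights[g] * positives[g] / cohort_sizes[g] for g in weights}
    total = sum(terms.values())
    composition = {g: 100.0 * t / total for g, t in terms.items()}
    return composition["breast"], composition


# 1994 validation counts from Table 2 (positives = classified incident after step 4).
SIZES_1994 = {"breast": 7607, "other": 4360, "prior_breast": 43851, "none": 72106}
POS_1994_SEER = {"breast": 6094, "other": 3, "prior_breast": 287, "none": 20}
POS_1994_SEER_HL = {"breast": 6094, "other": 2, "prior_breast": 287, "none": 5}

if __name__ == "__main__":
    for label, pos in (("SEER", POS_1994_SEER), ("SEER+HL", POS_1994_SEER_HL)):
        ppv, comp = ppv_and_composition(pos, SIZES_1994)
        print(label, f"PPV={ppv:.2f}", {g: round(v, 2) for g, v in comp.items()})
```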

Results

When applied to the training cohort, this algorithm had excellent specificity and moderate sensitivity (Tables 2 and 3). Of the initial breast cancer cohort, about 5 percent of the subjects were not detected by step 1, about 9 percent were not retained by step 3, and a further 6 percent had a prior-year code for breast cancer diagnosis or history thereof at step 4. This left an overall sensitivity of 80 percent. The specificity was excellent, at well over 99.9 percent.

Table 3

Positive Predictive Values (%) of an Algorithm for Using Medicare Claims to Identify Incident Breast Cancer

	1994 (Validation Year)		1995 (Training Year)

	Gold Standard SEER+HL	Gold Standard SEER	Gold Standard SEER+HL	Gold Standard SEER
Sensitivity	80.11	80.11	80.26	80.26
Specificity
Other cancer	99.95	99.93	99.96	99.93
Cancer-free	99.99	99.97	99.99	99.97
Prior breast cancer	99.35	99.35	99.35	99.35
Overall	99.97	99.95	99.97	99.95
Positive Predictive Value (PPV)	93.24	89.05	92.46	88.10
95% Confidence Interval for PPV	91.66–94.87	86.66–91.57	90.70–94.30	85.60–90.74
Algorithm Cohort Composition
Incident breast cancer	93.24	89.05	92.46	88.10
Other cancer	0.75	1.07	0.72	1.02
Cancer-free	1.44	5.52	2.30	6.57
Prior breast cancer	4.57	4.36	4.52	4.31

Algorithm Validation

The validation of this algorithm was carried out in the 1994 cohorts (Tables 2 and 3). The algorithm's performance was similar to that in the training year. The specificity of the algorithm remained well above 99.9 percent. Using the stricter "SEER" gold standard, the PPV was 89 percent. Using the "SEER plus High Likelihood" gold standard, the PPV was 93 percent.

Using the PPVs for the four cohorts, the expected composition of a cohort developed using this algorithm can be determined (Table 3). In the validation year, the vast majority of cases selected by the algorithm are incident breast cancer cases. About 1 percent are other cancer cases. About 4–5 percent of cases selected by the algorithm are prevalent breast cancer cases. A substantial minority of the prevalent breast cancer cases were diagnosed, according to SEER, in the three months before the start of 1994. If one were willing to tolerate a three-month error in date of diagnosis (i.e., cases diagnosed according to SEER in the last three months of 1993 are not counted against the algorithm's specificity for 1994), the percentage of prevalent cases in the 1994 algorithm cohort would decrease from 4.6 percent to about 3 percent, and the percentage considered incident cases would increase accordingly. With respect to the composition of the algorithm cohort, the percentage of cancer-free patients in the validation year varies from 1.4 percent to 5.5 percent depending on which gold standard is used.

We performed a sensitivity analysis of the specificity gain associated with examining prior claims in step 4 for differing numbers of years. Of the 48,631 prior breast cancer cases with 1995 claims, only 1,242 were positive after step 2 or 3 of the algorithm. Examining prior claims for one year back in step 4 would have removed 58.6 percent of those cases. Going back two, three, or four years removed 69.3 percent, 74.4 percent, and 76.5 percent of the 1,242 cases, respectively. The specificity gain from applying step 4, however, is associated with a sensitivity loss (loss of index-year true incident cases who met the criteria for removal at step 4). The percentage of true incident cases retained when applying step 4 going back one, two, three, or four years was 95.4 percent, 93.9 percent, 92.7 percent, and 92.3 percent, respectively.
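The trade-off can be made concrete by feeding these percentages back into the PPV formula. The sketch below uses the 1995 figures. It assumes, for simplicity, that the other-cancer and cancer-free false positives stay at their three-year values from Table 2 (3 and 24) for every look-back length; the text does not report those counts for other look-backs, so only the prevalent-case and incident-case terms vary here.

```python
WEIGHTS = {"breast": 0.005, "other": 0.07, "prior_breast": 0.03, "none": 0.895}

# 1995 figures quoted in the text for the step 4 sensitivity analysis.
PREVALENT_AFTER_STEP3 = 1242   # prior breast cancer cases positive after step 2 or 3
INCIDENT_AFTER_STEP3 = 6667    # SEER incident cases positive after step 2 or 3 (Table 2)
SIZES = {"breast": 7700, "other": 4501, "prior_breast": 48631, "none": 71752}
PCT_PREVALENT_REMOVED = {1: 58.6, 2: 69.3, 3: 74.4, 4: 76.5}
PCT_INCIDENT_RETAINED = {1: 95.4, 2: 93.9, 3: 92.7, 4: 92.3}

# Assumption: other-cancer and cancer-free positives held at three-year values.
FIXED_POSITIVES = {"other": 3, "none": 24}


def lookback_tradeoff(years):
    """Return (sensitivity, prevalent cases left, PPV) for a look-back length."""
    sens = INCIDENT_AFTER_STEP3 * PCT_INCIDENT_RETAINED[years] / 100 / SIZES["breast"]
    prevalent_left = PREVALENT_AFTER_STEP3 * (1 - PCT_PREVALENT_REMOVED[years] / 100)
    rates = {
        "breast": sens,
        "prior_breast": prevalent_left / SIZES["prior_breast"],
        "other": FIXED_POSITIVES["other"] / SIZES["other"],
        "none": FIXED_POSITIVES["none"] / SIZES["none"],
    }
    terms = {g: WEIGHTS[g] * r for g, r in rates.items()}
    ppv = terms["breast"] / sum(terms.values())
    return sens, prevalent_left, ppv


if __name__ == "__main__":
    for years in (1, 2, 3, 4):
        sens, left, ppv = lookback_tradeoff(years)
        print(f"{years} yr: sensitivity {sens:.1%}, prevalent left {left:.0f}, PPV {ppv:.1%}")
```

Under that assumption, the three-year look-back reproduces the 1995 SEER PPV of about 88.1 percent, the PPV with a one-year look-back is roughly 86 percent, and the fourth year adds only about a quarter of a point while costing a little more sensitivity.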

Algorithm Sensitivity by Patient Characteristics

The algorithm sensitivity by selected patient characteristics is presented in Table 4. The algorithm sensitivity is lower for women with stage IV and unknown-stage disease at presentation, but there are relatively few such patients in any given year. Women are well represented up to age 84, but there is a decline in sensitivity for women aged 85 and older. The sensitivity is consistent across the different SEER geographic regions. With respect to initial treatment, the algorithm largely fails to identify women who did not undergo initial surgery according to SEER (27.7 percent identified), but identifies women who underwent mastectomy and those who underwent BCS equally well. Women who underwent lymph node dissection or radiotherapy according to SEER are somewhat overrepresented compared to those who did not.

Table 4

Sensitivity of the Algorithm for 1994, by Selected Patient Characteristics

Subgroup Categories*	Number (%) Identified by Algorithm	Number in SEER Cohort	Odds Ratio** with 95% Confidence Interval
Overall	6,094 (80.1%)	7,607
Modified AJCC Stage
In situ	665 (72.8%)	914	0.90 (0.81, 1.00)
I	2,818 (81.9%)	3,441	1.04 (0.97, 1.11)
IIa	1,299 (85.3%)	1,522	1.08 (1.00, 1.18)
IIb	506 (89.6%)	565	1.13 (1.00, 1.28)
II, NOS	52 (89.7%)	58	1.12 (0.77, 1.63)
IIIa	169 (88.0%)	192	1.10 (0.89, 1.36)
IIIb	150 (73.9%)	203	0.92 (0.74, 1.14)
IV	89 (51.4%)	173	0.64 (0.49, 0.82)
Unknown	346 (64.2%)	539	0.79 (0.69, 0.91)
Age
65–74	3,379 (83.5%)	4,047	1.09 (1.02, 1.17)
75–84	2,222 (79.8%)	2,784	0.99 (0.93, 1.07)
85+	493 (63.5%)	776	0.77 (0.69, 0.87)
SEER Area
Atlanta	350 (78.7%)	445	0.98 (0.85, 1.13)
Connecticut	945 (82.5%)	1,146	1.03 (0.94, 1.14)
Detroit	1,044 (86.7%)	1,211	1.09 (1.00, 1.20)
Hawaii	132 (84.6%)	156	1.06 (0.84, 1.34)
Iowa	817 (78.0%)	1,047	0.97 (0.88, 1.07)
Los Angeles County	880 (81.6%)	1,078	1.02 (0.93, 1.13)
New Mexico	214 (80.1%)	267	1.00 (0.83, 1.20)
San Francisco/Oakland	515 (76.3%)	675	0.95 (0.84, 1.07)
San Jose-Monterey	266 (78.0%)	341	0.97 (0.83, 1.15)
Seattle-Puget Sound	675 (74.4%)	907	0.92 (0.83, 1.02)
Utah	256 (76.6%)	334	0.95 (0.81, 1.13)
Surgery
None	83 (27.7%)	300	0.34 (0.26, 0.43)
Lumpectomy/partial mastectomy	2,746 (83.0%)	3,310	1.06 (0.99, 1.14)
Mastectomy	3,264 (81.8%)	3,990	1.05 (0.98, 1.12)
Unknown	1 (14.3%)	7	0.18 (0.02, 1.45)
Lymph Node Dissection
Yes	4,393 (85.3%)	5,153	1.23 (1.14, 1.32)
No	1,701 (69.5%)	2,447	0.82 (0.76, 0.88)
Unknown	0 (0.0%)	7	0.00
Radiotherapy
Yes	2,101 (90.2%)	2,329	1.19 (1.11, 1.28)
No	3,930 (75.6%)	5,198	0.84 (0.78, 0.90)
Unknown	63 (78.8%)	80	0.98 (0.71, 1.37)

Discussion

We propose a four-step algorithm for the use of Medicare claims data to identify women with surgically treated incident breast cancer. This algorithm has a sensitivity of about 80 percent overall, with a sensitivity of 82–87 percent for stage I and II disease. The algorithm has a specificity above 99.9 percent and a positive predictive value of 89 percent, using a SEER gold standard. The PPV is greater than 93 percent based on the SEER plus High Likelihood gold standard.

The algorithm development process described herein illustrates several major issues with respect to the use of Medicare claims to identify breast cancer cases. One is the relationship of specificity to positive predictive value. Because only a minority of women, even in the Medicare age group, develop breast cancer in a given year, an exceedingly high specificity (>99.9 percent) is necessary to achieve a positive predictive value of 90 percent. The dramatic decline in PPV that occurs with only small decreases in specificity can be seen by comparing the results of this algorithm with previously proposed algorithms (Table 5). Given that the procedures used to treat breast cancer may also be used to diagnose or treat benign breast disease, and given occasional inaccuracies in the use of a breast cancer diagnosis, it is challenging to attain the necessary level of specificity.

Table 5

Comparison of Algorithms Using Medicare Claims to Identify Breast Cancer Subjects in Tumor Registries

Algorithm	Sensitivity (%)	Specificity (%)	PPV (%)	Comments
McClish et al. 1997	83.0			Only sensitivity was assessed.
Cooper et al. 1999	82.0			Only sensitivity was assessed.
Warren et al. 1999	76.2	99.3	36.3	Inpatient + physician claims.
Warren et al. 1999	57.0	99.9	91.3	Inpatient claims only.
Freeman et al. 2000	90.0	99.86	70.0	Inpatient, outpatient, and physician claims. PPV 67% if prevalent claims included.
Current study	80.1	99.95	89.0	Inpatient, outpatient, and physician claims. Gold standard is SEER.
Current study	80.1	99.97	93.2	Inpatient, outpatient, and physician claims. Gold standard is SEER + High Likelihood.

A major goal of this algorithm was to maintain a high specificity while including cases treated in the ambulatory surgical setting. This algorithm achieves a PPV similar to that reported by Warren and colleagues (Warren et al. 1999) for inpatient claims, while providing improved sensitivity (Table 5). Although the sensitivity is not as high as that reported by Freeman and colleagues (Freeman et al. 2000), the PPV is much higher.

Another major issue with this and prior algorithms is the presence of prevalent cases. Because women with breast cancer often live for many years, the number of prevalent cases in a dataset greatly exceeds the number of incident cases. Women with prevalent disease at times undergo the same breast procedures as women with initial disease, to diagnose or rule out recurrent or new breast disease, and may also carry diagnostic codes of primary breast cancer for years after initial disease. Since local disease recurrences occur most frequently within the first few years after diagnosis, our approach was to assume that algorithm-identified cases with a history of breast cancer within the prior three years had recurrent disease. This led to a decrease in sensitivity from about 85 percent to 80 percent but maintained the high specificity of the algorithm.

In attempting to maximize the PPV of the algorithm, we accepted a moderate sensitivity of about 80 percent. Therefore, this algorithm may have limited utility for determining breast cancer incidence. The key uses for this algorithm are likely to be for aspects of care not well captured by SEER or other state tumor registries. The study of survivor care, for example, studies of mammography (Schapira, McAuliffe, and Nattinger 2000) or other health care utilization and physician care (Nattinger et al. 2002) among survivors, is well suited to claims analysis. Patterns of care studies with respect to geographic variation and rural areas not well represented by SEER appear feasible given the consistency of the algorithm across geographic areas. Studies might examine premorbid care of older breast cancer patients, such as use of mammography or other preventive care interventions. Although some of the studies mentioned could be performed using the limited number of available linked tumor registry–Medicare databases, the need for greater geographic representation or larger sample sizes might favor the use of Medicare-derived samples. Given that almost half of all breast cancer cases occur in women aged 65 and older, the algorithm could be applied to 100 percent state Medicare databases to identify providers with possible quality problems, such as low levels of medical oncology consultation, poor follow-up care, and poor preventive care practices. An algorithm that is less than perfect may nevertheless provide a valid assessment of patterns of care (Kahn et al. 1996).

A limitation we encountered is the fact that about 5 percent of women identified by SEER as having an incident breast cancer, and who linked to the Medicare claims data, did not even pass the screening step. Based on Table 4, it appears that some of these women do not undergo initial surgical therapy. Perhaps some women undergo surgery but are covered by employer-based insurance, which pays for their care in preference to Medicare. In any event, this problem does limit the sensitivity that can be achieved by the algorithm, even if steps 2, 3, and 4 could be further optimized. Another limitation is that women who underwent radiotherapy are somewhat overrepresented compared to those who did not, limiting the ability to use this algorithm to study patterns of care for radiotherapy.

We are not able to state which of the two "gold standards" represents a more accurate definition of an incident breast cancer case. Although the SEER tumor registry program has excellent case ascertainment, all registry programs likely miss occasional cancer cases. In the case of this study, an incident cancer patient could also have been classified as a cancer-free control subject due to failure to link with the Medicare beneficiary files, or due to moving into a SEER area shortly after disease diagnosis. For these reasons, we developed and presented the "SEER plus High Likelihood" gold standard, which followed a decision rule created initially by manual inspection of the claims histories for certain control subjects who seemed to have a high likelihood of having breast cancer. We were convinced that these subjects likely had incident breast cancer by the lack of prior claims suggesting prevalent disease, and by the multiple claims during the training year that consistently suggested an operation for breast cancer (surgical claims, pathology claims, anesthesiology claims, etc.). Since we did not have access to patient identifiers or charts, we could not confirm that these patients had breast cancer. However, Warren and colleagues (1999) have previously demonstrated that some cases identified by their algorithm using Medicare claims actually had breast cancer but failed to link to SEER when the linkage was conducted. In addition, the number of high-likelihood cases identified by our algorithm within the 5 percent control sample is very close to the number that one would expect given a 94 percent linkage rate between SEER and Medicare. For instance, in 1994, a 6 percent failure to link (applied to the 7,607 linked cases) would translate into 456 unlinked breast cancer cases. We would expect 5 percent of these (23 cases) to be found in the 5 percent control sample. We would further expect the high-likelihood definition to identify 75 percent of those (17 cases). In fact, the high-likelihood definition did identify 19 cases in the 5 percent control cohort that year (Table 2), very close to the expected number.
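The expected-count check above is a short chain of proportions, and the sketch below reproduces it so each step is explicit. The function and parameter names are hypothetical; the inputs (7,607 linked 1994 cases, a 94 percent linkage rate, a 5 percent control sample, and 75 percent capture by the high-likelihood definition) are taken from the text, and, as in the text, the 6 percent unlinked share is applied to the count of linked cases.

```python
def expected_high_likelihood_controls(linked_cases, link_rate=0.94,
                                      sample_fraction=0.05, capture_rate=0.75):
    """Return (unlinked, in control sample, flagged) expected case counts.

    Follows the Discussion's arithmetic: the unlinked share (1 - link_rate)
    is applied directly to the number of linked cases.
    """
    unlinked = linked_cases * (1 - link_rate)        # incident cases missed by the linkage
    in_control_sample = unlinked * sample_fraction   # those expected in the 5 percent sample
    flagged = in_control_sample * capture_rate       # those the high-likelihood rule would catch
    return round(unlinked), round(in_control_sample), round(flagged)


unlinked, in_sample, flagged = expected_high_likelihood_controls(7607)
print(unlinked, in_sample, flagged)  # 456 23 17

observed = 19  # high-likelihood cases found in the 1994 control cohort (Table 2)
print(f"expected about {flagged}, observed {observed}")
```

Rounding only at the end keeps the intermediate values unrounded (about 22.8 cases in the control sample), which is why the final expectation comes out at 17.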

As has been shown in a number of other disease areas, Medicare claims data offer unique advantages for cancer quality of care and health services research (Hewitt and Simone 1999; 2000; McNeil 2001). These data are essentially population-based, and minimize selection bias with respect to geographic region, urban versus rural location, and socioeconomic status. Each of these factors is an important predictor of cancer treatment, a fact that limits analyses of databases from more restricted populations (Nattinger et al. 1992; Guadagnoli et al. 1998; Gilligan et al. 2002). The possibility of using Medicare data more widely to assess patterns of cancer care and related outcomes is worthy of further exploration.

Footnotes

Grant support from the Department of the Army (DAMD17-96-6262).

This study used the linked SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. The authors acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development and Information, CMS; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database.

References

Cooper G S, Yuan Z, Stange K C, Dennis L K, Amini S B, Rimm A A. "Agreement of Medicare and Tumor Registry Data for Assessment of Cancer-Related Treatment." Medical Care. 2000;38(4):411–21.
Cooper G S, Yuan Z, Stange K C, Dennis L K, Amini S B, Rimm A A. "The Sensitivity of Medicare Claims Data for Case Ascertainment of Six Common Cancers." Medical Care. 1999;37(5):436–44.
Fieller E C. "The Biological Standardization of Insulin." Journal of the Royal Statistical Society. 1940;7(supplement):1–64.
Freeman J, Zhang D, Freeman D, Goodwin J. "An Approach to Identifying Incident Breast Cancer Cases Using Medicare Claims Data." Journal of Clinical Epidemiology. 2000;53(6):605–14.
Gilligan M A, Kneusel R T, Hoffmann R G, Greer A L, Nattinger A B. "Persistent Differences in Sociodemographic Determinants of Breast Conserving Treatment Despite Overall Increased Adoption." Medical Care. 2002;40(3):181–9.
Guadagnoli E, Shapiro C L, Weeks J C, Gurwitz J H, Borbas C, Soumerai S B. "The Quality of Care for Treatment of Early Stage Breast Carcinoma: Is It Consistent with National Guidelines?" Cancer. 1998;83(2):302–9.
Hewitt M, Simone J V, editors. Ensuring Quality of Cancer Care. Washington, DC: National Academy Press; 1999.
Hewitt M, Simone J V, editors. Enhancing Data Systems to Improve Quality of Cancer Care. Washington, DC: National Academy Press; 2000.
Kahn L H, Blustein J, Arons R R, Yee R, Shea S. "The Validity of Hospital Administrative Data in Monitoring Variations in Breast Cancer Surgery." American Journal of Public Health. 1996;86(2):243–5.
McClish D K, Penberthy L, Whittemore M, Newschaffer C, Woolard D, Desch C E, Retchin S. "Ability of Medicare Claims Data and Cancer Registries to Identify Cancer Cases and Treatment." American Journal of Epidemiology. 1997;145(3):227–33.
McNeil B J. "Shattuck Lecture: Hidden Barriers to Improvement in the Quality of Care." New England Journal of Medicine. 2001;345(22):1612–20.
Nattinger A B, Gottlieb M S, Veum J, Yahnke D, Goodwin J S. "Geographic Variation in the Use of Breast-Conserving Treatment for Breast Cancer." New England Journal of Medicine. 1992;326(17):1102–7.
Nattinger A B, Schapira M M, Warren J L, Earle C C. "Methodologic Issues in the Use of Administrative Claims Data to Study Surveillance after Cancer Treatment." Medical Care. 2002;40(8 suppl):IV-69–74.
Potosky A L, Riley G F, Lubitz J D, Mentnech R M, Kessler L G. "Potential for Cancer Related Health Services Research Using a Linked Medicare-Tumor Registry Database." Medical Care. 1993;31(8):732–48.
Schapira M M, McAuliffe T L, Nattinger A B. "Underutilization of Mammography in Older Breast Cancer Survivors." Medical Care. 2000;38(3):281–9.
SEER-Medicare Linked Database. "National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) tumor registries and the Centers for Medicare and Medicaid Services (CMS) Medicare claims data" 2003. [accessed July 1, 2003]. Available at http://healthservices.cancer.gov.
Steffens F E. "On Confidence Sets for the Ratio of Two Normal Means." South African Statistical Journal. 1971;5(2):105–13.
Warren J L, Feuer E, Potosky A L, Riley G F, Lynch C F. "Use of Medicare Hospital and Physician Data to Assess Breast Cancer Incidence." Medical Care. 1999;37(5):445–56.
Warren J L, Riley G F, McBean A M, Hakim R. "The Use of Medicare Data to Identify Incident Breast Cancer Cases." Health Care Financing Review. 1996;18(1):237–46.

Articles from Health Services Research are provided here courtesy of Health Research & Educational Trust

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1361095/