LogXact Examples

Insurance Fraud
Osteogenic Sarcoma
Repeated Measures Data: Cross-over Clinical Trial

Insurance Fraud

A team of adjusters examined 127 insurance claims and judged each to be either fraudulent or legitimate. Of interest is the relationship between fraud and 3 broad groups of covariates: Accident (AC1, AC9, AC16), Claimant (CL7, CL11) and Injury (IJ2, IJ3, IJ4, IJ6, IJ12).

A “1” indicates that the claim had that particular characteristic; a “0” indicates that it did not. (The data are available in LogXact .cyl format and ASCII .dat format.)

We thank the Automobile Insurance Bureau of Massachusetts for permission to use these data.

Challenge: Try fitting a logistic regression model to the data with all ten covariates included.

Fraudulent Insurance Claims and their Relationship to Covariates of Interest

Fraud/Total (%fraud) AC1 AC9 AC16 CL7 CL11 IJ2 IJ3 IJ4 IJ6 IJ12
0/22 0% 0 0 0 0 0 0 0 0 0 0
0/1 0% 0 0 0 0 0 0 0 0 0 1
0/4 0% 0 0 0 0 0 0 0 0 1 0
0/2 0% 0 0 0 0 0 0 0 1 0 0
0/3 0% 0 0 0 0 0 0 1 0 0 0
0/10 0% 0 0 0 0 0 1 0 0 0 0
0/2 0% 0 0 0 0 0 1 0 0 1 0
0/4 0% 0 0 0 0 0 1 1 0 0 0
1/1 100% 0 0 0 0 0 1 1 0 0 1
1/4 25% 0 0 0 0 0 1 1 0 1 0
1/1 100% 0 0 0 0 1 0 0 0 0 1
0/1 0% 0 0 0 0 1 1 1 0 1 0
0/8 0% 0 0 0 1 0 0 0 0 0 0
0/1 0% 0 0 0 1 0 0 0 1 1 0
0/3 0% 0 0 0 1 0 1 0 0 0 0
0/1 0% 0 0 0 1 0 1 0 0 1 0
1/1 100% 0 0 0 1 0 1 1 1 0 0
0/1 0% 0 0 0 1 1 0 0 0 0 0
1/1 100% 0 0 0 1 1 1 0 0 0 0
1/1 100% 0 0 0 1 1 1 1 1 1 0
0/1 0% 0 0 1 0 0 0 0 0 0 0
1/1 100% 0 0 1 0 0 1 0 0 1 0
1/1 100% 0 0 1 0 1 1 0 0 0 0
0/1 0% 0 1 0 0 1 0 1 0 0 0
0/10 0% 1 0 0 0 0 0 0 0 0 0
0/2 0% 1 0 0 0 0 0 0 0 1 0
0/1 0% 1 0 0 0 0 0 0 0 1 1
0/8 0% 1 0 0 0 0 1 0 0 0 0
1/7 14% 1 0 0 0 0 1 0 0 1 0
0/1 0% 1 0 0 0 1 0 0 0 0 0
1/6 17% 1 0 0 0 1 1 0 0 0 0
0/1 0% 1 0 0 0 1 1 0 0 0 1
0/3 0% 1 0 0 0 1 1 0 0 1 0
0/3 0% 1 0 0 1 0 0 0 0 0 0
0/1 0% 1 0 0 1 0 0 0 0 1 0
0/1 0% 1 0 0 1 0 0 0 1 0 0
0/2 0% 1 0 0 1 0 1 0 0 1 0
0/1 0% 1 0 0 1 1 0 0 0 1 0
1/1 100% 1 0 0 1 1 1 0 0 0 0
1/1 100% 1 0 0 1 1 1 0 0 0 1
0/1 0% 1 1 0 0 0 0 0 0 0 0
1/1 100% 1 1 0 0 0 1 0 0 1 0

Example of how to read this table: The last line tells us that there was one fraudulent claim (and no legitimate claims, i.e. one in total) with two accident characteristics (AC1 and AC9) and two injury characteristics (IJ2 and IJ6).
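The reading rule above can be sketched in Python. This is only an illustration: the names TABLE, COVARIATES and parse_row are hypothetical, and the rows are transcribed from the table on this page rather than read from the .cyl or .dat files.

```python
# Grouped rows transcribed from the table above. Column order:
# fraud/total, %fraud, AC1 AC9 AC16 CL7 CL11 IJ2 IJ3 IJ4 IJ6 IJ12
TABLE = """\
0/22 0% 0 0 0 0 0 0 0 0 0 0
0/1 0% 0 0 0 0 0 0 0 0 0 1
0/4 0% 0 0 0 0 0 0 0 0 1 0
0/2 0% 0 0 0 0 0 0 0 1 0 0
0/3 0% 0 0 0 0 0 0 1 0 0 0
0/10 0% 0 0 0 0 0 1 0 0 0 0
0/2 0% 0 0 0 0 0 1 0 0 1 0
0/4 0% 0 0 0 0 0 1 1 0 0 0
1/1 100% 0 0 0 0 0 1 1 0 0 1
1/4 25% 0 0 0 0 0 1 1 0 1 0
1/1 100% 0 0 0 0 1 0 0 0 0 1
0/1 0% 0 0 0 0 1 1 1 0 1 0
0/8 0% 0 0 0 1 0 0 0 0 0 0
0/1 0% 0 0 0 1 0 0 0 1 1 0
0/3 0% 0 0 0 1 0 1 0 0 0 0
0/1 0% 0 0 0 1 0 1 0 0 1 0
1/1 100% 0 0 0 1 0 1 1 1 0 0
0/1 0% 0 0 0 1 1 0 0 0 0 0
1/1 100% 0 0 0 1 1 1 0 0 0 0
1/1 100% 0 0 0 1 1 1 1 1 1 0
0/1 0% 0 0 1 0 0 0 0 0 0 0
1/1 100% 0 0 1 0 0 1 0 0 1 0
1/1 100% 0 0 1 0 1 1 0 0 0 0
0/1 0% 0 1 0 0 1 0 1 0 0 0
0/10 0% 1 0 0 0 0 0 0 0 0 0
0/2 0% 1 0 0 0 0 0 0 0 1 0
0/1 0% 1 0 0 0 0 0 0 0 1 1
0/8 0% 1 0 0 0 0 1 0 0 0 0
1/7 14% 1 0 0 0 0 1 0 0 1 0
0/1 0% 1 0 0 0 1 0 0 0 0 0
1/6 17% 1 0 0 0 1 1 0 0 0 0
0/1 0% 1 0 0 0 1 1 0 0 0 1
0/3 0% 1 0 0 0 1 1 0 0 1 0
0/3 0% 1 0 0 1 0 0 0 0 0 0
0/1 0% 1 0 0 1 0 0 0 0 1 0
0/1 0% 1 0 0 1 0 0 0 1 0 0
0/2 0% 1 0 0 1 0 1 0 0 1 0
0/1 0% 1 0 0 1 1 0 0 0 1 0
1/1 100% 1 0 0 1 1 1 0 0 0 0
1/1 100% 1 0 0 1 1 1 0 0 0 1
0/1 0% 1 1 0 0 0 0 0 0 0 0
1/1 100% 1 1 0 0 0 1 0 0 1 0
"""

COVARIATES = ["AC1", "AC9", "AC16", "CL7", "CL11",
              "IJ2", "IJ3", "IJ4", "IJ6", "IJ12"]


def parse_row(line):
    """Split one grouped row into (fraud, total, {covariate: 0 or 1})."""
    fields = line.split()
    fraud, total = (int(x) for x in fields[0].split("/"))
    flags = [int(x) for x in fields[2:]]  # fields[1] is the %fraud column
    if len(flags) != len(COVARIATES):
        raise ValueError("expected 10 covariate flags: " + line)
    return fraud, total, dict(zip(COVARIATES, flags))


rows = [parse_row(line) for line in TABLE.splitlines() if line.strip()]
total_claims = sum(total for _, total, _ in rows)
total_fraud = sum(fraud for fraud, _, _ in rows)

# Characteristics present in the last row, as in the reading example above.
last_fraud, last_total, last_flags = rows[-1]
present = [name for name, flag in last_flags.items() if flag == 1]

print(total_claims, total_fraud)
print(last_fraud, last_total, present)
```

On the rows as transcribed here, the totals come to 127 claims, which agrees with the count stated at the start of this example.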


Osteogenic Sarcoma Study

In a 46-patient study of non-metastatic osteogenic sarcoma (Goorin et al., 1987), the investigators were interested in determining the predictors for a three-year disease-free interval (DFI3). The covariates of interest were SEX, any osteoid pathology (AOP), and lymphocytic infiltration (LI). The complete data set is displayed below. The FREQ (frequency count) variable shows the number of copies of each row in the data set.

 DFI3  LI  SEX  AOP  FREQ
1 0 0 0 3
1 0 0 1 2
1 0 1 0 4
1 0 1 1 1
1 1 0 0 5
1 1 0 1 3
1 1 1 0 5
1 1 1 1 6
0 1 0 1 2
0 1 1 0 4
0 1 1 1 11

Individually, each of the covariates LI, SEX, and AOP has a statistically significant effect on DFI3. For example, the p-value for LI is 0.0075 from Fisher's exact test. But try to fit the logistic regression model DFI3=SEX+AOP+LI by maximum likelihood, using a standard package like SAS, BMDP or GLIM. You will get no convergence. Although maximum likelihood fails, exact inference works. The exact option in LogXact is able to fit the above model and reveals that, after adjusting for SEX and AOP, LI has an exact p-value of 0.074.
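The Fisher's exact test quoted for LI can be checked with a short Python sketch. It collapses the data over SEX and AOP into a 2x2 table of LI by DFI3 and sums the probabilities of all tables no more probable than the observed one, which is one common definition of the two-sided p-value and is assumed here. The function names are hypothetical and this is not LogXact's implementation.

```python
from math import comb

# (DFI3, LI, SEX, AOP, FREQ) rows copied from the table above.
DATA = [
    (1, 0, 0, 0, 3), (1, 0, 0, 1, 2), (1, 0, 1, 0, 4), (1, 0, 1, 1, 1),
    (1, 1, 0, 0, 5), (1, 1, 0, 1, 3), (1, 1, 1, 0, 5), (1, 1, 1, 1, 6),
    (0, 1, 0, 1, 2), (0, 1, 1, 0, 4), (0, 1, 1, 1, 11),
]


def cross_tab(data):
    """Return counts[li][dfi3], summing FREQ over SEX and AOP."""
    counts = [[0, 0], [0, 0]]
    for dfi3, li, _sex, _aop, freq in data:
        counts[li][dfi3] += freq
    return counts


def fisher_two_sided(counts):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = counts
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Integer comparison, so ties are handled exactly.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


counts = cross_tab(DATA)
p_value = fisher_two_sided(counts)
print(counts)
print(round(p_value, 4))
```

The cross-tabulation has an empty cell: none of the patients with LI=0 has DFI3=0. The source does not state why maximum likelihood fails, but a zero cell of this kind is a typical cause of non-convergence, since the likelihood keeps increasing as the LI coefficient grows without bound.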



Cross-over Clinical Trial

The data below are taken from a three-treatment, three-period cross-over clinical trial. The three drugs are A=New Drug, B=Aspirin, C=Placebo. The primary end-point was analgesic efficacy, here dichotomized as 0 for relief and 1 for no-relief, during periods P1, P2, and P3 respectively. See Snapinn and Small (1986) for details.


 Patient  Drug Sequence  Response (P1  P2  P3)
 1 ABC  1
 7 ABC 1 1
 2 BCA 0 1 1
 8 BCA 0 0 0
 3 CAB 1 0 0
 9 CAB 1 0 1
 4 CBA  1 0 1
 10 CBA  1 0 0
 5 ACB   0 0 0
 11 ACB  0 1 0
 6 BAC 1 0 0
 12 BAC 0 0 1

A logistic regression model of the form:

RESPONSE = DRUG + PERIOD

in which DRUG and PERIOD are each treated as 2-degree-of-freedom factor variables (i.e., DRUG consists of the two indicator variables A versus C and B versus C, while PERIOD consists of the two indicator variables P1 versus P3 and P2 versus P3) is appropriate for modelling the response. However, the analysis must take into account the fact that each patient provides three binary responses. In other words, this is a repeated measures data set. The usual large-sample techniques for handling repeated measures data are unreliable here because the data set is small. We solve the problem by treating each patient as a separate stratum, or matched set, and using the stratified logistic regression option of LogXact.

The question of interest is whether the three treatments differ. We answer this question by performing a 2-degree-of-freedom test on DRUG in the above stratified logistic regression model. Both exact and asymptotic tests were performed, and the results are tabulated below:

 Type of Test           Chi-Squared Value   P-value
 Likelihood Ratio       8.74                .013
 Bivariate Wald         5.09                .079
 Unconditional Scores   7.8                 .020
 Exact                  7.06                .029
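The three asymptotic p-values in the table can be reproduced approximately in Python, assuming each is the upper-tail area of a chi-squared distribution with 2 degrees of freedom, for which the tail probability has the closed form exp(-x/2). The names below are hypothetical. The Exact row is left out because its p-value presumably comes from the exact conditional distribution rather than from a chi-squared approximation.

```python
from math import exp


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variate with 2 df."""
    return exp(-x / 2.0)


# (chi-squared statistic, reported p-value) copied from the table above.
REPORTED = {
    "Likelihood Ratio": (8.74, 0.013),
    "Bivariate Wald": (5.09, 0.079),
    "Unconditional Scores": (7.8, 0.020),
}

recomputed = {name: chi2_sf_2df(stat) for name, (stat, _p) in REPORTED.items()}

for name, (stat, p_reported) in REPORTED.items():
    print(f"{name}: statistic={stat}, reported={p_reported}, "
          f"recomputed={recomputed[name]:.4f}")
```

The recomputed values agree with the reported ones to within about 0.001. Small differences are to be expected, since the statistics in the table are themselves rounded.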

Notice that there are variations among the three asymptotic tests (Likelihood ratio, Wald, and Scores), even though all three tests are supposed to be equivalent asymptotically. This suggests that the asymptotics are performing poorly. What conclusion are we to draw about the treatment effect, when the Wald test is not statistically significant but the Likelihood Ratio and Scores tests are? The exact test reports a p-value of 0.029 and resolves the dilemma.
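The stratified layout used in this example, with one stratum per patient and indicator coding for DRUG and PERIOD, can be sketched in Python as follows. The function and column names are hypothetical and do not reflect LogXact's input format. Responses are left out of the sketch and would be attached from the data table, one per period.

```python
def design_row(sequence, period):
    """Indicator variables for one patient-period.

    sequence: drug order over the three periods, e.g. "BCA".
    period:   1, 2 or 3.
    Drug C (placebo) and period P3 are the reference levels.
    """
    drug = sequence[period - 1]
    return {
        "A_vs_C": int(drug == "A"),
        "B_vs_C": int(drug == "B"),
        "P1_vs_P3": int(period == 1),
        "P2_vs_P3": int(period == 2),
    }


def patient_stratum(patient_id, sequence):
    """Three rows, one per period, all sharing the patient's stratum id."""
    return [
        {"stratum": patient_id, "period": p, "drug": sequence[p - 1],
         **design_row(sequence, p)}
        for p in (1, 2, 3)
    ]


# Patient 2 received the drugs in the order B, C, A.
rows = patient_stratum(2, "BCA")
for row in rows:
    print(row)
```

Each patient contributes three rows with the same stratum identifier, which is what allows the analysis to treat the patient as a matched set.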

