Cross-cultural examination of mental health measures in dementia caregivers: Assessment of the Center for Epidemiologic Studies-Depression Scale and the Zarit Burden Inventory
CROSS-CULTURAL EXAMINATION OF MENTAL HEALTH MEASURES IN DEMENTIA CAREGIVERS: ASSESSMENT OF THE CENTER FOR EPIDEMIOLOGIC STUDIES - DEPRESSION SCALE AND THE ZARIT BURDEN INVENTORY

by Crystal V. Flynn Longmire

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (GERONTOLOGY)

December 2004

Copyright 2004 Crystal V. Flynn Longmire

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

UMI Number: 3155442

Copyright 2004 by Flynn Longmire, Crystal V. All rights reserved.

INFORMATION TO USERS

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

UMI Microform 3155442 Copyright 2005 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

DEDICATION

For my aunt, Miss Alma M. Flynn, my great example of a woman who loved education, worked hard for her own, and lived life her way. I miss you.

ACKNOWLEDGMENTS

To God, who brought me to the opportunity at USC and who brought me through it, many thanks, praise and honor. To Drs. Merril Silverstein and Stan Huey, Jr.,
my committee, who were especially helpful in this process, thank you for being flexible in your schedules to accommodate meeting times, for answering questions, and for supplying great feedback. I am especially grateful to my committee chair, Dr. Bob G. Knight, who was there for the good, the bad, the ugly, and finally the spectacular. You read many drafts, encouraged me, and helped me stay on task. You have been a great mentor through the years and I appreciate you. I look forward to our continued relationship as colleagues.

I am very appreciative of the faculty of the University of Southern California Andrus Gerontology Center who taught and trained me well. I can confidently go out into the world knowing I am more than competent. Many thanks also to the Andrus staff who aided my efforts toward the completion of my degree.

To all who were comrades in the battle with me, USC Gerontology students and friends, you have been there for me in so many ways. Your generous assistance, advice, and comfort have at times overwhelmed me. Thank you to everyone who worked on the Stress, Ethnicity, and Caregiving Study, as well as those who participated. I gained a lot of valuable experience working on that project. The Bob Knight Lab Groups - past and present - also receive my thanks as they were a strong base for me throughout my academic career at USC.

To my family whose pride in my accomplishments inspired me to continue on, many a day. To V.E.F., Sr. and Slo-Mo, thank you for all your support and love along the way. To my mother, you have been the influence for most things in my life and my constant ally. Thank you for always believing in me, checking on me every hour that one long night and morning, listening to my thoughts, offering encouragement in times of doubt and comfort in times of panic, and the many, many prayers. You, God and I earned our Ph.D.!
You have bragging rights!

To my husband who lived with a divided wife and didn’t complain, I love you. We have already made it through many challenges in our short married life, and we are better now than at the start. My, how we have grown! You are quite a man and I am proud of you. Here’s to our wonderful life together!

TABLE OF CONTENTS

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Methods
Chapter 3: ZBI Analyses
Chapter 4: CES-D Analyses
Chapter 5: Discussion
Bibliography

LIST OF TABLES

Table 1. Correlations between items in Hebert model for Black sample.
Table 2. Correlations between items in Hebert model for White sample.
Table 3. Correlation matrix for items in the Knight model for the Black caregiver sample.
Table 4. Correlation matrix for items in the Knight model for the White caregiver sample.
Table 5. Correlations between items in CES-D model for Black sample.
Table 6. Correlations between items in CES-D model for White sample.
Table 7. Demographic variables of Caregivers and Dependent Variables by race.
Table 8. Factor weights for the Hebert model for the Black sample.
Table 9. Comparisons of goodness of fit of ZBI and CES-D models.

LIST OF FIGURES

Figure 1a. Multivariate distribution of total scores for items in Hebert et al. model for Black caregivers.
Figure 1b. Multivariate distribution of total scores for items in Hebert et al. model for White caregivers.
Figure 2a. Multivariate distribution of total scores for items in Knight et al. model for Black caregivers.
Figure 2b. Multivariate distribution of total scores for items in Knight et al. model for White caregivers.
Figure 3a. Multivariate distribution of total scores for items in CES-D model for Black caregivers.
Figure 3b. Multivariate distribution of total scores for items in CES-D model for White caregivers.
Figure 4. The Hebert ZBI Model - briefer 12-item, 2-factor version.
Figure 5. The single group analysis for the Black caregivers for the Hebert model.
Figure 6. The single group analysis for the White caregivers for the Hebert model.
Figure 7a. Unstandardized estimates for Black caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 7b. Unstandardized estimates for White caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 8a. Unstandardized estimates for Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 8b. Unstandardized estimates for White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 9a. Unstandardized estimates for Black caregivers in Hebert two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Figure 9b. Unstandardized estimates for White caregivers in Hebert two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Figure 10. Knight ZBI model - briefer 14-item, 3-factor version.
Figure 11. The single group analysis for the Black caregivers for the Knight model.
Figure 12. The single group analysis for the White caregivers for the Knight model.
Figure 13a. Unstandardized estimates for Knight model with Black caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 13b. Unstandardized estimates for Knight model with White caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 14a. Unstandardized estimates for Knight model with Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 14b. Unstandardized estimates for Knight model with White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 15a. Unstandardized estimates for Black caregivers in Knight two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Figure 15b. Unstandardized estimates for White caregivers in Knight two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Figure 16. CES-D model - 20-item, 4-factor model.
Figure 17. The single group analysis for the Black caregivers for the Standard CES-D model.
Figure 18. The single group analysis for the White caregivers for the Standard CES-D model.
Figure 19a. Unstandardized estimates for CES-D model with Black caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 19b. Unstandardized estimates for CES-D model with White caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 20a. Unstandardized estimates for the CES-D model with Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 20b. Unstandardized estimates for the CES-D model with White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
Figure 21a. Unstandardized estimates for Black caregivers in CES-D two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Figure 21b. Unstandardized estimates for White caregivers in CES-D two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
ABSTRACT

The factor structure and the metric invariance of the ZBI and the CES-D were examined. Two models for the ZBI and the standard four-factor model for the CES-D were analyzed in samples of Black (N=175) and White (N=225) caregivers of family members with dementia. The 3-factor, 14-item ZBI model fit the data and was invariant across the Black and White caregivers, whereas the two-factor model considered did not fit and was not invariant. The standard 4-factor, 20-item CES-D model also fit and was invariant across both samples. These findings contribute to the literature on the factor structure of the ZBI, and provide new data on the invariance of the ZBI across two ethnic groups of caregivers. These findings also contribute to the literatures that confirm the standard CES-D four-factor structure, and that consider cultural invariance of the CES-D. By showing invariance of the CES-D and of one briefer version of the ZBI, this study provides support for the comparison of Black and White caregivers in studies including these two scales.

CHAPTER 1

Introduction

Overview

With the well-known increase in numbers of older adults, there is also an anticipated increase in ethnic and racial diversity. If current trends continue, and the population of older adults doubles by 2030, then the number of minority older adults will likely increase also. In fact, the minority elderly population is expected to increase by 226% from 1998 to 2030, compared to 79% for the non-Hispanic White older population (U.S. Bureau of the Census, 1998). One contributor to the increase in the older population is fewer deaths due to acute disease. People live longer, but often have chronic illness and compromised functionality instead. Many older adults with chronic illnesses or disabilities need caregivers, and older adults in the future will likely need caregivers as well.
The effects of informal caregiving on family and friends have been important research topics for decades. Given the continuing growth in need for caregivers, such research is expected to continue, if not expand with new areas to explore, as needs and circumstances change over time. It has already been found that the responsibility of ongoing care can have negative effects on caregivers of older adults (Schulz, O’Brien, Bookwala, & Fleissner, 1995). Such help often includes personal care and household responsibilities, and the caregiver's personal life, social life, and employment are often affected by the caregiving experience. The idea that informal care of older adults with dementia and other chronic illnesses often places a major burden on family members and other caregivers has been recognized for over twenty years (Zarit, Reever, & Bach-Peterson, 1980).

Another part of such caregiver research is differences in outcomes between ethnic groups. One of the most common types of comparisons has been between African American and Caucasian groups - also known as Black and White group differences. Many of these studies have included comparisons of outcomes related to psychological health. Researchers have examined what kinds of effects the stress of caregiving has on mental health and whether these effects vary between Black caregivers and White caregivers. Psychological constructs studied include burden and depression, among others. When burden has been examined for differences by race group, most often Black caregivers were found to report less burden than White caregivers (Lawton, Rajagopal, Brody, & Kleban, 1992), even when providing more care to more impaired older adults (Allen-Kelsey, 1998; Fredman, Daly, & Lazur, 1995).
Some researchers have found significant differences between Black and White caregivers on depression (Haley et al., 1995; Miller, Campbell, Farran, Kaufman, & Davis, 1995) while others have not (Hinrichsen & Ramirez, 1992; Knight & McCallum, 1998). While there has been a mounting interest in the prevalence and presentation of psychological distress among ethnic minority populations, there has been much less attention paid to research on cross-cultural differences in the experience and expression of depressive illnesses (Mui, Burnette, & Chen, 2001). Additionally, cultural influences on symptom reporting remain underexamined (Gallo, Cooper-Patrick, & Lesikar, 1998).

The present study focuses on the psychometric properties of the Zarit Burden Inventory (ZBI; Zarit, Reever, & Bach-Peterson, 1980) and the Center for Epidemiologic Studies-Depression Scale (CES-D; Radloff, 1977), two of the most extensively used research measures of caregiver burden and depressive symptomatology, respectively. Both have been included in local and national studies, across age groups and cross-culturally. Yet there is still a lack of consistent findings, and some findings conflict. This may lead one to consider whether the scales measure different underlying constructs depending on race group. The crux of this examination was to look at whether these two measures operate similarly in Blacks and Whites. If so, comparisons can be made with confidence. If not, then differences between groups may actually reflect different constructs being measured rather than group characteristics. Thus, invariance of the factor structures of the CES-D and the ZBI for Black and White caregivers was tested.

Measurement equivalence in cross-cultural research

When comparing groups, it is fundamental to the research that the measures employed assess the same concept of interest between comparison groups. Meaningful, valid results can be obtained only when equivalent measures of concepts or variables are used. If not, findings of group differences may actually be artifacts of differences in measurement (Tran, 1997). For instance, if the pattern of responses to items is different between groups, then group differences in the total score of a multidimensional scale may not be valid. The mean scores of one group may be related to a certain set of items, while the scores of the other groups may be related to a different set (Liang, Tran, Krause, & Markides, 1989).

Measurement equivalence (or invariance) in cross-cultural research requires that three conditions be met: conceptual equivalence, metric equivalence, and structural equivalence (Hui & Triandis, 1985). Conceptual equivalence refers to the sameness of meaning of research concepts across comparison groups. Fidelity of translation is key. Once variance in responses related to the language of the instrument is inconsequential, similarity is usually deemed sufficient for conceptual equivalence. However, it has been asserted that although careful translation processes may address linguistic equivalence, they may not ensure comparable conceptual meaning of mental health constructs across cultural groups (Mui et al., 2001). Metric equivalence assumes conceptual equivalence and means that observed items have comparable relationships with their respective latent concepts or factors across groups.
Metric invariance requires the same pattern of factor loadings and factor correlations, along with equivalent magnitude of these relationships, across groups (Posner, Stewart, Marin, & Perez-Stable, 2001). Confirmatory factor analysis (CFA) tests the hypothesized factor structures and model invariance across groups. Lastly, structural equivalence assumes the other two equivalences and refers to similarities between the causal relationships of a construct of interest and its consequences in two or more comparative groups. Structural equivalence should be evaluated after the evaluations of conceptual and metric equivalence. One can use path analysis or structural equation modeling to assess structural equivalence. An important note is that a scale may have conceptual equivalence between two groups but not possess either metric or structural equivalence (Tran, 1997).

Metric equivalence was the focus of this study. Conceptual equivalence was assumed because both samples, Caucasian and African American, completed the scales in English, so no translation was involved. The objective here is to establish the relationships between the items of each measure for each group, a prerequisite of structural equivalence. Structural equivalence would be a topic of future research, if metric equivalence can be demonstrated.

Zarit Burden Inventory

One of the most commonly used measures of caregiver burden is the Zarit Burden Inventory (ZBI; Zarit, Reever, & Bach-Peterson, 1980). Originally constructed with 29 items, the revised version seen most often in the literature has 22 items. Twenty-one of the items are supposed to
measure several dimensions of burden, and item 22 is a global measure of burden and is usually not included in factor analyses. Many researchers use the ZBI in caregiver studies, but few have examined its factor structure. The few that have done so have often explored data for possible factor structures, since the originators of the ZBI did not do an initial factor analysis to discern its factor structure. As such, the ZBI does not yet have one dominant factor structure, and certainly not one that has been established across race or age groups.

One Factor

Burden measures have been criticized for not capturing the multidimensional nature of caregiving stress. The ZBI was identified as one of these unidimensional measures of burden by several researchers (e.g., George & Gwyther, 1986; Novack & Guest, 1989). However, factor analysis of the ZBI was not done in these studies to support this claim. Instead, the ZBI was considered unidimensional because it was commonly used to calculate one total score. The preference of these researchers was to compare people on factors, as this was more telling of caregivers’ needs and caregiving impact.

Novack and Guest (1989) wrote that some questions in their Caregiver Burden Inventory (CBI) were drawn from the ZBI. The CBI has twenty-four items, with five factors (Time-Dependence Burden, Developmental Burden, Physical Burden, Social Burden, and Emotional Burden). In four out of those five factors (all but Social Burden), there is at least one item that is the exact wording from the ZBI, but the ZBI question is phrased as a statement. In addition, in the Social Burden factor there are items that are similar to the wording in the ZBI. This supports the concept that there are items in the ZBI that could load on several factors.
In a review of burden measures, Vitaliano, Young, and Russo (1991) observed that though distinct scores do not exist, the items of the ZBI reflect both objective burden (e.g., disruption of family life) and subjective burden (e.g., caregiver response to the situation). Further, recent studies have provided evidence that the ZBI does not fit a unidimensional model (Hebert, Bravo, & Preville, 2000; Knight, Fox, & Chou, 2000). It would seem that there are multiple factors in the ZBI. With greater understanding of its factor structure, caregivers can be compared on its factors too, as with the CBI or any other scale claiming multidimensionality.

Two factor

Whitlatch, Zarit, and von Eye (1991) examined the Zarit Burden Inventory in a sample of caregivers of family members with dementia. They proposed a two-factor structure for the ZBI. Their factors were called Personal Strain and Role Strain. They devised an 18-item version of the ZBI with 12 items for Personal Strain and 6 items for Role Strain. The Personal Strain factor reflected how personally stressful the caregiving experience is, and Role Strain was to reflect the stress due to role conflict or overload. There was no further information on the model fit or variance explained regarding this factor analysis. Their primary interest in this study was caregiver interventions; the factor analysis was just a part of the study. They did report the reliability coefficients (alphas). For Personal Strain the alpha was .80, and for Role Strain it was .81.

Hebert, Bravo and Preville (2000) developed a two-factor, 12-item version of the ZBI. They tested several alternative models before arriving at their final model.
The models were as follows: first, a model of 22 fully independent items; second, a 22-item model with 2 non-orthogonal factors; third, a 22-item model with 5 non-orthogonal factors; fourth, a 14-item model with 3 non-orthogonal factors; fifth, a 12-item model with 2 non-orthogonal factors. The first four models did not fit the data, and the fifth is the model they created. The first model was the base model against which all others were compared. They credited the second model to Whitlatch et al. (1991), but the Whitlatch et al. model had only 18 items, while the one in the Hebert et al. study had 22. As for the four additional items, they do not say on which factor(s) they were placed for their analysis. The fourth model was from Knight et al. (2000; which was in press at the time). Their final result was that a version of the Whitlatch et al. model, re-specified via EFA and modification index information, with two factors and 12 items, was the best at depicting the burden reported by the caregivers in their sample (χ² = 64.75, df = 49, p = .07). Other fit indices supported this conclusion: Adjusted Goodness-of-Fit (AGFI) = .98; χ²/df = 1.32; Root Mean square Residual (RMR) = .10.

Their two factors, like those of Whitlatch et al., were called Personal Strain (3 items) and Role Strain (9 items). Though Hebert et al. gave their factors the same names as Whitlatch et al. did, the item composition of the factors and the number of items in the analyses were not the same between the two studies. All three items in the Personal Strain factor in the Hebert et al. (Hebert) model are also part of the Personal Strain factor in the Whitlatch et al. (Whitlatch) model. The other nine items from the Whitlatch Personal Strain factor are not a part of the Hebert model at all (e.g., embarrassment, anger, too much expected).
While all six of the items for the Whitlatch Role Strain factor load onto the Hebert Role Strain factor as well, the Hebert model also includes three other items for this factor, none of which were included in the Whitlatch model. One of the three extra items was item 22 - the global item. The other two additional items ask the caregivers if they “feel their health has suffered” and if they “fear for the future”. Neither model used the item that asks if the caregivers feel they “don’t have enough money to care for their relative” in addition to their other expenses. Overall, Hebert et al.’s Personal Strain factor seems to be focused more on the negative feeling the caregiver may have regarding her caregiving responsibilities, while Whitlatch et al.’s Personal Strain factor is more inclusive, with items that also ask about the caregiver’s negative perceptions of and responses to the care recipient.

As mentioned above, Hebert et al. tested the Knight et al. model. The chi-square was 230.80, df = 75, p < .01, and the fit statistics were AGFI = .94, χ²/df = 3.08, and RMR = .26. The AGFI supported model fit, but the χ²/df was borderline. The RMR should be approximately .05 or less in a well-fitting model (Byrne, 2001). The RMR here can be interpreted as meaning the Knight model explained the correlations to within an average error of .26. This is outside the desired range, but so was the final Hebert model with an RMR of .10. Indeed the Hebert model was the best fitting of all those tested, but the Knight model was the next best fit, even better than the model credited to Whitlatch (χ² = 1120.45, df = 208, p < .01; AGFI = .90; χ²/df = 5.39; RMR = .32). It is odd that Hebert et al. chose to revise a worse-fitting model. The Whitlatch model was not even the next best model after the Knight model. There was another model that was better.
Only the base model of 22 independent items was a worse fit. There was no explanation as to why they chose to revise the worst factor structure, second only to a model with no factors at all. As explained later, Knight et al. also tested the Whitlatch model, and found that it did not fit the data.

Bedard, Molloy, Squire, Dubois, Lever, and O'Donnell (2001) also endeavored to develop a short version (12 items) of the ZBI - one that could be used in cross-sectional, longitudinal, and intervention studies. Data from caregivers of cognitively impaired older adults were factor analyzed with a principal-components analysis and Varimax rotation. The authors were far less explicit about their factors than other authors have been. Their main focus was creating a screening version and a briefer version that worked as well as the 22-item version of the ZBI. They did several EFAs, but their aim was not to find the factor structure of the ZBI but to find items to test at various times with their longitudinal data. They mostly mention “factor one” and “factor two,” and only once refer to the factors by name. From the sparse references it seemed their two factors - Personal and Role Strain - were composed of nine and three items respectively. The Cronbach’s alpha for the Personal Strain factor was .89, and for the Role Strain factor it was .77. They describe these as equivalent to the alphas in the Whitlatch et al. study (.80 and .81). The factors explained approximately 50% of variance for the baseline and follow-up assessments.

Their version had eleven items in common with Whitlatch et al. (1991) and eight items in common with Hebert et al. (2000). As mentioned earlier, both Whitlatch et al. and Hebert et al. also called their two factors Personal Strain and Role Strain.
However, the items common among all three models mostly load on the Personal Strain factor for Bedard et al., but load on the Role Strain factor for the Hebert et al. and the Whitlatch et al. models. Examples of items that loaded on opposite factors were questions that ask the caregivers if, because of caring for their relative, they feel they “don’t have enough privacy,” “their social life has suffered,” and “their relationships with family have been affected in a negative way”. This difference may be a sign of the complex nature of caregiver burden, in that it may be difficult to judge which items indicate experiences that are predominately personally stressful and which are stressful due to role conflict or overload (per the Whitlatch et al., 1991, explanations of personal strain and role strain). This may also reflect Bedard et al.’s desire to set their model apart from the other two models, their disagreement with Whitlatch et al.’s and Hebert et al.’s interpretation of the ZBI items, or their use of different definitions of role and personal strain from some other literature. Disciplines sometimes vary in their use and definitions of a term, and in the terminology used for the same construct. In this instance, one can only speculate about the particular reason, as this obvious conflict was not discussed in their article. Considering the disagreement between studies regarding item content, the constructs of personal and role strain seem to be unstable.

Lastly, O’Rourke and Tuokko (2003a) examined the psychometric properties of Bedard et al.’s abridged, 12-item version of the ZBI. The methods used included both exploratory and confirmatory factor analyses with data from a sample of dementia caregivers in Canada.
They concluded the results supported the 2-factor structure of nine items for Personal Strain and three for Role Strain established by Bedard et al. Further, just like Bedard et al., they found no loss of predictive validity as a result of item reduction from 22 to 12. They decided that though the 22-item version has been the most commonly used (Bedard et al., 2001), abridged versions have been developed with the expectation that burden will be measured more consistently if a reliable and valid brief measure can be identified (O’Rourke & Tuokko, 2003a). The 12-item version had a Cronbach’s alpha of .85 and fit indices as follows: chi-square = 79.72, df = 36, p < .01; CFI = .99, AGFI = .97, RMSEA = .037.

Though the 12 items loaded significantly on their respective two factors, four of the items loaded significantly across the factors with loadings > .30 in the EFA. In fact, one item, “uncertain about what to do,” had a higher loading for the opposite factor (Personal Strain) than the one (Role Strain) proposed by Bedard et al., and this cross-loading was confirmed in the CFA. The fit indices were obtained after item error terms were correlated. The correlated terms and the cross-loadings may suggest instability. Bedard et al.’s main objective was a briefer version and the factor analysis was only a part of that. The article does not say which other three items also cross-loaded, but Hebert et al. and Whitlatch et al. did have several of the Bedard items on the opposite factors, as mentioned earlier. Misspecified solutions lead to lack of consensus regarding which items appear on each subscale (Miller et al., 1997).

Three factor

In contrast to the above research, one of the authors of the ZBI, Zarit (1989), said he and Hassinger found the ZBI to have three factors in a
sample of caregivers of family members with dementia. The three factors were Caregiver Anger, Patient Dependency, and Caregiver Lack of Privacy. However, he was not explicit about the analyses used to obtain this factor structure. Knight et al. (2000) confirmed a three-factor model in two different caregiver samples. Their analyses included competing models (1-, 2-, and 3-factor models) to investigate whether the ZBI is indeed a unidimensional measure, and to test the validity of the 2-factor Whitlatch et al. model and the fit of a possible multifactor model per Zarit’s description of a three-factor model (Zarit, 1989). They used confirmatory factor analysis (EQS package) to test both 18-item and 21-item versions of the scale (Item 22 - the global item - was not included). The 21-item general burden factor model did not fit the data (χ² = 626.9, df=189, p<.001; CFI = .80). The 18-item, 2-factor Whitlatch et al. model was an improvement over the 1-factor model, but did not fit the data (χ² = 476.44, df=134, p<.001; CFI = .82). Given that the one- and two-factor models did not fit the data, they did an EFA using oblique rotation and found a three-factor model. The χ² statistic (207.30, df=75, p<.001) indicated a lack of fit, but the goodness-of-fit index showed acceptable fit with CFI=.91. One item, “social life has suffered due to caring,” loaded highly on both Embarrassment/Anger and Patient’s Dependency. This item was placed on the Patient’s Dependency factor, and the model was confirmed in a second sample of caregivers with a chi-square of 102.71, df=75, p=.01, and CFI=.94. Though not specifically mentioned, their χ²/df showed good fit too, since the ratio was less than two (1.37).
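The chi-square to degrees-of-freedom ratio mentioned above is simple arithmetic, and a minimal sketch of it follows. The values are the ones quoted in this section for Knight et al. (2000); the dictionary labels and the function name are illustrative assumptions, and the cutoff of 2 is only the rule of thumb referred to in the text, not a formal test.

```python
# Chi-square statistics and degrees of freedom as quoted in the text for
# Knight et al. (2000). The labels are descriptive only (an assumption).
reported_fits = {
    "1-factor, 21 items": (626.90, 189),
    "2-factor, 18 items (Whitlatch et al.)": (476.44, 134),
    "3-factor, first sample": (207.30, 75),
    "3-factor, second sample": (102.71, 75),
}


def chi_square_ratio(chi_square, df):
    """Return the chi-square to degrees-of-freedom ratio."""
    return chi_square / df


for label, (chi_square, df) in reported_fits.items():
    ratio = chi_square_ratio(chi_square, df)
    # A ratio under 2 is the rule-of-thumb cutoff mentioned in the text.
    verdict = "below 2" if ratio < 2 else "2 or above"
    print(f"{label}: {ratio:.2f} ({verdict})")
```

Under this rule of thumb, only the three-factor model in the second sample falls below 2, which matches the ratio of 1.37 given above.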
Their three factors were Embarrassment/Anger (8 items; e.g., feel strained, have less privacy, feel social life has suffered, feel embarrassed), Patient’s Dependency (4 items; e.g., relative is dependent, not enough time for self), and Self-criticism (2 items; e.g., should be doing more, could be doing a better job). The Embarrassment/Anger and Patient’s Dependency factors contain some of the same items as the Role Strain and Personal Strain factors in the model by Whitlatch et al. The Self-criticism factor only included role strain items from the Whitlatch et al. model. All but one of the items (health has suffered) in the 3-factor model are also in the Whitlatch et al. model. This three-factor model, like all the other models in the literature, does not include all of the 21 non-global items, just 14 of them. Knight et al. conclude that the finding of a three-factor model having the best fit of the models tested is evidence that the ZBI is not unidimensional but does indeed have several dimensions, just as other burden scales are multidimensional (Lawton et al., 1989; Novack & Guest, 1989). The factors in the Knight et al. model overlapped with factors in other scales that were developed to be multidimensional.

Comparison of Four Models

O’Rourke and Tuokko (2003b) did a study comparing the Whitlatch, Bedard, Hebert, and Knight models using a nationally representative sample in Canada. Their analyses included examination of internal consistency, factor structures, and concurrent validity. The internal consistency coefficients ranged from alpha=.87 to .62, with the Hebert measure being the highest and the Knight measure being the lowest. They mention that alphas that fall between .90 and .80 are within the optimal range for these parameters. The Bedard model at .85, along with the Hebert model, falls within this range.
The Whitlatch model at .73 had moderate internal consistency, but the Knight model was in the poor range. This finding makes sense, as high internal consistency can be a sign of unidimensionality. Seemingly, the model with more factors should have worse internal consistency. Regarding the confirmatory factor analyses (performed with LISREL), all of the models were determined to be viable per the goodness-of-fit indices. The Bedard model required corrections or correlated error terms to achieve effective fit of the data. The chi-square was 82.93, with df=39, p<.01, but the other indices showed good fit. The Comparative Fit Index (CFI) was .98, above the .95 threshold. The Adjusted Goodness of Fit Index (AGFI) was >.90 at .95, and the Root Mean Square Error of Approximation (RMSEA) was <.05 at .047. Similarly, the Hebert model had a chi-square of 72.51 (df=38, p<.01), and good fit per the indices (CFI=.98, AGFI=.95, RMSEA=.043). The Knight (χ²=111.86, df=63, p<.01) and Whitlatch (χ²=227.22, df=110, p<.01) models had higher chi-square goodness-of-fit tests, but their other indices showed good fit. Knight’s indices were CFI=.98, AGFI=.95, and RMSEA=.039. Whitlatch’s were CFI=.96, AGFI=.95, and RMSEA=.046. The chi-square is always included, but rarely considered for fit, as it is very sensitive to sample size. Per the other fit indices, all of these models were viable solutions for the data. Relative to depressive symptomatology (measured by the CES-D), the concurrent validity was examined for all the models to see if they accounted for a significant amount of variance in the CES-D independent of demographic and patient illness variables that were also entered into the analyses. All four of the models accounted for a significant increase in observed variance of CES-D scores, with the Hebert 12-item measure being the best with a change in R² of .10.
The Hebert model accounted for an additional 10% of the variance independent of the other variables in the regression analyses. The Bedard, Whitlatch, and Knight models had changes in R² of .09, .08, and .07, respectively. According to O’Rourke and Tuokko, the order of best to worst regarding statistical tests is the Hebert measure, then the Bedard scale, followed by Whitlatch et al. and Knight et al. Since many of the statistics for the factor analyses and the concurrent validity are very close, this conclusion is based mostly on the internal consistency statistics. They also note that “without reliability no measure is valid” (p. 62). The reliability measure was thought to underscore the poorer performance of the Knight model in the other categories. The authors conclude that the two-factor models seemed to be better overall than the 3-factor one proposed by Knight et al. All of these scales are brief measures of the original ZBI. This study and that of the Bedard et al. model contribute to the argument that the briefer versions are valid and reliable indices of caregiver burden, and seem to have better psychometric properties than the 22-item version. In their quest for briefer versions, these analyses also contributed to the information about the factor structure of the ZBI. Both two-factor and three-factor models seem to fit the data. There is not yet one model that stands out far above the rest. There is more evidence that the ZBI is multidimensional than there is evidence that it is not. Their results and conclusions also raise the question of whether the internal consistency of a whole scale may be lower for scales with factors. Factors that measure various dimensions of a construct may be somewhat related to one another, but perhaps not as strongly related as the items of a unidimensional scale. Possibly the reliability of each factor may be a better indicator.
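As a minimal sketch of what reporting the reliability of each factor would involve, Cronbach's alpha can be computed separately for the items of one subscale. The function name and the toy responses below are assumptions made for illustration; they are not data from any of the studies reviewed here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Transpose respondents x items into one tuple of scores per item.
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]
    # Variance of each respondent's summed score across the items.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical three-item subscale answered by four respondents.
# The items move together perfectly, so alpha comes out at 1.0.
toy_subscale = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 4, 3],
    [4, 5, 4],
]

print(round(cronbach_alpha(toy_subscale), 2))
```

Applying the same function to each subscale's items, rather than to all items at once, gives one coefficient per factor.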
Nevertheless, the current focus is the factor structure of the ZBI, so this question of reliability in unidimensional vs. multidimensional scales was not considered.

Summary

There have been several factor analysis studies of the ZBI. However, these studies did not focus on possible differences in factor structure between ethnic groups. These analyses have often been done to verify or explore a factor structure for the original ZBI or abridged versions in caregiver samples, and other times they are done to develop briefer or screening versions of the ZBI. In general, studies have found differing numbers of factors and items that load on factors for the ZBI. There has been, however, some correspondence across studies regarding factor themes and item content. In previous cross-cultural caregiver studies, researchers have noted the need for analysis that confirms the psychometric properties of scales that are used to compare ethnic groups (see Janevic & Connell, 2001, for review). As considered in cross-cultural studies of other scales, it seems sensible to test the possibility of different factor structures for the ZBI when comparing race groups. Lawton, Rajagopal, Brody, and Kleban (1992) suggest that Black and White caregivers may have a different ideology about caregiving. This cultural difference in perspective may relate to differences in comprehension of and responses to items that might affect the measurement equivalence of the ZBI across groups. Thus, since it is a very commonly used measure, and there is a paucity of literature on the factor structure of the ZBI with race groups, the proposed study would add to the knowledge of the psychometric properties of the ZBI by examining two ZBI models for metric equivalence in samples of Black and White dementia caregivers. It is important to be assured that the same construct of burden is measured across race groups so that cross-cultural comparisons of caregiver burden are valid. With the ZBI analyses, the focus is not just to confirm the structure of models in two race groups, but also to replicate these models and possibly strengthen the support for them as valid factor structures. Further, since there have not been studies of the ZBI factor structure across race groups, there has been no confirmation of varying structures by race. Thus, in this study invariance across groups is assumed.

Center for Epidemiologic Studies-Depression Scale

The Center for Epidemiologic Studies-Depression Scale (CES-D) is a 20-item questionnaire designed to measure depressive symptomatology in population-based surveys (Radloff, 1977; see Appendix for scale). The CES-D was developed to include major components of depression (per the literature), and to be used in studies of the relationships between depression and other variables across population subgroups. Items were chosen from previously validated depression scales. The scale was tested for suitability for use in household surveys via information such as scale acceptability by both clinical and general populations, and whether interview conditions affected scores. The scale was found to have high internal consistency and adequate test-retest reliability. Reliability, validity, and factor structure were comparable across a variety of demographic characteristics in the general population samples and among subgroups. Some of the subgroups were race groups (Black and White), age groups (under 25, 25-64, and over 64), and levels of education.
Radloff did point out that test-retest correlations were less than .40 (below moderate) for Blacks, those aged under 25, and those in a subgroup of people who had an “emotional problem in the past week for which they felt they needed help.” Radloff (1977) found a four-factor structure for the CES-D including Depressed Affect, Positive Affect, Somatic and Retarded Activity, and Interpersonal feelings. Initially, only sixteen of the original twenty items were used in the four factors. She did principal components factor analyses (EFA) and found four eigenvalues greater than one, which altogether accounted for 48% of the variance. Then, she examined a Varimax (orthogonal) rotation to four factors. Several studies have confirmed the four-factor structure, including the other four items, and have found it to be a reliable and valid measure of depressive symptoms in older adults (Davidson, Feldman, & Crawford, 1994; Hertzog, Van Alstine, Usala, Hultsch, & Dixon, 1990). Radloff did point out that the high internal consistency of the scale found in all groups argues against undue emphasis on separate factors. However, Radloff and Teri (1986) stated that though factor analysis for independent factors should be interpreted with caution for scales with very high internal consistency, factor analysis does give some indications about the clustering of subsets of items. They further conclude that, for some purposes, and while using items with the strongest and most consistent loadings, it is useful to score subscales based on the four factors. The literature includes evidence that supports the standard four-factor structure, but there are also studies that support other factor structures. Studies of minority groups in particular offer alternative factor structures.
The following review of the literature on the factor structure of the CES-D includes both studies that support the standard factors and those that submit other models.

Confirmatory Studies

Hertzog, Van Alstine, Usala, Hultsch, and Dixon (1990) examined the measurement properties of the CES-D scale in older populations. They validated the standard four-factor structure in two samples - one adult sample ages 20-80 from the U.S., and one older adult sample ages 55 to 78 from Canada. The model fit well in the U.S. sample (χ²=343.84, df=164; GFI=.925; NFI=.908) and the Canadian sample (χ²=280.79, df=164; GFI=.909; NFI=.886). The factors were highly intercorrelated, and even more so for the U.S. sample. A second-order confirmatory analysis fitting a second-order Depression factor found good fit for both samples. They also tested for factor pattern invariance between three groups - U.S. younger, U.S. older, and Canadian older. Through multiple group analyses they found invariant unstandardized item factor loadings across samples and age groups. Their results supported the measurement validity of the CES-D for use as a screening tool for depression in older adults.

Davidson, Feldman, and Crawford (1994) tested the four-factor structure in a sample of frail, older adults, considered an underlying second-order factor, and examined the influence of the somatic factor on the total CES-D score and the influences of age, race, health, and functional status on the four CES-D factors. Before doing the CFA, they found the scale to be highly reliable with an alpha of .86. The results showed that the model fit the frail older adult sample well (χ²=301.77, df=164, p<.001; GFI=.905; NFI=.822). A second-order CFA found that much of the information in the correlations between the first-order factors is accounted for by the second-order factor loadings.
The somatic factor loaded especially high on the second-order factor (.8979). The change in chi-square from the first-order to the second-order was .911, df=2, p=.142, and the relative normed fit index that assessed the fit of the second-order factor to the covariance matrix of the first-order factors was .989. Additional analyses by Davidson et al. determined that the high scores of the frail elderly on the CES-D were not due to disproportionate influence of the somatic factor, as those who score high on the Somatic and Retarded Activity factor also score high on the other three factors. OLS regression analyses showed that age did not have an effect on any of the four subscales, but race, functional status, and health did. Age was unrelated to the symptoms on any of the four factors. Race was significantly associated with the Depressive Affect, Positive Affect, and Somatic and Retarded Activity factors. The White clients were more likely to have higher depressive symptom scores than the non-White clients. Older adults with more functional limitations had higher scores on the Somatic and Retarded Activity subscale, but not on the other three subscales. Moreover, self-reported health status was the strongest predictor of high depressive symptoms for all four factors. Altogether, the Davidson et al. findings validated the four-factor structure in frail, older adults, and confirmed the second-order factor found by Hertzog et al. (1990). They also presented evidence that the high depression scores of frail, older adults are not due disproportionately to the somatic factor, and that race, health, and functionality, not age, are related to the depressive symptoms on the four subscales.

Wetherell, Gatz, and Pederson (2001) did a longitudinal analysis of depressive symptoms and anxiety in a sample of 1,391 Swedish older adults.
Depression was measured by a Swedish translation of the English CES-D scale that had been tested for its accuracy at representing the original twenty items, and anxiety was assessed by the State Anxiety subscale of the State-Trait Personality Inventory. The program EQS for Unix was used to analyze the data, and model fit was evaluated by the chi-square test, the normed fit index (NFI), the non-normed fit index (NNFI), and the comparative fit index (CFI). In their confirmatory factor analysis of depression and anxiety, they looked at a measurement model with the standard four factors and two anxiety factors. They compared this model to another that combined all the depression items onto one factor and all the anxiety items onto one factor, one other model with positive and negative affect factors, and a final one with only a mental health factor that combined both depression and anxiety items. The model with the four CES-D factors and the two anxiety factors fit better than all the other models (χ²=2,451.29, df=780, p<.001; NFI=.89; NNFI=.91; CFI=.92). Wetherell et al. admitted that the fit was less than optimal, with the highest index reaching .92. This may have suggested that the fit might be closer with a different configuration of items. Since the model had support from a lot of previous research, they decided to accept the marginal fit as opposed to making empirically driven changes that might have been idiosyncratic to that sample. The Cronbach’s alphas for the subscales (factors) were greater than .70 (acceptable) except for the Interpersonal subscale, which is composed of only two items.

Mackinnon, McCallum, Andrews, and Anderson (1998) examined cultural differences in reporting of depressive symptoms among older people in five Southeast Asian countries - Indonesia, Korea, Myanmar, Sri Lanka, and Thailand.
The CES-D was administered verbally with a somewhat more expanded introduction than usual. Three of the twenty items were not asked. Two of them - “as good as other people” and “had crying spells” - were not asked because they were viewed as culturally inappropriate. The third item, “hopeful about the future,” was viewed as inappropriate to ask the elderly people being surveyed. All of these items except “crying spells,” which is from the Depressed Affect factor, are from the Positive Affect factor in the standard four-factor model. They also used a three-point response scale instead of the four categories of the original scale, thinking it would work better for phone interviews. They did CFAs using LISREL 8.12a and generally weighted least squares estimation of polychoric correlations, which were chosen because of their robustness to differences in response format. Their model was different from other models in that while it loaded the items on specific factors, it also loaded the items on a general factor. Unlike other models, the general factor did not have the first-order factors load on it, but the items themselves. The four factors have items loading on them, but they do not have any relation to the second-order factor. With this method they were able to test the improvement of fit in the model by adding each of the four factors. All factors were orthogonal. The model, including the four factors along with the general factor, replicated in all five countries. Overall, the chi-squares of the models improved when all four factors were included with the general factor, but the four factors really did not improve the model over the single-factor model. The fit of the models was based on NFI and RMSEA (root mean square error of approximation). The comparison of models was based on the relative normed fit index (RNFI). For instance, for Indonesia, the model including the general factor and all four factors (called Depressive Affect, Well-being, Somatic, and Interpersonal in their study) had a chi-square of 253.80 with df=102, but the model with the general factor only had a chi-square of 582.34, df=119. The NFI for the model with the four factors was .96, but for the general factor model it was .90. The RMSEA (where smaller numbers are better) for the four-factor model was .035, but for the general factor model it was .057. With all of these indices, the fit for the four-factor model was better. But then looking at the RNFI=.94, which is the fit of the general factor model relative to the four-factor model, it seems there is little information lost in preferring the simpler, general-factor model to the more complex four-factor model. The story is basically the same for the other four countries. The NFIs and the RMSEAs improve with the four factors added, but the RNFI is high also. The fit of the model with the four factors is good, but compared to the simpler one-factor model it does not add much more information. Of the five countries, only Sri Lanka showed marginal fit, with NFI=.84 and RMSEA=.054.

The main conclusion of Mackinnon et al. is that there is a general factor for the CES-D and that their findings support Radloff’s (1977) assertion that the CES-D should be used as a single scale. Radloff’s argument was based on internal consistency, but they claimed that their models demonstrated that a single factor captures the information obtained by the scale. However, they did also say that the structure for each site (country) was similar to the findings in Western settings.
They concluded that the CES-D has comparable measurement properties across countries and described their study as a confirmation of the attributes of the CES-D. Though the four factors did not seem to add much more to the model than the general factor, having the four factors did improve the fit of the model. The type of model used was different from those that have items loading on factors and the factors loading on an overarching factor. It seemed from their descriptions of how some items loaded highly on one of the four factors and not as much on the general factor, or how an item loaded on both, that there may have been some competition between the four factors and the general factor in terms of accounting for the variance in the items. Whether there was such a competition was not exactly clear in the article. Given the differences between their models and other second-order and first-order models, the fact that their model had orthogonal rather than oblique (correlated) factors, and their choices about how to administer the scale, it may be more difficult to compare their conclusions with those of others. Further, it was not clear why, or if, the weight given to the RNFI to override the evidence offered by the other fit indices was warranted. However, it was clear that the four factors did not make the model fit worse, and chi-squares did decrease even though the degrees of freedom also decreased from the general-factor-only model to the model also including the four factors. In addition, Mackinnon et al. did not find that somaticization was a predominant feature of depressive symptomatology in any one culture out of the five tested.
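The drop in chi-square against the drop in degrees of freedom can be worked through with the Indonesia figures quoted earlier from Mackinnon et al. (1998). The sketch below is illustrative arithmetic only. It assumes the general-factor-only model is nested within the model that adds the four factors, and that a chi-square difference test is appropriate for their estimator; the article as summarized here does not report such a test. The function name and the variable names are assumptions.

```python
# Chi-square values and degrees of freedom quoted for Indonesia.
general_only = {"chi_square": 582.34, "df": 119}
general_plus_four = {"chi_square": 253.80, "df": 102}


def chi_square_difference(restricted, fuller):
    """Return the change in chi-square and in df between two nested models."""
    delta_chi_square = restricted["chi_square"] - fuller["chi_square"]
    delta_df = restricted["df"] - fuller["df"]
    return delta_chi_square, delta_df


delta_chi_square, delta_df = chi_square_difference(general_only, general_plus_four)

# Tabled .05 critical value of the chi-square distribution with 17 df.
CRITICAL_VALUE_DF17 = 27.587

print(f"Change in chi-square: {delta_chi_square:.2f} on {delta_df} df")
print("Exceeds the .05 critical value:", delta_chi_square > CRITICAL_VALUE_DF17)
```

Under these assumptions the change of 328.54 on 17 degrees of freedom is far larger than the critical value, which is consistent with the observation that adding the four factors did not make the fit worse.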
They also asserted that their results showed an acceptable approximation to configurational invariance across the five Asian countries, as RNFIs for constrained CFA models between pairs of the countries showed that little information was lost by constraining the models to be identical between two countries. They compared some of the findings with those of an Australian sample from a previous study of theirs in which the new second-order model was used. The Australian sample represented Western culture. All in all, they conclude from their findings of invariance, lack of somaticization, and especially the underlying general factor that the manifestation of depressive symptomatology was more similar between Asian and Western countries than previously thought.

Studies of Alternative Models

Manson, Ackerson, Wiegman Dick, Baron, and Fleming (1990) looked at the factor structure of the CES-D in a sample of American Indian adolescents. As part of an examination of the psychometric characteristics of the CES-D, they did an exploratory factor analysis that resulted in a five-factor solution. However, the scree plot suggested that a two- or three-factor model was more appropriate for the data. They did a Varimax (orthogonal) rotation of the three-factor solution. Then, accepting all items with loadings of .40 or higher, which is the same criterion used by Radloff (1977), they found that Factor 1 had nine items, Factor 2 had six, and Factor 3 had five. Their Factors 2 and 3 were most similar to Radloff’s two factors Somatic and Retarded Activity and Positive Affect, respectively. Their Factor 2 also comprised two items from Depressed Affect (“could not shake the blues” and “felt depressed”).
Factor 3, along with the four Positive Affect items from the standard model, also contained one item that loads on the standard Somatic and Retarded Activity factor (“everything I did was an effort”). Their Factor 1 had more items and was more general than Radloff’s Depressed Affect, including both Interpersonal factor items and two Somatic and Retarded Activity items (“talked less” and “could not get ‘going’”). Overall, Factors 1, 2, and 3 are composed of items that for the most part correlate with Depressed Affect, Somatic and Retarded Activity, and Positive Affect, respectively. The Interpersonal factor did not appear as a separate factor in this American Indian adolescent population. There is no information about the five-factor model to know if this factor emerged in that solution. Their two-factor model combined their Factors 1 and 2 from the three-factor model, and left Factor 3 intact. They did not say which of the two models would be best, just that the three- or two-factor solution would be better than the five-factor model. However, there seemed to be a lot of overlap between Factors 1 and 2, since both had Depressed Affect and Somatic and Retarded Activity items in them, and there were two items that had loadings equal to or greater than .40 on both factors (“felt depressed” and “felt fearful”). Combining the two factors in the two-factor model seemed to reduce overlap between factors, and made the factor structure clearer with a Negative Affect factor (15 items) and a Positive Affect factor (5 items).

Miller, Markides, and Black (1997) examined whether the standard factor structure of the CES-D applied to two samples of elderly Mexican Americans. Their premise was that though the four-factor structure has been supported by prior research, there was also research to suggest that the factor structure of the CES-D varied by culture.
They mention that studies have found that some cultures may express depression more somatically than affectively, so that the somatic and affect factors blur. In another study, the somatic and affect factors were clear, but the interpersonal factor was not present in a Mexican American sample. Further, other depression inventories had only two factors (Depression and Positive Affect), which is consistent with theory that suggests that negative affect and well-being are the two major dimensions of emotion. They first performed exploratory analyses of data from 2,536 noninstitutionalized southwestern Mexican Americans (Sample 1). EFAs were done on four subgroups defined by gender (male and female) and language of interview (English or Spanish). Using a standard of eigenvalues being greater than 1.0, they found two of the four EFA models identified four factors (English- and Spanish-speaking males), one EFA had three factors (Spanish-speaking females), and a fourth model (English-speaking women) was an ultra-Heywood case - the communality estimate was greater than 1.0, making it impossible to obtain a solution. The models for the four subgroups had multiple loadings for all the items except the four Positive Affect items, which they called Well-being. Those four items represented one clear factor for all four subgroup models. For English-speaking women (the ultra-Heywood case in the four-factor analysis), they tried a three-factor model instead, and a solution converged. But even in this three-factor model, many of the items had multiple loadings. The large number of multiple loadings for each model and across all the models, together with the ultra-Heywood case, indicated that the four-factor and three-factor solutions had too many factors, and that models with fewer factors should be tried.
Additional evidence for reduced factor solutions came from the scree plots for all four subsamples, which suggested only two factors were present. They also did EFAs with another sample of Mexican American elders (Sample 2) and found there was similar evidence - multiple loadings, scree plot - for a two-factor solution. This EFA with the second sample did not include subgroups like the first set of EFAs with Sample 1, due to the smaller sample size (n=330). Though evidence from the EFAs indicated a two-factor solution fit the data best, they also did confirmatory factor analyses for both four- and two-factor models. The four-factor models examined for fit were the standard four-factor model, and one that had been found as the result of a CFA with Mexican Americans (which is Sample 2 in this study) that had been reported in a previous study (Krause & Markides, 1985). Miller et al. used only the larger sample for the CFA of the four-factor models. The previously tested four-factor model was very similar to the standard model. Three of the factor names were different and a few of the items were on different factors. The four-factor model factors were Somatic Problems (7 items), Negative Affect (5 items), Interpersonal (4 items), and Well-being (4 items). Their Somatic Problems factor included a Depressive Affect item from the standard model, but did not include one of the Somatic and Retarded Activity items. The Negative Affect factor did not have two of the Depressive Affect items, because one was on the Somatic factor and one was on the Interpersonal factor. Their Interpersonal factor had the two items from the standard model plus a Depressive Affect item and a Somatic and Retarded Activity item. The Well-being factor contained all four Positive Affect items.
In comparison, the standard model (χ² = 4,774.50, df = 212, p < .0001; GFI = .89; CFI = .91; AGFI = .86; TLC = .89) did not fit the data as well as the alternate four-factor model (χ² = 2,437.21, df = 212, p < .0001; GFI = .94; CFI = .96; AGFI = .92; TLC = .95). For all indices except the chi-square, values of .90 and greater indicate good fit. All of their CFAs were done using LISREL VIII with weighted least squares estimation. These CFAs were also multiple indicators, multiple causes (MIMIC) models, and included age, gender and language as control variables for the four-factor CFAs.

The exploratory analysis suggested that a two-factor model of Depression (16 items) and Well-being (4 items) fit the data best. They did MIMIC confirmatory factor analyses of that two-factor model using both Mexican American samples. Results showed that the model fit for the larger sample (Sample 1), but not for the smaller one (Sample 2). The chi-square tests rejected both models at p < .0001 (χ² = 2,320.63, df = 222 and χ² = 1,200.39, df = 218, respectively). The other fit indices were GFI = .94, CFI = .96, AGFI = .93, TLC = .95 for Sample 1 and GFI = .79, CFI = .78, AGFI = .74, TLC = .75 for Sample 2. They redid the analyses without the control variables, but the Sample 2 model still did not reach acceptable fit levels, though fit improved. For Sample 1, the GFI and AGFI dropped below acceptable levels, but the CFI and the TLC both increased by .01. When looking at the amount of variance of the observed variables accounted for by the latent variables (factors), they noticed that for some items the factors accounted for less than 40% of the variance. Such items are considered unreliable if modification indices do not suggest that the items may be more highly correlated with another latent variable in the model.
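The .90 rule of thumb for the fit indices can be applied mechanically to the values reported for the two four-factor models. The following Python sketch does only that; the dictionary layout and the function name are assumptions made for illustration, and the numbers are the ones quoted in the text.

```python
THRESHOLD = 0.90  # the text's rule of thumb for GFI, CFI, AGFI, and TLC

# Fit indices as reported in the text for the two four-factor CFA models.
reported_fit = {
    "standard four-factor": {"GFI": 0.89, "CFI": 0.91, "AGFI": 0.86, "TLC": 0.89},
    "alternate four-factor": {"GFI": 0.94, "CFI": 0.96, "AGFI": 0.92, "TLC": 0.95},
}


def indices_below_threshold(indices, threshold=THRESHOLD):
    """Return, in alphabetical order, the fit indices that fall short of the threshold."""
    return sorted(name for name, value in indices.items() if value < threshold)


shortfalls = {
    model: indices_below_threshold(values)
    for model, values in reported_fit.items()
}
# The standard model falls short on AGFI, GFI, and TLC; the alternate model on none.
```

The comparison makes the text's point concrete: only the CFI of the standard model reaches .90, whereas every index of the alternate model does.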
As such, they decided that the differences in fit indices between the two samples were related to several unreliable items in the Sample 2 data. It is interesting that they did not consider this information evidence against a two-factor model. Their two-factor model was not replicated in the second sample. They said that the items were unreliable, but they did not know the cause of the unreliability even after looking at differences in variables such as education and trying different methodologies. An alternate conclusion would be that the factors did not account for enough variance in seven of these items because the right factor was not available; that is, the model did not have enough factors. Of the items deemed unreliable, three were from the Somatic and Retarded Activity factor of the standard model, two were from the Interpersonal factor, and two were from the Positive Affect factor. This suggests that including Somatic and Retarded Activity and Interpersonal factors might have accounted for more variance in at least some of those “unreliable” items. Miller et al. seemed to consider every other possibility but this one.

The goodness of fit indices for the CFAs with Sample 1 were better for the two-factor model than for the standard four-factor model (but basically the same as for the alternate four-factor model). They argue, however, that since model fit should not improve when unnecessary factors are added, these results suggested that the more parsimonious two-factor model was the best fit. The EFAs with the two samples of elderly Mexican Americans living in the Southwestern U.S. (Samples 1 and 2) supported the two-factor structure of Depression and Well-being for the CES-D. The two-factor model
also replicated across gender and language of interview (English or Spanish) in the EFAs. When the two-factor model was tested via CFA in the two samples, it fit very well for one sample, but not the other. Taking together all the information gathered in their study, the authors concluded that a model of well-being and general depression factors is the best structure for the CES-D among Mexican American older adults. They also suggested that there is a need to explore alternate structures for the CES-D in different populations. Additionally, the adequacy of different structures should be evaluated using multiple criteria, as use of only one criterion may lead to faulty subscale construction.

The Miller et al. (1997) model is consistent with other two-factor depression instruments and with the literature they cited regarding emotion having two major aspects. They also found an alternate CES-D structure for older Mexican Americans and confirmed it in a sample of Mexican Americans. Their methods seemed very well devised, and their article explains the reasons behind their analytic decisions. The two-factor model in this study was very similar to the two-factor model in the Manson et al. (1990) study. The only difference was that the Well-being factor contained only the four Positive Affect items, whereas Manson et al. also had a Somatic and Retarded Activity item in their Positive Affect factor. If not for their careful methodological attention, one might wonder if these models really just reflect the different responses to the positively and negatively worded items. The correlation matrix in the Miller et al. article shows negative correlations between the four positively worded items and the other sixteen items, and positive correlations among the four. Lastly, most of the evidence for the two-factor model was based on EFA.
The two-factor model did not replicate in Sample 2, but the alternate four-factor model had previously been confirmed in that sample, and it was confirmed again in Miller et al.’s study with Sample 1. They seemed to put less weight on the CFA replications of the alternate four-factor model than on the two-factor model, which did not replicate even with several control variables added to the analysis.

Long Foley et al. (2002) looked at the factor structure of the CES-D in a sample of African American older adults. They did a two-stage EFA in which they first used principal components analysis with no rotation. Then, based on the scree plot and eigenvalues obtained, they used an orthogonal (Varimax) rotation and identified four factors of depressive symptomatology: Depressive/Somatic (D/S), Positive, Interpersonal, and Social Well-being (SWB). Though all twenty items of the CES-D were used, two of the items were found to be “factorially complex,” loading almost equally on two different factors, and one item did not load on any factor. “I talked less” loaded on Interpersonal and Social Well-being, while “I thought my life had been a failure” loaded on both the Interpersonal and the Depressive/Somatic factors. It was noted that these two items were not included in the original Radloff four-factor structure (in which four items did not load on any factor), and that each of them had been problematic in analyses in previous studies. In other analyses, “talk” has been shown to load on the Somatic and Retarded Activity factor in what has been called the standard four-factor structure, and the “failure” item has loaded on the Depressed Affect factor in such studies. One item, “could not get going,” did not load on any factor in these analyses, but usually loads on the Somatic and Retarded Activity factor in the standard four-factor structure.
This study’s first factor, Depressive/Somatic, was seen as further evidence that depressive and somatic symptoms may not be separate dimensions in some ethnic groups, such as African Americans, since a separate somatic factor was not identified. This factor contained all seven of the Depressed Affect items and four of the Somatic and Retarded Activity items (“bothered,” “mind,” “effort,” and “sleep”) from the standard four-factor model. Their Positive factor had three items, all of them shared with the standard Positive Affect factor. The two Interpersonal items were the same as in the standard model. The last factor, Social Well-being, had not been found in the literature previously. This factor included two items from the standard Somatic and Retarded Activity factor (“appetite” and “talk”), along with one item from the standard Positive Affect factor (“hopeful”). The authors reasoned that this factor might tap into the social interactions of African American older adults. Further, they concluded that their findings support evidence that the CES-D seems to have varied measurement properties across diverse populations.

This study did find four factors, and for two of them (Depressive/Somatic and Interpersonal) the items from two of the standard CES-D factors did hang together. Their D/S factor contained all of the Depressed Affect items; that factor was not broken up. The Interpersonal factor, which Radloff (1977) mentions as the weak factor, held up, and its two items stayed together to form their Interpersonal factor. Their Positive factor also had three of the four Positive Affect items from the standard model. The Somatic and Retarded Activity factor separated the most, with items on two different factors. Long Foley et al. did not do a confirmatory factor analysis in their methods.
They neither confirmed their four-factor model nor attempted to disconfirm the standard four-factor model.

Summary

There have been more studies of the CES-D with diverse samples than of many other scales, and different factor structure solutions for the CES-D have been found for different race/ethnic groups. It has been proposed that there is no standard, universal factor structure for this scale (Long Foley et al., 2002). Moreover, it has been proposed that factor structure may differ among ethnic or minority groups. Possible explanations that have been proposed are that in some cultures there is a greater mind/body connection, so that psychological and physical symptoms are more connected, or that there are cultural differences in the reasons for depression or the events that lead to it. Thus, the symptoms related to the feelings of depression or the expression of depression might vary such that factors, or the way symptoms hang together, might also vary between groups.

Such hypotheses may be true, but so far the case for testing them has not been fully built. Most of the evidence for different factor structures in minority groups has been based on exploratory analyses. As scales can operate with some differences from group to group, and even from sample to sample of similar groups, it is very possible to find different factor solutions when, as in EFA, there is no a priori theory about how the structures should operate. Many factors other than group differences can also affect EFA. For instance, EFA results vary by software package, because packages differ in how their algorithms are calculated. EFA results can also vary depending on the machine or hardware used to run the analysis, as the speed of the machine at doing the calculations makes a difference in the outcomes.
The studies that conclude that there are disparate factor structures by race group do so based on EFAs, and do not first disconfirm the standard four-factor structure in their minority group samples. The predominant process thus far has been to find a new structure first, then possibly to confirm that new one. This does not seem to be the logical process. Instead, before devising a new structure, one should first be certain that the standard structure that has been tested and confirmed in several samples does not fit the data at all. Then, once a definite misfit has been determined, one can look for structures that do fit the data.

The current study was devised to follow the latter procedure and to help fill this confirmatory gap in the literature. This study expanded upon the information about the psychometric properties of the CES-D by examining the measurement equivalence of the standard four-factor structure for the CES-D between African American and White groups. The endeavor here was not to find a new factor structure to fit a minority group, but to confirm whether or not the standard four-factor structure can fit data from a minority group as well as data from a White group. Thus, this study assumes that the standard CES-D four-factor structure is invariant across the two race groups.

Cross-cultural differences in depression or burden

Though some studies indicate few racial/ethnic differences in either one-year or lifetime prevalence rates of diagnosable mental disorders among older adults, researchers have proposed that estimated prevalence rates for ethnic minority older adults may be biased due to low acceptability of standardized measures of symptom reporting.
Additionally, sociocultural factors, such as differences in perception, interpretation, valuation, expression, and tolerance of symptoms, may add to this bias, along with physical, cognitive, and functional impairment (Mui et al., 2001). Long Foley, Redd, Mutran and DeVellis (2002) also discuss how social, cultural or linguistic differences between populations may lead to differences in comprehension of and responses to scale items, which may then result in nonequivalent measures of mental health symptoms. Thus, it is important that the fitness of standardized scales and their items be properly evaluated for their relevance to cultural subgroups.

This study explored whether there may be measurement differences by race group in mental health instruments used in caregiver research, in an effort to contribute to the understanding of differences in mental health outcomes by race, and whether assumptions of equality of constructs across groups may be problematic. The confirmatory factor analyses for this study tested models of the CES-D and the ZBI for metric invariance across Black and White dementia caregivers. The theoretical underpinnings of these analyses were based on the models examined. Each model represents the original authors’ theory regarding how caregiver burden or depression is measured. Each has proposed, proven, confirmed or chosen a particular model among others for which to advocate. The three theories chosen for replication and confirmation in these analyses were those reflected in the Hebert et al. ZBI model, the Knight et al. ZBI model, and the standard four-factor CES-D model. Since there are no factor analytic findings for the ZBI, and very little confirmatory factor analysis on the CES-D, to the contrary, the hypotheses in this study assumed metric invariance by race.

Hypotheses

1. The number of factors for each scale is equivalent across groups.
2. The pattern of factor loadings is equivalent in regard to item composition, rank order and magnitude.
3. The relationships between the factors of each measure are equivalent.

CHAPTER 2

Methods

Participants and Procedures

This study is an analysis of secondary data from three samples. The first sample included 44 African American and 110 non-Hispanic White caregivers and was collected from 1990 to 1993 through the Research Training and Information Transfer Core of the Alzheimer’s Disease Research Center - Southern California consortium at the University of Southern California. The sample was recruited from a variety of referral sources such as the Alzheimer’s Association of Los Angeles County and the Los Angeles Caregiver Resource Center. Participants qualified if they were the primary caregivers of the older adult and were adults themselves (at least 18 years of age). The interviews were done either in the home of the caregiver or at the lab, with the location chosen by the caregiver. More than ninety percent were home interviews. Caregivers also chose the interview times.

The second sample included 41 Black and 54 White caregivers who were caring for a demented family member. Participants qualified if they were at least 50 years old, either lived with or provided at least 8 hours of care per week for the care recipient, and the recipient suffered from chronic memory loss. This sample was collected between 1994 and 1996, also under the Research Training and Information Transfer Core of the Alzheimer’s Disease Research Center - Southern California consortium. Caregivers
were recruited from a variety of referral sources including the Los Angeles Caregiver Resource Center and the Tingstad Older Adult Counseling Center. Interviews were completed in the caregiver’s home or at the university, depending on the caregiver’s preference. The majority were home interviews. The interviewers were graduate students from the School of Gerontology.

The third sample is from the Stress, Ethnicity and Caregiving study conducted at the University of Southern California. Data were obtained from 2001 to 2003. There were 95 Black caregivers and 65 White caregivers. This random sample of caregivers was obtained through phone calls made to homes in Los Angeles County. The households called were located through census tracts that had high percentages of older adults residing in them (10% or more 65 years or older) and were predominantly (60% or higher) African American or non-Hispanic White. To decrease possible socioeconomic confounds, the tracts selected also had to have an average household income below the median for Los Angeles County. Project telephone interviewers called randomly selected households from reverse telephone directories. In a short telephone screening, the purpose of the call was explained, and an adult in the household was asked if he/she or anyone in the household cared for a person over the age of 55 who had “memory problems.” If yes, the caller then asked if a brochure could be sent to the caregiver explaining the study. If not, the adult was asked if they had a family member in the Los Angeles County area who did care for such a person and who could be contacted for participation. Caregivers were included as participants if they said they were currently caring for a family member (or someone they considered family) who was over the age of 55 and had abnormal memory problems or had been diagnosed with dementia.
Such participants were contacted to arrange an in-person interview, and were offered resources and $20 as incentives to participate. The interview was scheduled at the participant’s discretion, usually in their home or on campus. The majority of the sessions were in-home interviews (over 90%). Trained graduate research assistants did the home interviews. The interview sessions included demographic information, the ZBI, the CES-D, and several other psychological and physical measures. At the beginning of the interview, both the participant and the research assistant signed a consent form. At the end, a receipt signed by both, twenty dollars and resource information were given to the participant.

Though the data were collected during various years, all of the participants were primary, not secondary, caregivers. The primary caregiver is the person most responsible for the older adult with dementia or severe memory problems. All participants were asked the same questions in the same format. The vast majority of the interviews were conducted at home. Home interviews are considered a way to decrease bias toward less distressed caregivers (Knight et al., 2000). This sample includes as a subset the primary caregivers who were the subject of the Knight et al. factor analysis described in the Introduction.

Measures

The measures reported here were part of a larger set of measures designed to examine the stress and coping processes in dementia caregivers. Constructs measured by some of the other scales included physical and mental symptoms, familism, and emotional and instrumental support. The following measures were chosen from the larger set because of their specific relation to the focus of this study.

Sociodemographic characteristics

As mentioned, demographic information was obtained in caregiver interviews.
Age and gender were included in the preliminary analyses. All of these variables were self-reported. Interviewers asked participants their age, which was entered as a whole number of years, without decimals or fractions. Both gender categories are included in the analyses. Male was coded as 1 and female as 2.

Burden

The Zarit Burden Inventory was used to assess appraisal of burden related to caregiving (Zarit et al., 1980). The version of the ZBI used in this study has 22 items. The items are scored from 0 to 4, with higher scores indicating higher levels of distress. Total scores range from 0 to 88. The internal consistency of the ZBI has been reported as alpha=.92 (Hebert et al., 2000; Knight et al., 2000). Relative to a single global rating of burden, the validity of the ZBI has been estimated at .74 and .66 (Bedard et al., 2001). The internal consistency for all 22 items for the Black sample in this study is alpha=.92, and alpha=.93 for the White sample.

Depression

The Center for Epidemiologic Studies Depression Scale was used to look at depressive symptomatology (Radloff, 1977). The CES-D scale is a 20-item self-report scale developed to screen for depressive symptomatology in the general population. Each response is scored from 0 to 3. There are four items that are worded in a positive direction. These four items are reverse coded before the item scores are summed for the total score. Total scores range from 0 to 60, with higher scores indicating more depressive symptomatology. Scores above a standard cutoff of 16 are said to be indicative of clinical levels of depression. The CES-D has high internal consistency (alpha = .85) and correlates well with other scales designed to measure symptoms of depression (.51 to .61; Radloff, 1977).
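The scoring rules just described (20 items coded 0 to 3, four positively worded items reverse coded, a 0 to 60 total) and the alpha coefficient used for internal consistency can be sketched in Python as follows. This is an illustrative sketch, not the study's own procedure: the function names are assumptions, the positively worded items are taken to be items 4, 8, 12, and 16 as named in the item-correlation discussion of this chapter, and the cutoff is treated as "16 or higher", which is an assumption about how the cutoff of 16 is applied.

```python
from statistics import variance

POSITIVE_ITEMS = (4, 8, 12, 16)  # 1-based positions of the positively worded items


def score_cesd(responses, cutoff=16):
    """Score one 20-item CES-D response set (each item coded 0-3).

    Positively worded items are reverse coded (3 - x) before summing.
    Returns (total, at_or_above_cutoff).
    """
    if len(responses) != 20 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 20 responses coded 0-3")
    total = sum(
        3 - r if position in POSITIVE_ITEMS else r
        for position, r in enumerate(responses, start=1)
    )
    # Assumption: the cutoff is applied as "16 or higher".
    return total, total >= cutoff


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per respondent."""
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical respondent: 1 on every negatively worded item, 3 on every positive item.
example = [3 if pos in POSITIVE_ITEMS else 1 for pos in range(1, 21)]
example_total, example_flag = score_cesd(example)  # 16 negative items x 1, positives reverse to 0
```

The same alpha function applies unchanged to the 22 ZBI items, since it depends only on the item variances and the variance of the summed score.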
The internal consistency for all 20 items for the Black sample in this study is alpha=.88, and alpha=.85 for the White sample.

Preliminary analyses

Distribution

Analyses included histograms to look for outliers and check distributions. Most items were skewed, either positively or negatively. Total scores of the ZBI items included in the models were computed for each of the two samples. Histograms of these scores represent the multivariate distribution. For the Black caregivers, the total scores for the 12 items included in the Hebert et al. (Hebert) model ranged from 0 to 48 with a mean of 18.4. The histogram is shown in Fig. 1a. Though the distribution was somewhat positively skewed, it did not significantly depart from symmetry, as the skewness (.237) was not more than twice its standard error (.184).

Figure 1a. Multivariate distribution of total scores for items in Hebert et al. model for Black caregivers.
[Histogram: frequency by Total Score for Hebert 12 Items. Std. Dev = 10.67, Mean = 18.4, N = 175.]

For the White caregivers, the total scores for all items included in the Hebert model ranged from 0 to 46 with a mean of 22.6. The histogram is shown in Fig. 1b. The skewness for this histogram was not zero (-.05), but it was less than twice its standard error (SE = .162).

Figure 1b. Multivariate distribution of total scores for items in Hebert et al. model for White caregivers.
[Histogram: frequency by Total Score for Hebert 12 Items. Std. Dev = 10.82, Mean = 22.6, N = 225.]

The range for the 14-item Knight et al. (Knight) model was 0 to 52 with a mean of 21.3 for the Black sample (see Fig. 2a).
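The symmetry check applied to these total scores compares the skewness statistic with twice its standard error. A minimal Python sketch of that rule follows. The standard-error formula is the one commonly used by statistical packages for the adjusted skewness statistic; that the study's software used exactly this formula is an assumption, although it reproduces the reported standard errors of .184 for N = 175 and .162 for N = 225. The function names are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness (the G1 statistic)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_standard_error(n):
    """Standard error of G1 under normality; it depends only on the sample size."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def departs_from_symmetry(skewness, n):
    """Apply the text's rule: flag skewness larger than twice its standard error."""
    return abs(skewness) > 2 * skewness_standard_error(n)


se_black_zbi = round(skewness_standard_error(175), 3)  # the text reports .184
hebert_black_flag = departs_from_symmetry(0.237, 175)   # the text reports no significant departure
```

Because the standard error shrinks as N grows, the same skewness value can be flagged in a larger sample and not in a smaller one.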
The skewness was not significant for this multivariate distribution at .329 (SE=.184). The range for the Knight model was 0 to 52 with a mean of 25.7 for the White sample (see Fig. 2b). As with the Hebert model, the multivariate distribution for the Knight model total score for the White sample was very close to normal, with skewness near zero at .021 (SE=.162).

Figure 2a. Multivariate distribution of total scores for items in Knight et al. model for Black caregivers.
[Histogram: frequency by Total Score for Knight 14 Items. Std. Dev = 10.47, Mean = 21.3, N = 175.]

Figure 2b. Multivariate distribution of total scores for items in Knight et al. model for White caregivers.
[Histogram: frequency by Total Score for Knight 14 Items. Std. Dev = 11.02, Mean = 25.7, N = 225.]

The total scores for the 20-item CES-D model ranged from 0 to 53 with a mean of 16.06 for the sample of Black caregivers. See Fig. 3a for the histogram. The distribution was significantly skewed at .743 (SE=.188). For the White caregivers, the total scores for all items included in the CES-D model ranged from 0 to 54 with a mean of 17.74. The histogram is shown in Fig. 3b. The skewness for this histogram was also significant at .632 (SE=.166). Skewness is typical in non-patient samples, as there is generally a larger proportion of low scores (Radloff, 1977).

Figure 3a. Multivariate distribution of total scores for items in CES-D model for Black caregivers.
[Histogram: frequency by Total Score for CES-D 20 Items. Std. Dev = 11.31, Mean = 16.1, N = 167.]

Figure 3b. Multivariate distribution of total scores for items in CES-D model for White caregivers.
[Histogram: frequency by Total Score for CES-D 20 Items. Std. Dev = 9.64, Mean = 17.7, N = 214.]

The skewness for the ZBI models was not zero, but it was not significant. The skewness for the CES-D was significant, but the distributions did not depart substantially from normality (i.e., skewness was not > 2). Thus, additional methods such as transformations to normalize the data were not necessary (West, Finch & Curran, 1995).

Item correlations

The Pearson correlation tables for the scales were also examined by model and group to understand the basic relationships between items. For both groups, most of the items for the Hebert model were significantly correlated with the other items, and item-total correlations were above .3. For the White sample, Hebert items were all significantly correlated. See Tables 1 and 2 for the correlation matrices for the two samples with Hebert items. Item-total correlations ranged from .36 to .77 for the Black sample. The item-total correlations for the White sample ranged from .39 to .77.

Table 1. Correlations between items in Hebert model for Black sample.
        2       3       6       7       9       10      11      12      13      17      18      22
ZBI 2   1.000   .689**  .434**  .337**  .463**  .559**  .542**  .641**  .340**  .554**  .526**  .566**
ZBI 3           1.000   .438**  .284**  .542**  .606**  .585**  .585**  .359**  .536**  .508**  .581**
ZBI 6                   1.000   .298**  .487**  .404**  .445**  .418**  .508**  .457**  .392**  .436**
ZBI 7                           1.000   .249**  .234**  .192*   .216**  .252**  .403**  .183*   .308**
ZBI 9                                   1.000   .494**  .506**  .468**  .378**  .487**  .450**  .533**
ZBI 10                                          1.000   .631**  .586**  .455**  .660**  .578**  .622**
ZBI 11                                                  1.000   .643**  .381**  .567**  .497**  .582**
ZBI 12                                                          1.000   .388**  .643**  .459**  .629**
ZBI 13                                                                  1.000   .442**  .348**  .446**
ZBI 17                                                                          1.000   .569**  .656**
ZBI 18                                                                                  1.000   .665**
ZBI 22                                                                                          1.000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Table 2. Correlations between items in Hebert model for White sample.

        2       3       6       7       9       10      11      12      13      17      18      22
ZBI 2   1.000   .581**  .557**  .367**  .545**  .531**  .680**  .581**  .462**  .577**  .488**  .628**
ZBI 3           1.000   .434**  .340**  .545**  .558**  .546**  .473**  .410**  .518**  .395**  .567**
ZBI 6                   1.000   .321**  .493**  .502**  .590**  .570**  .544**  .565**  .504**  .610**
ZBI 7                           1.000   .275**  .241**  .268**  .321**  .232**  .371**  .209**  .283**
ZBI 9                                   1.000   .630**  .511**  .398**  .438**  .524**  .551**  .589**
ZBI 10                                          1.000   .490**  .477**  .440**  .576**  .437**  .582**
ZBI 11                                                  1.000   .581**  .491**  .528**  .431**  .584**
ZBI 12                                                          1.000   .527**  .594**  .407**  .572**
ZBI 13                                                                  1.000   .522**  .457**  .490**
ZBI 17                                                                          1.000   .501**  .616**
ZBI 18                                                                                  1.000   .586**
ZBI 22                                                                                          1.000
** Correlation is significant at the 0.01 level (2-tailed).

Several of the item correlations in the Knight model, including those involving items 20 and 21, are not significant in the Black sample. See Table 3 for the correlation matrix for the Black sample with items from the Knight model. Item-total correlations ranged from .27 to .72 for the Black sample.
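An item-total correlation of the kind reported here can be computed by correlating each item with the summed score of the scale. The Python sketch below uses the corrected variant, in which the item itself is left out of the total; whether the reported values are corrected or uncorrected is not stated in the text, so that choice is an assumption, as are the function names and the small response matrix, which is hypothetical illustration data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def corrected_item_total(item_scores):
    """Correlate each item with the sum of the remaining items."""
    k = len(item_scores[0])
    results = []
    for i in range(k):
        item = [row[i] for row in item_scores]
        rest = [sum(row) - row[i] for row in item_scores]
        results.append(pearson_r(item, rest))
    return results


# Hypothetical responses: rows are caregivers, columns are three items scored 0-4.
example_scores = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 3],
]
item_total = corrected_item_total(example_scores)
```

Items with low values on this statistic share little variance with the rest of the scale, which is how the weak items in the Knight and CES-D models show up in the ranges reported in this section.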
Table 3. Correlation matrix for items in the Knight model for the Black caregiver sample.

        2       4       5       6       8       9       10      11      12      13      14      18      20      21
ZBI 2   1.000
ZBI 4   .323**  1.000
ZBI 5   .453**  .478**  1.000
ZBI 6   .434**  .422**  .586**  1.000
ZBI 8   .439**  .139    .138    .226**  1.000
ZBI 9   .463**  .287**  .535**  .487**  .233**  1.000
ZBI 10  .559**  .400**  .509**  .404**  .311**  .494**  1.000
ZBI 11  .542**  .252**  .412**  .445**  .280**  .506**  .631**  1.000
ZBI 12  .641**  .317**  .459**  .418**  .372**  .468**  .586**  .643**  1.000
ZBI 13  .340**  .516**  .441**  .508**  .047    .378**  .455**  .381**  .388**  1.000
ZBI 14  .380**  .232**  .275**  .298**  .433**  .265**  .324**  .317**  .500**  .104    1.000
ZBI 18  .526**  .305**  .513**  .392**  .297**  .450**  .578**  .497**  .459**  .348**  .254**  1.000
ZBI 20  .149*   .199**  .174*   .226**  .193*   .353**  .259**  .094    .090    .120    .097    .163    1.000
ZBI 21  .110    .177*   .147    .204**  .129    .176*   .196**  .122    .143    .110    .122    .086    .569**  1.000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

There were fewer nonsignificant correlations for the White sample. The correlation matrix for the White sample is in Table 4. The item-total correlations for the White sample ranged from .32 to .72.

Table 4. Correlation matrix for items in the Knight model for the White caregiver sample.
        2       4       5       6       8       9       10      11      12      13      14      18      20      21
ZBI 2   1.000
ZBI 4   .393**  1.000
ZBI 5   .523**  .554**  1.000
ZBI 6   .557**  .514**  .481**  1.000
ZBI 8   .389**  .101    .247**  .225**  1.000
ZBI 9   .545**  .522**  .587**  .493**  .173**  1.000
ZBI 10  .531**  .413**  .479**  .502**  .293**  .630**  1.000
ZBI 11  .680**  .372**  .440**  .590**  .259**  .511**  .490**  1.000
ZBI 12  .581**  .333**  .383**  .570**  .399**  .398**  .477**  .581**  1.000
ZBI 13  .462**  .499**  .359**  .544**  .221**  .438**  .440**  .491**  .527**  1.000
ZBI 14  .419**  .298**  .284**  .328**  .365**  .359**  .429**  .378**  .391**  .455**  1.000
ZBI 18  .488**  .477**  .481**  .504**  .160*   .551**  .437**  .431**  .407**  .457**  .375**  1.000
ZBI 20  .264**  .315**  .244**  .280**  .052    .309**  .301**  .314**  .204**  .312**  .155*   .157*   1.000
ZBI 21  .142*   .275**  .263**  .172*   -.064   .312**  .225**  .226**  .073    .230**  .117    .136*   .661**  1.000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Many of the item correlations from the CES-D scale were not significant for either group. Most of the nonsignificant correlations involved the Positive Affect items (4, 8, 12, and 16). See Table 5 and Table 6 for the correlations for each group. Item-total correlations ranged from .24 to .79 for the Black sample. The highest correlated item was “I felt depressed.” The item-total correlations for the White sample ranged from .15 to .64. The positively worded items were some of the most weakly correlated items.

Table 5. Correlations between items in CES-D model for Black sample.
Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1.000
2 .282** 1.000
3 .359** .384** 1.000
4 .112 .120 .103 1.000
5 .281** .214** .505** .137 1.000
6 .437** .513** .645** .090 .470** 1.000
7 .264** .356** .414** .023 .353** .524** 1.000
8 .113 .109 .098 .325** .055 .156* .018 1.000
9 .176* .292** .516** .035 .159* .490** .410** .068** 1.000
10 .306** .369** .552** .066 .383** .576** .445** .157* .469** 1.000
11 .201** .439** .392** .009 .266** .504** .359** .049 .320** .382** 1.000
12 .255** .171* .154* .369** .218** .291** .064 .456** .088 .247** .123 1.000
13 .206** .365** .446** .104 .269** .428** .364** .042 .176* .360** .312** .090 1.000
14 .348** .415** .455** .080 .326** .650** .370** .136 .417** .564** .427** .255** .456** 1.000
15 .022 .173* .269** .076 .280** .328** .223** .002 .246** .334** .177* .146 .289** .310** 1.000
16 .182* .245** .219** .464** .148 .326** .126 .430** .159* .236** .084 .615** .125 .181* .096 1.000
17 .287** .390** .446** .128 .318** .482** .212** .188* .321** .447** .408** .160* .337** .428** .221** .195* 1.000
18 .387** .492** .564** .117 .492** .734** .452** .181* .424** .637** .492** .330** .395** .657** .342** .322** .562** 1.000
19 .151 .361** .412** .010 .235** .397** .274** -.048 .511** .399** .261** -.019 .294** .339** .376** .031** .231** .349** 1.000
20 .312** .386** .485** .064 .420** .476** .452** .081 .266** .429** .316** .095** .397** .374** .195* .104 .390** .495** .360** 1.000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Table 6. Correlations between items in CES-D model for White sample.
Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1.000
2 .230** 1.000
3 .437** .244** 1.000
4 .004 .034 .046 1.000
5 .475** .167* .340** .003 1.000
6 .405** .177** .662** .063 .340** 1.000
7 .467** .295** .478** .074 .430** .485** 1.000
8 -.005 -.024 .059 .184** .055 .086 .056 1.000
9 .268** .128 .321** .042 .301** .351** .284** .045 1.000
10 .332** .176** .459** .052 .300** .419** .352** -.044 .490** 1.000
11 .266** .159** .356** -.011 .358** .417** .393** .036* .236** .324** 1.000
12 .075 .022 .027 .300** .019 .078 .107 .561** .026 .058 -.006 1.000
13 .307** .111 .304** .028 .265** .286** .283** -.028 .192** .333** .208** .014 1.000
14 .335** .120 .595** .183** .235** .572** .356** -.029 .382** .453** .327** -.024 .325** 1.000
15 .261** .105 .112 .128 .198** .186** .221** -.117 .289** .191** .119 -.038 .246** .239** 1.000
16 .131 .076 .135* .415** -.013 .083 .206** .367** .091 .138* .058 .654** .055 .082 .011 1.000
17 .353** .204** .401** .015 .260** .332** .320** -.009 .344** .384** .216** -.017 .249** .281** .102 .046 1.000
18 .306** .151* .586** .129 .180** .561** .363** .009 .350** .511** .319** .006 .278** .592** .128 .124 .551** 1.000
19 .298** .104 .258** .050 .320** .263** .300** .042 .531** .457** .175* .052 .165* .23** .354** .044 .269** .243** 1.000
20 .410** .234** .400** .001 .450** .499** .585** .048 .388** .466** .408** .121 .333** .359** .206** .165* .354** .349** [illegible] 1.000
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

Reliabilities

Reliabilities were also calculated for each scale. For the Hebert model, the alphas were .92 for both the Black and White caregiver samples. For the Knight model, the reliability (alpha) was .88 for the Black caregivers and .90 for the White caregivers. For the CES-D model, the alpha was .88 for the Black caregivers and .85 for the White caregivers.

Descriptive Analyses

Descriptive analyses included t-tests and chi-squares to look for significant differences on sociodemographic variables and the measures of interest.
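As a rough illustration of these two tests, the sketch below recomputes a t statistic and a chi-square from the summary values given in Table 7. It assumes a pooled-variance (Student) t-test and an uncorrected Pearson chi-square; the text does not say which variants were run. The counts of female caregivers (155 of 225 and 133 of 175) are back-calculated from the reported percentages and are therefore assumptions, as are the function names.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student t statistic from group means, SDs, and sizes (equal variances assumed)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# ZBI 22-item totals from Table 7: White 40.51 (16.92), Black 33.82 (16.43).
t_zbi = pooled_t(40.51, 16.92, 225, 33.82, 16.43, 175)

# ASSUMED counts: 68.9% of 225 is about 155 women; 76.0% of 175 is 133 women.
chi2_female = chi_square_2x2(155, 225 - 155, 133, 175 - 133)

print(round(t_zbi, 2), round(chi2_female, 2))
```

Under these assumptions the two statistics come out close to the tabled values for the 22-item ZBI and for the proportion of female caregivers; other rows of Table 7 may differ slightly depending on the t-test variant and the group sizes actually used.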
Though the main focus of this study is the CFA, descriptive analyses offer basic information about these samples of Black and White caregivers (see Table 7). The average age was 62.88 for the White caregivers and 57.48 for the Black caregivers, a significant difference. The majority of both groups of caregivers were female, and this characteristic did not vary significantly by race.

The Black and White caregivers differed significantly on all three of their average total scores for burden (see Table 7). The White caregivers’ mean scores were higher across all three calculations. Scores between 21 and 40 have been proposed as indicating mild to moderate burden on the 22-item scale (Hebert et al., 2000). Both groups seem to fall into this range, with the White caregivers right at its upper limit.

The Black caregivers’ mean total score on the CES-D was 16.06. The White caregivers’ mean score was 17.74. The two groups were not significantly different on their total scores according to a t-test on the means (t=1.57, p>.05). Total scores (sums of all 20 CES-D items) above 16 are considered indicators of clinical depressive symptomatology, and both groups’ means are at or a little above this threshold. See results in Table 7.

Table 7. Demographic variables of caregivers and dependent variables by race.

                                   White (n=225)     Black (n=175)
Background characteristics         Mean or % (SD)    Mean or % (SD)    Chi-square   t-test
Female caregivers                  68.9%             76.0%             2.47
Age (a)                            62.88 (13.57)     57.48 (14.44)                  3.83***
Self-reported depression - CES-D   17.74 (9.64)      16.06 (11.31)                  1.57
Self-reported burden - ZBI
  22-item                          40.51 (16.92)     33.82 (16.43)                  3.97***
  Hebert                           22.65 (10.82)     18.45 (10.67)                  3.88***
  Knight                           25.65 (11.02)     21.32 (10.47)                  3.99***

Note: The categorical variable Female is reported as a percentage with a chi-square test. Other entries are reported as means, with standard deviations in parentheses, and t-tests.
(a) White n=224, Black n=173 for age only.
* p<.05 ** p<.01 *** p<.001

Missing data

To prepare the data for confirmatory factor analysis using AMOS 4.0, all cases with missing data were deleted. AMOS creates error messages when there are missing data: only the chi-square statistic is estimated and the rest of the statistics are not. For the ZBI, n=227 for the White caregivers and n=181 for the Black caregivers. Two cases (<1%) were deleted from the White caregiver file, leaving an n of 225. Six cases (3%) were deleted from the Black caregiver file, so n=175 for that group for the ZBI. For the CES-D, n=227 for the White caregivers and n=185 for the Black caregivers. In total, 18 cases (9.7%) were deleted from the Black CES-D file (n=167). Thirteen cases (5.7%) were deleted from the White CES-D file, leaving an n of 214.

There are several options for dealing with missing data, depending on what is found. If a lot of data are missing in a few particular cases, then those cases may be deleted. However, if many cases have missing data, then imputation may be a better option in order to conserve data and keep a larger number of cases for more analytical power. The possibilities with imputation are to do multiple imputation before the analyses or to allow the maximum likelihood estimation in the statistical program (AMOS) to do the imputation automatically as part of the confirmatory factor analysis. When the amount of missing data is small, these two types of imputation give very similar results. However, since there were enough cases that deletion would still leave a sufficient sample size for each group, deletion was used. Though some of the other methods are quite commonly used, the preference in this study was to use actual data from the participants, and not computer-generated data.
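The deletion approach described here is often called listwise deletion. A minimal sketch follows, using hypothetical toy records (the item names and responses are invented for illustration) together with the case counts reported above for the Black CES-D file.

```python
def listwise_delete(cases):
    """Keep only cases in which every item response is present (not None)."""
    return [case for case in cases if all(v is not None for v in case.values())]

def percent_deleted(n_before, n_after):
    """Share of cases lost to deletion, as a percentage of the original file."""
    return 100.0 * (n_before - n_after) / n_before

# HYPOTHETICAL toy records; item names and values are illustrative only.
cases = [
    {"item1": 2, "item2": 0, "item3": 1},
    {"item1": 1, "item2": None, "item3": 3},  # one missing response, so the case is dropped
    {"item1": 0, "item2": 2, "item3": 2},
]
complete = listwise_delete(cases)

# Counts reported in the text for the Black CES-D file: 185 cases, 167 retained.
pct_black_cesd = percent_deleted(185, 167)
print(len(complete), round(pct_black_cesd, 1))
```

The trade-off is the one the text names: deletion keeps only observed responses but shrinks the sample, whereas imputation preserves cases at the cost of introducing estimated values.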
Callahan and Wolinsky (1994) discussed differences in factor structure between respondents with missing values that were imputed and respondents with complete data.

Analyses

Perspective

Confirmatory factor analysis using AMOS 4.0 was the main analysis of the data. Each model was tested for goodness-of-fit and measurement invariance across the two race groups. The perspective taken with these analyses was a combination of the strictly confirmatory (SC) and alternative models (AM) approaches distinguished by Jöreskog (Byrne, 2001). In the SC scenario, a researcher proposes a single model based on theory, collects data, and then tests the fit of the model to the sample data. Based on the results of this test, the researcher either rejects or accepts the model, and no further modifications to the model are made. In the AM perspective, more than one model is hypothesized, all with empirical or theoretical grounding, and after data analysis one model is selected as most appropriate in representing the sample data (Byrne, 2001). In this study, more than one model was considered in the ZBI analyses; however, the models were tested for fit to the data and no modifications to the models were made.

The SC stance is not common, since it can be very expensive to collect data and then discontinue research after one rejected hypothesis. This was a secondary data analysis, so the expense was less. Further, the purpose was to replicate the work of others who had already generated models. The analyses were tests to see whether their theories about how caregiver burden should be measured were supported by other caregiver samples and whether they applied just as well to one race group as to another. As such, modifications to the structure of the models would not have been true
replication, but more like model generation. The only differences in the models that were tested were the conditions of equality constraints for the Black and White caregiver samples. The number of items and factors remained the same throughout.

ZBI Models

The ZBI analyses considered the Hebert 12-item, 2-factor model and the Knight 14-item, 3-factor model. O’Rourke and Tuokko (2003b) found the Hebert model’s psychometrics to be better than those of several other models, and they mentioned the superior methodology applied by the authors who developed it. The Knight et al. (2000) model has been replicated more than any other in the literature and has generally fit moderately well. The original measure has 22 items, but not all of the items have loaded on factors previously in the literature.

Hebert, Bravo and Preville (2000) tested several alternative factor structures for the Zarit Burden Inventory. They tested the fit of 1-, 2-, and 3-factor models first and then, via exploratory factor analysis, re-specified the 2-factor model based on modification index information. The end result was their 12-item, 2-factor version of the ZBI, which also included some correlated error terms. However, their article was not specific regarding which terms were correlated. Consequently, this aspect of the model was not replicable, and the Hebert model in this study did not include error term correlations.

Knight, Fox, and Chou (2000) also examined the factor structure of the Zarit Burden Inventory. They tested the fit of a 1-factor model and a previously developed 2-factor model via CFA, and then found a 3-factor model from an exploratory factor analysis. This 3-factor model was reconfirmed in another sample. The factors were non-orthogonal, but each item loaded on one and only one factor. This model was their 14-item, 3-factor version of the ZBI.
In their model, they set the two loadings of the Self-criticism factor equal because there were only two observed variables with a similar scale for that factor. Setting the loadings equal to one another was not possible with AMOS because it led to those two parameters being unidentified in the analysis. In the analyses for this study, the loading of one of these items was fixed to 1 to set the scale for that factor and to obtain an identified model. This was the version that was modeled here as the Knight model.

CES-D Model

The CES-D analyses examined the standard 20-item, 4-factor model. The scale developed by Radloff (1977) included all twenty items, but the 4-factor structure she proposed included only 16 of those 20. Over time the other four items have been incorporated into the four factors, which have been replicated several times in several samples, as described earlier in this paper. The model used here included all 20 items and four factors. The fit of this factor structure for the CES-D scale was analyzed in this study.

Estimation and Fit criteria

The estimation procedure employed in these analyses was maximum likelihood (ML) in the AMOS 4.0 program. There is increasing evidence that ML operates reasonably well even under less than optimal conditions, such as violations of analytical assumptions. Hence ML was used even though there was skewness in the data, since it is recommended when distributions are not substantially nonnormal (West, Finch, & Curran, 1995). ML is the most widely used estimator and is recommended for reporting of results (Hoyle and Panter, 1995). It is also the default for AMOS 4.0 and is the estimator most often used in text examples. ML was used both to estimate the best-fitting solutions for the models and to evaluate their fit.
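For reference, the ML discrepancy function that structural equation modeling programs minimize is usually written as below. The notation is assumed here rather than taken from the dissertation: S is the sample covariance matrix, Σ(θ) the covariance matrix implied by the model parameters θ, p the number of observed variables, and N the sample size.

```latex
\begin{align}
F_{ML}(\theta) &= \ln\lvert\Sigma(\theta)\rvert
  + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right)
  - \ln\lvert S\rvert - p \\
\chi^2 &\approx (N - 1)\,F_{ML}(\hat{\theta})
\end{align}
```

The function is zero when the model reproduces S exactly and grows as the implied and sample matrices diverge. The multiplier (N - 1) is the common textbook convention; the exact multiplier applied by AMOS 4.0, particularly in multi-group runs, is not stated in the text.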
The main interest in structural equation modeling is the degree to which a hypothesized model adequately describes, or fits, the sample data. The evaluation of model adequacy should derive from various perspectives and be based on several criteria (Byrne, 2001). Specifically, model assessment should focus on the parameter estimates and on the model as a whole. In terms of parameter estimates, one would consider information such as the size and sign of the parameters and their statistical significance. One can also check for excessively large or small standard errors, but there is no definitive criterion regarding standard error magnitude, since the units of measurement of the variables influence each standard error. Problems with parameters are indicated by correlations >1, for example. Moreover, parameters with critical ratios greater than 1.96 in absolute value are significant. These criteria were considered in the evaluation of the models.

The chi-square goodness-of-fit statistic and three supplemental fit indices (PGFI, CFI, and RMSEA) were used to examine the adequacy of the structural equation models as a whole. More than one index of fit was used, as recommended by the literature (Byrne, 2001; Hu & Bentler, 1995). The chi-square is the conventional omnibus test of fit and assesses the amount of discrepancy between the sample and fitted covariance matrices (Hu & Bentler, 1995). Statistical significance is an indicator that the model may not be a good representation of the process that produced the data in the population. In this case, the model should be rejected. A nonsignificant chi-square indicates that the observed and estimated matrices are not statistically different. A nonsignificant chi-square with its associated degrees of freedom is of interest to researchers because it implies that the model fits the data, though other models may fit the data as well.
It has been suggested that the standard chi-square (chi-sq) statistic may not be enough to describe model adequacy, since a significant chi-sq may reflect model misspecification, the power of the test, or a violation of some methodological assumption underlying the estimation method. Therefore, several other fit indices supplement the chi-square findings. In general, indices of model fit fall into the categories of model fit (e.g., PGFI), model comparison (e.g., CFI), or model parsimony (e.g., RMSEA; Schumacker and Lomax, 1996).

The parsimony goodness-of-fit index (PGFI) considers the complexity of the hypothesized model while assessing the goodness of fit of the model. In so doing, it offers a more realistic evaluation of the model (Byrne, 2001). Usually, parsimony-based indices have lower values than the generally accepted standard levels for other indices of fit. It has been suggested that parsimonious-fit indices in the .50s are not unexpected.

The chi-sq tests the null hypothesis that the population covariance matrix equals the covariance matrix of the hypothesized model. This strict null hypothesis will almost surely not be exactly true, because it is not possible for researchers to know all there is to know about the data (Hu & Bentler, 1995). If the population and model covariance matrices are not exactly equal, so that the null hypothesis is not true, then the test statistic (T) for the chi-square will not be chi-square distributed. However, it may still be distributed as a “noncentral” chi-sq variate. The comparative fit index (CFI) measures the comparative reduction in lack of fit, as estimated by the noncentral chi-sq, of a hypothesized model versus a baseline model (Hoyle & Panter, 1995). The CFI is classified as an incremental or comparative index of fit.
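The usual computational form of the CFI can be sketched as follows, assuming the standard definition based on estimated noncentrality (chi-square minus degrees of freedom, floored at zero). The model values are those reported later for the Black single-group Hebert analysis; the baseline (independence-model) chi-square is not reported in the text, so the value used below is purely hypothetical.

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index from the model and baseline (null) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)  # estimated noncentrality of the model
    d_null = max(chi2_null - df_null, 0.0)     # estimated noncentrality of the baseline
    denom = max(d_model, d_null)
    if denom == 0.0:
        return 1.0
    return 1.0 - d_model / denom

# df_null = 12 * 11 / 2 = 66 for an independence model with 12 items.
# The baseline chi-square of 1100.0 is HYPOTHETICAL, chosen only for illustration.
example_cfi = cfi(118.432, 53, 1100.0, 66)
print(round(example_cfi, 3))
```

Because the index compares the model's lack of fit with that of a baseline, a model whose chi-square does not exceed its degrees of freedom receives a CFI of 1.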
The CFI is preferable to some of the other indices in this category because its values fall within the “normed” range of 0 to 1. Originally a value of >.90 was advised as a cutoff for a well-fitting model, but recently a value close to .95 has been proposed (Byrne, 2001). However, given the smaller sample sizes involved in the analyses here, the standard cutoff of >.90 was used.

Fit criteria should be chosen because of their different approaches to the assessment of model adequacy and when the literature provides support for them as important indices that should be reported (Byrne, 2001). The root mean square error of approximation (RMSEA) is such an index. The RMSEA is basically a measure of lack of fit per degree of freedom (MacCallum, 1995). It considers the error of approximation in the population and expresses this discrepancy per degree of freedom. As such, it is sensitive to the complexity of the model. This index takes into account the degree of disconfirmability of a model (i.e., the degree to which it is possible for the model to be inconsistent with observed data), thus helping researchers not to conclude that models with more parameters are better simply because they fit the data better (MacCallum, 1995). Values <.05 show good fit. Values up to .08 indicate reasonable errors of approximation in the population. Values between .08 and .10 indicate mediocre fit, while those above .10 represent poor fit. Though some researchers recommend a value of .06 to show good fit, they also recognize that when sample size is small, the RMSEA has a tendency to overreject true population models (Byrne, 2001). The 90% confidence interval (CI) around the RMSEA addresses the imprecision of the estimate, and can be very helpful in making conclusions
about the precision of the RMSEA for a particular model. A narrow CI argues for a more precise RMSEA. AMOS also tests for closeness of fit, reporting a p-value for the hypothesis that the RMSEA is <.05. The p-value for this test should be >.50.

The chi-square statistic has limitations, such as sensitivity to sample size. As such, other fit indices are helpful in determining model fit. However, despite the problems of the chi-square, it is not acceptable to depend on fit indices alone (Byrne, 2001). It has also been proposed to treat the chi-sq more as a measure of fit and less as a test statistic. Viewed this way, a chi-square value that is small relative to its degrees of freedom indicates good fit, and a large one indicates bad fit (Byrne, 2001). The ratio of the chi-square statistic to its degrees of freedom is called the “relative chi-square” or normal chi-square. Some researchers accept values as large as 5 as indicating adequate fit, but conservative use calls for rejecting models with a relative chi-square greater than 2 or 3 (Garson, 2004). In these analyses, the more conservative standard was used.

Preliminary single group analyses

A traditional part of testing for factorial invariance is first considering a baseline model for each group separately. This model is the one that fits the data best from the perspectives of parsimony and substantive meaningfulness. Baseline models are not necessarily expected to be exactly the same across groups, as measuring instruments are often group specific in the way they operate (Byrne, 2001). Since the analyses in this study were confirmatory and were replications of previously studied models, no changes were made to the models by group. The CES-D and ZBI models were analyzed for each group separately, in part as a formality and in part as a preview of the analyses in which the groups are analyzed simultaneously.
Hypothesis Testing

Depending on the hypothesis, the tests on the models ranged from all free parameters to almost all constrained parameters. Sets of parameters were tested in an increasingly restrictive fashion. The orderly sequence of steps in the analyses is both necessary and strongly recommended in light of the univariate approach to testing the related hypotheses via the AMOS program (Byrne, 2001). For the first hypothesis, the statistical model was tested for both groups simultaneously without any constraints. Constraining the item regression weights tested the second hypothesis. The test for the last hypothesis was a model that had both the item regression weights and the factor covariances constrained. The equality of the item error terms was not tested. Except in particular instances, for example when the invariant reliability of a scale is of interest, the equality of error variances and covariances is of less interest and is considered an overly restrictive test of the data (Byrne, 2001).

CHAPTER 3

ZBI Analyses

Hebert et al. ZBI Model

The two factors in this model were called Personal Strain and Role Strain. The first factor had three items (9, 17, and 18) and the second factor had nine items, or observed variables (2, 3, 6, 7, 10, 11, 12, 13, and 22). The factors were non-orthogonal, and none of the error terms were correlated. See the Hebert statistical model in Fig. 4.

Single group analyses

In the single group analysis for the Black caregivers, the chi-sq had a value of 118.432, with 53 degrees of freedom (df), and p < .001. The PGFI was .620, the CFI was .939, and the RMSEA was .084. The PGFI was above .5 and the CFI exceeded .90, both of which are used as cutoffs for good fit. The RMSEA was in the mediocre range. Taking all the indices together, the model seemed to be a decent fit to the data. See Fig. 5 for results.
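The degrees of freedom, relative chi-square, and RMSEA quoted for this single-group analysis can be approximated from the reported chi-square, the model layout, and the sample size. The sketch below assumes one loading per factor fixed to 1, freely correlated factors, uncorrelated errors, and the common RMSEA formula with N - 1 in the denominator; these conventions, the function names, and the band labels are assumptions for illustration and not a description of the AMOS 4.0 internals.

```python
import math

def cfa_df(items_per_factor):
    """Degrees of freedom for a simple-structure CFA with correlated factors,
    uncorrelated errors, and one loading per factor fixed to 1."""
    p = sum(items_per_factor)      # observed variables
    k = len(items_per_factor)      # factors
    moments = p * (p + 1) // 2     # unique variances and covariances
    # free loadings + error variances + factor variances + factor covariances
    free = (p - k) + p + k + k * (k - 1) // 2
    return moments - free

def rmsea(chi2, df, n):
    """Point estimate of the RMSEA (assumed form using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rmsea_band(value):
    """Bands as described in the fit-criteria section of the text."""
    if value < 0.05:
        return "good"
    if value <= 0.08:
        return "reasonable"
    if value <= 0.10:
        return "mediocre"
    return "poor"

# Hebert model: 3 Personal Strain items and 9 Role Strain items; Black sample n=175.
df_hebert = cfa_df([3, 9])
ratio = 118.432 / df_hebert
fit = rmsea(118.432, df_hebert, 175)
print(df_hebert, round(ratio, 3), round(fit, 3), rmsea_band(fit))
```

Counting parameters this way reproduces the 53 degrees of freedom stated above, and the same bookkeeping gives 74 for a 14-item, 3-factor layout.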
Figure 4. The Hebert ZBI Model - briefer 12-item, 2-factor version.
[Path diagram: Personal Strain and Role Strain factors with their ZBI items and error terms.]

Figure 5. The single group analysis for the Black caregivers for the Hebert model.
[Path diagram with unstandardized estimates.]

However, the covariance matrix for the two factors was not positive definite. A sample covariance matrix that is not positive definite is a common problem, and is usually the result of some observed variables being perfectly predictable from others. Since the inverse of the sample covariance matrix is needed to compute estimators, no solutions can be obtained from the estimation process when variables are linearly dependent (Chou & Bentler, 1995). In this case, the matrix in question is not the sample covariance matrix but the factor covariance matrix. Extrapolating from the explanation for the sample covariance matrix, it was thought that the factors might be too highly correlated, so that an inverse of their matrix could not be obtained in the estimation procedure. In the output for the unstandardized factor score weights, many of the items were found to load almost equally well on both factors (see Table 8). This may be why this matrix was not positive definite.

Table 8. Unstandardized factor weights for the Hebert model for the Black sample

                  zbi17   zbi6    zbi2    zbi18   zbi9    zbi12   zbi22
Personal Strain   0.082   0.051   0.088   0.058   0.051   0.087   0.114
Role Strain       0.114   0.057   0.098   0.081   0.071   0.097   0.127

                  zbi3    zbi7    zbi10   zbi11   zbi13
Personal Strain   0.093   0.021   0.103   0.079   0.040
Role Strain       0.104   0.023   0.114   0.088   0.044

When the model was tested for the Black caregivers without the factor covariance, all of the matrices were positive definite, but the fit was much worse (chi-sq=329.187, df=54, p<.001; PGFI=.572; CFI=.745; RMSEA=.171). Further, the modification index suggested that adding the factor covariance would change the chi-sq by 120.901 units and the estimated parameter by .832. Finally, since the factors in the Hebert et al. (2000) article were non-orthogonal, removing the factor covariance would be a modification of the model. One of the goals of the analyses in this study was to stay as true to the published models as possible. Hebert et al. did mention that there were correlated error terms. Yet they did not say which error terms were correlated, and only mentioned correlations they removed. Since error term correlation information was unavailable, none of the error terms were correlated in the Hebert models analyzed here.

For the White caregivers, the chi-sq was slightly smaller (112.930) for the same degrees of freedom (53) and the same p-value (<.001). All of the supplemental indices were better for the White caregivers. The RMSEA was .071, which fell in the range of reasonable errors of approximation. The PGFI and CFI were .625 and .957 respectively. Both indicated adequate fit. The indices taken together suggested that the model fit the data and was a better fit than that for the Black caregivers. See Fig. 6 for results. The results for the Black sample must be viewed cautiously since the estimation process was not completed.
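Why a factor correlation above 1 goes hand in hand with a matrix that is not positive definite can be seen with a 2 x 2 check. The sketch below applies Sylvester's criterion to a two-factor covariance matrix; the function names are hypothetical. The correlations of 1.007 and .977 are the values the text reports for the Black and White samples in the unconstrained two-group analysis, entered here in standardized form (variances of 1).

```python
def is_positive_definite_2x2(var1, cov, var2):
    """Sylvester's criterion for the symmetric matrix [[var1, cov], [cov, var2]]:
    the leading element and the determinant must both be positive."""
    return var1 > 0 and (var1 * var2 - cov * cov) > 0

def implied_correlation(var1, cov, var2):
    """Correlation implied by two factor variances and their covariance."""
    return cov / (var1 * var2) ** 0.5

# Standardized form: variances of 1, so the covariance equals the correlation.
black_ok = is_positive_definite_2x2(1.0, 1.007, 1.0)  # determinant 1 - 1.007**2 < 0
white_ok = is_positive_definite_2x2(1.0, 0.977, 1.0)  # determinant 1 - 0.977**2 > 0
print(black_ok, white_ok)
```

For two factors, the determinant is negative exactly when the implied correlation exceeds 1 in absolute value, which is an inadmissible estimate.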
Figure 6. The single group analysis for the White caregivers for the Hebert model.
[Path diagram with unstandardized estimates.]

Results by hypotheses

Hypothesis 1

In this analysis, both the Black and the White caregiver samples were analyzed simultaneously for model adequacy. The chi-sq goodness of fit test was 231.384 for 106 degrees of freedom with a p-value <.001. The Black group’s covariance matrix was not positive definite. The regression weights and their standard errors for the Black sample were fine. All of them were significant according to their critical ratios (CR). The unstandardized estimates are in Figure 7a. The main problem was that the correlation between Personal Strain and Role Strain was 1.007 (this is less apparent in the figure, which shows the unstandardized covariance and not the standardized correlation).

The parameters for the White sample seemed to be in order regarding regression weight estimates. There were no negative variances, and all parameters had significant critical ratios. All matrices were positive definite. The correlation between the two factors (.977) was very high for this group also. The RMSEA at .055 fell in the range of reasonable errors of approximation, but was close to good fit. The PGFI and CFI were .620 and .949 respectively, indicating adequate fit. The indices taken together suggested that the model fit the data. See Fig. 7b for results. This model acted as the baseline model against which the other models were compared.
Figure 7a. Unstandardized estimates for Black caregivers in two-group analyses with no constraints per Hyp. 1.
[Path diagram with unstandardized estimates.]

Figure 7b. Unstandardized estimates for White caregivers in two-group analyses with no constraints per Hyp. 1.
[Path diagram with unstandardized estimates.]

The indices pointed toward the number of factors for the Hebert model being equivalent across groups, but because the covariance matrix was not positive definite, this conclusion cannot be accepted completely. This solution was unstable. The analyses for the other hypotheses were done to see what trends could be observed and perhaps to gain insight into what the outcomes would be if the covariance matrix were positive definite.

Hypothesis 2

Again, both the Black and the White caregiver samples were analyzed simultaneously, but in this analysis the factor loadings were constrained to be equal across the two groups. Constraining the factor loadings within a model in AMOS automatically makes the item composition of the factors, and the rank order and magnitude of the loadings, equal. The model was analyzed under these conditions for both groups to test for fit. Generally, there was nothing wrong with the parameters for either sample. The estimates were significant (per CR) and their direction and magnitude appeared satisfactory. The chi-sq goodness of fit test was 254.606 for 116 degrees of freedom with a p-value <.001.
The degrees of freedom increased by 10 when the item regression weights were constrained equal across groups (10 item loadings constrained equal and 2 still fixed at 1 to set the factor scales). The unstandardized estimates are in Figures 8a and 8b.

Figure 8a. Unstandardized estimates for Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
[Path diagram with unstandardized estimates.]

Figure 8b. Unstandardized estimates for White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.
[Path diagram with unstandardized estimates.]

Again, the covariance matrix for the Black caregiver sample was not positive definite. The problem again was that the correlation between Personal Strain and Role Strain for the Black sample was greater than one (1.012). The correlation between the two factors was .977 for the White caregiver group. The PGFI was .672 and the CFI was .944. Both supported model adequacy. The RMSEA remained at .055. The indices taken together suggested that the model fit the data. The Hebert model seemed equivalent in terms of factor composition and loadings across groups, per the fit indices. However, since the covariance matrix was not positive definite, the finding was not entirely acceptable. The solution found by the estimation process was unstable because part of the process could not be completed.
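The comparison of a constrained model with the unconstrained baseline rests on a chi-square difference test for nested models. The sketch below uses the chi-square values and degrees of freedom reported in the text for the baseline (231.384, df=106) and the loadings-constrained model (254.606, df=116). To stay within the standard library, the upper-tail probability is computed with the finite-series forms for integer degrees of freedom; the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df,
    using the finite-series forms for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    half_exp = math.exp(-x / 2.0)
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)          # (x/2)^r / r!
            total += term
        return half_exp * total
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)      # root^(2r-1) / (1*3*...*(2r-1))
        total += term
    normal_pdf = half_exp / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * normal_pdf * total

def chi2_difference(chi2_constrained, df_constrained, chi2_baseline, df_baseline):
    """Difference in chi-square and df between nested models, with its p-value."""
    delta = chi2_constrained - chi2_baseline
    ddf = df_constrained - df_baseline
    return delta, ddf, chi2_sf(delta, ddf)

# Loadings-constrained model versus the unconstrained two-group baseline.
delta, ddf, p_value = chi2_difference(254.606, 116, 231.384, 106)
print(round(delta, 3), ddf, p_value < 0.01)
```

A significant difference means the added equality constraints worsen fit more than chance would allow, which is what is read as evidence of noninvariance.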
Table 9 shows the comparison between the model tested with no constraints and this one. This model had a significantly larger chi-sq than the baseline model. This was evidence of noninvariance.

Hypothesis 3

To test the third hypothesis, the Hebert model was analyzed for the Black and the White caregiver samples simultaneously, while the factor loadings and the covariance between the two factors were constrained across the two groups. Overall, the parameters for both samples seemed reasonable. The chi-sq goodness of fit test was 254.657 for 117 degrees of freedom with a p-value <.001. The degrees of freedom increased by 1 when the factor covariance was constrained equal across groups. The unstandardized estimates are in Figures 9a and 9b.

The covariance matrix for the Black caregiver sample was still not positive definite. The correlation between Personal Strain and Role Strain for the Black sample was 1.012, as in the last analysis. The correlation between the two factors was .978 for the White caregiver group. The PGFI was .678 and the CFI was .944. Both supported model adequacy. The RMSEA was .054. The indices taken together suggested that the model fit the data. The Hebert model seemed equivalent in terms of factor composition and loadings across groups, per the fit indices. However, since the covariance matrix was not positive definite, this finding is not entirely acceptable. This solution was unstable because the estimation process was unfinished. Table 9 shows the comparison between the model tested with no constraints and this one. Again, there seemed to be evidence of noninvariance, since this constrained model had a significantly worse chi-sq than the baseline model.

Figure 9a.
Unstandardized estimates for Black caregivers in Hebert two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.

Figure 9b. Unstandardized estimates for White caregivers in Hebert two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.

Table 9. Comparisons of goodness of fit of ZBI and CES-D models.

Analysis   Chi-sq    df   Ratio (chi-sq/df)  PGFI   CFI    RMSEA  Difference in chi-sq (chi-sq, df, p-value)
ZBIBSaH    118.432   53   2.234566           0.620  0.939  0.084
ZBIBSaH2   329.187   54   6.096056           0.572  0.745  0.171
ZBIWSaH    112.93    53   2.13075            0.625  0.957  0.071
ZBI2GaH    231.384   106  2.182868           0.620  0.949  0.055  Baseline model
ZBI2GbH    254.606   116  2.194879           0.672  0.944  0.055  23.22, 10, <.01
ZBI2GcH    254.657   117  2.176556           0.678  0.944  0.054  23.27, 11, <.05
ZBI2GdH    255.703   119  2.148765           0.689  0.944  0.054  24.31, 13, <.05
ZBIBSaK    174.747   74   2.361446           0.614  0.896  0.088
ZBIWSaK    202.066   74   2.730622           0.621  0.908  0.088
ZBI2GaK    376.825   148  2.546115           0.618  0.903  0.062  Baseline model
ZBI2GbK    391.876   159  2.464629           0.661  0.901  0.061  15.05, 11, >.05
ZBI2GcK    393.931   162  2.431673           0.673  0.902  0.060  17.11, 14, >.05
ZBI2GdK    399.241   165  2.419642           0.684  0.901  0.060  22.41, 17, >.05
CESDBSa    249.253   164  1.519835           0.684  0.959  0.056
CESDWSa    362.149   164  2.208226           0.662  0.927  0.075
CESD2Ga    611.377   328  1.863954           0.671  0.941  0.048  Baseline model
CESD2Gb    630.128   344  1.831767           0.701  0.941  0.047  18.75, 16, >.05
CESD2Gc    636.046   350  1.817274           0.712  0.941  0.046  24.67, 22, >.05
CESD2Gd    643.785   354  1.818602           0.719  0.940  0.046  32.41, 26, >.05

Legend
ZBI = burden scale
CESD = depression scale
BS = Black caregivers only, single group
WS = White caregivers only
2G = both groups tested at same time
a = no constraints
b = item regression wts constrained
c = item regression wts and factor covariances constrained
d = item regression wts and factor covariances and factor variances constrained
K = Knight et al. ZBI model - 14 items, 3 factors
H = Hebert et al. ZBI model - 12 items, 2 factors
H2 = Hebert model without factor covariance included at all

Post hoc analysis

Though not specifically hypothesized, constraining the factor variances is sometimes a part of testing for invariance. Therefore a fourth, post hoc analysis was done that included all the constraints from the hypothesis 3 test along with factor variance constraints. The findings were somewhat surprising considering the previous tests. In this fourth analysis, the factor covariance matrix for the Black sample was positive definite. The correlation between the two factors was still high at .992. However, it would seem that a slight attenuation in the factor covariance for the Black sample occurred when the variances for the factors were constrained equal between the two samples, and enough so that the covariance matrix became positive definite. Thus, the estimation process was completed, allowing for stable findings. All parameters were in the correct direction and significant. The chi-sq was 255.703 with 119 df and p<.001. The PGFI was .689 and the CFI was .944, indicating model fit. The RMSEA was .054, indicating reasonable parsimony. All in all, this model seemed to be a decent to good fit to the data.

Summary of findings

The models were adequate across all three analyses in support of model fit.
However, the covariance matrices for the factors for the Black caregiver sample were not positive definite in any of the three analyses. As such, the results were not stable. The models for hypotheses 2 and 3 did not show improved fit via added constraints. As often happens, the chi-sq increased with the increase in degrees of freedom. Parsimony increased, but fit decreased. The tradeoff for parsimony was not a good one, since the increases in chi-sq were significant for both of the constrained models as compared to the unconstrained, or baseline, model. This result suggested that the Hebert model was noninvariant (not metrically equivalent) across the groups of Black and White caregivers. The fourth (unhypothesized) model’s parsimony for fit tradeoff also did not produce a better result. Yet, full estimation was achieved for the fourth model, and the supplemental fit indices pointed toward model adequacy. Given the other findings where the models were not fully estimated, these results may have been spurious. Taking all the findings together, the Hebert model was unstable in the Black sample and was noninvariant across samples. Overall, these findings are speculative and must be considered with caution because the estimation process was incomplete.

Knight et al. ZBI Model

The three factors in this model were called Embarrassment/Anger, Patient’s Dependency, and Self-criticism. The first factor had eight items (4, 5, 6, 9, 10, 11, 13, and 18), the second factor had four items (2, 8, 12, and 14), and the third factor had two items (20 and 21). The model had correlated factors, but no correlated item error terms. None of the error terms were correlated in the model in the original article. See the Knight statistical model in Fig. 10.

Figure 10.
Knight ZBI model - briefer 14-item, 3-factor version.

Single group analyses

For the Black caregivers, the chi-sq had a value of 174.747, with 74 degrees of freedom (df), and p < .001. The PGFI=.614, the CFI=.896 and the RMSEA=.088. The PGFI did reach .5, but the CFI did not reach .90. The CFI, which has been found to be quite robust to sample size with ML estimation (Hu & Bentler, 1995), was very close though. The RMSEA was in the mediocre range. The model did not fit the data well, but seemed to be a fair representation of the data. See Fig. 11 for results. All matrices were positive definite for the Black sample for this model.

In the single group analysis for the White caregivers, the chi-sq was even larger (202.066) for the same degrees of freedom (74) and same p-value (<.001). The RMSEA was the same as that for the Black sample at .088, again showing mediocre fit. The PGFI and CFI were higher for the White sample though, at .621 and .908, respectively. Both the PGFI and the CFI indicated adequate fit. The indices taken together suggested that the model was a mediocre fit, but the fit appeared to be slightly better than that for the Black caregivers. The White sample was somewhat larger than the Black sample (n=225 and 175, respectively). See Fig. 12 for results.

Figure 11. The single group analysis for the Black caregivers for the Knight model.
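The single-group RMSEA values reported for the Knight model can be approximately recovered from the chi-sq, the degrees of freedom, and the sample size. The sketch below assumes the commonly used point-estimate formula, sqrt(max(chi-sq - df, 0) / (df (N - 1))). That formula is an assumption on our part, is not stated in the text, and may differ in detail from what AMOS computes; the function name is hypothetical.

```python
import math


def rmsea_single_group(chi_sq, df, n):
    """Point estimate of RMSEA for a single-group model.

    Assumed formula: sqrt(max(chi_sq - df, 0) / (df * (n - 1))).
    Hypothetical helper; not taken from AMOS or from the source text.
    """
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))


# Knight model, single-group analyses (chi-sq, df, and n as reported above).
rmsea_black = rmsea_single_group(174.747, 74, 175)
rmsea_white = rmsea_single_group(202.066, 74, 225)

print(round(rmsea_black, 3), round(rmsea_white, 3))
```

Both values round to .088 under this formula, in line with the figures given for the two samples.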
Figure 12. The single group analysis for the White caregivers for the Knight model.

Results by hypotheses

Hypothesis 1

To test the first hypothesis, both the Black and the White caregiver samples were analyzed simultaneously for model adequacy. The model seemed to be a viable one, as all of the item and error estimates differed significantly from zero (CR values ranged from 3.22 to 11.28). Unlike the Hebert model, all of the covariance matrices were positive definite for the Black sample. The estimation process was complete for both groups of caregivers. The covariances among the three factors were somewhat larger for the White group than for the Black one. For both groups, the strongest relationship was between the two factors Embarrassment/Anger and Patient’s Dependency. The weakest covariance for both samples was between Patient’s Dependency and Self-criticism, per the CRs and standardized estimates (correlations). The chi-sq statistic did not indicate good fit (376.825, df=148, p<.001). The PGFI was .618, and the CFI was .903. The PGFI supported model fit. The CFI suggested that this model was better than an independence model, in which all correlations between variables are zero; in other words, there was a reduction in lack of fit using this model vs. an independence model.
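The idea of a reduction in lack of fit relative to an independence model can be made concrete with the usual CFI formula, CFI = 1 - max(chi-sq_model - df_model, 0) / max(chi-sq_indep - df_indep, chi-sq_model - df_model, 0). The Python sketch below is only illustrative. The independence-model chi-sq is not reported in the text, so the input values here are hypothetical round numbers and are not results from this study; the function name is also hypothetical.

```python
def comparative_fit_index(chi_sq_model, df_model, chi_sq_indep, df_indep):
    """CFI: proportional reduction in misfit relative to the independence model."""
    model_misfit = max(chi_sq_model - df_model, 0.0)
    indep_misfit = max(chi_sq_indep - df_indep, model_misfit, 0.0)
    if indep_misfit == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - model_misfit / indep_misfit


# Hypothetical values chosen only for illustration (not from the study).
cfi = comparative_fit_index(chi_sq_model=150.0, df_model=100,
                            chi_sq_indep=1100.0, df_indep=120)
print(round(cfi, 3))
```

A CFI near 1 means the hypothesized model removes most of the misfit of the independence model; a value of .90, the standard used in this chapter, corresponds to a 90% reduction.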
The RMSEA was good at .062 and fell within a confidence interval that ranged from .055 to .070. This RMSEA can be reported with 90% confidence that the true value falls within those bounds, which represents a good degree of precision. Since the RMSEA was over .05, however, the p-value associated with it did not apply. Altogether, the model seemed to be adequate. See the results in Figs. 13a and 13b. Given these findings, the first hypothesis appeared to be supported, with the number of factors for the Knight model being equivalent across groups.

Hypothesis 2

The Black and the White caregiver samples were analyzed simultaneously as before, but this time the factor loadings were constrained across the two groups. Generally, the parameters for both samples appeared normal. The estimates were significant (per CR) and their direction and magnitude appeared satisfactory. The chi-sq goodness of fit test was 391.876 for 159 degrees of freedom with a p-value <.001. The degrees of freedom increased by 11 when the item regression weights were constrained equal across groups (11 items constrained equal and 3 still fixed at 1 to set the factor scales). Again, all matrices were positive definite for both groups. The correlation between Embarrassment/Anger and Patient’s Dependency was again the highest among the factor correlations for both the Black and the White caregivers (.835 and .901, respectively). The unstandardized estimates for this model are in Figures 14a and 14b.

Figure 13a. Unstandardized estimates for Knight model with Black caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 13b. Unstandardized estimates for Knight model with White caregivers in two-group analyses with no constraints per Hyp. 1.

Figure 14a. Unstandardized estimates for Knight model with Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.

Figure 14b. Unstandardized estimates for Knight model with White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.

The PGFI was .661 and the CFI was .901. The PGFI and the CFI upheld the assumption of model adequacy. The RMSEA remained almost the same as in the previous analysis at .061. The indices taken together suggested that the model did fit the data.
The Knight model seemed equivalent in terms of factor composition and loadings across groups, per the fit indices. Table 9 shows the comparison between the model tested with no constraints and this one. This model’s chi-sq was not significantly greater than the chi-sq for the baseline model. This finding supported invariance in the Knight model.

Hypothesis 3

For the third analysis of the Knight model, the factor loadings and the covariances between the three factors were constrained across the two groups. Overall, the parameters for both samples seemed reasonable. They were all significant and in the right direction. The chi-sq goodness of fit test was 393.931 for 162 degrees of freedom with a p-value <.001. The degrees of freedom increased by 3 when the factor covariances were constrained equal across groups. The unstandardized estimates are in Figures 15a and 15b.

Figure 15a. Unstandardized estimates for Black caregivers in Knight two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.

Figure 15b. Unstandardized estimates for White caregivers in Knight two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
The PGFI was .673 and the CFI was .902. Again, both the PGFI and the CFI supported model fit. The RMSEA was almost the same as in the previous analyses at .060. Altogether the indices were evidence for adequate model fit. The relationships among the three factors seemed to be equivalent across groups for the Knight model. Table 9 shows the comparison between the baseline model and this one. Again, there is evidence of invariance for this model.

Post hoc analysis

An analysis constraining the factor variances was also done with the Knight model. There were no surprises in this analysis. As in the other three Knight model analyses, the estimation process was completed, and all parameters were significant and in the correct direction. The chi-sq was 399.241 with 165 df and p<.001. The PGFI was .684 and the CFI was .901. Both indices showed model fit. The RMSEA remained at .060, representing reasonable errors of approximation in the population. This model seemed to fit the data.

Summary of findings for hypotheses

In general, the models’ fits were adequate. The PGFIs were consistently above .50, the CFIs were at about .90, and the RMSEA values were better than mediocre and closer to good fit across all three analyses. All models had complete estimation.

The models for hypotheses 2 and 3 did show improved fit via added constraints. The chi-square statistics also increased in these analyses, as in the Hebert model analyses. In contrast, the tradeoffs for parsimony were good ones, since the increases in chi-sq were not significant for either of the constrained models as compared to the baseline model. Like the other two, the fourth (unhypothesized) model’s parsimony for fit tradeoff also produced a better result.
These results were evidence in favor of metric invariance regarding the number and composition of factors, the factor loadings, the covariance between the factors, and the factor variances.

CHAPTER 4

CESD Analyses

The four factors in this model were called Depressed Affect, Somatic and Retarded Activity, Positive Affect, and Interpersonal. The CES-D was analyzed using all 20 items. The first factor had seven items (3, 6, 9, 10, 14, 17, and 18). The second factor had seven items (1, 2, 5, 7, 11, 13, and 20). The third factor had four observed variables (4, 8, 12, and 16), and the fourth had two (15 and 19). The four factors were correlated in the model, but none of the error terms were. See the CES-D statistical model in Fig. 16. The model was tested to see if measurement invariance held across the race groups.

Single group analyses

For the Black caregivers, the chi-sq had a value of 249.253, with 164 degrees of freedom (df), and p < .001. The parameters looked in order. They were all significant, and in the right direction. The variances were all positive and none of the correlations were greater than 1. The PGFI=.684, the CFI=.959 and the RMSEA=.056. The PGFI supported adequate fit. The CFI, which seems to be quite robust to sample size, was above the .90 standard, and surpassed the more stringent .95 cutoff. The RMSEA was in the range of reasonable errors of approximation. The model seemed to be a good representation of the data. See Fig. 17 for results. All matrices were positive definite for the Black sample for this model.

Figure 16. CES-D model - 20 item, 4-factor model.
Figure 17. The single group analysis for the Black caregivers for the Standard CES-D model.

In the single group analysis for the White caregivers, the chi-sq was larger (362.149) for the same degrees of freedom (164) and same p-value (<.001). The parameters were all significant, and in the right direction. The RMSEA was higher than that for the Black sample at .075, but still showed decent fit. The PGFI and CFI were lower for the White sample at .662 and .927, respectively, but both indicated good fit. The indices taken together suggested that the model was an acceptable fit, but the fit appeared to be somewhat worse than that for the Black caregivers. The White sample was larger than the Black sample (n=214 and 167, respectively). See Fig. 18 for results.

Results by hypotheses

Hypothesis 1

To test the hypothesis that the number of factors for this scale is equivalent across groups, both the Black and the White caregiver samples were analyzed simultaneously for model adequacy. The model seemed to be a viable one, as all of the item and error estimates differed significantly from zero.
All of the covariance matrices were positive definite for both samples. The estimation process was complete for both groups of caregivers.

Figure 18. The single group analysis for the White caregivers for the Standard CES-D model.

The chi-sq statistic did not indicate good fit (611.377, df=328, p<.001). The PGFI was .671, and the CFI was .941. The PGFI provided support for model fit, and the CFI showed that there was a reduction in lack of fit using this model vs. an independence model. The RMSEA was .048 and fell within a confidence interval that ranged from .042 to .054. This RMSEA was reported with 90% confidence that the true value fell within those bounds, which represented a good degree of precision. The probability value associated with a test of close fit for the RMSEA in the population was .731 (>.50). Altogether, the model seemed to be adequate. See the results in Figs. 19a and 19b. Given these findings, the first hypothesis was supported, with the number of factors for this CES-D model being equivalent across groups.

Figure 19a. Unstandardized estimates for CES-D model with Black caregivers in two-group analyses with no constraints per Hyp. 1.
Figure 19b. Unstandardized estimates for CES-D model with White caregivers in two-group analyses with no constraints per Hyp. 1.

Hypothesis 2

Both caregiver samples were analyzed simultaneously, as in the analysis for the first hypothesis, but this time the factor loadings were constrained across the two groups. The parameters for both samples appeared normal. The estimates were significant (per CR) and their direction and magnitude appeared satisfactory. The chi-sq goodness of fit test was 630.128 for 344 degrees of freedom with a p-value <.001. The degrees of freedom increased by 16 when the item regression weights were constrained equal across groups (16 items constrained equal and 4 still fixed at 1 to set the factor scales). All matrices were positive definite for both groups. The unstandardized estimates for this model are in Figures 20a and 20b. The PGFI was .701 and the CFI was .941. Both the PGFI and the CFI upheld the assumption that the model fits the data. The RMSEA remained almost the same as in the previous analysis at .047. The CI for the RMSEA was .041 to .053, and the test of close fit had a p-value of .831. The indices taken together suggested that the model fit the data.
The CES-D model seemed equivalent in terms of factor composition and loadings across groups, per the fit indices. Table 9 shows the comparison between the baseline model and this one. The findings provide support for invariance across race groups for the standard CES-D model.

Figure 20a. Unstandardized estimates for the CES-D model with Black caregivers in two-group analyses with factor loadings constrained per Hyp. 2.

Figure 20b. Unstandardized estimates for the CES-D model with White caregivers in two-group analyses with factor loadings constrained per Hyp. 2.

Hypothesis 3

For the third analysis of the CES-D model, the factor loadings and the covariances between the four factors were constrained across the two groups. The parameters for both samples were all significant and in the right direction. The chi-sq goodness of fit test was 636.046 for 350 degrees of freedom with a p-value <.001. The degrees of freedom increased by 6 when the factor covariances were constrained equal across groups.
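The degrees of freedom quoted throughout these analyses follow from counting sample moments and free parameters. The Python sketch below reproduces the df column of Table 9 for the two-group models. It assumes, as the text describes, one loading per factor fixed at 1, uncorrelated error terms, and freely correlated factors; the function names are hypothetical.

```python
def cfa_degrees_of_freedom(n_items, n_factors):
    """df for a single-group CFA with simple structure and correlated factors.

    Assumes one loading per factor is fixed at 1 and error terms are
    uncorrelated, as described for the models in the text.
    """
    moments = n_items * (n_items + 1) // 2      # observed variances and covariances
    free_loadings = n_items - n_factors         # one loading per factor is fixed
    error_variances = n_items
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (free_loadings + error_variances
                       + factor_variances + factor_covariances)
    return moments - free_parameters


def two_group_df_steps(n_items, n_factors):
    """df for the baseline model and the three successively constrained models."""
    baseline = 2 * cfa_degrees_of_freedom(n_items, n_factors)
    loadings = baseline + (n_items - n_factors)                # loadings equal
    covariances = loadings + n_factors * (n_factors - 1) // 2  # plus covariances equal
    variances = covariances + n_factors                        # plus variances equal
    return [baseline, loadings, covariances, variances]


cesd_steps = two_group_df_steps(n_items=20, n_factors=4)
knight_steps = two_group_df_steps(n_items=14, n_factors=3)
hebert_steps = two_group_df_steps(n_items=12, n_factors=2)
print(cesd_steps, knight_steps, hebert_steps)
```

For the CES-D, for example, the four factors have 4 x 3 / 2 = 6 covariances, which is why constraining them adds 6 degrees of freedom.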
The correlation between Depressed Affect and Somatic and Retarded Activity was again the highest among the factor correlations for both the Black and the White caregivers. The correlation between Depressed Affect and Positive Affect was the weakest. The unstandardized estimates are in Figures 21a and 21b. The PGFI was .712, and the CFI remained the same at .941. The two indices indicated model fit. The RMSEA was good at .046. The CI ranged from .041 to .052, and the p-value for the test of close fit was .845. Altogether the indices were evidence for model fit. The relationships between the four factors seemed to be equivalent for the CES-D model. Table 9 shows the comparison between this model and the model without constraints. Invariance was shown for this model.

Figure 21a. Unstandardized estimates for Black caregivers in CES-D two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.

Figure 21b. Unstandardized estimates for White caregivers in CES-D two-group analyses with factor loadings and factor covariance constrained per Hyp. 3.
Post hoc analysis

An analysis constraining the factor variances was also done with the CES-D model. As in the other three CES-D model analyses, the estimation process was completed, and all parameters were significant and in the correct direction. The chi-sq was 643.785 with 354 df and p<.001. The PGFI was .719 and the CFI was .945. Both showed model fit. The RMSEA was .046, and represented good fit. This model seemed to fit the data. Table 9 shows the results.

Summary of findings for hypotheses

In general, the models fit. The PGFIs were above .50, and the CFIs were consistently greater than .90. The RMSEA values consistently indicated good fit (<.05). They had good degrees of precision and tests of close fit. All models had complete estimation.

The models for hypotheses 2 and 3 showed improved fit via added constraints. The chi-square statistics did increase in these analyses. However, the tradeoffs of fit for parsimony were good ones, since the increases in chi-sq were not significant for either of the constrained models as compared to the baseline model. Further, the fourth (unhypothesized) model’s parsimony for fit tradeoff also produced a better result. There was evidence in favor of metric invariance regarding the number and composition of factors, the factor loadings, the covariance between the factors, and the factor variances for the CES-D.
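The chi-sq difference tests summarized in the last column of Table 9 can be checked with a short script. The Python sketch below is an illustration under stated assumptions rather than the procedure AMOS uses: it computes the upper-tail chi-sq probability in pure Python from the regularized incomplete gamma function (a series for small arguments, a continued fraction otherwise). The function names are hypothetical, and the chi-sq and df values are those listed in Table 9 for the loadings-constrained models and their baselines.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variate, in pure Python."""
    a, half_x = df / 2.0, x / 2.0
    if half_x <= 0:
        return 1.0
    log_prefactor = -half_x + a * math.log(half_x) - math.lgamma(a)
    if half_x < a + 1.0:
        # Series expansion for the lower tail; return its complement.
        term = total = 1.0 / a
        n = a
        for _ in range(1000):
            n += 1.0
            term *= half_x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for the upper tail.
    tiny = 1e-300
    b = half_x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def chi_square_difference(chi_constrained, df_constrained, chi_baseline, df_baseline):
    """Return (difference in chi-sq, difference in df, p-value) for nested models."""
    delta_chi = chi_constrained - chi_baseline
    delta_df = df_constrained - df_baseline
    return delta_chi, delta_df, chi_square_sf(delta_chi, delta_df)


# Loadings-constrained model vs. baseline, values from Table 9.
hebert = chi_square_difference(254.606, 116, 231.384, 106)
knight = chi_square_difference(391.876, 159, 376.825, 148)
cesd = chi_square_difference(630.128, 344, 611.377, 328)

for name, (d_chi, d_df, p) in [("Hebert", hebert), ("Knight", knight), ("CES-D", cesd)]:
    print(f"{name}: difference = {d_chi:.2f}, df = {d_df}, p = {p:.3f}")
```

A significant difference (p < .05), as for the Hebert model, indicates that the equality constraints worsen fit and is read here as evidence of noninvariance; a nonsignificant difference, as for the Knight and CES-D models, is read as support for invariance.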
CHAPTER 5

Discussion

Given the previous literature, this study had three goals: (1) to do factor analyses of a two-factor and a three-factor ZBI model across race groups, as this had not been done previously; (2) to see if the standard CES-D four-factor structure held in a sample of Black caregivers as well as White caregivers; and (3) to see if both the ZBI and the CES-D measured the same constructs for both samples. In general, the two-factor ZBI model did not fare as well as the three-factor model. The two-factor model had problems analytically, whereas the three-factor model fit the data and appeared to be metrically equivalent across both groups of caregivers. The CES-D four-factor structure fit the data, and was invariant among Black and White dementia caregivers.

Hebert Two-factor Model

Though the Hebert model has been replicated in another Canadian sample of caregivers, it was not stable across both U.S. samples in this study. In particular, it did not seem to fit the sample of Black caregivers. Differences between the samples here and the Canadian samples may relate to why the Hebert model could not be estimated for the Black sample. The articles on the Canadian studies (Hebert et al. and O’Rourke & Tuokko) did not include the racial breakdown of the samples and only said the samples were nationally representative and were from two waves of a larger longitudinal study. Such information was not included even in the articles that described the original study from which the data came. However, according to Canada’s 2001 census, Canada’s total population was 29,639,035 with a “visible minority” population of 3,983,845 (about 13.4% of total).
In Canada, visible minorities are “persons, other than Aboriginal persons, who are not white in race or colour.” Almost 50% of the minority population is Chinese (largest minority group) and South Asian (second largest) with a total of 1,946,470. Blacks number 662,210, which is about 16.6% of the total minority population, and about 2.2% of the total Canadian population. Blacks are the third largest group of minorities. The rest of the minority population (about 34.5%) is made up of a variety of groups (e.g., Southeast Asian, Filipino, Arab/West Asian, Latin American; Statistics Canada Statistical Reference Centre, 2004). The Canadian samples may have had a variety of racial and ethnic groups included, but very likely the great majority of the caregivers were White. That the Hebert model was a better fit to the White sample in this study may be an indicator of some difference in the Black sample such that the model did not fit those data as well. The difference in fit could be related to cultural differences in ideology or interpretation of the Personal Strain and Role Strain concepts. A study of caregiver strain in Black and White daughter caregivers showed that though there was a high prevalence of emotional strain associated with caregiving for both groups, different factors distinguished the two groups and their reactions to caregiver role strain (Mui, 1992). For example, in both groups, conflict between caregiving responsibilities and the caregiver’s social and personal life predicted emotional strain. On the other hand, while the quality of the relationship with the care recipient related to more strain for White caregivers, it did not for Black caregivers.
Thus, the Personal Strain item in the Hebert model that asks if the caregiver feels “strained when you are around your relative” may be a question that both groups can relate to, but just not an item that relates to personal strain for them both. The item might operate well, though, on some other type of factor. The main point here is that these factors do not seem to fit both groups, as shown by the failure to reach full estimation with the Black caregivers and by the significant differences in chi-square as the model was constrained to be equal across both groups. The items themselves are likely not the reason; it seems the configuration of the items into the Personal Strain and Role Strain factors may be most responsible for the noninvariance. As mentioned earlier, even in two Canadian samples (from Hebert et al. and Bedard et al.) there were discrepancies about the items that composed the Personal Strain and Role Strain factors. The items that Hebert et al. put on their Role Strain factor were on a Personal Strain factor in the Bedard et al. model and vice versa. Additionally, the Bedard model was very similar in goodness of fit, validity and other properties when compared to the Hebert model by O’Rourke and Tuokko (2003b). With some of the same items loading on both factors at different times, there does not seem to be a clear distinction between the two factors. This may also help explain why the relationship between them was very strong for the White sample, and especially the Black sample. In general, there is evidence that these two factors may not be stable constructs or accurate dimensions of the ZBI scale, thus contributing to the noninvariance found. Other reasons for the Hebert model not replicating could be differences across the Canadian and U.S. samples.
For instance, caregivers of institutionalized older adults as well as those in the community were included in the Hebert et al. and the O’Rourke and Tuokko (2003a, 2003b) articles. The caregivers in this study cared for community-dwelling elders only. O’Rourke and Tuokko (2003b) noted that research has found no differences in involvement or dysphoria for people who care for those in institutions vs. those in the community. The Canadian sample included family and friends, excluding paid and formal caregivers. They decided that providing care for someone to whom one is committed should be inherently more distressing. The O’Rourke and Tuokko article did not give the ratio of friends to family in their sample, but the Hebert et al. study may be an indicator, and it included only 8.9% friends. The samples in the present study did not contain friends, just family, and no formal caregivers. Since the number of friends is low in the Canadian samples, this difference is likely not much of a contributor to differences in findings. The one question that might be asked to tie these variables together is, “how much of a difference does it make to live in Canada vs. the U.S.?” There are national policies and history that could affect the two populations. For instance, there is universal health care in Canada, but not in the U.S. This could change the nature and severity of burden associated with caregiving. Methodologically, there were some differences between the model published by Hebert and the model analyzed here. The original Hebert model had correlated error terms, but since the authors were not specific in their methods about which error terms were correlated, no error terms were correlated in the model in this study.
Indeed, regarding the correlated error terms, the original Hebert model may have had too many, or extra, parameters and so fit more easily per the disconfirmability principle (MacCallum, 1995), but this methodology may have also made the model harder to replicate.

Theoretical, Statistical, and Practical Summary of Hebert Model

Theoretically, the Hebert model was supposedly based on the Whitlatch model (18 items, two factors), but really it was developed through re-specification of a 22-item, two-factor model using EFA based on modification indices. There are fewer items in the Hebert model, but the two factors are similar in item composition and share the same names as the Whitlatch model. However, there was not a lot of information on the development of the Whitlatch model, and it was not clear how theoretically or empirically based that model was. Though the Whitlatch model fit worse than two other models, including the Knight model, Hebert et al. still chose to re-specify the Whitlatch model without any explanation for why they chose a model that was shown to have an inferior fit to their data. In addition, though modification indices can be helpful, changes to models should be based on hypotheses or findings in the literature. Altogether, the theoretical foundation of the Hebert model seems somewhat weak. Statistically, the Hebert model has been replicated in another Canadian sample, but had some key statistical problems with the Black sample in this study. As such, the Hebert model seemed to be unstable across race groups. Deleting some items, or even loading all the items on one factor, may have minimized these problems. However, these methods would have been changes in the model. The model did not fit better with constraints, so parsimony did not produce better results, and metric invariance was not shown across race groups.
All in all, the Hebert model did not perform well in its first test in U.S. samples. However, given the speculative nature of the Hebert model findings, acceptance of results depends on confirmation in a larger sample. Hebert et al. included both French and English versions of the ZBI, but they said that language was not related to different scores in the ZBI in their sample. The Hebert model was developed with one Canadian sample via EFA, a method which is affected by sample characteristics and can vary by software and hardware; additional parameters were added based on modification indices, not theoretical or empirical input. These variables may not make much difference in each place separately, but when considering them in groups from different nations, they may make a difference. Thus, it is possible that such a model may not replicate as well in a U.S. sample, or in a sample that varies a lot from the one on which it was based. Practically, if the Hebert model had held up, it might be advantageous to have a simple two-factor model with Personal and Role strain for the ZBI. On the other hand, there might be some concern that the factors are quite imbalanced with only three of the twelve items loading on the Personal Strain factor. When there are more than two factors, the imbalance is less obvious and maybe even expected as some factors are more important or play a larger role in the overall measurement of the general construct (e.g., the CES-D where Depressed Affect is a dominant factor). One might argue that personal and role strain are equally important and the number of items devoted to each should be similar.
One final concern with the two-factor model is that the ZBI possibly could then be criticized for being too general and not tapping into more specific dimensions of burden found in other scales such as Time Dependence and Impact on Social Life.

Knight Three-factor Model

The Knight model has been replicated in several samples now: two Canadian caregiver samples, the caregiver sample in which it was confirmed, and now two more samples of caregivers, one Black and one White. It has fit decently to well in every sample. Even in the Hebert et al. (2000) sample, where the claim was that it did not adequately reproduce the data, the AGFI was .94, and it was a better fit than the other models analyzed. The Knight model findings were evidence that the ZBI may have more than two factors. The other studies have mostly found two factors, so these results do not support those findings. One reason for this may be that some of the two-factor studies were ones where the Canadian groups used similar samples if not subsets of the same sample (Hebert et al.; O’Rourke and Tuokko, 2003a, 2003b), and their samples included French and English versions of the ZBI. Hebert et al. mentioned that those who did not speak either language were not included in their analyses, and that language did not make a difference in results, thus supporting the equivalence of the French version to the original English version. Given that the structure of the original ZBI has not yet been established or even studied much, it seems adding extra variables such as language might make it more difficult to establish the factor structure, especially using EFA. In all fairness, it may have been done to maintain a sample size large enough to do factor analysis. The Knight et al. analysis, in contrast, used a U.S. West Coast (southern California) sample and an English-only version of the ZBI.
All of the covariance matrices, both item and factor matrices, were positive definite for both samples for all of the analyses. This showed that the Knight model fit both samples of Black and White caregivers. None of the items on the factors or the factors themselves were so highly correlated that the model could not be fully estimated via maximum likelihood estimation. This contrasted with the Hebert model’s outcomes with the exact same data and the same number of participants. The Knight model had 14 items while the Hebert model had 12. This means that with more parameters to estimate, and thus a lower participant-to-parameter ratio, the Knight model was fully estimated for every set of analyses using the same analysis methods, whereas the Hebert model was not. In the comparison done by O’Rourke and Tuokko (2003b), they based the conclusion that the Hebert model was better than the Knight model mostly on internal consistency. High internal consistency was viewed as a plus in their article, but it may not have been, since high internal consistency can be used to argue against a scale having factors. As such, it makes sense that the internal consistency might be lower when factors are true. In addition, the Knight and Hebert models were equal on the AGFI and the CFI, and differed on the RMSEA and ECVI. The RMSEA was better for the Knight model than the Hebert model (.039 vs. .047). The ECVI was .39 for the Knight model and .30 for the Hebert model, and suggested by the lower number that the Hebert model had greater likelihood of replication across similar samples of comparable size from the population on which the tests were performed. That is a good property to have, but a better test may be that a model can be replicated across a variety of samples from several populations. The ECVI was correct in that the Hebert model has tested well in Canadian caregiver samples, but the Knight model has tested well in both U.S. and Canadian samples. The Knight model fit the Black sample and results indicate that the three factors in the Knight model describe the Black data. Since this is the first study to evaluate the ZBI factor structure across race groups, and to look at Black caregivers in particular, there is no other study with which to compare how other ZBI models might fit with Black caregivers. One possible reason for the Knight model findings might be that the three factors are specific and vary markedly in the dimensions they seem to tap: anger vs. dependency vs. self-criticism. Possibly these traits are related to an acceptable covariance matrix for the factors in the Knight model. Differences in factors lead to other possible reasons for different results between these two models: model composition and item configuration. The two models include different items. They share eight items: 2, 6, 9, 10, 11, 12, 13, and 18. The Hebert model also included items 3, 7, 17, and 22, whereas the Knight model instead included items 4, 5, 8, 14, 20, and 21. The Knight analyses did not include item 22, the global burden item, whereas the Hebert model did. In terms of item configuration, though the two models share items, they sometimes differ on whether the items load onto the same factor or not. For instance, items that load on the same factor in the Hebert model load on separate factors in the Knight model. Though some of the items are shared, the two models also differ on the conceptual basis of their factors. Specifically, the Hebert model’s Personal and Role Strain factors seem to relate more to sociological theories such as role theory.
In contrast, the three factors in the Knight model seem to relate to psychological concepts (e.g., emotion and the self). Further, the Knight model factors were similar to those found in other measures of burden (e.g., the Caregiver Burden Inventory; Novak & Guest, 1989). Perhaps these conceptual foundations contributed to the Knight model fit. A possible correlation between item 18’s error term and item 22’s error term across factors was a problem for the Hebert model. The Knight model does not have this problem, as it does not include item 22. Possibly the exclusion of item 22 allowed for more items to load on factors and possibly more factors to emerge since extremely high inter-factor correlation was less of a problem.

Theoretical, Statistical, and Practical Summary of Knight Model

Theoretically, the development methods for the Knight model have been tested by performance. In developing their model of the ZBI factor structure, Knight et al. tested other factor structures using confirmatory factor analyses. In contrast with Hebert et al., when a one-factor model and the Whitlatch model proved to not fit the data, they did not try to re-specify one of them via modification indices. Instead, they considered that a multifactor solution was plausible per Zarit’s (1989) discussion of a three-factor model, and the fact that other burden measures, such as those by Lawton et al. (1989) and Novak and Guest (1989), were multidimensional. Then, they did an EFA in an effort to find a factor structure that best described their data. Though there was limited information provided by Zarit on that three-factor model, they did compare their three-factor model to his. They also discussed how the composition of their factors compared with other burden models.
This attention to theoretical underpinnings of other measures and what was known about the ZBI at that time, along with previous empirical findings, may have paid off in finding a model that fits various data seemingly better than some other models. Statistically, the Knight et al. model, though not yet an established ZBI model, has shown the ability to fit Canadian as well as U.S. data. The U.S. data have included help-seeking and randomly found family caregivers. The Knight model has fit Canadian data where the ZBI was administered in two different languages, and caregivers were friends as well as family members of dementia patients. The Knight model only has 14 of the 21 nonglobal items in it. Knight et al. mentioned that their EFA with oblique rotation produced five factors; two of the factors only had one item each loading on them. Those two factors were dropped, hence the three-factor model. A minimum loading of .5 was their criterion for which items loaded on which factors. By this criterion they had three factors and fourteen items. Knight et al. used the .5 threshold because some of the statistics literature at that time indicated that the .4 threshold might be too low (B. G. Knight, personal communication, August 11, 2004). As it turns out, this might be another reason why their factors have been more stable. Possibly when using EFA, which can vary by software and hardware, it may be a wise, though conservative, decision to use a higher threshold for item loadings. However, it is interesting to note that using the .4 standard for the Knight et al. EFA, 20 of the 21 ZBI items loaded on five factors, with only item 3 not meeting the criterion with its highest loading at .33. With this in mind, it seems very possible that there may be even more factors in the ZBI, or at least that more of the items can load onto factors. The Whitlatch et al.
model that has 18 items out of 21, and other multidimensional burden measures such as that of Novak and Guest (1989) that has five factors, further support this idea. Moreover, Rankin et al. (1994) found five factors with eigenvalues greater than one accounting for 64% of the variance in the ZBI. The five factors from the Knight et al. analysis accounted for 65.8% of the variance in the scale. Shorter versions may be necessary for time-saving purposes, but for those who want to use the 22-item version, having a factor structure for the ZBI that contains all 21 of the nonglobal items seems beneficial. This evidence that most if not all of the 21 items load onto factors needs further examination. Another positive point about the Knight et al. model is that they developed and confirmed their model on samples including Blacks. Sample 1 was 21% Black and Sample 2 was 34% Black. Knight et al. did not just develop their model; they cross-validated it on another sample. The Hebert model was not cross-validated. These facts may also help account for why this model fit the data in this study better than the Hebert model. Practically, the Knight model is multidimensional, as researchers have argued burden scales need to be. Three factors seem better than two in that more information is available about each participant’s feelings of burden. The factors are more balanced than in the Hebert model, but the Embarrassment/anger (E/A) factor still has twice as many items as one factor and four times as many as the other. Since the factors are very different from one another, this imbalance may be explained in that the Embarrassment/anger factor may be the most dominant or important dimension of burden as measured by this scale.

CES-D Four-factor Model

The standard four-factor model for the CES-D was confirmed in these samples of Black and White dementia caregivers.
These results support other analyses that have confirmed the standard factor structure, like Hertzog et al. (1990) and Davidson et al. (1994). The standard factor structure has been confirmed in middle-aged and older adults (Hertzog et al., 1990), frail elders (Davidson et al., 1994), middle-aged and older Swedish twins (Wetherell et al., 2001), Asian people ages 60 and over (Mackinnon et al., 1998), and now dementia caregivers, specifically Black and White caregivers. This study does not confirm the studies that found alternate CES-D factor structures. This evidence refutes the idea that the CES-D factor structure varies depending on the race group studied. Further, those studies of alternate models mostly used EFA. They did not first disconfirm the four-factor model in their samples using CFA, then check for other possible solutions via EFA. They went directly to the EFA method not knowing whether or not the standard model fit their sample. One of the few studies that did CFA first, by Gupta and Yick (2001), had a very small sample size for the parameters examined (N=76), and their administration of the scale was different as it was done over the telephone and the response format was altered. Similar to Hertzog et al. (1990), multiple group analyses were done in the present study to determine if there was invariance across race for factor pattern, covariances and variances. Overall, results show that the CES-D has metric equivalence across these samples of Black and White dementia caregivers. Though there has been evidence of varied structures by cultural group (e.g., Mexican Americans), this study provided evidence for the standard four-factor structure holding in other cultures. Long Foley et al.
(2002) suggested that items for somatic symptoms and Depressed Affect items may load on the same factor for some ethnic groups. This was not found here. The Somatic and Retarded Activity factor and the Depressed Affect factor fit the data as separate factors. In the unconstrained model, the two factors were highly correlated in both samples, though more so in the Black sample. Since these two factors are significantly related in both samples, if Long Foley et al. had done their analyses on a White sample also, they may have found that the Depressed Affect items and some of the somatic symptom items loaded on the same factor for them too. That the magnitude of correlation is greater in the Black sample may just be a sign of a stronger connection between these factors for some Black caregivers, but does not necessarily mean they are not two separate factors. This also underscores why it is first necessary to test samples for the four-factor structure. As it is possible for EFA estimates to differ for the same sample, depending on statistical package and computer used, major decisions, such as concluding that the CES-D is not equivalent across groups, should not be based solely on EFA results. Comparing the White group to Blacks seemed like a good place to start examining the CES-D factor structure in caregivers, as Blacks are likely one of the most assimilated minority groups in the U.S. Almost all Blacks were born and raised in the U.S., as were their parents and several preceding generations. Over the years, the Black population has become somewhat more heterogeneous due to immigration from places such as African countries (e.g., Nigeria), Caribbean islands (e.g., Haiti), and Central
and South American countries (e.g., Belize). If this trend continues or increases, the sizable subgroups of immigrants within the Black population may be more like other minority groups that are less like the “mainstream” White population. The Blacks in the sample were asked if they identified with an additional ethnic group or nationality, and very few (less than 10%) mentioned any other ethnicity or country. Though there are cultural differences between Blacks and Whites, there may be fewer between them than between Whites and groups living in other countries or groups recently immigrated to the U.S. It was suspected that if differences were found between Blacks and Whites in the U.S., then one could expect to see such differences for other minority groups even more. The finding of invariance across the two groups in this study helped confirm that Black/White comparisons made with the CES-D are valid. This is a significant piece of knowledge as the majority of the comparative literature on caregivers includes Black and White comparisons. Further, as information about Blacks and Whites grows, research can move on to other comparisons to see for which cultural circumstances the finding of invariance stands.

Theoretical, Statistical, and Practical Summary of CES-D Model

Theoretically, the CES-D is very sound. This scale has the best theoretical development of the ones studied here. The development by Radloff was described in detail earlier. When she designed this scale, she considered other measures and theories about depression. She was very clear about what the scale was used for: measurement of depressive affect in the general population, not as a diagnostic measure of clinical depression. Her strong theoretical foundation for this scale likely has much to do with its longevity.
Statistically, since the scale was designed for research, properties that are needed and expected from research measures like reliability and validity were considered a priori by Radloff, and were not first introduced by another researcher who happened to use the CES-D. Radloff explored the CES-D factor structure first, and measured and discussed its strengths and weaknesses. She also examined the generalizability of the CES-D in clinical and general populations as well as across subgroups (e.g., age, race, and education). The scale has been used for people from a variety of backgrounds and statuses. The original factors, sometimes with very minor enhancements (all 20 items included), are well supported by the literature. In the analyses done in this study, the fit of the CES-D four-factor structure did not worsen even as the equality constraints on the parameters increased. Practically, the findings for the CES-D in this study should increase the confidence of researchers and practitioners in the ability of the CES-D to measure the same dimensions of depressive symptoms in Black and White caregivers. With the changing population, the CES-D may not fit as well to as many types of people, but for now it seems to still be very useful. The CES-D has several subscales, but using the total score with the cutoff of 16 is likely its most common use. The subscale scores are there if needed for sophisticated analyses or if the CES-D is examined as a continuous variable. The total score is best for screening and obtaining information about depression in a sample for study or for clients and patients.

Limitations

Structural equation modeling is based on several parametric assumptions. One of these assumptions, that the data have a multivariate normal distribution, was violated.
AMOS does not offer a test for multivariate normality, so the distribution of the total score was used instead. Some of the items for the two scales were skewed, and the distribution for the total score for the CES-D was significantly skewed for both samples. The skewness of the items was basically due to the data being ordinal and not continuous, which is another assumption of SEM. AMOS did not have an estimator specifically for non-normal or ordinal data, and the maximum likelihood estimation method has been shown to be robust to these types of assumption violations (Chou & Bentler, 1995). The sample sizes may not have been large enough to properly evaluate the models. Most writers on SEM or factor analyses recommend 10 participants for each parameter. In the analyses here, this level was not reached. For example, in the Hebert analyses, 250 participants would have been needed for the 25 parameters (10 factor loadings, 1 covariance, 2 factor variances, and 12 error terms). Neither the Black nor the White sample had that many participants. SEM is based on asymptotic (large sample) rules and when the sample size is smaller, estimators may not operate as expected (Byrne, 2001). It should also be noted that these findings relate mostly to U.S., English-speaking samples and may not generalize to samples using other translations of these scales or all nationalities and cultures. For example, non-Western societies may express depression differently than Western societies (Mui et al., 2001).

Implications

These analyses aided in the development of theory regarding caregiver burden and depression by examining the factor structure of two measures of mental health: the ZBI and the CES-D. These findings add to the literature that has tried to establish the factor structure of the ZBI.
This study also contributes to the literatures that confirm the standard four-factor structure of the CES-D and that consider cultural variations in it. Unlike many other studies, this study also expands upon the invariance literature by submitting the first results of an examination of metric invariance for the ZBI and CES-D with Black and White dementia caregivers. First, a three-factor model for the ZBI was found to fit in two more caregiver samples. Second, this same model was found to be metrically invariant across two race groups, with factors, factor loadings, factor covariances, and factor variances constrained equal for Black and White caregivers. Third, this study was able to further confirm that the standard four-factor model for the CES-D fit not only White caregivers but Black caregivers also. Fourth, beyond structure fit, it was shown that the CES-D had similar psychometric properties for both Black and White caregivers. These findings can help fill in information that has been missing regarding ZBI factor structure and its performance in different groups, as well as argue for the standard model in the debate about whether the CES-D has a different structure depending on race group.
This was just one study, but the possible implications are far-reaching. The replication of the ZBI models suggested that the Hebert model may need rethinking to work in U.S. samples, and that the Knight model has done the best of all ZBI models so far in fitting samples of caregivers. This helps build a very small literature base. The replication of the CES-D added to the evidence that the standard factor structure holds up in Black and White, U.S., English-speaking samples, and is metrically equivalent in the two groups.
As this study showed that the ZBI and the CES-D factor structures held across two groups of Black and White caregivers, there can be more trust that these two scales are measuring the same constructs across these two race groups. Therefore, when comparisons are made in caregiver studies and differences are found between Black and White caregivers, one can have greater assurance that these differences are not due to measurement inequality. For the CES-D, the invariant structure included all 20 of its items. For the ZBI, however, it may be best to compare Black and White caregivers using the items and subscales specifically from the Knight model, as this has been the only model so far to replicate in both groups.
Regarding the CES-D literature, there is no compelling evidence that the findings from the EFA and CFA studies have to be mutually exclusive. The conclusion need not be either that there are no factor structures other than the four-factor structure for all ethnic groups or that the four-factor structure is categorically incorrect. Instead, one could take the findings together. When CFA shows that the four-factor structure holds in a sample, it means that depression is viably measured in that way for that group. When an EFA on the same sample shows combined factors, it can mean that these factors are closely related to one another, and when one factor does not show up, this could indicate that it is less salient for that group, not that the factor does not exist or cannot be measured. Though there have not been factor analyses across groups for the ZBI before now, it would seem that similar conclusions could also apply to the ZBI.
Since metric invariance was found for both measures in this study, some of the previous inconsistent findings may not be related to the scales’ ability to measure burden or depression.
If anything, they may relate to acceptability. In fact, many of the findings of different factor structures may have resulted in part from cultural intolerance of the scale due to the way the questions are asked or to other language issues. Granted, there may be dimensions of burden or depression that vary by ethnic group, or ones not tapped by the ZBI and CES-D. However, one could argue that the factors measured in the ZBI and CES-D really are present for some groups, but respondents may not want to truly address the items due to feelings of betrayal of care recipients if they admit struggles with caregiving, or fear of stigma if they admit depressive symptoms. Alternatively, respondents or clients may find that answering the questions accurately requires a different way of thinking than they are used to, or a different kind of introspection. The real point may be to find out how to measure the same concepts across languages, cultures, and so on. As mentioned by Janevic and Connell (2001), conceptual equivalence cannot be assumed via back-translation methods alone. Conceptual equivalence is the first step of measurement equivalence, and must be assumed before metric or structural equivalence can be tested.
Reliable and culturally valid screening instruments are vital to clinical practice (Mui et al., 2001). These findings assist service providers and practitioners in that they do not have to use different scales to screen Black and White caregivers (though with the ZBI they may want to use the 14 Knight model items). The findings do not mean that Black and White caregivers have the same needs, but they do show that the CES-D is very capable of measuring depressive symptoms in both types of caregivers, and the evidence is now much stronger that the ZBI is also capable of measuring burden equally well in both groups. There is less certainty with the ZBI, as this is the first study to look at how the ZBI operates across Black and White caregivers. This information may encourage counselors, social workers, and other professionals to have greater confidence in using the ZBI and the CES-D to better understand the concerns of the caregivers they serve, and to help them navigate their caregiving experience.
Practically, the ZBI and the CES-D having an accepted factor structure across race groups is important because even more information on the caregivers would be obtainable. Factors offer more specific information. Knowing which dimensions of burden or depression contribute more or less to a total score tells a researcher, social worker, or any other professional who works with caregivers much more about the needs of the caregiver and the impact of caregiving on the caregiver (Novak & Guest, 1989). Such data may allow for increased ability to parse out patterns of caregiving effects or correlates and antecedents of caregiver burden (George & Gwyther, 1986). With set factors, researchers or clinicians could use the ZBI and CES-D not just to see whether caregivers are burdened or depressed, and to what degree, but also to see what sets of circumstances are most responsible for the caregiver’s experiences. Knowing that these scales operate similarly for various groups offers professionals more assurance in using the measures and obtaining valid results.
Regarding policy, these findings are also helpful. To estimate prevalence rates and devise treatment plans and services for caregivers, it is important that scales measure the constructs correctly and in the same way for various people.
If scales do not operate the same way for people from various cultures, then rates of mental health issues such as burden and depression could very well be biased or misestimated for those groups (Mui et al., 2001). Erroneous conclusions drawn from measures with dissimilar properties across groups can have wide-ranging implications for public health policy (Posner et al., 2001). Accurate detection gives policymakers the information they need to make decisions regarding the resources for which they will advocate on behalf of the caregivers in their constituency.
Path analyses, MIMIC models, and item response theory, along with subscale analyses, would be worthwhile considerations for future research. Analyses including more variables can lead to even greater understanding of group similarities and differences. Such knowledge can then be used to better tailor services, support, and other resources for caregivers, such that they are most meaningful to targeted groups. Though there may be no need for different surveys for Black and White caregivers, such information may lead to more or less emphasis on certain types of questions as they relate to situational differences such as age, relationship to care recipient, and geography. This study of factor structure and metric invariance sets the stage for the analysis of more complicated questions.
Research Recommendations
In continuing work in this area, there are some research objectives that might be constructive. One key to continuing to build theory regarding these two scales is cross-validation and replication. With overidentified models (more observed variances and covariances than estimated parameters, and thus positive degrees of freedom), model fit can be tested, but there is the possibility of other structural models that could also fit the data.
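As a brief worked illustration of the degrees of freedom mentioned above, the usual counting rule for a single-group covariance structure model without a mean structure (stated here as background, not taken from the analyses in this study) subtracts the number of free parameters q from the number of distinct observed variances and covariances among p items. The value p = 12 is an assumption inferred from the 12 error terms listed for the Hebert model under Limitations, and q = 25 is the parameter count given there:

```latex
\[
df = \frac{p(p+1)}{2} - q = \frac{12 \cdot 13}{2} - 25 = 78 - 25 = 53
\]
```

A positive value means the model's fit can be tested, although it does not rule out other structures that fit the same data equally well.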
The more often models are found to adequately fit various samples, the more confidence researchers can have in them. Hebert et al. said that their findings indicated that their empirical analyses fully supported the theoretical structure of the ZBI. Yet the ZBI theoretical structure had not been established, so their findings could not have supported it. Still, researchers are closer to understanding it due to the contributions of Hebert et al., Knight et al., and others, as well as this study, in which replications of models have been done.
Checking for invariance in cross-cultural research is needed to ensure comparable construct measurement in order to make appropriate comparisons. If there is not a consistent factor structure available, then choosing the factor structure from the literature that fits best with study goals may be a good place to start. If that attempt is unsuccessful and one has the resources, the next recommended step is to try to confirm another structure in the literature. Replication in new samples ties the literature together. Trying EFA to see what emerges should be the last resort. Though helpful, discovering new structures has the drawback that they then need to be replicated several times to be established in the literature. It seems best to exhaust all the available options and try to build on foundations already set rather than to start a new project. This maximizes the knowledge available on established options. Science is a process of building, not just discovering. When existing options prove to offer nothing, then new things need to be invented. Given group differences and individual differences even among similar groups, measures will very likely operate differently with each sample (Byrne, 2001), so if one wants to find a new structure, odds are high that one will, based partly on maximized use of sampling error variance.
However, finding that a particular structure does indeed replicate in multiple samples builds confidence in the use of measures. And when attempts at replication of a certain factor structure fail time and again, one can confidently conclude that the model is not valid and reliable across various groups, and that developing another measure is done not in haste or out of over-ambition, but out of need, with the statistical evidence as support.
Publication of important details such that replication is possible is a must. Sometimes researchers desire to support another researcher’s findings, but due to a lack of analytical details, they are unable to do so. For example, Knight et al. wanted to examine a three-factor structure as found by Zarit (1989). They were not able to do this since Zarit mentioned the factors, but not how they were found. Instead, they were left with the alternative of doing EFA to find a three-factor solution, then looking at factor compositions to see if they were possibly similar to the Zarit (1989) factors as implied by the factor names. In fairness, the Zarit study was done in 1989 or before, when factor analysis was used much less often. However, the population is changing rapidly and substantially in terms of diversity. This diversity applies to age and race, among other characteristics. Thus, there is, and likely will continue to be, more of a need for careful and clear examination of the psychometric properties of measures in an effort to keep up with their adequacy as circumstances and events shift. A good example of this being done is the literature on SEM fit indices. As more information is found on the strengths and weaknesses of each index, and changes in thresholds are made, statistics texts and articles update their information and the word is spread.
Authors even instruct researchers on how to use various indices, and exhort them to obtain up-to-date information so as to best use them. Eventually, this may be how researchers will need to handle measures as well - updating information on each of them and passing on to others the pros and cons of each.
It is also important to obtain some consistency of construct content and item interpretation. For instance, Vitaliano et al. (1991) were able to compare ten burden measures by ten groups of researchers on the criteria of objective and subjective burden. Their review article is an example of trying to find correspondence and consistencies in the measurement of burden even across scales. Knight et al. (2000) discussed how their ZBI factors shared properties of factors found in other burden scales, such as item content and similarity in themes underlying the factors. These are examples of researchers trying to reconcile the burden literature. On the other hand, there were researchers who looked at the same scale (ZBI), ended up with the same number of factors, and used the same factor names (two factors, Personal and Role Strain), but could not agree on the interpretation of some of the exact same items (Bedard et al., 2001; Hebert et al., 2000; Whitlatch et al., 1991). Researchers have the right to disagree, but no reasons were given for the differences. Instead, researchers are left to wonder, and unfortunately, this adds to the lack of cohesion in the ZBI literature.
Conclusion
The motivation behind this research was the findings in the caregiving literature for Black and White caregivers. Some studies found that White caregivers were higher on burden and depression, while others found no differences between the two groups.
This study examined the factor structure of two mental health measures to help answer whether or not these inconsistent findings were related to inconsistent measurement of burden and depression across these two groups. Factor analysis in and of itself was not the main goal here. Learning more about the research that has been done based on these two scales was one goal, and learning more about the empirical and practical uses of these two scales was another. For the ZBI, the answers were less definitive than those for the CES-D, but the ZBI is in an earlier stage of its factor structure research. Studies such as the present one show that the ZBI, though often used as a single-score scale, likely has several dimensions of burden that, once established, have promise for effective use not just with White groups, but with other races and ethnicities as well. Further, the CES-D was shown here to have a factor structure that not only fits various samples, but is invariant across them. Future research should consider replicating the Knight model, which may prove to be a good short version of the ZBI, as well as look for and confirm factor solutions that include more of the ZBI items. Future research should also take advantage of the invariance of the CES-D factor structure to consider the degrees of emphasis placed on the subscales by groups, as this is likely one of the steps in understanding what underlies mean differences in the research so far.
Bibliography
Allen-Kelsey, G. J. (1998). Caregiver burden among African-American and Anglo-American family caregivers. Topics in Geriatric Rehabilitation, 14, 63-73.
Bedard, M., Molloy, D. W., Squire, L., Dubois, S., Lever, J. A., & O'Donnell, M. (2001). Zarit Burden Interview: A new short version and screening version. Gerontologist, 41, 652-657.
Byrne, B. M. (2001). Structural Equation Modeling with AMOS: Basic concepts, applications, and programming. Lawrence Erlbaum Associates: Mahwah, NJ.
Callahan, C. M., & Wolinsky, F. D. (1994). The effect of gender and race on the measurement properties of the CES-D in older adults. Medical Care, 32, 341-356.
Chou, C-P., & Bentler, P. M. (1995). Estimates and tests in structural equation modeling. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 37-55). Sage: Thousand Oaks, CA.
Davidson, H., Feldman, P. H., & Crawford, S. (1994). Measuring depressive symptoms in the frail elderly. Journal of Gerontology, 49, P159-P164.
Fredman, L., Daly, M. P., & Lazur, A. M. (1995). Burden among White and Black caregivers to elderly adults. Journal of Gerontology, 50B, S110-S118.
Gallo, J. J., Cooper-Patrick, L., & Lesikar, S. (1998). Depressive symptoms of Whites and African Americans aged 60 years and older. Journal of Gerontology: Psychological Sciences, 53B, P277-P286.
Garson, G. D. (2004). Structural Equation Modeling Example Using WinAMOS (webpage). http://www2.chass.ncsu.edu/aarson/pa765/semAMOS1.htm.
Gatz, M., Johansson, B., Pedersen, N., Berg, S., & Reynolds, C. (1993). A cross-national self-report measure of depressive symptomatology. International Psychogeriatrics, 5, 147-156.
George, L. K., & Gwyther, L. P. (1986). Caregiver well-being: A multidimensional examination of family caregivers of demented adults. Gerontologist, 26, 253-259.
Haley, W. E., West, C. A. C., Wadley, V. G., Ford, G. R., White, F. A., Barrett, J. J., Harrell, L. E., & Roth, D. L. (1995). Psychological, social, and health impact of caregiving: A comparison of black and white dementia family caregivers and noncaregivers. Psychology and Aging, 10, 540-552.
Hebert, R., Bravo, G., & Preville, M. (2000). Reliability, validity and reference values of the Zarit Burden Interview for assessing informal caregivers of community-dwelling older persons with dementia. Canadian Journal on Aging, 19, 494-507.
Hertzog, C., Van Alstine, J., Usala, P. D., Hultsch, D. F., & Dixon, R. (1990). Measurement properties of the Center for Epidemiological Studies Depression Scale (CES-D) in older populations. Psychological Assessment, 2, 64-72.
Hinrichsen, G. A., & Ramirez, M. (1992). Black and white dementia caregivers: A comparison of their adaptation, adjustment, and service utilization. Gerontologist, 32, 375-381.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 158-176). Sage: Thousand Oaks, CA.
Hu, L-T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 76-99). Sage: Thousand Oaks, CA.
Hui, H. C., & Triandis, H. C. (1985). Measurement in cross-cultural psychology: A review and comparison of strategies. Journal of Cross-Cultural Psychology, 16, 131-152.
Janevic, M. R., & Connell, C. M. (2001). Racial, ethnic, and cultural differences in the dementia caregiving experience: Recent findings. Gerontologist, 41, 334-347.
Knight, B. G., Fox, L. S., & Chou, C-P. (2000). Factor structure of the Burden Interview. Journal of Clinical Geropsychology, 6, 249-258.
Knight, B. G., & McCallum, T. J. (1998). Heart rate reactivity and depression in African-American and white dementia caregivers: Reporting bias or positive coping? Aging & Mental Health, 2, 212-221.
Knight, B. G., Silverstein, M., McCallum, T. J., & Fox, L. S. (2000). A sociocultural stress and coping model for mental health outcomes among African American caregivers in Southern California. Journal of Gerontology: Psychological Sciences, 53B, P142-P150.
Krause, N., & Markides, K. S. (1985). Employment and psychological well-being in Mexican American women. Journal of Health and Social Behavior, 26, 15-26.
Lawton, M. P., Kleban, M. H., Moss, M., Rovine, M., & Glicksman, A. (1989). Measuring caregiving appraisal. Journal of Gerontology: Psychological Sciences, 44, P61-P71.
Lawton, M. P., Rajagopal, D., Brody, E., & Kleban, M. H. (1992). The dynamics of caregiving for a demented elder among black and white families. Journal of Gerontology: Social Sciences, 47, S156-S164.
Liang, J., Tran, T. V., Krause, N., & Markides, K. S. (1989). Generational differences in the structure of the CES-D Scale in Mexican Americans. Journal of Gerontology: Social Sciences, 44, S100-S120.
Long Foley, K., Reed, P. S., Mutran, E. J., & DeVellis, R. F. (2002). Measurement adequacy of the CES-D among a sample of older African-Americans. Psychiatry Research, 109, 61-69.
MacCallum, R. C. (1995). Procedures, strategies, and related issues. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 16-36). Sage: Thousand Oaks, CA.
Mackinnon, A., McCallum, J., Andrews, G., & Anderson, I. (1998). The Center for Epidemiological Studies Depression Scale in older community samples in Indonesia, North Korea, Myanmar, Sri Lanka, and Thailand. Journal of Gerontology: Psychological Sciences, 53B, P343-P352.
Manson, S. M., Ackerson, L. M., Wiegman Dick, R., Baron, A. E., & Fleming, C. M. (1990). Depressive symptoms among American Indian adolescents: Psychometric characteristics of the Center for Epidemiological Studies Depression Scale (CES-D). Psychological Assessment, 2, 231-237.
Miller, B., Campbell, R. T., Farran, C. J., Kaufman, J. E., & Davis, L. (1995). Race, control, mastery, and caregiver distress. Journal of Gerontology, 50B, S374-S382.
Miller, T. Q., Markides, K. S., & Black, S. A. (1997). Factor structure of the CES-D in two surveys of elderly Mexican Americans. Journals of Gerontology: Series B: Psychological Sciences and Social Sciences, 52B, S259-S269.
Mui, A. C. (1992). Caregiver strain among Black and White daughter caregivers: A role theory perspective. The Gerontologist, 32, 203-212.
Mui, A. C., Burnette, D., & Chen, L. M. (2001). Cross-cultural assessment of geriatric depression: A review of the CES-D and the GDS. Journal of Mental Health and Aging, 7, 137-164.
O'Rourke, N., & Tuokko, H. A. (2003a). Psychometric properties of an abridged version of the Zarit Burden Interview within a representative Canadian caregiver sample. Gerontologist, 43, 121-127.
O'Rourke, N., & Tuokko, H. A. (2003b). The relative utility of four abridged versions of the Zarit Burden Interview. Journal of Mental Health and Aging, 9, 55-64.
Posner, S. F., Stewart, A. L., Marin, G., & Perez-Stable, E. J. (2001). Factor validity of the Center for Epidemiological Studies Depression Scale (CES-D) among urban Latinos. Ethnicity and Health, 6, 137-144.
Novak, M., & Guest, C. (1989). Application of a multidimensional Caregiver Burden Inventory. Gerontologist, 29, 798-803.
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.
Schulz, R., O’Brien, A. T., Bookwala, J., & Fleissner, K. (1995). Psychiatric and physical morbidity effects of dementia caregiving: Prevalence, correlates, and causes. Gerontologist, 35, 771-791.
Schumacker, R. E., & Lomax, R. G. (1996). A Beginner’s Guide to Structural Equation Modeling. Lawrence Erlbaum Associates: Mahwah, NJ.
Small, B. J., Hertzog, C., Hultsch, D. F., & Dixon, R. A. (2003). Stability and change in adult personality over 6 years: Findings from the Victoria Longitudinal Study. Journal of Gerontology: Psychological Sciences, 58B, P166-P176.
Statistics Canada Statistical Reference Centre (2004). Visible minority population, provinces and territories webpage: httpi//www.statcan.ca/enqlish/Pqdb/demo52a.htm .
Tran, T. V. (1997). Exploring the equivalence of factor structure in a measure of depression between black and white women: Measurement issues in comparative research. Research on Social Work Practice, 7, 500-517.
U.S. Bureau of the Census. (1998). Population Projections of the United States by Age, Sex, Race, and Hispanic Origin: 1995-2050, Current Population Reports, P25-P130.
Wetherell, J. L., Gatz, M., & Pedersen, N. L. (2001). A longitudinal analysis of anxiety and depressive symptoms. Psychology and Aging, 16, 187-195.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables. In R. H. Hoyle (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications (pp. 56-75). Sage: Thousand Oaks, CA.
Whitlatch, C. J., Zarit, S. H., & von Eye, A. (1991). Efficacy of interventions with caregivers: A reanalysis. Gerontologist, 31, 9-14.
Zarit, S. H. (1989). Issues and directions in family intervention research. In E. Light and B. Lebowitz (Eds.), Alzheimer’s disease: Treatment and family stress. Hemisphere: Washington, DC.
Zarit, S. H., Reever, K., & Bach-Peterson, J. (1980). Relatives of impaired elderly: Correlates of feelings of burden. Gerontologist, 20, 373-377.
Asset Metadata
Creator: Flynn Longmire, Crystal V. (author)
Core Title: Cross-cultural examination of mental health measures in dementia caregivers: Assessment of the Center for Epidemiologic Studies - Depression Scale and the Zarit Burden Inventory
School: Graduate School
Degree: Doctor of Philosophy
Degree Program: Gerontology
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Gerontology, OAI-PMH Harvest, psychology, psychometrics
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Knight, Bob G. (committee chair), Huey, Stanley J. (committee member), Silverstein, Merril (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-575048
Unique identifier: UC11340842
Identifier: 3155442.pdf (filename), usctheses-c16-575048 (legacy record id)
Legacy Identifier: 3155442.pdf
Dmrecord: 575048
Document Type: Dissertation
Rights: Flynn Longmire, Crystal V.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA