Participants
The initial sample consisted of 862 employees, who were selected through convenience sampling and interviewed. Of these, 95 cases were excluded because they scored above 12 on the Validity Scale (described in the next section), and the remaining 767 employees (60.0 % male and 40.0 % female) were included in the study. The mean age was 30.2 years (SD = 11.7). Most of the employees had a high school degree (73.7 %), whereas 16.1 % had a bachelor's degree. A total of 46.5 % of the participants were single and 42.1 % reported being married. Employees who reported not having children accounted for 52.2 % of the sample.
Most of the employees (55.8 % of the sample) reported a monthly income ranging from 546 to 1,635 Brazilian Reais (R$). The average time in the same job or position was 2.8 years (SD = 3.48). Managers comprised 22.6 % of the total sample, employees/workers 63.4 %, and self-employed participants 14 %.
The test results were gathered from different sectors of the economy: primary (31.1 %), secondary (14.3 %), tertiary (48.6 %), and others (6.0 %). Most of the cases were collected in the state of Bahia, in the cities of Vitória da Conquista (33.0 %), Juazeiro (25.4 %), Luís Eduardo Magalhães (13.1 %), Barreiras (6.4 %) and Salvador (4.7 %). Cases were also collected in the state of Pernambuco, in the city of Petrolina (17.4 %).
Regarding the organizations investigated, 34.8 % had more than 500 employees and 21.1 % had from 250 to 499 employees. In the sample, 81.2 % of the companies were private, whereas only 15.6 % were public. Data were collected in 2012 from four private companies. All companies are based in the states of Bahia and Pernambuco (Brazil).
Instruments
The development of the Behavioral Intentions Scale of Organizational Citizenship was based on a comprehensive literature review, which listed more than 280 descriptors divided into four macro-dimensions. The first version of the scale comprised 59 items that were initially submitted to judge analysis. We gathered six experts in OCB to evaluate whether the items were related to the construct and to identify to which dimension each item belonged.
After the rater analysis, 17 items were excluded from the scale (three items for not representing the construct adequately, and 14 items for not meeting the required agreement level of 80 % among the experts). The remaining 42 items were submitted to semantic analysis (Pasquali 2003), and the results of this analysis were combined with a critical review of the items in order to improve the quality of the scale. Finally, the operational version of the scale was created with 42 items, split into four dimensions: 15 items measuring Voluntarism, nine measuring Individual Initiative, 15 measuring Extra Commitment, and three measuring Organizational Defense.
As was mentioned above, the development of the scale was based on the Theory of Reasoned Action (TRA). Therefore, each item was developed to measure behavioral intentions. The items were designed as a problem-solving situation in which the subject had to decide between two mutually exclusive behaviors. These two options were separated by a semantic differential scale (Osgood et al. 1957) with seven intervals of response. Figure 1 presents an example of one of the items of the BISOC.
In addition to the BISOC items, three more items were added to the scale. They composed the Validity Scale, which evaluates the consistency of the responses to the BISOC. In other words, it verifies whether the subjects responded to the scale attentively and understood the task presented in each item. For a protocol to be considered valid, the sum of the responses on the Validity Scale must be less than or equal to 12, which corresponds to all three items being scored at most four on the seven-point Osgood scale. The higher the total score on the Validity Scale, the stronger the agreement with the unrealistic situations presented.
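The screening rule above can be sketched in a few lines. This is only an illustration, not the authors' procedure: the function names and the representation of a protocol as a list of three responses are assumptions made here.

```python
# Minimal sketch of the Validity Scale screening rule described in the text.
# Assumption (not from the source): each protocol is represented by a list
# holding its three Validity Scale responses on the 1-7 scale.
VALIDITY_CUTOFF = 12  # from the text: a sum of 12 or less counts as valid


def is_valid_protocol(validity_responses):
    """Return True when the three Validity Scale responses sum to 12 or less."""
    if len(validity_responses) != 3:
        raise ValueError("expected exactly three Validity Scale responses")
    if any(r < 1 or r > 7 for r in validity_responses):
        raise ValueError("responses must lie on the seven-point scale")
    return sum(validity_responses) <= VALIDITY_CUTOFF


def screen_protocols(protocols):
    """Split protocols into (kept, dropped) lists using the cutoff."""
    kept = [p for p in protocols if is_valid_protocol(p)]
    dropped = [p for p in protocols if not is_valid_protocol(p)]
    return kept, dropped
```

Applied to the initial sample, a rule of this kind is what separates the 95 excluded cases from the 767 retained ones.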
The last instrument used was a Sociodemographic questionnaire that investigated some personal and professional characteristics, such as sex, age, marital status, time of service and organization size, among others.
Data collection procedures
All data were collected in the participating organizations during working hours. The questionnaire is self-explanatory, but a research agent previously trained to administer the instrument and answer any queries supervised the whole administration process.
The School of Nursing Ethics Committee at the Federal University of Bahia reviewed and approved this research. Accordingly, all methods in this study followed the requirements and instructions of Resolution 196/96 of the Brazilian National Health Council (1996).
Data analysis procedures
To study the construct validity of the BISOC, we applied different techniques from Classical Test Theory (CTT) and Item Response Theory (IRT). The first step, before carrying out the analysis, was to investigate through CTT whether individuals' responses were evenly distributed across the scale intervals. As stated by Sisto et al. (2006), an interval with fewer than 15 % of the responses may suggest that it was not chosen by the majority of the respondents and could therefore be withdrawn or collapsed with another interval. Under the principles of IRT, the intervals of a scale should be analyzed in terms of the order of their thresholds, which are the boundaries between categories. Disordered thresholds may represent a violation of the measurement construct, since a higher interval (e.g., 5-point) cannot assume the position of a lower interval (e.g., 4-point) on the latent trait scale. Similarly, if one of the intervals overlaps with others, the scale is probably using more intervals than necessary to measure the construct.
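The two checks described above (the 15 % usage criterion and the ordering of thresholds) can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the threshold estimates are assumed to come from a separately fitted rating scale model, which is not shown here.

```python
from collections import Counter


def category_proportions(responses, n_categories=7):
    """Proportion of responses falling in each category 1..n_categories."""
    if not responses:
        raise ValueError("no responses given")
    counts = Counter(responses)
    total = len(responses)
    return {k: counts.get(k, 0) / total for k in range(1, n_categories + 1)}


def underused_categories(responses, threshold=0.15, n_categories=7):
    """Categories chosen by fewer than `threshold` of respondents (15 % in the text)."""
    props = category_proportions(responses, n_categories)
    return [k for k, p in props.items() if p < threshold]


def thresholds_are_ordered(thresholds):
    """True when each estimated threshold is strictly higher than the previous one.

    Assumption: `thresholds` lists the estimates in category order.
    """
    return all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))
```

For a polarized response pattern, most of the middle categories would be flagged by `underused_categories`, which is the kind of result that motivates collapsing intervals.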
To investigate whether either of these situations held, we used the Rating Scale Model (Andrich 1978) to test whether the category response curves were disordered. The results showed very low variability across the intervals, with category 1 overlapping categories 2, 3 and part of 4, and category 7 overlapping categories 6 and 5 as well as part of category 4. Because individuals' responses were polarized at the extreme intervals of the scale, the scale was dichotomized into two categories: 1 – “Manifest OCB” and 2 – “Do not manifest OCB”.
After the scale had been reduced to two categories, a Principal Component Analysis was performed under CTT assumptions, using the tetrachoric correlations among the BISOC items. The aim of this analysis was to identify the factor structure that best accounts for the variance of the construct. It is advised that the minimum value of factor loadings for interval scales should be greater than .30 when the sample has 350 subjects (Hair et al. 2005). However, in this study we used a minimum value of .40 because of the dichotomization process and the resulting reduction of the scale intervals.
Under the assumptions of IRT, a Full Information Factor Analysis (FIFA) was performed. This analysis goes beyond the examination of the correlation matrix by investigating individuals' patterns of responses. After that, the items were examined with the three-parameter logistic model (a - discrimination, b - difficulty, and c - pseudo-guessing). An item was considered discriminating when its a parameter was greater than 0.35. Items with an average difficulty level are those within the Range of Validity determined by the Test Information Curve. The closer the pseudo-guessing parameter is to zero, the better the quality of the measurement.
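For reference, the three-parameter logistic model can be written as a short function. This is a sketch under stated assumptions: it uses the logistic form without the optional scaling constant D = 1.7 (the source does not say which form was used), and the function names are hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of endorsing an item under the three-parameter logistic model.

    theta: latent trait level of the subject
    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote)
    Assumption: no scaling constant D is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def is_discriminating(a, cutoff=0.35):
    """Apply the criterion from the text: discrimination greater than 0.35."""
    return a > cutoff
```

When theta equals b, the probability sits halfway between c and 1, which is why c close to zero leaves the curve centered at .50 for a subject whose trait level matches the item difficulty.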
To examine the pattern of unexpected item responses, a residual analysis was performed using the Rasch model (one-parameter logistic model). The parameters evaluated in this analysis were the infit mean square, which attenuates the importance of extreme residuals, and the outfit mean square, which is sensitive to extreme residuals (outliers), that is, unexpected responses to items located far from the latent trait level of the subject. The fit values (infit and outfit) for samples with fewer than 1,000 subjects should lie between 0.70 and 1.30 (Bond and Fox 2007). Values below 0.70 indicate an item whose responses are more deterministic than the Rasch model predicts, whereas values above 1.30 indicate the presence of responses in the unexpected direction.
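The two fit statistics can be sketched for a single item as follows. This illustration assumes responses coded 0/1 (the paper's two categories would need recoding) and person and item estimates already obtained from a fitted Rasch model; the function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a positive response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 responses of each subject to the item (assumed coding)
    thetas: estimated trait level of each subject
    b: estimated difficulty of the item
    """
    sq_resid_sum = 0.0  # sum of squared residuals
    var_sum = 0.0       # sum of model variances
    z_sq_sum = 0.0      # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        w = p * (1.0 - p)
        r2 = (x - p) ** 2
        sq_resid_sum += r2
        var_sum += w
        z_sq_sum += r2 / w
    infit = sq_resid_sum / var_sum      # information-weighted mean square
    outfit = z_sq_sum / len(responses)  # unweighted mean square
    return infit, outfit


def within_fit_range(value, lower=0.70, upper=1.30):
    """Check a mean square against the 0.70 to 1.30 range cited in the text."""
    return lower <= value <= upper
```

Because outfit averages the standardized residuals without weighting, a single surprising response from a subject far from the item's difficulty inflates it much more than it inflates infit.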
Finally, reliability was examined with two methods. First, the Test Information Function (TIF) was calculated as the sum of the information functions of the individual items. The TIF yields the Test Information Curve, which indicates the lower and upper bounds of the theta range within which the measurement is valid (Pasquali 2003). The information provided by an item increases when: a) the parameter b is close to theta, b) the parameter a presents high values, and c) the parameter c is close to zero. According to Hambleton (2004), the TIF value must be equal to or greater than 10 to guarantee an adequate level of precision in the measurement. Second, the Kuder-Richardson (KR) coefficient was calculated. Hair et al. (2005) suggest that a KR value equal to or greater than .70 can be considered satisfactory.