This study aimed to: (i) obtain validity and reliability evidence for the Luria-Nebraska Test for Children from relations with external criteria (age), (ii) identify scores that predict IQ, and (iii) verify internal consistency.
The age effect analysis was performed controlling for possible effects of full-scale IQ. Results showed that TLN-C scores increase with age. There was a systematic progression of the means, especially for the total score. This is an important type of validity evidence for neuropsychological screening tests, since sensitivity to developmental change is one of the main parameters that allow the establishment of normative data (Pasquali, 2010).
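The idea of testing an age effect while holding IQ constant can be illustrated with a first-order partial correlation. This is only a minimal sketch: it is not necessarily the procedure used in this study, and the function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data (not from the study): the score is built as
# age + iq_dev, and iq_dev is uncorrelated with age by construction.
age = [1, 2, 3, 4]
iq_dev = [1, -1, -1, 1]
score = [a + q for a, q in zip(age, iq_dev)]

raw_r = pearson(age, score)                   # age-score correlation, IQ ignored
adjusted_r = partial_corr(age, score, iq_dev) # age-score correlation, IQ held constant
```

In this constructed case the raw correlation underestimates the age-score relation, and the partial correlation recovers it once the IQ-related variance is removed.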
From preschool age to adolescence, cognitive functions are acquired and refined. This progression is supported by the maturation of the nervous system (especially myelination and the optimization of neural networks by synaptic pruning) and by environmental stimulation, which usually presents the child with many cognitive challenges, mainly in school activities (Osborn & Pereira, 2012).
The detection of differences in almost all TLN-C subtests indicates that the test effectively measured both perceptual-motor and abstract functions, successfully differentiating developmental levels. This differentiation is achieved by detecting the maturation level of basic perceptual-motor functions and the developmental level of academic skills. These two axes of the TLN-C, the first with little influence from formal education and the second directly linked to it, help to explain the increasing differences in performance up to 10 years of age, the relative separation between the 6–10 and 11–13 age ranges, and the systematic differences in total score. It is especially relevant that the differences among ages remained when the influence of IQ was controlled (except for one subtest), which supports the interpretation that they are related to age.
Verifying age effects is common in cognitive test validation, since cognitive functions develop with age and experience. This external variable is so relevant in this kind of assessment that, after the norming process, normative reference tables for interpreting results are commonly organized by age range. A recent example is the validation and norming of the newest Brazilian adaptation of the WISC (Rueda et al., 2013).
The Receptive Speech subtest was the only one that was insensitive to changes with age. This subtest measures a basic cognitive skill, in the sense that it is a prerequisite for children to comprehend what is demanded of them whenever they receive a verbal instruction. Even so, gains in this ability are expected throughout child development as children become increasingly able to: (i) comprehend more elaborate verbal sentences; (ii) retain more content as their immediate memory improves; and (iii) organize that content in working memory (Carneiro, 2008; Dias & Landeira-Fernandez, 2011). Therefore, the absence of differences in this score points to the need to reformulate the task so that it includes more levels of complexity.
Another observed result was the small change in subtest means from 9 years of age on, and in the total score from 11 years on. These results provide evidence about subtest difficulty and its adequacy for the age range the test is designed for. In a screening test it is especially important to include simple items, which enable the detection of subtle deficits, and to avoid including overly demanding items. Even so, the absence of differences between some age ranges may point to the need to include harder items in several subtests, so that they become more sensitive to performance differences in the range from 9 to 13 years.
Furthermore, a ceiling effect and an interruption in the progression of means were found in some subtests. A ceiling effect is expected for some TLN-C subtests because of their content (e.g., the notion of left and right, present in the Tactile Skill subtest, depends on age, and skills such as reading and mathematical reasoning depend on years of instruction) and because task difficulty is not scaled progressively, so that even the most difficult tasks are not challenging.
In most cases, this pattern may be explained by the fact that the study sample was composed of children with learning difficulties. In previous studies, the LNNB proved to be sensitive in detecting performance differences between subjects with and without learning disabilities (Lewis et al., 1993; Myers et al., 1989). In this sense, the variations found may be related to the sensitivity of the test to deficits in this population; however, comparative studies are needed to test this hypothesis. Such studies may also help to clarify whether the similar performance of older and younger children in some subtests reflects a real lack of change in these functions during the developmental period covered by the test, or whether older children with learning difficulties perform similarly to younger children because of deficits in cognitive functions. Moreover, the interruption in the progression of scores occurred in only a few subtests and was insufficient to establish a new pattern.
The Pearson correlation analysis showed that all subtests and the total score of the TLN-C correlated with the WISC-III full-scale IQ. Both total scores are measures that reflect performance on a heterogeneous set of cognitive functions. The adequate functioning of part of the functions assessed by the TLN-C may be considered a prerequisite for an individual to produce adequate answers on the WISC-III (the exceptions being Reading, Writing and Mathematical Reasoning). For instance, a minimum of motor skill is needed in the performance tests; both these and the verbal tests have oral instructions, which require receptive speech; and responding to the verbal tests demands expressive speech.
These relations reflect the theoretical principle that neuropsychological functioning and intellectual ability are closely related and affect each other (Ardila & Bernal, 2007). In a study with the original battery for children, Gilger and Geary (1985) found that the LNNB-CR had a good capacity to detect neuropsychological deficits in expressive and receptive language functions, which agreed with discrepancies between the verbal and performance scales of the WISC-R. More recent studies with another widely used neuropsychological battery, the Halstead-Reitan Neuropsychological Battery, are also grounded in relations between intelligence and neuropsychological functions. A study of children with learning disabilities showed distinct result profiles on this battery for children in the various lower ranges of the WISC-R (Davis et al., 2001).
Significant correlations were found among all TLN-C subtests, showing cohesion of the test as a whole. The magnitudes of the correlations follow patterns consistent with the theoretical foundations. Subtests from the academic skills axis had moderate to high correlations with one another. Correlations between subtests with little theoretical relation, such as Rhythm and Visual Skill, were low. A finding that reinforces the cohesion of the test is that, in general, the highest correlations occurred between the subtests and the total score. The correlations between the TLN-C and the WISC provide validity evidence from relations with external variables, in this case a previously standardized instrument, while the correlations among TLN-C subtests suggest cohesion across its scores.
The regression analysis results reinforce the importance of the total score: in addition to reflecting the internal coherence of the TLN-C, it predicted intellectual performance in this sample. The results suggest that the TLN-C total score is the score that best explains IQ. This is consistent with the fact that the total scores of both the TLN-C and the WISC are heterogeneous and correlated measures, as discussed previously (Pfeiffer et al., 1987; Boyd & Hooper, 1993). The fact that models combining specific subtests with the total score were less effective predictors also agrees with what was presented above about the support that neuropsychological functions provide to intellectual performance.
Boyd and Hooper (1993), in a study of multivariate regression models involving age and performance on the original battery for adults, found predictive capability for verbal IQ and, more markedly, for full-scale IQ. From their results, they suggested that the LNNB is as good as abbreviated forms of the WISC at predicting intellectual performance.
The body of evidence about the TLN-C total score gathered in the present study contributes validity evidence for the instrument as a whole. However, as Pawlowski et al. (2007) point out, for a quickly administered instrument that assesses several theoretical constructs (neuropsychological functions, in this case), it is also important to gather evidence about the validity of individual subtests, the way they relate to one another, and the way they relate to the total score. A step in this direction was taken in this work through the correlation analysis among subtests, and it may be complemented by other procedures, always respecting the characteristics of the TLN-C: factor analysis, relations with instruments (or parts of instruments) that assess constructs similar to those of one or more TLN-C subtests, and relations of the test with external criteria other than intelligence. It is also important to collect comparative data from control and criterion groups, since the children in the present sample have learning difficulties.
Regarding the precision or reliability of the TLN-C, Cronbach's alpha coefficient was satisfactory (.79). According to Resolution 002/2003 of the Brazilian Federal Council of Psychology (CFP, 2003), the minimum acceptable value for this index is .60. Freire and Almeida (2001) suggested value intervals for classification: .80-.90, very good; .70-.80, respectable; .65-.70, acceptable; .60-.65, undesirable; below .60, unacceptable. It is also relevant to point out the coherence shown by the fact that the Writing, Reading and Immediate Memory subtests were the most strongly linked to most of the other subtests, since they represent complex cognitive functions that are supported by many simpler functions assessed by other subtests. The low contribution of Receptive Speech to internal consistency is in line with the other findings about this subtest, which indicate psychometric inadequacy in its present configuration.
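For reference, the internal consistency coefficient discussed above can be computed from a matrix of subtest scores as sketched below. This is a minimal illustration of the standard Cronbach's alpha formula, not the software used in the study; the function names and toy scores are hypothetical, and the handling of the shared boundaries between bands (.80 is quoted in two bands) is an assumption.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per child, each row holding that child's k subtest scores."""
    k = len(scores[0])
    # Variance of each subtest (column) across children
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the per-child total scores
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def classify_alpha(alpha):
    """Bands as quoted in the text from Freire and Almeida (2001).
    Assumption: a value on a shared boundary goes to the higher band."""
    if alpha < 0.60:
        return "unacceptable"
    if alpha < 0.65:
        return "undesirable"
    if alpha < 0.70:
        return "acceptable"
    if alpha < 0.80:
        return "respectable"
    if alpha <= 0.90:
        return "very good"
    return "above the quoted bands"

# Hypothetical toy data: three children, three perfectly consistent subtests
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
toy_alpha = cronbach_alpha(toy_scores)

# The coefficient reported in the text (.79) falls in the "respectable" band
reported_band = classify_alpha(0.79)
```

Under this classification the reported coefficient sits just below the "very good" band and well above the .60 minimum required by the CFP resolution.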
Moreover, although Cronbach's alpha is commonly applied (Ladesma et al., 2002), it may not be the best procedure for evaluating the reliability of batteries or screening instruments. Such instruments usually cover a wide diversity of functions whose constructs are not immediately related, even though the correlations found in our results indicate at least a global coherence among the subtests of the instrument evaluated here.
A closely related theoretical problem arose in the validation process of the NEUPSILIN, and its authors proposed some alternatives to Cronbach's alpha that may be useful in complementing the reliability evidence for the TLN-C (Pawlowski et al., 2007). These alternatives are agreement among judges' scores and the test-retest procedure, which has already been used in the validation of the original battery, with a mean stability of 75% between results (Plaisted & Golden, 1982).
The present study is part of a larger project that aims to make the TLN-C available for clinical use. Notwithstanding its relevance, the study has limitations that subsequent research should address by: (i) comparing TLN-C performance by gender and clinical subgroup; and (ii) analyzing correlations between the subtests of the WISC and the TLN-C. Moreover, studies are needed to investigate other types of validity, as well as the norming of the instrument.