Calculating the reliable difference between ratings on the basis of the inter-rater reliability obtained in our study led to 100% rating agreement. In contrast, a considerable number of divergent ratings emerged when the RCI was calculated on the basis of the more conservative reliability reported in the test manual; the absolute agreement rate was 43.4%. With this conservative RCI estimate, neither identical nor divergent ratings were significantly more frequent, either for any single rating subgroup or for the entire study population (see Table 2 for the results of the corresponding binomial tests). Therefore, the probability of a child receiving a concordant rating did not differ from chance. When the study's own reliability was used, the probability of obtaining concordant ratings was 100%, which is significantly higher than chance.
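How strongly the agreement rate depends on the chosen reliability estimate can be illustrated with a short sketch. It assumes the widely used Jacobson and Truax formulation of the reliable change index (RCI), which the text does not spell out, and a 95% criterion (z = 1.96). The function names, the scores, the standard deviation, and both reliability values are hypothetical and are not taken from the study.

```python
import math


def critical_difference(sd: float, reliability: float, z: float = 1.96) -> float:
    """Smallest absolute score difference counted as a reliable difference.

    Assumed formulation (Jacobson and Truax):
        SEM    = SD * sqrt(1 - reliability)
        S_diff = sqrt(2 * SEM^2)
    """
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * sem ** 2)       # standard error of a difference
    return z * s_diff


def ratings_agree(a: float, b: float, sd: float, reliability: float) -> bool:
    """Two ratings agree if their difference is not reliably different from zero."""
    return abs(a - b) <= critical_difference(sd, reliability)


def agreement_rate(pairs, sd: float, reliability: float) -> float:
    """Share of rating pairs that agree under the given reliability estimate."""
    agreeing = sum(ratings_agree(a, b, sd, reliability) for a, b in pairs)
    return agreeing / len(pairs)


# Hypothetical rating pairs on a scale with SD = 10.
pairs = [(50, 53), (42, 49), (61, 60), (55, 45)]

print(agreement_rate(pairs, sd=10, reliability=0.80))  # 1.0
print(agreement_rate(pairs, sd=10, reliability=0.99))  # 0.25
```

Under these assumptions, a lower reliability estimate widens the critical difference, so the same rating pairs can move from full agreement to mostly divergent ratings purely because a different reliability value is plugged in.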

As Marusteri and Bacarea (9) noted, research results are never 100% certain, even when statistical significance is reached. Statistical results used to test hypotheses about the relationship between independent and dependent variables become meaningless when raters score those variables inconsistently. If agreement is less than 80%, more than 20% of the data being analyzed are erroneous. With a reliability of only 0.50 to 0.60, it must be understood that 40 to 50% of the data being analyzed are erroneous. If kappa values are below 0.60, the confidence intervals around the obtained kappa are so wide that about half of the data may be assumed to be incorrect (10). Clearly, statistical significance means little when there is so much error in the results being tested. Kappa is computed from two quantities: the agreement actually observed and the agreement expected by chance.

As noted above, Pearson correlations are the most commonly used statistic when inter-rater reliability is assessed in the field of expressive vocabulary (e.g., Bishop and Baird, 2001; Janus, 2001; Norbury et al., 2004; Bishop et al., 2006; Massa et al., 2008; Gudmundsson and Gretarsson, 2009), and this trend extends to other areas such as language deficiencies (e.g., Boynton Hauerwas and Addison Stone, 2000) or learning difficulties (e.g., Van Noord and Prevatt, 2002).
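The relationship between percent agreement and kappa can be made concrete with a minimal sketch. It assumes two raters assigning categorical labels and uses the standard Cohen's kappa definition, (observed agreement - chance agreement) / (1 - chance agreement). The function names and the ratings are hypothetical and serve only as an illustration.

```python
from collections import Counter


def percent_agreement(r1, r2) -> float:
    """Share of items on which both raters assigned the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2) -> float:
    """Cohen's kappa for two raters: agreement corrected for chance."""
    n = len(r1)
    p_observed = percent_agreement(r1, r2)
    counts_1, counts_2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions per category.
    p_chance = sum(counts_1[c] * counts_2[c] for c in set(r1) | set(r2)) / n ** 2
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of ten children by two raters.
rater_1 = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]
rater_2 = ["yes", "yes", "no", "no", "no", "yes", "yes", "yes", "yes", "no"]

print(percent_agreement(rater_1, rater_2))       # 0.8
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.58
```

In this made-up example, 80% raw agreement corresponds to a kappa of roughly 0.58, because about half of the observed agreement would already be expected by chance given how often each rater used each category.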
