Assessment - statistical concepts

Assessment metrics - statistical concepts

With comprehensive and interdisciplinary assessment, ASD (autism spectrum disorder) can be reliably diagnosed during the second year of life, and studies have demonstrated a relationship between early identification and positive outcomes over time (Ozonoff, 2018).

How is ASD identified?

Direct Observation

Observing the child across a variety of contexts
Using standardized screening and diagnostic measures
Using tools that directly assess the child's social, communication, play, and behavioral functioning

Interviews with caregivers

Using standardized parent report measures
Assessing both developmental history and current functioning

What is a standardized measure?

A measurement that has been transformed to fit a common scale, thereby making it easier to compare and interpret, is said to have been standardized. For example, measuring the length of an object using a standard ruler graduated in inches results in a quantity that can be compared to other lengths measured using the same scale. This makes it easier to compare, say, the length of a shirt to one's own torso length if both are measured in inches. In contrast, if the seller of the shirt advertises the length in units of their hand length and you have to compare it to your own torso measured in units of a kitchen spoon, the comparison becomes cumbersome and less meaningful.

What is a standardized assessment?

Standardized Assessment

a test or examination that is administered and scored in a consistent, or "standard," manner
ensures fairness and comparability across different individuals or groups

Norm-referenced assessment

purpose: to compare an individual's performance to that of a normative group (a representative sample of the population)
score interpretation: scores are interpreted based on how the individual performed relative to others in the norm group
emphasizes ranking individuals
focus is on distinguishing between high and low performers

Criterion-referenced assessment

purpose: to assess an individual's achievement of specific learning objectives or mastery of specific skills
score interpretation: scores are interpreted based on whether the individual meets predetermined criteria or standards
emphasizes measuring specific knowledge or skills
focus is on whether each test-taker has achieved the required standard rather than how they compare to others

Interpretation of norm-referenced assessment scores

The raw scores from an assessment (such as the total number of correct answers or cumulative ratings on a common scale) are converted to standard scores so that they can be compared with the scores of other test-takers. A standard score expresses the position of the raw score relative to the mean and standard deviation of the distribution of scores in the normative group. Most assessments for ASD produce standard scores that follow a normal distribution.
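The conversion from raw score to standard score can be sketched as a linear rescaling. This is a simplified illustration, not the procedure of any particular instrument: published tests typically convert raw scores using the norm tables in their manuals (often separated by age). The function name, the hypothetical norm-group mean and SD, and the reporting scale of mean 100 and SD 15 are all assumptions chosen for illustration.

```python
def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score (hypothetical helper).

    norm_mean and norm_sd describe raw scores in an assumed normative
    group; scale_mean=100 and scale_sd=15 are an assumed reporting scale.
    """
    z = (raw - norm_mean) / norm_sd   # position relative to the norm group
    return scale_mean + scale_sd * z  # re-express that position on the reporting scale

# Hypothetical norm group: raw scores average 40 with an SD of 8.
print(standard_score(52, norm_mean=40, norm_sd=8))  # 122.5 (1.5 SDs above the mean)
```

A raw score exactly at the norm-group mean maps to the center of the reporting scale (100 here), whatever the raw units were.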

Statistical Distribution

a mathematical function that describes the likelihood of various outcomes in a random experiment.
In the context of assessment scores, the distribution describes the probability that any particular score is obtained if the test is administered to a randomly selected test-taker.

Normal Distribution

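a symmetric, bell-shaped statistical distribution in which values cluster around the mean and become progressively less likely the farther they are from it. It is fully described by its mean and standard deviation.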
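Two properties of the normal distribution can be checked numerically with Python's standard-library `statistics.NormalDist`: its symmetry about the mean, and the familiar share of scores falling within 1, 2 and 3 standard deviations of the mean. The scale of mean 100 and SD 15 is an assumption for illustration only.

```python
import math
from statistics import NormalDist

# Assumed reporting scale: mean 100, SD 15 (illustrative only).
scores = NormalDist(mu=100, sigma=15)

# Symmetry: a score 15 points above the mean has the same probability
# density as a score 15 points below it.
print(math.isclose(scores.pdf(115), scores.pdf(85)))  # True

# Share of test-takers expected to score within k SDs of the mean.
for k in (1, 2, 3):
    share = scores.cdf(100 + 15 * k) - scores.cdf(100 - 15 * k)
    print(f"within {k} SD: {share:.1%}")  # roughly 68.3%, 95.4%, 99.7%
```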

Measures of Central Tendency

Measures of central tendency are statistical summaries that describe the central or typical value of a dataset, around which data points tend to cluster. Any single data point can then be described in terms of how far it lies from this typical value. The three most commonly used measures of central tendency are:

Mean - The arithmetic average, obtained by summing all values in a dataset and dividing by the number of values. For example, the mean income in a population can be understood as the income each person would receive if the total income were divided equally among all the people.
Median - The middle value (or the average of the two middle values when there is an even number of values) when a dataset is ordered from least to greatest. It represents the center of a dataset better than the mean when the distribution is skewed or contains extreme outliers. For example, the median income in a population is the income that separates the lower half of earners from the upper half.
Mode - The value that occurs most frequently in the dataset. It can be applied to non-numerical data as well. For example, the mode of the grades obtained in a classroom in a particular subject is the grade obtained by the largest number of students.
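The three measures can be compared on a small dataset using Python's standard `statistics` module. The incomes below are hypothetical and include one extreme outlier, which shows why the median can describe the center better than the mean in a skewed dataset.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes (in thousands) for seven people;
# the last value is an extreme outlier.
incomes = [30, 35, 35, 40, 45, 50, 400]

print(round(mean(incomes), 1))  # 90.7: pulled upward by the outlier
print(median(incomes))          # 40: the middle value, unaffected by the outlier
print(mode(incomes))            # 35: the most frequent value
```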

In a normal distribution, the mean, median and mode coincide at the center of the distribution. Visually, this is the point at the peak of the bell curve associated with such a distribution.

Other measures of a distribution

Standard Deviation

The standard deviation is a measure of the amount of variation or dispersion in a dataset. It quantifies how spread out the values are around the mean. For example, a large standard deviation of incomes in a population indicates that incomes are spread much more unevenly than in a population with a small standard deviation.
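A minimal sketch of the calculation: the standard deviation is the square root of the average squared distance of each value from the mean. The two datasets are hypothetical, share the same mean, and differ only in spread. The helper `std_dev` is written out for illustration and checked against the standard library's `statistics.pstdev`.

```python
import math
from statistics import pstdev

def std_dev(values):
    """Population standard deviation: square root of the average
    squared distance of each value from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Two hypothetical datasets, both with mean 50 but different spread.
tight = [48, 49, 50, 51, 52]
spread = [30, 40, 50, 60, 70]

print(round(std_dev(tight), 2))   # 1.41
print(round(std_dev(spread), 2))  # 14.14
print(math.isclose(std_dev(spread), pstdev(spread)))  # True
```

This divides by the number of values (the population standard deviation). `statistics.stdev` instead divides by one less than the number of values, the usual choice when estimating a population's spread from a sample.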

Range

The range is another measure of the amount of spread or dispersion in a dataset. It is the difference between the maximum and minimum values in the dataset, but it says nothing about how the values are distributed between those extremes. For example, a large range of incomes in a population simply means that there is at least one very high earner and at least one very low earner. Such a population could still have most incomes clustered near the center.
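The limitation of the range can be seen with two hypothetical datasets that share the same range but distribute their values differently. The helper name `value_range` is illustrative; the standard deviation (via `statistics.pstdev`) picks up the difference that the range misses.

```python
from statistics import pstdev

def value_range(values):
    """Range: maximum value minus minimum value."""
    return max(values) - min(values)

# Two hypothetical datasets with the same range (80).
clustered = [10, 50, 50, 50, 90]  # most values sit at the center
even = [10, 30, 50, 70, 90]       # values spread evenly across the range

print(value_range(clustered), value_range(even))  # 80 80
print(pstdev(clustered) < pstdev(even))           # True
```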

What information does an individual's score on an assessment contain?

The amount of useful information conveyed by an assessment depends on the reliability and validity of the instrument.

Reliability

The consistency and stability of test scores across administrations
Indicates the degree to which an assessment is free from random error

Validity

The degree to which an assessment accurately measures what it is intended to measure.
Ensures that the test scores reflect the actual construct or skill being measured, allowing for meaningful interpretation.

SEM (Standard Error of Measurement)

Quantifies the amount of error inherent in an individual's observed score due to imperfections in the assessment process.
A lower SEM means the test score is more precise, i.e., the observed score is likely to be close to the true score.
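SEM links back to reliability. In classical test theory it is commonly computed as SD × √(1 − reliability coefficient), so a more reliable test has a smaller SEM. The sketch below assumes that formula, a standard-score SD of 15, and hypothetical reliability coefficients. Test manuals usually report the SEM directly, so this only illustrates the relationship.

```python
import math

def sem(sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical instrument with a standard-score SD of 15.
print(round(sem(15, 0.91), 2))  # 4.5: higher reliability, smaller error
print(round(sem(15, 0.75), 2))  # 7.5: lower reliability, larger error
```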

Confidence Interval

The range of values within which an individual's true score is expected to fall with a specified level of confidence.
It is expressed in terms of the Margin of Error, which depends on the SEM of the test and the level of confidence (probability) desired.
Confidence Interval = [Observed Score - Margin of Error, Observed Score + Margin of Error]
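The formula above can be sketched in code, assuming the margin of error is the two-sided normal critical value multiplied by the SEM (about 1.96 × SEM for 95% confidence). The observed score of 85 and SEM of 4.5 are hypothetical. Some test manuals center the interval on an estimated true score rather than the observed score; this sketch follows the formula as written here.

```python
from statistics import NormalDist

def confidence_interval(observed, sem, confidence=0.95):
    """Observed score +/- margin of error, where the margin is
    z * SEM and z is the two-sided critical value for `confidence`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sem
    return observed - margin, observed + margin

low, high = confidence_interval(85, sem=4.5)
print(f"{low:.1f} to {high:.1f}")  # 76.2 to 93.8
```

Asking for more confidence (e.g. `confidence=0.99`) widens the interval, because a larger z multiplies the same SEM.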

The observed standard score earned by an individual in an assessment can be compared to the mean standard score of the standardization sample. In addition, the deviation of the score from the mean can be expressed in terms of the standard deviation of the standardization dataset using a measure known as the z-score. Another measure that helps in comparing an individual's score with the population scores is the percentile rank.

z-score

The z-score is a transformation of an individual's observed score that expresses how many standard deviations the score lies above (positive z) or below (negative z) the mean.
A z-score of +1.5 (or -1.5) means that the individual's score on the assessment is one and a half standard deviations above (or below) the mean score of the standardization sample.
The greater the magnitude of the z-score, the further the individual's measured traits or performance deviate from the population average.
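The z-score is the distance from the mean divided by the standard deviation. A minimal sketch, assuming a standard-score scale with mean 100 and SD 15 (an illustrative choice, not tied to a specific instrument):

```python
def z_score(score, mean=100.0, sd=15.0):
    """Number of SDs a score lies above (+) or below (-) the mean.
    The defaults (mean 100, SD 15) are an assumed reporting scale."""
    return (score - mean) / sd

print(z_score(77.5))   # -1.5: one and a half SDs below the mean
print(z_score(122.5))  # 1.5: one and a half SDs above the mean
```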

Percentile Rank

The percentile rank is a measure that indicates the relative standing of an individual score within a distribution of scores.
It is interpreted as the percentage of the test-taking population that is expected to score at or below an individual's score.

For example, if an individual's standard score on an assessment is at a percentile rank of 50, half the population is expected to score at or below this individual's score (the area under the bell curve shaded in red in the figure on the left). If instead their percentile rank is 95, only 5% (one-twentieth) of the population is expected to score above this individual on this test (the area under the bell curve not shaded in blue in the figure on the right).
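Under a normal distribution, the percentile rank is the cumulative share of the population at or below a score, which `statistics.NormalDist.cdf` computes directly. This sketch assumes a scale with mean 100 and SD 15. Actual instruments usually report percentile ranks from their own norm tables, so this is a normal-curve approximation for illustration.

```python
from statistics import NormalDist

def percentile_rank(score, mean=100.0, sd=15.0):
    """Percentage of a normally distributed population expected to
    score at or below `score`. Mean 100 / SD 15 is an assumed scale."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)

print(round(percentile_rank(100)))  # 50: a score at the mean
print(round(percentile_rank(125)))  # 95: about 5% expected to score higher
```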

Rohini Knudson

SPED Professional Portfolio