Random Error
Chance fluctuations in measurement
Examples:
- What is the mood of the participants (Ps)?
- Did anyone misread the question?
- Did the data analyst mistype some data?
Tends to cancel out over repeated measurements (see the simulation sketch after the Systematic Error examples)
Systematic Error
Influence of unmeasured variables
Examples:
- Socioeconomic status influences self-esteem, but we don't measure it
- A grocer weighs vegetables with his thumb on the scale
- Everyone wants to appear nonprejudiced
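A minimal simulation sketch of the difference, assuming a true score of 50, a constant bias of 5 (the grocer's thumb), and Gaussian noise; all numbers are made up for illustration:

```python
import random

random.seed(42)
TRUE_SCORE = 50.0   # hypothetical true value of the construct
BIAS = 5.0          # systematic error: a constant push in one direction

# Each observation = true score + constant bias + random noise
observations = [TRUE_SCORE + BIAS + random.gauss(0, 3) for _ in range(10_000)]

mean = sum(observations) / len(observations)
print(f"True score: {TRUE_SCORE}")
print(f"Mean of 10,000 observations: {mean:.2f}")
# The random noise averages out (mean comes in near 55 = 50 + 5),
# but the systematic bias of 5 units never cancels.
```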
Validity
Extent to which a variable measures what it's supposed to
- Essentially, it's accuracy
A watch that tells the right time has high validity
A watch that runs correctly but is always 93 minutes fast is not valid
A broken (stopped) watch will still be valid twice a day
Reliability
Extent to which a variable is free from random error
Essentially, it's consensus or consistency
A watch that runs correctly but is always 93 minutes fast is very reliable despite low validity
Charades Example
Validity: The extent to which people got the right answer for each charade; accuracy
Reliability: The extent to which the team members agreed on their responses; consensus/consistency
Bulls-eye Example
Hits centered around bulls-eye but scattered all over board: low reliability, OK validity
Hits concentrated in one spot, but not the bulls-eye: high reliability, low validity
Hits concentrated around bulls-eye: high reliability, high validity
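A hedged sketch quantifying the bulls-eye metaphor with simulated hits: the scatter of the hits stands in for (un)reliability and the offset of their center from the bulls-eye for (in)validity. The function names and numbers are illustrative, not from the overheads.

```python
import random, statistics

random.seed(1)

def shoot(bias_x, bias_y, spread, n=200):
    """Simulate n hits aimed at a point offset from the bulls-eye at (0, 0)."""
    return [(bias_x + random.gauss(0, spread),
             bias_y + random.gauss(0, spread)) for _ in range(n)]

def summarize(label, hits):
    cx = statistics.mean(x for x, _ in hits)        # centroid of the hits
    cy = statistics.mean(y for _, y in hits)
    offset = (cx ** 2 + cy ** 2) ** 0.5             # distance from bulls-eye ~ (in)validity
    scatter = statistics.stdev(x for x, _ in hits)  # spread of the hits ~ (un)reliability
    print(f"{label}: offset = {offset:.2f}, scatter = {scatter:.2f}")

summarize("scattered around bulls-eye", shoot(0, 0, 5))   # low reliability, OK validity
summarize("tight cluster off target", shoot(8, 8, 0.5))   # high reliability, low validity
summarize("tight cluster on target", shoot(0, 0, 0.5))    # high reliability, high validity
```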
Construct Validity
The basic kind of validity
- Does variable measure what we want it to?
- Does the operational dv actually measure the conceptual dv?
Systematic error is the problem here
Several ways to check construct validity...
Face Validity
Does dv seem to measure what we want it to?
Downsides of relying on face validity:
- We can be wrong!
- An obvious dv can lead to reactivity: Ps may change their responses because they can tell what's being measured
Content Validity
Does dv cover the range of the conceptual variable?
Examples
- If Quiz #1 covers only 2 of the 4 chapters, it lacks content validity
- Same with a mood measure that asks only about positive moods
Criterion Validity
Simply put, does dv also correlate with observable behavior?
Example:
- If the mood measure doesn't correlate with the number of times a person smiles, it's low in criterion validity
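A minimal sketch of a criterion validity check, assuming made-up data: correlate scores on a hypothetical mood measure with an observable criterion behavior (smiles counted during an observation period).

```python
from statistics import correlation  # requires Python 3.10+

mood_scores = [3, 7, 5, 8, 2, 6, 9, 4]    # self-reported mood (1-10), made up
smile_counts = [1, 5, 2, 6, 0, 4, 8, 2]   # smiles observed per person, made up

r = correlation(mood_scores, smile_counts)
print(f"criterion r = {r:.2f}")  # a strong positive r supports criterion validity
```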
Finally...
Convergent Validity
- Extent to which the dv is related to other dvs that supposedly measure the same construct
Discriminant Validity
- Extent to which the dv is unrelated to other dvs that supposedly measure different constructs
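A hedged sketch of both checks at once, with invented scores: a new self-esteem scale should correlate strongly with an established self-esteem scale (convergent) and weakly with a measure of a different construct, here vocabulary (discriminant).

```python
from statistics import correlation  # requires Python 3.10+

new_self_esteem = [12, 18, 15, 20, 9, 16, 22, 11]   # hypothetical new scale
old_self_esteem = [30, 44, 38, 47, 25, 41, 52, 28]  # established scale, same construct
vocabulary      = [55, 57, 46, 54, 54, 48, 52, 50]  # unrelated construct

print(f"convergent:   r = {correlation(new_self_esteem, old_self_esteem):.2f}")  # should be high
print(f"discriminant: r = {correlation(new_self_esteem, vocabulary):.2f}")       # should be near 0
```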
Back to Reliability
Remember, it's consistency or consensus
Random error can reduce reliability
Systematic error doesn't influence reliability if it's truly systematic
Test-Retest Reliability
Extent to which you get the same score on a scale when you complete it twice
Retesting effects
- Practice effect
- Conversational norms
Avoid these with equivalent-forms reliability
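A minimal sketch of test-retest reliability, assuming hypothetical scores: correlate the same people's scores across two administrations of the scale.

```python
from statistics import correlation  # requires Python 3.10+

time1 = [14, 22, 18, 25, 11, 20, 27, 16]  # scores at first administration, made up
time2 = [15, 21, 19, 24, 12, 22, 26, 15]  # same people two weeks later, made up

print(f"test-retest r = {correlation(time1, time2):.2f}")  # high r = stable scores
```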
Internal Reliability
Extent to which items on scale are correlated with each other
Expressed as Cronbach's alpha (α)
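A hedged sketch of Cronbach's alpha from its standard formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores), computed on made-up responses to a hypothetical 4-item scale:

```python
from statistics import pvariance

# rows = respondents, columns = items on the scale (made-up 1-5 ratings)
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
]

k = len(responses[0])                            # number of items
items = list(zip(*responses))                    # transpose: one tuple per item
item_vars = [pvariance(item) for item in items]  # variance of each item
totals = [sum(row) for row in responses]         # each respondent's total score

alpha = (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))
print(f"alpha = {alpha:.2f}")  # values near 1 mean the items hang together well
```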
Interrater Reliability
Extent to which judges agree on subjective coding/rating
Expressed as Cohen's kappa (κ)
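A hedged sketch of Cohen's kappa for two judges coding the same ten items, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from the judges' marginals. The ratings are made up.

```python
judge_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "no", "yes"]
judge_b = ["yes", "no", "yes", "no", "no", "yes", "yes", "yes", "no", "yes"]

n = len(judge_a)
p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n  # observed agreement

# Chance agreement: product of each judge's marginal proportions per category
categories = set(judge_a) | set(judge_b)
p_e = sum((judge_a.count(c) / n) * (judge_b.count(c) / n) for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")  # 1 = perfect agreement, 0 = chance-level agreement
```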
Psychological Scale Example
When items on a scale measure what we want them to, that's validity
When items on a scale are consistently related to each other, that's reliability (α)
When one person gets the same score on a scale repeatedly, that's also reliability