Random Error
Chance fluctuations in measurement
Examples:
- What is the mood of the participants (Ps)?
- Did anyone misread the question?
- Did the data analyst mistype some data?
Tends to cancel out over repeated measurements (see the simulation sketch after the Systematic Error examples)
Systematic Error
Influence of unmeasured variables
Examples:
- Socioeconomic status influences self-esteem, but we don't measure it
- A grocer weighs vegetables with his thumb on the scale
- Everyone wants to appear nonprejudiced
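A minimal simulation sketch of the difference, assuming a true score of 50, a constant bias of 5 (the grocer's thumb), and Gaussian noise; all numbers are made up for illustration:

```python
import random

random.seed(42)
TRUE_SCORE = 50.0   # hypothetical true value of the construct
BIAS = 5.0          # systematic error: a constant push in one direction

# Each observation = true score + constant bias + random noise
observations = [TRUE_SCORE + BIAS + random.gauss(0, 3) for _ in range(10_000)]

mean = sum(observations) / len(observations)
print(f"True score: {TRUE_SCORE}")
print(f"Mean of 10,000 observations: {mean:.2f}")
# The random noise averages out (mean comes in near 55 = 50 + 5),
# but the systematic bias of 5 units never cancels.
```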
Validity
Extent to which a variable measures what it's supposed to
- Essentially, it's accuracy
A watch that tells the right time has high validity
A watch that runs correctly but is always 93 minutes fast is not valid
A broken (stopped) watch will still be valid twice a day
Reliability
Extent to which a variable is free from random error
Essentially, it's consensus or consistency
A watch that runs correctly but is always 93 minutes fast is very reliable despite low validity
Charades Example
Validity: The extent to which people got the right answer for each charade; accuracy
Reliability: The extent to which the team members agreed on their responses; consensus/consistency
Bulls-eye Example
Hits centered around bulls-eye but scattered all over board: low reliability, OK validity
Hits concentrated in one spot, but not the bulls-eye: high reliability, low validity
Hits concentrated around bulls-eye: high reliability, high validity
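A hedged sketch quantifying the bulls-eye metaphor with simulated hits: the scatter of the hits stands in for (un)reliability and the offset of their center from the bulls-eye for (in)validity. The function names and numbers are illustrative, not from the overheads.

```python
import random, statistics

random.seed(1)

def shoot(bias_x, bias_y, spread, n=200):
    """Simulate n hits aimed at a point offset from the bulls-eye at (0, 0)."""
    return [(bias_x + random.gauss(0, spread),
             bias_y + random.gauss(0, spread)) for _ in range(n)]

def summarize(label, hits):
    cx = statistics.mean(x for x, _ in hits)        # centroid of the hits
    cy = statistics.mean(y for _, y in hits)
    offset = (cx ** 2 + cy ** 2) ** 0.5             # distance from bulls-eye ~ (in)validity
    scatter = statistics.stdev(x for x, _ in hits)  # spread of the hits ~ (un)reliability
    print(f"{label}: offset = {offset:.2f}, scatter = {scatter:.2f}")

summarize("scattered around bulls-eye", shoot(0, 0, 5))   # low reliability, OK validity
summarize("tight cluster off target", shoot(8, 8, 0.5))   # high reliability, low validity
summarize("tight cluster on target", shoot(0, 0, 0.5))    # high reliability, high validity
```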
Construct Validity
The basic kind of validity
- Does variable measure what we want it to?
- Does the operational dv actually measure the conceptual dv?
Systematic error is the problem here
Several ways to check construct validity...
Face Validity
Does dv seem to measure what we want it to?
Downsides of relying on face validity:
- We can be wrong!
- An obvious dv can lead to reactivity: Ps may change their responses because they can tell what's being measured
Content Validity
Does dv cover the range of the conceptual variable?
Examples
- If Quiz #1 covers only 2 of the 4 chapters, it lacks content validity
- Same with a mood measure that asks only about positive moods
Criterion Validity
Simply put, does dv also correlate with observable behavior?
Example:
- If the mood measure doesn't correlate with the number of times a person smiles, it's low in criterion validity
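A minimal sketch of a criterion validity check, assuming made-up data: correlate scores on a hypothetical mood measure with an observable criterion behavior (smiles counted during an observation period).

```python
from statistics import correlation  # requires Python 3.10+

mood_scores = [3, 7, 5, 8, 2, 6, 9, 4]    # self-reported mood (1-10), made up
smile_counts = [1, 5, 2, 6, 0, 4, 8, 2]   # smiles observed per person, made up

r = correlation(mood_scores, smile_counts)
print(f"criterion r = {r:.2f}")  # a strong positive r supports criterion validity
```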
Finally...
Convergent Validity
- Extent to which the dv is related to other dvs that supposedly measure the same construct
Discriminant Validity
- Extent to which the dv is unrelated to other dvs that supposedly measure different constructs
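A hedged sketch of both checks at once, with invented scores: a new self-esteem scale should correlate strongly with an established self-esteem scale (convergent) and weakly with a measure of a different construct, here vocabulary (discriminant).

```python
from statistics import correlation  # requires Python 3.10+

new_self_esteem = [12, 18, 15, 20, 9, 16, 22, 11]   # hypothetical new scale
old_self_esteem = [30, 44, 38, 47, 25, 41, 52, 28]  # established scale, same construct
vocabulary      = [55, 57, 46, 54, 54, 48, 52, 50]  # unrelated construct

print(f"convergent:   r = {correlation(new_self_esteem, old_self_esteem):.2f}")  # should be high
print(f"discriminant: r = {correlation(new_self_esteem, vocabulary):.2f}")       # should be near 0
```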
Back to Reliability
Remember, it's consistency or consensus
Random error can reduce reliability
Systematic error doesn't influence reliability if it's truly systematic
Test-Retest Reliability
Extent to which you get the same score on a scale when you complete it twice
Retesting effects
- Practice effect
- Conversational norms
Avoid these with equivalent-forms reliability
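A minimal sketch of test-retest reliability, assuming hypothetical scores: correlate the same people's scores across two administrations of the scale.

```python
from statistics import correlation  # requires Python 3.10+

time1 = [14, 22, 18, 25, 11, 20, 27, 16]  # scores at first administration, made up
time2 = [15, 21, 19, 24, 12, 22, 26, 15]  # same people two weeks later, made up

print(f"test-retest r = {correlation(time1, time2):.2f}")  # high r = stable scores
```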
Internal Reliability
Extent to which items on scale are correlated with each other
Expressed as Cronbach's alpha (α)
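A hedged sketch of Cronbach's alpha from its standard formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores), computed on made-up responses to a hypothetical 4-item scale:

```python
from statistics import pvariance

# rows = respondents, columns = items on the scale (made-up 1-5 ratings)
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
]

k = len(responses[0])                            # number of items
items = list(zip(*responses))                    # transpose: one tuple per item
item_vars = [pvariance(item) for item in items]  # variance of each item
totals = [sum(row) for row in responses]         # each respondent's total score

alpha = (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))
print(f"alpha = {alpha:.2f}")  # values near 1 mean the items hang together well
```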
Interrater Reliability
Extent to which judges agree on subjective coding/rating
Expressed as Cohen's kappa (κ)
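A hedged sketch of Cohen's kappa for two judges coding the same ten items, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from the judges' marginals. The ratings are made up.

```python
judge_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "no", "yes"]
judge_b = ["yes", "no", "yes", "no", "no", "yes", "yes", "yes", "no", "yes"]

n = len(judge_a)
p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n  # observed agreement

# Chance agreement: product of each judge's marginal proportions per category
categories = set(judge_a) | set(judge_b)
p_e = sum((judge_a.count(c) / n) * (judge_b.count(c) / n) for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")  # 1 = perfect agreement, 0 = chance-level agreement
```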
Psychological Scale Example
When items on a scale measure what we want them to, that's validity
When items on a scale are consistently related to each other, that's reliability (α)
When one person gets the same score on a scale repeatedly, that's also reliability