AP Psychology Unit 2 (Cognition): Understanding Intelligence and Psychological Testing

Intelligence and IQ Testing

What psychologists mean by “intelligence”

Intelligence is your capacity to learn from experience, solve problems, and use knowledge to adapt to new situations. In AP Psychology, it helps to think of intelligence as a constructed idea rather than a single, directly observable “thing” in the brain. You can’t open someone’s head and measure “intelligence units.” Instead, psychologists infer intelligence from patterns in performance: how well you reason, learn, remember, plan, and handle novel problems.

This matters because intelligence test scores are used to make high-stakes decisions in schools and sometimes workplaces—placement in advanced courses, eligibility for services, and identifying learning needs. Because the consequences are real, the key question becomes: How do we measure intelligence in a way that is meaningful, consistent, and fair?

A common misconception is that intelligence equals “how much you know.” Knowledge matters, but many intelligence tests are designed to measure reasoning and problem-solving, not just memorized facts. Another misconception is that intelligence is fixed. While there are stable individual differences, performance can shift with development, education, health, motivation, and context.

What an IQ score is (and what it is not)

IQ (intelligence quotient) is a score derived from a standardized test designed to assess human intelligence. Modern IQ scores are usually deviation IQ scores—meaning your performance is compared to others in your age group.

Most widely used IQ tests are scaled so that:

  • The mean (average) is 100.
  • The standard deviation is often 15.

So an IQ score is best understood as a relative position within a norm group, not a direct measure of your worth, potential, or “brain power.” Two people can have the same IQ and very different strengths (for example, one excels verbally, the other spatially).

The normal curve and what “standardized” really means

Most IQ tests are designed so that scores approximate a normal distribution (a bell-shaped curve) in the population used for norms. Standardization is the process of administering the test to a large, representative sample and establishing norms—reference points that tell you what counts as “typical” performance.

Why this matters: without norms, a score is just a raw number. Norms allow interpretation.

A helpful way to connect scores to the bell curve is through z-scores, which tell you how many standard deviations a score is above or below the mean:

z = (X - μ) / σ

  • X = the raw score (or scaled score)
  • μ = the mean of the distribution
  • σ = the standard deviation

If an IQ test uses mean 100 and standard deviation 15, you can relate z-scores to IQ values:

IQ = 100 + 15z

Worked example (concept first, then numbers): If you score one standard deviation above average, your z-score is 1. On a normal curve, that places you higher than about 84% of people in the norm group.

Using the formula:

IQ = 100 + 15(1) = 115

A common mistake is treating these formulas like they reveal “true intelligence.” They don’t. They just map your performance onto a scale relative to a comparison group.
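The two formulas above can be combined into a short sketch in Python. The function names are ours, and using the normal CDF to read off a percentile is an illustration of the bell-curve logic, not part of any official scoring procedure:

```python
import math

MEAN, SD = 100, 15  # typical IQ scaling: mean 100, standard deviation 15

def z_score(raw, mu, sigma):
    """How many standard deviations a score sits above or below the mean."""
    return (raw - mu) / sigma

def z_to_iq(z):
    """Map a z-score onto the IQ scale: IQ = 100 + 15z."""
    return MEAN + SD * z

def percentile_from_z(z):
    """Proportion of a normal distribution falling below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(z_to_iq(1))                      # 115
print(round(percentile_from_z(1), 2))  # 0.84 — about the 84th percentile
```

Note that the percentile only has meaning relative to the norm group used for standardization; the code maps a position on the curve, nothing more.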

Major IQ tests you should recognize

You’re not expected to memorize every subtest, but you should know what these tests are and why they were historically important.

Stanford-Binet

The Stanford-Binet is a modern descendant of an early intelligence test developed by Alfred Binet (later revised by Lewis Terman at Stanford). Binet’s original goal was practical: identify children who needed educational support. The key idea was that test items should reflect skills that typically develop with age.

A frequent misconception: “Binet invented IQ to label kids permanently.” Historically, his intent was the opposite—he saw intelligence as improvable and wanted a tool to help students.

Wechsler scales

David Wechsler created widely used tests such as:

  • WAIS (Wechsler Adult Intelligence Scale)
  • WISC (Wechsler Intelligence Scale for Children)

Wechsler tests are important in AP Psychology because they emphasize that intelligence is not a single number. They yield:

  • a Full Scale IQ (overall score)
  • and index scores (different domains such as verbal comprehension, working memory, processing speed)

This structure matters because two students can have the same overall IQ but very different profiles—one might have strong verbal reasoning but weaker processing speed, which can change what supports are helpful.

Achievement vs aptitude: what tests are trying to predict

A classic AP distinction:

  • Achievement tests measure what you have learned (for example, a final exam).
  • Aptitude tests are designed to predict future performance (for example, a test used to predict success in a training program).

In real life, the distinction is blurrier than it sounds. Aptitude is often influenced by past learning opportunities. A student’s “aptitude” score can reflect quality of schooling, test familiarity, language background, and access to resources.

Reliability: do you get consistent results?

Reliability is the consistency of a measure. If a test is unreliable, you can’t trust the score, because it might change randomly from day to day.

Key types:

  • Test-retest reliability: do scores stay similar over time?
  • Split-half reliability: do two halves of the test produce similar scores?
  • Inter-rater reliability (more relevant for scored performance tasks): do different scorers agree?

Why it matters: If an intelligence test is used to place a student into a program, the score should not swing wildly because of random noise.

What goes wrong: Students often assume reliability means “accurate.” Reliability only means consistent. A bathroom scale that is always 10 pounds off is reliable but not valid.
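Reliability coefficients are commonly estimated as a correlation between two sets of scores (two administrations for test-retest, two halves for split-half). A minimal sketch with hypothetical data shows both the idea and the bathroom-scale point:

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: the usual estimate of test-retest reliability."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical IQ scores for five students tested twice, two weeks apart.
time1 = [100, 110, 95, 120, 105]
time2 = [102, 108, 97, 118, 103]
print(round(pearson_r(time1, time2), 3))  # high → scores are consistent

# The "bathroom scale" case: a constant 10-point error leaves the
# correlation at 1.0 — perfectly reliable, yet not therefore valid.
print(round(pearson_r(time1, [x + 10 for x in time1]), 3))
```

The second print is the key takeaway: a systematic error doesn’t lower a reliability coefficient at all, which is exactly why reliability cannot stand in for validity.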

Validity: are you measuring what you think you’re measuring?

Validity is the extent to which a test measures what it claims to measure (and supports appropriate interpretations and uses of scores).

Common validity ideas in AP Psychology:

  • Content validity: does the test cover the relevant behaviors/skills?
  • Criterion-related validity: does the test predict an outcome (for example, academic performance)?
  • Construct validity: does the test actually measure the psychological construct (like “general intelligence”) rather than something else?

Why it matters: A test can be reliable and still miss the target. For example, a vocabulary-heavy “intelligence” test might partly measure education and language exposure.

The Flynn effect (rising test scores over time)

The Flynn effect refers to the observed rise in average performance on intelligence tests across generations in many places. It matters because it highlights that test performance is influenced by environment (education, nutrition, familiarity with abstract problem-solving), not only genetics.

A common misconception is that the Flynn effect proves humans are becoming “smarter” in every sense. The effect is about test performance, and the gains are often strongest on certain kinds of tasks (like abstract reasoning) rather than all skills equally.

Exam Focus
  • Typical question patterns:
    • Interpret a scenario to identify reliability vs validity (for example, consistent scores that don’t predict outcomes).
    • Read a bell-curve description and infer what an IQ score implies about relative standing.
    • Distinguish achievement vs aptitude and explain what each predicts.
  • Common mistakes:
    • Treating IQ as a direct, fixed measure of “worth” rather than a norm-referenced score.
    • Confusing reliability (consistency) with validity (accuracy/appropriate meaning).
    • Forgetting that standardization and norms are what make scores interpretable.

Theories of Intelligence

Why there are multiple theories

Different theories exist because intelligence is complex: people show strengths in different areas, tests measure some abilities better than others, and psychologists disagree about whether intelligence is one general capacity or many specific ones. Knowing the major theories helps you interpret what an IQ score does—and does not—capture.

A big theme you’ll see: theories often disagree on structure (one general ability vs many abilities) and on emphasis (reasoning vs creativity vs practical skills vs emotional abilities).

Spearman’s general intelligence (g) and specific abilities (s)

Charles Spearman proposed that performance on diverse cognitive tasks tends to correlate: if you do well on one mental task, you often do reasonably well on others. He called this shared factor general intelligence (g).

He also recognized specific abilities (s)—skills that are more task-specific (for example, being particularly strong at mental rotation).

How it works (the logic):

  1. Give many different cognitive tests to many people.
  2. Notice positive correlations among scores.
  3. Infer a common underlying factor (g) contributing to all performances.

In action: If a student tends to score above average on vocabulary, pattern recognition, and working memory tasks, Spearman’s view says there is likely a general factor helping across domains.

What goes wrong: Students sometimes think “g” means one brain module that controls everything. In reality, “g” is a statistical pattern (a factor inferred from correlations), not a single physical spot in the brain.

Thurstone’s primary mental abilities

Louis Thurstone challenged the idea that one factor explains everything. He proposed multiple primary mental abilities, such as verbal comprehension, numerical ability, spatial ability, and perceptual speed.

This matters because it supports the idea of a profile of abilities rather than one “true” intelligence. Many modern tests reflect this by reporting multiple index scores.

In action: Two students might have the same overall IQ score but very different Thurstone-style patterns: one strong spatially, one strong verbally.

Gardner’s multiple intelligences

Howard Gardner proposed multiple intelligences, arguing that humans have several relatively independent abilities (commonly described as including linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, intrapersonal, and naturalistic). The core idea is that what schools call “smart” is too narrow.

Why it matters: Gardner’s theory pushes you to think about real-world talents and culturally valued skills. It’s especially influential in education because it encourages varied ways to teach and assess.

How it works (conceptually): Instead of one mental engine, you have multiple capacity areas. A person can be “intelligent” in a domain even if they are average in others.

In action: A student who struggles with algebra but excels at music composition might be seen as highly capable in a way traditional tests under-measure.

What goes wrong (important nuance for AP): On exams, it’s common to note that Gardner’s theory is influential but that some critics argue these “intelligences” may be better described as talents or personality-related abilities rather than independent intelligences measured with the same psychometric rigor as IQ.

Sternberg’s triarchic theory: analytic, creative, practical

Robert Sternberg proposed that intelligence includes:

  • Analytic intelligence: problem-solving and academic-type reasoning.
  • Creative intelligence: generating novel ideas and adapting to new situations.
  • Practical intelligence: applying knowledge to real-world contexts (often described as “street smarts”).

Why it matters: Sternberg’s model explains why someone might do well in everyday problem-solving but not shine on traditional tests. It also highlights that success often depends on adapting to environments, not just solving abstract puzzles.

In action:

  • Analytic: figuring out the logic of a word problem.
  • Creative: designing a new way to study that fits your learning habits.
  • Practical: negotiating a schedule conflict with a teacher using social awareness and planning.

Common misconception: Students sometimes think practical intelligence is “not real intelligence.” Sternberg’s point is that adaptation and real-world effectiveness are central to what intelligence should mean.

Emotional intelligence

Emotional intelligence refers to the ability to perceive, understand, manage, and use emotions effectively in yourself and others. It overlaps with social skills but emphasizes emotion-related processing.

Why it matters: Emotional intelligence helps explain success in relationships, leadership, teamwork, and conflict resolution—areas not captured well by traditional IQ tests.

How it works (a useful breakdown):

  1. Recognize emotions (facial expressions, tone of voice, internal cues).
  2. Understand causes and patterns (why you feel stressed before a test).
  3. Regulate emotions (calm yourself, reframe setbacks).
  4. Use emotions to support goals (motivation, empathy, communication).

In action: A student who manages test anxiety and collaborates well in groups may achieve more than a similarly “smart” student who shuts down under stress.

Fluid vs crystallized intelligence (a developmental lens)

A very testable distinction:

  • Fluid intelligence: the capacity to reason and solve novel problems independent of specific learned knowledge.
  • Crystallized intelligence: accumulated knowledge and verbal skills built from experience and education.

Why it matters: This distinction helps you understand changes across the lifespan. Fluid abilities are often linked to speed and novel reasoning, while crystallized abilities reflect learning and tend to grow with education and experience.

In action:

  • Fluid: solving a new type of logic puzzle you’ve never seen.
  • Crystallized: knowing vocabulary words or historical facts.

What goes wrong: Students sometimes assume crystallized intelligence is “less real” because it’s learned. In psychology, learned knowledge is part of cognitive competence and is meaningful.

Exam Focus
  • Typical question patterns:
    • Match a real-world example to a theory: Spearman (g), Gardner (multiple intelligences), Sternberg (triarchic), fluid vs crystallized.
    • Explain why two people with similar IQs might differ in creativity, practical success, or emotional functioning.
    • Compare theories: “one general factor” vs “multiple abilities.”
  • Common mistakes:
    • Treating Gardner’s categories as if they are all measured and validated like IQ (AP questions often expect you to note measurement concerns).
    • Mixing up fluid (novel reasoning) with crystallized (learned knowledge).
    • Assuming “g” is a single brain structure rather than a factor inferred from correlations.

Bias in Testing

What “bias” means in psychological testing

In everyday speech, “bias” can mean “unfair.” In testing, bias has a more specific meaning: a test is biased if it measures something systematically differently for different groups in a way that leads to invalid or unfair interpretations.

It’s possible for a test to show group differences without being biased, and it’s also possible for a test to be biased even if average group scores look similar. The key is whether the test has the same meaning across groups and whether predictions based on the test are equally accurate.

This matters because intelligence tests can influence educational tracking and access to opportunities. If the measurement process contains systematic unfairness, it can reinforce social inequalities.

Cultural bias and the problem of unequal familiarity

Cultural bias can occur when test items assume knowledge, experiences, or language patterns more common in one cultural group than another. If an item relies on culturally specific references, then the test may partially measure cultural exposure rather than the intended reasoning skill.

How it works (mechanism):

  1. A test item includes vocabulary, scenarios, or problem formats that are more familiar to some groups.
  2. Familiarity reduces cognitive load and increases performance.
  3. Scores reflect both reasoning ability and cultural exposure.

In action: If a verbal analogy uses an idiom common in one community but not another, students unfamiliar with the idiom may miss the item even if their underlying reasoning is strong.

Common misconception: “Removing all culture makes a test perfectly fair.” In practice, truly culture-free testing is extremely difficult because language, schooling, and problem-solving styles are shaped by culture. Many tests aim for reduced bias through careful item analysis and diverse norming samples rather than pretending culture doesn’t exist.

Language, socioeconomic status, and educational opportunity

Test performance is influenced by:

  • Language proficiency (especially on verbally loaded tasks)
  • Quality of schooling (curriculum coverage, teacher resources)
  • Socioeconomic factors (access to books, tutoring, stable study environments)

This is not a claim that tests measure “only privilege.” It’s a reminder that intelligence test scores reflect both individual cognitive skills and the opportunities that helped build and express those skills.

A key AP idea is that tests can be standardized and reliable yet still reflect environmental inequalities. Standardization ensures consistent administration; it does not guarantee equal preparation.

Stereotype threat: when context changes performance

Stereotype threat is a situational pressure in which people worry that their performance will confirm a negative stereotype about their group. That anxiety can consume working memory and attention, lowering performance.

How it works (step by step):

  1. A person is placed in a situation where a stereotype is relevant (for example, an ability test).
  2. They become vigilant about being judged through the stereotype.
  3. Stress and self-monitoring increase.
  4. Working memory resources are diverted away from the task.
  5. Performance decreases, ironically making the stereotype seem “true.”

In action: If a student is reminded right before a test that “people like you usually do worse,” the reminder can reduce scores even if ability is unchanged.

What goes wrong: Students sometimes interpret stereotype threat as an “excuse.” In psychology, it’s a well-studied example of how context and cognition interact—especially through attention and working memory.

Test fairness, predictive validity, and differential prediction

A practical way to discuss fairness is through predictive validity: does the test predict relevant outcomes (like grades) equally well for different groups?

If a test systematically overpredicts or underpredicts performance for a group (for example, the same score corresponds to different average outcomes), that suggests a problem with how the score is being interpreted or used.

This framing matters because it separates two questions:

  • Are group averages different?
  • Does the test function the same way across groups and support fair decisions?

On AP-style questions, you may be asked to explain bias without claiming “any group difference proves bias.” The more defensible explanation is about measurement equivalence, opportunity, and context effects.
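What differential prediction looks like in data can be sketched with two simple regression lines. The scores, GPAs, and group labels here are hypothetical, chosen only to make the pattern visible:

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical test scores and later GPAs for two groups of students.
group_a = ([95, 105, 115, 125], [2.6, 3.0, 3.4, 3.8])
group_b = ([95, 105, 115, 125], [2.9, 3.3, 3.7, 4.1])

for name, (scores, gpas) in [("A", group_a), ("B", group_b)]:
    slope, intercept = fit_line(scores, gpas)
    print(f"group {name}: predicted GPA at a score of 110 = "
          f"{slope * 110 + intercept:.2f}")
# The same score of 110 predicts different average outcomes across groups.
# That pattern — differential prediction — is evidence the score should
# not be interpreted identically for both groups.
```

If the two fitted lines coincided, the test would be predicting equivalently for both groups even if the groups’ average scores differed, which is exactly the distinction the two bullet questions above draw.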

Strategies used to reduce bias (and their limits)

Testing organizations and psychologists use several approaches:

  • Representative standardization samples: norms based on diverse populations.
  • Item analysis: removing or revising items that function differently across groups.
  • Nonverbal or less language-loaded tasks: reducing dependence on specific vocabulary.
  • Clear administration and scoring rules: limiting scorer subjectivity.

Limits to remember: Even well-designed tests can’t fully remove the impact of unequal educational experiences or chronic stressors. Also, “nonverbal” does not automatically mean “culture-free”—test-taking strategies and familiarity with puzzle-like tasks can still vary.

Group differences: what AP expects you to say carefully

AP Psychology sometimes addresses group score differences alongside bias. The key skill is to explain that:

  • Differences in average scores can reflect many factors (environmental, educational, social, and potentially genetic influences).
  • Bias is about whether the test measures the construct equivalently and is interpreted fairly.

A common mistake is making absolute claims (“IQ tests are totally biased” or “IQ tests are totally unbiased”). A more accurate approach is nuanced: many modern tests are carefully standardized and can predict certain outcomes, but scores are still influenced by context, opportunity, and stereotype threat, and fairness depends on how tests are used.

Exam Focus
  • Typical question patterns:
    • Apply stereotype threat to a scenario and explain the cognitive mechanism (anxiety and reduced working memory/attention).
    • Identify how a test item could show cultural bias and propose a fix (rewording, different norming, item removal).
    • Distinguish group differences from test bias and explain what would count as evidence of bias.
  • Common mistakes:
    • Claiming that any group difference automatically proves a biased test (AP questions often want the more precise definition).
    • Describing bias only as “someone was unfair” rather than as a measurement and interpretation problem.
    • Forgetting to connect stereotype threat to performance processes (attention, stress, working memory), not just feelings.