Transcript
EXAMINING THE DIMENSIONALITY OF L2 READING COMPREHENSION OF TAIWANESE EFL BEGINNERS
Pei-Yu (Marian) Pan, Jeng-Shin Wu, Hsin-Hao Chen, Yao-Ting Sung
LITERATURE REVIEW
What is reading comprehension? What skills can we measure?
Importance of identifying the dimensions of reading comprehension:
Provide empirical support for test validity
Influence the development of theories and models, assessment tools, instruction, and curriculum
Various classifications have been proposed.
Gray (1960) proposes three levels of understanding:
reading the lines = literal meaning
reading between the lines = inferred meaning
reading beyond the lines = critical evaluation
Lennon (1962):
word knowledge
comprehension of explicitly stated meaning
comprehension of implicit/inferential meaning
appreciation
Davis (1968):
Recalling word meanings
Drawing inferences about the meaning of a word in context
Finding answers to questions answered explicitly or in paraphrase
Weaving together ideas in the content
Drawing inferences from the content
Recognizing a writer's purpose, attitude, tone and mood
Identifying a writer's technique
Following the structure of a passage
Munby’s (1978) taxonomy of microskills:
Recognizing the script of a language
Deducing the meaning and use of unfamiliar lexical items
Understanding explicitly stated information
Understanding information when not explicitly stated
Understanding conceptual meaning
Understanding the communicative value of sentences
...
Weir (1994) proposed three operations in reading:
Skimming
Understanding main ideas and important detail
Using linguistic contributory skills: understanding grammatical notions, syntactic structure, discourse markers, lexical and/or grammatical cohesion, and lexis
Abdullah’s (1994) critical reading skills:
evaluate deductive inferences
evaluate inductive inferences
evaluate the soundness of generalization
recognize hidden assumptions
identify bias in statements
recognize author's motives
evaluate strength of arguments
Alderson (2005) - DIALANG:
To understand/identify the main idea(s), main information in or main purpose of text(s)
To find specific details or specific information
To make inferences on the basis of the text by going beyond the literal meaning of the text or by inferring the approximate meaning of unfamiliar words
Those lists are theoretically persuasive and offer powerful frameworks for test construction, but they lack sufficient empirical evidence.
Can reading comprehension be divided into discrete skills?
Unitary: highly overlapping skills can be represented by one underlying factor
Multi-divisible
UNITARY VIEW AND EVIDENCE
Rost (1993):
L1 (German) reading comprehension ability of 220 second graders
Factor analysis: a general competence was found, accounting for 85% of the variance in L1 reading comprehension
van Steensel, Oostdam, and van Gelderen (2013):
SALT-reading
200 low-achieving seventh graders (L1)
CFA: one underlying skill
Alderson (2005):
the reading test of DIALANG
718 participants of different European nationalities
Various factor analyses: one factor emerged and accounted for between 68% and 74% of the variance in reading
MULTI-DIVISIBLE VIEW AND EVIDENCE
Jang & Roussos (2007):
the reading subtest of TOEFL (1997), July and August testlets
about 3000 ESL students
DIMTEST:
July testlet: vocabulary, anaphora, main idea, synthesis, negation, and extrapolation
August testlet: vocabulary, explicit info, inferencing, and synthesis
Song (2008):
the Web-based English as a Second Language Placement Exam (WB-ESLPE) for ESL college students
SEM
2 subskills:
1. understand the main ideas, supporting information, and specific details (literal)
2. make inferences (inferential)
Kong & Li (2009):
the reading subtest of TEM4 (Test for English Majors, Level 4)
20,000 college students (English majors)
EFA, CFA, and SEM
2 factors:
1. literal comprehension
2. all the others (complex)
CONFIRMATORY FACTOR ANALYSIS
x factors
Fit index                                          Criterion
RMSEA (Root Mean Square Error of Approximation)    < 0.05
CFI                                                > 0.90 or 0.95
TLI                                                > 0.90 or 0.95
WRMR (Weighted Root Mean Square Residual)          > 1
Chi-square test for difference testing
Comparison    Value    Degree of freedom    P-value
1 vs 2        0.026    1                    0.8722
EXPLORATORY FACTOR ANALYSIS
One of the most common methods to investigate dimensionality
No presumptions; exploratory and linear factor analysis
Compare eigenvalues (> 1) and the % of variance accounted for
BCTEST 2009
Factor         1         2        3        4        5
Eigenvalues    25.081    1.715    1.057    0.867    0.834
[Scree plot, BCTEST 2009: eigenvalues (y-axis, 0 to 30) against factor number (x-axis, 1 to 10)]
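The two EFA checks named on this slide (eigenvalues above 1, and the share of variance accounted for) can be sketched as follows. This is only an illustration using the five BCTEST 2009 eigenvalues listed above; it assumes the analysis is based on a correlation matrix, so that total variance equals the number of items (44 for BCTEST 2009). The function names are hypothetical.

```python
# First five eigenvalues listed for BCTEST 2009 (the remaining ones are not shown).
eigenvalues = [25.081, 1.715, 1.057, 0.867, 0.834]
N_ITEMS = 44  # BCTEST 2009 reading items (May + July)


def kaiser_count(values, cutoff=1.0):
    """Count eigenvalues strictly above the cutoff (the 'eigenvalue > 1' rule)."""
    return sum(1 for v in values if v > cutoff)


def variance_share(value, n_items):
    """Share of total variance for one factor.

    Assumption: total variance equals the number of items,
    which holds when a correlation matrix is factored.
    """
    return value / n_items


print("eigenvalues above 1:", kaiser_count(eigenvalues))
print("first-factor share:", round(variance_share(eigenvalues[0], N_ITEMS), 3))
```

With these numbers, three of the listed eigenvalues exceed 1, while the first factor alone accounts for roughly 57% of the variance.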
Parallel analysis: combines exploratory factor analysis and simulation studies (Horn, 1966)
Retain factors whose eigenvalues > simulated eigenvalues
Factor                   1         2        3        4        5
Eigenvalues              21.972    1.435    1.034    0.880    0.816
Simulated eigenvalues    1.369     1.342    1.311    1.296    1.281
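The comparison step of parallel analysis can be sketched as below, using the eigenvalues shown on this slide. The simulation that produces the reference eigenvalues is not reproduced here; the function name and the stopping rule (stop at the first factor that fails the comparison) are assumptions made for illustration.

```python
def retained_factors(observed, simulated):
    """Count leading factors whose observed eigenvalue exceeds the simulated one.

    Assumption: counting stops at the first factor that fails the comparison.
    """
    count = 0
    for obs, sim in zip(observed, simulated):
        if obs <= sim:
            break
        count += 1
    return count


# Values from the slide above.
observed = [21.972, 1.435, 1.034, 0.880, 0.816]
simulated = [1.369, 1.342, 1.311, 1.296, 1.281]

print(retained_factors(observed, simulated))
```

For these values the first eigenvalue exceeds its simulated counterpart by a wide margin, the second only narrowly (1.435 vs 1.342), and the third falls below it.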
NONLINEAR FACTOR ANALYSIS
Problem of linear factor analysis: it may overestimate the number of factors; item difficulty is sometimes mistaken for a latent variable (Carroll, 1945; McDonald & Ahlawat, 1974)
NOHARM: normal ogive harmonic analysis robust method (Fraser & McDonald, 2003)
                                       1 factor    2 factors    4 factors
Sum of squares of residuals (SSR)      0.0093      0.0040       0.0026
Root mean square of residuals (RMSR)   0.0033      0.0022       0.0017
Tanaka index                           0.9975      0.9989       0.9993
NONPARAMETRIC METHOD
Uses conditional covariances to analyze dimensionality
DIMTEST (Stout, 1987; Stout, Froelich, & Gao, 2001):
H0: essential unidimensionality vs H1: essential multidimensionality
RESULTS (example 1)
T          0.4696
P-value    0.3193
Result: do not reject H0 (unidimensional)
RESULTS (example 2)
T          2.1946
P-value    0.0141
Result: reject H0 (unidimensional); multidimensional
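The decision rule on this slide can be sketched as below. The sketch assumes that the T statistic is referred to a standard normal distribution with a one-sided (upper-tail) test; that assumption reproduces the two p-values shown above to rounding, but it is inferred from the numbers rather than taken from the DIMTEST software. The .05 threshold and the function names are likewise assumptions for illustration.

```python
import math


def upper_tail_p(t):
    """One-sided p-value, assuming T follows a standard normal under H0."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def dimtest_decision(t, alpha=0.05):
    """Return the H0 decision for a given T (alpha = .05 is an assumed default)."""
    if upper_tail_p(t) < alpha:
        return "reject H0 (multidimensional)"
    return "do not reject H0 (unidimensional)"


for t in (0.4696, 2.1946):  # the two T values shown on the slide
    print(t, round(upper_tail_p(t), 4), dimtest_decision(t))
```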
DETECT (Zhang & Stout, 1999a, 1999b): the data must conform to an approximate simple structure, meaning that each item measures only one dimension (more accurate results)
Maximum DETECT value (Kim, 1994):
> 1: large multidimensionality
0.4~1: moderate to large multidimensionality
< 0.4: weak multidimensionality
< 0.2: unidimensionality
DIMPACK v1.0: DIMTEST & DETECT
Limitation: 7,000 samples
TWO ISSUES IN EXISTING STUDIES ON L2 READING DIMENSIONALITY
Most studies applied exploratory and confirmatory factor analysis, be it L1 or L2 (e.g., Kong & Li, 2009; Meneghetti, Carretti, & De Beni, 2006; Rost, 1993; Song, 2008; van Steensel, Oostdam, & van Gelderen, 2013; Zwick, 1987)
Few language test studies implemented other statistical techniques, such as DIMTEST, DETECT, or NOHARM (e.g., Jang & Roussos, 2007; Kim & Jang, 2009; Schedl, Thomas, & Way, 1996)
The tests being analyzed (e.g., TOEFL) target more proficient learners, so observations on learners with low proficiency are lacking
Weir and Porter (1994): skill divisibility might be a function of the proficiency level
Proficient readers: unidimensional
Less proficient readers: possibly multidimensional
Alderson (2000): skills are more identifiable for beginning, weak, dyslexic or low-level second-language readers, before their skills have matured and become integrated during the reading process
May find multidimensionality of reading comprehension with less proficient readers (Alderson, 2000; Weir & Porter, 1994)
Taiwan EFL students (junior high school students): ALTE level 1, CEFR A2, and ACTFL intermediate
RESEARCH METHOD
BCTEST
Basic Competence Test for Junior High School Students (BCTEST)
A standardized achievement exam for 5 subjects: English, Chinese, social studies, natural science, and math
Taken by all junior high school students upon graduation in Taiwan
RESEARCH METHOD
BCTEST 2009, 2010, and 2011
Conducted twice a year (May and July)
Combined the reading comprehension items from both tests
Year    May    July    Sum
2009    21     23      44
2010    24     25      49
2011    21     21      42
Literal comprehension:
Extraction: retrieve required information from the text
Integration: locate relevant pieces of information and integrate them to understand the main idea of the text or to obtain the answer
Inferential comprehension:
Local inference: locate relevant information (usually 2 or 3 sentences) and infer its embedded meaning or message
Global inference: incorporate relevant information throughout the text (sometimes in conjunction with background knowledge) and infer its embedded meaning or message
Skill                        Sub-skill               2009    2010    2011
Literal comprehension        Extraction (local)      19      21      14
                             Integration (global)    15      10      12
Inferential comprehension    Local inference         7       11      8
                             Global inference        3       7       8
Each year: 7,000 participants were randomly sampled, due to the limitation of DIMPACK (7,000 only)
Total: 21,000 participants
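Drawing a fixed-size random sample per test year could look like the sketch below. The slides do not describe how the sample was drawn, so sampling without replacement, the optional seed, and the placeholder examinee IDs are all assumptions made for illustration.

```python
import random

SAMPLE_SIZE = 7000  # DIMPACK limit stated on the slide


def draw_sample(examinee_ids, size=SAMPLE_SIZE, seed=None):
    """Draw `size` distinct examinees at random (without replacement)."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(examinee_ids, size)


# Hypothetical ID pool; the real number of examinees per year is not given.
hypothetical_ids = range(100_000)
sample_2009 = draw_sample(hypothetical_ids, seed=2009)
print(len(sample_2009))
```

Repeating the draw for each of the three years gives 3 x 7,000 = 21,000 participants in total.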
Conduct EFA, NOHARM, DIMTEST, and DETECT
RESULT
BCTEST 2009
[Scree plot, BCTEST 2009: eigenvalues (y-axis, 0 to 30) against factor number (x-axis, 1 to 44)]
Factor                   1         2        3        4        5
Eigenvalues              25.081    1.715    1.057    0.867    0.834
Simulated eigenvalues    1.513     1.452    1.416    1.393    1.332
Accounted variance: 0.570 + 0.033 = 0.603
BCTEST 2010
[Scree plot, BCTEST 2010: eigenvalues (y-axis, 0 to 35) against factor number (x-axis, 1 to 49)]
Factor                   1         2        3        4        5
Eigenvalues              29.359    1.574    1.048    0.810    0.748
Simulated eigenvalues    1.539     1.493    1.466    1.433    1.381
Accounted variance: 0.599 + 0.032 = 0.631
BCTEST 2011
[Scree plot, BCTEST 2011: eigenvalues (y-axis, 0 to 25) against factor number (x-axis, 1 to 42)]
Factor                   1         2        3        4        5
Eigenvalues              21.972    1.435    1.034    0.880    0.816
Simulated eigenvalues    1.369     1.342    1.311    1.296    1.281
Accounted variance: 0.523 + 0.034 = 0.557
BCTEST 2009
Chi-square test for difference testing
                     1 vs 2 (lit and inf)    1 vs 2 (loc and glob)    1 vs 4
Value                0.026                   *warning message         *warning message
Degree of freedom    1
P-value              0.8722
1 factor
RMSEA    0.024
CFI      0.991
TLI      0.991
WRMR     1.399
BCTEST 2010
Chi-square test for difference testing
                     1 vs 2 (lit and inf)    1 vs 2 (loc and glob)    1 vs 4
Value                *warning message        6.502                    *warning message
Degree of freedom                            1
P-value                                      0.0108
1 factor
RMSEA    0.018
CFI      0.995
TLI      0.995
WRMR     1.189
BCTEST 2011
Chi-square test for difference testing
                     1 vs 2 (lit and inf)    1 vs 2 (loc and glob)    1 vs 4
Value                1.661                   *warning message         *warning message
Degree of freedom    1
P-value              0.1975
1 factor
RMSEA    0.019
CFI      0.994
TLI      0.994
WRMR     1.166
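The p-values in the three difference tests can be checked from the reported value and its single degree of freedom. The sketch below assumes the reported value is referred to a chi-square distribution with 1 degree of freedom, for which the upper-tail probability has the closed form erfc(sqrt(x / 2)). It does not reproduce how the software computed the difference statistic itself, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_df1(value):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(value / 2.0))


# (reported value, reported p-value) for the comparisons that ran without warnings.
reported = {
    "BCTEST 2009, 1 vs 2 (lit and inf)": (0.026, 0.8722),
    "BCTEST 2010, 1 vs 2 (loc and glob)": (6.502, 0.0108),
    "BCTEST 2011, 1 vs 2 (lit and inf)": (1.661, 0.1975),
}

for label, (value, p_reported) in reported.items():
    print(label, round(chi2_upper_tail_df1(value), 4), p_reported)
```

The recomputed values agree with the reported ones to about three decimal places; small differences are expected because the test values are rounded. Only the BCTEST 2010 local vs. global comparison falls below .05.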
NOHARM
BCTEST 2009
                                 1-factor     2-factor     4-factor
Sum of squares of residuals      0.0175064    0.0126149    0.0114454
Root mean square of residuals    0.0043018    0.0036517    0.0034783
Tanaka index                     0.9951499    0.9965051    0.9968291
BCTEST 2010
                                 1-factor     2-factor     4-factor
Sum of squares of residuals      0.0230873    0.0217678    0.0195405
Root mean square of residuals    0.0044308    0.0043023    0.0040763
Tanaka index                     0.9942921    0.994524     0.9950843
BCTEST 2011
                                 1-factor     2-factor     4-factor
Sum of squares of residuals      0.0164699    0.0152079    0.0122780
Root mean square of residuals    0.0043736    0.0042027    0.0037763
Tanaka index                     0.9961087    0.9964069    0.9970991
Result: unidimensional
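The SSR and RMSR rows above appear to be related by RMSR = sqrt(SSR / number of distinct item pairs), where the number of pairs is n(n - 1)/2 for n items (44, 49, and 42 items in 2009, 2010, and 2011). This relation is inferred from the reported numbers, not taken from the NOHARM documentation, so the sketch below should be read as a consistency check only.

```python
import math


def rmsr_from_ssr(ssr, n_items):
    """RMSR implied by an SSR, assuming residuals are averaged over item pairs."""
    n_pairs = n_items * (n_items - 1) // 2  # distinct off-diagonal item pairs
    return math.sqrt(ssr / n_pairs)


# (SSR, reported RMSR, number of items) for the 1-factor models.
one_factor = {
    2009: (0.0175064, 0.0043018, 44),
    2010: (0.0230873, 0.0044308, 49),
    2011: (0.0164699, 0.0043736, 42),
}

for year, (ssr, rmsr_reported, n_items) in one_factor.items():
    print(year, round(rmsr_from_ssr(ssr, n_items), 7), rmsr_reported)
```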
DIMTEST
BCTEST 2009
           T          P-value
Trial 1    0.7935     0.2138
Trial 2    0.4621     0.3220
Trial 3    0.8687     0.1925
BCTEST 2010
           T          P-value
Trial 1    0.4696     0.3193
Trial 2    1.3067     0.0957
Trial 3    0.5062     0.3063
BCTEST 2011
           T          P-value
Trial 1    -0.9864    0.8380
Trial 2    1.4442     0.0743
Trial 3    1.0958     0.1366
Result: unidimensional
DETECT
               Maximum DETECT value
BCTEST 2009    0.1075
BCTEST 2010    0.0803
BCTEST 2011    0.1131
Maximum DETECT value (Kim, 1994):
> 1: large multidimensionality
0.4~1: moderate to large multidimensionality
< 0.4: weak multidimensionality
< 0.2: unidimensionality
Result: unidimensional
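The interpretation bands for the maximum DETECT value can be written as a small lookup, applied here to the three reported values. How exact boundary values (0.2, 0.4, 1) are assigned is not stated on the slide, so the boundary handling below is an assumption, as is the function name.

```python
def classify_detect(value):
    """Map a maximum DETECT value to the bands attributed to Kim (1994) above.

    Assumption: boundary values fall into the higher band, except that
    exactly 1 is treated as 'moderate to large'.
    """
    if value < 0.2:
        return "unidimensionality"
    if value < 0.4:
        return "weak multidimensionality"
    if value <= 1.0:
        return "moderate to large multidimensionality"
    return "large multidimensionality"


reported = {"BCTEST 2009": 0.1075, "BCTEST 2010": 0.0803, "BCTEST 2011": 0.1131}
for test, value in reported.items():
    print(test, value, classify_detect(value))
```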
SUM UP
EFA (+ parallel analysis): the first factor accounted for most of the variance (.52-.60)
CFA: one factor (except for BCTEST 2010: local and global)
NOHARM: SSR, RMSR, and Tanaka index favor 4 factors, but the differences are very small; essentially 1 factor
DIMTEST: p-value > .05; do not reject H0; unidimensional
DETECT: maximum DETECT values < .2; unidimensional
DISCUSSION
POSSIBLE CONSTRAINTS OF THE ITEMS
Multiple-choice (MC) items: students are limited to the given options even when they may come up with their own unique interpretation, which is equally legitimate
"the very act of assessing and testing will inevitably affect the reading process, and the fact that a learner has answered a question posed by a tester incorrectly does not necessarily mean that he or she has not understood the text in other ways or to his or her own satisfaction." (Alderson, 2005, p. 120)
Sophia: The pizzas here are very good. Do you want some?
Takako: Yeah, sure. Look! They have artichokes for the pizza La Primavera. What is an artichoke?
Sophia: Well, it is a big flower. It has a heart in it. People take the heart and use it in salad or pizza. You can buy them in supermarkets. There is one near the train station. We may go there later. Here in Italy, people make pizzas with artichoke hearts.
Takako: Cool! I want the pizza La Primavera then!
Sophia: Great. Look! Your favorite chocolate ice cream comes with it. Isn't it wonderful?
Takako: I can't wait!
Dictionary: artichoke 朝鮮薊(一種蔬菜); heart 菜心; Italy 義大利
According to the reading, where are Takako and Sophia?
Answer: a restaurant
Other plausible answers: a place in a train station which sells pizza / in a train station
Local vs. Global
In contrast to TOEFL, WB-ESLPE, or TEM4, BCTEST items are short and easy: one or two paragraphs maximum
As a result, there was little distinction between local and global skills
DIMTEST: Local vs. Global
               T (p-value)
BCTEST 2009    -1.7032 (0.9557)
BCTEST 2010    -1.5662 (0.9414)
BCTEST 2011    0.4071 (0.3419)
FINAL REMARKS
Results are not meant to be generalized to other contexts (BCTEST: EFL in Taiwan).
BCTEST: standardized assessment (IRT)
Currently developing a reading comprehension test covering elementary to senior high school in Taiwan
Only removed the items which had low discriminative power (2 or 3 items only)
Conducted some initial analyses on dimensionality (grades 7 and 8)
Still unidimensional
Psychological vs. psychometric dimensionality (Henning, 1992)
Psychometric dimensionality can be confounded by the sample and the items being implemented.