Re: Why MCT's? 2nd Try



If you reply to this very long (32 kB) post please don't hit the reply button unless you prune the copy of this post that may appear in your reply down to a few relevant lines, otherwise the entire already archived post may be needlessly resent to subscribers.

*******************************************************
ABSTRACT: In an ARN-L post of 7 March 2007, I pointed out that the approximately two-standard deviation superiority in normalized gains of interactive engagement (IE) over traditional (T) courses does not support Peter Campbell's suggestion of 27 January 2007 that students in IE courses do better than students in T courses only because the IE course students are inherently better at taking multiple choice tests (MCT's) than the T students. On 8 Mar 2007 Peter responded that (a) causal relationships between IE (T) courses and higher (lower) MCT scores cannot be established, (b) IE course students may do better than T course students because the IE courses preferentially enhance the MCT-taking abilities of IE students, (c) the claimed IE superiority may be an artifact of non-random selection. In this post I suggest that: "a" indicates a misunderstanding of my research, "b" is extremely unlikely, and "c" is a misconception related to the mistaken notion that randomized control trials are the gold standard of assessment.
*******************************************************

My ARN-L post of 7 March 2007 "Re: Why MCT's? 2nd Try" [Hake (2007b)], was my second attempt to get through to ARN-L with a post "Re: Why MCT's? (was Lauren Resnick and higher-order thinking skills)" [Hake (2007a)], transmitted to ARN-L and PhysLrnR on 6 Feb 2007. In both the 6 Feb and 7 March posts I wrote [bracketed by lines "HHHHHH. . . ."]

HHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Peter's reaction . . .[ of 27 January in Campbell (2007a) that MCT's only indicate how good students are at taking MCT's]. . . might well be justified for the results of most MCT evaluations. However for my own survey [Hake (1998a,b)] and subsequent confirming work by many other physics education research groups [for references see Hake (2007c)] **it would not be easy to argue that the approximately two-standard deviation superiority in normalized gains of interactive engagement (IE) over traditional (T) courses, was due to the fact that students in the IE courses just happened to be a lot better at taking MCT's than students in the T courses.**
HHHHHHHHHHHHHHHHHHHHHHHHHHHHH

In response, Peter Campbell (2007b) on 8 Mar 2007, made 4 points [my inserts at ". . . . [insert]. . . ."]:

111111111111111111111111111111111111111111111
1. "Richard - You're arguing that students who were taught using "interactive engagement" scored better on MCT's than students who were taught using "traditional" course methods. You use this as proof that MCT's reflect the superiority of "interactive engagement" over "traditional" course methods. Is that a fair summary? "

NO! Your (a) "scored better on MCT's" needs to be replaced with (b) "obtained higher pre-to-post test normalized gains on MCT's." There's a world of difference between the correct wording "b" and your incorrect wording "a". I suspect that you may not have carefully read the online reports [Hake (1998a,b)] of my research.
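To illustrate the distinction, here is a minimal Python sketch. It assumes the definition of the average normalized gain given in Hake (1998a), which is not restated in this post: <g> = (%<post> - %<pre>) / (100 - %<pre>), i.e., the actual average gain divided by the maximum possible average gain. The function name and the two sets of class-average scores are hypothetical, chosen only to show that a course can "score better" on the posttest and still obtain a lower normalized gain.

```python
def normalized_gain(pre_pct, post_pct):
    """Average normalized gain <g> from class-average pre/post scores (percent)."""
    if not 0 <= pre_pct < 100:
        raise ValueError("pretest average must be in [0, 100)")
    # Actual average gain divided by the maximum possible average gain.
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Hypothetical class averages (percent correct), NOT data from the survey.
course_x = {"pre": 70.0, "post": 79.0}  # higher posttest score
course_y = {"pre": 30.0, "post": 72.0}  # lower posttest score

g_x = normalized_gain(course_x["pre"], course_x["post"])
g_y = normalized_gain(course_y["pre"], course_y["post"])

print(f"Course X: posttest = {course_x['post']:.0f}%, <g> = {g_x:.2f}")
print(f"Course Y: posttest = {course_y['post']:.0f}%, <g> = {g_y:.2f}")
```

In this made-up case Course X "scores better" on the posttest, but Course Y has the higher normalized gain, because Course Y's students started much lower and closed a larger fraction of the gap to a perfect score.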

Furthermore, it should also be noted that the "Mechanics Diagnostic" (MD) test and the "Force Concept Inventory" (FCI) were not just your everyday problematic MCT's. They were developed through lengthy and arduous qualitative and quantitative research by disciplinary experts [Halloun & Hestenes (1985a,b)] and their use has been shown to be valid ["internal", "external", "construct", and "statistical conclusion" - see e.g. Shadish, Cook, & Campbell (2002, pp. 33-42)]. Also the MD and FCI have been shown to be consistently reliable, as judged by relatively high Kuder-Richardson reliability coefficients KR-20 in the 0.8 to 0.9 range (see, e.g., Halloun & Hestenes, 1985b; Hake, 1998a, 1998b).


2222222222222222222222222222222222222222222
2. "If so, here are the problems I see (slightly edited by Hake):

A. You're establishing a causal relationship between 'interactive engagement' and higher scores. . . .[NO! between 'interactive engagement' and normalized pre-to-post test gains]. . . .

B. You're also establishing a causal relationship between 'traditional' course methods and lower scores. . . . .[NO! between 'traditional' course methods and normalized pre-to-post test gains]. . . .

C. You're arguing that MCT's can reflect the superiority of 'interactive engagement.'

Because A and B cannot be established, you cannot, therefore, argue that MCT's can measure something that cannot be established. Here's a rather crude analogy :

1. Babies fed soy milk are happy.

2. Babies fed breast milk are not happy.

3. The babies fed soy milk gained 20 pounds, whereas the breast fed babies gained only 5 pounds.

4. Conclusion: the 15-pound difference between the two groups shows the superiority of soy milk as a nutritional supplement for babies. "

I agree with Peter that the analogy is crude - so crude in fact that, in my opinion, it's irrelevant to the present argument - as I hope will be made clear in the remainder of this post.

An analogy which IS relevant is the following:

1. IE courses show relatively high average normalized gain <<g>> on MCT's.

2.  T courses show relatively low average normalized gain <<g>> on MCT's.

3.  The <<g>> for IE courses is approximately 2sd's greater than the <<g>> for T courses.

4. Conclusion: the above approximately 2sd difference in the <<g>>' s between the IE and T courses shows the superiority of IE courses for enhancing students' conceptual understanding of Newtonian mechanics.

In the above the double angle brackets <<g>> indicate an average over courses of the average normalized gain <g> for each course.
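As a minimal sketch of this "average of averages," the Python below computes <<g>> and its standard deviation over courses. The per-course <g> values are hypothetical placeholders, not survey data, and the use of the sample standard deviation is an assumption about how the spread over courses would be reported.

```python
from statistics import mean, stdev

# Hypothetical per-course average normalized gains <g>, NOT survey data.
ie_course_gains = [0.35, 0.45, 0.50, 0.60]
t_course_gains = [0.15, 0.20, 0.25, 0.30]

def average_of_averages(course_gains):
    """Return (<<g>>, sd): the mean of the per-course <g> values and
    their sample standard deviation over courses (an assumed convention)."""
    return mean(course_gains), stdev(course_gains)

ie_avg, ie_sd = average_of_averages(ie_course_gains)
t_avg, t_sd = average_of_averages(t_course_gains)

print(f"IE courses: <<g>> = {ie_avg:.3f}, sd = {ie_sd:.3f}")
print(f"T  courses: <<g>> = {t_avg:.3f}, sd = {t_sd:.3f}")
```

Note that each course counts once in <<g>>, regardless of its enrollment; the unit of analysis is the course, not the individual student.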

Peter claims that "A" and "B" above can't be established. What does he mean? IF he means that "A" and "B" (with my corrections) can't be established with complete 100% certainty for all time, then the same would be true for any purportedly causal relationship developed through *scientific* research, and his assertion would be correct but both trivial and inapplicable. I never argued that MCT's can measure anything that can be established with complete 100% certainty for all time.

The last sentence of the abstract of Hake (1998a) is: "The conceptual and problem-solving test results **strongly suggest** that the classroom use of IE methods can increase mechanics-course effectiveness well beyond that obtained in traditional practice. "

Shavelson & Towne (2002, p. 16) put the matter well:

"Mistakes are made as science moves forward. The process is not infallible [see Lakatos & Musgrave (1970)]; science advances through professional criticism and self correction. . . . .Popper (1959) argues that knowledge always remains conjectural and potentially revisable, largely by the process of testing (seeking refutations) that Popper (1965) himself described."

In Hake (2007c) I wrote [see that article for references other than Hake (1998 a,b; 2002a,b) and Shavelson & Towne (2002)]:

HHHHHHHHHHHHHHHHHHHHHHHHHHHHH
The approximately two-sigma superiority of IE over T courses in introductory mechanics [shown in Hake (1998a,b)] has been independently corroborated in hundreds of courses with widely varying types of instructors, institutions, and student populations [see e.g., the references in Hake (2002a,b)], thus satisfying Shavelson & Towne's (2002) fifth principle of good scientific practice [my CAPS]:

"Replicate and Generalize Across Studies: By one replication we mean, at an elementary level, that if one investigator makes a set of observations, another investigator can make a similar set of observations under the same conditions . . . . . . . At a somewhat more complex level, REPLICATION MEANS THE ABILITY TO REPEAT AN INVESTIGATION IN MORE THAN ONE SETTING (FROM ONE LABORATORY TO ANOTHER OR FROM ONE FIELD SITE TO A SIMILAR FIELD SITE) AND REACH SIMILAR CONCLUSIONS."
. . . . . . . . . . . . . . . . . . . . . . . .  .
. . . . . . . . . . . . . . . . . . . . . . . .  .
Although ignored by most PEP's [psychologists, education specialists, and psychometricians], and even by some physicists [for example, those contributing to McCray, DeHaan, & Schuck (2003)], the above indicated research [Hake (1998a,b)] has been noted positively by workers in many different disciplines [astronomy, biology, chemistry, cognitive science, communication, economics, engineering, geoscience, mathematics, medicine, physics, and even psychology !]. See e.g. : Marchese (1997); Swartz (1999); Heller (1999); Zeilik et al. (1999); Breslow (1999, 2000); Rothman & Narum (1999); Nelson (2000); Albacete & VanLehn (2000); Stokstad (2001); Morote & Pritchard (2002); Savinainen & Scott (2002a,b); Dancy & Beichner (2002); Powell (2003); Elliott (2003); Klymkowsky et al. (2003); Wood & Gentile (2003); McConnell et al. (2003); Evans et al., (2003); Hegedus & Kaput (2004); Handelsman et al. (2004); Pavelich et al. (2004); Khodor et al. (2004); DeHaan (2005); Buck & Wage (2005); Smith et al. (2005); Hilborn (2005); Moore (2005); Wieman & Perkins (2005); Heron & Meltzer (2005); Kluck (2005); Bardar et al. (2006); Froyd et al. (2006); Nuhfer (2006a,b); and Michael (2006).
HHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Furthermore in "Design-Based Research in Physics Education Research: A Review" [Hake (2007d)] I wrote [see that article, when published, for references other than Hake (1998a,b; 2002a,b)] :

HHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Average normalized gain differences between T and IE courses that are consistent with the work of Hake (1998a, 1998b, 2002a, 2002b) and Figure 1. . . [of Hake (1998a)]. . . have been reported by: Redish, Saul, & Steinberg, 1997; Saul, 1998; Francis, Adams, & Noonan, 1998; Heller, 1999; Redish & Steinberg, 1999; Redish, 1999; Beichner et al., 1999; Cummings, Marx, Thornton, & Kuhl, 1999; Novak, Patterson, Gavrin, & Christian, 1999; Bernhard, 2000; Crouch & Mazur, 2001; Johnson, 2001; Meltzer, 2002a, 2002b; Meltzer & Manivannan, 2002; Savinainen & Scott, 2002a, 2002b; Steinberg & Donnelly, 2002; Fagan, Crouch, & Mazur, 2002; Van Domelen & Van Heuvelen, 2002; Belcher, 2003; Dori & Belcher, 2004; Hoellwarth, Moelter, & Knight, 2005; Lorenzo, Crouch, & Mazur, 2006; and Rosenberg, Lorenzo, & Mazur, 2006.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Thus it would appear that Peter Campbell's [corrected] assertion that causal relationships between:

(a) "interactive engagement" and [normalized pre-to-post test gains] , and
(b) "traditional" course methods and [normalized pre-to-post test gains],
cannot be established is either:

a. trivial if "cannot be established" means "can't be established with complete 100% certainty for all time," or

b. problematic if "cannot be established" means "cannot be shown to have a reasonable likelihood of being correct."


33333333333333333333333333333333333333333333333
3. "It's entirely possible that students who were taught using "interactive engagement" scored better on MCT's. . . .[NO! on average achieved higher pre-to-posttest average normalized gains]. . . than students who were taught using "traditional" course methods due to something in the design of the "interactive engagement" curriculum/pedagogy. In other words, these students might have been better trained and prepared to take MCT's. You do not seem to control for this variable."

Could it really be that the approximately two-standard deviation superiority in average normalized gains of interactive engagement (IE) over traditional (T) courses was simply due to the fact that students subjected to IE courses became more expert in taking MCT's as the course progressed so that their posttest scores were elevated over their pretest scores by MCT smarts, rather than physics smarts?

If so, those who worked for years developing the IE methods [e.g., Collaborative Peer Instruction, Microcomputer-Based Laboratories, Concept Tests, Modeling, Active Learning Problem Sets, Overview Case Studies, and Socratic Dialogue Inducing Laboratories] in the courses that I surveyed will be disappointed that their methods failed to improve students' conceptual grasp of Newtonian mechanics more than traditional "direct instruction" courses with passive student lectures, recipe labs, and algorithmic homework and exam problems. However, not to worry, IE course developers can probably make a fortune preparing students for the MCT components of high-stakes tests such as the SAT's, GRE's, or NCLB-induced state tests.

But seriously, I think Peter may be hampered by his unfamiliarity with the nature of IE methods in physics. It may be worthwhile to quote the description of a fairly typical IE method [Hake (1992)] [see that article and Hake (2007c) for the references]:

HHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Socratic Dialogue Inducing (SDI) labs have been shown [Hake (1998a, 1998b - Table Ic)] to be relatively effective in guiding students to construct a coherent conceptual understanding of Newtonian mechanics. The SDI method might be characterized as "guided construction," rather than "guided discovery" or "inquiry." We think the efficacy of SDI labs is primarily due to the following essential features:

(1) interactive engagement of students who are induced to think constructively about simple Newtonian experiments which produce conflict with their commonsense understandings;

(2) the Socratic method [e.g., Arons (1973, 1974, 1990, 1993, 1997); Hake (1992, 2002 f,g,h,j)] of the *historical* Socrates [Vlastos (1990, 1991, 1993)], not Plato's alter ego in the "Meno"!, as mistakenly assumed by many - even some physicists; utilized by experienced instructors who have a good understanding of the material and are aware of common student preconceptions and failings;

(3) considerable interaction between students and instructors and thus a degree of individualized instruction;

(4) extensive use of multiple representations (verbal, written, pictorial, diagrammatic, graphical, and mathematical) to model physical systems;

(5) real world situations and kinesthetic sensations (which promote student interest and intensify cognitive conflict when students' direct sensory experience does not conform to their conceptions);

(6) cooperative group effort and peer discussions;

(7) repeated exposure to the coherent Newtonian explanation in many different contexts.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHH

I'm very doubtful that features "1" - "7" above were collectively:

(a) no more effective in enhancing students' conceptual understanding of Newtonian mechanics than traditional passive-student lecture courses with recipe labs and algorithmic-problem homework and exams;

(b) so effective in enhancing students' MCT taking abilities that even despite "a" above, 5 IE (SDI) courses [see Table 1c of Hake (1998b)] improved their posttest scores over their pretest scores such as to obtain an average of average normalized gains <<g>> = (0.60 plus or minus 0.04sd) compared to 14 T courses with <<g>> = (0.23 plus or minus 0.04sd), for a Cohen (1988) effect size d = 9.
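The effect size quoted in "b" can be checked with a short Python sketch. It assumes the common pooled-standard-deviation form of Cohen's d; the post cites Cohen (1988) but does not spell out the formula, so that choice and the function name are assumptions. The means, standard deviations, and course counts are the ones quoted above.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using a pooled standard deviation (assumed form)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var)

# Values quoted in the text: 5 IE (SDI) courses vs 14 T courses.
d = cohens_d(0.60, 0.04, 5, 0.23, 0.04, 14)
print(f"Cohen's d = {d:.2f}")
```

Because both groups have the same standard deviation (0.04), the pooled value is also 0.04 and d reduces to (0.60 - 0.23)/0.04, about 9.25, consistent with the rounded d = 9 quoted above.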


4444444444444444444444444444444444444444444
4. "Finally, it's also entirely possible that students who were taught using "interactive engagement" scored better on MCT's than students who were taught using "traditional" course methods due to the composition of the students in the "interactive engagement" group. Were the students randomly selected to each group? How large were the groups?"

The abstract of Hake (1998a) states: ". . .forty-eight courses (N = 4458) which made substantial use of IE methods achieved an average [normalized] gain <g>IE-ave = 0.48 ± 0.14 (std dev), almost two standard deviations of <g>IE-ave above that of the traditional courses." So, on average, the size of the IE groups was 4458/48 = 92.9, or about 93 students per course.

As to random selection, I'll repeat from my previous post "Re: Why MCT's? 2nd Try" [Hake (2007b)]: in "Should We Measure Change? Yes!" [Hake (2007c)] I wrote:

HHHHHHHHHHHHHHHHHHHHHHHHHH
THE VIEW FROM U.S. DEPARTMENT OF EDUCATION
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
"History" and maturation are among the nine threats to internal validity listed in Table 2.4 of Shadish et al. (2002), are discussed on pages 56-57 of that text, and are reiterated by the PEP. . . [Psychologist, Education specialist, Psychometrician]. . . dominated "Coalition for Evidence-Based Policy" (CEBP) at the U.S. Dept. of Education [USDE (2003)]:

USDE-USDE-USDE-USDE-USDE
There is persuasive evidence that the randomized controlled trial, when properly designed and implemented, is superior to other study designs in measuring an intervention's true effect.

1. "Pre-post" study designs often produce erroneous results. Definition: A "pre-post" study examines whether participants in an intervention improve or regress during the course of the intervention, and then attributes any such improvement or regression to the intervention.

The problem with this type of study is that, without reference to a control group, it cannot answer whether the participants' improvement or decline would have occurred anyway, even without the intervention. This often leads to erroneous conclusions about the effectiveness of the intervention.
USDE-USDE-USDE-USDE-USDE

But CEBP's criticism of pre/post testing is irrelevant for the recent pre/post studies in physics. The reason is that control groups HAVE been utilized - they are the introductory courses taught by the traditional method. The matching is due to the fact that (a) within any one institution the test [interactive engagement (IE)] and control [traditional (T)] groups are drawn from the same generic introductory course taken by relatively homogeneous groups of students, and (b) IE course teachers in all institutions are drawn from the same generic pool of introductory course teachers who, judging from uniformly poor average normalized gains <g> they obtain in teaching traditional (T) courses, do not vary greatly in their ability to enhance student learning.
HHHHHHHHHHHHHHHHHHHHHHHHHH

Furthermore, it's surprising that Peter appears to side with the USDE in their mistaken idea that randomized control trials (RCT's) are the gold standard of assessment.

In "Will the No Child Left Behind Act Promote Direct Instruction of Science?" [Hake (2005)], I gave, as one of the seven reasons why "Direct Science Instruction" threatens to predominate nationally under the aegis of the No Child Left Behind Act, the following [see that article for references other than Shavelson & Towne (2002)]:

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
MOST INTERACTIVE ENGAGEMENT AND GUIDED INQUIRY METHODS HAVE NOT BEEN TESTED IN RANDOMIZED CONTROL TRIALS (RCT'S), THE "GOLD STANDARD" OF THE U.S. DEPT. OF EDUCATION (USDE)

That a single research method should be designated as the "gold standard" for evaluating an intervention's effectiveness appears antithetical to the report of the NRC's Committee on Scientific Principles for Education Research [Shavelson & Towne (2002) - ST]. ST state that scientific research should "pose significant questions that can be investigated empirically," and "use methods that permit direct investigation of the questions."

Furthermore, the USDE's RCT gold standard is considered problematic by a wide array of scholars. Taking issue with the RCT gold standard are philosophers Dennis Phillips [Shavelson, Phillips, Towne, & Feuer (2003)] and Michael Scriven (2004); mathematicians Burkhardt & Schoenfeld (2003); engineer Woodie Flowers [Zaritsky, Kelly, Flowers, Rogers, Patrick (2003)]; and physicist Andrea diSessa [Cobb, Confrey, diSessa, Lehrer, & Schauble (2003)].

In addition, the following organizations oppose the RCT gold standard:
(a) American Evaluation Association (AEA)
      <http://www.eval.org/doestatement.htm>,

(b) American Educational Research Association (AERA)
      <http://www.eval.org/doeaera.htm>, and

(c) National Education Association
      <http://www.eval.org/doe.nearesponse.pdf> (88 kB).
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<rrhake@earthlink.net>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>

"Conflict is the gadfly of thought. It stirs us to observation and memory. It instigates to invention. It shocks us out of sheep-like passivity, and sets us at noting and contriving. Not that it always effects this result; but that conflict is a sine qua non of reflection and ingenuity."
   John Dewey "Morals Are Human," Dewey: Middle Works, Vol.14, p. 207.


REFERENCES [Tiny URL's courtesy <http://tinyurl.com/create.php>.]
Campbell, P. 2007a. "Re: Lauren Resnick and higher-order thinking skills," ARN-L post of 27 Jan 2007 12:10:35-0600; online at
<http://interversity.org/lists/arn-l/archives/Jan2007/msg00184.html>.

Campbell, P. 2007b. "Re: Why MCT's? 2nd Try," ARN-L post of 8 Mar 2007 14:20:55 -0600, online at
<http://interversity.org/lists/arn-l/archives/Mar2007_date/msg00035.html>.

Cohen, J. 1988. "Statistical power analysis for the behavioral sciences." Second edition. Lawrence Erlbaum.

Hake, R.R. 1992. "Socratic pedagogy in the introductory physics lab." Phys. Teach. 30: 546-552; updated version (4/27/98) online at <http://www.physics.indiana.edu/~sdi/SocPed1.pdf> (88 kB).

Hake, R.R. 1998a. "Interactive-engagement vs traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses," Am. J. Phys. 66(1): 64-74; online at
<http://www.physics.indiana.edu/~sdi/ajpv3i.pdf> (84 kB).

Hake, R.R. 1998b. "Interactive-engagement methods in introductory mechanics courses," online at <http://www.physics.indiana.edu/~sdi/IEM-2b.pdf> (108 kB) - a crucial companion paper to Hake (1998a).

Hake, R.R. 2002a. "Lessons from the physics education reform effort," Ecology and Society 5(2): 28; online at <http://www.ecologyandsociety.org/vol5/iss2/art28/>. Ecology and Society (formerly Conservation Ecology) is a free online "peer-reviewed journal of integrative science and fundamental policy research" with about 11,000 subscribers in about 108 countries.

Hake, R.R. 2002b. "Assessment of Physics Teaching Methods," Proceedings of the UNESCO ASPEN Workshop on Active Learning in Physics, Univ. of Peradeniya, Sri Lanka, 2-4 Dec. ; online at
<http://www.physics.indiana.edu/~hake/Hake-SriLanka-Assessb.pdf> (84 kB).

Hake, R.R. 2005. "Will the No Child Left Behind Act Promote Direct Instruction of Science?" Am. Phys. Soc. 50: 851 (2005); APS March Meeting, Los Angeles, CA. 21-25 March; online at
<http://www.physics.indiana.edu/~hake/WillNCLBPromoteDSI-3.pdf> (256 kB).

Hake, R.R. 2006. "Possible Palliatives for the Paralyzing Pre/Post Paranoia that Plagues Some PEP's" [PEP's = Psychometricians, Education specialists, and Psychologists], Journal of MultiDisciplinary Evaluation, Number 6, November, online at <http://evaluation.wmich.edu/jmde/JMDE_Num006.html>.

Hake, R.R. 2007a. "Re: Why MCT's? (was Lauren Resnick and higher-order thinking skills)," online only at the PhysLrnR archives <http://tinyurl.com/2rlyju>. Post of 6 Feb 2007 23:22:43-0600 to ARN-L and PhysLrnR. Unfortunately, as of today, the ARN-L archives for Feb 2007 at <http://interversity.org/lists/arn-l/archives/Feb2007_date/index.html> are incomplete, having been last updated on Feb 05 19:14:05 2007. This archive failure has not, to my knowledge been explained by the ARN-L list manager.

Hake, R.R. 2007b. "Re: Why MCT's? 2nd Try," ARN-L post of 7 Mar 2007 21:44:07 -0800, online at <http://interversity.org/lists/arn-l/archives/Mar2007/msg00033.html>. This is the second try to post the message Hake (2007a) to ARN-L.

Hake, R.R. 2007c. "Should We Measure Change? Yes!" online as ref. 43 at <http://www.physics.indiana.edu/~hake>. To appear as a chapter in "Evaluation of Teaching and Student Learning in Higher Education," a Monograph of the American Evaluation Association <http://www.eval.org/>. A severely truncated version appears at Hake (2006).

Hake, R.R. 2007d. "Design-Based Research in Physics Education Research: A Review," in Kelly & Lesh (2007).

Halloun, I. & D. Hestenes. 1985a. "The initial knowledge state of college physics students." Am. J. Phys. 53:1043-1055; online at <http://modeling.asu.edu/R&E/Research.html>. Contains the "Mechanics Diagnostic" test, precursor to the widely used "Force Concept Inventory" [Hestenes et al. (1992)].

Halloun, I. & D. Hestenes. 1985b. "Common sense concepts about motion." Am. J. Phys. 53:1056-1065; online at <http://modeling.asu.edu/R&E/Research.html>.

Hestenes, D., M. Wells, & G. Swackhamer, 1992. "Force Concept Inventory," Phys. Teach. 30: 141-158; online (except for the test itself) at <http://modeling.asu.edu/R&E/Research.html>. The 1995 revision by Halloun, Hake, Mosca, & Hestenes is online (password protected) at the same URL, and is available in English, Spanish, German, Malaysian, Chinese, Finnish, French, Turkish, Swedish, and Russian.

Kelly, A.E. & R.A. Lesh, eds. 2007. "Handbook: Design-Based Research in Education, " in preparation. Mahwah, NJ: Lawrence Erlbaum Associates.

Lakatos, I. and A. Musgrave, eds. 1970. "Criticism and the growth of knowledge." Cambridge University Press, information at <http://tinyurl.com/2lnyto>.

Popper, K. 1959. "The Logic of Scientific Discovery." Basic Books.

Popper, K. 1965. "Conjectures and Refutations." Basic Books.

Shadish, W.R., T.D. Cook, & D.T. Campbell. 2002. "Experimental and Quasi-Experimental Designs for Generalized Causal Inference." Houghton Mifflin. A goldmine of references to the social-science literature of experimentation. Amazon.com information at <http://tinyurl.com/yowod6>. Note the "Look inside this book feature."

Shavelson, R.J. & L. Towne, eds. 2002. "Scientific Research in Education," National Academy Press; online at <http://www.nap.edu/catalog/10236.html>.

USDE. 2003. U.S. Department of Education, "Identifying and Implementing Educational Practices Supported by Rigorous Evidence: A User Friendly Guide." Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. The entire guide is online at <http://www.ed.gov/rschstat/research/pubs/rigorousevid/rigorousevid.pdf> (140 KB).