Article Summary

The list of articles below was compiled by Marks Nester while researching his article, “An Applied Statistician's Creed,” which we recommend. Marks was a researcher with the Australian Department of Agriculture and Fisheries, but is now retired. We reprint this list with Marks’s permission. We are, of course, indebted to Marks for the use of this list, which undoubtedly took him many hours to compile.

Unlike the list compiled by Bill Thompson, not all of the 268 articles below are critical of null hypothesis significance testing (NHST). In fact, many articles do not tackle NHST directly at all, instead focusing on other statistical and analytic concepts. Nonetheless, Marks’s Creed article, which is based on the list below, is quite critical of NHST in general. It argues for a refocusing of analysis away from dichotomous use of NHST: “Acceptance of the creed forces a data analyst to focus on the important issues, and it reminds the analyst that there are many assumptions which must be examined to ensure that the analysis is sound and appropriate.”

This list is part of The Research’s larger project to document the many critiques of NHST and to apply those critiques in specific business contexts (e.g., A/B web testing).

Note that Marks’s citation style is slightly more formal than the one we have used elsewhere on this site. Also note that we are in the process of adding direct links to all 268 articles, although this will take some time.

Aitkin, M. (1983). Comment on Professor Prais's Paper. J. R. Statist. Soc. A 146(2) : 170-171.
Altman, D. G. (1985). Discussion of Dr Chatfield's paper. J. R. Statist. Soc. A 148, Part 3 : 242.
Anscombe, F. J. (1956). Discussion on Dr. David's and Dr. Johnson's Paper. J. Roy. Statist. Soc. B 18 : 24-27.
Anscombe, F. J. (1961). Bayesian statistics. American Statistician 15 : 21-24. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 379-385.
Anscombe, F. J. (1963). Tests of goodness of fit. Journal of the Royal Statistical Society, Series B, 25 : 81-94.
Arbuthnott, J. (1710). An argument for Divine Providence, taken from the constant regularity observ'd in the births of both sexes. Philosophical Transactions of the Royal Society 23 : 186-190.
Bahadur, R. R. and Robbins, H. (1950). The problem of the greater mean. Annals of Mathematical Statistics 21 : 469-487.
Bailar III, J. C. (1986). Science, statistics and deception. Annals of Internal Medicine 104 : 259-260.
Bailar III, J. C. (1995). A larger perspective. The American Statistician 49(1) : 10-11.
Bakan, D. (1967). The test of significance in psychological research. From Chapter 1 of On Method, Jossey-Bass, Inc. (San Francisco). Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Baker, R. J. (1980). Multiple comparison tests. Can. J. Plant Sci. 60 : 325-327.
Barndorff-Nielsen, O. (1977). Discussion of D. R. Cox's paper. Scand. J. Statist. 4 : 67-69.
Beaven, E. S. (1935). Discussion on Dr. Neyman's Paper. Journal of the Royal Statistical Society, Supplement 2 : 159-161.
Berger, J. O. and Sellke, T. (1987a). Testing a point null hypothesis: the irreconcilability of P values and evidence. Journal of the American Statistical Association 82(397) : 112-122.
Berger, J. O. and Sellke, T. (1987b). Rejoinder. Journal of the American Statistical Association 82(397) : 135-139.
Berger, J. O. and Wolpert, R. L. (1988). The Likelihood Principle. Second edition. Lecture Notes-Monograph Series, Volume 6, Institute of Mathematical Statistics (Hayward, California).
Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. J. Amer. Statist. Ass. 33 : 526-536.
Berkson, J. (1941). Comments on Dr. Madow's "Note on tests of departure from normality" with some remarks concerning tests of significance. Journal of the American Statistical Association 36 (216) : 539-541.
Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association 37(219) : 325-335.
Binder, A. (1959). Considerations of the place of assumptions in correlational analysis. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 164-171.
Binder, A. (1963). Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review 70 : 107-115. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 118-126.
Birnbaum, A. (1962). Another view on the foundations of statistics. American Statistician 16 : 17-21. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 363-370.
Boardman, T. J. (1994). The statistician who changed the world: W. Edwards Deming, 1900-1993. The American Statistician 48(3) : 179-187.
Bolles, R. C. (1962). The difference between statistical hypotheses and scientific hypotheses. Psychological Reports 11 : 639-645.
Bolles, R. and Messick, S. (1958). Statistical utility in experimental inference. Psychological Reports 4 : 223-227.
Boring, E. G. (1919). Mathematical vs. scientific significance. Psychological Bulletin 16(10) : 335-338.
Box, G. E. P. (1976). Science and statistics. J. Amer. Statist. Ass. 71 : 791-799.
Box, G. E. P. (1983). An apology for ecumenism in statistics. In Scientific Inference, Data Analysis, and Robustness, G. E. P. Box, T. Leonard and C. F. Wu (eds.), Academic Press, Inc. : 51-84.
Box, G. (1990). Commentary. Technometrics 32(3) : 251-252.
Box, J. F. (1980). R. A. Fisher and the Design of Experiments. The American Statistician 34(1) : 1-7.
Braithwaite, R. B. (1953). Scientific Explanation. A Study of the Function of Theory, Probability and Law in Science. Cambridge University Press.
Brewer, J. K. (1985). Behavioral statistics textbooks: Source of myths and misconceptions? Journal of Educational Statistics 10 : 252-268.
Bross, I. D. (1982). Simplicity and credibility: A counterstrategy. Statistics & Probability Letters 1 : 79-83.
Bryan-Jones, J. and Finney, D. J. (1983). On an error in "Instructions to Authors". HortScience 18(3) : 279-282.
Buchanan-Wollaston, H. J. (1935). The philosophic basis of statistical analysis. Journal of the International Council for the Exploration of the Sea 10 : 249-263.
Buchanan-Wollaston, H. J. (1936). The philosophic basis of statistical analysis. Journal of the International Council for the Exploration of the Sea 11 : 7-26.
Camilleri, S. F. (1962). Theory, probability, and induction in social research. American Sociological Review 27 : 170-178. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Camp, B. H. (1938). Further interpretations of the chi-square test. J. Amer. Statist. Ass. 33 : 537-542.
Campbell, D. T. (1972). Factors relevant to the validity of experiments in social settings. In ??? R. E. Kirk (ed.), Statistical Issues A Reader for the Behavioral Sciences, Wadsworth Publishing Company : 186-199. Reprinted from Psychological Bulletin (1957), 54 : 297-312.
Carmer, S. G. (1976). Optimal significance levels for application of the least significant difference in crop performance trials. Crop Science 16 : 95-99.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review 48 : 378-399.
Casella, G. and Berger, R. L. (1987a). Reconciling Bayesian and frequentist evidence in the one-sided testing problem. Journal of the American Statistical Association 82(397) : 106-111.
Casella, G. and Berger, R. L. (1987b). Rejoinder. Journal of the American Statistical Association 82(397) : 133-135.
Chatfield, C. (1985). The initial examination of data (with discussion). J. R. Statist. Soc. A 148, Part 3 : 214-253.
Chatfield, C. (1989). Comments on the paper by McPherson. Journal of the Royal Statistical Society, Series A, 152 : 234-238.
Chernoff, H. (1986). Comment. The American Statistician 40(1) : 5-6.
Chew, V. (1976a). Comparing treatment means: a compendium. HortScience 11(4) : 348-357.
Chew, V. (1976b). Uses and abuses of Duncan's multiple range test. Proc. Florida State Hort. Soc. 89 : 251-253.
Chew, V. (1977). Statistical hypothesis testing: an academic exercise in futility. Proc. Florida State Hort. Soc. 90 : 214-215.
Chew, V. (1980). Testing differences among means: correct interpretation and some alternatives. HortScience 15(4) : 467-470.
Clark, C. A. (1963). Hypothesis testing in relation to statistical methodology. Review of Educational Research 33 : 455-473.
Cochran, W. G. and Cox, G. M. (1957). Experimental Designs. 2nd edn. John Wiley & Sons, Inc.
Cohen, J. (1990). Things I have learned (so far). American Psychologist 45 : 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist 49 : 997-1003.
Connor, E. F. and Simberloff, D. (1986). Competition, scientific method, and null models in ecology. American Scientist 74 : 155-162.
Cormack, R. M. (1985). Discussion of Dr Chatfield's paper. J. R. Statist. Soc. A 148, Part 3 : 231-233.
Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics 29 : 357-372.
Cox, D. R. (1977). The role of significance tests. (With discussion). Scand. J. Statist. 4 : 49-70.
Cox, D. R. (1982). Statistical significance tests. Br. J. clin. Pharmac. 14 : 325-331.
Cox, D. R. and Snell, E. J. (1981). Applied Statistics Principles and Examples. Chapman and Hall.
Cox, D. R. and Wermuth, N. (1994). Tests of linearity, multivariate normality and the adequacy of linear scores. Applied Statistics 43(2) : 347-355.
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist 30 : 116-127.
Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin 69 : 161-182.
Dawkins, H. C. (1981). The misuse of t-tests, LSD and multiple-range tests. British Ecological Society Bulletin 12 : 112-115.
Derrick, T. (1976). The criticism of inferential statistics. Educational Research 19(1) : 35-40.
Dickey, J. M. (1987). Comment. Journal of the American Statistical Association 82(397) : 129-130.
Dooling, D. J. and Danks, J. H. (1975). Going beyond tests of significance: Is psychology ready? Bulletin of the Psychonomic Society 5(1) : 15-17.
Dudycha, A. L. and Dudycha, L. W. (1972). Behavioural statistics: an historical perspective. In Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 2-25.
Dyke, G. (1997). How to avoid bad statistics. Field Crops Research 51 : 165-187.
Edgeworth, F. Y. (1884). IV. The philosophy of chance. Mind 9 : 223-235.
Edgington, E. S. (1966). Statistical inference and nonrandom samples. Psychological Bulletin 66 : 485-487. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 146-149.
Edwards, W. (1965). Tactical note on the relation between scientific and statistical hypotheses. Psychological Bulletin 63 : 400-402. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 127-130.
Edwards, W., Lindman, H. and Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review 70 : 193-242.
Favre, A., Guitton, H., Guitton, J., Lichnerowicz, A. and Wolff, E. (1995). Chaos and Determinism. Johns Hopkins University Press.
Finney, D. J. (1988). Was this in your statistics textbook? III. Design and analysis. Expl Agric. 24 : 421-432.
Finney, D. J. (1989a). Was this in your statistics textbook? VI. Regression and covariance. Expl Agric. 25 : 291-311.
Finney, D. J. (1989b). Is the statistician still necessary? Biom. Praxim. 29 : 135-146.
Finney, D. J. (1992). Guest editorial: code for presentation of statistical analyses. Phil. Trans. R. Soc. Lond. B 337 : 381-382.
Finney, D. J. (1995a). A necessity for living or a source of nonsense? Trans IChemE 73 B Supplement : S15-17.
Finney, D. J. (1995b). Statistical science and effective scientific communication. Journal of Applied Statistics 22 : 293-308.
Finney, D. J. (1995c). Letter to the editor. Thoughts suggested by a recent paper: Questions on non-parametric analysis of quantitative data. Journal of Toxicological Sciences 20 : 165-170.
Finney, D. J. (1996). Make the numbers tell their story. Norwegian Journal of Agricultural Sciences Supplement 22 : 9-18.
Fisher, R. A. (1935a). The Design of Experiments. Oliver and Boyd (Edinburgh).
Fisher, R. A. and MacKenzie, W. A. (1923). Studies in crop variation. II. The manurial response of different potato varieties. Journal of Agricultural Science 13 : 311-320.
Fisher et al. (1935b). Discussion on Dr. Neyman's Paper. Journal of the Royal Statistical Society, Supplement 2 : 154-180.
Fisher, R. A. (1943). Note on Dr. Berkson's criticism of tests of significance. J. Amer. Statist. Ass. 38 : 103-104.
Freiman, J. A., Chalmers, T. C., Smith Jr., H. and Kuebler, R. R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. New England Journal of Medicine 299(13) : 690-694.
Gaito, J. (1958). The Bolles-Messick coefficient of utility. Psychological Reports 4 : 595-598.
Galton, F. (1888). On head growth in students at the University of Cambridge. Journal of the Anthropological Institute 18 : 155-157.
Gardner, M. J. and Altman, D. G. (1986). Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292 : 746-750.
Garsd, A. (1984). Spurious correlation in ecological modelling. Ecological Modelling 23 : 191-201.
Gauch Jr., H. G. (1988). Model selection and validation for yield trials with interaction. Biometrics 44 : 705-715.
Geary, D. N., Huntington, E. and Gilbert, R. J. (1992). Analysis of multivariate data from four clinical trials. J. R. Statist. Soc. A 155(1) : 77-79. (incomplete)
Geary, R. C. (1947). Testing for normality. Biometrika 34 : 209-242.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren and C. Lewis (eds.), A Handbook for Data Analysis in the Behavioural Sciences: Methodological Issues, Hillsdale, NJ: Erlbaum : 311-339.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J. and Krüger, L. (1989). The Empire of Chance. Cambridge University Press, Cambridge, England.
Gold, D. (1958). Comment on "A critique of tests of significance". American Sociological Review 23 : 85-86.
Gold, D. (1969). Statistical tests and substantive significance. The American Sociologist 4 : 42-46.
Good, I. J. (1978). Fallacies, Statistical. In International Encyclopedia of Statistics, W. H. Kruskal and J. M. Tanur (eds.), Free Press (New York) : 337-349.
Good, I. J. (1983). Good Thinking. The Foundations of Probability and Its Applications. University of Minnesota Press (Minneapolis).
Good, I. J. (1987). Comment. Journal of the American Statistical Association 82(397) : 125-128.
Goodman, S. N. and Royall, R. (1988). Evidence and scientific research. American Journal of Public Health 78(12) : 1568-1574.
Grant, D. A. (1962). Testing the null hypothesis and the strategy and tactics of investigating theoretical models. Psychological Review 69 : 54-61.
Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press (Massachusetts).
Greenwood, M. (1932). What is wrong with the medical curriculum? Lancet 1 : 1269-1270.
Gridgeman, N. T. (1959). The lady tasting tea, and allied topics. J. Amer. Statist. Ass. 54 : 776-783.
Guttman, L. (1977). What is not what in statistics. The Statistician 26 : 81-107.
Guttman, L. (1984). Secure prediction: The case of linear regression. Draft kindly supplied by ?????.
Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis 1 : 3-10.
Hahn, G. J. (1990). Commentary. Technometrics 32(3) : 257-258.
Harvey, P. H., Colwell, R. K., Silvertown, J. W. and May, R. M. (1983). Null models in ecology. Ann. Rev. Ecol. Syst. 14 : 189-211.
Hays, W. L. (1973). Statistics for the Social Sciences. Second edition. Holt, Rinehart and Winston.
Healy, M. J. R. (1978). Is statistics a science? J. R. Statist. Soc. A 141, Part 3 : 385-393.
Healy, M. J. R. (1989). Comments on the paper by McPherson. Journal of the Royal Statistical Society, Series A, 152 : 232-234.
Hinkley, D. V. (1987). Comment. Journal of the American Statistical Association 82(397) : 128-129.
Hodges Jr., J. L. and Lehmann, E. L. (1954). Testing the approximate validity of statistical hypotheses. Journal of the Royal Statistical Society, Series B, 16 : 261-268.
Hogben, L. (1957a). The contemporary crisis or the uncertainties of uncertain inference. Statistical Theory, W. W. Norton & Co., Inc. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Hogben, L. (1957b). Statistical prudence and statistical inference. Statistical Theory, W. W. Norton & Co., Inc. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Hotelling, H. (1951). The impact of R. A. Fisher on statistics. Journal of the American Statistical Association 46 : 35-46.
Hunter, J. S. (1990). Commentary. Technometrics 32(3) : 261.
Inman, H. F. (1994). Karl Pearson and R. A. Fisher on statistical tests: A 1935 exchange from Nature. The American Statistician 48(1) : 2-11.
Johnson, D. H. Significance testing: statistics as pseudoscience. Draft paper prepared for the Journal of Wildlife Management.
Johnson, S. B. and Berger, R. D. (1982). On the status of statistics in PHYTOPATHOLOGY. Phytopathology 72(8) : 1014-1015.
Johnstone, D. J. (1986). Tests of significance in theory and practice. (With discussion). The Statistician 35 : 491-504.
Jones, D. (1984). Use, misuse, and role of multiple-comparison procedures in ecological and agricultural entomology. Environmental Entomology 13(3) : 635-649.
Jones, D. and Matloff, N. (1986). Statistical hypothesis testing in biology: a contradiction in terms. Journal of Economic Entomology 79(5) : 1156-1160.
Katzer, J. and Sodt, J. (1973). An analysis of the use of statistical testing in communication research. Journal of Communication 23 : 251-265.
Kempthorne, O. (1966). Some aspects of experimental inference. Journal of the American Statistical Association 61(313) : 11-34.
Kempthorne, O. (1976). Of what use are tests of significance and tests of hypotheses. Commun. Statist.-Theor. Meth. A5(8) : 763-777.
Kempthorne, O. (1989). The fate worse than death and other curiosities and stupidities. The American Statistician 43(3) : 133-134.
Kendall, M. G. (1942). On the future of statistics. Journal of the Royal Statistical Society Part II, 105 : 69-80.
Kendall, M. G. (1959). Hiawatha designs an experiment. American Statistician 13 : 23-24. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 175-176.
Kiefer, J. (1977). The foundations of statistics - are there any? Synthese 36 : 161-176.
Kirk, R. E. (1972). Statistical Issues. A Reader for the Behavioural Sciences. Wadsworth Publishing Company.
Kish, L. (1959). Some statistical problems in research design. American Sociological Review, 24 : 328-338. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Kruskal, W. H. (1978). Significance, Tests of. In International Encyclopedia of Statistics, eds. W. H. Kruskal and J. M. Tanur, Free Press (New York) : 944-958.
Kruskal, W. (1980). The significance of Fisher: a review of R. A. Fisher: The Life of a Scientist. Journal of the American Statistical Association 75(372) : 1019-1030.
Kruskal, W. and Majors, R. (1989). Concepts of relative importance in recent scientific literature. The American Statistician 43(1) : 2-6.
Lad, F. (1996). Operational Subjective Statistical Methods: A Mathematical, Philosophical and Historical Introduction. John Wiley & Sons, Inc. (New York). ISBN 0‑471‑14329‑4.
LaForge, R. (1967). Confidence intervals or tests of significance in scientific research. Psychological Bulletin 68 : 446-447.
Lindley, D. V. (1986). Discussion. The Statistician 35 : 502-504.
Lindsey, J. K. (1996). Parametric Statistical Inference. Oxford University Press.
Lindsey, J. K. (1996). Some statistical heresies. Symposium on the Foundations of Statistical Inference in Honour of David Sprott, University of Waterloo, 3-4 October, 1996.
Linn, R. L. and Werts, C. E. (1969). Assumptions in making causal inferences from part correlations, partial correlations, and partial regression coefficients. Psychological Bulletin 72 : 307-310.
Little, T. M. (1978). If Galileo published in HortScience. HortScience 13(5) : 504-506.
Little, T. M. (1981). Interpretation and presentation of results. HortScience 16(5) : 637-640.
Loehle, C. (1987). Hypothesis testing in ecology: psychological aspects and the importance of theory maturation. Quarterly Review of Biology 62(4) : 397-409.
Luce, R. D. (1988). The tools-to-theory hypothesis. Review of G. Gigerenzer and D. J. Murray, "Cognition as intuitive statistics". Contemporary Psychology 33 : 582-583.
Luce, R. D. and Raiffa, H. (1957). Games and Decisions. John Wiley & Sons, Inc.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin 70 : 151-159. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group) : 267-279. Also reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 150-159.
Madden, L. V., Knoke, J. K. and Louie, R. (1982). Considerations for the use of multiple comparison procedures in phytopathological investigations. Phytopathology 72(8) : 1015-1017.
Mainland, D. (1960). The use and misuse of statistics in medical publications. Clinical Pharmacology and Therapeutics 1 : 411-422.
Mather, K. (1951). R. A. Fisher's Statistical Methods for Research Workers. Journal of the American Statistical Association 46 : 51-54.
Matloff, N. S. (1991). Statistical hypothesis testing: problems and alternatives. Environmental Entomology 20(5) : 1246-1250.
May, R. M. (1981). The role of theory in ecology. American Zoologist 21 : 903-910.
McCloskey, D. N. (1995). The insignificance of statistical significance. Scientific American 272(4) : 104-105.
McNemar, Q. (1960). At random: sense and nonsense. American Psychologist 15 : 295-300.
McPherson, G. (1989). The scientists' view of statistics - a neglected area. (Including discussion). Journal of the Royal Statistical Society, Series A, 152 : 221-240.
Medawar, P. B. (1969). Induction and Intuition in Scientific Thought. Jayne Lectures for 1968. Memoirs of the American Philosophical Society, Volume 75.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science 34 : 103-115. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology 46 (4) : 806-834.
Meehl, P. E. (1986). What social scientists don't understand. In D. W. Fiske and R. A. Shweder (eds.), Metatheory in Social Science: Pluralisms and Subjectivities, Chicago: University of Chicago Press : 315-338.
Meehl, P. E. (1990a). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66 (Monograph Supplement 1-V66) : 195-244.
Meehl, P. E. (1990b). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry 1 : 108-141.
Mitchell, P. L. (1997). Misuse of regression for empirical validation of models. Agricultural Systems 54(1) : 313-326.
von Mises, R. (1957). Probability, Statistics and Truth. Second English Edition, Revised. George Allen and Unwin Ltd. (London).
Mize, C. W. and Schultz, R. C. (1985). Comparing treatment means correctly and appropriately. Can. J. For. Res. 15 : 1142-1148.
Moore, D. S. and McCabe, G. P. (1989). Introduction to the Practice of Statistics. W. H. Freeman and Company (New York).
Moore, E. C. (Ed.) (1993). Charles S. Peirce and the Philosophy of Science. University of Alabama Press.
Morris, C. N. (1987). Comment. Journal of the American Statistical Association 82(397) : 131-133.
Morrison, D. E. and Henkel, R. E. (1969). Significance tests reconsidered. The American Sociologist 4 : 131-140. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Morrison, D. E. and Henkel, R. E. (Eds.) (1970). The Significance Test Controversy - A Reader. Aldine Publishing Company (Butterworth Group).
Morse, P. M. and Thompson, B. K. (1981). Presentation of experimental results. Can. J. Plant Sci. 61 : 799-802.
Moser, C. (1980). Statistics and public policy. Journal of the Royal Statistical Society, Series A, 143 : 1-31.
Müller-Hill, B. (1993). Science, truth and other values. Quarterly Review of Biology 68(3) : 399-407.
Murray, L. W. and Dosser Jr., D. A. (1987). How significant is a significant difference? Problems with the measurement of magnitude of effect. Journal of Counseling Psychology 34(1) : 68-72.
Natrella, M. G. (1960). The relation between confidence intervals and tests of significance. American Statistician 14 : 20-22, 33. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 113-117.
Nelder, J. A. (1971). Discussion on papers by Wynn, Bloomfield, O'Neill and Wetherill. Journal of the Royal Statistical Society, Series B, 33 : 244-246.
Nelder, J. A. (1985). Discussion of Dr Chatfield's paper. J. R. Statist. Soc. A 148, Part 3 : 238.
Nester, M. R. (1998). Significance tests cannot be justified in theory‑corroboration experiments. A review of S. L. Chow (1996), Statistical Significance: Rationale, Validity and Utility. Behavioral and Brain Sciences Journal 21(2) : 213.
Neyman, J. (1958). The use of the concept of power in agricultural experimentation. Journal of the Indian Society of Agricultural Statistics 9(1) : 9-17.
Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231 : 289-337.
Nunnally, J. (1960). The place of statistics in psychology. Educational and Psychological Measurement 20(4) : 641-650.
Oakes, M. (1986). Statistical Inference: A Commentary for the Social and Behavioural Sciences. John Wiley & Sons.
Olaloye, A. O. (1992). Measurement of capital usage in Nigerian industries. J. R. Statist. Soc. A 155(2) : 233-239. (incomplete)
O'Brien, T. C. and Shapiro, B. J. (1968). Statistical significance—what? Mathematics Teacher 61 : 673-676. Reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 109-112.
O'Neill, R. and Wetherill, G. B. (1971). The present state of multiple comparison methods. (Including discussion). Journal of the Royal Statistical Society, Series B, 33 : 218-250.
Ottenbacher, K. J. (1996). The power of replications and replications of power. The American Statistician 50(3) : 271-275.
Paterson, L. (1992). The influence of opportunity on aspirations among prospective university entrants from Scottish schools, 1970-88. J. R. Statist. Soc. A 155(1) : 37-60. (incomplete)
Pearce, S. C. (1992). Data analysis in agricultural experimentation. II. Some standard contrasts. Expl Agric. 28 : 375-383.
Pearson, E. S. (1948). Discussion of Dr. Wishart's paper 'The teaching of statistics'. Journal of the Royal Statistical Society, Series A 111 : 212-229.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series V, 1 : 157-175.
Pearson, K. (1920-21). Notes on the history of correlation. Biometrika 13 : 25-45.
Pearson, K. (???). On the general theory of the influence of selection on correlation and variation. Biometrika VIII : 437-443.
Pearson, K. (1935a). Statistical tests. Nature 136 : 296-297. (Not sighted, reproduced in H. F. Inman (1994). Karl Pearson and R. A. Fisher on statistical tests: A 1935 exchange from Nature. The American Statistician 48(1) : 2-11.)
Pearson, K. (1935b). Statistical tests. Nature 136 : 550. (Not sighted, reproduced in H. F. Inman (1994). Karl Pearson and R. A. Fisher on statistical tests: A 1935 exchange from Nature. The American Statistician 48(1) : 2-11.)
Perry, J. N. (1986). Multiple-comparison procedures: a dissenting view. Journal of Economic Entomology 79(5) : 1149-1155.
Petersen, R. G. (1977). Use and misuse of multiple comparison procedures. Agronomy Journal 69 : 205-208.
Platt, J. R. (1964). Strong inference. Science 146(3642) : 347-353.
Pratt, J. W. (1961). Review of E. L. Lehmann, Testing Statistical Hypotheses. Journal of the American Statistical Association 56 : 163-167.
Pratt, J. W. (1976). A discussion of the question: for what use are tests of hypotheses and tests of significance. Commun. Statist.-Theor. Meth. A5(8) : 779-787.
Pratt, J. W. (1987). Comment. Journal of the American Statistical Association 82(397) : 123-125.
Preece, D. A. (1982). The design and analysis of experiments: what has gone wrong? Utilitas Mathematica 21A : 201-244.
Preece, D. A. (1984a). Biometry in the Third World: science not ritual. Biometrics 40 : 519-523.
Preece, D. A. (1984b). Discussion of Dr Miller's Paper. Journal of the Royal Statistical Society, Series A, 147 : 419.
Preece, D. A. (1990). R. A. Fisher and experimental design: a review. Biometrics 46 : 925-935.
Ranstam, J. (1996). A common misconception about p-value and its consequences. Acta Orthopaedica Scandinavica 67 : 505-507.
Rao, C. R. (1989). Statistics and Truth. International Co-operative Publishing House, Fairland, Maryland, USA.
Rennie, D. (1978). Vive la différence. New England Journal of Medicine 299(15) : 828-829.
Robinson, W. S. (1951). The logical structure of analytic induction. American Sociological Review 16 : 812-818.
Rosenthal, R. and Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology 74(2) : 166-169.
Rosnow, R. L. and Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44 : 1276-1284.
Ross, N. P. (1995). What the government needs. The American Statistician 49(1) : 7-9.
Rothman, K. (1978). A show of confidence. New England Journal of Medicine 299 : 1362-1363.
Roughgarden, J. (1983). Competition and theory in community ecology. American Naturalist 122(5) : 583-601.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin 57 : 416-428. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Ryan, T. A. (1959). Multiple comparisons in psychological research. Psychological Bulletin 56 : 26-47. Abridged version in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 291-306.
Santner, T. J. and Tamhane, A. C., Eds. (1984). Design of Experiments. Ranking and Selection. Marcel Dekker (New York).
Savage, I. R. (1957). Nonparametric statistics. J. Amer. Statist. Ass. 52 : 331-344.
Sayn-Wittgenstein, L. (1965). Statistics - salvation or slavery? Forestry Chronicle 41 : 103-105.
Schervish, M. J. (1996). P values: what they are and what they are not. The American Statistician 50(3) : 203-206.
Schmidt, F. L. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist 47 : 1173-1181.
Schrödinger, E. (1992). What is Life? with Mind and Matter & Autobiographical Sketches. Canto edition. Cambridge University Press.
Schultz, B. B. (1989). In support of significance tests and of unplanned pairwise multiple comparisons. Environmental Entomology 18(6) : 901-907.
Schwertman, N. C. (1996). A connection between quadratic-type confidence limits and fiducial limits. The American Statistician 50(3) : 242-243.
Seewald, W. (1994). Time trend in historical controls for tumour incidences in long-term animal studies. Applied Statistics 43(1) : 127-137. (incomplete)
Selvin, H. C. (1957). A critique of tests of significance in survey research. American Sociological Review 22 : 519-527.
Shulman, L. S. (1970). Reconstruction of educational research. Review of Educational Research 40(3) : 371-396.
Simonoff, J. S. and Tsai, C.-L. (1994). Use of modified profile likelihood for improved tests of constancy of variance in regression. Applied Statistics 43(2) : 357-370.
Skipper Jr., J. K., Guenther, A. L. and Nass, G. (1967). The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. The American Sociologist 2 : 16-18. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group) : 155-160. Also reprinted in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 141-145.
Smith, C. A. B. (1960). Book review of Norman T. J. Bailey: Statistical Methods in Biology. Applied Statistics 9 : 64-66.
Solo, V. (1984). An Alternative to Significance Tests. Technical Report 84-14, Purdue University, Dept. of Statistics.
Spicer, C. C. (1982). Statistical decision theory and clinical trials. Br. J. clin. Pharmac. 14 : 765-768.
Sterling, T. D., Rosenbaum, W. L. and Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician 49(1) : 108-112.
Stevens, S. S. (1968). Measurement, statistics, and the schemapiric view. Science 161 : 849-856. Abridged in Statistical Issues, A Reader for the Behavioural Sciences, Ed. R. E. Kirk, 1972, Wadsworth Publishing Company : 66-78.
Stouffer, S. A. (1958). Karl Pearson - an appreciation on the 100th anniversary of his birth. Journal of the American Statistical Association 53 : 23-27.
Street, D. J. (1990). Fisher's contributions to agricultural statistics. Biometrics 46 : 937-945.
Strube, M. J. (1988). Some comments on the use of magnitude-of-effect estimates. Journal of Counseling Psychology 35(3) : 342-345.
"Student" (1908). The probable error of a mean. Biometrika 6(1) : 1-25.
Suppes, P. (1974). The place of theory in educational research. Educational Researcher 3(6) : 3-10.
Swijtink, Z. G. (1987). The objectification of observation: Measurement and statistical methods in the nineteenth century. In L. Krüger, L. Daston and M. Heidelberger (eds.), The Probabilistic Revolution: Vol. 1. Ideas in History, Cambridge, MA: MIT Press : 261-285.
Tamhane, A. C. (1996). Review of R. E. Bechhofer, T. J. Santner and D. M. Goldsman, Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons, John Wiley (New York), 1995. Technometrics 38 : 289-290.
Thompson, B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development 70 : 434-438.
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics 5 : 99-114.
Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics 33 : 1-67.
Tukey, J. W. (1973). The problem of multiple comparisons. Unpublished manuscript, Dept. of Statistics, Princeton University.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science 6 : 100-116.
Tullock, G. (1959). Publication decisions and tests of significance: A comment. J. Amer. Statist. Assoc. 54 : 593. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Tversky, A. and Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin 76(2) : 105-110.
Tyler, R. W. (1931). What is statistical significance? Educational Research Bulletin 10(5) : 115-118.
Upton, G. J. G. (1992). Fisher's exact test. J. R. Statist. Soc. A 155(3) : 395-402.
Vardeman, S. B. (1987). Comment. Journal of the American Statistical Association 82(397) : 130-131.
Venn, J. (1888). Cambridge anthropometry. Journal of the Anthropological Institute 18 : 140-154.
Walker, M. A. (1992). Arrest rates and ethnic minorities: a study in a provincial city. J. R. Statist. Soc. A 155(2) : 259-272.
Wang, C. (1993). Sense and Nonsense of Statistical Inference. Marcel Dekker, Inc.
Warren, W. G. (1986). On the presentation of statistical analysis: reason or ritual. Can. J. For. Res. 16 : 1185-1191.
Wilson, J. (1973). Three myths in educational research. Educational Research 16(1) : 17-19.
Wilson, W. R. and Miller, H. (1964). A note on the inconclusiveness of accepting the null hypothesis. Psychological Review 71 : 238-242.
Winch, R. F. and Campbell, D. T. (1969). Proof? No. Evidence? Yes. The significance of tests of significance. The American Sociologist 4 : 140-143. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).
Dr. Wishart (1935). Discussion on Dr. Neyman's Paper. Journal of the Royal Statistical Society, Supplement 2 : 157-159.
Yates, F. (1951). The influence of Statistical Methods for Research Workers on the development of the science of statistics. Journal of the American Statistical Association 46 : 19-34.
Yates, F. (1964). Sir Ronald Fisher and the design of experiments. Biometrics 20 : 307-321.
Yates, F. (1975). The early history of experimental design. In A Survey of Statistical Design and Linear Models, J. N. Srivastava (ed.), North-Holland Publishing Company : 581-592.
Youden, W. J. (1951). The Fisherian revolution in methods of experimentation. Journal of the American Statistical Association 46 : 47-50.
Young, A. (1770). A Course of Experimental Agriculture: Containing an Exact Register of All the Business Transacted during Five Years on Near Three Hundred Acres of Various Soils (2 vols). J. Dodsley (London).
Young, L. J. and Young, J. H. (1991). Alternative view of statistical hypothesis testing. Environmental Entomology 20(5) : 1241-1245.
Yule, G. U. (1897). On the theory of correlation. Journal of the Royal Statistical Society LX Part iv : 812-854.
Zeisel, H. (1955). The significance of insignificant differences. Public Opinion Quarterly 17 : 319-321. Reprinted in The Significance Test Controversy - A Reader, Eds. D. E. Morrison and R. E. Henkel, 1970, Aldine Publishing Company (Butterworth Group).