Negative comments about statistical significance and p-values

Article summary: This article is a collection of critical statements about p-values and statistical significance. It is not meant to imply that p-values have no value in science or in the analysis of results. However, many analysts outside the halls of academia believe that statistical significance is the “gold standard” for experimental results. This article is meant to guide those outside of statistics away from that view by sharing a long history of criticism. The proper use of p-values is still under debate within statistics, but there is broad agreement that statistical significance is not a “gold standard.” Rather, sound research combines thoughtful use of p-values with other statistical measures, such as effect sizes and intervals of precision like confidence intervals.
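
To make that recommendation concrete, here is a minimal sketch in Python of reporting an effect size and a confidence interval alongside the p-value. The data are simulated, and the group sizes, effect, and seed are illustrative assumptions, not taken from any study.

```python
import numpy as np
from scipy import stats

# Hypothetical data: outcomes for a treatment group and a control group.
rng = np.random.default_rng(0)
treatment = rng.normal(loc=0.4, scale=1.0, size=50)
control = rng.normal(loc=0.0, scale=1.0, size=50)

# The p-value alone answers only: how surprising is this difference under H0?
t_stat, p_value = stats.ttest_ind(treatment, control)

# Effect size (Cohen's d): how large is the difference, in pooled-SD units?
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

# 95% confidence interval for the mean difference (equal-variance t interval,
# matching the default assumptions of ttest_ind above).
diff = treatment.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / 50 + 1 / 50)
half_width = stats.t.ppf(0.975, df=98) * se
print(f"p = {p_value:.3f}, d = {cohens_d:.2f}, "
      f"95% CI for the difference: [{diff - half_width:.2f}, {diff + half_width:.2f}]")
```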


No p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical nonsignificance lead to the association or effect being improbable, absent, false, or unimportant.
— Ronald Wasserstein, Allen Schirm, & Nicole Lazar

Reference: “Moving to a World Beyond ‘p < 0.05’” [link], The American Statistician. Ronald Wasserstein, Executive Director of The American Statistical Association [link]; Allen Schirm, Vice President and Director of Human Services Research at Mathematica Policy Research (retired) [link]; & Nicole Lazar, Professor of Statistics at the University of Georgia and President Elect of the Caucus for Women in Statistics [link]. This article was part of The American Statistician’s March 2019 special edition, “Statistical Inference in the 21st Century: A World Beyond p < 0.05” [link].


Since the 1930s, many of our top methodologists have argued that significance tests are not conducive to science...Indeed, in the social sciences, the mindless ritual significance test is applied by researchers with little appreciation of its history and virtually no understanding of its actual meaning, and then—despite this alarming dearth of statistical insight—is held up as the hallmark of confirmatory evidence.
— Charles Lambdin

Reference: “Significance tests as sorcery: Science is empirical - significance tests are not” [link], Theory & Psychology, 2012. Charles Lambdin, Intel Corporation, former researcher at Wichita State University.


We...call for the entire concept of statistical significance to be abandoned.
— Valentin Amrhein, Sander Greenland, & Blake McShane on behalf of more than 800 signatories

Reference: “Scientists rise up against statistical significance” [link], Nature. Valentin Amrhein, Professor of Zoology at the University of Basel [link]; Sander Greenland, Professor Emeritus at the UCLA Fielding School of Public Health [link]; & Blake McShane, Associate Professor of Marketing at Northwestern’s Kellogg School of Management [link]. The full list of 854 scientists from 52 countries signing on to “Retire statistical significance” can be found here: [link].


Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold...by itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
— Ronald Wasserstein & Nicole Lazar on behalf of the American Statistical Association

Reference: “The American Statistical Association’s Statement on p-values: Context, Process, and Purpose” [link], The American Statistician, by Ronald Wasserstein, Executive Director of The American Statistical Association [link] & Nicole Lazar, Professor of Statistics at the University of Georgia and President Elect of the Caucus for Women in Statistics [link].


The process of turning data into insight is central to the scientific enterprise. It is therefore remarkable that the most widely used approach—null hypothesis significance testing (NHST)—has been subjected to devastating criticism for so long to so little effect.
— Robert Matthews

Reference: “Moving Towards the Post p < 0.05 Era via the Analysis of Credibility” [link], The American Statistician. Robert Matthews, Professor of Mathematics at Aston University [link]. This article was part of The American Statistician’s March 2019 special edition, “Statistical Inference in the 21st Century: A World Beyond p < 0.05” [link].


The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science, and technology.
— J.A. Nelder

Reference: “Statistics to statistical science” (1999), Journal of the Royal Statistical Society [link]. John Nelder (deceased), Visiting Professor at Imperial College London and Fellow of the Royal Society [link].


My personal view is that p-values should be relegated to the scrap heap and not considered by those who wish to think and act coherently.
— Dennis Lindley

Reference: Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting [link]. From a section by Dennis Lindley (deceased), Professor at University College London and Fellow of the American Statistical Association [link].


Even in situations where the hypothesis testing paradigm is correct, the common practice of basing inferences solely on p-values has been under intense criticism for over 50 years.
— Bayarri et al.

Reference: “Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses” (2016), Journal of Mathematical Psychology [link], by Bayarri et al.


Several methodologists have pointed out that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values.
— John Ioannidis

Reference: “Why Most Published Research Findings Are False” [link], PLoS Medicine, 2005. John P.A. Ioannidis of Stanford University, the C.F. Rehnborg Chair in Disease Prevention; Professor of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics; co-Director, Meta-Research Innovation Center at Stanford; Director of the PhD program in Epidemiology and Clinical Research [link].
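
The arithmetic behind Ioannidis’s argument is worth seeing once. The sketch below is a simplified illustration of his positive-predictive-value reasoning; the threshold, power, and prior odds are assumed values chosen for illustration, not figures from the paper.

```python
# Post-study probability that a "significant" finding is true, following the
# logic of Ioannidis (2005). All inputs are illustrative assumptions.
alpha = 0.05       # significance threshold (false positive rate under the null)
power = 0.80       # chance of detecting a real effect (1 - beta)
prior_odds = 0.1   # R: one in ten tested relationships is actually real

true_positives = power * prior_odds   # real effects correctly flagged
false_positives = alpha * 1.0         # nulls incorrectly flagged, per null tested
ppv = true_positives / (true_positives + false_positives)
print(f"Probability a significant result is a real effect: {ppv:.2f}")  # ~0.62
```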


Some have argued against change due to optimism, arguing that if we simply taught and used the NHST approach correctly all would be fine. We do not believe that the cognitive biases which p-values exacerbate can be trained away. Moreover, those with the highest levels of statistical training still regularly interpret p-values in invalid ways. Vulcans would probably use p-values perfectly; mere humans should seek safer alternatives.
— Robert Calin-Jageman & Geoff Cumming

Reference: “The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else Is Known” [link], The American Statistician. Robert Calin-Jageman, Associate Professor of Psychology and Discipline Director of Neuroscience [link] & Geoff Cumming, Emeritus Professor of Psychology at La Trobe University [link]. This article was part of The American Statistician’s March 2019 special edition, “Statistical Inference in the 21st Century: A World Beyond p < 0.05” [link].


We recommend dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences.
— Jennifer Tackett, Christian Robert, Andrew Gelman, David Gal, & Blakeley McShane

Reference: “Abandon Statistical Significance” [link], The American Statistician. Jennifer Tackett, Professor of Psychology and Director of Clinical Psychology at Weinberg College [link]; Christian Robert, Professor of Statistics at University of Warwick [link]; Andrew Gelman, Professor of Statistics and Director of the Applied Statistics Center at Columbia University [link]; David Gal, Professor of Marketing at the University of Illinois [link]; Blake McShane, Associate Professor of Marketing at Northwestern’s Kellogg School of Management [link]. This article was part of The American Statistician’s March 2019 special edition, “Statistical Inference in the 21st Century: A World Beyond p < 0.05” [link].


I believe that the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology.
— Paul Meehl

Reference: “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology” [link], Journal of Consulting and Clinical Psychology, 1978. Paul Meehl (deceased), Professor of Psychology at the University of Minnesota and past president of the American Psychological Association [link].


The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value. We explain why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.
— Lewis G Halsey, Douglas Curran-Everett, Sarah L Vowler & Gordon B Drummond

Reference: “The fickle P value generates irreproducible results” [link], Nature Methods [link], 2015. Lewis G Halsey, Professor of Life Sciences at the University of Roehampton, London, and head of the Roehampton University Behaviour and Energetics Lab (RUBEL) [link]; Douglas Curran-Everett, Division of Bioinformatics at the National Jewish Health Hospital and the Department of Biostatistics and Informatics at the University of Colorado Denver’s School of Public Health [link, link]; Sarah L Vowler, Cancer Research UK Cambridge Institute at the University of Cambridge [link]; Gordon B Drummond, Honorary Clinical Senior Lecturer of Anaesthesia at The University of Edinburgh [link].
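
A quick simulation illustrates how fickle p is. The sketch below is an illustration rather than Halsey et al.’s own code, assuming a true half-standard-deviation effect and 30 subjects per group: identical replications of the same experiment yield p-values scattered across several orders of magnitude, with only about half falling below 0.05.

```python
import numpy as np
from scipy import stats

# Replicate the same modestly powered experiment 1,000 times and record p.
rng = np.random.default_rng(1)
p_values = []
for _ in range(1000):
    a = rng.normal(0.5, 1.0, 30)  # true effect of 0.5 SD, n = 30 per group
    b = rng.normal(0.0, 1.0, 30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print(f"p ranges from {p_values.min():.5f} to {p_values.max():.2f}")
print(f"fraction significant at 0.05: {(p_values < 0.05).mean():.2f}")
```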


Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.
— Benjamin et al.

Reference: “Redefine statistical significance” [link], Nature Human Behaviour, 2017. Daniel Benjamin plus 71 coauthors signed on to a proposal to lower the statistical significance threshold to 0.005. Their proposal is not without controversy [link], but there has been little disagreement about the problem they are trying to solve.
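
A back-of-the-envelope calculation shows why. The sketch below applies the Sellke, Bayarri, and Berger (2001) upper bound on the Bayes factor, one of the results Benjamin et al. build on; the p-values fed in are illustrative.

```python
import math

# Upper bound on the Bayes factor against the null implied by a p-value
# (Sellke, Bayarri & Berger 2001); valid for p < 1/e.
def max_bayes_factor(p):
    return 1.0 / (-math.e * p * math.log(p))

for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: evidence against the null is at most {max_bayes_factor(p):.1f} to 1")
# p = 0.05 caps the odds at roughly 2.5 to 1, far weaker than "significant" suggests.
```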


In formal statistical testing, the crude dichotomy of ‘pass/fail’ or ‘significant or not’ will scarcely do. We must determine the magnitudes (and directions) of any statistical discrepancies warranted, and the limits to any substantive claims you may be entitled to infer from the statistical ones.
— Deborah Mayo

Reference: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars [link], 2018. Deborah Mayo, Professor Emerita of Philosophy of Science at Virginia Tech [link].


We have saddled ourselves with perversions of logic—p-values—and so we deserve our collective fate. I forgive nonstatisticians who cannot provide a correct interpretation of p < 0.05. p-values are fundamentally un-understandable. I cannot forgive statisticians who give understandable—and therefore wrong—definitions of p-values to their nonstatistician colleagues.
— Donald Berry

Reference: “A p-Value to Die For” [link], Journal of the American Statistical Association, 2017. Donald Berry, Professor of Biostatistics at The University of Texas MD Anderson Cancer Center [link].


There are no good uses for [p-values]; indeed, every use either violates frequentist theory, is fallacious, or is based on a misunderstanding.
— William Briggs

Reference: “The Substitute for p-Values” [link], Journal of the American Statistical Association, 2017. William Briggs, Assistant Professor of Statistics, Weill Medical College of Cornell University [link].


There is a long line of work documenting how applied researchers misuse and misinterpret p-values in practice.
— Blakeley McShane and David Gal

Reference: “Statistical Significance and the Dichotomization of Evidence” [link], Journal of the American Statistical Association, 2017. David Gal, Professor of Marketing at the University of Illinois [link]; Blake McShane, Associate Professor of Marketing at Northwestern’s Kellogg School of Management [link].


Contrary to common dogma, tests of statistical null hypotheses have relatively little utility in science and are not a fundamental aspect of the scientific method.
— David Anderson, Kenneth Burnham, & William Thompson

Reference: “Null Hypothesis Testing: Problems, Prevalence, and an Alternative,” Journal of Wildlife Management, 2000. David Anderson (employment unknown), former scientist at the Cooperative Fish and Wildlife Research Unit [link]; Kenneth Burnham (retired), former Senior Scientist with the United States Geological Survey [link]; William Thompson, Adjunct Professor in Natural Resources at the University of Rhode Island, National Park Service Research Coordinator for the North Atlantic Coast Cooperative Ecosystem Studies Unit [link].


Few forecasting researchers or practitioners are aware that there is no empirical evidence supporting the use of statistical significance tests. Despite repeated calls for evidence, no one has shown that the applications of tests of statistical significance improve decision-making or advance scientific knowledge.
— Scott Armstrong

Reference: “Statistical Significance Tests are Unnecessary Even When Properly Done and Properly Interpreted: Reply to Commentaries” [link], International Journal of Forecasting, 2007. J. Scott Armstrong [link], Professor of Marketing at the University of Pennsylvania’s Wharton Business School, cofounder of the Journal of Forecasting [link], International Journal of Forecasting [link], International Institute of Forecasters [link], International Symposium on Forecasting [link], and PollyVote.com [link].


It is almost impossible to drag authors away from their p-values, and the more zeros after the decimal point, the harder people cling to them. It is almost as if all the statistics courses in the world stopped after introducing Type I error. ...Perhaps p-values are like mosquitos. They have an evolutionary niche somewhere and no amount of scratching, swatting, or spraying will dislodge them.
— John Campbell

Reference: “Editorial: Some Remarks From the Outgoing Editor” [link], Journal of Applied Psychology, 1982. John Campbell, Professor of Psychology at the University of Minnesota and former editor of the Journal of Applied Psychology [link].


[U]sing statistical significance tests does not (and cannot) make scientists (or their science) objective.
— Bruce Thompson

Reference: “Statistical Significance Tests, Effect Size Reporting and the Vain Pursuit of Pseudo-Objectivity”, research article, 1999. Bruce Thompson (retired), former Distinguished Professor of Educational Psychology and Library Sciences at Texas A&M University and former Adjunct Professor of Community Medicine at the Baylor College of Medicine [link].


Despite their wide use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research...The issue was highlighted at the 1998 annual conference of The Wildlife Society, in Buffalo, New York, where the Biometrics Working Group sponsored a half-day symposium on Evaluating the Role of Hypothesis Testing-Power Analysis in Wildlife Science. Speakers at that session who addressed statistical hypothesis testing were virtually unanimous in their opinion that the tool was overused, misused, and often inappropriate.
— Douglas Johnson

Reference: "The Insignificance of Statistical Significance Testing" [link], USGS Northern Prairie Wildlife Research Center, 1999. Douglas Johnson, Researcher at the USGS Northern Prairie Wildlife Research Center [link].


The third objective was to examine how the textbooks’ presentations relate to current best practices and how much help they provide for students. The results show that almost all of the textbooks fail to acknowledge that there is controversy surrounding NHST. Most of the textbooks dealt, at least minimally, with the alleged misconceptions of interest, but they provided relatively little help for students.
— Jeffrey Gliner, Nancy Leech, & George Morgan

Reference: “Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?” [link], The Journal of Experimental Education, 2002. Jeffrey Gliner, Professor Emeritus, Occupational Therapy at Colorado State University [link]; Nancy Leech, Professor of Education and Human Development at the University of Colorado, Denver [link]; George Morgan, Professor Emeritus, Education and Human Development at Colorado State University [link].


[F]ocusing on p-values and rejecting null hypotheses actually distracts us from our real goals: deciding whether data support our scientific hypotheses and are practically significant.
— Roger Kirk

Reference: “The importance of effect magnitude” [link], Handbook of Research Methods in Experimental Psychology, 2003. Roger Kirk (retired), former Distinguished Professor of Psychology and Statistics at Baylor University [link].


After 4 decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion—still persists. This article reviews the problems with this practice, including its near-universal misinterpretation of p...[But] even a correct interpretation of p-values does not achieve very much, and has not for a long time.
— Jacob Cohen

Reference: “The Earth Is Round (p < .05)” [link], American Psychologist, 1994. Jacob Cohen (deceased) [link], former Professor of Psychology at New York University and head of the Quantitative Psychology Group; 1997 winner of the Distinguished Lifetime Achievement Award by the American Psychological Association; fellow of the American Association for the Advancement of Science, the American Psychological Association, and the American Statistical Association; inventor of three statistical measures: Cohen’s d, Cohen’s h, and Cohen’s kappa.


The conventional procedure for null hypothesis significance testing has long been the target of appropriate criticism.
— John Tukey & Lyle Jones

Reference: “A Sensible Formulation of the Significance Test” [link], Psychological Methods, 2000. John Tukey (deceased) [link], Professor and Founding Chairman of the Princeton University Statistics department; AT&T Bell Labs researcher; recipient of the National Medal of Science; recipient of the IEEE Medal of Honor; inventor of the box plot, the Fast Fourier Transform, and the Tukey range test; credited with coining the term “bit”. Lyle Jones (deceased) [link], Professor of Psychology at the University of Chicago, the University of Texas, and the University of North Carolina at Chapel Hill; Director of UNC’s Psychometric Laboratory; Managing editor of Psychometrika; President of the Psychometric Society.


P-values are a practical success but a critical failure. Scientists the world over use them, but scarcely a statistician can be found to defend them. Bayesians in particular find them ridiculous, but even the modern frequentist has little time for them.
— Stephen Senn

Reference: “Two cheers for P-values?” [link], Journal of Epidemiology and Biostatistics, 2001. Stephen Senn [link], statistical consultant; former Professor in the Department of Epidemiology and Public Health and the Department of Statistics at University College London.


...a common informal use of P values as measures of support or evidence for hypotheses has serious logical flaws.
— Mark Schervish

Reference: “P Values: What They Are and What They Are Not” [link], The American Statistician, 1996. Mark Schervish [link], Emeritus Professor of Statistics at Carnegie Mellon University.


In the null hypothesis schema we are trying only to nullify something .... But ordinarily evidence does not take this form. With the corpus delicti in front of you, you do not say, ‘Here is evidence against the hypothesis that no one is dead’. You say, ‘Evidently, someone has been murdered’.
— Joseph Berkson

Reference: “Tests of Significance Considered as Evidence” [link], Journal of the American Statistical Association, 1942. Joseph Berkson [link], former head of the Division of Biometry and Medical Statistics at the Mayo Clinic.


This paper started life as an attempt to defend p-values … I have, however, been led inexorably to the opposite conclusion, that the current use of p values as the ‘main means’ of assessing and reporting the results of clinical trials is indefensible.
— Peter Freeman

Reference: “The role of p-values in analysing trial results” [link], Statistics in Medicine, 1993. Peter Freeman [link] (no additional information found).


Statistical significance is perhaps the least important attribute of a good experiment; it is never a sufficient condition for claiming that a theory has been usefully corroborated, that a meaningful empirical fact has been established, or that an experimental report ought to be published.
— David Lykken

Reference: “Statistical Significance in Psychological Research” [link], Psychological Bulletin, 1968. David Lykken [link], Professor Emeritus of Psychology and Psychiatry at the University of Minnesota, Fellow of the American Association for the Advancement of Science, Fellow of the American Psychological Association, and Charter Fellow of the American Psychological Society.


Logically and conceptually, the use of statistical significance testing in the analysis of research data has been thoroughly discredited...Statistical significance testing retards the growth of scientific knowledge; it never makes a positive contribution.
— Frank Schmidt & John Hunter

Reference: “Eight Common But False Objections to the Discontinuation of Significance Testing in the Analysis of Research Data” [link], white paper, 1997. Frank Schmidt, former Professor of Psychology at the University of Iowa, won multiple scientific awards and sat on the editorial boards of eight different research journals [link]; John Hunter (deceased), former Professor of Psychology at Michigan State University, winner of the Distinguished Scientific Award for Contributions to Applied Psychology (joint with Frank L. Schmidt) and the Distinguished Scientific Contributions Award from the Society for Industrial and Organizational Psychology (SIOP) (also joint with Schmidt).


...the test of significance in psychological research may be taken as an instance of a kind of essential mindlessness in the conduct of research...
— David Bakan

Reference: “The Test of Significance in Psychological Research” [link], Psychological Bulletin, 1966. David Bakan (deceased), Professor of Psychology at several institutions including the University of Chicago, Ohio State, Harvard, and York University in Toronto, Canada [link].


Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance.
— Ronald Carver

Reference: “The Case Against Statistical Significance Testing” [link], Harvard Educational Review, 1978. Ronald Carver (deceased), former Professor of Reading Research at the University of Missouri [link].


It has been widely felt, probably for 30 years and more, that significance tests are overemphasized and often misused and that more emphasis should be put on estimation and prediction.
— David Cox

Reference: “Some General Aspects of the Theory of Statistics” [link], International Statistical Review, 1986. David Cox, former Chair of Statistics at Imperial College London and member of the Department of Statistics at Oxford University; winner of numerous awards including the prestigious International Prize in Statistics; pioneer of numerous statistical tools including binary logistic regression and proportional hazards models. Cox was a constant critic of significance tests, for example in Cox 1977 [link], where he stated that “Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned” and “The continued very extensive use of significance tests is alarming.”


Our position is that the prevalence of the significance-testing practice is due not only to mindlessness and the force of habit...there are profound psychological reasons leading scholars to believe that they cope with the question of chance and minimize their uncertainty via producing a significant result.
— Ruma Falk and Charles Greenbaum

Reference: “Significance Tests Die Hard: The Amazing Persistence of a Probabilistic Misconception” [link], Theory & Psychology, 1995. Ruma Falk, Professor Emeritus of Psychology at The Hebrew University of Jerusalem [link]; Charles Greenbaum, Professor Emeritus of Psychology at The Hebrew University of Jerusalem [link].


We shall marshal arguments against such [statistical significance] testing, leading to the conclusion that it be abandoned by all substantive science and not just by educational research and other social sciences which have begun to raise voices against the virtual tyranny of this branch of inference in the academic world.
— Louis Guttman

Reference: “The illogic of statistical inference for cumulative science” [link], Applied Stochastic Models and Data Analysis, 1985. Louis Guttman, Professor of Social and Psychological Assessment at the Hebrew University of Jerusalem, winner of the Israel Prize in the social sciences (1978), winner of the Educational Testing Service Measurement Award from Princeton University (1984) [link].


Despite the plethora of critiques of statistical significance testing, most psychologists understand them poorly, frequently use them inappropriately, and pay little attention to the controversy they have generated.
— Brian Haig

Reference: “The philosophy of quantitative methods” [link], Oxford handbook of Quantitative Methods, 2012. Brian Haig, Professor of Psychology at University of Canterbury.


It seems inconceivable to admit that a methodology as bereft of value as SST [statistical significance testing] has survived, as the centerpiece of inductive inference no less, [after] more than four decades of criticism in the psychology literature.
— Raymond Hubbard and Patricia Ryan

Reference: “The Historical Growth of Statistical Significance Testing in Psychology–and Its Future Prospects” [link], Educational and Psychological Measurement, 2000. Raymond Hubbard, Professor Emeritus of Marketing at Drake University [link]; Patricia Ryan, Associate Professor of Finance and Real Estate at Colorado State University [link].


I argue that the criticisms have sufficient merit to support the minimization or elimination of NHST in the behavioral sciences.
— Rex Kline

Reference: Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research (book) [link], American Psychological Association, 2004. Rex Kline, Professor of Psychology at Concordia University [link].


[T]he underlying problem is that NHST results are almost universally misinterpreted in the social-science literature. Along with this fundamental misinterpretation, there has grown a supporting mythology about statistical significance...Collectively, NHST and its emergent mythology have created a statistically and scientifically unsound basis for the evaluation and publication of business school research.
— John Kmetz

Reference: “Fifty Lost Years: Why International Business Scholars Must Not Emulate the US Social-science Research Model” [link], World Journal of Management, 2011. John Kmetz, former Associate Professor of Business at the University of Delaware [link].


On the contrary, I believe that reliance on NHST has channeled our field into a series of methodological cul-de-sacs.
— Geoffrey Loftus

Reference: “Psychology will be a Much Better Science When We Change the Way We Analyze Data” [link], Current Directions in Psychological Science, 1996. Geoffrey Loftus, Professor of Psychology at the University of Washington [link].


We...say that a finding of ‘statistical’ significance...is on its own almost valueless, a meaningless parlor game.
— Stephen Ziliak and Deirdre McCloskey

Reference: The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives [link], University of Michigan Press, 2008. Stephen Ziliak, faculty member of the Angiogenesis Foundation, conjoint Professor of Business and Law at the University of Newcastle in Australia, and Professor of Economics at Roosevelt University [link]; Deirdre McCloskey, Distinguished Professor of Economics, History, English, and Communication at the University of Illinois at Chicago [link].


Even the strongest proponents [of statistical significance testing] agree that there is much misuse, misinterpretation, and meaningless use of the tests.
— Ramon Henkel and Denton Morrison

Reference: The Significance Test Controversy [link, link], Routledge Press, 1970. Denton Morrison (deceased), Professor of Sociology at Michigan State University [link]; Ramon Henkel, Professor of Sociology at the University of Maryland [link].


Clearly, point hypothesis testing has no place in statistical practice if we adopt the creed. This means that most paired and unpaired t-tests, analyses of variance (except for estimating variance components), linear contrasts and multiple comparisons, and tests of significance for correlation and regression coefficients should be avoided by statisticians and discarded from the scientific literature.
— Marks Nester

Reference: “An Applied Statistician's Creed” [link], Journal of the Royal Statistical Society, 1996. Marks Nester (retired), researcher with the Australian Department of Agriculture and Fisheries [link].


It may not be an exaggeration to say that for many PhD students, for whom the .05 alpha has acquired almost an ontological mystique, it can mean joy, a doctoral degree, and a tenure track position at a major university if their dissertation p is less than .05. However, if the p is greater than .05, it can mean ruin, despair...
— Ralph Rosnow and Robert Rosenthal

Reference: “Statistical Procedures and the Justification of Knowledge in Psychological Science” [link], American Psychologist, 1989. Ralph Rosnow (retired), Professor of Psychology at Temple University [link]; Robert Rosenthal, Professor of Psychology at the University of California, Riverside [link].


The thesis to be advanced is that despite the awesome pre-eminence this method has attained in our experimental journals and textbooks of applied statistics, it is based upon a fundamental misunderstanding of the nature of rational inference, and is seldom if ever appropriate to the aims of scientific research.
— William Rozeboom

Reference: “The Fallacy of the Null-Hypothesis Significance Test” [link], Psychological Bulletin, 1960. William Rozeboom, Professor of Psychology at the University of Alberta [link].


Significance testing of null hypotheses is the standard epistemological method for advancing scientific knowledge in psychology, even though it has drawbacks and it leads to common inferential mistakes...Although these mistakes have been discussed repeatedly for decades, there is no evidence that the academic discussion has had an impact.
— Patrick Shrout

Reference: “Should Significance Tests be Banned? Introduction to a Special Section Exploring the Pros and Cons” [link], Psychological Science, 1997. Patrick Shrout, Professor of Psychology at New York University.


The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction....The statistical textbooks, with only some exceptions, do not even mention the NHST controversy. Instead, the myth is spread that NHST is the “natural” final action of scientific inference and the only procedure for testing hypotheses. However, relevant specialists and important regulators of the scientific world advocate avoiding them.
— Luis Carlos Silva-Ayçaguer, Patricio Suárez-Gil, and Ana Fernández-Somoano

Reference: “The null hypothesis significance test in health sciences research (1995-2006): statistical analysis and interpretation” [link], BMC Medical Research Methodology, 2010. Luis Carlos Silva-Ayçaguer, Professor at Centro Nacional de Investigación de Ciencias Médicas [link]; Patricio Suárez-Gil, Hospital de Cabueñes, Servicio de Salud del Principado de Asturias (SESPA) [link]; Ana Fernández-Somoano, CIBER Epidemiología y Salud Pública (CIBERESP), Spain, and Departamento de Medicina, Unidad de Epidemiología Molecular del Instituto Universitario de Oncología.


It has been stated that the P-value is perhaps the most misunderstood statistical concept in clinical research. As in the social sciences, the tyranny of SST is still highly prevalent in the biomedical literature even after decades of warnings against SST. The ubiquitous misuse and tyranny of SST threatens scientific discoveries and may even impede scientific progress. In the worst case, misuse of significance testing may even harm patients who eventually are incorrectly treated because of improper handling of P-values.
— Andreas Stang, Charles Poole, and Oliver Kuss

Reference: “The ongoing tyranny of statistical significance testing in biomedical research” [link], European Journal of Epidemiology, 2010. Andreas Stang, Professor of Epidemiology at the Medical Faculty, University of Duisburg-Essen [link]; Charles Poole, Professor of Epidemiology at the University of North Carolina [link]; Oliver Kuss, former Acting Director of the Institute of Medical Statistics, Düsseldorf University Hospital and Medical Faculty of the Heinrich Heine University Düsseldorf [link].


It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions.
— Tom Siegfried

Reference: “Odds Are, It's Wrong” [link], Science News, 2010. Tom Siegfried, former Managing Editor of Science News [link].


This paper shows how p-values do not only create, as well known, wrong expectations in the case of flukes, but they might also dramatically diminish the ‘significance’ of most likely genuine signals.
— Giulio D’Agostini

Reference: “The Waves and the Sigmas (To Say Nothing of the 750 GeV Mirage)” [link], white paper, 2016. Giulio D’Agostini, Università “La Sapienza” and INFN [link].


Our unfortunate historical commitment to significance tests forces us to rephrase these good questions in the negative, attempt to reject those nullities, and be left with nothing we can logically say about the questions...
— Peter Killeen

Reference: “An Alternative to Null-Hypothesis Significance Tests” [link], Psychological Science, 2006. Peter Killeen, Professor of Psychology at Arizona State University, Fellow of the American Psychological Association, the American Psychological Society, and the Association for Behavior Analysis; former President of the Society for Quantitative Analysis of Behavior [link].