Difference between revisions of "FND-STA-Significance"

From "A B C"
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
<div id="BIO">
+
<div id="ABC">
  <div class="b1">
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce; font-size:300%; font-weight:400; color: #000000; width:100%;">
 
Significance
 
Significance
  </div>
+
<div style="padding:5px; margin-top:20px; margin-bottom:10px; background-color:#b3dbce; font-size:30%; font-weight:200; color: #000000; ">
 
+
(Probability and p-values; significance as a threshold of p-values; deriving probability distributions from simulation and interpreting in terms of significance.)
  {{Vspace}}
+
</div>
 
 
<div class="keywords">
 
<b>Keywords:</b>&nbsp;
 
Probability and p-values; significance as a threshold of p-values; deriving probability distributions from simulation and interpreting in terms of significance.
 
 
</div>
 
</div>
  
{{Vspace}}
+
{{Smallvspace}}
 
 
 
 
__TOC__
 
 
 
{{Vspace}}
 
 
 
 
 
{{LIVE}}
 
 
 
{{Vspace}}
 
  
  
</div>
+
<div style="padding:5px; border:1px solid #000000; background-color:#b3dbce33; font-size:85%;">
<div id="ABC-unit-framework">
+
<div style="font-size:118%;">
== Abstract ==
+
<b>Abstract:</b><br />
 
<section begin=abstract />
 
<section begin=abstract />
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "abstract" -->
 
 
The probability of an event is the chance of it occurring, but how do we relate that to the question whether an observation is significant? In this context we talk about ''p''-values and the meaning of a ''p''-value is not the same as the probability of an observation. The ''p''-value of an observation is the probability that - assuming a null hypothesis is true - an event as extreme or more extreme is observed. This unit contains R code to study this concept.
 
The probability of an event is the chance of it occurring, but how do we relate that to the question whether an observation is significant? In this context we talk about ''p''-values and the meaning of a ''p''-value is not the same as the probability of an observation. The ''p''-value of an observation is the probability that - assuming a null hypothesis is true - an event as extreme or more extreme is observed. This unit contains R code to study this concept.
 
<section end=abstract />
 
<section end=abstract />
 
+
</div>
{{Vspace}}
+
<!-- ============================ -->
 
+
<hr>
 
+
<table>
== This unit ... ==
+
<tr>
=== Prerequisites ===
+
<td style="padding:10px;">
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "prerequisites" -->
+
<b>Objectives:</b><br />
<!-- included from "ABC-unit_components.wtxt", section: "notes-prerequisites" -->
 
You need to complete the following units before beginning this one:
 
*[[FND-STA-Probability_distribution]]
 
 
 
{{Vspace}}
 
 
 
 
 
=== Objectives ===
 
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "objectives" -->
 
 
* Introduce the difference between ''p''-values and event probability;
 
* Introduce the difference between ''p''-values and event probability;
 
* Show how we interpret ''p''-values in terms of "significance";
 
* Show how we interpret ''p''-values in terms of "significance";
Line 52: Line 28:
 
* Present a permutation example, a strategy that can be used as an alternative to the integration of probability density functions.
 
* Present a permutation example, a strategy that can be used as an alternative to the integration of probability density functions.
 
* Discuss a common error that is made when establishing the significance of an observation in the biomedical sciences.
 
* Discuss a common error that is made when establishing the significance of an observation in the biomedical sciences.
 
+
</td>
{{Vspace}}
+
<td style="padding:10px;">
 
+
<b>Outcomes:</b><br />
 
 
=== Outcomes ===
 
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "outcomes" -->
 
 
;After working through this unit you should...
 
;After working through this unit you should...
 
* Be able to define a "''p''-value";
 
* Be able to define a "''p''-value";
Line 64: Line 37:
 
* Be able to critically assess whether an observation should be considered "significant" in that context;
 
* Be able to critically assess whether an observation should be considered "significant" in that context;
 
* Be able to identify a common error that is made in the literature when two effects are compared.
 
* Be able to identify a common error that is made in the literature when two effects are compared.
 +
</td>
 +
</tr>
 +
</table>
 +
<!-- ============================  -->
 +
<hr>
 +
<b>Deliverables:</b><br />
 +
<section begin=deliverables />
 +
<li><b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.</li>
 +
<li><b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.</li>
 +
<li><b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].</li>
 +
<section end=deliverables />
 +
<!-- ============================  -->
 +
<hr>
 +
<section begin=prerequisites />
 +
<b>Prerequisites:</b><br />
 +
This unit builds on material covered in the following prerequisite units:<br />
 +
*[[FND-STA-Probability_distribution|FND-STA-Probability_distribution (Probability Distribution)]]
 +
<section end=prerequisites />
 +
<!-- ============================  -->
 +
</div>
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Deliverables ===
 
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "deliverables" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-time_management" -->
 
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" -->
 
*<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these.
 
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" -->
 
*<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]].
 
  
{{Vspace}}
+
{{Smallvspace}}
  
  
=== Evaluation ===
+
__TOC__
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "evaluation" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "eval-none" -->
 
<b>Evaluation: NA</b><br />
 
:This unit is not evaluated for course marks.
 
  
 
{{Vspace}}
 
{{Vspace}}
  
  
</div>
+
=== Evaluation ===
<div id="BIO">
+
<b>Evaluation: NA</b><br />
 +
<div style="margin-left: 2rem;">This unit is not evaluated for course marks.</div>
 
== Contents ==
 
== Contents ==
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "contents" -->
 
  
  
=== "Significance" concepts ===
+
=== "Significance" concepts in practice ===
  
{{R-unit|FND-STA-Significance}}
+
Here we discuss the idea of a p-value, in particular how to compute "empirical p-values". These are very easy to compute and simulate in R. There is just one thing to be aware of: we would normally estimate a probability from ''r'' observed events among ''N'' observations as ''r'' / ''N''. However, if we use this approach to evaluate '''significance''', i.e. we are asking whether our ''r'' events are taken from the '''same distribution''' as the ''N'' observations, we need to apply a correction: (''r'' + 1) / (''N'' + 1)<ref>{{#pmid:12596795}}</ref>.
 +
{{Smallvspace}}
 +
;Empirical p-value: (''r'' + 1) / (''N'' + 1)
 +
:for ''r'' events of interest in ''N'' observations.
 +
{{Smallvspace}}
 +
{{ABC-unit|FND-STA-Significance.R}}
  
 
{{Vspace}}
 
{{Vspace}}
Line 113: Line 99:
  
  
 +
== Self-evaluation ==
 +
<!--
 +
=== Question 1===
 +
 +
Question ...
 +
 +
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 +
Answer ...
 +
<div class="mw-collapsible-content">
 +
Answer ...
  
{{Vspace}}
+
</div>
 +
  </div>
  
 +
  {{Vspace}}
  
 +
-->
 
== Further reading, links and resources ==
 
== Further reading, links and resources ==
 
{{DOI
 
{{DOI
Line 127: Line 126:
 
|URL=
 
|URL=
 
|doi = 10.1198/000313008X332421
 
|doi = 10.1198/000313008X332421
|file= Zhang(2012)StructurePrediction.pdf
+
|file=
 
|abstract= ''P''-values are taught in introductory statistics classes in a way that confuses many of the students, leading to common misconceptions about their meaning. In this article, we argue that ''p''-values should be taught through simulation, emphasizing that ''p''-values are random variables. By means of elementary examples we illustrate how to teach students valid interpretations of ''p''-values and give them a deeper understanding of hypothesis testing.
 
|abstract= ''P''-values are taught in introductory statistics classes in a way that confuses many of the students, leading to common misconceptions about their meaning. In this article, we argue that ''p''-values should be taught through simulation, emphasizing that ''p''-values are random variables. By means of elementary examples we illustrate how to teach students valid interpretations of ''p''-values and give them a deeper understanding of hypothesis testing.
 
}}
 
}}
 
{{Vspace}}
 
 
 
 
== Notes ==
 
== Notes ==
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "notes" -->
 
<!-- included from "ABC-unit_components.wtxt", section: "notes" -->
 
 
<references />
 
<references />
  
 
{{Vspace}}
 
{{Vspace}}
  
 
</div>
 
<div id="ABC-unit-framework">
 
== Self-evaluation ==
 
<!-- included from "../components/FND-STA-Significance.components.wtxt", section: "self-evaluation" -->
 
<!--
 
=== Question 1===
 
 
Question ...
 
 
<div class="toccolours mw-collapsible mw-collapsed" style="width:800px">
 
Answer ...
 
<div class="mw-collapsible-content">
 
Answer ...
 
 
</div>
 
  </div>
 
 
  {{Vspace}}
 
 
-->
 
 
{{Vspace}}
 
 
 
 
{{Vspace}}
 
 
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_ask" -->
 
 
----
 
 
{{Vspace}}
 
 
<b>If in doubt, ask!</b> If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
 
 
----
 
 
{{Vspace}}
 
  
 
<div class="about">
 
<div class="about">
Line 190: Line 143:
 
:2017-08-05
 
:2017-08-05
 
<b>Modified:</b><br />
 
<b>Modified:</b><br />
:2017-09-06
+
:2020-09-22
 
<b>Version:</b><br />
 
<b>Version:</b><br />
:1.0
+
:1.1.1
 
<b>Version history:</b><br />
 
<b>Version history:</b><br />
 +
*1.1 2020 Maintenance
 +
*1.1 Added definition of empirical p-value
 
*1.0 First live
 
*1.0 First live
 
*0.1 First stub
 
*0.1 First stub
 
</div>
 
</div>
[[Category:ABC-units]]
 
<!-- included from "ABC-unit_components.wtxt", section: "ABC-unit_footer" -->
 
  
 
{{CC-BY}}
 
{{CC-BY}}
  
 +
[[Category:ABC-units]]
 +
{{UNIT}}
 +
{{LIVE}}
 
</div>
 
</div>
 
<!-- [END] -->
 
<!-- [END] -->

Latest revision as of 09:28, 25 September 2020

Significance

(Probability and p-values; significance as a threshold of p-values; deriving probability distributions from simulation and interpreting in terms of significance.)


 


Abstract:

The probability of an event is the chance of it occurring, but how do we relate that to the question of whether an observation is significant? In this context we talk about p-values, and a p-value does not mean the same thing as the probability of an observation. The p-value of an observation is the probability that, assuming a null hypothesis is true, an event as extreme or more extreme is observed. This unit contains R code to study this concept.


Objectives:

  • Introduce the difference between p-values and event probability;
  • Show how we interpret p-values in terms of "significance";
  • Illustrate this with an example;
  • Present a permutation example, a strategy that can be used as an alternative to the integration of probability density functions.
  • Discuss a common error that is made when establishing the significance of an observation in the biomedical sciences.

Outcomes:

After working through this unit you should...
  • Be able to define a "p-value";
  • Be able to set up a permutation test or a sampling simulation to estimate a probability density;
  • Be able to interpret the frequency of values in that probability density in terms of a p-value;
  • Be able to critically assess whether an observation should be considered "significant" in that context;
  • Be able to identify a common error that is made in the literature when two effects are compared.

Deliverables:

  • Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
  • Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
  • Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

Prerequisites:

    This unit builds on material covered in the following prerequisite units:
      • FND-STA-Probability_distribution (Probability Distribution)

    Evaluation

    Evaluation: NA

    This unit is not evaluated for course marks.

    Contents

    "Significance" concepts in practice

    Here we discuss the idea of a p-value, in particular how to compute "empirical p-values". These are very easy to compute and simulate in R. There is just one thing to be aware of: we would normally estimate a probability from r observed events among N observations as r / N. However, if we use this approach to evaluate significance, i.e. we are asking whether our r events are taken from the same distribution as the N observations, we need to apply a correction: (r + 1) / (N + 1)[1].

     
    Empirical p-value
    (r + 1) / (N + 1)
    for r events of interest in N observations.
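
    The definition above can be sketched in a few lines. The unit's own exercises are in the R script FND-STA-Significance.R; what follows is not taken from that script but is a minimal illustration in Python, under stated assumptions: the function name empirical_p_value, the coin-toss scenario, the seed and the number of simulated observations are all hypothetical choices made for this example. The test is one-sided ("as extreme or more extreme" is read as "greater than or equal to").

```python
import random

def empirical_p_value(observed, null_samples):
    """Return (r + 1) / (N + 1) for a one-sided test.

    r is the number of simulated null values that are at least as
    extreme (here: at least as large) as the observed value, and
    N is the number of simulated null values.
    """
    n = len(null_samples)
    r = sum(1 for x in null_samples if x >= observed)
    return (r + 1) / (n + 1)

# Hypothetical question: is 16 heads in 20 tosses unusual for a fair coin?
rng = random.Random(112358)      # fixed seed so the run is repeatable
N = 9999                         # number of simulated experiments (assumption)
null_heads = [sum(rng.random() < 0.5 for _ in range(20)) for _ in range(N)]

p = empirical_p_value(16, null_heads)
print(p)
```

    One way to read the correction: with r / N, a simulation that happens to contain no value as extreme as the observation would report a p-value of exactly 0, which overstates the evidence. With (r + 1) / (N + 1) the smallest value that can be reported is 1 / (N + 1).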
     

    Task:

     
    • Open RStudio and load the ABC-units R project. If you have loaded it before, choose File → Recent projects → ABC-Units. If you have not loaded it before, follow the instructions in the RPR-Introduction unit.
    • Choose Tools → Version Control → Pull Branches to fetch the most recent version of the project from its GitHub repository with all changes and bug fixes included.
    • Type init() if requested.
    • Open the file FND-STA-Significance.R and follow the instructions.


     

    Note: take care that you understand all of the code in the script. Evaluation in this course is cumulative and you may be asked to explain any part of the code.


     


     

    Controversies

    Task:
    Examine the papers below that introduce difficulties with P-values and statistical significance. Rephrase the issues in your own words to make sure that you understand what the discussion is about.

    Baker (2016) Statisticians issue warning over misuse of P values. Nature 531:151. (pmid: 26961635)

    [ PubMed ] [ DOI ]

    Nieuwenhuis et al. (2011) Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci 14:1105-7. (pmid: 21878926)

    [ PubMed ] [ DOI ] In theory, a comparison of two experimental effects requires a statistical test on their difference. In practice, this comparison is often based on an incorrect procedure involving two separate tests in which researchers conclude that effects differ when one effect is significant (P < 0.05) but the other is not (P > 0.05). We reviewed 513 behavioral, systems and cognitive neuroscience articles in five top-ranking journals (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience) and found that 78 used the correct procedure and 79 used the incorrect procedure. An additional analysis suggests that incorrect analyses of interactions are even more common in cellular and molecular neuroscience. We discuss scenarios in which the erroneous procedure is particularly beguiling.
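
    The error described in the abstract above can be made concrete with a small sketch. This is an illustration only, written in Python rather than in the R used by this unit, and it is not the analysis from the paper. The two lists of change scores are invented, and the function names sign_flip_p and label_permutation_p are hypothetical. Each effect is first tested against zero with an exact sign-flip test, then the two effects are compared directly with an exact label-permutation test on the difference of their means.

```python
from itertools import combinations, product

def sign_flip_p(changes):
    """Exact two-sided p-value for 'the mean change is zero'.

    Enumerates every assignment of signs to the change scores and counts
    how often the absolute sum is at least as large as the observed one.
    """
    observed = abs(sum(changes))
    hits = total = 0
    for signs in product((1, -1), repeat=len(changes)):
        flipped = abs(sum(s * c for s, c in zip(signs, changes)))
        hits += flipped >= observed
        total += 1
    return hits / total

def label_permutation_p(group_a, group_b):
    """Exact two-sided p-value for 'both groups share one distribution'.

    Enumerates every way to relabel the pooled values into two groups of
    the original sizes and compares the absolute difference of means.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    grand_total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (grand_total - sum_a) / n_b)
        hits += diff >= observed
        total += 1
    return hits / total

# Invented change scores (arbitrary integer units) for two hypothetical effects.
effect_a = [12, 8, 15, 3, 9, 11, -2, 7]
effect_b = [9, -6, 14, -8, 10, 2, -3, 13]

p_a = sign_flip_p(effect_a)                      # effect A tested against zero
p_b = sign_flip_p(effect_b)                      # effect B tested against zero
p_diff = label_permutation_p(effect_a, effect_b) # A compared directly with B
print(p_a, p_b, p_diff)
```

    With these invented numbers, effect A on its own falls below 0.05 and effect B on its own does not, yet the direct test on their difference does not reach 0.05 either. Concluding that the two effects differ from the first two p-values alone would be exactly the incorrect procedure the abstract describes. Because every arrangement is enumerated, including the observed one, these exact p-values cannot be 0 and need no (r + 1) / (N + 1) correction.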


    Self-evaluation

    Further reading, links and resources

    Duncan J Murdoch, Yu-Ling Tsai & James Adcock (2008) P-Values are Random Variables. The American Statistician 62:3:242-245.
    [ DOI ] P-values are taught in introductory statistics classes in a way that confuses many of the students, leading to common misconceptions about their meaning. In this article, we argue that p-values should be taught through simulation, emphasizing that p-values are random variables. By means of elementary examples we illustrate how to teach students valid interpretations of p-values and give them a deeper understanding of hypothesis testing.

    Notes

    1. North et al. (2003) A note on calculation of empirical P values from Monte Carlo procedure. Am J Hum Genet 72:498-9. (pmid: 12596795)

      [ PubMed ] [ DOI ]


     


    About ...
     
    Author:

    Boris Steipe <boris.steipe@utoronto.ca>

    Created:

    2017-08-05

    Modified:

    2020-09-22

    Version:

    1.1.1

    Version history:

    • 1.1 2020 Maintenance
    • 1.1 Added definition of empirical p-value
    • 1.0 First live
    • 0.1 First stub

    This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.