Difference between revisions of "BCH441 Code submisson instructions"
m (→Instructions) |
m (→Instructions) |
||
Line 28: | Line 28: | ||
;Developing | ;Developing | ||
− | :You must develop your code in one or more '''functions''', not just ''ad hoc'' scripts. The analysis must be run from a function. Make sure that your variables are instantiated within the function body, or passed as parameters. Do not refer to variables in the global workspace, create everything you need in the local context. <small>(Ask, if you are not sure how to do that e.g. for a genetic code table or such.)</small> Please consider the [[ | + | :You must develop your code in one or more '''functions''', not just ''ad hoc'' scripts. The analysis must be run from a function. Make sure that your variables are instantiated within the function body, or passed as parameters. Do not refer to variables in the global workspace, create everything you need in the local context. <small>(Ask, if you are not sure how to do that e.g. for a genetic code table or such.)</small> Please consider the [[ABC-Rubrics]]. |
;Final analysis | ;Final analysis |
Revision as of 23:23, 17 November 2020
Code submission instructions
Contents
This course has strict formal requirements for submitting code for credit. These requirements are designed to demonstrate that it was you who ran the code, that the code you have submitted is the exact code that was used to produce the submitted output, and that the results have not been altered after submission.[1] Just to remind you:
- Once you have developed your analysis and submitted your results and code
- The submitted code must be exactly the code that you have used to obtain your results. Altering the code in your documentation - even if only for cosmetic purposes - is an academic offence.
- The reported results must be exactly the results your code has produced at the time you ran it. Altering the results you obtained - for any reason whatsoever - is an academic offence.
- It must be possible to reproduce your exact results from your posted code.
Of course a situation might arise where you do need to alter the code - e.g. to format the output better, to add a reference to a source that you forgot, or to add a clarifying comment. Not a problem. Simply modify whatever you need, and replace the entire old code and output in your documentation.
When in doubt: be safe and ask; don't make assumptions.
Instructions
- Developing
- You must develop your code in one or more functions, not just ad hoc scripts. The analysis must be run from a function. Make sure that your variables are instantiated within the function body, or passed as parameters. Do not refer to variables in the global workspace, create everything you need in the local context. (Ask, if you are not sure how to do that e.g. for a genetic code table or such.) Please consider the ABC-Rubrics.
- Final analysis
- Your final analysis should run from a function call without manual intervention. Your function needs to execute
sealKey()
as its first instruction and as its last instruction.sealKey()
takes a snapshot of the function state, deposits it in an encrypted version on the course server, and returns a validating key. (Ask, if you feel this does not work well for the solution you had in mind and you need a different approach.)
Please follow the following code pattern:
mySubmissionCode <- function(<parameters>) { sealKey() # run sealKey() as the first command # put your code here ... # make sure it is well commented # make sure you reference your sources # make your function draw your plots, compute averages, etc. and use print() or cat() # as required to output results. sealKey() # run sealKey() again, as the last command return() }
Finally: run your code as a single command (e.g. mySubmissionCode(a = "this", b="that",...)
), and paste the entire code and all of its output into your submission
Example
Here is a formatted example, in case you need one.
Example code
- The code I used
myAApairs <- function(mySeq, N=1000) { # compare observed with expected amino acid pair frequencies # input: char a sequence vector of one-letter codes # int number of times to run a trial # value: NULL (invisible). Results are printed as side-effect sealKey() # run sealKey() as the first command aa <- c("a", "c", "d", "e", "f", "g", "h", "i", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "y") aap <- paste0(aa[rep(1:20, each = 20)], aa[rep(1:20, 20)]) # all pairs pairs <- function(s) { # Return a table of amino acid pair frequencies from a vector s. # We make all pairs from s. Then we convert the observed pairs into factors # and use table(). Since we define the levels with aap, table() also reports # zero-counts. (cf. https://stackoverflow.com/questions/1617061/ ) l <- length(s) myPairs <- paste0(s[1:(l-1)], s[2:l]) return(table(factor(myPairs, levels = aap))) } if (FALSE) { # test the function myS <- rep(c("a","c"), each = 3) #"a" "a" "a" "c" "c" "c" x <- pairs(myS) length(x) # 400 x[x != 0] # look at all non-zero counts # aa ac cc # 2 1 2 } # process ... mySeq <- tolower(mySeq) tObs <- pairs(mySeq) tSim <- pairs("") # initialize with empty string: all zero counts for (i in 1:N) { tSim <- tSim + pairs(sample(mySeq)) # add paircounts of shuffled sequence } tSim <- tSim / N # average # Report differences as log-ratio. # First, add dummy observations instead of zero-counts to avoid infinities. tObs <- tObs + 0.0001 tSim <- tSim + 0.0001 # Get log ratios tDiff <- log(tObs / tSim) # print results cat("\nTen highest enriched pairs (tObs > tSim):\n") ord <- order(tDiff, decreasing = TRUE) cat(" ") # IDK why this is needed, but it doesn't align otherwise cat(sprintf("Pair: %s log(q): %5.3f (fObs: %5.2f fSim: %5.2f)\n", names(tObs[ord[1:10]]), tDiff[ord[1:10]], tObs[ord[1:10]], tSim[ord[1:10]])) cat("\nTen most depleted pairs (tObs < tSim):\n") ord <- order(tDiff, decreasing = FALSE) cat(" ") cat(sprintf("Pair: %s log(q): %5.3f (fObs: %5.2f fSim: %5.2f)\n", names(tObs[ord[1:10]]), tDiff[ord[1:10]], tObs[ord[1:10]], tSim[ord[1:10]])) sealKey() # run sealKey() again, as the last command return(invisible(NULL)) }
- How it was run
sel <- grep("MBP1", myDB$protein$name) # Mbp1 orthologues mySeq <- paste0(myDB$protein$sequence[sel], collapse="-") # fetch and separate mySeq <- unlist(strsplit(mySeq, "")) # turn into vector myAApairs(mySeq)
- What it produced
[1] "sealKey> 2020-10-12 05:10:18 (6ca9e8aa-37cc-877f-ef98-f60fafec7ec1)" Ten highest enriched pairs (tObs > tSim): Pair: hw log(q): 2.053 (fObs: 11.00 fSim: 1.41) Pair: wi log(q): 1.568 (fObs: 13.00 fSim: 2.71) Pair: tw log(q): 1.527 (fObs: 14.00 fSim: 3.04) Pair: hh log(q): 1.215 (fObs: 19.00 fSim: 5.64) Pair: pp log(q): 0.925 (fObs: 65.00 fSim: 25.77) Pair: wa log(q): 0.921 (fObs: 13.00 fSim: 5.18) Pair: wv log(q): 0.906 (fObs: 7.00 fSim: 2.83) Pair: dw log(q): 0.889 (fObs: 9.00 fSim: 3.70) Pair: fh log(q): 0.837 (fObs: 12.00 fSim: 5.19) Pair: yf log(q): 0.750 (fObs: 11.00 fSim: 5.20) Ten most depleted pairs (tObs < tSim): Pair: fy log(q): -10.868 (fObs: 0.00 fSim: 5.25) Pair: aw log(q): -10.864 (fObs: 0.00 fSim: 5.23) Pair: my log(q): -10.854 (fObs: 0.00 fSim: 5.18) Pair: lw log(q): -10.784 (fObs: 0.00 fSim: 4.82) Pair: ew log(q): -10.616 (fObs: 0.00 fSim: 4.08) Pair: wd log(q): -10.524 (fObs: 0.00 fSim: 3.72) Pair: wr log(q): -10.498 (fObs: 0.00 fSim: 3.62) Pair: pw log(q): -10.330 (fObs: 0.00 fSim: 3.07) Pair: ic log(q): -10.323 (fObs: 0.00 fSim: 3.04) Pair: wq log(q): -10.215 (fObs: 0.00 fSim: 2.73) [1] "sealKey> 2020-10-12 05:10:22 (5151048f-1ba2-e074-3281-18a0e0a080e7)"
Notes
- ↑ This all is in response to an academic offence that occurred when a student could not get their code to run correctly and then confabulated the results.