Difference between revisions of "FND-Biocomputing setup"
Revision as of 19:32, 26 January 2018
Computer Setup for Biocomputing
(Paths, folders and files; Course Folder; Setup for biocomputing: Xcode, R and RStudio, python, homebrew, TeX ...)
Abstract:
Some considerations are required to turn your laptop into an effective tool for biocomputing tasks. This includes consistent principles for organizing files and folders, and availability of tools to create, install, and deploy software. This unit introduces those concepts.
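Objectives:
This unit will ...
- ... inform you about file- and folder names and paths;
- ... outline a basic set of software tools that are useful.
Outcomes:
After working through this unit you ...
- ... can correctly identify (and write) file names with extensions, and file paths, on your computer;
- ... have created a Course Folder for this course or workshop on your computer;
- ... are able to further configure your computer for biocomputing tasks.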
Deliverables:
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Contents
Creating and consuming
Over the last decade the central paradigm of how we work with computing devices has changed for a large fraction of our day-to-day activities. Our previous, open and effective data-centric model of computation - i.e. storing data and then transforming it through various interoperable tools - has largely been replaced by a landscape where the software ecosystem consists of apps that lock down the data, aim to own the user experience, and thus find ways to monetize it. We are now in an application-centric era of computing. That is in itself not surprising - after all, much of this functionality is free, and you must realize that if you use free tools, you are not the customer, you are the product. However, this development is not helpful for scientific computing, and you have to make some effort to go beyond these constraints, which are designed to make users convenient consumers of data and services.
You need to become creators instead.
You need to take control of how your computer is organized, and how you work with it.
Paths, Folders and Files
The first step of organizing your computer is to obtain a good awareness of files, folders (or directories) that contain them, and how files are identified. A file is some stored information that is uniquely identified in a catalogue called a "filesystem". Files can be data, like documents, music or movies, files can be applications (yes, computer programs are also data - a set of machine-level instructions), and your computer can also have subsystems that look like files, because one can read from them and write to them - but they actually handle the I/O (input / output) of your computer: things like keyboards and printers, or sockets for communicating with other computers.
Organizing files means giving them good names and placing them into folders where they are easy to find.
A filename is a label that identifies a file. Often filenames have two parts: the actual name, and an extension. To specify a file on the computer's command line, or when working with R, you need to specify its full name including the extension. Now, the problem is that you can switch off the display of extensions in Windows; I'm afraid this is actually done by default. This means you don't see what the file is actually called, lest you be frightened by a .jpg or a .mp4 suffix to the name. But then all hell breaks loose when you are trying to do "real" work. Files can't be found, or worse, can be inadvertently overwritten. Never allow your operating system to hide file extensions from you. You must be able to see the full name[1].
A path is the complete specification of where a file is located in the hierarchically organized directory tree of your computer. Paths are simply directories strung together into a long string, separated by a forward slash "/" (on Mac or Unix) or a backslash "\" (on Windows).
- Folder name and path examples
- /Users/Pierette/Documents/BCB420 ◁ Looking good on a Mac or a Linux system.
- C:\Users\Pulcinella\Documents\CBW ◁ Looking good on a Windows computer.
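To make the anatomy of these two example paths concrete, here is a minimal sketch in Python (one of the languages this unit asks you to install) that takes them apart with the standard `pathlib` module. The file name `notes_v01.txt` is hypothetical and only serves to show what an extension looks like; nothing is read from or written to your disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two example paths from the list above. "Pure" paths only parse
# the text; they never touch the actual filesystem.
mac_path = PurePosixPath("/Users/Pierette/Documents/BCB420")
win_path = PureWindowsPath(r"C:\Users\Pulcinella\Documents\CBW")

def describe(path):
    """Return the top level directory, all components, and the last name."""
    return {
        "anchor": path.anchor,  # "/" on Mac/Unix, drive letter plus ":\" on Windows
        "parts": path.parts,    # every directory along the way
        "name": path.name,      # the final component
    }

for p in (mac_path, win_path):
    print(describe(p))

# A hypothetical file inside the Mac folder, to show name and extension.
example_file = mac_path / "notes_v01.txt"
print(example_file.name, example_file.suffix)
```

Note that the same code handles both separator conventions; only the path class differs.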
The "top level directory" is the letter of the drive followed by ":\" on Windows computers, and a simple forward-slash "/" on Mac and Unix computers. All other directories are "sub-directories". Note that you can't tell from a directory listing alone whether e.g. "Users" is a directory or a file. The operating system will usually identify this with an icon, and R has different commands to differentiate the two[2].
It's really useful to get into a consistent habit of giving your files a meaningful name. The name should include something that tells you what the file contains, and something that tells you the date or version. I give versions major and minor numbers, and - knowing how much things always change - I write major version numbers with a leading zero, e.g. 04, so that they will be correctly sorted by name in a directory listing. The same goes for dates: always write YYYY-MM-DD to ensure proper sorting.
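The effect of leading zeros and YYYY-MM-DD dates on sorting is easy to check for yourself. The following is only a sketch: the helper `versioned_name()`, the underscore-separated layout and the example file names are assumptions made for illustration, not a naming scheme prescribed by this unit.

```python
from datetime import date

def versioned_name(stem, major, minor, day, extension):
    """Build a name with a zero-padded major version and an ISO date."""
    # {major:02d} pads the major version to two digits: 4 becomes "04".
    return f"{stem}_v{major:02d}.{minor}_{day.isoformat()}.{extension}"

example = versioned_name("alignment", 4, 1, date(2017, 9, 9), "txt")
print(example)

# Sorting by name compares character by character, so "10" sorts
# before "4" unless the shorter number is padded with a zero.
unpadded = sorted(["report_v4.txt", "report_v10.txt"])
padded = sorted(["report_v04.txt", "report_v10.txt"])
print(unpadded)
print(padded)
```

The unpadded list comes out with version 10 before version 4, which is exactly the disorder that the leading zero prevents.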
In my experience, it is better to organize file hierarchies wide, not deep. This means I aim to put more things in one folder rather than create elaborate directory structures. I need to look for stuff a lot, and looking more-or-less in the same folder keeps my files more visible. Files that are tucked away in sub-directories are harder to find. And to avoid having very, very, very many subdirectories in one place, you should consider adding a 99-Archive folder (the 99-... prefix keeps it sorted at the bottom of the directory listing), and move directories that you keep only for reference into there.
One more thing, one golden rule that you should make every effort to adhere to: don't store the same contents in more than one place. In the best case this is merely unnecessarily needlessly redundant, but in the common worst case the two copies will go out of sync. If you need to have a file in two different folders, keep it in only one folder and put an "alias" into the other. On the Mac, you select a file or folder and <option><command><drag> it to a new location to create an alias, or hit <command>L. On Windows - ???.
Course Folder
Files for this course (or workshop) should all be in one "Course Folder".
Task:
Create a folder (directory) on your computer in which to keep materials for this course (or workshop). Put it into the right place, and give it the right name:
- The right place is directly in the Documents folder of your account.
- The right name is simply the <Coursecode>, e.g. for a CBW workshop in 2016, you call the folder CBW, for a BCH441 course, the name should be BCH441, or you could use my generic ABC (A Bioinformatics Course). Keep it short, but specific.
Do not use spaces, hyphens, or any other special characters in your folder name - we have encountered various problems with such names in the past[3].
We will refer to this folder as the Course Folder. (I use the words "folder" and "directory" synonymously and completely interchangeably.)
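If you prefer to create the Course Folder from a script rather than by clicking, the task above could be sketched roughly as follows. The functions `is_valid_course_code()` and `make_course_folder()` are hypothetical helpers, and reading "no spaces, hyphens, or special characters" as "letters and digits only" is an assumption. The demonstration runs in a temporary directory so that it does not touch your real Documents folder.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a valid course code consists of letters and digits only.
VALID_NAME = re.compile(r"[A-Za-z0-9]+")

def is_valid_course_code(name):
    """True if the name has no spaces, hyphens or other special characters."""
    return VALID_NAME.fullmatch(name) is not None

def make_course_folder(documents_dir, course_code):
    """Create <documents_dir>/<course_code> and return its path."""
    if not is_valid_course_code(course_code):
        raise ValueError(f"unsuitable folder name: {course_code!r}")
    folder = Path(documents_dir) / course_code
    folder.mkdir(exist_ok=True)  # no error if the folder already exists
    return folder

# Demonstration in a throw-away directory standing in for "Documents".
with tempfile.TemporaryDirectory() as fake_documents:
    created = make_course_folder(fake_documents, "BCH441")
    created_ok = created.is_dir()
    print(created.name, created_ok)
```

For the real thing you would pass `Path.home() / "Documents"` as the first argument, assuming your account's Documents folder is in that default location.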
Biocomputing tools
There are a number of tools you will commonly find on a professionally configured computer:
- A commandline interface
- While graphical user interfaces (GUI) are very helpful for interactive work, they (generally) can't be scripted and thus are an obstacle to high-throughput and repetitive tasks. A commandline interface allows more expressive commands and easy scripting. For Mac and Linux users, your systems have terminal applications that make the underlying unix commands available. On the Mac, find "Terminal" in your Applications/Utilities folder and put it in the dock. For Windows users there is cmd.exe - but the command set is very different and that requires you to learn yet another language. The better solution is to install Cygwin, which creates a unix shell that interacts with the Windows operating system.
- A package manager
- Complex software uses other software, such as libraries for graphics, numerical methods, or security, and such libraries are "dependencies" of the code. Often dependencies need a specific version to work with a particular piece of software. Especially when code needs to be compiled from source code because it is not available as a pre-compiled bundle, dependencies need to be updated in very specific ways. Package managers to the rescue. These programs have validated recipes on how to install, maintain and update software. On the Mac, the go-to system is Homebrew. On Linux, this depends on which distribution you are running. On Windows I have heard of "chocolatey" and "scoop"; I can't give a recommendation though.
- A version control system
- Creating assets for reproducible research requires version control, i.e. the ability to document when what change was made, and to return to previous versions if necessary. The current go-to tool for this is Git, which is especially useful since it interfaces with GitHub.
- Programming languages
- Besides R and RStudio, you need to be able to compile C- and C++ code, you will need a working version of Java, and you should have a recent Python installation and an IDE to write code.
- LaTeX
In engineering and computer science, you practically grow up with it. Conference abstracts are submitted in LaTeX, papers are written in LaTeX, most likely your thesis will be submitted in LaTeX - it's everywhere. But in the life sciences you can go through your entire career without having heard of LaTeX. LaTeX is a document preparation and layout system that is very powerful, very flexible - and, boy, does it have a learning curve. Fortunately you don't need to know LaTeX to use LaTeX, at least for basic tasks - there are now pretty decent WYSIWYG editors, and many applications have some wrapper function that uses LaTeX as its backend - for example to produce PDF documents. Some of the functions provided by R work that way. Thus, installing LaTeX on your system is a good thing to do - even though you can get by without it for most analysis tasks, you will sooner or later need it when it comes to publication quality plots. However, installation is not only platform dependent, but depends on the version of your OS, and sometimes the level - since most of the time LaTeX is used by other programs, the installers need to get the path just right, set some environment variables, etc. You should be fine with the instructions at the LaTeX project, but don't do this from home if you have a bandwidth cap - a full installation runs to 2.5 GB (though there are smaller basic installs available). Linux users are best served through one of the standard package managers, and on the Mac there is a homebrew "cask", but folklore has it that direct installation is more robust[4].
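A quick way to see which of the tools listed above are already reachable from your command line is to ask where their executables are found on your search path. This is only a sketch: the executable names in `TOOLS` are common defaults and are assumptions, since the actual names can differ between platforms and installations (for example `python` instead of `python3`, or `clang` instead of `gcc`).

```python
import shutil

# Assumed executable names for the tools discussed in this section.
TOOLS = ["git", "R", "python3", "java", "gcc", "latex"]

def find_tools(names):
    """Map each executable name to its full path, or None if not found."""
    return {name: shutil.which(name) for name in names}

report = find_tools(TOOLS)
for name, location in report.items():
    print(f"{name:8s} {location or 'not found'}")
```

A tool reported as "not found" may still be installed but simply not on your search path, so treat the result as a hint and not as proof.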
OS X
A useful guide to configuring your system is posted here.
Windows
I don't know about a good, current guide for setting up Windows computers for biocomputing. If you have specific experience and advice, let's collect it here.
Linux
I mention Linux rarely because I find that people who work on a Linux platform already know what they are doing. Yay. Ask, if you have specific questions.
Self-evaluation
Notes
- ↑ RStudio is actually very helpful in this regard, since it always shows you the full name of your file in its file-pane, and it always also shows you the "hidden" files that your operating system does not show to you, lest they hurt our little brains.
- ↑ list.dirs() and list.files().
- ↑ After the course, you can rename / move the directory to whatever, wherever you want, but during the course, we need your files in a predictable location to be able to troubleshoot problems.
- ↑ See here: https://tex.stackexchange.com/questions/307483/setting-up-basictex-homebrew - but keep in mind that people who post on the TeX forum at stackexchange may have a different requirements profile than you do.
Further reading, links and resources
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-09-09
Version:
- 1.1
Version history:
- 1.1 Add note on LaTeX
- 1.0 Completed to first live version
- 0.1 Material collected from previous assignments
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.