Difference between revisions of "FND-Biocomputing setup"
m (Created page with "<div id="BIO"> <div class="b1"> Computer Setup for Biocomputing </div> {{Vspace}} <div class="keywords"> <b>Keywords:</b> Setup for biocomputing: Xcode, R and ...") |
m |
||
Line 8: | Line 8: | ||
<div class="keywords"> | <div class="keywords"> | ||
<b>Keywords:</b> | <b>Keywords:</b> | ||
− | Setup for biocomputing: Xcode, R and RStudio, python, homebrew, TeX ... | + | Paths, folders and files; Project Directory; Setup for biocomputing: Xcode, R and RStudio, python, homebrew, TeX ... |
</div> | </div> | ||
Line 19: | Line 19: | ||
− | {{ | + | {{LIVE}} |
{{Vspace}} | {{Vspace}} | ||
Line 27: | Line 27: | ||
<div id="ABC-unit-framework"> | <div id="ABC-unit-framework"> | ||
== Abstract == | == Abstract == | ||
+ | <section begin=abstract /> | ||
<!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "abstract" --> | <!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "abstract" --> | ||
− | ... | + | Some considerations are required to turn your laptop into an effective tool for biocomputing tasks. This includes consistent principles for organizing files and folders, and availability of tools to create, install, and deploy software. This unit introduces those concepts. |
+ | <section end=abstract /> | ||
{{Vspace}} | {{Vspace}} | ||
Line 44: | Line 46: | ||
=== Objectives === | === Objectives === | ||
<!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "objectives" --> | <!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "objectives" --> | ||
− | ... | + | This unit will ... |
+ | * ... inform you about file- and folder names and paths; | ||
+ | * ... outline a basic set of software tools that are useful. | ||
{{Vspace}} | {{Vspace}} | ||
Line 51: | Line 55: | ||
=== Outcomes === | === Outcomes === | ||
<!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "outcomes" --> | <!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "outcomes" --> | ||
− | ... | + | After working through this unit you ... |
+ | * ... can correctly identify (and write) file names with extensions, and file paths, on your computer; | ||
+ | * ... have created a Project Directory for this course or workshop on your computer; | ||
+ | * ... are able to further configure your computer for biocomputing tasks. | |
{{Vspace}} | {{Vspace}} | ||
Line 61: | Line 68: | ||
*<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit. | *<b>Time management</b>: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit. | ||
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" --> | <!-- included from "ABC-unit_components.wtxt", section: "deliverables-journal" --> | ||
− | *<b>Journal</b>: Document your progress in your [[FND-Journal| | + | *<b>Journal</b>: Document your progress in your [[FND-Journal|Course Journal]]. Some tasks may ask you to include specific items in your journal. Don't overlook these. |
<!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" --> | <!-- included from "ABC-unit_components.wtxt", section: "deliverables-insights" --> | ||
− | *<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|insights! page]]. | + | *<b>Insights</b>: If you find something particularly noteworthy about this unit, make a note in your [[ABC-Insights|'''insights!''' page]]. |
{{Vspace}} | {{Vspace}} | ||
Line 81: | Line 88: | ||
== Contents == | == Contents == | ||
<!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "contents" --> | <!-- included from "../components/FND-Biocomputing_setup.components.wtxt", section: "contents" --> | ||
− | ... | + | |
+ | ==Creating and consuming== | ||
+ | |||
+ | Over the last decade the central paradigm of how we work with computing devices has changed for a large fraction of our day-to-day activities. Our previous, open and effective data-centric model of computation - i.e. storing data and then transforming it through various interoperable tools - has largely been replaced by a landscape where the software ecosystem consists of apps that lock down the data, aim to own the user experience, and thus find ways to monetize it. We are now in an application-centric era of computing. That is in itself not surprising - after all, much of this functionality is free, and you must realize that if you use free tools, you are not the customer, you are the product. However, this development is <b>not</b> helpful for scientific computing and you have to make some effort to go beyond these constraints that are designed to make users convenient consumers of data and services. | |
+ | |||
+ | You need to become creators instead. | ||
+ | |||
+ | You need to take control of how your computer is organized, and how you work with it. | ||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ==Paths, Folders and Files== | ||
+ | |||
+ | The first step of organizing your computer is to obtain a good awareness of files, folders (or directories) that contain them, and how files are identified. A file is some stored information that is uniquely identified in a catalogue called a "filesystem". Files can be data, like documents, music or movies, files can be applications (yes, computer programs are also data - a set of machine-level instructions), and your computer can also have subsystems that look like files, because one can read from them and write to them - but they actually handle the I/O (input / output) of your computer: things like keyboards and printers, or sockets for communicating with other computers. | ||
+ | |||
+ | Organizing files means giving them good names and placing them into folders where they are easy to find. | ||
+ | |||
+ | A '''filename''' is a label that identifies a file. Often filenames have two parts: the actual name, and an extension. To specify a file on the computer's command line, or when working with '''R''', you need to specify its full name <u>including the extension</u>. Now, the problem is that you can switch off the display of extensions in Windows; I'm afraid this is actually done by default. This means you don't see what the file is <b>actually</b> called, lest you be frightened by a <code>.jpg</code> or a <code>.mp4</code> suffix to the name. But then all hell breaks loose when you are trying to do "real" work. Files can't be found, or worse, can be inadvertently overwritten. '''Never allow your operating system to hide file extensions from you.''' You must be able to see the full name<ref>RStudio is actually very helpful in this regard, since it always shows you the full name of your file in its file-pane, and it always also shows you the "hidden" files that your operating system does not show to you, lest they hurt our little brains.</ref>. | |
+ | |||
+ | A '''path''' is the complete specification of where a file is located in the hierarchically organized directory tree of your computer. Paths are simply directories strung together into a long string, separated by a forward slash "<code>/</code>" (on Mac or Unix) or a backslash "<code>\</code>" on Windows. | ||
+ | |||
+ | ;Folder name and path examples | ||
+ | *<span class="right"> <tt>/Users/Pierette/Documents/BCB420</tt></span> ◁ Looking good on a Mac or a Linux system. | ||
+ | *<span class="right"> <tt>C:\Users\Pulcinella\Documents\CBW</tt></span> ◁ Looking good on a Windows computer. | ||
+ | |||
+ | The "top level directory" is the letter of the drive followed by "<code>:\</code>" on Windows computers, and a simple forward-slash "<code>/</code>" on Mac and Unix computers. All other directories are "sub-directories". Note that you can't tell from a directory listing alone whether e.g. "<code>Users</code>" is a directory or a file. The operating system will usually identify this with an icon, and R has different commands to differentiate the two<ref><code>list.dirs()</code> and <code>list.files()</code>.</ref>. | |
+ | |||
+ | |||
+ | |||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | It's really useful to get into a consistent habit of giving your files a meaningful name. The name should include something that tells you what the file contains, and something that tells you the date or version. I give versions major and minor numbers, and - knowing how much things always change - I write major version numbers with a leading zero e.g. <code>04</code> so that they will be correctly sorted by name in a directory listing. The same goes for dates: always write <code>YYYY-MM-DD</code> to ensure proper sorting. | |
+ | |||
+ | In my experience, it is better to organize file hierarchies <b>wide, not deep</b>. This means I aim to put more things in one folder rather than create elaborate directory structures. I need to look for stuff a lot, and looking more-or-less in the same folder keeps my files more visible. Files that are tucked away in sub-directories are harder to find. And to avoid having very, very, very many subdirectories in one place, you should consider adding a <code>99-Archive</code> folder (the <code>99-...</code> prefix keeps it sorted at the bottom of the directory listing), and move directories that you keep only for reference into there. | |
+ | |||
+ | One more thing, one golden rule that you should make every effort to adhere to: <b>don't store the same contents in more than one place</b>. In the best case this is merely unnecessarily needlessly redundant, but in the common worst case the two copies will go out of sync. If you need to have a file in two different folders, keep it in only one folder and put an "alias" into the other. On the Mac, you select a file or folder and <option><command><drag> it to a new location to create an alias, or hit <command>L; on Windows - ???. | |
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ==Project Directory== | ||
+ | |||
+ | Files for this course (or workshop) should all be in one "Project Directory". | ||
+ | |||
+ | {{task|1= | ||
+ | Create a folder (directory) on your computer in which to keep materials for this course (or workshop). Put it into the right place, and give it the right name: | ||
+ | |||
+ | :'''The right place''' is directly in the <code>Documents</code> folder of your account. | ||
+ | |||
+ | :'''The right name''' is simply the <code><Coursecode></code> e.g. for a CBW workshop in 2016, you call the folder <code>CBW</code>, for a BCH441 course, the name should be <code>BCH441</code>, or you could use my generic "ABC" (A Bioinformatics Course). Keep it short, but specific. | ||
+ | |||
+ | <b>Do not use spaces, hyphens, or any other special characters in your filename</b> - we have encountered various problems with such filenames in the past<ref>'''After''' the course, you can rename / move the directory to whatever, wherever you want, but during the course, we need your files in a predictable location to be able to troubleshoot problems.</ref>. | |
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | We will refer to this folder as the '''Project Directory'''. (I use the words "folder" and "directory" synonymously and completely interchangeably.) | ||
+ | |||
+ | }} | ||
+ | |||
+ | |||
+ | {{Vspace}} | |
+ | |||
+ | ==Biocomputing tools== | ||
+ | |||
+ | There are a number of tools you will commonly find on a professionally configured computer: | ||
+ | |||
+ | ;A commandline interface | ||
+ | :While graphical user interfaces (GUI) are very helpful for interactive work, they (generally) can't be scripted and thus are an obstacle to high-throughput and repetitive tasks. A commandline interface allows more expressive commands and easy scripting. For Mac and Linux users, your systems have terminal applications that make the underlying unix commands available. On the Mac, find "Terminal" in your Applications/Utilities folder and put it in the dock. For Windows users there is <code>cmd.exe</code> - but the command set is very different and that requires you to learn yet another language. The better solution is to install {{WP|Cygwin}}, which creates a unix shell that interacts with the Windows operating system. | |
+ | |||
+ | ;A package manager | ||
+ | :Complex software uses other software, such as libraries for graphics, numerical methods, or security, and such libraries are "dependencies" of the code. Often dependencies need a specific version to work with a particular piece of software. Especially when code needs to be compiled from source-code because it is not available as a pre-compiled bundle, dependencies need to be updated in very specific ways. Package managers to the rescue. These programs have validated recipes on how to install, maintain and update software. On the Mac, the go-to system is {{WP|Homebrew}}. On Linux, this depends on which distribution you are running. On Windows I have heard of "chocolatey" and "scoop"; can't give a recommendation though. | |
+ | |||
+ | ;A version control system | ||
+ | :Creating assets for reproducible research requires version control, i.e. the ability to document when what change was made, and to return to previous versions if necessary. The current go-to tool for this is {{WP|Git}}, which is especially useful since it interfaces with {{WP|GitHub}}. | |
+ | |||
+ | ;Programming languages | ||
+ | :Besides R, you need to be able to compile C- and C++ code, you will need a working version of Java, and you should have a recent Python installation. | ||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ===OS X=== | ||
+ | |||
+ | A useful guide to configuring your system is posted [http://www.benjack.io/2016/01/02/el-capitan-biocomputing.html '''here''']. | ||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ===Windows=== | ||
+ | |||
+ | <small>I don't know about a good, current guide for setting up Windows computers for biocomputing. If you have specific experience and advice, let's collect it here.</small> | ||
+ | |||
+ | |||
+ | {{Vspace}} | ||
+ | |||
+ | ===Linux=== | ||
+ | |||
+ | I mention Linux rarely because I find that people who work on a Linux platform already know what they are doing. Yay. Ask, if you have specific questions. | ||
+ | |||
+ | |||
+ | |||
{{Vspace}} | {{Vspace}} | ||
Line 150: | Line 255: | ||
:2017-08-05 | :2017-08-05 | ||
<b>Modified:</b><br /> | <b>Modified:</b><br /> | ||
− | :2017- | + | :2017-09-07 |
<b>Version:</b><br /> | <b>Version:</b><br /> | ||
− | :0 | + | :1.0 |
<b>Version history:</b><br /> | <b>Version history:</b><br /> | ||
− | *0.1 | + | *1.0 Completed to first live version |
+ | *0.1 Material collected from previous assignments | ||
</div> | </div> | ||
[[Category:ABC-units]] | [[Category:ABC-units]] |
Revision as of 16:03, 9 September 2017
Computer Setup for Biocomputing
Keywords: Paths, folders and files; Project Directory; Setup for biocomputing: Xcode, R and RStudio, python, homebrew, TeX ...
Abstract
Some considerations are required to turn your laptop into an effective tool for biocomputing tasks. This includes consistent principles for organizing files and folders, and availability of tools to create, install, and deploy software. This unit introduces those concepts.
This unit ...
Prerequisites
This unit has no prerequisites.
Objectives
This unit will ...
- ... inform you about file- and folder names and paths;
- ... outline a basic set of software tools that are useful.
Outcomes
After working through this unit you ...
- ... can correctly identify (and write) file names with extensions, and file paths, on your computer;
- ... have created a Project Directory for this course or workshop on your computer;
- ... are able to further configure your computer for biocomputing tasks.
Deliverables
- Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.
- Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don't overlook these.
- Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.
Evaluation
Evaluation: NA
- This unit is not evaluated for course marks.
Contents
Creating and consuming
Over the last decade the central paradigm of how we work with computing devices has changed for a large fraction of our day-to-day activities. Our previous, open and effective data-centric model of computation - i.e. storing data and then transforming it through various interoperable tools - has largely been replaced by a landscape where the software ecosystem consists of apps that lock down the data, aim to own the user experience, and thus find ways to monetize it. We are now in an application-centric era of computing. That is in itself not surprising - after all, much of this functionality is free, and you must realize that if you use free tools, you are not the customer, you are the product. However, this development is not helpful for scientific computing and you have to make some effort to go beyond these constraints that are designed to make users convenient consumers of data and services.
You need to become creators instead.
You need to take control of how your computer is organized, and how you work with it.
Paths, Folders and Files
The first step of organizing your computer is to obtain a good awareness of files, folders (or directories) that contain them, and how files are identified. A file is some stored information that is uniquely identified in a catalogue called a "filesystem". Files can be data, like documents, music or movies, files can be applications (yes, computer programs are also data - a set of machine-level instructions), and your computer can also have subsystems that look like files, because one can read from them and write to them - but they actually handle the I/O (input / output) of your computer: things like keyboards and printers, or sockets for communicating with other computers.
Organizing files means giving them good names and placing them into folders where they are easy to find.
A filename is a label that identifies a file. Often filenames have two parts: the actual name, and an extension. To specify a file on the computer's command line, or when working with R, you need to specify its full name including the extension. Now, the problem is that you can switch off the display of extensions in Windows; I'm afraid this is actually done by default. This means you don't see what the file is actually called, lest you be frightened by a .jpg or a .mp4 suffix to the name. But then all hell breaks loose when you are trying to do "real" work. Files can't be found, or worse, can be inadvertently overwritten. Never allow your operating system to hide file extensions from you. You must be able to see the full name[1].
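As a small illustration of why the full name matters, here is a minimal R sketch that takes a filename apart. The path and the file `holiday.jpg` are made up for this example; `basename()` and `file.exists()` are base R, and `file_ext()` and `file_path_sans_ext()` come from the `tools` package that ships with R.

```r
# Hypothetical path, used only for illustration
myFile <- "/Users/Pierette/Documents/BCB420/holiday.jpg"

fullName  <- basename(myFile)                     # filename including the extension
extension <- tools::file_ext(myFile)              # the extension only, without the dot
nameOnly  <- tools::file_path_sans_ext(fullName)  # the name without the extension

fullName
extension
nameOnly

# R only finds a file if you give its full name, extension included.
# (This returns FALSE unless such a file really exists on your computer.)
file.exists(myFile)
```

If your operating system hides the extension, you would only ever see `holiday`, and a command that uses that shortened name would not find the file.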
A path is the complete specification of where a file is located in the hierarchically organized directory tree of your computer. Paths are simply directories strung together into a long string, separated by a forward slash "/" (on Mac or Unix) or a backslash "\" on Windows.
- Folder name and path examples
- /Users/Pierette/Documents/BCB420 ◁ Looking good on a Mac or a Linux system.
- C:\Users\Pulcinella\Documents\CBW ◁ Looking good on a Windows computer.
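To see how such paths are handled in R, here is a minimal sketch that builds and takes apart the two example paths. It uses only base R functions; the user and folder names are the made-up ones from the examples. Two things are worth knowing: R accepts forward slashes on all platforms, including Windows, and a backslash inside an R string has to be written as `\\`.

```r
# Build a path from its components (names are examples only)
macPath <- file.path("", "Users", "Pierette", "Documents", "BCB420")
macPath                  # "/Users/Pierette/Documents/BCB420"

basename(macPath)        # the last component: "BCB420"
dirname(macPath)         # everything before it: "/Users/Pierette/Documents"

# A Windows path: each backslash must be escaped in an R string
winPath <- "C:\\Users\\Pulcinella\\Documents\\CBW"

# Convert backslashes to forward slashes, which R also understands on Windows
rPath <- gsub("\\", "/", winPath, fixed = TRUE)
rPath                    # "C:/Users/Pulcinella/Documents/CBW"
```

Using `file.path()` rather than pasting strings together by hand keeps the separators consistent.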
The "top level directory" is the letter of the drive followed by ":\" on Windows computers, and a simple forward-slash "/" on Mac and Unix computers. All other directories are "sub-directories". Note that you can't tell from a directory listing alone whether e.g. "Users" is a directory or a file. The operating system will usually identify this with an icon, and R has different commands to differentiate the two[2].
It's really useful to get into a consistent habit of giving your files a meaningful name. The name should include something that tells you what the file contains, and something that tells you the date or version. I give versions major and minor numbers, and - knowing how much things always change - I write major version numbers with a leading zero e.g. 04 so that they will be correctly sorted by name in a directory listing. The same goes for dates: always write YYYY-MM-DD to ensure proper sorting.
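The effect of leading zeros on sorting is easy to check in R. The sketch below is only an illustration: `makeFileName()` is a hypothetical helper, and the particular pattern (underscores, a "v" before the version) is my assumption, not a convention this unit prescribes.

```r
# Hypothetical helper: compose a filename from contents, version and date
makeFileName <- function(what, major, minor, date = Sys.Date(), ext = "txt") {
  sprintf("%s_v%02d.%d_%s.%s",
          what,
          major,                                   # padded to two digits: 4 -> "04"
          minor,
          format(as.Date(date), "%Y-%m-%d"),       # always YYYY-MM-DD
          ext)
}

myName <- makeFileName("alignment", 4, 1, date = "2017-09-07", ext = "fasta")
myName

# Without a leading zero, version 10 sorts before version 4 ...
unpadded <- sort(c("report_v10", "report_v4"), method = "radix")
unpadded

# ... with a leading zero, the order is as intended
padded <- sort(c("report_v10", "report_v04"), method = "radix")
padded
```

The same logic applies to dates: names that contain `2017-09-07` sort chronologically, names that contain `7.9.2017` do not.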
In my experience, it is better to organize file hierarchies wide, not deep. This means I aim to put more things in one folder rather than create elaborate directory structures. I need to look for stuff a lot, and looking more-or-less in the same folder keeps my files more visible. Files that are tucked away in sub-directories are harder to find. And to avoid having very, very, very many subdirectories in one place, you should consider adding a 99-Archive folder (the 99-... prefix keeps it sorted at the bottom of the directory listing), and move directories that you keep only for reference into there.
One more thing, one golden rule that you should make every effort to adhere to: don't store the same contents in more than one place. In the best case this is merely unnecessarily needlessly redundant, but in the common worst case the two copies will go out of sync. If you need to have a file in two different folders, keep it in only one folder and put an "alias" into the other. On the Mac, you select a file or folder and <option><command><drag> it to a new location to create an alias, or hit <command>L; on Windows - ???.
Project Directory
Files for this course (or workshop) should all be in one "Project Directory".
Task:
Create a folder (directory) on your computer in which to keep materials for this course (or workshop). Put it into the right place, and give it the right name:
- The right place is directly in the Documents folder of your account.
- The right name is simply the <Coursecode> e.g. for a CBW workshop in 2016, you call the folder CBW, for a BCH441 course, the name should be BCH441, or you could use my generic "ABC" (A Bioinformatics Course). Keep it short, but specific.
Do not use spaces, hyphens, or any other special characters in your filename - we have encountered various problems with such filenames in the past[3].
We will refer to this folder as the Project Directory. (I use the words "folder" and "directory" synonymously and completely interchangeably.)
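If you would rather create the Project Directory from within R than by clicking, the following sketch shows one way to do it. `createProjectDir()` is a hypothetical helper written for this example, not a function from any package. It assumes that `~/Documents` is the right parent folder; on Windows, R typically expands `~` to the Documents folder itself, so check what `path.expand("~")` returns before relying on the default. The demonstration call uses a temporary directory so that nothing is changed in your account.

```r
# Hypothetical helper: create a Project Directory inside a parent folder
createProjectDir <- function(name, parent = file.path("~", "Documents")) {
  # allow only letters and digits: no spaces, hyphens or special characters
  if (!grepl("^[A-Za-z0-9]+$", name)) {
    stop("Use only letters and digits in the directory name: ", name)
  }
  projectDir <- file.path(path.expand(parent), name)
  if (!dir.exists(projectDir)) {
    dir.create(projectDir)
  }
  projectDir                        # return the full path
}

# Demonstration in a temporary directory (safe to run)
demoDir <- createProjectDir("ABC", parent = tempdir())
dir.exists(demoDir)                 # confirm that the directory now exists

# list.dirs() lists directories only, so "ABC" should be among them
list.dirs(tempdir(), recursive = FALSE, full.names = FALSE)

# For the real thing you would call, e.g.: createProjectDir("BCH441")
```

A name such as `"my course"` is rejected with an error, which is the point of the check.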
Biocomputing tools
There are a number of tools you will commonly find on a professionally configured computer:
- A commandline interface
- While graphical user interfaces (GUI) are very helpful for interactive work, they (generally) can't be scripted and thus are an obstacle to high-throughput and repetitive tasks. A commandline interface allows more expressive commands and easy scripting. For Mac and Linux users, your systems have terminal applications that make the underlying unix commands available. On the Mac, find "Terminal" in your Applications/Utilities folder and put it in the dock. For Windows users there is cmd.exe - but the command set is very different and that requires you to learn yet another language. The better solution is to install Cygwin, which creates a unix shell that interacts with the Windows operating system.
- A package manager
- Complex software uses other software, such as libraries for graphics, numerical methods, or security, and such libraries are "dependencies" of the code. Often dependencies need a specific version to work with a particular piece of software. Especially when code needs to be compiled from source-code because it is not available as a pre-compiled bundle, dependencies need to be updated in very specific ways. Package managers to the rescue. These programs have validated recipes on how to install, maintain and update software. On the Mac, the go-to system is Homebrew. On Linux, this depends on which distribution you are running. On Windows I have heard of "chocolatey" and "scoop"; can't give a recommendation though.
- A version control system
- Creating assets for reproducible research requires version control, i.e. the ability to document when what change was made, and to return to previous versions if necessary. The current go-to tool for this is Git, which is especially useful since it interfaces with GitHub.
- Programming languages
- Besides R, you need to be able to compile C- and C++ code, you will need a working version of Java, and you should have a recent Python installation.
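A quick way to find out which of these tools are already available is to ask from within R. This is only a rough sketch: the program names below are my assumptions about what the executables are usually called (e.g. Python may be installed as `python` rather than `python3`), so adjust the list to your system. `Sys.which()` is base R and returns an empty string for a program it cannot find on your search path.

```r
# Names of commandline programs to look for (assumed names; adjust as needed)
programs <- c("git", "python3", "java", "gcc", "make")

found <- Sys.which(programs)        # named character vector of full paths

# Summarize: one row per program, with its location if it was found
toolStatus <- data.frame(program   = names(found),
                         path      = unname(found),
                         installed = nzchar(found),
                         stringsAsFactors = FALSE)
toolStatus
```

A program that is reported as not installed may still be present but missing from your search path, so treat the result as a first hint rather than a verdict.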
OS X
A useful guide to configuring your system is posted here.
Windows
I don't know about a good, current guide for setting up Windows computers for biocomputing. If you have specific experience and advice, let's collect it here.
Linux
I mention Linux rarely because I find that people who work on a Linux platform already know what they are doing. Yay. Ask, if you have specific questions.
Further reading, links and resources
Notes
- ↑ RStudio is actually very helpful in this regard, since it always shows you the full name of your file in its file-pane, and it always also shows you the "hidden" files that your operating system does not show to you, lest they hurt our little brains.
- ↑ list.dirs() and list.files().
- ↑ After the course, you can rename / move the directory to whatever, wherever you want, but during the course, we need your files in a predictable location to be able to troubleshoot problems.
Self-evaluation
If in doubt, ask! If anything about this learning unit is not clear to you, do not proceed blindly but ask for clarification. Post your question on the course mailing list: others are likely to have similar problems. Or send an email to your instructor.
About ...
Author:
- Boris Steipe <boris.steipe@utoronto.ca>
Created:
- 2017-08-05
Modified:
- 2017-09-07
Version:
- 1.0
Version history:
- 1.0 Completed to first live version
- 0.1 Material collected from previous assignments
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.