393 changes: 393 additions & 0 deletions bash_class_assignment.ipynb
@@ -0,0 +1,393 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `bash` practicals\n",
"\n",
"## Directory and file structure\n",
"\n",
"Using one command move to your home directory."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Change your directory on your local drive (mounted from virtualbox). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Confirm that you are in the correct location."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/malay\n"
]
}
],
"source": [
"pwd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the data required for the class. The files are `split` into two part because of GitHub limitation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"wget https://github.com/cb2edu/CB2-101-BioComp/raw/2020/01-Linux_101/data/linux_data.tar.xz.partaa\n",
"wget https://github.com/cb2edu/CB2-101-BioComp/raw/2020/01-Linux_101/data/linux_data.tar.xz.partab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check that there are two files in the directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Join the parts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat linux_data.tar.xz.parta* > linux_data.tar.xz\n",
"ls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We don't need the parts anymore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rm linux_data.tar.xz.parta*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unzip the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tar -xvJf linux_data.tar.xz"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using one command list the contents of the reference_data directory that is within the linux_data directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a new folder in `linux_data` called `selected_fastq`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copy over the Irrel_kd_2.subset.fq and Mov10_oe_2.subset.fq from raw_fastq to the linux_lesson/selected_fastq folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rename the `selected_fastq`folder and call it `exercise1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wildcards"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do each of the following using a single ls command without navigating to a different directory.\n",
"\n",
"1. List all of the files in /bin that start with the letter 'c'\n",
"2. List all of the files in /bin that contain the letter 'a'\n",
"3. List all of the files in /bin that end with the letter 'o'\n",
"4. BONUS: Using one command to list all of the files in /bin that contain either 'a' or 'c'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## History\n",
"\n",
"1. Checking the output of the history command, how many commands have you typed in so far?\n",
"2. Use the up arrow key to check the command you used before history command. What is it? Does it make sense?\n",
"3. Type several random characters on the command prompt. Can you bring the cursor to the start with Ctrl + A? Next, can you bring the cursor to the end with Ctrl + E? Finally, what happens when you use Ctrl + C?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Files\n",
"\n",
"**Do the following in the terminal**\n",
"\n",
"1. Change directories into genomics_data. You can do this using a full or relative path.\n",
"2. Use the less command to open up the file Encode-hesc-Nanog.bed.\n",
"3. Search for the string chr11; you'll see all instances in the file highlighted.\n",
"4. Staying in the less buffer, use the shortcut to get to the end of the file. \n",
"5. Exit the less buffer and come back to the command prompt.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Searching files\n",
"\n",
"1. Using `find` command search for the sequence file `Mov10_oe_1.subset.fq`.\n",
"2. Search for the sequence CTCAATGAGCCA in Mov10_oe_1.subset.fq. How many sequences do you find?\n",
"3. If you want to search for that sequence in **all** Mov10 replicate fastq files, what command would you use?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Searching and redirection\n",
"\n",
"How many unique exons are present on chromosome 1 using chr1-hg19_genes.gtf?\n",
"\n",
"1. Extract only the genomic coordinates of exon features\n",
"2. Subset dataset to only keep genomic coordinates\n",
"3. Remove duplicate exons\n",
"4. Count the total number of exons"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Shell scripts\n",
"\n",
"1. Write a script `listing.sh`. Add the command which prints to screen the contents of the file `Mov10_rnaseq_metadata.txt`.\n",
"2. Add an echo statement for the command, which tells the user \"This is information about the files in our dataset:\"\n",
"3. Run the new script. Report the contents of the new script and the output you got after running it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" ## Bash variables\n",
"\n",
"1. Use the `$file` variable as input to the head and tail commands, and modify the arguments to display only four lines from any file. \n",
"2. Create a new variable called meta and assign it the value Mov10_rnaseq_metadata.txt. For the following questions, use the $meta variable but do not change directories. Provide the code you would run to:\n",
" \n",
" a. Display the contents of the file using cat.\n",
" b. Retrieve only the lines which contain normal samples. (Hint: use grep)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"## Filename\n",
"\n",
"1. How would you modify basename command above to only return Mov10_oe_1 from the filename `Mov10_oe_1.subset.fq`?\n",
"2. Use basename with the file Irrel_kd_1.subset.fq as input. Return only Irrel_kd_1 to the terminal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `for` loop\n",
"\n",
"Write a loop to print out the number of lines in each fasta file in the dataset. The output should look something like this:\n",
"```\n",
" Irrel_kd_1.subset.fq 891684\n",
" Irrel_kd_2.subset.fq 767072\n",
" Irrel_kd_3.subset.fq 586196\n",
" Mov10_oe_1.subset.fq 1223600\n",
" Mov10_oe_2.subset.fq 1110016\n",
" Mov10_oe_3.subset.fq 690816\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
},
"language_info": {
"codemirror_mode": "shell",
"file_extension": ".sh",
"mimetype": "text/x-sh",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 4
}