Load the DNA sequences from fishes.fna.gz using functions from the seqinr package and from the Biostrings package.
Note the differences between the two resulting variables.
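A minimal sketch of the loading step, assuming both seqinr (CRAN) and Biostrings (Bioconductor) are installed. So that it runs on its own, it first writes a tiny gzipped FASTA file to a temporary path. The two reads in it are made up, and for the exercise you would set path to "fishes.fna.gz" instead. The variable name seq_seqinr is an illustrative choice.

```r
# Minimal sketch: load one FASTA file with seqinr and with Biostrings.
library(seqinr)      # CRAN package
library(Biostrings)  # Bioconductor package

# Tiny made-up gzipped FASTA file so the example runs on its own;
# for the exercise, use path <- "fishes.fna.gz" instead.
path <- tempfile(fileext = ".fna.gz")
con <- gzfile(path, "w")
writeLines(c(">read_1", "ACGAGTGCGTTTGACC",
             ">read_2", "ACGCTCGACAGGTCAA"), con)
close(con)

# seqinr: an ordinary list with one element per sequence
seq_seqinr <- read.fasta(path)

# Biostrings: one DNAStringSet object holding all sequences and their names
seq <- readDNAStringSet(path)

class(seq_seqinr)
class(seq)
```

seqinr's read.fasta() stores each read by default as a vector of single lower-case letters inside a plain list, whereas readDNAStringSet() keeps all reads as compact strings in one DNAStringSet, which is what the Biostrings exercises below work with.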
Next, focus on the Biostrings package. Practice working with the loaded data (stored in the variable seq):
- Check the number of loaded sequences:
length(seq)
- Determine the length of each sequence (width(seq[1]) gives the length of the first sequence only):
width(seq)
- View the sequence names (FASTA headers):
names(seq)
- Assign the first sequence, including its name, to the variable seq1:
seq1 <- seq[1]
- Assign the first sequence without its name to the variable seq1_sequence:
seq1_sequence <- seq[[1]]
- Assign the first sequence as a character string to the variable seq1_string:
seq1_string <- toString(seq[1])
- Learn more about the XStringSet class and the Biostrings package:
help(XStringSet)
help(package = "Biostrings")
Globally align the two selected sequences using the BLOSUM62 substitution matrix, a gap opening penalty of 1 and a gap extension penalty of 1.
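A minimal sketch of such a global alignment, assuming Biostrings is installed. Because BLOSUM62 is an amino-acid substitution matrix, the two short protein sequences below are made-up stand-ins for the sequences you select. The gap values are passed as positive penalties (gapOpening = 1, gapExtension = 1), which is how pairwiseAlignment() expects them; reading the exercise's gap costs this way is an assumption. In recent Bioconductor releases pairwiseAlignment() has moved to the pwalign package, so the code attaches pwalign when it is available.

```r
# Minimal sketch: global pairwise alignment scored with BLOSUM62.
library(Biostrings)
# pairwiseAlignment() lives in the pwalign package in recent Bioconductor
# releases; attach it when installed so that version is used.
if (requireNamespace("pwalign", quietly = TRUE)) library(pwalign)

# Made-up example protein sequences (replace with the two selected sequences)
pattern_seq <- AAString("PAWHEAE")
subject_seq <- AAString("HEAGAWGHEE")

aln <- pairwiseAlignment(pattern_seq, subject_seq,
                         type = "global",
                         substitutionMatrix = "BLOSUM62",
                         gapOpening = 1,     # penalty for opening a gap
                         gapExtension = 1)   # penalty for each gap position

aln         # aligned pattern and subject together with the score
score(aln)  # the alignment score alone
```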
Practice working with regular expressions:
- Create a list of names, e.g.:
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")
- Search for the name jana:
grep("jana", names_list, perl = TRUE)
- Search for all names containing the letter n at least once:
grep("n+", names_list, perl = TRUE)
- Search for all names containing two consecutive letters n (nn):
grep("n{2}", names_list, perl = TRUE)
- Search for all names starting with n:
grep("^n", names_list, perl = TRUE)
- Search for the names anna or jana (the names in names_list are lower case, so the pattern must be too):
grep("anna|jana", names_list, perl = TRUE)
- Search for names starting with z and ending with a:
grep("^z.*a$", names_list, perl = TRUE)
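Adding value = TRUE to grep() returns the matching names instead of their positions, which makes it easy to check what each of the regular expressions above actually matches. The expected matches are given as comments. ignore.case = TRUE is one way to let a capitalised pattern such as "Anna|Jana" match the lower-case names.

```r
# The name list from the exercise
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

# value = TRUE returns the matching names instead of their positions
grep("jana", names_list, perl = TRUE, value = TRUE)       # "jana"
grep("n+", names_list, perl = TRUE, value = TRUE)         # "anna" "jana" "norbert" "stanislav" "zuzana"
grep("n{2}", names_list, perl = TRUE, value = TRUE)       # "anna"
grep("^n", names_list, perl = TRUE, value = TRUE)         # "norbert"
grep("anna|jana", names_list, perl = TRUE, value = TRUE)  # "anna" "jana"
grep("^z.*a$", names_list, perl = TRUE, value = TRUE)     # "zuzana"

# ignore.case = TRUE lets a capitalised pattern match the lower-case names
grep("Anna|Jana", names_list, ignore.case = TRUE, value = TRUE)  # "anna" "jana"

# grepl() returns a logical vector, handy for subsetting
names_list[grepl("^n", names_list, perl = TRUE)]          # "norbert"
```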
- Load the amplicon sequencing run fishes.fna.gz from a 454 Junior sequencer.
- Get the sequences of the sample tagged by the forward and reverse MID ACGAGTGCGT (avoid if conditions).
- How many sequences are there in the sample?
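A minimal base-R sketch of selecting one sample with a single regular expression and logical indexing instead of if conditions. It assumes, and this should be checked against the real data, that the forward MID sits at the start of each read and that the reverse MID appears as its reverse complement at the end. The three reads and the helper revcomp() are hypothetical. For the real run, the character vector could come from as.character(seq) on the loaded DNAStringSet, and Biostrings' reverseComplement() could replace the helper.

```r
# Hypothetical helper: reverse complement of an upper-case A/C/G/T string
revcomp <- function(x) {
  reversed <- vapply(strsplit(x, ""), function(b) paste(rev(b), collapse = ""),
                     character(1))
  chartr("ACGT", "TGCA", reversed)
}

# Made-up reads standing in for the 454 run
reads <- c(
  read_1 = "ACGAGTGCGTTTGACCGGTAACGCACTCGT",  # MID at both ends
  read_2 = "ACGCTCGACAGGTCAAACGCACTCGT",      # different forward MID
  read_3 = "ACGAGTGCGTAAGGCCTTACG"            # reverse MID missing
)

mid <- "ACGAGTGCGT"
# Assumption: the reverse MID is read as its reverse complement at the 3' end
pattern <- paste0("^", mid, ".*", revcomp(mid), "$")

# Logical indexing keeps the matching reads without any if condition
sample_reads <- reads[grepl(pattern, reads)]
length(sample_reads)  # number of sequences in the sample (1 for these toy reads)
```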
Create a function demultiplexer() for demultiplexing sequencing data.
The function has four inputs:
- path to a FASTA file,
- a list of forward MIDs,
- a list of reverse MIDs,
- a list of sample labels.
The outputs of the function are:
- FASTA files named after the samples, each containing the sample's sequences without MIDs (perform MID trimming),
- a table named report.txt containing the sample names and the number of sequences in each sample.
Test the function on the fishes.fna.gz file; the samples and their MIDs are listed in the corresponding table fishes_MIDs.csv.
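A minimal base-R sketch of demultiplexer(), not a reference solution. To stay runnable without Bioconductor it parses FASTA with a small hypothetical helper, read_fasta(), instead of Biostrings. It assumes that each read starts with its forward MID and ends with the reverse complement of its reverse MID, and that MIDs contain only A, C, G and T; both assumptions should be checked against the real data. The output files are written to the working directory.

```r
# Hypothetical helper: read a (possibly gzipped) FASTA file into a named
# character vector; gzfile() also reads uncompressed files.
read_fasta <- function(path) {
  lines <- readLines(gzfile(path))
  is_header <- startsWith(lines, ">")
  record <- factor(cumsum(is_header)[!is_header], levels = seq_len(sum(is_header)))
  seqs <- vapply(split(lines[!is_header], record), paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Hypothetical helper: reverse complement of an upper-case A/C/G/T string
revcomp <- function(x) {
  reversed <- vapply(strsplit(x, ""), function(b) paste(rev(b), collapse = ""),
                     character(1))
  chartr("ACGT", "TGCA", reversed)
}

demultiplexer <- function(fasta_path, fwd_mids, rev_mids, samples) {
  reads <- read_fasta(fasta_path)
  counts <- integer(length(samples))
  for (i in seq_along(samples)) {
    # Assumption: forward MID at the 5' end, reverse-complemented reverse MID
    # at the 3' end; the capture group holds the insert between them
    pattern <- paste0("^", fwd_mids[[i]], "(.*)", revcomp(rev_mids[[i]]), "$")
    hits <- reads[grepl(pattern, reads)]
    trimmed <- sub(pattern, "\\1", hits)  # MID trimming
    # sprintf() returns character(0) for an empty sample, so no bogus record is written
    writeLines(sprintf(">%s\n%s", names(hits), trimmed),
               paste0(samples[[i]], ".fasta"))
    counts[i] <- length(hits)
  }
  report <- data.frame(sample = unlist(samples), sequences = counts)
  write.table(report, "report.txt", sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(report)
}
```

For the exercise data, a call might look like mids <- read.csv("fishes_MIDs.csv") followed by demultiplexer("fishes.fna.gz", mids$forward, mids$reverse, mids$sample). The column names forward, reverse and sample are placeholders; adapt them (and the separator passed to read.csv) to the actual layout of fishes_MIDs.csv.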
Basic Git settings
- Configure the Git editor
git config --global core.editor notepad
- Configure your name and email address
git config --global user.name "Zuzana Nova"
git config --global user.email <email address>
- Check current settings
git config --global --list
- Create a fork in your GitHub account: on the GitHub page of this repository, click the Fork button in the upper right corner.
- Clone the forked repository from your GitHub page to your computer:
git clone <fork repository address>
- In the local repository, add a new remote named upstream that points to the original project repository:
git remote add upstream https://github.com/mpa-prg/exercise_02.git
Create a new commit and push the changes to your remote repository.
- Add a file to the new commit
git add <file_name>
- Create a new commit; in the editor that opens, enter the commit message, save the file and close it
git commit
- Push the new commit to your GitHub repository
git push origin main