The exercise_02 from aleksandraslaby

Introduction to (R and) R/Bioconductor and regular expressions

Task 1

Load the DNA sequences from fishes.fna.gz using functions from the seqinr package and from the Biostrings package. Note the differences between the resulting objects.
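As a rough illustration of the kind of difference to look for, the sketch below reads a tiny FASTA file with both packages. The file contents and the variable names `seq_seqinr` and `seq_bio` are made up for this example and stand in for fishes.fna.gz; it assumes both packages are installed.

```r
library(Biostrings)

# Toy FASTA file (hypothetical data) standing in for fishes.fna.gz.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">read1", "ACGTAC", ">read2", "GGTTAACC"), fasta_path)

# seqinr: a plain R list; each element is a vector of single characters.
seq_seqinr <- seqinr::read.fasta(fasta_path, seqtype = "DNA")

# Biostrings: a DNAStringSet, a dedicated S4 container for DNA sequences.
seq_bio <- readDNAStringSet(fasta_path)

class(seq_seqinr)
class(seq_bio)

# Number of bases in the first sequence, obtained in two different ways.
length(seq_seqinr[[1]])
width(seq_bio)[1]
```

Printing both objects side by side also shows how each package stores and displays the FASTA headers.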

Task 2

Next, focus on the Biostrings package. Practice working with loaded data:

Check the number of loaded sequences:
```
length(seq)
```
Determine the length of each sequence:
```
width(seq)
```
View the sequence names (FASTA headers):
```
names(seq)
```
Assign the first sequence including the name to the variable seq1:
```
seq1 <- seq[1]
```
Assign the first sequence without the name to the variable seq1_sequence:
```
seq1_sequence <- seq[[1]]
```
Assign the first sequence as a character string to the variable seq1_string:
```
seq1_string <- toString(seq[1])
```
Learn more about the XStringSet class and the Biostrings package:
```
help(XStringSet)
```
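The difference between single and double brackets is easy to miss, so here is a small sketch on made-up data. The object `toy` and its sequence names are hypothetical and only stand in for the loaded sequences.

```r
library(Biostrings)

# Toy data (hypothetical), standing in for the loaded fishes sequences.
toy <- DNAStringSet(c(fish_a = "ACGTAC", fish_b = "GGTTAACC"))

width(toy)                            # one length per sequence

sub_set   <- toy[1]                   # still a DNAStringSet, name kept
single    <- toy[[1]]                 # a DNAString, name dropped
as_string <- as.character(toy[[1]])   # plain R character string

class(sub_set)
class(single)
names(sub_set)
```

Single brackets keep the container and the name; double brackets extract the element itself.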

Task 3

Globally align the two selected sequences using the BLOSUM62 matrix, a gap opening cost of -1 and a gap extension cost of 1.

Task 4

Practice working with regular expressions:

Create a list of names, e.g.:

names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

Search for the name jana:
```
grep("jana", names_list, perl = TRUE)
```
Search for all names containing the letter n at least once:
```
grep("n+", names_list, perl = TRUE)
```
Search for all names containing the letters nn:
```
grep("n{2}", names_list, perl = TRUE)
```
Search for all names starting with n:
```
grep("^n", names_list, perl = TRUE)
```

Search for the names anna or jana (the pattern is case-sensitive and the names in the list are lowercase):

grep("anna|jana", names_list, perl = TRUE)

Search for names starting with z and ending with a:
```
grep("^z.*a$", names_list, perl = TRUE)
```
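The calls above return positions in the vector. The following sketch shows a few base R variations that return the names themselves or a logical mask; the variable names are chosen only for this illustration.

```r
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

# grep() returns positions by default; value = TRUE returns the matching names.
idx_n   <- grep("n", names_list, perl = TRUE)
names_n <- grep("n", names_list, perl = TRUE, value = TRUE)

# grepl() returns one TRUE/FALSE per element, handy for logical indexing.
has_nn   <- grepl("n{2}", names_list, perl = TRUE)
names_nn <- names_list[has_nn]

# ignore.case = TRUE lets a capitalised pattern match the lowercase names.
anna_jana <- grep("^(Anna|Jana)$", names_list, perl = TRUE,
                  ignore.case = TRUE, value = TRUE)

print(idx_n)
print(names_nn)
print(anna_jana)
```

Logical indexing with `grepl()` is the form that carries over most directly to filtering sequences in the later tasks.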

Task 5

Load the amplicon sequencing run fishes.fna.gz, produced on a 454 Junior machine.
Extract the sequences of the sample that is tagged by the forward and reverse MID ACGAGTGCGT, without using if conditions.
How many sequences are there in the sample?
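One possible way to select reads without if conditions is an anchored regular expression combined with logical indexing. The sketch below runs on made-up reads in a plain character vector and assumes that the reverse MID appears reverse-complemented at the 3' end of a read; whether that holds for fishes.fna.gz needs to be checked against the data. The helper `rev_comp()` and all read contents are hypothetical.

```r
mid <- "ACGAGTGCGT"

# Reverse complement of a DNA string using base R only (hypothetical helper).
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}
mid_rc <- rev_comp(mid)

# Toy reads (made-up data); as.character() on a DNAStringSet gives a similar vector.
reads <- c(
  r1 = paste0(mid, "TTGACCA", mid_rc),      # both MIDs present
  r2 = paste0(mid, "GGGCCC"),               # forward MID only
  r3 = paste0("TTTT", mid, "AA", mid_rc),   # forward MID not at the start
  r4 = paste0(mid, "CCATG", mid_rc)         # both MIDs present
)

# Forward MID anchored at the start, reverse-complemented MID at the end.
pattern <- paste0("^", mid, ".*", mid_rc, "$")

# Logical indexing takes the place of any if condition.
sample_reads <- reads[grepl(pattern, reads, perl = TRUE)]
length(sample_reads)
```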

Task 6

Create a function demultiplexer() for demultiplexing of sequencing data.

The function has four inputs:

path to fasta file,
a list of forward MIDs,
a list of reverse MIDs,
a list of sample labels.

The outputs of the function are:

fasta files that are named after the samples and contain sequences of the sample without MIDs (perform MID trimming),
a table named report.txt containing the sample names and the number of sequences in each sample.

Test the function on the fishes.fna.gz file; the list of samples and MIDs can be found in the accompanying table fishes_MIDs.csv.
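The core of such a function (matching each MID pair, trimming, counting) can be sketched in base R on an in-memory character vector. This is not the required demultiplexer(): it does no file reading or writing, the name `demultiplex_reads()`, the helper `rev_comp()`, the second MID and the toy reads are all hypothetical, and it assumes the reverse MID appears reverse-complemented at the 3' end of each read.

```r
# Reverse complement of a DNA string using base R only (hypothetical helper).
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Hypothetical in-memory sketch: one regex per sample, no file input or output.
demultiplex_reads <- function(reads, forward_mids, reverse_mids, sample_labels) {
  trimmed <- mapply(function(fwd, rev_mid) {
    pattern <- paste0("^", fwd, "(.*)", rev_comp(rev_mid), "$")
    hits <- reads[grepl(pattern, reads, perl = TRUE)]
    # Keep only the captured insert, i.e. the read without both MIDs.
    sub(pattern, "\\1", hits, perl = TRUE)
  }, forward_mids, reverse_mids, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  names(trimmed) <- sample_labels
  report <- data.frame(sample = sample_labels,
                       n_sequences = lengths(trimmed, use.names = FALSE))
  list(sequences = trimmed, report = report)
}

# Toy input (made-up reads and a made-up second MID).
reads <- c(
  paste0("ACGAGTGCGT", "TTGACCA", rev_comp("ACGAGTGCGT")),
  paste0("ACGAGTGCGT", "CCATG",   rev_comp("ACGAGTGCGT")),
  paste0("ACGCTCGACA", "GGGAAA",  rev_comp("ACGCTCGACA"))
)
result <- demultiplex_reads(
  reads,
  forward_mids  = c("ACGAGTGCGT", "ACGCTCGACA"),
  reverse_mids  = c("ACGAGTGCGT", "ACGCTCGACA"),
  sample_labels = c("sample_1", "sample_2")
)
print(result$report)

# The report could then be saved, for example with:
# write.table(result$report, "report.txt", row.names = FALSE, quote = FALSE)
```

A full solution would wrap this logic with FASTA reading and per-sample FASTA writing, for instance through the Biostrings functions used in the earlier tasks.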

Download files from GitHub

Basic Git settings

Configure the Git editor
git config --global core.editor notepad
Configure your name and email address
git config --global user.name "Zuzana Nova"
git config --global user.email <your_email_address>
Check current settings
git config --global --list

Create a fork on your GitHub account. On the GitHub page of this repository find a Fork button in the upper right corner.
Clone forked repository from your GitHub page to your computer:

git clone <fork repository address>

In the local repository, add the project repository as a new remote:

git remote add upstream https://github.com/mpa-prg/exercise_02.git

Send files to GitHub

Create a new commit and send new changes to your remote repository.

Add file to a new commit.

git add <file_name>

Create a new commit: enter the commit message in the editor, then save and close the file.

git commit

Send a new commit to your GitHub repository.

git push origin main
