
brain_behavior

Brain morphology and behaviour analyses

  • I) Importing all the available data
  • II) Extract the data for males and females
  • III) Define the subset of phenotypes to use
  • IV) Find the studies ID
  • V) Extraction of the relevant behavioural data

The repository to get all the scripts : https://github.com/Sbourgeat/brain_behavior

I- Extract all the relevant DGRP dataset

The script for this part is dgrpool_behav.R

Importing all the available data

data_all_pheno <- readRDS("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/data.all_pheno_21_03_23_filtered.rds")

Web link https://github.com/DeplanckeLab/dgrpool

  • From the DGRPool GitHub, we downloaded the .rds file containing the information on all the experiments done with DGRP flies
    • A script is also provided to get the most up-to-date version of the dataset

To reproduce their data-gathering steps exactly, see the following excerpt from their GitHub page:

> In order to be fully reproducible, we downloaded the phenotypes on the website at a given timepoint. The script used, download_phenotypes.R, accesses our API to download a "studies.json" file containing all metadata for each study. Then it uses the same API to download the phenotypes study by study, and formats everything in a common format. It then generates an RDS file with a given timestamp, which is the common file used by all other methods, so that all scripts are using the same data, collected at a given timestamp. We here provide the data used in the latest version of the manuscript in data.all_pheno_21_03_23_filtered.rds, but you can run download_phenotypes.R again to generate a new RDS with the latest up-to-date phenotyping data.

Extract the data for males and females

# Extract the data for males, females and NA

data_male <- data_all_pheno[["M"]]

data_female <- data_all_pheno[["F"]]

data_na <- data_all_pheno[["NA"]]

Now we have all the existing data for the DGRP lines: male, female, and NA. Here is an example of a table that can be found in data_male or data_female, with the given column names (DGRP, study_id_1, study_id_2, study_id_...):

| DGRP | study_id_1 | study_id_2 | study_id_n |
| --- | --- | --- | --- |
| DGRP 1 | Data 1 | Data 2 | Data 3 |
| DGRP 2 | Data 4 | Data 5 | Data 6 |
| DGRP 3 | Data 7 | Data 8 | Data 9 |
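The same male/female extraction can be sketched in Python with a toy stand-in for the RDS object; the dict below and all its values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the RDS object: a dict keyed by sex, one dataframe per key
# (the real data_all_pheno comes from data.all_pheno_21_03_23_filtered.rds).
data_all_pheno = {
    "M": pd.DataFrame({"DGRP": ["DGRP 1", "DGRP 2"],
                       "study_id_1": [0.1, 0.2],
                       "study_id_2": [1.5, 1.7]}),
    "F": pd.DataFrame({"DGRP": ["DGRP 1", "DGRP 2"],
                       "study_id_1": [0.3, 0.4],
                       "study_id_2": [1.1, 1.9]}),
}

# Same access pattern as the R code: one table per sex key
data_male = data_all_pheno["M"]
data_female = data_all_pheno["F"]
print(data_male.columns.tolist())  # ['DGRP', 'study_id_1', 'study_id_2']
```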

Define the subset of phenotypes to use

For this step, a .csv file is created defining the phenotypes to keep. Here is the head of the file used for this analysis:

| phenotypes | type_of_behavior |
| --- | --- |
| FarPoint_Butanedione | Olfactory |
| LocRatio_Butanedione | Olfactory |
| Resp_Butanedion_30pc | Olfactory |

The exact names of the phenotypes were obtained by manual queries on the DGRPool website: https://dgrpool.epfl.ch/phenotypes

First, we subset the data gathered from the website to keep only the behavioural phenotypes.

#open phenotypes_to_use.csv

phenotypes_to_use <- read.csv("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/phenotypes_to_use.csv")

# keep only the rows whose type_of_behavior is olfactory, aggresive, locomotor,
# food, sleep, or phototaxi (the leading spaces match the raw values in the CSV)

phenotypes_to_use <- phenotypes_to_use[phenotypes_to_use$type_of_behavior %in%
  c(" olfactory", " aggresive", " locomotor", " food", " sleep", " phototaxi"), ]
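The filtering step boils down to a membership test on type_of_behavior. A minimal Python sketch with hypothetical rows (note the leading spaces, which mirror the raw CSV values):

```python
import pandas as pd

# Hypothetical stand-in for phenotypes_to_use.csv; the leading space in each
# type_of_behavior value mirrors the raw CSV, so the filter must match it verbatim.
phenotypes_to_use = pd.DataFrame({
    "phenotypes": ["FarPoint_Butanedione", "StartleResponse", "SleepDuration"],
    "type_of_behavior": [" olfactory", " startle", " sleep"],
})

keep = [" olfactory", " aggresive", " locomotor", " food", " sleep", " phototaxi"]
filtered = phenotypes_to_use[phenotypes_to_use["type_of_behavior"].isin(keep)]
print(filtered["phenotypes"].tolist())  # only the olfactory and sleep rows remain
```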

Then, we need to use phenotypes_to_use to filter the data objects for both males and females. However, we first need to find the IDs of the studies to be able to extract the relevant behavioural data from the .rds file.

Find the studies ID

To get the study IDs, we need to extract information from https://dgrpool.epfl.ch/phenotypes.json?all=1. The JSON file is organised with the following headings:

  • ID: ID generated by download_phenotypes.R
  • Name: Phenotype name
  • Description: Description of the phenotype
  • Created at: Date of creation of the data entry
  • Updated at: Date of update of the data entry
  • Study ID: ID of the study the phenotype belongs to
  • Obsolete: Indicates if the phenotype is obsolete or no longer used
  • Number of lines in NBER data group: Number of data lines in the NBER data group associated with the phenotype
  • Number of male individuals: Number of individuals in the dataset that are classified as male
  • Number of female individuals: Number of individuals in the dataset that are classified as female
  • Number of individuals with unknown sex: Number of individuals in the dataset with unknown sex
  • Is summary: Indicates if the phenotype is a summary or aggregate measure
  • Is numeric: Indicates if the phenotype is a numeric value
  • Is continuous: Indicates if the phenotype is a continuous measure
  • Dataset ID: ID of the dataset where the phenotype data is stored
  • Summary type ID: ID indicating the type of summary associated with the phenotype
  • Unit ID: ID of the unit in which the phenotype is measured
  • Sex breakdown by data group: Breakdown of the number of males and females in each data group associated with the phenotype

Using the JSON file, we can look up the study IDs by phenotype name and create a new table containing each study id and name.

library(jsonlite)

json_phenotypes <- fromJSON("https://dgrpool.epfl.ch/phenotypes.json?all=1")

json_phenotypes <- json_phenotypes[with(json_phenotypes, order(id)),]

rownames(json_phenotypes) <- json_phenotypes$id

message(nrow(json_phenotypes), " phenotypes found")

#print the head of json_phenotypes

head(json_phenotypes)  

# iterate through the phenotypes; if the name matches an entry in
# phenotypes_to_use.csv, record its id and name

list_id <- c()
name <- c()

for (i in 1:nrow(json_phenotypes)) {
  if (json_phenotypes[i, "name"] %in% phenotypes_to_use$phenotypes) {
    list_id <- c(list_id, json_phenotypes[i, "id"])
    name <- c(name, json_phenotypes[i, "name"])
  }
}

print(name)
print(list_id)

# create a new dataframe pairing each study id with its phenotype name

phenotypes_for_analysis <- data.frame(list_id, name)

# write the dataframe phenotypes_for_analysis to a csv file, without the rows index

write.csv(phenotypes_for_analysis, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/phenotypes_for_analysis.csv", row.names = F)

The phenotypes_for_analysis table looks like this:

| list_id | name |
| --- | --- |
| 1311 | mn_RespBenzaldeh_3_5 |
| 1312 | mn_RespAcetophen_3_5 |
| 1313 | mn_RespHexanol_3_5 |
| 1314 | mn_RespHexanol_0_3 |
| 1316 | mn_Resp_Hexanal |
| 1317 | mn_Resp_Citral |
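The lookup the R loop performs reduces to a name-to-id match. A minimal sketch, with made-up phenotype entries standing in for the parsed phenotypes.json payload:

```python
# Minimal sketch of the name-to-id lookup; the entries below are made up and
# stand in for the records parsed from phenotypes.json.
json_phenotypes = [
    {"id": 1311, "name": "mn_RespBenzaldeh_3_5"},
    {"id": 1312, "name": "mn_RespAcetophen_3_5"},
    {"id": 9999, "name": "wing_length"},
]
wanted = {"mn_RespBenzaldeh_3_5", "mn_RespAcetophen_3_5"}

# keep (id, name) pairs only for the phenotypes we asked for
pairs = [(p["id"], p["name"]) for p in json_phenotypes if p["name"] in wanted]
print(pairs)  # [(1311, 'mn_RespBenzaldeh_3_5'), (1312, 'mn_RespAcetophen_3_5')]
```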

Extraction of the relevant behavioural data

To extract the data we want, we only need to change the column names to match the IDs by keeping the last four characters. The following code does that and filters the data accordingly.

# change the colnames of data_male to keep only the end 4 characters which are the actual id of the phenotypes

# Get the current column names

current_names <- colnames(data_male)

# Create new column names by keeping only the last four characters

new_names <- substr(current_names, nchar(current_names) - 3, nchar(current_names))

# Assign the new column names to the dataframe

colnames(data_male) <- new_names

print(colnames(data_male))

# change the colnames of data_female to keep only the end 4 characters

# Get the current column names

current_names <- colnames(data_female)

# Create new column names by keeping only the last four characters

new_names <- substr(current_names, nchar(current_names) - 3, nchar(current_names))

# Assign the new column names to the dataframe

colnames(data_female) <- new_names

print(colnames(data_female))

# change the colnames of data_na to keep only the end 4 characters

# Get the current column names

current_names <- colnames(data_na)

# Create new column names by keeping only the last four characters

new_names <- substr(current_names, nchar(current_names) - 3, nchar(current_names))

# Assign the new column names to the dataframe

colnames(data_na) <- new_names

print(colnames(data_na))

#write data_male as csv file

write.csv(data_male, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_male.csv", row.names = F)

#write data_female as csv file

write.csv(data_female, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_female.csv", row.names = F)

#write data_na as csv file

write.csv(data_na, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_na.csv", row.names = F)
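The renaming logic above can be sketched in Python; the column names here are hypothetical, but the rule is the same: keep the trailing four characters, which hold the numeric phenotype id (the 4-character 'DGRP' column name passes through unchanged):

```python
# Hypothetical column names; the rule keeps the last four characters of each,
# which correspond to the numeric phenotype id ('DGRP' is already 4 characters).
current_names = ["DGRP", "study_12_pheno_1311", "study_12_pheno_1312"]
new_names = [name[-4:] for name in current_names]
print(new_names)  # ['DGRP', '1311', '1312']
```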

Merging morphological and behavioural data

**The script for this part is `extract_behaviour.py`.**

This code performs several data-manipulation tasks on multiple datasets and saves the results as separate CSV files. Here is a summary of what the code does:

  1. The code imports the data from two CSV files, 'data_male.csv' and 'data_female.csv', which are stored as separate dataframes: data_male and data_female.
     data_male = pd.read_csv("data_male.csv")
     data_female = pd.read_csv("data_female.csv")
  2. It imports another dataset from 'entropy_vol_sep2023.csv' and stores it as a dataframe called vol_entropy.
     vol_entropy = pd.read_csv("entropy_vol_sep2023.csv")
  3. The code builds a normalized 'dgrp' column from the 'DGRP' column: if the DGRP line number has two digits, it is prefixed with 'DGRP_0'; otherwise it is prefixed with 'DGRP_'.
     vol_entropy['dgrp'] = vol_entropy['DGRP'].apply(lambda x: 'DGRP_0' + str(x) if len(str(x)) == 2 else 'DGRP_' + str(x))
  4. The vol_entropy dataframe is then split into two dataframes, male and female, based on the 'Sex' column.
     male = vol_entropy[vol_entropy['Sex'] == 'male']
     female = vol_entropy[vol_entropy['Sex'] == 'female']
  5. The data_male dataframe is merged with the male dataframe on the 'dgrp' column, giving merged_data_male.
     merged_data_male = pd.merge(data_male, male, on='dgrp')
  6. Similarly, data_female is merged with the female dataframe on the 'dgrp' column, giving merged_data_female.
     merged_data_female = pd.merge(data_female, female, on='dgrp')
  7. Finally, the merged_data_male and merged_data_female dataframes are saved as separate CSV files named 'dgrpool_brain_behavior_male.csv' and 'dgrpool_brain_behavior_female.csv', respectively.
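Steps 3 to 5 above can be sketched with toy data (all values below are hypothetical):

```python
import pandas as pd

# Toy morphology table: zero-pad two-digit DGRP line numbers ('21' -> 'DGRP_021'),
# leave longer ones with a plain 'DGRP_' prefix ('304' -> 'DGRP_304').
vol_entropy = pd.DataFrame({"DGRP": [21, 304], "Sex": ["male", "male"],
                            "entropy": [0.7, 0.9]})
vol_entropy["dgrp"] = vol_entropy["DGRP"].apply(
    lambda x: "DGRP_0" + str(x) if len(str(x)) == 2 else "DGRP_" + str(x))

# Toy behaviour table already keyed by the normalized dgrp name
data_male = pd.DataFrame({"dgrp": ["DGRP_021", "DGRP_304"], "activity": [1.2, 3.4]})

# Split by sex, then merge behaviour and morphology on the shared 'dgrp' key
male = vol_entropy[vol_entropy["Sex"] == "male"]
merged_data_male = pd.merge(data_male, male, on="dgrp")
print(merged_data_male["dgrp"].tolist())  # ['DGRP_021', 'DGRP_304']
```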

Statistical analyses

**The script for this part is `behaviour_analysis.py`.**

This code performs various computations and visualizations on a merged dataframe that combines brain and behavior data. Here's a breakdown of what each function does:

  1. split_string_with_dgrp(df): This function takes a dataframe as input and splits the 'genotype' column into two separate lists: DGRP and sex. It iterates over each row in the dataframe and checks if the 'genotype' column contains the string 'dgrp'. If it does, it extracts the value and appends it to the DGRP list. It also extracts the corresponding value from the 'sex' column and appends it to the sex list. Finally, it returns the DGRP and sex lists.

  2. merge_data(brain, behav, DGRP, sex): This function merges the brain and behavior data based on the DGRP and sex values. It selects rows from the behavior dataframe where the 'genotype' column is in the DGRP list, the 'sex' column is in the sex list, and the 'head_scanned' column is True. It then applies some modifications to the 'genotype' column values in both dataframes to ensure consistency. It renames the 'genotype' column in the behavior dataframe to 'DGRP' and performs the merge operation on the 'DGRP' and 'sex' columns. Finally, it returns the merged dataframe.

  3. calculate_pvalues(df): This function calculates the p-values for the correlation matrix of a dataframe. It creates an empty dataframe with the same columns as the input dataframe. It then iterates over all pairs of columns in the input dataframe and calculates the p-value for their correlation using the pearsonr function from the scipy.stats library. The p-value is rounded to 4 decimal places and stored in the corresponding cell of the output dataframe. Finally, it returns the p-values dataframe.

The code also loads two CSV files, summary.csv and vol_hratio.csv, into the behav and brain dataframes, respectively. It calls the split_string_with_dgrp function to extract the DGRP and sex lists from the behav dataframe. It then calls the merge_data function to merge the brain and behav dataframes based on the DGRP and sex values, resulting in the merged_df dataframe.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pandas as pd
import plotly.graph_objects as go
from scipy.stats import pearsonr


def split_string_with_dgrp(df):
    """
    This function takes a dataframe as input and splits the 'genotype' column
    into two separate lists: DGRP and sex.

    Parameters:
        df (DataFrame): The input dataframe containing the 'genotype' and 'sex' columns.

    Returns:
        DGRP (list): A list of DGRP values extracted from the 'genotype' column.
        sex (list): A list of sex values extracted from the 'sex' column.
    """
    DGRP = []
    sex = []
    for i in range(len(df)):
        if 'dgrp' in df.iloc[i, 1]:
            DGRP.append(df.iloc[i, 1])
            sex.append(df.iloc[i, 2])
    return DGRP, sex


def merge_data(brain, behav, DGRP, sex):
    """
    This function merges the brain and behaviour data based on the DGRP and sex values.

    Parameters:
        brain (DataFrame): The brain data dataframe.
        behav (DataFrame): The behaviour data dataframe.
        DGRP (list): A list of DGRP values.
        sex (list): A list of sex values.

    Returns:
        merged_df (DataFrame): The merged dataframe containing the brain and behaviour data.
    """
    # keep scanned heads only; .copy() avoids mutating a view of behav
    data = behav[behav['genotype'].isin(DGRP) & behav['sex'].isin(sex)
                 & behav["head_scanned"]].copy()
    # normalize genotype names to the DGRP_0xx / DGRP_xxx convention
    data['genotype'] = data['genotype'].apply(
        lambda x: 'DGRP_0' + x.split('dgrp')[1] if len(x.split('dgrp')[1]) == 2
        else 'DGRP_' + x.split('dgrp')[1])
    brain['DGRP'] = brain['DGRP'].apply(
        lambda x: 'DGRP_0' + x if len(x) == 2 else 'DGRP_' + x)

    data.rename(columns={'genotype': 'DGRP'}, inplace=True)
    merged_df = pd.merge(brain, data, on=['DGRP', 'sex'])
    return merged_df


def calculate_pvalues(df):
    """
    This function calculates the p-values for the correlation matrix of a dataframe.

    Parameters:
        df (DataFrame): The input dataframe.

    Returns:
        pvalues (DataFrame): The dataframe containing the p-values for the correlation matrix.
    """
    dfcols = pd.DataFrame(columns=df.columns)
    pvalues = dfcols.transpose().join(dfcols, how='outer')
    for r in df.columns:
        for c in df.columns:
            tmp = df[df[r].notnull() & df[c].notnull()]
            pvalues.loc[c, r] = round(pearsonr(tmp[r], tmp[c])[1], 4)
    return pvalues


behav = pd.read_csv("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/summary.csv")
brain = pd.read_csv("/Users/skumar/Project/PHD_work/GWAS/dataset/vol_hratio.csv", sep=",")

DGRP, sex = split_string_with_dgrp(behav)
merged_df = merge_data(brain, behav, DGRP, sex)

correlation_matrix = merged_df[["abs_volume", "h_ratio", "activity",
                                "correct_choices", "frac_time_on_shocked"]].corr()
p_values = calculate_pvalues(merged_df[["abs_volume", "h_ratio", "activity",
                                        "correct_choices", "frac_time_on_shocked"]])

fig = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale="Viridis",
    colorbar=dict(title="Correlation Coefficient")
))

# annotate each cell with its p-value, switching text colour for dark cells
annotations = []
for i, row in enumerate(correlation_matrix.values):
    for j, value in enumerate(row):
        annotations.append(
            dict(
                x=correlation_matrix.columns[j],
                y=correlation_matrix.columns[i],
                text=f"p-value: {p_values.iloc[i, j]:.3f}",
                showarrow=False,
                font=dict(color="white" if abs(value) > 0.5 else "black")
            )
        )

fig.update_layout(
    title="Correlation Coefficient and p-values",
    annotations=annotations,
    xaxis=dict(title="Variable"),
    yaxis=dict(title="Variable"),
)

fig.show()

The outputs are stored as HTML plots and can be found in the results folder.
