Brain morphology and behaviour analyses
- I) Importing all the available data
- II) Extract the data for males and females
- III) Define the subset of phenotypes to use
- IV) Find the studies ID
- V) Extraction of the relevant behavioural data
The repository with all the scripts: https://github.com/Sbourgeat/brain_behavior
The script for this part is `dgrpool_behav.R`.
data_all_pheno <- readRDS("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/data.all_pheno_21_03_23_filtered.rds")
DGRPool GitHub repository: https://github.com/DeplanckeLab/dgrpool
- From the DGRPool GitHub, we downloaded the .rds file containing the information from all the experiments done with DGRP flies.
- A script is also provided to fetch the most up-to-date version of the dataset.
To reproduce their data-gathering steps exactly, see the following excerpt from their GitHub page:
> In order to be fully reproducible, we downloaded the phenotypes on the website at a given timepoint. The script used, download_phenotypes.R, access our API to download a "studies.json" file containing all metadata for each study. Then it uses the same API to download the phenotypes study by study, and format everything in a common format. It then generates a RDS file with a given timestamp, which is the common file used by all other methods, so that all scripts are using the same data, collected at a given timestamp. We here provided the data used in the latest version of the manuscript in data.all_pheno_21_03_23_filtered.rds, but you can run again download_phenotypes.R to generate a new RDS with the latest up-to-date phenotyping data.
# Extract the data for males, females and NA
data_male <- data_all_pheno[["M"]]
data_female <- data_all_pheno[["F"]]
data_na <- data_all_pheno[["NA"]]
Now we have all the existing data for the DGRP lines, split into male, female, and NA. Here is an example of a table found in data_male or data_female, with one column per study (DGRP, study_id_1, study_id_2, ...):
DGRP | study_id_1 | study_id_2 | study_id_n |
---|---|---|---|
DGRP 1 | Data 1 | Data 2 | Data 3 |
DGRP 2 | Data 4 | Data 5 | Data 6 |
DGRP 3 | Data 7 | Data 8 | Data 9 |
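The sex-keyed structure of the RDS and the wide per-study layout above can be mimicked with a small pandas sketch (the DGRP labels, study columns, and values below are made up for illustration only):

```python
import pandas as pd

# Toy stand-in for data_all_pheno: a dict keyed by sex, each value a wide
# table with one row per DGRP line and one column per study phenotype.
data_all_pheno = {
    "M": pd.DataFrame({
        "DGRP": ["DGRP_021", "DGRP_026", "DGRP_028"],
        "study_id_1": [0.1, 0.4, 0.7],
        "study_id_2": [0.2, 0.5, 0.8],
    }),
    "F": pd.DataFrame({
        "DGRP": ["DGRP_021", "DGRP_026"],
        "study_id_1": [0.3, 0.6],
    }),
}

data_male = data_all_pheno["M"]  # analogous to data_all_pheno[["M"]] in R
print(data_male.shape)           # (3, 3): three lines, DGRP + two study columns
```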
For this step, a .csv file is created that defines the phenotypes to keep. Here is the head of the file used for this analysis:
phenotypes | type_of_behavior |
---|---|
FarPoint_Butanedione | Olfactory |
LocRatio_Butanedione | Olfactory |
Resp_Butanedion_30pc | Olfactory |
The exact names of the phenotypes were obtained by manual queries on the DGRPool website: https://dgrpool.epfl.ch/phenotypes
First, we generate a subset to only keep behavioural data from the data we gathered from the website.
# open phenotypes_to_use.csv
phenotypes_to_use <- read.csv("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/phenotypes_to_use.csv")
# keep only the rows whose type_of_behavior is one of the behaviours of interest
# (the values below match the CSV as written, including the leading space and the "aggresive" spelling)
behaviours_to_keep <- c(" olfactory", " aggresive", " locomotor", " food", " sleep", " phototaxi")
phenotypes_to_use <- phenotypes_to_use[phenotypes_to_use$type_of_behavior %in% behaviours_to_keep, ]
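For readers working in Python, the same row filter can be sketched with pandas. The CSV values carry a leading space, so stripping whitespace and lowercasing before comparison makes the match more robust (the rows below are hypothetical stand-ins for the real file):

```python
import pandas as pd

# Hypothetical stand-in for phenotypes_to_use.csv (the real file has more rows).
phenotypes_to_use = pd.DataFrame({
    "phenotype": ["FarPoint_Butanedione", "ClimbingSpeed", "SleepBout"],
    "type_of_behavior": [" olfactory", " flight", " sleep"],
})

behaviours_to_keep = {"olfactory", "aggresive", "locomotor", "food", "sleep", "phototaxi"}

# strip stray whitespace and lowercase before matching, so " olfactory"
# and "Olfactory" both survive the filter
mask = phenotypes_to_use["type_of_behavior"].str.strip().str.lower().isin(behaviours_to_keep)
phenotypes_to_use = phenotypes_to_use[mask]
print(phenotypes_to_use["phenotype"].tolist())  # ['FarPoint_Butanedione', 'SleepBout']
```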
Then, we need to use phenotypes_to_use to filter the data objects for both males and females. However, we first need to find the IDs of the studies in order to extract the relevant behavioural data from the .rds file.
To get the study IDs, we need to extract information from https://dgrpool.epfl.ch/phenotypes.json?all=1. The JSON file is organised with the following headings:
- ID: ID generated by download_phenotypes.R
- Name: Phenotype name
- Description: Description of the phenotype
- Created at: Date of creation of the data entry
- Updated at: Date of update of the data entry
- Study ID: ID of the study the phenotype belongs to
- Obsolete: Indicates if the phenotype is obsolete or no longer used
- Number of lines in NBER data group: Number of data lines in the NBER data group associated with the phenotype
- Number of male individuals: Number of individuals in the dataset that are classified as male
- Number of female individuals: Number of individuals in the dataset that are classified as female
- Number of individuals with unknown sex: Number of individuals in the dataset with unknown sex
- Is summary: Indicates if the phenotype is a summary or aggregate measure
- Is numeric: Indicates if the phenotype is a numeric value
- Is continuous: Indicates if the phenotype is a continuous measure
- Dataset ID: ID of the dataset where the phenotype data is stored
- Summary type ID: ID indicating the type of summary associated with the phenotype
- Unit ID: ID of the unit in which the phenotype is measured
- Sex breakdown by data group: Breakdown of the number of males and females in each data group associated with the phenotype
Using the JSON file, we can look phenotypes up by name, retrieve their IDs, and build a new table containing each study id and name.
library(jsonlite)
json_phenotypes <- fromJSON("https://dgrpool.epfl.ch/phenotypes.json?all=1")
json_phenotypes <- json_phenotypes[with(json_phenotypes, order(id)),]
rownames(json_phenotypes) <- json_phenotypes$id
message(nrow(json_phenotypes), " phenotypes found")
#print the head of json_phenotypes
head(json_phenotypes)
# iterate through the phenotypes; whenever a name appears in phenotypes_to_use,
# record its id in list_id and its name in name
list_id <- c()
name <- c()
for (i in 1:nrow(json_phenotypes)) {
if (json_phenotypes[i, "name"] %in% phenotypes_to_use$phenotype) {
list_id <- c(list_id, json_phenotypes[i, "id"])
name <- c(name, json_phenotypes[i, "name"])
}
}
list_id <- unlist(list_id)
print(name)
print(list_id)
# build a data frame pairing each phenotype id with its name
phenotypes_for_analysis <- data.frame(list_id, name)
# write the dataframe phenotypes_for_analysis to a csv file, without the rows index
write.csv(phenotypes_for_analysis, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/phenotypes_for_analysis.csv", row.names = F)
The phenotypes_for_analysis table looks like this:
list_id | name |
---|---|
1311 | mn_RespBenzaldeh_3_5 |
1312 | mn_RespAcetophen_3_5 |
1313 | mn_RespHexanol_3_5 |
1314 | mn_RespHexanol_0_3 |
1316 | mn_Resp_Hexanal |
1317 | mn_Resp_Citral |
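The same name-to-id lookup can be done without an explicit loop. Below is a hedged pandas sketch using a toy slice of the phenotypes table (the id 9999 and the name "unrelated_trait" are invented for contrast):

```python
import pandas as pd

# Toy slice of the phenotypes JSON; ids and names mirror the table above.
json_phenotypes = pd.DataFrame({
    "id": [1311, 1312, 1313, 9999],
    "name": ["mn_RespBenzaldeh_3_5", "mn_RespAcetophen_3_5",
             "mn_RespHexanol_3_5", "unrelated_trait"],
})
wanted = ["mn_RespBenzaldeh_3_5", "mn_RespHexanol_3_5"]

# keep only rows whose name is in the wanted list, mirroring the R loop
phenotypes_for_analysis = (
    json_phenotypes[json_phenotypes["name"].isin(wanted)]
    .rename(columns={"id": "list_id"})[["list_id", "name"]]
    .reset_index(drop=True)
)
print(phenotypes_for_analysis["list_id"].tolist())  # [1311, 1313]
```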
To finally extract the data we want, we only need to rename the columns to match the ids by keeping the last four characters of each column name. The following code does that and filters the data accordingly.
# keep only the last four characters of each column name,
# which correspond to the phenotype id
trim_colnames_to_id <- function(df) {
  nm <- colnames(df)
  colnames(df) <- substr(nm, nchar(nm) - 3, nchar(nm))
  df
}
data_male <- trim_colnames_to_id(data_male)
data_female <- trim_colnames_to_id(data_female)
data_na <- trim_colnames_to_id(data_na)
print(colnames(data_male))
print(colnames(data_female))
print(colnames(data_na))
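In pandas the same last-four-characters rename is a one-liner over the column index (the column names below are invented for illustration; only their final four characters matter):

```python
import pandas as pd

# Toy frame whose column names end with a four-digit phenotype id.
data_male = pd.DataFrame({
    "mn_RespHexanol_3_5_S10_1313": [0.1, 0.2],
    "some_other_pheno_S12_1316": [0.3, 0.4],
})

# keep only the final four characters of each column name,
# which hold the numeric phenotype id
data_male.columns = [c[-4:] for c in data_male.columns]
print(list(data_male.columns))  # ['1313', '1316']
```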
#write data_male as csv file
write.csv(data_male, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_male.csv", row.names = F)
#write data_female as csv file
write.csv(data_female, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_female.csv", row.names = F)
#write data_na as csv file
write.csv(data_na, "/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/brain_behavior/data_na.csv", row.names = F)
**The script for this part is `extract_behaviour.py`.** This code performs several data-manipulation tasks on multiple datasets and saves the results as separate CSV files. Here is a summary of what the code does:
- The code imports the data from two CSV files, 'data_male.csv' and 'data_female.csv', which are stored as the dataframes `data_male` and `data_female`.
data_male = pd.read_csv("data_male.csv")
data_female = pd.read_csv("data_female.csv")
- It imports another dataset from 'entropy_vol_sep2023.csv' and stores it as a dataframe called `vol_entropy`.
vol_entropy = pd.read_csv("entropy_vol_sep2023.csv")
- The code normalizes the DGRP identifiers in `vol_entropy` into a new 'dgrp' column: if the value in the 'DGRP' column is two characters long, it is prefixed with 'DGRP_0'; otherwise it is prefixed with 'DGRP_'.
vol_entropy['dgrp'] = vol_entropy['DGRP'].apply(lambda x: 'DGRP_0' + str(x) if len(str(x)) == 2 else 'DGRP_'+ str(x))
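An equivalent, arguably clearer way to build the prefixed label is `str.zfill`, which zero-pads to a fixed width. Note this is a variant, not the script's code: it differs from the two-branch lambda only for one-character ids, which the lambda leaves unpadded.

```python
def dgrp_label(x):
    # zero-pad the line number to three digits, then prefix with 'DGRP_'
    return "DGRP_" + str(x).zfill(3)

print(dgrp_label(21))   # 'DGRP_021'
print(dgrp_label(360))  # 'DGRP_360'
```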
- The `vol_entropy` dataframe is then split into two dataframes, `male` and `female`, based on the 'Sex' column.
male = vol_entropy[vol_entropy['Sex'] == 'male']
female = vol_entropy[vol_entropy['Sex'] == 'female']
- The `data_male` dataframe is merged with the `male` dataframe on the 'dgrp' column, producing `merged_data_male`.
merged_data_male = pd.merge(data_male, male, on='dgrp')
- Similarly, `data_female` is merged with `female` on the 'dgrp' column, producing `merged_data_female`.
merged_data_female = pd.merge(data_female, female, on='dgrp')
- Finally, the `merged_data_male` and `merged_data_female` dataframes are saved as separate CSV files named 'dgrpool_brain_behavior_male.csv' and 'dgrpool_brain_behavior_female.csv', respectively.
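`pd.merge` defaults to an inner join, so only DGRP lines present in both tables survive the merge. A toy sketch (made-up lines and values) makes this visible:

```python
import pandas as pd

# Toy behaviour table and toy brain-morphology table sharing a 'dgrp' key.
data_male = pd.DataFrame({"dgrp": ["DGRP_021", "DGRP_026", "DGRP_028"],
                          "1313": [0.1, 0.2, 0.3]})
male = pd.DataFrame({"dgrp": ["DGRP_021", "DGRP_028"],
                     "volume": [5.1, 4.8]})

# default inner join: rows for lines missing from either side are dropped
merged_data_male = pd.merge(data_male, male, on="dgrp")
print(merged_data_male["dgrp"].tolist())  # ['DGRP_021', 'DGRP_028']
```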
**The script for this part is `behaviour_analysis.py`.**
This code performs various computations and visualizations on a merged dataframe that combines brain and behavior data. Here's a breakdown of what each function does:
- `split_string_with_dgrp(df)`: This function takes a dataframe as input and splits the 'genotype' column into two separate lists: DGRP and sex. It iterates over each row and checks whether the 'genotype' value contains the string 'dgrp'. If it does, it appends that value to the DGRP list and the corresponding 'sex' value to the sex list. Finally, it returns both lists.
- `merge_data(brain, behav, DGRP, sex)`: This function merges the brain and behaviour data based on the DGRP and sex values. It selects rows from the behaviour dataframe where 'genotype' is in the DGRP list, 'sex' is in the sex list, and 'head_scanned' is True. It then harmonizes the 'genotype' values in both dataframes, renames the 'genotype' column in the behaviour dataframe to 'DGRP', and merges on the 'DGRP' and 'sex' columns. Finally, it returns the merged dataframe.
- `calculate_pvalues(df)`: This function calculates the p-values for the correlation matrix of a dataframe. It creates an empty dataframe with the same columns as the input, then iterates over all pairs of columns and computes the p-value of their correlation with the `pearsonr` function from `scipy.stats`. Each p-value is rounded to 4 decimal places and stored in the corresponding cell of the output dataframe, which is returned.
The code also loads two CSV files, `summary.csv` and `vol_hratio.csv`, into the `behav` and `brain` dataframes, respectively. It calls `split_string_with_dgrp` to extract the DGRP and sex lists from `behav`, then calls `merge_data` to merge `brain` and `behav` based on those lists, producing the `merged_df` dataframe.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pandas as pd
import plotly.graph_objects as go
from scipy.stats import pearsonr

def split_string_with_dgrp(df):
    """
    This function takes a dataframe as input and splits the 'genotype' column into two separate lists: DGRP and sex.

    Parameters:
    df (DataFrame): The input dataframe containing the 'genotype' and 'sex' columns.

    Returns:
    DGRP (list): A list of DGRP values extracted from the 'genotype' column.
    sex (list): A list of sex values extracted from the 'sex' column.
    """
    DGRP = []
    sex = []
    for i in range(len(df)):
        if 'dgrp' in df.iloc[i, 1]:
            DGRP.append(df.iloc[i, 1])
            sex.append(df.iloc[i, 2])
    return DGRP, sex

def merge_data(brain, behav, DGRP, sex):
    """
    This function merges the brain and behavior data based on the DGRP and sex values.

    Parameters:
    brain (DataFrame): The brain data dataframe.
    behav (DataFrame): The behavior data dataframe.
    DGRP (list): A list of DGRP values.
    sex (list): A list of sex values.

    Returns:
    merged_df (DataFrame): The merged dataframe containing the brain and behavior data.
    """
    # .copy() avoids pandas' SettingWithCopyWarning on the assignments below
    data = behav[behav['genotype'].isin(DGRP) & behav['sex'].isin(sex) & (behav["head_scanned"] == True)].copy()
    # harmonize genotype labels to the DGRP_0xx / DGRP_xxx convention
    data['genotype'] = data['genotype'].apply(lambda x: 'DGRP_0' + x.split('dgrp')[1] if len(x.split('dgrp')[1]) == 2 else 'DGRP_' + x.split('dgrp')[1])
    brain['DGRP'] = brain['DGRP'].apply(lambda x: 'DGRP_0' + str(x) if len(str(x)) == 2 else 'DGRP_' + str(x))
    data.rename(columns={'genotype': 'DGRP'}, inplace=True)
    merged_df = pd.merge(brain, data, on=['DGRP', 'sex'])
    return merged_df

def calculate_pvalues(df):
    """
    This function calculates the p-values for the correlation matrix of a dataframe.

    Parameters:
    df (DataFrame): The input dataframe.

    Returns:
    pvalues (DataFrame): The dataframe containing the p-values for the correlation matrix.
    """
    dfcols = pd.DataFrame(columns=df.columns)
    pvalues = dfcols.transpose().join(dfcols, how='outer')
    for r in df.columns:
        for c in df.columns:
            # restrict to rows where both columns are non-null
            tmp = df[df[r].notnull() & df[c].notnull()]
            # .loc avoids the deprecated chained assignment pvalues[r][c] = ...
            pvalues.loc[c, r] = round(pearsonr(tmp[r], tmp[c])[1], 4)
    return pvalues

behav = pd.read_csv("/Users/skumar/Documents/PhD/BrainAnalysis/Behavior/summary.csv")
brain = pd.read_csv("/Users/skumar/Project/PHD_work/GWAS/dataset/vol_hratio.csv", sep=",")

DGRP, sex = split_string_with_dgrp(behav)
merged_df = merge_data(brain, behav, DGRP, sex)
correlation_matrix = merged_df[["abs_volume", "h_ratio", "activity", "correct_choices", "frac_time_on_shocked"]].corr()
p_values = calculate_pvalues(merged_df[["abs_volume", "h_ratio", "activity", "correct_choices", "frac_time_on_shocked"]])

fig = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale="Viridis",
    colorbar=dict(title="Correlation Coefficient")
))
annotations = []
for i, row in enumerate(correlation_matrix.values):
    for j, value in enumerate(row):
        annotations.append(
            dict(
                x=correlation_matrix.columns[j],
                y=correlation_matrix.columns[i],
                text=f"p-value: {p_values.iloc[i, j]:.3f}",
                showarrow=False,
                font=dict(color="white" if abs(value) > 0.5 else "black")
            )
        )
fig.update_layout(
    title="Correlation Coefficient and p-values",
    annotations=annotations,
    xaxis=dict(title="Variable"),
    yaxis=dict(title="Variable"),
)
fig.show()
The outputs are stored as HTML plots and can be found in the results folder.
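As a sanity check, the pairwise p-value computation can be exercised on a tiny frame. Below is a compact re-implementation of the same loop as `calculate_pvalues` (pairwise Pearson p-values via scipy), run on made-up, perfectly correlated data:

```python
import pandas as pd
from scipy.stats import pearsonr

def calculate_pvalues(df):
    # p-value of the Pearson correlation for every pair of columns,
    # computed on the rows where both columns are non-null
    pvalues = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)
    for r in df.columns:
        for c in df.columns:
            tmp = df[df[r].notnull() & df[c].notnull()]
            pvalues.loc[r, c] = round(pearsonr(tmp[r], tmp[c])[1], 4)
    return pvalues

toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                    "b": [2.0, 4.0, 6.0, 8.0]})  # b = 2 * a, perfectly correlated
p = calculate_pvalues(toy)
print(p.loc["a", "b"])  # 0.0 after rounding to 4 decimals
```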