karlvmsousa / in1102-machine-learning

Course IN1102 - 🤖 Machine Learning (Aprendizagem de Máquina) 🤖, from the graduate program at CIn - UFPE.

Python 11.05% Common Lisp 88.95%

machine-learning pattern-recognition aprendizagem-de-maquina python pythonprojects cin-ufpe ufpe kaggle-competition kaggle

in1102-machine-learning's Issues

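Confidence interval for classifier accuracy

Using the accuracies obtained from the repeated k-fold cross-validation, report a point estimate and its confidence interval:

CI(μ, 100(1 - α)%) = [ μ - z_{α/2} σ_x , μ + z_{α/2} σ_x ]

where μ is the mean accuracy, σ_x is the standard error of the mean, and z_{α/2} is the standard normal critical value.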
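A minimal sketch of this interval using only the Python standard library. The function name `accuracy_confidence_interval` is hypothetical, and the sketch assumes σ_x is estimated as the sample standard deviation of the accuracies divided by √n:

```python
import math
from statistics import NormalDist, mean, stdev


def accuracy_confidence_interval(accuracies, alpha=0.05):
    """Return (point_estimate, (lower, upper)) for the mean accuracy.

    Assumption: the standard error is the sample standard deviation
    divided by sqrt(n), and the normal approximation is acceptable.
    """
    n = len(accuracies)
    mu = mean(accuracies)
    std_err = stdev(accuracies) / math.sqrt(n)
    # Two-sided critical value z_{alpha/2} of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mu, (mu - z * std_err, mu + z * std_err)
```

Calling it with the list of fold accuracies and `alpha=0.05` gives an approximate 95% interval.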

Covariance matrix

We will use the approximation in which the covariance between different attributes is zero. That is, the only non-zero values are those on the diagonal, and the same value is assumed for all attributes.

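import numpy as np

# Class labels are stored in the DataFrame index; np.unique returns them sorted.
classes, classes_num = np.unique(dataset.index, return_counts=True)
k = len(classes)

# One variance per class: the mean of the per-attribute variances.
varShape = dataShape.groupby(dataShape.index).var().mean(axis=1)
varRGB = dataRGB.groupby(dataRGB.index).var().mean(axis=1)

# Isotropic covariance (variance * identity) for each class.
covShape = [ np.eye(dataShape.shape[1]) * varShape.iloc[i] for i in range(k) ]
covRGB = [ np.eye(dataRGB.shape[1]) * varRGB.iloc[i] for i in range(k) ]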

Run the kNN for multiple numbers of neighbors

In main.py, it would be a good idea to run the kNN for different values of n_neighbors and select the best one.
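One way this selection could look, as a rough sketch. The repository's own kNN class is not shown here, so `knn_predict` and `select_n_neighbors` are hypothetical stand-ins written with NumPy, and the sketch assumes integer class labels and a held-out validation split:

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, n_neighbors):
    """Plain majority-vote kNN with Euclidean distance (integer labels assumed)."""
    # Squared Euclidean distance from every test sample to every training sample.
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def select_n_neighbors(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate with the highest validation accuracy, plus all scores."""
    scores = {}
    for n_neighbors in candidates:
        pred = knn_predict(X_train, y_train, X_val, n_neighbors)
        scores[n_neighbors] = float((pred == y_val).mean())
    # Ties are resolved in favor of the first candidate listed.
    best = max(scores, key=scores.get)
    return best, scores
```

In practice the validation accuracy would probably come from the same repeated k-fold procedure used elsewhere in the project rather than from a single split.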

[Image Segmentation] Function printing info about fuzzy clustering

In the main file, the info function of the FuzzyClustering class isn't working.

Implement a combined Bayesian classifier based on k-nearest neighbors

For the classifier:

Apply normalization to the data
Use Euclidean distance
Create a class for the classifier
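The three items above could be sketched as follows. This is only an illustration: the class name `KNNBayesClassifier` and its methods are hypothetical, the posterior is assumed to be estimated as the fraction of the k nearest neighbors belonging to each class, and the rule for combining several classifiers is left out because the issue does not specify it.

```python
import numpy as np


class KNNBayesClassifier:
    """kNN-based posterior estimate: P(class | x) ~ (neighbors in class) / k."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Normalization statistics come from the training data only.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero
        self.X_ = (X - self.mean_) / self.std_
        self.classes_, self.y_ = np.unique(np.asarray(y), return_inverse=True)
        return self

    def predict_proba(self, X):
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        # Euclidean distance to every stored training sample.
        dist = np.sqrt(((Z[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2))
        nearest = np.argsort(dist, axis=1)[:, : self.n_neighbors]
        labels = self.y_[nearest]
        counts = np.stack(
            [(labels == c).sum(axis=1) for c in range(len(self.classes_))], axis=1
        )
        return counts / self.n_neighbors

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

The rows of `predict_proba` sum to 1 only when `n_neighbors` does not exceed the number of training samples.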

Label encoding

Transform labels (forward and backward) between numerical and categorical data
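A small sketch of the forward and backward transformation using `np.unique`. The helper names `encode_labels` and `decode_labels` are hypothetical, and the codes follow the sorted order of the distinct labels:

```python
import numpy as np


def encode_labels(labels):
    """Map categorical labels to integer codes; also return the class table."""
    classes, codes = np.unique(np.asarray(labels), return_inverse=True)
    return codes, classes


def decode_labels(codes, classes):
    """Map integer codes back to the original categorical labels."""
    return classes[np.asarray(codes)]
```

Keeping the `classes` array returned by the encoder is what makes the backward step possible.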

Implement the Wilcoxon signed-ranks test to compare the classifiers

Using the metrics obtained from the stratified cross-validation ("30 times 10-fold"), apply the Wilcoxon test. Use the test's table of critical values to compare the computed statistic against a significance level (alpha).
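A sketch of how the test statistic could be computed by hand before looking it up in the table. The function name `wilcoxon_signed_rank` is hypothetical. The sketch assumes zero differences are discarded and tied absolute differences receive their average rank, which is one common convention. `scipy.stats.wilcoxon` offers a library alternative.

```python
def wilcoxon_signed_rank(scores_a, scores_b):
    """Return (T, n): the smaller rank sum and the number of non-zero differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    # Indices of the differences sorted by absolute value.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), n
```

The null hypothesis of equal performance is rejected when T is less than or equal to the critical value listed for n and the chosen alpha.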

Create a class for the classifiers

Modularize the code, "packaging" the classifier's training and prediction steps for later use.

So far the classifier has been implemented directly. What remains is to create a class that reuses each part already developed and exposes methods (train, predict).
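As an illustration of such a class, here is a sketch of a Gaussian Bayesian classifier with `train` and `predict` methods. The class name `GaussianBayesClassifier` is hypothetical, and the sketch assumes one isotropic covariance per class (a single variance times the identity), with class priors estimated from the label frequencies:

```python
import numpy as np


class GaussianBayesClassifier:
    """Bayesian classifier with one isotropic Gaussian per class (sketch)."""

    def train(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # One variance per class: mean of the per-attribute sample variances.
        self.vars_ = np.array(
            [X[y == c].var(axis=0, ddof=1).mean() for c in self.classes_]
        )
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        # Squared distance from every sample to every class mean.
        sq_dist = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        # Log-density of an isotropic Gaussian, one column per class.
        log_lik = -0.5 * (d * np.log(2 * np.pi * self.vars_) + sq_dist / self.vars_)
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]
```

Working with log-densities avoids the numerical underflow that multiplying many small densities can cause.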

Probability density plot

I tried plotting the probability density to check whether it made reasonable sense...

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Per-class means (despite the "sigma" in the names, these hold means).
sigmaShape = dataShape.groupby(dataShape.index).mean().values
sigmaRGB = dataRGB.groupby(dataRGB.index).mean().values
k = len(sigmaShape)  # number of classes

# One variance per class: the mean of the per-attribute variances.
varShape = dataShape.groupby(dataShape.index).var().mean(axis=1)
varRGB = dataRGB.groupby(dataRGB.index).var().mean(axis=1)

# Isotropic covariance (variance * identity) for each class.
covShape = [ np.eye(dataShape.shape[1]) * varShape.iloc[i] for i in range(k) ]
covRGB = [ np.eye(dataRGB.shape[1]) * varRGB.iloc[i] for i in range(k) ]

# Density of every sample under each class-conditional Gaussian.
densShape = [ multivariate_normal.pdf(dataShape.values, sigmaShape[i], covShape[i]) for i in range(k) ]
densRGB = [ multivariate_normal.pdf(dataRGB.values, sigmaRGB[i], covRGB[i]) for i in range(k) ]

# Plot of the probability density per sample (it looks very strange!!!)
#x_axis = range(len(dataShape))
#plt.plot(x_axis, densShape[0])
#plt.show()

Fit the normalization using only the training data

Use the training data to fit the standard scaler, then transform both the training and test data. Compare the accuracy of the Bayesian Gaussian classifier with and without normalization.
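The idea can be sketched with plain NumPy. The helpers `fit_standardizer` and `standardize` are hypothetical names, and the sketch assumes z-score scaling with the population standard deviation, which is also what scikit-learn's `StandardScaler` computes:

```python
import numpy as np


def fit_standardizer(X_train):
    """Estimate per-attribute mean and standard deviation from the training data only."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant attributes are left unscaled
    return mean, std


def standardize(X, mean, std):
    """Apply the training-set statistics to any split (train or test)."""
    return (np.asarray(X, dtype=float) - mean) / std
```

Because the test data never contribute to `mean` or `std`, no information leaks from the test folds into the fitted scaler.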

[Image Segmentation] Randomness in the fuzzy clustering algorithm

Since the initial membership degrees are selected randomly, every run of the algorithm gives a different solution. Two approaches can be followed:

Use one specific seed (as a parameter of the predict function) to fix the random state
A more interesting approach: try multiple random states and, based on the initial cost, continue with the best seed (the one that returned the lowest cost).
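The second approach might look like the sketch below. It does not use the repository's FuzzyClustering class. As an assumption, the cost is the standard fuzzy c-means objective (memberships raised to a fuzzifier `m`, weighted squared Euclidean distances to the centroids), and `initial_cost` and `select_seed` are hypothetical names.

```python
import numpy as np


def initial_cost(X, n_clusters, seed, m=2.0):
    """Cost of a random membership initialization (fuzzy c-means objective assumed)."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    w = u ** m
    # Centroids as membership-weighted means of the samples.
    centers = (w.T @ X) / w.sum(axis=0)[:, None]
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((w * sq_dist).sum())


def select_seed(X, n_clusters, seeds, m=2.0):
    """Return the seed whose random initialization has the lowest initial cost."""
    costs = {seed: initial_cost(X, n_clusters, seed, m) for seed in seeds}
    best = min(costs, key=costs.get)
    return best, costs
```

The selected seed can then be passed to the clustering routine so that the full optimization starts from the same initialization.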

Gaussian kernel implementation

Implement a function that returns the value of the Gaussian kernel between two examples (two points) for a given attribute.

Besides the Gaussian kernel function itself, it is important to compute the 2*sigma² term.

As stated in the article, 2*sigma² is computed for each attribute as the mean of the 0.1 and 0.9 quantiles of ||xij - xkj||² with i != k.
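A sketch of both pieces with NumPy. The function names are hypothetical, the kernel is assumed to have the usual form exp(-(xij - xkj)² / (2*sigma_j²)), and the quantiles use NumPy's default linear interpolation, which may differ from the article's exact convention:

```python
import numpy as np


def two_sigma_squared(X):
    """Per-attribute 2*sigma^2: mean of the 0.1 and 0.9 quantiles of
    (x_ij - x_kj)^2 over all pairs of examples with i != k."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    off_diagonal = ~np.eye(n, dtype=bool)  # keeps only the pairs with i != k
    result = np.empty(p)
    for j in range(p):
        col = X[:, j]
        sq_diff = (col[:, None] - col[None, :]) ** 2
        q10, q90 = np.quantile(sq_diff[off_diagonal], [0.1, 0.9])
        result[j] = (q10 + q90) / 2.0
    return result


def gaussian_kernel(x_ij, x_kj, two_sigma_sq_j):
    """Gaussian kernel between two examples for a single attribute j."""
    return np.exp(-((x_ij - x_kj) ** 2) / two_sigma_sq_j)
```

An attribute that is constant across all examples yields 2*sigma² = 0, so it would need special handling before calling the kernel.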
