Comments (2)
Thanks, I'll try to look into this when I get a chance.
from pynndescent.
The dataset used was Codon Usage. Interestingly enough, it has only nonnegative values (codon percentages), so regular cosine is always nonnegative as well. Exact code:
import pandas as pd
from pynndescent import PyNNDescentTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
df = pd.read_csv("codon_usage.csv")
# keep only rows whose "UUU" value parses as a number;
# .copy() avoids the irritating SettingWithCopyWarning
df = df[pd.to_numeric(df["UUU"], errors="coerce").notnull()].copy()
df["UUU"] = df.loc[:, "UUU"].astype(float)
df["UUC"] = df.loc[:, "UUC"].astype(float)
df = df.loc[df["Ncodons"] >= 1000, :]
df = df.loc[df["Kingdom"] != "plm", :]
df = df.drop(["DNAtype", "SpeciesID", "Ncodons", "SpeciesName"], axis="columns")
kingdom_mapping = {
    "arc": 0,
    "bct": 1,
    "pln": 2,
    "inv": 2,
    "vrt": 2,
    "mam": 2,
    "rod": 2,
    "pri": 2,
    "phg": 3,
    "vrl": 4,
}
df = df.replace({"Kingdom": kingdom_mapping})
y = df.pop("Kingdom")
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0, stratify=y
)
sklearn_knn = KNeighborsClassifier(metric="cosine")
pynndescent_ann = make_pipeline(
    PyNNDescentTransformer(metric="cosine", random_state=0),
    KNeighborsClassifier(metric="precomputed"),
)
sklearn_knn.fit(X_train, y_train)
pynndescent_ann.fit(X_train, y_train)
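As a side note on the remark that the data is nonnegative: reading "regular cosine" as cosine similarity, nonnegative rows give a similarity in [0, 1], so the cosine distance (1 minus similarity) should also stay in [0, 1]. The following is a minimal sketch of that property in plain Python. The random rows are a hypothetical stand-in for the codon percentages, not the actual dataset, and `cosine_distance` is an illustrative helper rather than anything from pynndescent or scikit-learn.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


rng = random.Random(0)
# Hypothetical data: 50 rows of 64 nonnegative values, mimicking codon percentages.
rows = [[rng.random() for _ in range(64)] for _ in range(50)]

# Cosine distance for every unordered pair of rows.
distances = [
    cosine_distance(u, v)
    for i, u in enumerate(rows)
    for v in rows[i + 1:]
]
lowest, highest = min(distances), max(distances)
print(f"min distance: {lowest:.4f}, max distance: {highest:.4f}")
```

If an approximate index returns distances outside this range on such data, that would point to the index rather than to the data.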