Comments (4)
Thinking about it, I agree that a minimal list is better than an expansive one that has multiple arguments per target :) Honestly, I just added the "films" key because I got annoyed that I kept forgetting the key was "movies" 😄 The goal is for people to use data_utils.input_conversion_dict to explore the options, so let's fully implement it in the workflow.
I think that language-based arguments would be best, as I could see some people being confused if arguments in languages other than English are displayed. The sheer size and depth of the English Wikipedia mean that it will be the go-to choice for most NLP tasks, wikirec included, with other languages picking up the slack to provide cultural insights or to be used in areas where the English wiki is lacking. That said, I like the idea of a "common" argument, but it might be best to keep it in mind for later :)
from wikirec.
Sorry if I'm not understanding the issue here; are you looking to scrape different languages on Wikipedia to find valid Infobox topics? Would adding keys manually for different languages also be sufficient for smaller pull requests?
from wikirec.
Hi and thanks for the message :) Adding keys manually would certainly be enough for smaller pull requests. I guess I'm not 100% sure how I want this to function yet, so your input would be welcome. As of now I figure that it would be helpful for people to be able to use data_utils.input_conversion_dict as a way to check for the most likely arguments and have them standardized (e.g., both "films" and "movies" pointing to the same target to avoid confusion). If we scrape, we'll get every valid option, including the ones that are valid but not particularly useful.
I think that just adding more language keys and then conversions within each would be sufficient (but let me know), and then we could add something to the README that details how to query common options via:
data_utils.input_conversion_dict()["en"].keys()
Let me know, and I appreciate your interest in helping out!
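For illustration, here is a minimal sketch of the standardization described above, where "films" and "movies" resolve to the same target. It does not import wikirec: input_conversion_dict_sketch and resolve_topic are hypothetical stand-ins, and the Infobox names and the "de" entry are assumptions rather than the library's actual mapping.

```python
def input_conversion_dict_sketch():
    """Hypothetical stand-in for data_utils.input_conversion_dict.

    Keys are language codes; each value maps a user-facing argument to an
    Infobox target. The real wikirec mapping may differ.
    """
    return {
        "en": {
            "books": "Infobox book",
            "films": "Infobox film",
            "movies": "Infobox film",  # alias of "films", same target
        },
        "de": {
            "filme": "Infobox Film",  # assumed template name
        },
    }


def resolve_topic(topic, language="en"):
    """Return the Infobox target for an argument in the given language."""
    options = input_conversion_dict_sketch()[language]
    key = topic.lower()
    if key not in options:
        valid = ", ".join(sorted(options))
        raise ValueError(f"Unknown topic {topic!r} for {language!r}; try one of: {valid}")
    return options[key]


# Both spellings point at the same target, so forgetting which one is
# "official" no longer matters.
print(resolve_topic("films") == resolve_topic("movies"))

# Querying the available options for one language, as in the line above.
print(sorted(input_conversion_dict_sketch()["en"].keys()))
```

Keeping aliases as plain extra keys means the keys() query shows every accepted spelling, at the cost of a slightly longer list.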
from wikirec.
Adding new language keys in a format similar to the "en" one should be simple enough. As for the conversions, what do you think about having multiple keys (e.g., "movies: ...", "films: ...") that point to the same value? Or perhaps, in a single "movies" key, the value is a list of all the related Infoboxes? That way, if people use data_utils.input_conversion_dict, they get a minimal list of relevant possible arguments.
In fact, maybe another option is to simply have a key that is consistent across all languages, something like "common", that returns the most likely arguments. Though this might be a bit naive, as we might miss related Infoboxes.
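To compare the three layouts floated in this comment, a rough sketch follows. All names here (ALIASED, LIST_VALUED, WITH_COMMON, suggested_arguments) are hypothetical, and the Infobox names are illustrative placeholders, not a vetted list of related templates.

```python
# Option A: several user-facing keys share one value.
ALIASED = {
    "en": {"movies": "Infobox film", "films": "Infobox film"},
}

# Option B: one key per topic; the value lists every related Infobox.
LIST_VALUED = {
    "en": {"movies": ["Infobox film", "Infobox television"]},
}

# Option C: a "common" key, present in every language, that names the
# most likely arguments for that language.
WITH_COMMON = {
    "en": {
        "common": ["books", "movies"],
        "books": ["Infobox book"],
        "movies": ["Infobox film"],
    },
    "de": {
        "common": ["filme"],
        "filme": ["Infobox Film"],  # assumed template name
    },
}


def suggested_arguments(conversions, language):
    """Return the arguments to show a user for one language.

    Uses the "common" list when the layout has one; otherwise falls back
    to every key defined for that language, sorted alphabetically.
    """
    options = conversions[language]
    if "common" in options:
        return list(options["common"])
    return sorted(options)


print(suggested_arguments(ALIASED, "en"))      # every alias is listed
print(suggested_arguments(LIST_VALUED, "en"))  # one entry per topic
print(suggested_arguments(WITH_COMMON, "en"))  # curated shortlist
```

Option B gives the shortest list without extra bookkeeping, while Option C needs the "common" list kept in sync by hand for each language, which is where related Infoboxes could be missed.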
from wikirec.