eworx-org / labourr
labourR: Methods, Classes and Data for Labour Market Analysis
License: GNU General Public License v3.0
Hi, any chance of offering concordance functions with NOC?
Great stuff!
Hi there,
It's really great to see an R package for converting occupation descriptions to ISCO-08 codes.
Could you describe how the tfidf_tokens dataset was built? Does it use only the description field?
Also, I think the matching algorithm can be improved. It currently takes the sum of the tf-idf scores, which does not penalise a candidate when a query term is missing from its matched tokens.
For example,
'bus driver' returns (num_leaves = 10) best match as:
The weightTokens match for this is:
Whereas 8331 Bus and tram drivers is in the occupations dataset. But the weightTokens are:
Therefore 8332 Heavy truck and lorry drivers is not being penalised for not having 'bus' in it.
I will see whether the matcher can add a penalty when not all of the query words are in the weightTokens.
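One way to picture the proposed penalty is to scale each candidate's summed weight by the share of query tokens it actually covers. The base-R sketch below is only an illustration of that idea, not labourR's matcher: the function `score_candidates()`, the column names, and every weight in the toy table are hypothetical and are not taken from the tfidf_tokens dataset.

```r
# Hypothetical toy weights; NOT values from labourR's tfidf_tokens dataset.
weights <- data.frame(
  iscoGroup = c("8332", "8332", "8332", "8331", "8331", "8331"),
  token     = c("driver", "truck", "lorry", "bus", "tram", "driver"),
  weight    = c(3.0, 1.2, 1.1, 1.0, 0.9, 1.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: score each candidate group against the query tokens.
score_candidates <- function(query_tokens, weights) {
  query_tokens <- unique(query_tokens)
  # Keep only the rows whose token appears in the query.
  matched <- weights[weights$token %in% query_tokens, ]
  groups <- unique(matched$iscoGroup)
  out <- data.frame(
    iscoGroup = groups,
    # Plain sum of matched weights (the behaviour described in the issue).
    raw = vapply(groups, function(g) {
      sum(matched$weight[matched$iscoGroup == g])
    }, numeric(1)),
    # Share of query tokens that the candidate covers.
    coverage = vapply(groups, function(g) {
      length(unique(matched$token[matched$iscoGroup == g])) / length(query_tokens)
    }, numeric(1)),
    stringsAsFactors = FALSE
  )
  # Penalised score: candidates missing query tokens are scaled down.
  out$penalised <- out$raw * out$coverage
  rownames(out) <- NULL
  out
}

scores <- score_candidates(c("bus", "driver"), weights)
best_raw <- scores$iscoGroup[which.max(scores$raw)]
best_penalised <- scores$iscoGroup[which.max(scores$penalised)]
```

With these made-up weights the plain sum favours 8332, because its single matching token is heavily weighted, while the coverage-scaled score favours 8331, which contains both query tokens. Multiplying by coverage is just one possible penalty; subtracting a fixed cost per missing token would be another.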
Prepare for release:
usethis::use_cran_comments()
devtools::check()
devtools::check_win_devel()
devtools::check(remote = TRUE, manual = TRUE)
rhub::check_with_sanitizers()
rhub::check_for_cran()
Submit to CRAN:
usethis::use_version('major')
cran-comments.md
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
usethis::use_news_md()
Update to latest ESCO release
Map the output of identify_language() to the languages ESCO supports, and provide a list of them.
Integrate with the ESCO API to retrieve latest data in a tidy format.
Use co-occurrence to allow for a second way of matching with the ESCO classification.
I have a feature request: it would be great if the package allowed users to supply their own dictionary of names associated with each ISCO occupation.
This would be useful when applying the package to survey data. On the one hand, respondents' answers to questions about occupations differ somewhat from the official occupation names, so a survey-specific dictionary may improve the quality of coding. On the other hand, there is a lot of data from previous research containing human-coded mappings of respondents' answers to ISCO codes, which could be used as dictionaries.
Besides, I'd like to thank you for this very useful package!
Allow custom tfidf calculated from tf_idf() to be used for occupation coding.
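To make the request concrete, here is a minimal base-R sketch of how a human-coded survey dictionary could be turned into per-code token weights. It deliberately does not call labourR's tf_idf(), whose arguments are not described in this thread. The helper `build_tfidf()`, the column names, and the four example answers are hypothetical, and the weighting is plain term frequency times log inverse document frequency, which may differ from what the package uses.

```r
# Hypothetical human-coded survey answers; illustrative only, not real data.
survey_dict <- data.frame(
  isco   = c("8331", "8331", "8332", "8332"),
  answer = c("bus driver", "city bus driver", "truck driver", "lorry driver"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one "document" per ISCO code, weighted by tf * log(idf).
build_tfidf <- function(dict) {
  # Pool and tokenise all answers coded to the same ISCO code.
  answers_by_code <- split(dict$answer, dict$isco)
  tokens <- lapply(answers_by_code, function(a) {
    tok <- unlist(strsplit(tolower(a), "[^a-z]+"))
    tok[nzchar(tok)]
  })
  n_docs <- length(tokens)
  # Document frequency: in how many codes does each token occur?
  doc_freq <- table(unlist(lapply(tokens, unique)))
  rows <- lapply(names(tokens), function(code) {
    tf <- table(tokens[[code]]) / length(tokens[[code]])
    idf <- log(n_docs / as.numeric(doc_freq[names(tf)]))
    data.frame(
      isco  = code,
      token = names(tf),
      tfidf = as.numeric(tf) * idf,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

custom_tfidf <- build_tfidf(survey_dict)
```

In this toy dictionary "driver" appears under both codes, so its weight is zero and only the distinguishing tokens ("bus", "city", "truck", "lorry") carry weight. A table shaped like `custom_tfidf` is the kind of input the feature would need the occupation coder to accept.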