ourresearch / journalsdb

Open database of scholarly journals

Home Page: https://journalsdb.org

License: MIT License

Python 99.59% Mako 0.16% HTML 0.24% Procfile 0.01%

journalsdb's People

Contributors

caseydm, hpiwowar

journalsdb's Issues

Journal of Applied Corporate Finance

There are two issues, or maybe just one, with this specific Wiley title.

ISSNs

In JournalsDB we have https://api.journalsdb.org/journals/1936-8216. That response lists ISSNs 1936-8216 and 1745-6622, but not 1078-1196.

In Crossref, https://api.crossref.org/journals/1936-8216 is not found. The journal is in Crossref under https://api.crossref.org/journals/1078-1196 or https://api.crossref.org/journals/1745-6622.

I'm not sure whether Crossref or JournalsDB is off here. That is, shouldn't Crossref have the linking ISSN? Or am I misunderstanding something?

currently publishing / dois_by_issued_year

JournalsDB has only one DOI for this title, issued in 2015, whereas Crossref has DOIs issued all the way up through 2021. Perhaps this is linked to the ISSN issue discussed above?

AFAICT, this false "not currently publishing" flag is why the title is missing from one Unsub user's dashboard.

open_access.is_gold_journal

Sage

Is this journal correctly classified as Gold OA? https://api.journalsdb.org/journals/1092-5872. Looking at the journal website (https://journals.sagepub.com/author-instructions/JIX), OA seems to be an option, so wouldn't it be a hybrid journal rather than Gold? It's not in DOAJ.

Springer

These Springer titles all seem to be hybrid/transformative journals that will eventually become gold OA, but right now they seem to be hybrid. Or, if it's a transformative journal, do you call it Gold OA? Some of these are in DOAJ and some are not.

ISSN | Title | Is it Gold OA?
1976-8257 | Toxicological research | no, hybrid, transformative
0104-6632 | Brazilian Journal of Chemical Engineering | no, hybrid, transformative
2662-9283 | SN Social Sciences | no, hybrid, transformative
1976-4251 | Carbon letters | no, hybrid, transformative
2662-7655 | Systems microbiology and biomanufacturing | no, hybrid, transformative
0019-5413 | Indian journal of orthopaedics | no, hybrid, transformative
2454-9983 | Proceedings of the Indian National Science Academy | ?? (page wouldn't resolve)
1229-7801 | Journal Of The Korean Ceramic Society | no, hybrid, transformative
2358-1883 | Trends in Psychology | no, hybrid, transformative
0019-5235 | Indian journal of history of science | ?? (page wouldn't resolve)
2447-9462 | Journal of Sedimentary Environments | no, hybrid, transformative

missing issn https://api.journalsdb.org/journals/1878-013X

This is a bit of a tricky one: the ISSN is used in a number of places (Wikipedia, WorldCat), but I'm not sure how official it is, since I don't think it appears on Elsevier's page.

Nonetheless, I think we should add it as an ISSN synonym: people have it in their systems (probably via WorldCat), it clearly refers to this journal, and redirecting it is helpful for them.
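A minimal sketch of how ISSN synonyms could redirect lookups to the canonical record. It assumes journal records are held as a mapping from issn_l to the ISSNs listed on the record, plus a separate synonym table; build_issn_index, resolve_issn, RECORDS and SYNONYMS are hypothetical names, not JournalsDB's actual code. The sample data reuses the Journal of Applied Corporate Finance ISSNs from the first issue purely as an illustration; treating 1078-1196 as a synonym is an assumption, not a confirmed fix. 1878-013X would be added to the synonym table the same way once its target issn_l is confirmed.

```python
from typing import Dict, List, Optional

# Illustrative data only (hypothetical): issn_l -> ISSNs listed on the record.
RECORDS: Dict[str, List[str]] = {
    "1936-8216": ["1936-8216", "1745-6622"],
}

# Hypothetical synonym table: alternate ISSN seen in other systems -> canonical issn_l.
SYNONYMS: Dict[str, str] = {
    "1078-1196": "1936-8216",
}


def build_issn_index(records: Dict[str, List[str]], synonyms: Dict[str, str]) -> Dict[str, str]:
    """Map every known ISSN (listed or synonym) to its canonical issn_l."""
    index: Dict[str, str] = {}
    for issn_l, issns in records.items():
        index[issn_l] = issn_l
        for issn in issns:
            index[issn] = issn_l
    for alternate, issn_l in synonyms.items():
        # A listed ISSN always wins over a synonym entry for the same string.
        index.setdefault(alternate, issn_l)
    return index


def resolve_issn(index: Dict[str, str], issn: str) -> Optional[str]:
    """Return the canonical issn_l for an ISSN, or None if it is unknown."""
    return index.get(issn.strip().upper())


index = build_issn_index(RECORDS, SYNONYMS)
print(resolve_issn(index, "1078-1196"))  # 1936-8216
```

Keeping synonyms in their own table, rather than appending them to a record's ISSN list, preserves the distinction between ISSNs registered for the journal and ones that merely circulate in other systems.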

Questions about some ISSN's related to Sage

Hi Casey, I'm looking at some titles, supposedly published by Sage, that one of our library consortia gave me. A few questions/comments below. Sorry it's a bit messy :)

The goal is to understand why certain titles are excluded from Unsub dashboards. Reasons include: moved to Gold OA, not currently publishing, price not available, or new publisher. The following relates specifically to the new publisher category.

Here's a spreadsheet of the ISSNs excluded from a few Unsub dashboards. Where the notes column has an entry, I manually went to the journal's website to inspect it.

https://docs.google.com/spreadsheets/d/10Dw2MRkXzATua4fZh9EzXCY78TU6TioJ8ZG_KbGOK2E/edit#gid=0

In the "notes" column of the above spreadsheet I have the following categories:

  • "clearly different publisher now": Great, new publisher, not Sage anymore, all good.
  • "articles are clearly on Sage websites": Some of these used to be with another publisher; some simply had "null" for publisher in JournalsDB.
  • There are a number of others with various notes based on the publishers' websites.
  • There are a bunch of ISSNs in the spreadsheet where the publisher is listed as something other than Sage (e.g., Libertas Academica, American Educational Research Association, or Tsinghua University Press) in both JournalsDB and Crossref, but the title is clearly published by Sage. For libraries' subscription decisions, I imagine that when a title is published by Sage together with a partner, the subscription goes through Sage? E.g., the subscription for an "American Educational Research Association" title is with Sage (https://us.sagepub.com/en-us/nam/journal/educational-researcher#subscribe). In Unsub we use the JournalsDB publisher field to decide whether to include a title in an Unsub dashboard, so what that field contains clearly matters. Is there already a way in JournalsDB to distinguish the publisher you subscribe through from the named publisher (a partner through which no subscription happens)?

There's a column "crossref/journalsdb differ" that says whether the publisher name differs between JournalsDB and Crossref.

Note on Libertas Academica: Sage acquired this publisher in 2016 (https://sustainingknowledgecommons.org/category/publisher/libertas-academica/). Many of these titles are gold OA anyway, so they would be excluded from Unsub dashboards for that reason and the publisher doesn't matter; the publisher only matters if the title is not Gold OA.

Status

I was looking at this today to spot-check some work for a consortium, and noticed that the Unsub backend uses the method set_is_currently_publishing (https://github.com/ourresearch/jump-api/blob/master/journalsdb.py#L158-L164) to determine whether a journal is currently publishing, based on the dois_by_issued_year array from JournalsDB. In some cases our result and your status don't match. Maybe it doesn't matter? It seems like it does, though.

For example, the following Sage ISSNs are all marked as not currently publishing in our database; I then checked each journal's website and the JournalsDB API.

ISSN: currently publishing according to the journal website?, JournalsDB API status

  • 1525-1071: no, publishing
  • 0265-8135: yes, unknown
  • 0004-8658: yes, unknown
  • 1356-2622: yes, unknown
  • 1473-6691: no, unknown
  • 0971-9458: yes, unknown
  • 0970-8464: too hard to tell, unknown
  • 2394-9015: yes, unknown
  • 0145-7217: yes, unknown
  • 0263-774X: yes, unknown
  • 2325-1603: yes, unknown

This is not an exhaustive search, just some examples.

Overall questions:

  1. How are the status fields in the JournalsDB API determined?
  2. Is it possible that dois_by_issued_year is incomplete?
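To make question 2 concrete, here is a minimal sketch of a "last year with DOIs" rule. It assumes dois_by_issued_year arrives as a list of [year, count] pairs (check this against the actual API response), and the one-year grace window is an arbitrary choice; this is not the logic of set_is_currently_publishing in jump-api. It shows why an incomplete array yields a false "not currently publishing": if recent years are missing, the last year found is too old.

```python
from typing import Optional, Sequence


def last_year_with_dois(dois_by_issued_year: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the most recent year with at least one DOI, or None if there are none.

    Assumes each entry is a [year, count] pair (an assumption about the API shape).
    """
    years = [year for year, count in dois_by_issued_year if count and count > 0]
    return max(years) if years else None


def is_currently_publishing(
    dois_by_issued_year: Sequence[Sequence[int]],
    current_year: int,
    max_gap_years: int = 1,  # hypothetical threshold, not Unsub's actual rule
) -> bool:
    """Treat a journal as publishing if its last DOI year is within the grace window."""
    last_year = last_year_with_dois(dois_by_issued_year)
    if last_year is None:
        return False
    return current_year - last_year <= max_gap_years


# Illustrative input shaped like the Journal of Applied Corporate Finance case:
# a single DOI in 2015 and nothing later.
print(is_currently_publishing([[2015, 1]], current_year=2021))  # False
```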

gold OA journals with 2021 subscription rates

Hi Casey,

Following up on this issue, I pulled a list of all issn_ls that have oa_status = "gold" in Unpaywall and a non-null 2021 subscription rate in JournalsDB. Many were errors in Unpaywall, but 19 of them seem to be fully OA in 2021. They are listed here: https://docs.google.com/spreadsheets/d/1bcdAs1iNM42xL-nGe39SyXc4UqqgsQ8RK5fiCuz1lW8/edit#gid=470956996

I haven't checked the subscription rates directly, so some publishers may indeed let you pay for a journal you don't have to pay for, but I expect most of these rates are wrong.

Thanks,
Richard

is a gold OA journal

as per the publisher's website, but we don't list it as one:
https://www.journals.elsevier.com/annals-of-medicine-and-surgery
https://api.journalsdb.org/journals/2049-0801

I'm guessing this is because Unpaywall doesn't list it as one (it might have recently flipped). Can you work with Richard so that either he adds it to his Gold OA list manually and that info makes its way into JournalsDB, or you patch it manually and he reads that data from you, or ?? The same goes for all should-be-gold-OA journals you find or we report. Thanks!

Heather

larger publishers publishing on behalf of smaller societies

I think I've touched on this in other issues, but I wanted to address it directly.

There are a number of cases where the publisher listed in JournalsDB and Crossref for an ISSN is a society, such as the "London Mathematical Society", but the journal is clearly handled by a larger publisher.

For example, for https://api.journalsdb.org/journals/0024-6107 ("Journal of the London Mathematical Society"), you subscribe to the journal on the Wiley website: https://londmathsoc.onlinelibrary.wiley.com/toc/14697750/2006/73/1

For the purposes of Unsub, I'd think it makes sense for this title to be included in a user's Wiley dashboard rather than a London Mathematical Society dashboard. But maybe that's not right.

You forwarded me the email about standardizing across publisher imprints, which is related, but I think publishing on behalf of societies is different from imprints, yes?

How have you dealt with this so far with Elsevier? I imagine it has come up.
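One way to model the distinction raised here (and in the Sage issue above) is to keep the named publisher and add a separate "publisher you subscribe through" value. The sketch below is hypothetical: JournalsDB is not known to have a subscription_publisher field, and JournalPublisher is an illustrative name. The example values come from the London Mathematical Society case above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalPublisher:
    issn_l: str
    publisher: Optional[str]  # named publisher, e.g. a society
    # Hypothetical field: the publisher a library actually subscribes through.
    subscription_publisher: Optional[str] = None

    @property
    def dashboard_publisher(self) -> Optional[str]:
        """Publisher to use when grouping a title into an Unsub dashboard."""
        return self.subscription_publisher or self.publisher


jlms = JournalPublisher("0024-6107", "London Mathematical Society", subscription_publisher="Wiley")
print(jlms.dashboard_publisher)  # Wiley
```

Keeping both values, instead of overwriting the society name, preserves what Crossref reports while still letting Unsub group titles by subscription channel.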

GBP APC prices

Related to #23: when the JournalsDB API has a USD APC price for a journal, why don't we also have a GBP price? That is, why can't we just convert from USD to GBP? I can see the reasoning for subscription prices being in certain currencies, but I imagine anyone can pay an APC from any country, yes?

Still learning 😬
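Mechanically, a derived GBP APC is just the USD amount times an exchange rate, rounded to pence. The sketch below assumes the rate is supplied by the caller (the 0.75 in the example is a placeholder, not a real quote) and uses Decimal to avoid binary floating-point rounding errors. One design caveat: a converted figure drifts with the exchange rate and is not a price the publisher lists, so it may be worth flagging it as derived.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert_apc(amount_usd: Decimal, usd_to_gbp_rate: Decimal) -> Decimal:
    """Convert a USD APC to GBP at a caller-supplied rate, rounded to 2 decimal places."""
    if amount_usd < 0 or usd_to_gbp_rate <= 0:
        raise ValueError("amount must be non-negative and rate must be positive")
    return (amount_usd * usd_to_gbp_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Placeholder rate for illustration only.
print(convert_apc(Decimal("3000"), Decimal("0.75")))  # 2250.00
```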

Scientific American is still publishing

Our data (https://api.journalsdb.org/journals/0036-8733) indicates that Scientific American is no longer publishing: 2018 is the last year with DOIs.

It clearly is still publishing: https://www.scientificamerican.com/

I've talked with Crossref, and it appears the publisher has somehow mishandled its DOIs. I haven't heard yet when a fix is coming.

Can we get a temporary fix for this? I'm not sure how we would do it for now, since Unsub currently looks at dois_by_issued_year to determine whether a journal is still publishing, and we don't have #34 yet (but maybe it's close to ready, in which case we could just set that field manually?).

Any ideas?

Questions about some ISSN's related to Taylor & Francis

Hi Casey, same thing here as in #32, but for T&F.

spreadsheet: https://docs.google.com/spreadsheets/d/1tPNdcaBUleGy8TGJNLa-okj6OW83GiHCbQl87TuYQac/edit?pli=1#gid=916071260

These ISSNs are those excluded from Unsub dashboards for various reasons.

There's a column "crossref/journalsdb differ" that says whether the publisher name differs between JournalsDB and Crossref. The next column, "correct?", is only filled in when "crossref/journalsdb differ" is "yes". Where "correct?" is filled in, I've also added notes in the "notes" column about the publisher and related issues after looking up the ISSN.

The "notes" column covers several kinds of issues:

  • JournalsDB has the correct publisher and Crossref has the wrong publisher.
  • JournalsDB has a correct publisher, but I think it's an imprint of a larger publisher. For Unsub, we'd need the parent publisher, not the imprint.
  • In most of these cases the publisher is null in JournalsDB, and I've looked up the correct publisher, which is T&F in most cases. For these ISSNs I also noted whether they are no longer publishing, whether they're under a different ISSN in Crossref, and whether the publisher should be something other than T&F.

Field for date of the last DOI?

hi @caseydm

In Unsub, we exclude journals from user dashboards if they are no longer publishing. The way we currently determine this is not ideal. It would help to have a new API field giving the date of the journal's last DOI, e.g.,

{
  "date_last_doi": "2020-01-04"
}

We could then use that date to compute the time since the last DOI and decide whether the journal is still publishing, based on whatever criteria we choose.

Do you think this is feasible?

If it is, Heather said it's okay to do this in a month or so, presumably after the OpenAlex work is wrapped up.
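If a date_last_doi field like the one proposed above existed, the consumer-side check could be as small as the sketch below. The field name comes from the proposal; the 365-day threshold and the function names are hypothetical, and a missing date is treated as "not publishing", a choice Unsub would need to make explicitly.

```python
from datetime import date
from typing import Optional


def days_since_last_doi(record: dict, today: date) -> Optional[int]:
    """Days between the proposed date_last_doi field and today, or None if absent."""
    raw = record.get("date_last_doi")
    if not raw:
        return None
    return (today - date.fromisoformat(raw)).days


def still_publishing(record: dict, today: date, max_days: int = 365) -> bool:
    """Hypothetical rule: publishing if the last DOI is at most max_days old."""
    days = days_since_last_doi(record, today)
    return days is not None and days <= max_days


print(days_since_last_doi({"date_last_doi": "2020-01-04"}, date(2020, 2, 4)))  # 31
```

Exposing a raw date rather than a boolean lets each consumer pick its own threshold without another API change.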

Questions about some SAGE journals

double check all subscription prices for top 5 publishers

Hi Casey,

Similar to the APC request, except it might indeed be the case that some subscription prices are not available on the web (they say "contact us for a quote").

Nonetheless, can you double-check all missing subscription prices for the top five publishers, fix any you find via this QA, and send us an email report of any still missing (maybe we'll include this in RELEASE NOTES or something in the future)? Specifically:

If the journal is currently publishing (dois in 2021 is not null)
and its "gold_rate" IS NOT 1 (the opposite of the APC query),
then ideally it would have a subscription price. Please double-check these. If the site says "contact us for a quote", can you note that in the db somewhere so we know it was double-checked, and when?

For the ones still missing after you double-check, please make us a list by publisher, including title, issn_l, etc.

Thanks!
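The selection criteria above can be sketched as a filter over flattened journal rows. The row keys (publisher, dois_2021, gold_rate, subscription_price, title, issn_l) and the spelling of the five publisher names are assumptions about how the data might be flattened, not the JournalsDB schema; note that a missing gold_rate is treated as "not 1" here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed spelling of the top five publisher names in the flattened data.
TOP_FIVE = {"Elsevier", "Wiley", "Taylor & Francis", "SAGE", "Springer Nature"}


def missing_subscription_prices(rows: Iterable[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (title, issn_l) by publisher for rows that should have a subscription price."""
    report: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in rows:
        if row.get("publisher") not in TOP_FIVE:
            continue
        currently_publishing = row.get("dois_2021") is not None
        not_fully_gold = row.get("gold_rate") != 1  # None also counts as "not 1"
        if currently_publishing and not_fully_gold and row.get("subscription_price") is None:
            report[row["publisher"]].append((row["title"], row["issn_l"]))
    return dict(report)


# Placeholder rows for illustration only.
sample = [
    {"publisher": "Wiley", "dois_2021": 40, "gold_rate": 0.1,
     "subscription_price": None, "title": "Example Journal A", "issn_l": "issn-l-a"},
    {"publisher": "Wiley", "dois_2021": 12, "gold_rate": 1,
     "subscription_price": None, "title": "Example Journal B", "issn_l": "issn-l-b"},
]
print(missing_subscription_prices(sample))  # {'Wiley': [('Example Journal A', 'issn-l-a')]}
```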

Standardized publisher names?

Hi Casey, I'm working on cleaning up the publisher field in an Unsub database table for journal prices, and we talked about possibly using the publisher names JournalsDB uses. However, looking at the data we ingest from JournalsDB, I'm not sure whether the names there are standardized. For example, searching for the big five publisher names in that data, Wiley and Taylor & Francis are consistent, but there are a few variants for Elsevier, SAGE, and Springer:

Publisher | Rows
Elsevier | 4170
Elsevier- Churchill Livingstone | 1
SAGE | 1464
SAGE Publications | 3
Sage Publications (Prufrock Press, Inc.) | 1
Springer (Biomed Central Ltd.) | 1
Springer Nature | 4045
Springer Publishing Company | 26
Springer-Verlag | 3
Taylor & Francis | 3663
Wiley | 2363

It appears "Elsevier- Churchill Livingstone" is part of Elsevier, I think, since it is one of the names listed for Crossref member 78 (Elsevier):

curl https://api.crossref.org/members/78 | jq .message.names | grep Living

"Elsevier- Churchill Livingstone",

Some of the more interesting publisher names: "[email protected]", "10.15653 (Tierarztl Prax Ausg G Grosstiere Nutztiere)", "10.35977"

Currently there are 16,366 publisher names in total from JournalsDB.

I don't think publishers in JournalsDB come straight from Crossref; I think Heather said you've done some standardizing. To what extent are they cleaned up after being pulled from Crossref?

If we wanted to use standardized publisher names, what do you think is the best source for them?
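Whatever the source, a local normalization step could look like the sketch below: collapse whitespace (which also removes stray tabs), then map known variants to one canonical name. The alias table is illustrative, built from the variants listed above, and each mapping should be confirmed before use. "Springer Publishing Company" is deliberately left unmapped because it is a separate publisher from Springer Nature.

```python
import re
from typing import Optional

# Illustrative alias table (keys are casefolded); confirm each mapping before use.
CANONICAL_PUBLISHERS = {
    "elsevier": "Elsevier",
    "elsevier- churchill livingstone": "Elsevier",
    "sage": "SAGE",
    "sage publications": "SAGE",
    "sage publications (prufrock press, inc.)": "SAGE",
    "springer nature": "Springer Nature",
    "springer-verlag": "Springer Nature",
    "springer (biomed central ltd.)": "Springer Nature",
    # "Springer Publishing Company" is a separate publisher, so it is not mapped.
}


def normalize_publisher(name: Optional[str]) -> Optional[str]:
    """Collapse whitespace (including tabs), then map known variants to a canonical name."""
    if name is None:
        return None
    cleaned = re.sub(r"\s+", " ", name).strip()
    return CANONICAL_PUBLISHERS.get(cleaned.casefold(), cleaned)


print(normalize_publisher("Elsevier- Churchill Livingstone"))  # Elsevier
```

Unmapped names pass through with only whitespace cleaned, so the table can grow incrementally as variants are reviewed.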

Wiley issues

@caseydm As I mentioned in the Sage and T&F issues, I'm opening an issue about Wiley. I've spot-checked a bunch of ISSNs that are missing from a consortium's Unsub dashboards to make sure they are missing for the correct reasons.

Here's the Wiley spreadsheet

https://docs.google.com/spreadsheets/d/1y30cYEYYPakLsP4hXKuK6_hNSpFU0UGFaT_Y0xfi-qY/edit#gid=0

See the "notes" column of the spreadsheet for comments. I've filtered it to rows where the journalsdb column == 1, but you can look at the other rows if you want.

  • There are a bunch that I think should be changed to Wiley, where Wiley publishes the title on behalf of a society publisher.
  • There are a bunch where the publisher is currently null in JournalsDB; I've suggested a publisher for those.
  • For ISSN 0142-0356, there are two entries with exactly the same title (https://api.journalsdb.org/journals/0142-0356 and https://api.journalsdb.org/journals/0268-2575). Should JournalsDB have two separate entries here?
  • There are a bunch of titles that are part of a Wiley-Hindawi partnership (e.g., 0730-6679), where JournalsDB and Crossref list the publisher as Hindawi. It's not very clear who the publisher is, since journal content can be found on both the Wiley and Hindawi websites, but they're all OA, so Unsub would exclude them from user dashboards anyway. Maybe nothing to do here.

Springer Nature issues

Same drill as with the other three publishers, this time for Springer Nature.

Spreadsheet https://docs.google.com/spreadsheets/d/1FS6FOgUsV4F8JRPbHU7i27mO870nWQJDBN8Cjxd7Dc4/edit#gid=0

Filter the journalsdb column to 1 to see the rows where I think a change should be made in JournalsDB. To highlight a few of the bigger ones:

  • Allerton Press: a bunch of titles (44) have Allerton Press as the publisher, and Crossref has the same publisher for them. However, these titles are all hosted by Springer. I don't see any subscription information on their websites other than "institutional subscription". We do have subscription prices for the few Allerton Press titles I checked, but I'm not sure whom you'd subscribe through (Springer or Allerton). Thoughts?
  • Pleiades Publishing: I marked a bunch of Pleiades titles (127) to ask you about. Their website says "Subscription is available from the distributor of the English-language journals, Springer Nature", so for Unsub purposes I think we want Springer Nature, because that's how the subscription is done. Do you agree?
  • As with the other publishers (Wiley, SAGE, T&F), there are a number of societies that have Springer publish their journals.
  • There are 46 titles where the publisher is null.
  • There are 2 titles copublished with Zhejiang University Press that I think should be changed to Springer.

missing a lot of issns especially for Elsevier

I've been comparing JournalsDB to the journals data we've been using in Unsub, which came from Unpaywall (which in turn mostly got it from Crossref, by bubbling up ISSN data from the DOI records).

A simplified dump of this comparison is here: https://docs.google.com/spreadsheets/d/1-9Iyudcfwr50IsqIUxHJUoLWrslVoR-XYQt7dR6tEJE/edit#gid=559023817

It includes issn_l, issn, publisher, and title, flattened by ISSN, plus a "version" field whose value is either "previous" or "journalsdb", as you can see on the raw data tab.

It is a lot of data and too much to process easily in this format, so on the 2nd tab I cut out everything except Elsevier.

I think we are missing a lot of ISSNs, especially for Elsevier. Right now most Elsevier issn_ls seem to have only one ISSN in JournalsDB (see the pivot table). Most journals, though not all, actually have both an online and a print ISSN, and it seems we are missing those for Elsevier.

This is very important to fix fairly quickly, since it is holding up using JournalsDB in Unsub. Thanks!

After that, can you use this data to slice and dice the "previous" and "journalsdb" versions in other ways as a sanity check? For example, check whether the number of issn_ls differs by publisher and, if so, dig in to make sure it's not a bug; sanity-check the number of ISSNs per issn_l for other publishers; and so on. If you'd rather have the data in another format, let me know. Thanks!
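The pivot-table check (ISSNs per issn_l, previous vs. journalsdb) can also be scripted, which makes it easy to repeat for each publisher. The sketch below assumes rows shaped like the spreadsheet's raw data tab, with version, issn_l, and issn keys; the function names and sample values are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def issn_counts(rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count distinct ISSNs per (version, issn_l)."""
    issns: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for row in rows:
        issns[(row["version"], row["issn_l"])].add(row["issn"])
    return {key: len(values) for key, values in issns.items()}


def fewer_issns_in_journalsdb(rows: Iterable[dict]) -> List[str]:
    """issn_ls where the journalsdb version lists fewer ISSNs than the previous data."""
    counts = issn_counts(rows)
    issn_ls = {issn_l for _, issn_l in counts}
    return sorted(
        issn_l for issn_l in issn_ls
        if counts.get(("journalsdb", issn_l), 0) < counts.get(("previous", issn_l), 0)
    )


# Placeholder rows: J1 lost its online ISSN in the journalsdb version.
rows = [
    {"version": "previous", "issn_l": "J1", "issn": "print-1"},
    {"version": "previous", "issn_l": "J1", "issn": "online-1"},
    {"version": "journalsdb", "issn_l": "J1", "issn": "print-1"},
    {"version": "previous", "issn_l": "J2", "issn": "print-2"},
    {"version": "journalsdb", "issn_l": "J2", "issn": "print-2"},
]
print(fewer_issns_in_journalsdb(rows))  # ['J1']
```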

Some Wiley titles

double check all apcs for top five publishers

Hi Casey,

Can you double-check that we have all APCs for the top five publishers, fix any you find missing via this QA, and send us an email report of any still missing (maybe we'll include this in RELEASE NOTES or something in the future)? Specifically:

If the journal is currently publishing (dois in 2021 is not null)
and currently has "gold_rate" == 1,
then our APC price should not be null; there is almost certainly an APC price somewhere on the web.

If it is null even after double checking the web page, can you make us a list of these by publisher, including a link to their home page so we can have a look?

Thanks!
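These APC criteria mirror the subscription-price query with the gold_rate condition flipped. A minimal sketch, assuming flattened rows with hypothetical keys (publisher, dois_2021, gold_rate, apc_price, title, issn_l, home_page_url) and an assumed spelling of the five publisher names:

```python
from typing import Iterable, List

# Assumed spelling of the top five publisher names in the flattened data.
TOP_FIVE = {"Elsevier", "Wiley", "Taylor & Francis", "SAGE", "Springer Nature"}


def needs_apc_check(row: dict) -> bool:
    """Currently publishing, fully gold, top-five publisher, but no APC price."""
    return (
        row.get("publisher") in TOP_FIVE
        and row.get("dois_2021") is not None
        and row.get("gold_rate") == 1
        and row.get("apc_price") is None
    )


def missing_apc_report(rows: Iterable[dict]) -> List[dict]:
    """Rows to double-check by hand, sorted by publisher, then title."""
    flagged = [
        {key: row.get(key) for key in ("publisher", "title", "issn_l", "home_page_url")}
        for row in rows
        if needs_apc_check(row)
    ]
    return sorted(flagged, key=lambda r: (r["publisher"], r["title"] or ""))
```

Including home_page_url in each report row matches the request for a link to each journal's home page.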

ISSNs not found

Hi Casey, the ISSNs below, for Elsevier titles, are not found in JournalsDB. Should they be in there? This came up in some calculations I'm doing for one of the consortia.

0160-791x
0304-422x
1042-444x
1226-086x
1342-937x
1871-174x
2212-571x
2214-790x
2352-152x
2352-409x
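All of these are written with a lowercase x as the check character, while the ISSN standard (ISO 3297) uses an uppercase X. Whether the letter case explains these particular misses isn't established here, but normalizing ISSNs before lookup, and verifying the mod-11 check digit, is a cheap safeguard. A minimal sketch; the function names are illustrative:

```python
import re
from typing import Optional

# Four digits, optional hyphen, three digits, then a digit or X as check character.
_ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dX])$")


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN in canonical NNNN-NNNC form (uppercase X), or None if malformed."""
    match = _ISSN_RE.match(raw.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


def is_valid_issn(raw: str) -> bool:
    """Check the mod-11 check digit: weights 8..2 over the first seven digits."""
    issn = normalize_issn(raw)
    if issn is None:
        return False
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


print(normalize_issn("0160-791x"))  # 0160-791X
```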

update publisher

Hi! Can you please update publishers listed as
"Ac\t Prasetiya Mulya Publishing"
to a publisher string of "Prasetiya Mulya Publishing"

Thanks!

How often is data updated?

Hi @caseydm, someone asked how often data is updated in Unsub, and I realized I don't know.

So, do you know how often certain fields in a response from https://api.journalsdb.org/journals/{issn} are updated, for example subscription prices and APC prices? Is that a once-a-year thing? I assume it can't be automated, since you have to pull from various websites and spreadsheets on those websites, etc. Or are APC and subscription prices collected once and then not updated?

I'm guessing many of the fields in the response are updated on a rolling basis by querying Crossref's API. Yeah?

Nature Reviews ... problem

Hi Casey, a user reported an issue with their Nature subscriptions in Unsub: some titles they had uploaded prices for weren't showing up. I had a look, and there were 14 titles whose title was just "Nature Reviews", BUT they should have been Nature Reviews Neuroscience, Nature Reviews Cancer, etc. The other issue was that some of these Nature Reviews XXX titles also had a null publisher in JournalsDB.

Here are the ones where the title is wrong and the publisher is null

Here are the ones where just the title is wrong. I only checked some of the ISSNs below with the JournalsDB API, and for those the title was correct, but somehow we ended up with just "Nature reviews" in Unsub. Maybe the titles were fixed recently?

1740-1526 | Nature reviews
1759-5061 | Nature reviews
1759-5002 | Nature reviews
1759-4774 | Nature reviews
1759-5045 | Nature reviews
1471-0056 | Nature reviews
1759-4758 | Nature reviews
1759-5029 | Nature reviews
1759-4790 | Nature reviews
1471-0072 | Nature reviews
1474-1776 | Nature reviews

Heather says this is a high-priority fix.

Thanks! And could you let me know when these are fixed?

missing issns/data
