hadro / hathi_analysis
Usage analysis of HathiTrust
Home Page: https://hadro.github.io/hathi_analysis/
Via NK:
what if we did these graphs with just open volumes, so
all volumes in hathi
all open volumes in hathi
all accessed open volumes in hathi
on a big overlaid bar chart
and then the bottom panel would just be a timeseries of % of open volumes accessed
it would give a bit of a more continuous idea of "if we do the work to open this item, what is the chance it will be accessed?"
CC @nkrabben
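A rough sketch of the numbers behind that chart, in case it helps. It assumes each volume can be reduced to a `(year, is_open, access_count)` tuple; that layout, the function name `open_access_summary`, and the sample rows are all made up for illustration and are not taken from the actual HathiTrust data.

```python
from collections import defaultdict


def open_access_summary(volumes):
    """Count all, open, and accessed-open volumes per year.

    `volumes` is an iterable of (year, is_open, access_count) tuples.
    This tuple layout is an assumption, not the real data schema.
    """
    summary = defaultdict(lambda: {"all": 0, "open": 0, "accessed_open": 0})
    for year, is_open, access_count in volumes:
        row = summary[year]
        row["all"] += 1
        if is_open:
            row["open"] += 1
            if access_count > 0:
                row["accessed_open"] += 1
    # Bottom-panel series: share of open volumes that were accessed.
    for row in summary.values():
        if row["open"]:
            row["pct_open_accessed"] = 100.0 * row["accessed_open"] / row["open"]
        else:
            row["pct_open_accessed"] = None  # no open volumes in this year
    return dict(sorted(summary.items()))


# Made-up sample rows, only to show the shape of the result.
sample = [
    (1900, True, 3),
    (1900, True, 0),
    (1900, False, 0),
    (1950, False, 5),
    (1950, True, 1),
]
summary = open_access_summary(sample)
for year, row in summary.items():
    print(year, row)
```

The three counts would feed the overlaid bars, and `pct_open_accessed` would feed the timeseries panel.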
As a user, I would like to compare the number of CCE registrations for books published in the US with HT's data of US published books so that I can determine whether CCE registrations are a good representation for all books published in the US for a particular time.
I have book registration data for 1923-1952 (1953-onward includes non-book type registrations).
AC1: For each year from 1923 to 1952, please count the number of unique titles in HT.
select count(distinct(hathitrust_record_number)) from hathifiles where (publication_date = '1949' or publication_date = '1949.0') and bibliograhic_format = 'BK' and publication_place LIKE '__u'
select count(distinct(oclc_number)) from hathifiles where (publication_date = '1949' or publication_date = '1949.0') and bibliograhic_format = 'BK' and publication_place LIKE '__u'
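To cover the whole 1923-1952 range instead of running the query once per year by hand, something like the following might work. It is only a sketch: it assumes the `hathifiles` table lives in SQLite (the in-memory table and sample rows here are invented for the demo), and it reuses the column names exactly as they are spelled in the queries above.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hathifiles ("
    "hathitrust_record_number TEXT, oclc_number TEXT, publication_date TEXT, "
    "bibliograhic_format TEXT, publication_place TEXT)"
)
rows = [
    ("r1", "o1", "1949", "BK", "nyu"),
    ("r1", "o1", "1949", "BK", "nyu"),    # second volume of the same record
    ("r2", "o2", "1949.0", "BK", "mau"),  # date stored with a trailing .0
    ("r3", "o3", "1949", "SE", "nyu"),    # not a book
    ("r4", "o4", "1950", "BK", "enk"),    # place code does not end in 'u'
    ("r5", "o5", "1950", "BK", "cau"),
]
conn.executemany("INSERT INTO hathifiles VALUES (?, ?, ?, ?, ?)", rows)

# Same filters as the queries above, with the year passed as a parameter.
query = """
SELECT COUNT(DISTINCT hathitrust_record_number),
       COUNT(DISTINCT oclc_number)
FROM hathifiles
WHERE (publication_date = ? OR publication_date = ?)
  AND bibliograhic_format = 'BK'
  AND publication_place LIKE '__u'
"""

# counts[year] = (distinct record numbers, distinct OCLC numbers)
counts = {}
for year in range(1923, 1953):
    counts[year] = conn.execute(query, (str(year), f"{year}.0")).fetchone()

for year, (records, oclc_numbers) in counts.items():
    print(year, records, oclc_numbers)
```

Keeping both counts side by side makes it easy to see how much the record-number and OCLC-number views of "unique titles" differ in a given year.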
I'd be interested in seeing how the top40k items represent access proportional to the amount of material in hathi. Some pseudoish code to explain it.
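import pandas as pd
# data: one row per year, supplied elsewhere (not defined in this sketch)
df = pd.DataFrame(data, columns=['year', 'items_in_top40', 'items_in_hathi', 'items_open'])
# share of all Hathi items that appear in the top 40k
df['rel_total'] = df.items_in_top40 / df.items_in_hathi
df.plot(x='year', y='rel_total', kind='scatter')
# share of open items that appear in the top 40k
df['rel_open'] = df.items_in_top40 / df.items_open
df.plot(x='year', y='rel_open', kind='scatter')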
It would be good to see if there are relationships between publication year and usage amount on a volume level. This might need some binning to be useful.
Quick sketch
Axis 1: Years, maybe binned into decades or centuries
Axis 2: Access level, binned into ... 0, 1, 2-5, 6-20, ... 1,000-1,000,000 (not really sure about the bins)
Axis 3: Either number of volumes in year/access bin, or percentage of volumes in year/access bin compared to that entire year's volumes
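One possible way to do the binning from the sketch above, using only the standard library. The bin edges are a guess (the issue itself is unsure about them, and the `21-999` bin is filler added here), volumes are assumed to be `(publication_year, access_count)` pairs with non-negative counts, and the helper names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Lower edge of each access bin. These edges are assumptions.
ACCESS_EDGES = [0, 1, 2, 6, 21, 1000]
ACCESS_LABELS = ["0", "1", "2-5", "6-20", "21-999", "1,000+"]


def access_bin(count):
    """Return the label of the access bin containing `count` (count >= 0)."""
    return ACCESS_LABELS[bisect_right(ACCESS_EDGES, count) - 1]


def decade(year):
    """Bin a publication year into its decade, e.g. 1949 -> 1940."""
    return year // 10 * 10


def bin_volumes(volumes):
    """Return (counts, shares) keyed by (decade, access bin).

    counts: number of volumes in each decade/access bin.
    shares: that number divided by all volumes in the same decade.
    """
    counts = Counter((decade(y), access_bin(a)) for y, a in volumes)
    totals = Counter()
    for (dec, _label), n in counts.items():
        totals[dec] += n
    shares = {key: n / totals[key[0]] for key, n in counts.items()}
    return counts, shares


# Invented sample volumes: (publication_year, access_count).
sample = [(1901, 0), (1905, 3), (1909, 4), (1909, 1500), (1952, 1), (1958, 1)]
counts, shares = bin_volumes(sample)
for key in sorted(counts):
    print(key, counts[key], round(shares[key], 2))
```

`counts` corresponds to the first option for Axis 3 and `shares` to the second; swapping `decade` for a century function would give the coarser year bins.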