
econpy / google-ngrams


Python scripts for retrieving CSV data from the Google Ngram Viewer and plotting it in XKCD style. The Python script for retrieving ngram data was originally modified from the script at www.culturomics.org.

License: MIT License

Python 100.00%

Contributors

econpy, timguoqk

google-ngrams's Issues

Graphing error

Great module. Any idea, however, why it is throwing the following error when I try to graph anything?

Traceback (most recent call last):
File "xkcd.py", line 86, in <module>
plotXKCD(sys.argv[1])
File "xkcd.py", line 21, in plotXKCD
plt.xkcd(scale=2, randomness=2.75)
AttributeError: 'module' object has no attribute 'xkcd'

No graph opening

Hi, great tool you've made here.

When I try to plot a graph with the -plot argument, nothing comes up, and the program runs to completion.

My input is: python3 getngrams.py einstein, darwin - plot

I'm working on a Mac, if that makes a difference. Any idea why this would be happening?

That is what happens when I run it from the command line. Afterwards I also tried python3 xkcd.py einstein_darwin-eng_2012-1800-2000-3-caseSensitive.csv, but nothing came up there either.

When I call your script from Blender (I'm trying to interface it with my simulation code to make a graph), it gives me errors that look like this:

Data saved to Einstein_Darwin-eng_2012-1800-2000-3-caseSensitive.csv
Unable to revert mtime: /Library/Fonts
Unable to revert mtime: /Library/Fonts/Microsoft
Traceback (most recent call last):
File "google-ngrams/xkcd.py", line 86, in <module>
plotXKCD(sys.argv[1])
File "google-ngrams/xkcd.py", line 82, in plotXKCD
fig.savefig(ngramCSVfile.replace('.csv', '.png'), dpi=190)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/figure.py", line 1421, in savefig
self.canvas.print_figure(*args, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/backend_bases.py", line 2220, in print_figure
**kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/backends/backend_agg.py", line 505, in print_png
FigureCanvasAgg.draw(self)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/backends/backend_agg.py", line 451, in draw
self.figure.draw(self.renderer)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/figure.py", line 1034, in draw
func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axes.py", line 2086, in draw
a.draw(renderer)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/artist.py", line 55, in draw_wrapper
draw(artist, renderer, *args, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axis.py", line 1093, in draw
renderer)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/axis.py", line 1042, in _get_tick_bboxes
extent = tick.label1.get_window_extent(renderer)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/text.py", line 754, in get_window_extent
bbox, info, descent = self._get_layout(self._renderer)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/text.py", line 320, in _get_layout
ismath=False)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/text.py", line 312, in get_text_width_height_descent
*kl, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/backend_bases.py", line 584, in get_text_width_height_descent
font = self._text2path._get_font(prop)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/matplotlib/textpath.py", line 52, in _get_font
font = FT2Font(str(fname))
RuntimeError: Could not open facefile Humor-Sans.ttf; Cannot_Open_Resource

So I'm assuming it's something to do with the font? Have you seen this before?

Make it a module?

This script is pretty useful to me, and I can think of other people who might want to use it.
Have you thought of putting it up on PyPI?

Back-end update

It looks like Google changed the back-end for returning n-gram data, and now the results are generated in a different format. I had to replace a line in getngrams.py to fix it.

Original (line 29):

res = re.findall('var data = (.*?);\\n', req.text)

New (line 29):

res = re.findall('ngrams.data = (.*?);\\n', req.text)
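
A more tolerant variant could try both patterns, so the script keeps working with either page format. The following is only a minimal sketch, not the repository's code: `DATA_PATTERNS` and `extract_ngram_json` are hypothetical names, and the two sample page strings are made up for illustration.

```python
import json
import re

# Patterns from the issue above: the newer page format first, then the older one.
DATA_PATTERNS = (
    r'ngrams\.data = (.*?);\n',
    r'var data = (.*?);\n',
)


def extract_ngram_json(page_text):
    """Return the parsed data embedded in the page, or None if no pattern matches."""
    for pattern in DATA_PATTERNS:
        match = re.search(pattern, page_text)
        if match:
            return json.loads(match.group(1))
    return None


# Made-up page fragments, one per format (not real Ngram Viewer output).
old_page = 'var data = [{"ngram": "Einstein", "timeseries": [0.1, 0.2]}];\n'
new_page = 'ngrams.data = [{"ngram": "Darwin", "timeseries": [0.3]}];\n'

print(extract_ngram_json(old_page)[0]["ngram"])
print(extract_ngram_json(new_page)[0]["ngram"])
```

Returning `None` instead of raising lets the caller decide how to report a page whose format has changed again.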

Some queries crash the plotting logic

One example of running the script (taken from the README) is:
python getngrams.py _START_ President ?_NOUN
This works fine and produces a CSV, but if you alter the command to also output a plot:
python getngrams.py _START_ President ?_NOUN -plot
the xkcd.py module throws an exception:

Traceback (most recent call last):
  File "xkcd.py", line 84, in <module>
    plotXKCD(sys.argv[1])
  File "xkcd.py", line 47, in plotXKCD
    for label in legend.get_texts():
AttributeError: 'NoneType' object has no attribute 'get_texts'
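
The traceback shows that `legend` is `None` when the loop runs. One plausible cause, which the issue does not confirm, is that no labeled lines were available to build a legend from. A guard that skips legend styling in that case would avoid the crash. This is a minimal sketch under that assumption: `style_legend_texts` is a hypothetical helper, and `FakeText` and `FakeLegend` only stand in for matplotlib objects so the snippet runs without matplotlib installed.

```python
def style_legend_texts(legend, fontsize=16):
    """Set the font size on every legend label and return how many were styled.

    `legend` may be None when no legend was created; nothing is done then.
    """
    if legend is None:
        return 0
    texts = legend.get_texts()
    for label in texts:
        label.set_fontsize(fontsize)
    return len(texts)


# Stand-ins for matplotlib's Text and Legend objects (illustration only).
class FakeText:
    def __init__(self):
        self.fontsize = None

    def set_fontsize(self, size):
        self.fontsize = size


class FakeLegend:
    def __init__(self, n):
        self._texts = [FakeText() for _ in range(n)]

    def get_texts(self):
        return self._texts


print(style_legend_texts(None))           # no legend: nothing to style
print(style_legend_texts(FakeLegend(2)))  # two labels styled
```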

License

Thanks for the really nice project. Just wanted to say that you should really put an explicit license on this. Without a license, it's copyrighted. If you don't want to put a license on it, then you should say explicitly that it's in the public domain rather than saying "no license" (which is an implicit statement of copyright).

Is this broken?

When I run it, it claims to have saved a CSV file, but the file is empty. Trying -corpus=eng_2019 also throws an error. Can somebody check whether this works for them? Thanks.

Unable to process German, Chinese and Hebrew in case-insensitive mode

This can be reproduced with this query:

python getngrams.py פמיניזם:heb_2012, 女性主义:chi_sim_2012 --startYear= --endYear=2008 -caseInsensitive -smoothing=1

The problem is that these languages return only one case, so there is no (all) column, and the data is thrown away in the -AllData routine.
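
One way to handle this would be to fall back to the plain column whenever the aggregate column is missing. The sketch below illustrates the idea and is not the script's actual logic: `select_columns` is a hypothetical helper, the aggregate column is assumed to be named `<term> (All)`, and the sample values are invented.

```python
def select_columns(columns, queries):
    """Pick one column per query: the '(All)' aggregate if present, else the bare name."""
    selected = {}
    for query in queries:
        aggregate = query + ' (All)'
        if aggregate in columns:
            # Case variants exist, so use their aggregate.
            selected[query] = columns[aggregate]
        elif query in columns:
            # Single-case scripts: no aggregate column was returned.
            selected[query] = columns[query]
    return selected


# Invented data: one term with case variants, one term without.
columns = {
    'Apache (All)': [0.6, 0.7],
    'Apache': [0.5, 0.6],
    'APACHE': [0.1, 0.1],
    '女性主义': [0.2, 0.3],
}

result = select_columns(columns, ['Apache', '女性主义'])
print(result)
```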

caseInsensitive mode has problems

python google-ngrams/getngrams.py Abenaki,Apache -startYear=1950 -endYear=2000

works fine

python google-ngrams/getngrams.py Abenaki,Apache -startYear=1950 -endYear=2000 --caseInsensitive

fails to return results for Abenaki.

Fails with queries with spaces after `,`

Queries written as first term, second term fail to return all the data. From inspecting with a Python debugger, it seems the issue is in the part of the code that prunes excess columns: it effectively searches for the string " second term" in a list of column names that contains "second term" without the leading space.
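
Stripping whitespace from each term before it is compared against the column names would sidestep the mismatch. This is a minimal sketch: `split_query` is a hypothetical helper, not a function from getngrams.py.

```python
def split_query(query):
    """Split a comma-separated query into terms, dropping surrounding whitespace."""
    terms = (term.strip() for term in query.split(','))
    # Skip empty terms left by trailing or doubled commas.
    return [term for term in terms if term]


print(split_query('first term, second term'))  # ['first term', 'second term']
```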

-caseInsensitive doesn't seem to work for n-grams where n>1

This seems to be an issue with n-grams containing "I". All n-grams containing "i" (in lower case) return a frequency value of 0, even when -caseInsensitive is added to the command. Does it count occurrences of the n-gram in any case (lower, upper or title case) only if the provided form of the n-gram exists in the corpus?
