
ma-scraper's People

Contributors

jonchar

ma-scraper's Issues

ReviewContent is blank

Hello,

First of all, thank you so much for this. I was trying to make a MA scraper myself, particularly for lyrics and reviews, but couldn't get it to work. Maybe I should pick a less tricky (and less interesting) site to try web-scraping the first time!

Anyway, after running MA_review_scraper.py, everything was copied except for the ReviewContent. When I tried a small subset of the code on a sample review, it printed the review successfully.

import requests
from bs4 import BeautifulSoup

url = "https://www.metal-archives.com/reviews/Death/Scream_Bloody_Gore/598/CactusSlaughter/400395"
r = requests.get(url)

# Create a BeautifulSoup object from the HTML: review_soup
review_soup = BeautifulSoup(r.text, 'html.parser')
# Title is the first <h3>, with its last six characters dropped
review_title = review_soup.find_all('h3')[0].text.strip()[:-6]
# Body text is the first <div class="reviewContent">
review = review_soup.find_all('div', {'class': 'reviewContent'})[0].text
print(review)

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported.

Hi Jon, thank you for your brilliant data analyzer for MA :-) However, I seem to have hit a small problem that I'm not sure how to circumvent:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported.

The error occurs at:

We have 11 clusters which we can put on a 3x4 grid of plots, and disable the last plot...

code

I'm just running the notebook as it is - no changes, other than an updated .csv file from MA. Everything works like a charm, until the routine gets to clustering.

I've tried for days to figure out what this means and tried numerous things, but to no avail. Please bear with me - I haven't programmed for 18+ years, so I'm a bit rusty.

Hope you can help, since your little program seems really awesome.

Best regards
Kim
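
For context: pandas 1.0 and later raise this KeyError when a list passed to .loc or [] contains labels that are not in the index. The sketch below shows two common workarounds on a made-up DataFrame. The names df and wanted and the data are illustrative assumptions, not taken from the notebook, so the failing cell would need to be adapted accordingly.

```python
import pandas as pd

# Hypothetical data standing in for the notebook's table (an assumption).
df = pd.DataFrame({"count": [10, 20, 30]}, index=["death", "black", "doom"])
wanted = ["death", "doom", "thrash"]  # "thrash" is not in the index

# df.loc[wanted] would raise KeyError here because of the missing label.

# Option 1: reindex keeps every requested label and fills missing rows with NaN.
with_gaps = df.reindex(wanted)

# Option 2: select only the labels that actually exist in the index.
present = [label for label in wanted if label in df.index]
existing_only = df.loc[present]

print(with_gaps)
print(existing_only)
```

Which option fits depends on whether the downstream plotting code expects a row for every requested label or only for those present in the updated .csv.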

Returning JSON is the same

Hello! I tried to use MA_band_scraper.py to get the full list of bands, but the code crashed with the error json.decoder.JSONDecodeError: Expecting value: line 4 column 11 (char 66). After some checking, I found that the problem is in the response to the request: the value after "sEcho": is empty. This is how it looks:

{ 
    "iTotalRecords": 11835,
    "iTotalDisplayRecords": 11835,
    "sEcho": ,
    "aaData": [
        [ 
        "<a href='https://www.metal-archives.com/bands/A_--_Solution/3540442600'>A // Solution</a>", 
        "United States",
... a lot of strings

Therefore, to fix it, I manually added a value to each such response, since the payload in lines 36-38 does not help. What's worse, the payload does not work at all: every chunk of band data is the same first 500 bands. The data changes only when the letter changes; for a given letter, every chunk is identical. Here is the code after my slight changes: https://gist.github.com/ramskyi/8d831e561d835ef0659bcfb8788ca4e0
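
The manual patch described above could be sketched roughly as follows. This is an illustration only, not the code from the gist: the function name parse_band_response, the placeholder value 0, and the shortened sample body are assumptions. It addresses the parse error only, not the repeated chunks.

```python
import json
import re

# Matches a "sEcho" key whose value is missing, e.g.  "sEcho": ,
EMPTY_SECHO = re.compile(r'("sEcho"\s*:\s*),')


def parse_band_response(text):
    """Parse a response body, inserting a placeholder 0 for an empty sEcho."""
    patched = EMPTY_SECHO.sub(r"\g<1>0,", text)
    return json.loads(patched)


# Shortened, hypothetical sample modeled on the response shown above.
sample = """{
    "iTotalRecords": 11835,
    "iTotalDisplayRecords": 11835,
    "sEcho": ,
    "aaData": [["A // Solution", "United States"]]
}"""

data = parse_band_response(sample)
print(data["iTotalRecords"], len(data["aaData"]))
```

A response that already carries an sEcho value is left unchanged, because the pattern only matches when the comma follows the colon directly.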

Response 403 from request

Hi, I am just getting into web-scraping and am experimenting with metal-archives.com. I just found your ma-scraper here and it looks super helpful for me to learn!

However, the code does not seem to be able to get data from MA right now. It was working just a few days ago, so maybe this is because MA moved their data to a new server? From their homepage:
Maintenance / 2018-10-19 14:50
The site will be migrating to a new server tonight at midnight EDT / 4 am UTC. There will be some downtime. We'll try to make the process as quick and smooth as possible.

Specifically, the problem seems to be with
r = requests.get(BASEURL + RELURL + letter, params=payload)
I am unable to get data through requests; it just returns
<Response [403]>
I have absolutely no idea how to make the scraper work at this point, so I was wondering whether someone more knowledgeable than me, like you, has an idea of what's going on.
Thank you very much for your work!
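
One thing that may be worth checking: a 403 is sometimes returned when a server rejects the default User-Agent sent by an HTTP client. Whether that is the cause here is not confirmed. The sketch below only builds a request with an explicit header and does not send it. It uses the standard-library urllib instead of requests so that it runs without extra packages. The values of BASEURL and RELURL, the header string, and the payload keys other than sEcho are assumptions, since the issue does not show them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed values: the issue does not show what BASEURL and RELURL contain.
BASEURL = "https://www.metal-archives.com/"
RELURL = "browse/ajax-letter/l/"
# Hypothetical browser-like header string, chosen for illustration only.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ma-scraper-example"}


def build_request(letter, start=0, length=500):
    """Build (but do not send) a request for one chunk of bands for a letter."""
    # Legacy DataTables-style paging parameters; assumed to match the scraper.
    payload = {"sEcho": 0, "iDisplayStart": start, "iDisplayLength": length}
    url = BASEURL + RELURL + letter + "?" + urlencode(payload)
    return Request(url, headers=HEADERS)


req = build_request("A")
print(req.full_url)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

With requests, the equivalent would be to pass the same dictionary through the headers argument: requests.get(BASEURL + RELURL + letter, params=payload, headers=HEADERS).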
