eliasdabbas / advertools

advertools - online marketing productivity and analysis tools

Home Page: https://advertools.readthedocs.io

License: MIT License

Python 66.08% Makefile 0.07% HTML 0.08% Shell 0.02% Julia 33.76%
marketing advertising adwords python digital-marketing online-marketing keywords search-engine-marketing twitter-api search-engine-optimization

advertools's Introduction

Announcing the Data Science with Python for SEO course: a cohort-based, interactive, live-coding course.

advertools: productivity & analysis tools to scale your online marketing

A digital marketer is a data scientist.
Your job is to manage, manipulate, visualize, communicate, understand, and make decisions based on data.

You might be doing basic stuff, like copying and pasting text into spreadsheets, you might be running large-scale automated platforms with sophisticated algorithms, or somewhere in between. In any case, your job is all about working with data.

As a data scientist you don't spend most of your time producing cool visualizations or finding great insights. The majority of your time is spent wrangling URLs, figuring out how to stitch together two tables, hoping the dates won't silently break, or trying to generate the next 124,538 keywords for an upcoming campaign by the end of the week!

advertools is a Python package that can hopefully make that part of your job a little easier.

Installation

python3 -m pip install advertools

Philosophy/approach

It's very easy to learn how to use advertools. There are two main reasons for that.

First, it is essentially a set of independent functions that you can easily learn and use. There are no special data structures to learn, and no additional prerequisites. With basic Python and an understanding of the tasks these functions help with, you should be able to pick it up fairly easily. In other words, if you know how to use an Excel formula, you can easily use any advertools function.

The second reason is that advertools follows the UNIX philosophy in its design and approach. Here is one of the various summaries of the UNIX philosophy by Doug McIlroy:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Let's see how advertools follows that:

Do one thing and do it well: Each function in advertools aims for that. There is a function that just extracts hashtags from a text list, another one to crawl websites, one to test which URLs are blocked by robots.txt files, and one for downloading XML sitemaps. Although they are designed to work together as a full pipeline, they can be run independently in whichever combination or sequence you want.

Write programs to work together: Independence does not mean they are unrelated. The workflows are designed to aid the online marketing practitioner in various steps: understanding websites, SEO analysis, creating SEM campaigns, and more.

Programs to handle text streams because that is a universal interface: In Data Science the most used data structure that can be considered “universal” is the DataFrame. So, most functions return either a DataFrame or a file that can be read into one. Once you have it, you have the full power of all other tools like pandas for further manipulating the data, Plotly for visualization, or any machine learning library that can more easily handle tabular data.

This keeps the package modular as well as flexible and integrated. As a next step, most of these functions are being converted to no-code interactive apps for non-coders, taking them to the next level.

SEM Campaigns

The most important thing to achieve in SEM is a proper mapping between the three main elements of a search campaign:

Keywords (the intention) -> Ads (your promise) -> Landing Pages (your delivery of the promise)

Once you have this done, you can focus on management and analysis. More importantly, once you know that you can set this up in an easy way, you can focus on more strategic issues. In practical terms, you need two main tables to get started: keywords and ads.
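
For example, here is a minimal sketch of generating those two tables with kw_generate and ad_create (the product and word lists, match types, and other parameter values below are made up for illustration and may need adjusting to the current API):

import advertools as adv

# Keywords table: combine products with descriptive words
# (the lists and match types here are illustrative only)
products = ['boxing gloves', 'hand wraps']
words = ['buy', 'best', 'price']
kw_df = adv.kw_generate(products, words, match_types=['Exact', 'Phrase'])

# Ads table: fill a headline template with each product,
# falling back to a default when the result exceeds the length limit
headlines = adv.ad_create(template='Get the Best {}',
                          replacements=products,
                          fallback='Top Quality Gear',
                          max_len=30)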

SEO

Probably the most comprehensive online marketing area, SEO is both technical (crawling, indexing, rendering, redirects, etc.) and non-technical (content creation, link building, outreach, etc.). Here are some tools that can help with your SEO:

  • SEO crawler: A generic SEO crawler that can be customized, built with Scrapy, and with several features (see the sketch after this list):
    • Standard SEO elements extracted by default (title, header tags, body text, status code, response and request headers, etc.)
    • CSS and XPath selectors: You probably have more specific needs in mind, so you can easily pass any selectors to be extracted in addition to the standard elements
    • Custom settings: Full access to Scrapy's settings, allowing you to better control the crawling behavior (set custom headers and user agents, stop the spider after a certain number of pages, seconds, or megabytes, save crawl logs, run jobs at intervals with the ability to stop and resume crawls, which is ideal for large crawls or continuous monitoring, and many more options)
    • Following links: option to only crawl a set of specified pages or to follow and discover all pages through links
  • robots.txt downloader: A simple downloader of robots.txt files into a DataFrame format, so you can keep track of changes across crawls if any, and check the rules, sitemaps, etc.
  • XML sitemaps downloader / parser: An essential part of any SEO analysis is to check XML sitemaps. This is a simple function with which you can download one or more sitemaps (by providing the URL for a robots.txt file, a sitemap file, or a sitemap index).
  • SERP importer and parser for Google & YouTube: Connect to Google's API and get the search data you want. Multiple search parameters are supported, all in one function call, and all results are returned in a DataFrame.
  • Tutorials and additional resources
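
Here is a minimal sketch of how these pieces fit together (example.com and the file name are placeholders, and the arguments are kept to the basics):

import advertools as adv
import pandas as pd

# Crawl a site and read the resulting jsonlines file into a DataFrame
adv.crawl('https://example.com', 'example_crawl.jl', follow_links=True)
crawl_df = pd.read_json('example_crawl.jl', lines=True)

# Download a robots.txt file into a DataFrame (one rule per row)
robots_df = adv.robotstxt_to_df('https://example.com/robots.txt')

# Download and parse an XML sitemap (or a sitemap index) into a DataFrame
sitemap_df = adv.sitemap_to_df('https://example.com/sitemap.xml')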

Text & Content Analysis (for SEO & Social Media)

URLs, page titles, tweets, video descriptions, comments, and hashtags are some examples of the types of text we deal with. advertools provides a few options for text analysis:

  • Word frequency: Counting words in a text list is one of the most basic and important tasks in text mining. What is also important is counting those words while taking into consideration their relative weights in the dataset. word_frequency does just that (see the sketch after this list).
  • URL analysis: We all have to handle many thousands of URLs in reports, crawls, social media extracts, XML sitemaps, and so on. url_to_df converts your URLs into easily readable DataFrames.
  • Emoji: Produced with one click, extremely expressive, highly diverse (3k+ emoji), and very popular, it's important to capture what people are trying to communicate with emoji. Extracting emoji, and getting their names, groups, and sub-groups, is possible. The full emoji database is also available for convenience, as well as an emoji_search function in case you want some ideas for your next social media post or any other kind of communication.
  • extract_ functions: The text that we deal with contains many elements and entities that have their own special meaning and usage. There is a group of convenience functions to help in extracting, and getting basic statistics about, structured entities in text: emoji, hashtags, mentions, currency, numbers, URLs, questions, and more. You can also provide a special regex for your own needs.
  • Stopwords: A list of stopwords in forty different languages to help in text analysis.
  • Tutorial on DataCamp for creating the word_frequency function and explaining the importance of the difference between absolute and weighted word frequency
  • Text Analysis for Online Marketers An introductory article on SEMrush
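
A minimal sketch of these functions in action (the sample texts and numbers are made up):

import advertools as adv

# Weighted word frequency: counts are weighted by an optional numeric list
titles = ['best boxing gloves', 'buy boxing gloves online', 'boxing gloves price']
pageviews = [1200, 300, 150]
word_freq = adv.word_frequency(titles, num_list=pageviews)

# Split URLs into their components (scheme, netloc, directories, query params)
urls_df = adv.url_to_df(['https://example.com/blog/post?utm_source=twitter',
                         'https://example.com/shop/gloves'])

# Extract structured entities from a text list
hashtag_summary = adv.extract_hashtags(['#SEO tips for #marketing', 'no tags here'])
emoji_summary = adv.extract_emoji(['I love this 😍', 'great product'])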

Social Media

In addition to the text analysis techniques provided, you can also connect to the Twitter and YouTube data APIs. The main benefits of using advertools for this:

  • Handles pagination and request limits: typically every API has a limited number of results that it returns. You have to handle pagination when you need more than the limit per request, which you typically do. This is handled by default
  • DataFrame results: APIs send you back data in formats that need to be parsed and cleaned so you can more easily start your analysis. This is also handled automatically (see the sketch after this list)
  • Multiple requests: in YouTube's case you might want to request data for the same query across several countries, languages, channels, etc. You can specify them all in one request and get the product of all the requests in one response
  • Tutorials and additional resources
  • A visual tool to check what is trending on Twitter for all available locations
  • A Twitter data analysis dashboard with many options
  • How to use the Twitter data API with Python
  • Extracting entities from social media posts tutorial on Kaggle
  • Analyzing 131k tweets by European Football clubs tutorial on Kaggle
  • An overview of the YouTube data API with Python
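
As a rough sketch, and assuming the twitter and youtube modules still expose set_auth_params, search, and search_list with these parameters (all credentials below are placeholders):

import advertools as adv

# Placeholders: get real credentials from the Twitter developer portal
adv.twitter.set_auth_params(app_key='YOUR_APP_KEY',
                            app_secret='YOUR_APP_SECRET',
                            oauth_token='YOUR_OAUTH_TOKEN',
                            oauth_token_secret='YOUR_OAUTH_TOKEN_SECRET')

# Search tweets; pagination is handled and a DataFrame is returned
tweets_df = adv.twitter.search(q='#python', count=500, tweet_mode='extended')

# Query the YouTube Data API; list-valued parameters produce the product
# of all requests in one DataFrame (key is a Google API key placeholder)
videos_df = adv.youtube.search_list(part='snippet',
                                    q='boxing gloves review',
                                    regionCode=['US', 'GB'],
                                    key='YOUR_GOOGLE_API_KEY')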

Conventions

Function names mostly start with the object you are working on, so you can use autocomplete to discover other options:

kw_: for keywords-related functions
ad_: for ad-related functions
url_: URL tracking and generation
extract_: for extracting entities from social media posts (mentions, hashtags, emoji, etc.)
emoji_: emoji related functions and objects
twitter: a module for querying the Twitter API and getting results in a DataFrame
youtube: a module for querying the YouTube Data API and getting results in a DataFrame
serp_: get search engine results pages in a DataFrame, currently available: Google and YouTube
crawl: a function you will probably use a lot if you do SEO
*_to_df: a set of convenience functions for converting to DataFrames (log files, XML sitemaps, robots.txt files, and lists of URLs)

advertools's People

Contributors

amrrs, andypayne, bilalmirza74, danielp77, dreadedhamish, eliasdabbas, lgtm-migrator, pyup-bot, takluyver


advertools's Issues

pandas frame.append method is deprecated

Hi @eliasdabbas

The sitemap_to_df function is throwing the following warning, so I thought it would be a good idea to bring it to your notice.

2022-04-21 18:46:26,560 | INFO | sitemaps.py:419 | sitemap_to_df | Getting https://xyz.com/sitemap/site.xml
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/advertools/sitemaps.py:421: 
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. 
Use pandas.concat instead.
  sitemap_df = sitemap_df.append(elem_df, ignore_index=True)

Regards.
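
A minimal sketch of the suggested replacement, using dummy frames in place of the sitemap results:

import pandas as pd

# Dummy frames standing in for the accumulated and per-sitemap DataFrames
sitemap_df = pd.DataFrame({'loc': ['https://example.com/page-1']})
elem_df = pd.DataFrame({'loc': ['https://example.com/page-2']})

# Deprecated: sitemap_df = sitemap_df.append(elem_df, ignore_index=True)
# Replacement with pandas.concat:
sitemap_df = pd.concat([sitemap_df, elem_df], ignore_index=True)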

crawl DataFrame - jsonld objects

I've made it as far as examining the DataFrame returned from a crawl!

Looking through the docs I was expecting separately labelled columns for the various bits of jsonld data. Instead I'm seeing a column with lots of objects within it. Is this a quirk of viewing a DataFrame within VSCode? Or are the docs out of date? Or something else? I'm a little stuck!

Thanks!

Browser can get https://zapier.com, but running the crawl fails

(base) wenke@wenkedeMac-mini gradio-demo % python zapier.py 
2023-12-03 00:40:48 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2023-12-03 00:40:48 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.4, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.1, Twisted 22.10.0, Python 3.9.16 (main, Mar  8 2023, 04:29:44) - [Clang 14.0.6 ], pyOpenSSL 23.0.0 (OpenSSL 1.1.1t  7 Feb 2023), cryptography 39.0.1, Platform macOS-10.15.7-x86_64-i386-64bit
2023-12-03 00:40:49 [scrapy.addons] INFO: Enabled addons:
[]
2023-12-03 00:40:49 [py.warnings] WARNING: /Users/wenke/miniconda3/lib/python3.9/site-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)

2023-12-03 00:40:49 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2023-12-03 00:40:49 [scrapy.extensions.telnet] INFO: Telnet Password: 4f579800aa59aff0
2023-12-03 00:40:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2023-12-03 00:40:50 [scrapy.crawler] INFO: Overridden settings:
{'ROBOTSTXT_OBEY': True,
 'SPIDER_LOADER_WARN_ONLY': True,
 'USER_AGENT': 'advertools/0.13.5'}
2023-12-03 00:40:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-12-03 00:40:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-12-03 00:40:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-12-03 00:40:50 [scrapy.core.engine] INFO: Spider opened
2023-12-03 00:40:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-12-03 00:40:50 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://zapier.com/robots.txt> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://zapier.com/robots.txt> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://zapier.com/robots.txt> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET https://zapier.com/robots.txt>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Traceback (most recent call last):
  File "/Users/wenke/miniconda3/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://zapier.com> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://zapier.com> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:50 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://zapier.com> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:51 [seo_spider] ERROR: <twisted.python.failure.Failure twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]>
2023-12-03 00:40:51 [scrapy.core.scraper] DEBUG: Scraped from [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2023-12-03 00:40:51 [scrapy.core.engine] INFO: Closing spider (finished)
2023-12-03 00:40:51 [scrapy.extensions.feedexport] INFO: Stored jl feed (1 items) in: zapier.jl
2023-12-03 00:40:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
 'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 6,
 'downloader/request_bytes': 1248,
 'downloader/request_count': 6,
 'downloader/request_method_count/GET': 6,
 'elapsed_time_seconds': 0.71527,
 'feedexport/success_count/FileFeedStorage': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2023, 12, 2, 16, 40, 51, 34244, tzinfo=datetime.timezone.utc),
 'item_scraped_count': 1,
 'log_count/DEBUG': 6,
 'log_count/ERROR': 4,
 'log_count/INFO': 11,
 'log_count/WARNING': 1,
 'memusage/max': 123461632,
 'memusage/startup': 123461632,
 'retry/count': 4,
 'retry/max_reached': 2,
 'retry/reason_count/twisted.web._newclient.ResponseNeverReceived': 4,
 "robotstxt/exception_count/<class 'twisted.web._newclient.ResponseNeverReceived'>": 1,
 'robotstxt/request_count': 1,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2023, 12, 2, 16, 40, 50, 318974, tzinfo=datetime.timezone.utc)}
2023-12-03 00:40:51 [scrapy.core.engine] INFO: Spider closed (finished)

import advertools as adv
adv.crawl('https://zapier.com', 'zapier.jl', follow_links=True)
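
One thing worth trying (an assumption, not a confirmed fix) is that the site is dropping connections from the default advertools user agent; a browser-like user agent can be passed through custom_settings:

import advertools as adv

# The user-agent string below is only an example; this is a guess at a workaround
adv.crawl('https://zapier.com', 'zapier.jl', follow_links=True,
          custom_settings={
              'USER_AGENT': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/119.0 Safari/537.36'),
          })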

Suggestion - don't treat jsonld items in distinct script tags as distinct.

Some field observations with a small data set of only 12 sites:
8 place all their json-ld items in a hierarchy under @graph (wrapped in a single script tag), and 4 spread their items across separate script tags.
advertools treats each script tag as a separate entity, so for those in a hierarchy there is a single json-ld @graph column with a nested object, and those without a hierarchy get spread out over multiple columns (json_1_ etc...).

I'm building a scraper that regularly and often scrapes the same sites. With the current functionality of advertools I will need to inspect each site and write conditions to scrape columns depending on how json-ld tags have been stored.

I think it would be helpful to treat json-ld items, whether they are in a hierarchy under @graph or in distinct script tags, as being equal. As far as I can tell the only difference apart from being nested is that the @context entry appears as a sibling to @graph and so only appears once in the whole scheme.

NESTED:
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebSite",
      ...

NON-NESTED:
{
  "@context": "http://schema.org",
  "@type": "Website",
  ...

Then, as a bonus, break out each @type into a column with the whole record within.
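
A rough sketch of the kind of normalization being proposed (illustrative only, not advertools code; normalize_jsonld is a hypothetical helper):

import json

def normalize_jsonld(script_contents):
    """Flatten JSON-LD from multiple script tags and/or @graph wrappers
    into a single list of items, regardless of how the site structured them."""
    items = []
    for raw in script_contents:
        data = json.loads(raw)
        # A script tag may contain a single object or a list of objects
        objects = data if isinstance(data, list) else [data]
        for obj in objects:
            if '@graph' in obj:
                context = obj.get('@context')
                for item in obj['@graph']:
                    item.setdefault('@context', context)
                    items.append(item)
            else:
                items.append(obj)
    return items

nested = '{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}]}'
flat = '{"@context": "http://schema.org", "@type": "Website"}'
print(normalize_jsonld([nested, flat]))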

Freezing versions for dependencies

@eliasdabbas

Currently, versions are not specified for the dependencies mentioned in the setup.py
Current config:
requirements = [
    'pandas',
    'pyasn1',
    'scrapy',
    'twython',
    'pyarrow',
]

So, on every install, the latest versions will be downloaded and installed for these required packages.
Instead, we can adopt a common practice by specifying the required versions to avoid any breaking changes in newer versions of these dependent packages.

requirements = [
    'pandas==1.4.2',
    'pyasn1==0.4.8',
    'scrapy==2.6.1',
    'twython==3.9.1',
    'pyarrow==8.0.0',
]

Instagram Mentions Allows Periods

Hi Elias,

Comments on Instagram that include mentions with periods are currently truncated on advertools.__version__ = 0.14.2.
Example: @elias.dabbas -> [@elias]

I propose adding a . to the MENTION pattern in the REGEX module.

import re

MENTION = re.compile(
    r"""(?i)          # case-insensitive
    (?<!\w)           # word character doesn't precede mention
    ([@＠]            # either of the two @ signs (ASCII and full-width)
    [a-z0-9_.]+)      # A to Z, numbers, underscores AND PERIODS only
    \b                # end with a word boundary
    """, re.VERBOSE)

This change works for me, but I haven't tested edge cases or other social media platforms.
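
A quick check of the proposed pattern, tested directly here rather than through advertools' extract_mentions:

import re

# Proposed pattern with periods allowed inside the handle
MENTION_WITH_PERIODS = re.compile(r"(?i)(?<!\w)([@＠][a-z0-9_.]+)\b")

print(MENTION_WITH_PERIODS.findall('great shot @elias.dabbas!'))
# ['@elias.dabbas']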

sitemap_to_df cannot handle recursion well

If you try sitemap_to_df for wrangler.com, you will notice that there is a recursion in the sitemap. It calls the sitemap index again and again without terminating. There should be a check to keep track of visited Sitemaps.
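
A rough sketch of the kind of guard being suggested (fetch_sitemap is a hypothetical helper returning (url, is_index) pairs; this is not the advertools implementation):

def sitemap_urls(sitemap_url, fetch_sitemap, seen=None):
    """Recursively collect URLs from a sitemap or sitemap index,
    skipping any sitemap that has already been visited."""
    # fetch_sitemap is a hypothetical callable yielding (loc, is_index) tuples
    if seen is None:
        seen = set()
    if sitemap_url in seen:
        return []  # already visited: stop the recursion here
    seen.add(sitemap_url)
    urls = []
    for loc, is_index in fetch_sitemap(sitemap_url):
        if is_index:
            urls.extend(sitemap_urls(loc, fetch_sitemap, seen))
        else:
            urls.append(loc)
    return urls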

Pandas Futurewarning "fillna" in url_to_df()

Method: adv.url_to_df()

advertools/urlytics.py:198: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
.assign(last_dir=dirs_df
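
The warning already points at the fix; a minimal sketch with a dummy frame:

import pandas as pd

dirs_df = pd.DataFrame({'dir_1': ['a', 'b'], 'dir_2': ['c', None]})

# Deprecated: dirs_df.fillna(method='ffill', axis=1)
# Replacement: call ffill directly
last_dir = dirs_df.ffill(axis=1)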

Getting NaN values for serp_goog function

I tried running the following query:
df = adv.serp_goog(q=search_term, cx=cse_id, key=api_key)

However, it just returns a bunch of NaN values for rank, title, snippet, displaylink, link

Scrapes forever

Here is the list of URLs I'm trying to scrape, which get stuck, and the crawl never finishes.

https://www.si.com/showcase/fitness/best-boxing-gloves
https://www.verywellfit.com/best-boxing-gloves-4158917
https://www.rollingstone.com/product-recommendations/lifestyle/best-boxing-gloves-1234690811/
https://www.gearpatrol.com/fitness/g40446087/best-boxing-gloves/
https://boxingglovesreviews.com/top-ten-boxing-gloves/
https://sweetscienceoffighting.com/best-boxing-gloves/
https://www.shape.com/fitness/gear/best-boxing-gloves
https://www.t3.com/features/best-boxing-gloves
https://bleacherreport.com/articles/1286577-breaking-down-different-brands-of-boxing-gloves-worn-by-the-pros
https://www.youtube.com/watch?v=tWoucO2nIlE
https://expertboxing.com/best-boxing-gloves-review
https://thekarateblog.com/best-boxing-gloves/
https://boxupnation.com/blogs/news/my-top-5-favorite-boxing-glove-brands-and-why
https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131
https://www.tabletenniscoach.me.uk/sport-equipment-guides/best-boxing-gloves-for-beginners/
https://myboxinglife.com/best-boxing-gloves-for-beginners/
https://www.youtube.com/watch?v=rHepbZOCxfY
https://wayofmartialarts.com/best-boxing-gloves-worth-your-money/
https://www.hayabusafight.com/products/t3-boxing-gloves
https://www.dickssportinggoods.com/o/best-boxing-gloves-for-pad-work
https://revgear.com/gear/boxing-gloves/
https://blog.joinfightcamp.com/boxing-equipment/how-to-choose-the-best-boxing-gloves-for-beginners/
https://www.ebay.com/t/Boxing-Gloves/30102/bn_1943751
https://cletoreyesboxing.com/
https://www.walmart.com/c/lists/top-rated-boxing-gloves
https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1
https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves
https://m.timesofindia.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms
https://www.everlast.com/fight/boxing/gloves
https://www.msmfightshop.com/blogs/news/top-3-boxing-gloves-in-the-world
https://www.quora.com/What-companies-make-the-best-quality-boxing-gloves
https://www.titleboxing.com/gloves
https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-professionals/articleshow/97128538.cms
https://skilspo.com/gb/blog/1_how-to-choose-the-best-boxing-gloves.html
https://bravose.com/collections/training-gloves
https://sanabulsports.com/blogs/news/the-best-boxing-gloves-for-training
https://anthonyjoshua.com/blogs/news/anthony-joshua-how-to-choose-the-best-boxing-gloves
https://www.nakmuaywholesale.com/top-3-boxing-gloves-for-small-hands-2022/
https://mmagearaddict.com/best-boxing-gloves/
https://issuu.com/punchequipment/docs/get_the_best_boxing_gloves_for_a_winning_performan
https://tufwear-germany.de/en/blogs/news/was-sind-die-besten-boxhandschuhe-der-boxhandschuh-guide-fur-deinen-kauf
https://yokkao.com/pages/boxing-gloves-guide
https://topboxer.com/collections/boxing-gloves
https://warriorpunch.com/best-boxing-gloves-for-beginners/
https://nypost.com/article/best-boxing-equipment-per-experts/
https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves
https://www.infinitudefight.com/buy-the-best-boxing-gloves/
https://cashkaro.com/blog/best-boxing-gloves-in-india/201246
https://www.popsugar.com/fitness/Best-Boxing-Gloves-Women-45472473
https://kdvr.com/reviews/br/sports-fitness-br/boxing-br/best-title-boxing-gloves/
https://www.expertreviews.co.uk/health-and-grooming/1407584/best-boxing-gloves
https://branded.disruptsports.com/blogs/blog/which-boxing-gloves-to-buy-for-beginners
https://www.flipkart.com/sports/boxing/boxing-gloves/pr?sid=abc%2Cppq%2Cbb6&page=2
https://www.reddit.com/r/amateur_boxing/comments/2ykhau/the_top_15_best_boxing_gloves_ranking_the_best/
https://fightquality.com/2018/10/12/best-custom-gloves/
https://fightingadvice.com/best-boxing-gloves-under-200/
https://glovesaddict.com/best-boxing-gloves-on-amazon/
https://www.k2promos.com/best-beginner-boxing-gloves/
https://absolutelymartialarts.com/best-boxing-gloves-beginners/
https://www.healthyprinciples.co.uk/best-boxing-gloves-for-kids-review/
https://breakinggrips.com/best-kids-boxing-gloves/
https://www.proboxingequipment.com/Boxing-Gloves_c_196.html
https://www.mmahive.com/best-boxing-gloves-for-wrist-support/
https://bwsgym.com/etiquette-produit/best-boxing-gloves/
https://www.dontwasteyourmoney.com/products/hawk-sports-heavy-bag-boxing-gloves/
https://www.bestproducts.com/fitness/equipment/g1009/boxing-gloves-mitts/
https://www.wbcme.co.uk/ringside/best-boxing-gloves-for-beginners/
https://www.momjunction.com/articles/best-boxing-gloves-for-kids_00514921/
https://middleeasy.com/reviews/gear/gloves-cardio-kickboxing/
https://www.fightingking.com/boxing-gloves-brands-reviews/
https://www.mightyfighter.com/top-10-best-boxing-gloves/
https://www.stylecraze.com/articles/best-heavy-bag-gloves/
https://linealboxing.com/best-boxing-glove-brands-2022/
https://blackbeltmag.com/best-boxing-gloves
https://smartmma.com/best-boxing-gloves-for-heavy-bag/
https://www.fullcontactway.com/best-sparring-gloves/
https://www.attacktheback.com/best-cheap-boxing-gloves/
https://www.boxingear.com/shop-2/grant-gloves/lace-up/best-boxing-gloves-for-sparring-grant-gloves/
https://www.kreedon.com/best-boxing-gloves-brands/
https://bestreviews.com/sports-fitness/boxing/best-boxing-gloves
https://cletoreyesuk.com/blogs/news/what-are-the-best-boxing-gloves-for-beginners
https://www.fitnessbaddies.com/amateur-boxing-gloves/
https://www.boxingison.com/best-boxing-gloves-for-training-and-sparring/
https://boxingready.com/ringside/best-boxing-gloves-wrist-support/
https://www.msn.com/en-gb/lifestyle/rf-best-products-uk/best-boxing-gloves-for-men-12oz-reviews
https://www.pragmaticmom.com/2019/11/best-boxing-gloves-for-women/
https://thewiredshopper.com/best-boxing-gloves-to-buy/
https://www.standard.co.uk/shopping/esbest/health-fitness/fitness-wear/best-womens-boxing-gloves-for-beginners-a4272321.html
https://www.gloveworx.com/blog/how-choose-best-boxing-gloves-beginners/
https://www.lowkickmma.com/best-boxing-gloves/
https://www.sportsdirect.com/boxing/boxing-gloves
https://themmaguru.com/best-youth-boxing-gloves/
https://brawlbros.com/best-boxing-gloves-on-amazon/
https://thechamplair.com/sports/best-beginners-boxing-gloves/
https://www.dmarge.com/best-boxing-gloves
https://www.nytimes.com/video/style/1194840632119/gear-test-boxing-gloves.html
https://findbestboxinggloves.com/best-boxing-gloves-for-heavy-bag-the-complete-guide/
https://www.hungry4fitness.co.uk/post/10-best-boxing-mitts-an-ultimate-guide
https://www.gearhungry.com/best-boxing-gloves/
https://hiconsumption.com/best-boxing-gloves/

Here is the log

/home/irfan/.pyenv/versions/TES/bin/python /home/irfan/PycharmProjects/TES-SAAS/tests/scprapping.py 
2023-05-05 06:52:32 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: scrapybot)
2023-05-05 06:52:32 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.4.0, Python 3.7.9 (default, Jan 23 2022, 07:32:51) - [GCC 7.5.0], pyOpenSSL 22.0.0 (OpenSSL 3.0.3 3 May 2022), cryptography 37.0.2, Platform Linux-5.4.0-148-generic-x86_64-with-debian-bullseye-sid
2023-05-05 06:52:32 [scrapy.crawler] INFO: Overridden settings:
{'ROBOTSTXT_OBEY': True,
 'SPIDER_LOADER_WARN_ONLY': True,
 'USER_AGENT': 'advertools/0.13.2'}
2023-05-05 06:52:32 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2023-05-05 06:52:32 [scrapy.extensions.telnet] INFO: Telnet Password: 2dcb88ca688b5e23
2023-05-05 06:52:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2023-05-05 06:52:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-05-05 06:52:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-05-05 06:52:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2023-05-05 06:52:33 [scrapy.core.engine] INFO: Spider opened
2023-05-05 06:52:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-05-05 06:52:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sweetscienceoffighting.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.rollingstone.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [filelock] DEBUG: Attempting to acquire lock 140227121181328 on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 06:52:33 [filelock] DEBUG: Lock 140227121181328 acquired on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 06:52:33 [filelock] DEBUG: Attempting to release lock 140227121181328 on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 06:52:33 [filelock] DEBUG: Lock 140227121181328 released on /home/irfan/.cache/python-tldextract/3.7.9.final__TES__f2586e__tldextract-3.3.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.si.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearpatrol.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.shape.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.verywellfit.com/robots.txt> (referer: None)
2023-05-05 06:52:33 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.si.com/showcase/fitness/best-boxing-gloves> (referer: None)
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingglovesreviews.com/robots.txt> (referer: None)
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.t3.com/features/best-boxing-gloves> (referer: None)
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bleacherreport.com/robots.txt> (referer: None)
2023-05-05 06:52:34 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.si.com/showcase/fitness/best-boxing-gloves>
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.rollingstone.com/product-recommendations/lifestyle/best-boxing-gloves-1234690811/> (referer: None)
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://expertboxing.com/robots.txt> (referer: None)
2023-05-05 06:52:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.t3.com/features/best-boxing-gloves>
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/robots.txt> (referer: None)
2023-05-05 06:52:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.rollingstone.com/product-recommendations/lifestyle/best-boxing-gloves-1234690811/>
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.verywellfit.com/best-boxing-gloves-4158917> (referer: None)
2023-05-05 06:52:34 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.verywellfit.com/best-boxing-gloves-4158917>
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.shape.com/fitness/gear/best-boxing-gloves> (referer: None)
2023-05-05 06:52:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thekarateblog.com/robots.txt> (referer: None)
2023-05-05 06:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.shape.com/fitness/gear/best-boxing-gloves>
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sweetscienceoffighting.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearpatrol.com/fitness/g40446087/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sweetscienceoffighting.com/best-boxing-gloves/>
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/robots.txt> (referer: None)
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxupnation.com/robots.txt> (referer: None)
2023-05-05 06:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gearpatrol.com/fitness/g40446087/best-boxing-gloves/>
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tabletenniscoach.me.uk/robots.txt> (referer: None)
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=tWoucO2nIlE> (referer: None)
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bleacherreport.com/articles/1286577-breaking-down-different-brands-of-boxing-gloves-worn-by-the-pros> (referer: None)
2023-05-05 06:52:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingglovesreviews.com/top-ten-boxing-gloves/> (referer: None)
2023-05-05 06:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=tWoucO2nIlE>
2023-05-05 06:52:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bleacherreport.com/articles/1286577-breaking-down-different-brands-of-boxing-gloves-worn-by-the-pros>
2023-05-05 06:52:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxingglovesreviews.com/top-ten-boxing-gloves/>
2023-05-05 06:52:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 1 times): 429 Unknown Status
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxupnation.com/blogs/news/my-top-5-favorite-boxing-glove-brands-and-why> (referer: None)
2023-05-05 06:52:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxupnation.com/blogs/news/my-top-5-favorite-boxing-glove-brands-and-why>
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://wayofmartialarts.com/robots.txt> (referer: None)
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thekarateblog.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://myboxinglife.com/robots.txt> (referer: None)
2023-05-05 06:52:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://thekarateblog.com/best-boxing-gloves/>
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.tabletenniscoach.me.uk/sport-equipment-guides/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://expertboxing.com/best-boxing-gloves-review> (referer: None)
2023-05-05 06:52:36 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.dickssportinggoods.com/robots.txt> (referer: None)
2023-05-05 06:52:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 2 times): 429 Unknown Status
2023-05-05 06:52:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.tabletenniscoach.me.uk/sport-equipment-guides/best-boxing-gloves-for-beginners/>
2023-05-05 06:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://expertboxing.com/best-boxing-gloves-review>
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=rHepbZOCxfY> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hayabusafight.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://revgear.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=rHepbZOCxfY>
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.dickssportinggoods.com/o/best-boxing-gloves-for-pad-work> (referer: None)
2023-05-05 06:52:37 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.dickssportinggoods.com/o/best-boxing-gloves-for-pad-work>
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://myboxinglife.com/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 06:52:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://myboxinglife.com/best-boxing-gloves-for-beginners/>
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blog.joinfightcamp.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ringsport.com.au/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesboxing.com/robots.txt> (referer: None)
2023-05-05 06:52:37 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (failed 3 times): 429 Unknown Status
2023-05-05 06:52:37 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131> (referer: None) ['partial']
2023-05-05 06:52:38 [scrapy.core.scraper] DEBUG: Scraped from <429 https://www.amazon.com/Best-Sellers-Boxing-Training-Gloves/zgbs/sporting-goods/3400131>
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://made4fighters.com/robots.txt> (referer: None)
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1> (referer: None)
2023-05-05 06:52:38 [seo_spider] ERROR: Invalid control character at: line 5 column 19 (char 78) 200 https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 5 column 19 (char 78)
2023-05-05 06:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ringsport.com.au/blogs/ringsport-blog/boxing-glove-guide-part-1>
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blog.joinfightcamp.com/boxing-equipment/how-to-choose-the-best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 06:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://blog.joinfightcamp.com/boxing-equipment/how-to-choose-the-best-boxing-gloves-for-beginners/>
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/t/Boxing-Gloves/30102/bn_1943751> (referer: None)
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://wayofmartialarts.com/best-boxing-gloves-worth-your-money/> (referer: None)
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msmfightshop.com/robots.txt> (referer: None)
2023-05-05 06:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.ebay.com/t/Boxing-Gloves/30102/bn_1943751>
2023-05-05 06:52:38 [scrapy.core.scraper] DEBUG: Scraped from <200 https://wayofmartialarts.com/best-boxing-gloves-worth-your-money/>
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves> (referer: None)
2023-05-05 06:52:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.quora.com/robots.txt> (referer: None)
2023-05-05 06:52:38 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.quora.com/What-companies-make-the-best-quality-boxing-gloves>
2023-05-05 06:52:39 [seo_spider] ERROR: Invalid control character at: line 20 column 226 (char 698) 200 https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 20 column 226 (char 698)
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://made4fighters.com/blogs/default-blog/top-womens-boxing-gloves>
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hayabusafight.com/products/t3-boxing-gloves> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msmfightshop.com/blogs/news/top-3-boxing-gloves-in-the-world> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.everlast.com/robots.txt> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesboxing.com/> (referer: None)
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.hayabusafight.com/products/t3-boxing-gloves>
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.msmfightshop.com/blogs/news/top-3-boxing-gloves-in-the-world>
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cletoreyesboxing.com/>
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://revgear.com/gear/boxing-gloves/> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://m.timesofindia.com/robots.txt> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/c/lists/top-rated-boxing-gloves> (referer: None)
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://revgear.com/gear/boxing-gloves/>
2023-05-05 06:52:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr> from <GET https://m.timesofindia.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms>
2023-05-05 06:52:39 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.walmart.com/c/lists/top-rated-boxing-gloves>
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.titleboxing.com/robots.txt> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bravose.com/robots.txt> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sanabulsports.com/robots.txt> (referer: None)
2023-05-05 06:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/robots.txt> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://anthonyjoshua.com/robots.txt> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sanabulsports.com/blogs/news/the-best-boxing-gloves-for-training> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.everlast.com/fight/boxing/gloves> (referer: None)
2023-05-05 06:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://sanabulsports.com/blogs/news/the-best-boxing-gloves-for-training>
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nakmuaywholesale.com/robots.txt> (referer: None)
2023-05-05 06:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.everlast.com/fight/boxing/gloves>
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mmagearaddict.com/robots.txt> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://anthonyjoshua.com/blogs/news/anthony-joshua-how-to-choose-the-best-boxing-gloves> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bravose.com/collections/training-gloves> (referer: None)
2023-05-05 06:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://anthonyjoshua.com/blogs/news/anthony-joshua-how-to-choose-the-best-boxing-gloves>
2023-05-05 06:52:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bravose.com/collections/training-gloves>
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://issuu.com/robots.txt> (referer: None)
2023-05-05 06:52:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tufwear-germany.de/robots.txt> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.titleboxing.com/gloves> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-professionals/articleshow/97128538.cms> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://yokkao.com/robots.txt> (referer: None)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.titleboxing.com/gloves>
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tufwear-germany.de/en/blogs/news/was-sind-die-besten-boxhandschuhe-der-boxhandschuh-guide-fur-deinen-kauf> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://mmagearaddict.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://topboxer.com/robots.txt> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nakmuaywholesale.com/top-3-boxing-gloves-for-small-hands-2022/> (referer: None)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-professionals/articleshow/97128538.cms>
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://tufwear-germany.de/en/blogs/news/was-sind-die-besten-boxhandschuhe-der-boxhandschuh-guide-fur-deinen-kauf>
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://issuu.com/punchequipment/docs/get_the_best_boxing_gloves_for_a_winning_performan> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://nypost.com/robots.txt> (referer: None)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://mmagearaddict.com/best-boxing-gloves/>
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.nakmuaywholesale.com/top-3-boxing-gloves-for-small-hands-2022/>
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://issuu.com/punchequipment/docs/get_the_best_boxing_gloves_for_a_winning_performan>
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://warriorpunch.com/robots.txt> (referer: None)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://timesofindia.indiatimes.com/most-searched-products/sports-equipment/boxing-gloves-for-beginners-best-picks/articleshow/97912567.cms?from=mdr>
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://yokkao.com/pages/boxing-gloves-guide> (referer: None)
2023-05-05 06:52:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://topboxer.com/collections/boxing-gloves> (referer: None)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://yokkao.com/pages/boxing-gloves-guide>
2023-05-05 06:52:41 [seo_spider] ERROR: Invalid control character at: line 15 column 21 (char 385) 200 https://topboxer.com/collections/boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 15 column 21 (char 385)
2023-05-05 06:52:41 [scrapy.core.scraper] DEBUG: Scraped from <200 https://topboxer.com/collections/boxing-gloves>
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://nypost.com/article/best-boxing-equipment-per-experts/> (referer: None)
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://kdvr.com/robots.txt> (referer: None)
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cashkaro.com/robots.txt> (referer: None)
2023-05-05 06:52:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://nypost.com/article/best-boxing-equipment-per-experts/>
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://origympersonaltrainercourses.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.popsugar.com/robots.txt> (referer: None)
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.expertreviews.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cashkaro.com/blog/best-boxing-gloves-in-india/201246> (referer: None)
2023-05-05 06:52:42 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cashkaro.com/blog/best-boxing-gloves-in-india/201246>
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://warriorpunch.com/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.popsugar.com/fitness/Best-Boxing-Gloves-Women-45472473> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://kdvr.com/reviews/br/sports-fitness-br/boxing-br/best-title-boxing-gloves/> (referer: None)
2023-05-05 06:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://warriorpunch.com/best-boxing-gloves-for-beginners/>
2023-05-05 06:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.popsugar.com/fitness/Best-Boxing-Gloves-Women-45472473>
2023-05-05 06:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://kdvr.com/reviews/br/sports-fitness-br/boxing-br/best-title-boxing-gloves/>
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://branded.disruptsports.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.expertreviews.co.uk/health-and-grooming/1407584/best-boxing-gloves> (referer: None)
2023-05-05 06:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.expertreviews.co.uk/health-and-grooming/1407584/best-boxing-gloves>
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://branded.disruptsports.com/blogs/blog/which-boxing-gloves-to-buy-for-beginners> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.reddit.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightquality.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.scraper] DEBUG: Scraped from <200 https://branded.disruptsports.com/blogs/blog/which-boxing-gloves-to-buy-for-beginners>
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flipkart.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.infinitudefight.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 14 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 16 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 35 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 42 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 43 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 44 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 45 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 46 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 47 without any user agent to enforce it on.
2023-05-05 06:52:43 [protego] DEBUG: Rule at line 69 without any user agent to enforce it on.
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://absolutelymartialarts.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.k2promos.com/robots.txt> (referer: None)
2023-05-05 06:52:43 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.infinitudefight.com/buy-the-best-boxing-gloves/> (referer: None)
2023-05-05 06:52:44 [seo_spider] ERROR: Expecting value: line 1 column 1 (char 0) 200 https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
2023-05-05 06:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://origympersonaltrainercourses.co.uk/blog/best-boxing-gloves>
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/robots.txt> (referer: None)
2023-05-05 06:52:44 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.infinitudefight.com/buy-the-best-boxing-gloves/>
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.proboxingequipment.com/robots.txt> (referer: None)
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.proboxingequipment.com/Boxing-Gloves_c_196.html> (referer: None)
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://glovesaddict.com/robots.txt> (referer: None)
2023-05-05 06:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.proboxingequipment.com/Boxing-Gloves_c_196.html>
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://absolutelymartialarts.com/best-boxing-gloves-beginners/> (referer: None)
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.reddit.com/r/amateur_boxing/comments/2ykhau/the_top_15_best_boxing_gloves_ranking_the_best/> (referer: None)
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.healthyprinciples.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://absolutelymartialarts.com/best-boxing-gloves-beginners/>
2023-05-05 06:52:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.reddit.com/r/amateur_boxing/comments/2ykhau/the_top_15_best_boxing_gloves_ranking_the_best/>
2023-05-05 06:52:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mmahive.com/robots.txt> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bwsgym.com/robots.txt> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightquality.com/2018/10/12/best-custom-gloves/> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://fightingadvice.com/best-boxing-gloves-under-200/> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.k2promos.com/best-beginner-boxing-gloves/> (referer: None)
2023-05-05 06:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://fightquality.com/2018/10/12/best-custom-gloves/>
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.flipkart.com/sports/boxing/boxing-gloves/pr?sid=abc%2Cppq%2Cbb6&page=2> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dontwasteyourmoney.com/robots.txt> (referer: None)
2023-05-05 06:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://fightingadvice.com/best-boxing-gloves-under-200/>
2023-05-05 06:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.k2promos.com/best-beginner-boxing-gloves/>
2023-05-05 06:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.flipkart.com/sports/boxing/boxing-gloves/pr?sid=abc%2Cppq%2Cbb6&page=2>
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bwsgym.com/etiquette-produit/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://middleeasy.com/robots.txt> (referer: None)
2023-05-05 06:52:45 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bwsgym.com/etiquette-produit/best-boxing-gloves/>
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.healthyprinciples.co.uk/best-boxing-gloves-for-kids-review/> (referer: None)
2023-05-05 06:52:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.bestproducts.com/robots.txt> (referer: None)
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.healthyprinciples.co.uk/best-boxing-gloves-for-kids-review/>
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mmahive.com/best-boxing-gloves-for-wrist-support/> (referer: None)
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.momjunction.com/robots.txt> (referer: None)
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dontwasteyourmoney.com/products/hawk-sports-heavy-bag-boxing-gloves/> (referer: None)
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mmahive.com/best-boxing-gloves-for-wrist-support/>
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.dontwasteyourmoney.com/products/hawk-sports-heavy-bag-boxing-gloves/>
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://glovesaddict.com/best-boxing-gloves-on-amazon/> (referer: None)
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://middleeasy.com/reviews/gear/gloves-cardio-kickboxing/> (referer: None)
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://breakinggrips.com/robots.txt> (referer: None)
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://glovesaddict.com/best-boxing-gloves-on-amazon/>
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://middleeasy.com/reviews/gear/gloves-cardio-kickboxing/>
2023-05-05 06:52:46 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/robots.txt> (failed 1 times): 429 Unknown Status
2023-05-05 06:52:46 [py.warnings] WARNING: /home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/scrapy/core/engine.py:276: ScrapyDeprecationWarning: Passing a 'spider' argument to ExecutionEngine.download is deprecated
  return self.download(result, spider) if isinstance(result, Request) else result

2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.momjunction.com/articles/best-boxing-gloves-for-kids_00514921/> (referer: None)
2023-05-05 06:52:46 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.momjunction.com/articles/best-boxing-gloves-for-kids_00514921/>
2023-05-05 06:52:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.bestproducts.com/fitness/equipment/g1009/boxing-gloves-mitts/> (referer: None)
2023-05-05 06:52:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/robots.txt> (failed 2 times): 429 Unknown Status
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://breakinggrips.com/best-kids-boxing-gloves/> (referer: None)
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mightyfighter.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.bestproducts.com/fitness/equipment/g1009/boxing-gloves-mitts/>
2023-05-05 06:52:47 [scrapy.core.scraper] DEBUG: Scraped from <200 https://breakinggrips.com/best-kids-boxing-gloves/>
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.stylecraze.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.fightingking.com/robots.txt> (failed 3 times): 429 Unknown Status
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.fightingking.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [protego] DEBUG: Rule at line 2 without any user agent to enforce it on.
2023-05-05 06:52:47 [protego] DEBUG: Rule at line 6 without any user agent to enforce it on.
2023-05-05 06:52:47 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://linealboxing.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wbcme.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 1 times): 429 Unknown Status
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blackbeltmag.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.mightyfighter.com/top-10-best-boxing-gloves/> (referer: None)
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://smartmma.com/robots.txt> (referer: None)
2023-05-05 06:52:47 [protego] DEBUG: Rule at line 1 without any user agent to enforce it on.
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://linealboxing.com/best-boxing-glove-brands-2022/> (referer: None)
2023-05-05 06:52:47 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mightyfighter.com/top-10-best-boxing-gloves/>
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.stylecraze.com/articles/best-heavy-bag-gloves/> (referer: None)
2023-05-05 06:52:47 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 2 times): 429 Unknown Status
2023-05-05 06:52:47 [scrapy.core.scraper] DEBUG: Scraped from <200 https://linealboxing.com/best-boxing-glove-brands-2022/>
2023-05-05 06:52:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.wbcme.co.uk/ringside/best-boxing-gloves-for-beginners/> (referer: None)
2023-05-05 06:52:48 [seo_spider] ERROR: Invalid control character at: line 28 column 64 (char 1740) 200 https://www.stylecraze.com/articles/best-heavy-bag-gloves/
Traceback (most recent call last):
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 761, in parse
    response.css('script[type="application/ld+json"]::text').getall()]
  File "/home/irfan/.pyenv/versions/TES/lib/python3.7/site-packages/advertools/spider.py", line 760, in <listcomp>
    ld = [json.loads(s.replace('\r', '')) for s in
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/irfan/.pyenv/versions/3.7.9/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 28 column 64 (char 1740)
2023-05-05 06:52:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.stylecraze.com/articles/best-heavy-bag-gloves/>
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.kreedon.com/robots.txt> (referer: None)
2023-05-05 06:52:48 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.kreedon.com/best-boxing-gloves-brands/>
2023-05-05 06:52:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.wbcme.co.uk/ringside/best-boxing-gloves-for-beginners/>
2023-05-05 06:52:48 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (failed 3 times): 429 Unknown Status
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.fightingking.com/boxing-gloves-brands-reviews/> (referer: None)
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.attacktheback.com/robots.txt> (referer: None)
2023-05-05 06:52:48 [scrapy.core.scraper] DEBUG: Scraped from <429 https://www.fightingking.com/boxing-gloves-brands-reviews/>
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingear.com/robots.txt> (referer: None)
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://blackbeltmag.com/best-boxing-gloves> (referer: None)
2023-05-05 06:52:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://blackbeltmag.com/best-boxing-gloves>
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesuk.com/robots.txt> (referer: None)
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.attacktheback.com/best-cheap-boxing-gloves/> (referer: None)
2023-05-05 06:52:48 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.attacktheback.com/best-cheap-boxing-gloves/>
2023-05-05 06:52:48 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://sites.google.com/view> from <GET https://www.boxingear.com/shop-2/grant-gloves/lace-up/best-boxing-gloves-for-sparring-grant-gloves/>
2023-05-05 06:52:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fullcontactway.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cletoreyesuk.com/blogs/news/what-are-the-best-boxing-gloves-for-beginners> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fitnessbaddies.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bestreviews.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingison.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cletoreyesuk.com/blogs/news/what-are-the-best-boxing-gloves-for-beginners>
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://thewiredshopper.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 28 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 37 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 38 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 39 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 40 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 41 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 42 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 43 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 44 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 45 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 46 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 47 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 48 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 49 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 50 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 51 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 52 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 53 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 54 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 55 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 56 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 57 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 58 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 59 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 60 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 61 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 67 without any user agent to enforce it on.
2023-05-05 06:52:49 [protego] DEBUG: Rule at line 72 without any user agent to enforce it on.
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msn.com/robots.txt> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fullcontactway.com/best-sparring-gloves/> (referer: None)
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://thewiredshopper.com/best-boxing-gloves-to-buy/> (referer: None)
2023-05-05 06:52:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fullcontactway.com/best-sparring-gloves/>
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://smartmma.com/best-boxing-gloves-for-heavy-bag/> (referer: None)
2023-05-05 06:52:49 [scrapy.core.scraper] DEBUG: Scraped from <403 https://thewiredshopper.com/best-boxing-gloves-to-buy/>
2023-05-05 06:52:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://smartmma.com/best-boxing-gloves-for-heavy-bag/>
2023-05-05 06:52:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.msn.com/en-gb/lifestyle/rf-best-products-uk/best-boxing-gloves-for-men-12oz-reviews> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://bestreviews.com/sports-fitness/boxing/best-boxing-gloves> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gloveworx.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.msn.com/en-gb/lifestyle/rf-best-products-uk/best-boxing-gloves-for-men-12oz-reviews>
2023-05-05 06:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bestreviews.com/sports-fitness/boxing/best-boxing-gloves>
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.fitnessbaddies.com/amateur-boxing-gloves/> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.standard.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sites.google.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.fitnessbaddies.com/amateur-boxing-gloves/>
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pragmaticmom.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowkickmma.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.standard.co.uk/shopping/esbest/health-fitness/fitness-wear/best-womens-boxing-gloves-for-beginners-a4272321.html> (referer: None)
2023-05-05 06:52:50 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.standard.co.uk/shopping/esbest/health-fitness/fitness-wear/best-womens-boxing-gloves-for-beginners-a4272321.html>
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingready.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.sportsdirect.com/robots.txt> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowkickmma.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:50 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://sites.google.com/view> (referer: None)
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowkickmma.com/best-boxing-gloves/>
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <404 https://sites.google.com/view>
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://themmaguru.com/robots.txt> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dmarge.com/robots.txt> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pragmaticmom.com/2019/11/best-boxing-gloves-for-women/> (referer: None)
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.pragmaticmom.com/2019/11/best-boxing-gloves-for-women/>
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.dmarge.com/best-boxing-gloves> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.boxingison.com/best-boxing-gloves-for-training-and-sparring/> (referer: None)
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.dmarge.com/best-boxing-gloves>
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.sportsdirect.com/boxing/boxing-gloves> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gloveworx.com/blog/how-choose-best-boxing-gloves-beginners/> (referer: None)
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.boxingison.com/best-boxing-gloves-for-training-and-sparring/>
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thechamplair.com/robots.txt> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://brawlbros.com/robots.txt> (referer: None)
2023-05-05 06:52:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://themmaguru.com/best-youth-boxing-gloves/> (referer: None)
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <403 https://www.sportsdirect.com/boxing/boxing-gloves>
2023-05-05 06:52:51 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gloveworx.com/blog/how-choose-best-boxing-gloves-beginners/>
2023-05-05 06:52:52 [scrapy.core.scraper] DEBUG: Scraped from <200 https://themmaguru.com/best-youth-boxing-gloves/>
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nytimes.com/robots.txt> (referer: None)
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearhungry.com/robots.txt> (referer: None)
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hungry4fitness.co.uk/robots.txt> (referer: None)
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://findbestboxinggloves.com/robots.txt> (referer: None)
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hiconsumption.com/robots.txt> (referer: None)
2023-05-05 06:52:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://thechamplair.com/sports/best-beginners-boxing-gloves/> (referer: None)
2023-05-05 06:52:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://thechamplair.com/sports/best-beginners-boxing-gloves/>
2023-05-05 06:52:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://brawlbros.com/best-boxing-gloves-on-amazon/> (referer: None)
2023-05-05 06:52:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://brawlbros.com/best-boxing-gloves-on-amazon/>
2023-05-05 06:52:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://hiconsumption.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:53 [scrapy.core.scraper] DEBUG: Scraped from <200 https://hiconsumption.com/best-boxing-gloves/>
2023-05-05 06:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.hungry4fitness.co.uk/post/10-best-boxing-mitts-an-ultimate-guide> (referer: None)
2023-05-05 06:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.gearhungry.com/best-boxing-gloves/> (referer: None)
2023-05-05 06:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.hungry4fitness.co.uk/post/10-best-boxing-mitts-an-ultimate-guide>
2023-05-05 06:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.gearhungry.com/best-boxing-gloves/>
2023-05-05 06:52:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://boxingready.com/ringside/best-boxing-gloves-wrist-support/> (referer: None)
2023-05-05 06:52:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://boxingready.com/ringside/best-boxing-gloves-wrist-support/>
2023-05-05 06:52:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.nytimes.com/video/style/1194840632119/gear-test-boxing-gloves.html> (referer: None)
2023-05-05 06:52:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.nytimes.com/video/style/1194840632119/gear-test-boxing-gloves.html>
2023-05-05 06:52:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://findbestboxinggloves.com/best-boxing-gloves-for-heavy-bag-the-complete-guide/> (referer: None)
2023-05-05 06:52:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://findbestboxinggloves.com/best-boxing-gloves-for-heavy-bag-the-complete-guide/>
2023-05-05 06:53:33 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 196 pages/min), scraped 97 items (at 97 items/min)
2023-05-05 06:54:33 [scrapy.extensions.logstats] INFO: Crawled 196 pages (at 0 pages/min), scraped 97 items (at 0 items/min)
2023-05-05 06:54:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://skilspo.com/robots.txt> (failed 1 times): TCP connection timed out: 110: Connection timed out.

File not found on crawl method

  Hi all,

I'm following the documentation with this line of code

adv.crawl('https://example.com', 'my_output_file.jl', follow_links=True)

But it returns this error:

FileNotFoundError: [WinError 2] The system cannot find the file specified

Even though my directory looks like this:

- SEO.py
- my_output_file.jl

Here is the complete trace:

Traceback (most recent call last):
  File "c:/Users/Henrique/Desktop/SEO/SEO.py", line 6, in <module>
    adv.crawl('https://example.com', 'my_output_file.jl', follow_links=True)
  File "C:\Users\Henrique\AppData\Roaming\Python\Python38\site-packages\advertools\spider.py", line 971, in crawl
    subprocess.run(command)
  File "C:\Python38\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python38\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python38\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

As you can see, it doesn't specify which file was not found, but I assume it is the output file.

Any help is greatly appreciated!

Originally posted by @henriquearaujo-98 in #247
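
A side note on this traceback: [WinError 2] raised from _winapi.CreateProcess usually means Windows could not find the program being launched (here, the scrapy command that adv.crawl runs through subprocess.run), rather than the output file. A minimal check, run with the same interpreter that runs SEO.py:

import shutil
import sys

# Which interpreter is actually running, and can it see the scrapy command?
# shutil.which returns None when the command is not on PATH.
print(sys.executable)
print(shutil.which("scrapy"))

If the second line prints None, installing advertools (which pulls in Scrapy) into that same interpreter, or fixing PATH, may resolve the error.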

Python 3.10/11 SSL: SSLV3_ALERT_HANDSHAKE_FAILURE

Not fatal, but just an issue note:

There seems to be an issue with Python 3.10/3.11:
python/cpython#103142

Mac Intel

and containers using
FROM python:3.11-slim
FROM python:3.10-slim

This URL:

https://opentopography.org/sitemap.xml

gets redirected to:

https://portal.opentopography.org/sitemap.xml

If I just use https://portal.opentopography.org/sitemap.xml directly, it works fine.

File "/Users//development/dev_earthcube/earthcube_utilities/venv311/lib/python3.11/site-packages/advertools/sitemaps.py", line 491, in sitemap_to_df
    xml_text = urlopen(Request(sitemap_url, headers=headers))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1002)>
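
Since the post-redirect URL already works, the simplest option is to call sitemap_to_df on https://portal.opentopography.org/sitemap.xml directly. If you need to start from the original URL, one possible workaround (a sketch, not an advertools fix) is to fetch the XML yourself with a relaxed SSL security level, since newer Python/OpenSSL builds reject handshakes that some older servers still offer, and then parse it with pandas (assumes pandas >= 1.3 for read_xml). Lowering SECLEVEL weakens TLS checks, so only do this for hosts you trust:

import ssl
from io import BytesIO
from urllib.request import Request, urlopen

import pandas as pd

ctx = ssl.create_default_context()
ctx.set_ciphers("DEFAULT:@SECLEVEL=1")  # accept the older handshake the server offers

url = "https://portal.opentopography.org/sitemap.xml"
xml_bytes = urlopen(Request(url, headers={"User-Agent": "Mozilla/5.0"}), context=ctx).read()
sitemap_df = pd.read_xml(BytesIO(xml_bytes))
print(sitemap_df.head())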

serp_goog next page

Hi,
I am having problems with the start parameter in serp_goog. I want to query from the first page to the fourth.
Please advise.
Thank you.
packages version:
pandas
In [5]: print(pd.__version__)
0.25.0
advertools
In [6]: print(adv.__version__)
0.7.3
----> 4 next_result_1=adv.serp_goog(cx=cx, key=key, q=queri, gl=['id'],start=[1, 11, 21])
KeyError: "['start'] not in index"

C:\Anaconda3\lib\site-packages\advertools\serp.py in serp_goog(q, cx, key, c2coff, cr, dateRestrict, exactTerms, excludeTerms, fileType, filter, gl, highRange, hl, hq, imgColorType, imgDominantColor, imgSize, imgType, linkSite, lowRange, lr, num, orTerms, relatedSite, rights, safe, searchType, siteSearch, siteSearchFilter, sort, start)
700 specified_cols)
701 non_ordered = result_df.columns.difference(set(ordered_cols))
--> 702 final_df = result_df[ordered_cols + list(non_ordered)]
703 return final_df
704

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2979 if is_iterator(key):
2980 key = list(key)
-> 2981 indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
2982
2983 # take() does not accept boolean indexers

C:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter, raise_missing)
1269 # When setting, missing keys are not allowed, even with .loc:
1270 kwargs = {"raise_missing": True if is_setter else raise_missing}
-> 1271 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
1272 else:
1273 try:

C:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1076
1077 self._validate_read_indexer(
-> 1078 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1079 )
1080 return keyarr, indexer

C:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1169 if not (self.name == "loc" and not raise_missing):
1170 not_found = list(set(key) - set(ax))
-> 1171 raise KeyError("{} not in index".format(not_found))
1172
1173 # we skip the warning on Categorical/Interval

KeyError: "['start'] not in index"
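
For reference, querying the first to the fourth page means start values 1, 11, 21 and 31 (ten results per page). A minimal sketch, assuming a current advertools version where this "['start'] not in index" error no longer occurs (cx, key and the query are placeholders for your own values):

import advertools as adv

serp_df = adv.serp_goog(
    q="your query",            # placeholder query
    cx=cx,                     # your Custom Search Engine ID
    key=key,                   # your API key
    gl=["id"],
    start=[1, 11, 21, 31],     # pages 1 through 4
)
print(serp_df[["searchTerms", "rank", "title"]].head(20))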

Advertools in Ubuntu in a Venv (Python 3.10.12 and Python 3.9.18)

Hi Everyone,

I am trying to run advertools in a Python venv on Ubuntu.

I tried with the standard Python that comes with this Ubuntu version (3.10.12), and I also tried installing Python 3.9.18, as I saw someone posting a similar issue and suggesting that version, but the result is the same.

This is what I did:

mkdir /home/abc/advertools/
cd /home/abc/advertools/
python3 -m venv .
source bin/activate
python3 -m pip install advertools

When I try to type:
adv
or
advertools --version

I get

Illegal instruction (core dumped)

Do you guys have some suggestions?

For Python 3.9 I tried:
add-apt-repository ppa:deadsnakes/ppa
apt install python3.9
apt install python3.9-venv
python3.9 -m venv .
source bin/activate
python3 -m pip install advertools

Same error: Illegal instruction (core dumped)

Thank you very much

request_url_df creates wide list?

After applying adv.url_to_df(logs_df['request']) to my dataset, the DataFrame explodes to more than 120 columns, with names like:

'query_template',
'query_archive',
'query_key',
'query_per',
'query_x',
"query_[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0]['[0][

Applying it to the referer column produces another 40 columns. Is this behavior intended?
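
This is expected in the sense that url_to_df creates one query_<parameter> column per distinct query parameter, so request URLs with many different parameters (including malformed ones, as above) produce a very wide frame. A small sketch for keeping only the core URL components and dropping the per-parameter columns:

import advertools as adv

url_df = adv.url_to_df(logs_df["request"])

# Keep scheme/netloc/path/etc. and drop the per-parameter query_* columns.
core_cols = [c for c in url_df.columns if not c.startswith("query_")]
url_df_slim = url_df[core_cols]
print(url_df_slim.columns.tolist())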

How to get the initial URL in output.jl?

Hello Elias,

Advertools is a really great package ! Many thanks for the splendid work.

Nevertheless, I have a little problem for which I have not found a good workaround (except crawling URLs one by one).

I was wondering how to get the initial URL that is crawled.
Advertools returns the URL after redirection, not the URL before, so when you want to merge data it can become tricky if you have no "reference".

Could we also "inject" specific user parameters as strings to get them in the output? I've tried to do so with xpath_selectors, but completely failed.

Many thanks,

Caro
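
For what it's worth, when a crawled request was redirected, the crawl output may include the redirect chain (assuming a redirect_urls column with multiple values joined by '@@', as in recent advertools versions); its first element is the originally requested URL and can serve as the missing reference. A sketch:

import pandas as pd

crawl_df = pd.read_json("output.jl", lines=True)

# Assumption: redirected rows carry the chain in 'redirect_urls' joined with '@@';
# the first element is the URL that was originally requested.
if "redirect_urls" in crawl_df.columns:
    crawl_df["original_url"] = (
        crawl_df["redirect_urls"].str.split("@@").str[0].fillna(crawl_df["url"])
    )
    print(crawl_df[["original_url", "url"]].head())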

Number of threads for crawling

Is there a way to specify the number of threads for crawling? Currently it uses all the threads on the system, which causes issues when refreshing the page.
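
Crawling concurrency is controlled through Scrapy settings rather than an explicit thread count, and adv.crawl accepts them via its custom_settings parameter. A sketch with illustrative numbers:

import advertools as adv

adv.crawl(
    "https://example.com",
    "output.jl",
    follow_links=True,
    custom_settings={
        "CONCURRENT_REQUESTS": 4,             # global cap on parallel requests
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # per-domain cap
        "DOWNLOAD_DELAY": 0.5,                # seconds to wait between requests
    },
)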

Bulk robots.txt Tester Documentation

Hi,

Thanks so much for providing such a valuable Python package to marketing researchers.

I tried running the robotstxt_test() function as described in the documentation, but the example does not work correctly. I propose the following solution (third example below) and a change to the documentation:

# 1. current example includes wrong syntax
robotstxt_test('https://www.example.com/robots.txt',
               useragents=['Googlebot', 'baiduspider', 'Bingbot']
               urls=['/', '/hello', '/some-page.html']])

# 2. example includes right syntax but does not return meaningful information (i.e., returns HTTP Error 404)
adv.robotstxt_test('https://www.example.com/robots.txt',
               user_agents=['Googlebot', 'baiduspider', 'Bingbot'],
               urls=['/', '/hello', '/some-page.html'])

# 3. example includes right syntax and returns meaningful information
adv.robotstxt_test('https://www.amazon.com/robots.txt',
               user_agents=['Googlebot', 'baiduspider', 'Bingbot'],
               urls=['/', '/hello', '/some-page.html'])

jsonld_sameAs data is mixed with str and list.

Hi. I'm working on a Streamlit app and I'm having an issue with scraped data, specifically the jsonld_sameAs column. It has mixed datatypes, and Streamlit throws an error when I try to show the table.

Here is my code

adv.crawl(urls, 'pages.jl', follow_links=False)
crawl_df = pd.read_json('pages.jl', lines=True)
st.dataframe(crawl_df)

Here is the error.

{StreamlitAPIException}('cannot mix list and non-list, non-null values', 'Conversion failed for column jsonld_sameAs with type object')

When I checked the datatype of the column it is mixed with str and list.


The column should consistently contain lists, not a mix of str and list, to avoid such issues.
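
Until the output is consistent, one workaround is to normalize the column after reading the file, for example by wrapping every non-list value in a one-element list. A sketch based on the code above:

import pandas as pd
import streamlit as st

crawl_df = pd.read_json("pages.jl", lines=True)

def as_list(value):
    # Wrap scalars in a one-element list; keep lists; turn missing values into [].
    if isinstance(value, list):
        return value
    if pd.isna(value):
        return []
    return [value]

crawl_df["jsonld_sameAs"] = crawl_df["jsonld_sameAs"].apply(as_list)
st.dataframe(crawl_df)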

Questions about custom spider

Is there a way to extract only the information I want? By default it extracts too much, and if the web page is large, the JSON lines file will be very large.
For example, I just want to extract the title.
Also, do you plan to add a feature to create sitemaps with this package? I have a lot of big websites that need sitemaps.
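
As a partial workaround for the file size question, you can at least keep only the columns you care about after reading the crawl file (url and title are standard columns of the crawl output):

import pandas as pd

crawl_df = pd.read_json("output.jl", lines=True)
titles = crawl_df[["url", "title"]]
titles.to_csv("titles.csv", index=False)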

opening .jl file command doesn't show 'my_output_file.jl'

My apologies in advance; I am still a Python padawan. I swept through the documentation but couldn't find an answer.

I am assuming that after I run:

import advertools as adv
adv.crawl('https://example.com', 'my_output_file.jl', follow_links=True)

and then

import pandas as pd
crawl_df = pd.read_json('my_output_file.jl', lines=True)

I am supposed to see the data from the scrape? (I did put in a real URL, and a 1.6MB .jl output file was created.) But when I run that cell, nothing happens: no errors, but no data either. I am testing this in a Jupyter notebook; all requirements are installed, etc.

Also, if I may, as an SEO practitioner: how do I output these results into a CSV file that can be viewed in Excel or Google Sheets for further analysis? If you're willing, can you provide an example of how to convert the .jl file to .csv? I tried to install json-lines but apparently it's no longer supported. I assume what I need is to import csv, but figuring out how to structure the data with column headers etc. is a bit intimidating.

Thanks
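
Two small notes that may help here. First, pd.read_json only builds the DataFrame; in a notebook you still have to display it, for example by putting crawl_df.head() on the last line of the cell or using print(). Second, pandas can write the CSV directly, so neither json-lines nor the csv module is needed. A sketch:

import pandas as pd

crawl_df = pd.read_json("my_output_file.jl", lines=True)

print(crawl_df.shape)   # quick confirmation that data was read
crawl_df.head()         # displays the first rows when it is the last line of a notebook cell

# Export for Excel / Google Sheets; column headers are written automatically.
crawl_df.to_csv("my_output_file.csv", index=False)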

logs_to_df() Limitation

1) Missing domain_entry

Web log files may contain the domain name (e.g. when a system hosts several web servers) as a 10th column. This domain name may be missing, which is what logs_to_df() expects.

But sometimes the field appears in a web log file, either as a value or as an empty entry (e.g. '-' or '"-"'). logs_to_df() cannot handle this extra field and ignores these lines.

2) Failure on escaped quotes (\")

In web logs, quotes delimit fields. Sometimes quotes are part of a field's value and escaped with "\". logs_to_df() does not handle the escaped character \" and ignores these lines.
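
One possible direction for point 1 (and partly for point 2) is a custom log_format regex. This is only a sketch: it assumes logs_to_df accepts a regex for log_format together with a fields list naming the captured groups (check the current signature before relying on it), and the pattern itself is illustrative, covering a combined-format line that allows backslash-escaped quotes inside quoted fields plus an optional trailing domain field:

import advertools as adv

log_regex = (
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] '
    r'"((?:[^"\\]|\\.)*)" (\S+) (\S+) '
    r'"((?:[^"\\]|\\.)*)" "((?:[^"\\]|\\.)*)"'
    r'(?: (\S+))?\s*$'
)
fields = [
    "client", "userid", "user", "datetime", "request",
    "status", "size", "referer", "user_agent", "domain",
]

adv.logs_to_df(
    log_file="access.log",
    output_file="access_logs.parquet",
    errors_file="log_errors.txt",
    log_format=log_regex,
    fields=fields,
)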

Bypass a cookie wall

Hello Elias,
I had already posted the topic some time ago in #328, but I don't think you had seen it.

Thank you for the fantastic work you're doing with advertools.

However, I have an issue with websites that have a cookie wall, like https://www.interflora.fr/p/roses-passion.

When I run scrapy shell on the URL and call view(response), I can clearly see that I am blocked: there is no element like the title, the buttons, or body_text.

So, I was wondering if you might have a fantastic idea to work around this issue.

Thanks a million !
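
One thing that can be tried, with no guarantee since consent walls are site-specific, is sending a browser-like user agent and the site's consent cookie through custom_settings. The cookie name and value below are hypothetical; copy the real consent cookie from your browser's developer tools after accepting the banner:

import advertools as adv

adv.crawl(
    "https://www.interflora.fr/p/roses-passion",
    "interflora.jl",
    follow_links=False,
    custom_settings={
        "USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
        "COOKIES_ENABLED": False,  # send the raw Cookie header below as-is
        "DEFAULT_REQUEST_HEADERS": {
            "Accept-Language": "fr-FR,fr;q=0.9",
            "Cookie": "consent=accepted",  # hypothetical cookie name/value
        },
    },
)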

Stop words list very limited

This concerns mainly the word_frequency function, where the list is the default value of the rm_words parameter.
It would be useful to get the full list of stop words for several languages, to be imported and potentially used in function calls, or separately, e.g.:

import advertools as adv
adv.stop_words['en'] 
adv.stop_words['fr']
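
For reference, newer advertools versions ship a stop word collection; assuming it is exposed as adv.stopwords keyed by full language names (check your installed version), usage could look like this:

import advertools as adv

print(sorted(adv.stopwords.keys()))        # available languages, e.g. 'english', 'french', ...
english_stops = adv.stopwords["english"]

# text_list is whatever list of documents you are analyzing.
word_freq = adv.word_frequency(text_list, rm_words=english_stops)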

Feature Request - Alternative Crawl Output

Is it possible to add functionality so we don't have to write to disk before being able to analyze the results?

Writing directly to a DataFrame or some other Python object would be great!
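
Until such a feature exists, a common workaround is to write to a temporary file and read it straight back into a DataFrame. A sketch:

import tempfile
from pathlib import Path

import advertools as adv
import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = str(Path(tmp_dir) / "crawl.jl")
    adv.crawl("https://example.com", out_file, follow_links=False)
    crawl_df = pd.read_json(out_file, lines=True)

print(crawl_df.shape)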

Need some way to rate limit requests for sitemap_to_df

When trying to retrieve a large recursive sitemap, I am getting HTTP error 429 (too many requests). Currently there seems to be no way to limit the number of requests it makes, specify a cooldown period, or throttle the request rate, so nothing is ever retrieved with that function.
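
Until rate limiting is supported, one workaround sketch is to fetch the sitemap index without recursing and then pull the sub-sitemaps one at a time with a pause between requests (assumes sitemap_to_df supports recursive=False and returns a loc column):

import time

import advertools as adv
import pandas as pd

index_df = adv.sitemap_to_df("https://example.com/sitemap_index.xml", recursive=False)

sub_dfs = []
for sub_sitemap in index_df["loc"]:
    sub_dfs.append(adv.sitemap_to_df(sub_sitemap, recursive=False))
    time.sleep(5)  # cooldown between requests to avoid 429 responses

all_urls = pd.concat(sub_dfs, ignore_index=True)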
