
somjang / instagram_crawler


Instagram crawler (Python, Selenium)

License: MIT License

Jupyter Notebook 51.35% Python 48.65%
selenium instagram crawler python somjang chromedriver

instagram_crawler's Introduction

Instagram_Crawler

Extract Data From Instagram Using Selenium/Python.


Description

Instagram Crawler is a Python module for crawling Instagram data.

⚠️ Instagram stops loading posts once you have accessed more than a certain number of them, so only about 100 to 300 posts can be crawled.

Install

Simply run :

pip install -r requirements.txt

You can also install additional dependencies (for running examples, generating documentation, etc.) with :

⚠️ Python ≥ 3.6 required

Get Started

The full documentation contains more detailed tutorials, but to get a taste of the crawler, you can run main.py with the following command :

$ python3 main.py --id=[user_id] \
  --password=['user_password'] \
  --hash_tag=[hash_tag] \
  --display=[0 or 1] \
  --extract_num=[extract_num: int] \
  --login_option=[instagram or facebook] \
  --extract_file=[file name] \
  --extract_tag_file=[tag file name] \
  --driver_path=[chromedriver path]

The command runs main.py, shown below :

# -*- coding:utf-8 -*-

import argparse
from instagram_crawler.metadata import EXTRACT_NUM, LOGIN_OPTION, SAVE_FILE_NAME, SAVE_FILE_NAME_TAG
from instagram_crawler.extract_data import crawling_instagram


parser = argparse.ArgumentParser(description='Crawling Instagram Post - Comment',
                                 formatter_class=argparse.RawTextHelpFormatter)


def get_arguments():
    parser.add_argument("--driver_path", 
                        help="selenium chrome driver path", 
                        required=True, type=str)

    parser.add_argument("--id", 
                        help="instagram or facebook id", 
                        required=True, type=str)

    parser.add_argument("--password", 
                        help="instagram or facebook password", 
                        required=True, type=str)

    parser.add_argument("--hash_tag", 
                        help="The hashtag you want to extract.", 
                        required=True, type=str)

    parser.add_argument("--display",
                        help="display selenium chromedriver or not 0 or 1",
                        required=True, type=int)


    parser.add_argument("--extract_num", 
                        help="The number of posts I want to extract.", 
                        default=EXTRACT_NUM, type=int)

    parser.add_argument("--login_option", 
                        help="select login account [facebook, instagram]", 
                        default=LOGIN_OPTION, type=str)

    parser.add_argument("--extract_file",
                        help="set extract file name", 
                        default=SAVE_FILE_NAME, type=str)

    parser.add_argument("--extract_tag_file",
                        help="set extract tag file name", 
                        default=SAVE_FILE_NAME_TAG, type=str)

    _args = parser.parse_args()

    return _args


def instagram_main():
    args = get_arguments()
    is_file_save, is_tag_file_save = crawling_instagram(args=args)

    if is_file_save:
        print("file save success - {}".format(args.extract_file))

    if is_tag_file_save:
        print("file save success - {}".format(args.extract_tag_file))


if __name__ == "__main__":
    instagram_main()
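
To check how these flags are parsed without launching a browser, the sketch below rebuilds a reduced version of the parser and feeds it a sample argument list. It is an illustration only: the helper name build_parser, the placeholder defaults, the sample values, and the added choices restrictions are assumptions, and the real defaults live in instagram_crawler.metadata, which is not shown here.

```python
import argparse

# Hypothetical placeholder defaults; the real values are defined in
# instagram_crawler.metadata and are not shown in this README.
EXTRACT_NUM = 100
LOGIN_OPTION = "instagram"


def build_parser():
    """Build a reduced parser mirroring a subset of the main.py flags."""
    parser = argparse.ArgumentParser(description="Crawling Instagram Post - Comment")
    parser.add_argument("--driver_path", required=True, type=str)
    parser.add_argument("--id", required=True, type=str)
    parser.add_argument("--password", required=True, type=str)
    parser.add_argument("--hash_tag", required=True, type=str)
    # choices is an addition of this sketch, not part of the original parser.
    parser.add_argument("--display", required=True, type=int, choices=[0, 1])
    parser.add_argument("--extract_num", default=EXTRACT_NUM, type=int)
    parser.add_argument("--login_option", default=LOGIN_OPTION, type=str,
                        choices=["instagram", "facebook"])
    return parser


# Made-up sample values, passed explicitly instead of reading sys.argv.
sample_argv = [
    "--driver_path=./chromedriver",
    "--id=my_user",
    "--password=my_password",
    "--hash_tag=python",
    "--display=0",
    "--extract_num=50",
]
args = build_parser().parse_args(sample_argv)
print(args.hash_tag, args.display, args.extract_num, args.login_option)
```

Passing the list to parse_args keeps the check independent of the command line, so a typo in a flag name or a non-integer --extract_num shows up before any crawling starts.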

Stack

Library used to create the result CSV file.

Selenium : library used to extract Instagram data in the Chrome browser.

Contribute

To contribute, simply clone the repository, add your code in a new branch, and open a pull request!

instagram_crawler's People

Contributors

somjang, somjang-42maru


instagram_crawler's Issues

UnboundLocalError: local variable 'is_save_file_success' referenced before assignment

Bug discovered through a blog question on March 25, 2022.

Traceback (most recent call last):
  File "/Users/jaehalee/Downloads/Instagram_Crawler/Instagram_Crawler/main.py", line 67, in <module>
    instagram_main()
  File "/Users/jaehalee/Downloads/Instagram_Crawler/Instagram_Crawler/main.py", line 57, in instagram_main
    is_file_save, is_tag_file_save = crawling_instagram(args=args)
  File "/Users/jaehalee/Downloads/Instagram_Crawler/Instagram_Crawler/instagram_crawler/extract_data.py", line 190, in crawling_instagram
    return is_save_file_success, is_save_tag_file_success
UnboundLocalError: local variable 'is_save_file_success' referenced before assignment
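
The traceback suggests that the flags returned by crawling_instagram are assigned only on some code paths. The body of extract_data.py is not shown here, so the following is only a minimal sketch of that failure pattern and one common remedy. The functions crawl_buggy and crawl_fixed are hypothetical stand-ins, not the project's code.

```python
def crawl_buggy(posts):
    """Flags are assigned only inside the branch, so an empty input fails."""
    if posts:
        is_save_file_success = True
        is_save_tag_file_success = True
    return is_save_file_success, is_save_tag_file_success


def crawl_fixed(posts):
    """Flags get a default before any branch, so every path can return."""
    is_save_file_success = False
    is_save_tag_file_success = False
    if posts:
        is_save_file_success = True
        is_save_tag_file_success = True
    return is_save_file_success, is_save_tag_file_success


# With no posts, the buggy version never binds the flags before returning.
try:
    crawl_buggy([])
except UnboundLocalError as error:
    print("buggy:", type(error).__name__)

print("fixed:", crawl_fixed([]))
```

Initializing both flags to False at the top of the function means a run that collects nothing reports a failed save instead of crashing.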
