pytok's Introduction

pytok

This is a Playwright-based version of David Teather's unofficial API wrapper for TikTok.com in Python. It re-implements a currently limited set of the original library's features, with the focus shifted to browser automation so that captchas can be solved automatically, at what is hopefully a minor cost in performance.

Installation

pip install git+https://github.com/networkdynamics/pytok.git@master

Quick Start Guide

Here's a quick bit of code to get the videos from a particular user on TikTok. There are more examples in the examples directory.

import asyncio

from pytok.tiktok import PyTok

async def main():
    async with PyTok() as api:
        user = api.user(username="therock")
        user_data = await user.info()
        print(user_data)

        # Iterate over the user's videos and fetch the full data for each one
        async for video in user.videos():
            video_data = await video.info()
            print(video_data)

if __name__ == "__main__":
    asyncio.run(main())

Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants, either to speed up the process or to avoid TikTok rate limiting, like so: PyTok(request_delay=10)

Please do not hesitate to open an issue in this repo if you need our help with this!

Citation

If you use this library in your research, please cite it using the following BibTeX entry:

@software{ben_steel_2024_12802714,
  author       = {Ben Steel and
                  Alexei Abrahams},
  title        = {{networkdynamics/pytok: Initial working version of 
                   library}},
  month        = jul,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.12802714},
  url          = {https://doi.org/10.5281/zenodo.12802714}
}

Format and Schema

The JSONable dictionary returned by the info() methods contains all of the data that the TikTok API returns. We have provided helper functions to parse that data into Pandas DataFrames, utils.get_comment_df(), utils.get_video_df() and utils.get_user_df() for the data from comments, videos, and users respectively.

The video dataframe will contain the following columns:

| Field name | Description |
| --- | --- |
| video_id | Unique video ID |
| createtime | UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format |
| author_name | Unique author name |
| author_id | Unique author ID |
| desc | The full video description from the author |
| hashtags | A list of hashtags used in the video description |
| share_video_id | If the video is sharing another video, this is the video ID of that original video, else empty |
| share_video_user_id | If the video is sharing another video, this is the user ID of the author of that video, else empty |
| share_video_user_name | If the video is sharing another video, this is the user name of the author of that video, else empty |
| share_type | If the video is sharing another video, this is the type of the share (stitch, duet, etc.) |
| mentions | A list of users mentioned in the video description, if any |
| digg_count | The number of likes on the video |
| share_count | The number of times the video was shared |
| comment_count | The number of comments on the video |
| play_count | The number of times the video was played |
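
To make the hashtags and mentions columns more concrete, here is a minimal sketch of how such lists could be pulled out of a description string. It is an illustration only: the function name extract_tags and both regular expressions are assumptions, not part of pytok, and the library may well read these values from structured fields in the TikTok response instead.

```python
import re

# Hypothetical patterns for illustration; not taken from pytok.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@([\w.]+)")


def extract_tags(desc: str) -> dict:
    """Return the hashtags and mentions found in a video description."""
    return {
        "hashtags": HASHTAG_RE.findall(desc),
        "mentions": MENTION_RE.findall(desc),
    }


example = extract_tags("Leg day with @therock #gym #motivation")
print(example)
```

Descriptions with no hashtags or mentions simply yield empty lists, which matches the "if any" wording in the table.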

The comment dataframe will contain the following columns:

| Field name | Description |
| --- | --- |
| comment_id | Unique comment ID |
| createtime | UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format |
| author_name | Unique author name |
| author_id | Unique author ID |
| text | Text of the comment |
| mentions | A list of users that are tagged in the comment |
| video_id | The ID of the video the comment is on |
| comment_language | The language of the comment, as predicted by the TikTok API |
| digg_count | The number of likes the comment got |
| reply_comment_id | If the comment is replying to another comment, this is the ID of that comment |

The user dataframe will contain the following columns:

| Field name | Description |
| --- | --- |
| id | Unique author ID |
| unique_id | Unique user name |
| nickname | Display user name, changeable |
| signature | Short user description |
| verified | Whether or not the user is verified |
| num_following | How many other accounts the user is following |
| num_followers | How many followers the user has |
| num_videos | How many videos the user has made |
| num_likes | How many total likes the user has had |
| createtime | When the user account was made. This is derived from the id field, and can occasionally be incorrect with a very low Unix epoch such as 1971 |
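
The note on createtime can be illustrated with a short sketch. It assumes, as is commonly reported for TikTok IDs, that the bits above the low 32 hold a Unix timestamp in seconds; the README does not spell out the exact derivation, so treat both this layout and the helper name id_to_createtime as assumptions rather than the library's implementation.

```python
from datetime import datetime, timezone


def id_to_createtime(tiktok_id: int) -> datetime:
    """Decode a creation time from an ID.

    Assumption: the bits above the low 32 are a Unix timestamp in seconds.
    """
    return datetime.fromtimestamp(int(tiktok_id) >> 32, tz=timezone.utc)


# Synthetic ID built for illustration: timestamp 1700000000 in the high bits.
synthetic_id = (1_700_000_000 << 32) | 12345
print(id_to_createtime(synthetic_id))  # 2023-11-14 22:13:20+00:00

# An ID that does not follow this layout can leave only a small number in
# the high bits, which decodes to a date shortly after the 1970 epoch.
odd_id = 40_000_000 << 32
print(id_to_createtime(odd_id).year)  # 1971
```

Under this assumption, an implausibly early createtime is a sign that the ID does not encode a timestamp in the expected way, so such rows are best treated as missing values.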

pytok's People

Contributors

alexeiabrahams, bendavidsteel

pytok's Issues

Captcha failure

Is the captcha solver failing?

I get the following message:
{'code': 500, 'data': {'msg': 'VerifyErr'}, 'message': 'Unable to verify. Please try again.', 'msg_code': '500', 'msg_sub_code': 'verify_err'}

Errors in comment.py and user_example

I am getting the following error while running comment example.py

AssertionError: Locator expected to be visible
Actual value: None
Call log:
LocatorAssertions.to_be_visible with timeout 30000ms
waiting for locator("[data-e2e=comment-level-1]").first.or_(locator("Rotate the shapes").or_(get_by_text("Verify to continue:", exact=True)).or_(get_by_text("Click on the shapes with the same size", exact=True)).or_(get_by_text("Drag the slider to fit the puzzle", exact=True))).or_(get_by_text("Be the first to comment!", exact=True))

User example.py is giving this error

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 5 (char 6)

Authentication issue

Apparently TikTok changed something and you now need to log in before scraping data. Is there any workaround?

TikTok shows a login form instead of the captcha slider

My code works normally on my local machine, but it returns this error when I run it on an Ubuntu server.
I took a screenshot while it was scraping comments, and TikTok just shows a login form with no captcha slider.
How can I fix that? Please help.

Actual value: None 
Call log:
LocatorAssertions.to_be_visible with timeout 30000ms
  - waiting for locator("[data-e2e=comment-level-1]").first.or_(locator("Rotate the shapes").or_(get_by_text("Verify to continue:", exact=True)).or_(get_by_text("Click on the shapes with the same size", exact=True))).or_(get_by_text("Be the first to comment!", exact=True))
)

How do I run this? Please make a step-by-step video for beginners

I want to scrape posts for the keyword "border" and get all the post info for each one. Unlimited scraping would be fine if it is possible. If it is not, is there a way to set a start and end date and scrape only the posts between them, for example between Feb 20, 2024 and Feb 27, 2024? Please let me know. Also, is there any way to contact you?

Encountered an error while running the sample code.

Thank you for providing the PyTok library.

However, while using the provided sample code, I encountered errors such as:
TypeError: 'PyTok' object does not support the context manager protocol
NotImplementedError (the rest of the notebook output was truncated)

I have checked my environment and dependencies, but I am still unable to resolve this issue. Is there any additional configuration required, or is the PyTok library currently under maintenance?

Your assistance and guidance would be greatly appreciated.

videos(count=x) not working

I'm trying to only get 20 videos from a user so I changed the example like so:

import asyncio
import json

from pytok.tiktok import PyTok

async def main():
    async with PyTok() as api:
        user = api.user(username="therock")
        user_data = await user.info()

        videos = []
        videos_bytes = []
        async for video in user.videos(count=20):
            video_data = await video.info()
            videos.append(video_data)

        assert len(videos) > 0, "No videos found"
        with open("rock.json", "w") as f:
            json.dump(videos, f)

if __name__ == "__main__":
    asyncio.run(main())

It prints "Failed to get videos all at once, trying in batches..." and returns 32 videos for this user and 35 for a different one I tried.
Is async the problem?
Edit: I couldn't get the code markdown to work, but I hope it's readable enough. The only thing I changed was adding count=20 to the videos() call.
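
One possible workaround, independent of how count is handled inside the library, is to stop iterating on the caller's side once enough items have been collected. The sketch below is not pytok code: take is a hypothetical helper, and fake_videos is a stand-in generator that mimics an iterator yielding more items than requested.

```python
import asyncio


async def take(aiter, limit):
    """Collect at most `limit` items from an async iterator, then stop."""
    items = []
    if limit <= 0:
        return items
    async for item in aiter:
        items.append(item)
        if len(items) >= limit:
            break
    return items


async def fake_videos():
    """Stand-in for user.videos(): yields more items than were asked for."""
    for i in range(32):
        yield {"id": i}


async def main():
    return await take(fake_videos(), 20)


videos = asyncio.run(main())
print(len(videos))  # 20
```

With the library itself, the same idea would presumably mean breaking out of the async for loop over user.videos() after 20 iterations, though videos fetched in a batch may still be downloaded before the loop stops.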
