ciallocorpus's Introduction

CialloCorpus

Ciallo~(∠・ω< )⌒☆

People's Daily (人民日报)

Download: https://huggingface.co/datasets/Papersnake/people_daily_news

Data range

1946-2023

Data source

*** Series of Important Speeches Database

Download: https://huggingface.co/datasets/Papersnake/xi_talk

Data range

Data collected through 2023.4.23

Data source

ciallocorpus's People

Contributors

prnake

ciallocorpus's Issues

Error when downloading from Hugging Face

Code used for the download:

from datasets import load_dataset

dataset_name = "Papersnake/people_daily_news"
dataset = load_dataset(dataset_name, cache_dir=r'xxx/')  # cache path elided in the original report

Error message:

An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 2 missing columns ({'author', 'page'})

This happened while the json dataset builder was generating data using

..\downloads\d434406d0e80132d996bc6796817699b81390d86744e10acda0ec2ea71fead71

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)
Traceback (most recent call last):
  File "_pydevd_bundle/pydevd_cython.pyx", line 546, in _pydevd_bundle.pydevd_cython.PyDBFrame._handle_exception
  File "C:\Program Files\Python39\lib\linecache.py", line 26, in getline
    def getline(filename, lineno, module_globals=None):
  File "C:\Program Files\Python39\lib\linecache.py", line 36, in getlines
    def getlines(filename, module_globals=None):
  File "C:\Program Files\Python39\lib\linecache.py", line 80, in updatecache
    def updatecache(filename, module_globals=None):
  File "C:\Program Files\Python39\lib\codecs.py", line 319, in decode
    def decode(self, input, final=False):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 41: invalid start byte
0.03s - Error on build_exception_info_response.
Traceback (most recent call last):
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 1404, in build_exception_info_response
    def build_exception_info_response(dbg, thread_id, request_seq, set_additional_thread_info, iter_visible_frames_info, max_frames):
  File "C:\Program Files\Python39\lib\linecache.py", line 26, in getline
    def getline(filename, lineno, module_globals=None):
  File "C:\Program Files\Python39\lib\linecache.py", line 36, in getlines
    def getlines(filename, module_globals=None):
  File "C:\Program Files\Python39\lib\linecache.py", line 80, in updatecache
    def updatecache(filename, module_globals=None):
  File "C:\Program Files\Python39\lib\codecs.py", line 319, in decode
    def decode(self, input, final=False):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 41: invalid start byte

I opened the corresponding file; its contents are:
{"url": "hf://datasets/Papersnake/people_daily_news@e61323bc7692312d907fc2d154b4ffc4290ce496/2004.jsonl.gz", "etag": null}
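The builder error says that some of the data files lack the `author` and `page` columns that other files have. One possible workaround, sketched here under assumptions, is to download the yearly `.jsonl.gz` files (named like the `2004.jsonl.gz` in the URL above) and rewrite them so every record carries the union of all columns, filling missing values with `None`. The helper names `read_jsonl_gz`, `union_columns` and `normalize_files` are hypothetical and are not part of the dataset or of the `datasets` library. A simpler alternative is to load one file at a time through the `data_files` argument of `load_dataset`, which avoids mixing files whose columns differ.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable


def read_jsonl_gz(path: Path) -> list[dict]:
    """Read every JSON object from a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def union_columns(paths: Iterable[Path]) -> list[str]:
    """Collect the sorted union of keys across all records in all files."""
    columns: set[str] = set()
    for path in paths:
        for record in read_jsonl_gz(path):
            columns.update(record)  # adds this record's keys
    return sorted(columns)


def normalize_files(paths: list[Path], out_dir: Path) -> list[str]:
    """Hypothetical helper: rewrite each file so all records share one column set.

    Keys missing from a record (e.g. 'author' and 'page') are written as null.
    Each file is read twice (once for the union, once for rewriting), which
    keeps the sketch simple at the cost of extra I/O.
    """
    columns = union_columns(paths)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in paths:
        records = read_jsonl_gz(path)
        with gzip.open(out_dir / path.name, "wt", encoding="utf-8") as f:
            for record in records:
                row = {col: record.get(col) for col in columns}
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return columns
```

Assuming the rewritten copies sit in a local directory `out/`, they could then be loaded with `load_dataset("json", data_files=sorted(str(p) for p in Path("out").glob("*.jsonl.gz")))`. If type inference still disagrees across files (for example, a column that is entirely null in one year), passing an explicit schema through the `features` argument may also be needed.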