
msnews.github.io's People

Contributors

msnews, wuch15, yingqiao, yjw1029


msnews.github.io's Issues

The order of impressions in the behaviors file

Hey~
The definition of impressions in the MIND paper is:

[image]

An example is shown below:

[image]

I am wondering whether this impression sequence follows the order in which the user saw the news (e.g., user U131 saw N4 earlier than N34), or whether it has been re-ordered by click status (I noticed that clicked news always comes ahead of non-clicked news)?

Many Thanks.
Qin
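
Whether clicked news always comes ahead of non-clicked news can be checked on a local copy of the data. The following is a minimal sketch, assuming each impression entry has the form `NewsID-label`, with label `1` for clicked and `0` for not clicked. The function names and the sample string are hypothetical and not part of any MIND tooling.

```python
def parse_impressions(field):
    """Split an impression field such as 'N4-1 N34-0' into (news_id, clicked) pairs.

    Assumes the 'NewsID-label' format, where label '1' means clicked.
    """
    pairs = []
    for token in field.split():
        news_id, label = token.rsplit("-", 1)
        pairs.append((news_id, label == "1"))
    return pairs


def clicked_first(pairs):
    """Return True if no clicked item appears after a non-clicked item."""
    seen_non_clicked = False
    for _, clicked in pairs:
        if not clicked:
            seen_non_clicked = True
        elif seen_non_clicked:
            return False
    return True


# Hypothetical impression field, for illustration only.
sample = "N4-1 N34-1 N156-0 N207-0"
pairs = parse_impressions(sample)
print(clicked_first(pairs))
```

If `clicked_first` returns True for every row of a file, the impressions in that file are at least grouped by click status. That alone does not show whether the order within each group reflects display order.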

Q: is the click history ordered in chronological or inverse chronological order?

In the description of the dataset it reads:

History. The news click history (ID list of clicked news) of this user before this impression. The clicked news articles are ordered by time.

Do the oldest or the most recent news IDs appear first? For example, given the line:

N37780 N8541 N81937 N15638 N47169 N23596 N86567 N73301

is N37780 the oldest or the newest click?

Thanks!

How to access the complete test score

Thanks for your nice work.

I noticed that the CodaLab website only shows the scores on the 10% test set. However, the result table in the original MIND paper shows the scores on the complete test set.
I want to know how to access the complete test score.

Thank you very much.

Questions about the Data Set

As described in the paper, there are 2,186,683 samples in the training set, 365,200 samples in the validation set, and 2,341,619 samples in the test set.
But after I downloaded the dataset and analyzed it, I found 2,232,748 samples in the training set, 376,471 samples in the validation set, and 2,370,727 samples in the test set.

I'm wondering whether there were some changes in the dataset?
Thank you so much.

About the codalab competition website

Hi
I'm very interested in this news recommendation competition.
I cannot open the CodaLab competition website from a Chinese IP address, even when using a VPN proxy.
Does it have an IP blacklist?

How to get the triplets of KG

The dataset provides the vec files of KG entities and relations, which are very useful. However, the triplets are more important when using a KG. Could you share the basic triplets (Wikidata) of MIND?

Thank you very much.

behaviors.tsv just has 4 columns

I noticed the 'behaviors.tsv' just has four columns while it was described as five columns in 'introduction.md'. Could you fix the missing "Impression ID" column? Thanks.
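
To see how many rows of a local file are affected, a quick check can count the number of tab-separated fields per row. This is a minimal sketch, assuming the file is plain tab-separated text without quoting. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def column_count_histogram(lines):
    """Count how many rows have each number of tab-separated columns."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Blank lines parse as empty rows; skip them.
    return Counter(len(row) for row in reader if row)


# Hypothetical rows: the first has five columns, the second lacks the leading ID.
sample = io.StringIO(
    "1\tU131\t11/13/2019 8:36:57 AM\tN11 N21 N103\tN4-1 N34-1 N156-0\n"
    "U131\t11/13/2019 9:02:10 AM\tN11 N21 N103\tN50-0 N61-1\n"
)
histogram = column_count_histogram(sample)
print(dict(histogram))
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("behaviors.tsv", encoding="utf-8", newline="")`.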

Entities in dataset

Hello!

If I wanted to create my own news dataset from today's news, how would I generate the Title and Abstract entities? The MIND paper mentions that they came from an internal NER and entity linking tool, but gives no further explanation. I am interested in knowing how they were processed.

I have some ideas for getting the keys except Confidence and OccurrenceOffsets.

Is there a MIND util function for generating entities? Or is there any code I can refer to to generate entities?

I would appreciate any help. Thanks!

The order of the history attribute in the behaviors file

Hey, thanks for collecting this awesome news dataset.

I am wondering whether the news clicks follow ascending time order, or whether they are random sequences.

For example, assume that user U131 has the historic news clicks N11 N21 N103. Was N11 clicked earlier than N21?

Thanks in advance.

Type of entity in dataset

Hi,

I wonder what the type of an entity in the dataset means.

The description just says "the type of entity in wikidata", but I couldn't figure it out.

In Wikidata, the type of an entity is either "item" or "property", but in this dataset, the type of an entity is a single character ranging from "A" to "Z".

Does anyone understand what this means? Even a single related URL would be helpful :)

Thanks,

Timestamps for user history items?

Hello.

Is there a reason you only provide timestamps for impressions and not for each click in the user's history?

Since temporal information also plays an important role in recommendation (as much recent research has shown), not providing historical timestamps seems like a huge loss in terms of dataset quality.

The MIND dataset is one of the few reliable public datasets in news recommendation research. It would be much appreciated if you could boost research in this area by providing those extra features.

Thanks for the otherwise wonderful dataset.

Question on the dataset format

Hi,
The dataset description says that 'userClickHistory' is all the news that the user has clicked.
So why is it not updated sequentially?
All the 'userClickHistory' values for the same user (if the user has more than one row) are the same.
Thanks
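
This observation can be verified on a local copy by grouping rows by user and comparing their history fields. The following is a minimal sketch, assuming the rows have already been reduced to `(user_id, history)` pairs. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def users_with_varying_history(rows):
    """Return the sorted IDs of users whose history field differs across rows.

    rows: iterable of (user_id, history) pairs, with history as a raw string.
    """
    histories = defaultdict(set)
    for user_id, history in rows:
        histories[user_id].add(history)
    return sorted(user for user, seen in histories.items() if len(seen) > 1)


# Hypothetical rows: U131 keeps the same history, U200 does not.
rows = [
    ("U131", "N11 N21 N103"),
    ("U131", "N11 N21 N103"),
    ("U200", "N5 N6"),
    ("U200", "N5 N6 N7"),
]
print(users_with_varying_history(rows))
```

An empty result for a whole file would confirm that the history is not updated between a user's impressions in that file.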
