msnews.github.io's People
Forkers
lvjingshu rosarubu voladorlu doubleq2018 manalalshehri xlx0010 sumitsidana dusthui slother123 wumengdan lexieqiqi bluebalam cuixiaopi jimcurrywang scwindy jfzo ruppdj whxhx sparsh-ai peijiesun williamstar inacionery limengleidaye js-ts emmmmmboom ronniefu helenaazheng scientist1642 ablohui cylouiskoo mayurakshipaultr tenrec0 aashishkolluri xiaorancs
msnews.github.io's Issues
The order of impressions in the behaviours file
Hey~
The definition of impressions in the MIND paper is:
An example is shown below:
I am wondering whether this impressions sequence follows the order in which the user saw the news (e.g., user U131 saw N4 earlier than N34), or whether it has been reordered by click status (I noticed that clicked news always appear ahead of non-clicked news).
Many thanks.
Qin
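The observation that clicked news always come first can at least be checked across a whole file. Below is a minimal Python sketch; it assumes each impressions token has the form `<news_id>-<label>` (1 = clicked, 0 = not clicked), and the helper names `parse_impressions` and `clicked_before_non_clicked` are hypothetical. It only tests whether the labels are sorted; it cannot reveal the order in which the user actually saw the items, and it does not handle tokens that lack a label suffix.

```python
from typing import List, Tuple


def parse_impressions(field: str) -> List[Tuple[str, int]]:
    """Split an impressions field such as "N4-1 N34-0" into (news_id, label) pairs.

    Assumption: every token is "<news_id>-<label>", label 1 = clicked, 0 = not clicked.
    """
    pairs = []
    for token in field.split():
        news_id, label = token.rsplit("-", 1)  # split on the last hyphen only
        pairs.append((news_id, int(label)))
    return pairs


def clicked_before_non_clicked(pairs: List[Tuple[str, int]]) -> bool:
    """Return True if no clicked item appears after a non-clicked item."""
    seen_non_clicked = False
    for _, label in pairs:
        if label == 0:
            seen_non_clicked = True
        elif seen_non_clicked:
            return False  # a click follows a non-click, so the list is not click-sorted
    return True


if __name__ == "__main__":
    # Hypothetical impressions field, for illustration only.
    example = parse_impressions("N4-1 N34-1 N156-0 N207-0")
    print(example)
    print(clicked_before_non_clicked(example))
```

Running `clicked_before_non_clicked` over every row of a file and counting the `False` results would show whether the pattern holds everywhere or only in the rows inspected so far.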
Is the competition open? The Join button is not working.
Q: Is the click history ordered in chronological or reverse chronological order?
The dataset description reads:
History. The news click history (ID list of clicked news) of this user before this impression. The clicked news articles are ordered by time.
Do the oldest or the most recent news IDs appear first? For example, given the line:
N37780 N8541 N81937 N15638 N47169 N23596 N86567 N73301
is N37780 the oldest or the newest click?
Thanks!
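Whichever way the ordering turns out, a parser should keep the History tokens in file order so the answer can be applied afterwards. A minimal Python sketch, assuming `behaviors.tsv` rows have five tab-separated columns (Impression ID, User ID, Time, History, Impressions); the function name and the sample row are hypothetical.

```python
from typing import Dict, List, Union


def parse_behaviors_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Parse one tab-separated behaviors.tsv row.

    Assumption: five columns, Impression ID, User ID, Time, History, Impressions.
    History tokens are kept exactly in file order; whether index 0 is the
    oldest or the newest click is the open question in this issue.
    """
    impression_id, user_id, time, history, impressions = line.rstrip("\r\n").split("\t")
    return {
        "impression_id": impression_id,
        "user_id": user_id,
        "time": time,
        "history": history.split(),  # an empty History field becomes []
        "impressions": impressions.split(),
    }


if __name__ == "__main__":
    # Hypothetical row built from the IDs quoted above.
    row = parse_behaviors_line(
        "1\tU131\t11/11/2019 9:05:58 AM\tN37780 N8541 N81937\tN4-1 N34-0\n"
    )
    print(row["history"][0])  # the first ID as stored in the file
```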
How to access the complete test score
Thanks for your nice work.
I noticed that the CodaLab website only shows scores on 10% of the test set. However, the results table in the original MIND paper reports scores on the complete test set.
I would like to know how to access the scores on the complete test set.
Thank you very much.
Most news URLs are not available
For example, for the first sample in MINDsmall_train/news.tsv, the URL is https://assets.msn.com/labs/mind/AAGH0ET.html. Accessing this URL returns the following XML:
<Error>
<Code>ResourceNotFound</Code>
<Message>The specified resource does not exist. RequestId:dd08bc8b-101e-0038-13ca-4fb0f6000000 Time:2023-03-06T01:25:44.5985962Z</Message>
</Error>
Questions about the Data Set
As described in the paper, there are 2,186,683 samples in the training set, 365,200 samples in the validation set, and 2,341,619 samples in the test set.
But after I downloaded and analyzed the dataset, I noticed that there are 2,232,748 samples in the training set, 376,471 samples in the validation set, and 2,370,727 samples in the test set.
I'm wondering whether the dataset has changed since the paper was published.
Thank you so much.
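One way to reproduce such counts is to count the rows in each split's `behaviors.tsv`. A minimal Python sketch, assuming one sample corresponds to one non-empty row (one impression); the paper may define "sample" differently, which would be one possible source of a mismatch. The directory names and helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def count_nonempty_lines(lines: Iterable[str]) -> int:
    """Count lines that contain anything other than whitespace."""
    return sum(1 for line in lines if line.strip())


def count_impressions(path: Path) -> int:
    """Count rows in a behaviors.tsv file, assuming one impression per row."""
    with path.open(encoding="utf-8") as handle:
        return count_nonempty_lines(handle)


if __name__ == "__main__":
    # Hypothetical directory names; point these at wherever each split was unpacked.
    for split in ("MINDlarge_train", "MINDlarge_dev", "MINDlarge_test"):
        path = Path(split) / "behaviors.tsv"
        if path.exists():
            print(split, count_impressions(path))
```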
About the codalab competition website
Hi
I'm very interested in this news recommendation competition.
I cannot open the CodaLab competition website from a Chinese IP address, even when using a VPN proxy.
Does it have an IP blacklist?
How to get the triplets of KG
The dataset provides .vec files for the KG entities and relations, which are very useful. However, the triplets themselves matter more when working with a KG. Could you share the underlying (Wikidata) triplets for MIND?
Thank you very much.
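For reference, a `.vec` file can be loaded into an ID-to-vector map, which makes clear that it holds one vector per entity or relation rather than (head, relation, tail) triplets. A minimal Python sketch, assuming each line is an ID followed by whitespace-separated floats; the helper names and the file name in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_vec_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each ID (for example a Wikidata ID) to its vector.

    Assumption: each line is an ID followed by whitespace-separated floats.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()  # handles both tabs and spaces
        if not parts:
            continue  # skip blank lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def load_vec(path: str) -> Dict[str, List[float]]:
    """Read a .vec file from disk with parse_vec_lines."""
    with open(path, encoding="utf-8") as handle:
        return parse_vec_lines(handle)
```

Usage, with a hypothetical path: `vectors = load_vec("entity_embedding.vec")`. Nothing in the result links one ID to another, which is why the triplets would have to come from a separate source.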
behaviors.tsv just has 4 columns
I noticed that 'behaviors.tsv' has only four columns, while 'introduction.md' describes five. Could you add the missing "Impression ID" column? Thanks.
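A quick way to confirm the column count is to tally how many tab-separated fields each row has. A minimal Python sketch; the helper name and file path are hypothetical. Because `split("\t")` keeps empty fields, a row with an empty History field still reports all of its columns.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def column_count_histogram(lines: Iterable[str]) -> Counter:
    """Tally how many tab-separated columns each non-empty line has."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            counts[len(line.split("\t"))] += 1
    return counts


if __name__ == "__main__":
    path = Path("behaviors.tsv")  # hypothetical location
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            # A result like Counter({4: n}) would mean every row has four columns.
            print(column_count_histogram(handle))
```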
Entities in dataset
Hello!
If I were to create my own news dataset from today's news, how would I generate the Title and Abstract entities? The MIND paper mentions that they came from an internal NER and entity-linking tool, but gives no further explanation. I am interested in knowing how they were processed.
I have some ideas for generating all of the keys except Confidence and OccurrenceOffsets.
Is there a MIND utility function for generating entities, or any code I can refer to?
I would appreciate any help. Thanks!
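To see which keys a generator would need to produce, the existing entity fields can be decoded and inspected. A minimal Python sketch, assuming the Title Entities and Abstract Entities fields of `news.tsv` are JSON lists of objects; the helper names and the sample value are hypothetical, with placeholder values. It does not generate entities, since the internal NER and entity-linking tool is not available.

```python
import json
from typing import Any, Dict, List


def parse_entities(field: str) -> List[Dict[str, Any]]:
    """Decode an entities field; assumed to be a JSON list of objects (or empty)."""
    return json.loads(field) if field.strip() else []


def entity_keys(entities: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted union of keys used across the entity objects."""
    keys = set()
    for entity in entities:
        keys.update(entity.keys())
    return sorted(keys)


if __name__ == "__main__":
    # Hypothetical field value with placeholder data.
    sample = (
        '[{"Label": "Example Entity", "Type": "X", "WikidataId": "Q0", '
        '"Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["Example"]}]'
    )
    print(entity_keys(parse_entities(sample)))
```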
the order of history attribute in behaviours file
Hey, thanks for collecting this awesome news dataset.
I am wondering whether the news clicks follow ascending time order or are in random order.
For example, assume that user U131 has the historical news clicks N11 N21 N103. Was N11 clicked earlier than N21?
Thanks in advance.
Request to join as a contributor
This is Ying from MS News :)
Type of entity in dataset
Hi,
I wonder what the entity type in the dataset means.
The description only says "the type of entity in wikidata", but I couldn't figure out what that refers to.
In Wikidata, the type of an entity is either "item" or "property", but in this dataset, the type is a single character ranging from "A" to "Z".
Does anyone know what this means? Even a single relevant URL would be helpful :)
Thanks,
Timestamps for user history items?
Hello.
Is there a reason you only provide timestamps for impressions and not for each click in the user's history?
Since temporal information also plays an important role in recommendation (as much recent research has shown), not providing historical timestamps seems like a huge loss in terms of dataset quality.
The MIND dataset is one of the few reliable public datasets in news recommendation research. It would be much appreciated if you could boost research in this area by providing those extra features.
Thanks for the otherwise wonderful dataset.
Question on the dataset format
Hi,
The dataset description says that 'userClickHistory' contains all the news that the user has clicked.
So why is it not updated sequentially? All 'userClickHistory' entries for the same user (if the user has more than one row) are identical.
Thanks
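Whether a user's history ever changes between rows can be checked directly. A minimal Python sketch, assuming the five-column `behaviors.tsv` layout (Impression ID, User ID, Time, History, Impressions) and that 'userClickHistory' here refers to the History column; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def histories_per_user(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the distinct History strings seen for each user.

    Assumption: columns are Impression ID, User ID, Time, History, Impressions.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        columns = line.split("\t")
        seen[columns[1]].add(columns[3])  # user ID -> History string
    return dict(seen)


def users_with_changing_history(lines: Iterable[str]) -> Set[str]:
    """Return users whose History differs between at least two of their rows."""
    return {user for user, histories in histories_per_user(lines).items() if len(histories) > 1}
```

An empty result over a whole split would confirm the observation that each user's history is a single fixed snapshot.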