labadier / evalita_tag_it

Predicting age, gender and topic from Italian texts

Python 90.09% Jupyter Notebook 9.91%
nlp deep-learning neural-network language-processing author-profiling


EVALITA_TAG_it

This project consists of a neural network model built to participate in the TAG-it Author Profiling task at EVALITA 2020. The task is to predict the age and gender of blog users from their posts, as well as the topic they wrote about. The model combines representations learned by RNNs at the word and sentence levels, a Transformer network (specifically the BERT architecture), and hand-crafted stylistic features. All these representations are combined and fed into the fully connected layers of a feed-forward neural network, which makes the predictions for each subtask.
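The fusion step described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the stylistic features chosen in the hypothetical `stylistic_features` function, the vector sizes, the random weights and the single hidden layer are all assumptions, and the RNN and BERT vectors are random stand-ins rather than real encoder outputs. It only shows how separately computed representations can be concatenated and passed through a fully connected layer to a softmax output for one subtask (here the five age classes).

```python
import math
import random
import string


def stylistic_features(text):
    """Hypothetical hand-crafted features: mean word length, punctuation ratio, type-token ratio."""
    words = text.split()
    if not words:
        return [0.0, 0.0, 0.0]
    mean_len = sum(len(w) for w in words) / len(words)
    punct = sum(1 for c in text if c in string.punctuation) / len(text)
    ttr = len({w.lower() for w in words}) / len(words)
    return [mean_len, punct, ttr]


def dense(x, weights, bias, activation=None):
    """Fully connected layer: `weights` holds one row per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Untrained weights, only to make the sketch executable."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
text = "Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?"

# Stand-ins for the learned representations (assumed sizes, random values).
rnn_vector = [rng.uniform(-1, 1) for _ in range(8)]    # word/sentence-level RNN output
bert_vector = [rng.uniform(-1, 1) for _ in range(16)]  # BERT text embedding
style_vector = stylistic_features(text)

# Concatenate all representations into a single input vector.
fused = rnn_vector + bert_vector + style_vector

hidden_w, hidden_b = random_layer(len(fused), 12, rng)
out_w, out_b = random_layer(12, 5, rng)  # 5 age classes

hidden = dense(fused, hidden_w, hidden_b, activation="relu")
age_probs = softmax(dense(hidden, out_w, out_b))
print(len(fused), [round(p, 3) for p in age_probs])
```

In a trained model the weights would be learned jointly, and each subtask (age, gender, topic) would have its own output layer sized to its number of classes.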

A description of the models is available here.

The code requires:

  • Python 3.8
  • TensorFlow 2.0
  • Keras 2.4.3
  • Freeling 4.1 and its Python API
  • Italian word embeddings, available here
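Before training, it can help to confirm which versions are actually installed. The sketch below uses the standard-library `importlib.metadata` (available from Python 3.8) to compare installed packages against the versions listed above; `check_versions` is a hypothetical helper, not part of the repository. Freeling and its Python API are built from source, so they are assumed not to be visible to this check and are left out.

```python
import sys
from importlib import metadata

# Versions listed in the requirements above (Freeling is not pip-installed, so it is skipped).
REQUIRED = {"tensorflow": "2.0", "keras": "2.4.3"}


def check_versions(required=REQUIRED):
    """Return {package: (required_version, installed_version or None)}."""
    report = {}
    for package, wanted in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(3.8 expected)")
    for package, (wanted, installed) in check_versions().items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```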

Steps for using the model

Training the ensemble models

The code of the models for each task is located in the Ensemble folder, which also contains a train.py script that, when run, saves the weights learned from the provided training data. So the first step to use this classifier is to run, on the command line:

 python ./Ensemble/train.py

The training files are located in the data folder; they are the ones provided by the contest organizers. To use a different training file, change the source variable in train.py:

source = "./data/training.txt"

Making Predictions

To make predictions, run:

 python main.py

Provide the test files with the -dp option. The test_data folder contains the test data provided by the organizers.
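The README does not show how main.py reads -dp, so the following is only a hypothetical sketch of how such an option is commonly declared with the standard `argparse` module. Only the `-dp` flag comes from the README; `build_parser`, the `data_path` destination name, the help text and the placeholder path are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical: only the "-dp" flag itself is documented in the README.
    parser = argparse.ArgumentParser(
        description="Predict age, gender and topic for TAG-it test data"
    )
    parser.add_argument(
        "-dp",
        dest="data_path",
        required=True,
        help="path to a test file, e.g. one inside the test_data folder",
    )
    return parser


# Parsing an explicit argument list with a placeholder path, instead of sys.argv.
args = build_parser().parse_args(["-dp", "path/to/test_file"])
print("Reading test data from", args.data_path)
```

Under this assumption, the command line would look like `python main.py -dp <path to test file>`, with the placeholder replaced by one of the files in test_data.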

Data Format

The datasets consist of texts written by multiple users, with possibly multiple posts per user. The data is distributed as one XML-like file per genre, with one sample per element and attributes specifying an id, the topic, the gender (male or female), and the age range (0-19, 20-29, 30-39, 40-49 or 50-100). This is a sample:

<doc id="3046" topic="orologi" age="30-39" gender="male" >
 <post>
   Per quale motivo oggi, il mondo dell'orologeria è così importante per voi? 
 </post>
 <post>
   Cosa vi ha spinto a rendervi appassionati così bramosi?
 </post>
</doc>
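Because the files are described as XML-like rather than guaranteed well-formed XML, a regular-expression reader is a tolerant way to load them. The sketch below is one possible reader for the format shown above, assuming double-quoted attributes and `<post>` elements nested directly inside each `<doc>`; `parse_docs` and `AGE_CLASSES` are hypothetical names, not part of the repository.

```python
import re

# The five age ranges of the task, in label order.
AGE_CLASSES = ["0-19", "20-29", "30-39", "40-49", "50-100"]

DOC_RE = re.compile(r"<doc\s+([^>]*)>(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def parse_docs(raw):
    """Yield one dict per <doc>: its attributes plus the list of stripped posts."""
    for attrs, body in DOC_RE.findall(raw):
        doc = dict(ATTR_RE.findall(attrs))
        doc["posts"] = [p.strip() for p in POST_RE.findall(body)]
        if "age" in doc:  # test files may omit the labels
            doc["age_label"] = AGE_CLASSES.index(doc["age"])
        yield doc


sample = '''<doc id="3046" topic="orologi" age="30-39" gender="male" >
 <post>
   Per quale motivo oggi, il mondo dell'orologeria è così importante per voi?
 </post>
 <post>
   Cosa vi ha spinto a rendervi appassionati così bramosi?
 </post>
</doc>'''

for doc in parse_docs(sample):
    print(doc["id"], doc["topic"], doc["gender"], doc["age_label"], len(doc["posts"]))
```

Each post is kept separately, so a model can either work per post or join a user's posts into one text.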
