euagendas / m3inference

A deep learning system for demographic inference (gender, age, and individual/person) that was trained on a massive Twitter dataset using profile images, screen names, names, and biographies.

Home Page: http://www.euagendas.org

License: GNU Affero General Public License v3.0

Python 100.00%

m3inference's Issues

Incompatibility with Torch 1.7.0

Hi there,

On a new installation, pip attempts to pull the latest version of every dependency. Since Torch released version 1.7.0 of their package, m3inference seems to misbehave.
I get the following error when attempting to import the preprocessing package.

AttributeError: module 'torch.utils.data' has no attribute "'BatchSamplerDistributedSamplerDataset'"

The whole issue is resolved by downgrading Torch to version 1.6.0.

I hope this helps!
Keep up with the amazing work!

PS: For context, I tried this on two operating systems with two different Python versions and got the same result.
Test 1: Linux 5.4.0-1031-azure; Python 3.6.7-1
Test 2: Linux 5.4.39-linuxkit; Python 3.8.2-0
Both are running on x86_64 architecture
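
A small guard can make this mismatch visible before the import fails. The sketch below is only an illustration and not part of m3inference: `parse_version` and `is_reported_bad_torch` are hypothetical helpers, and the check flags only the 1.7 series reported in this issue.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric part of a version string as a tuple of ints.

    For example, "1.7.0+cu101" becomes (1, 7, 0).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (as in "1.8") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_reported_bad_torch(version: str) -> bool:
    """True when the version falls in the 1.7 series described in this issue."""
    return parse_version(version)[:2] == (1, 7)
```

In practice the string would come from `torch.__version__`. When the check returns True, pinning with `pip install torch==1.6.0` matches the workaround described above.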

Predicting...0%

Hi,

I tried using the library with text_mode and it works fine.
When I use full_mode, prediction doesn't work, but I don't get any error. It just gets stuck.

This is basically my code:

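import json
from m3inference import M3Inference, preprocess

m3 = M3Inference(use_full_model=True)
# pic_url and data_set are defined elsewhere in my script
preprocess.download_resize_img(pic_url, "profile_pic.jpg", "profile_pic_fs.jpg")

# write one JSON object per line
with open('data.jsonl', 'w') as outfile:
    for entry in data_set:
        json.dump(entry, outfile)
        outfile.write('\n')

pred = m3.infer('data.jsonl')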

This is the output:

10/04/2021 15:47:05 - INFO - m3inference.m3inference -   Version 1.1.5
10/04/2021 15:47:05 - INFO - m3inference.m3inference -   Running on cpu.
10/04/2021 15:47:05 - INFO - m3inference.m3inference -   Will use full M3 model.
10/04/2021 15:47:06 - INFO - m3inference.m3inference -   Model full_model exists at /Users/vv/m3/models/full_model.mdl.
10/04/2021 15:47:06 - INFO - m3inference.utils -   Checking MD5 for model full_model at /Users/vv/m3/models/full_model.mdl
10/04/2021 15:47:06 - INFO - m3inference.utils -   MD5s match.
10/04/2021 15:47:06 - INFO - m3inference.m3inference -   Loaded pretrained weight at /Users/vv/m3/models/full_model.mdl
10/04/2021 15:47:06 - INFO - m3inference.dataset -   1 data entries loaded.
Predicting...:   0%|          | 0/1 [00:00<?, ?it/s]

Any idea about the issue?

Thanks.

Feature: Support v2 API data as input

Hi! I'm doing a research project about Twitter analysis.

I fetched user data with the Twitter Academic API (v2), and after calling M3Twitter.transform_jsonl(...) I got the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-23da1cf5d317> in <module>
      5 ,access_token=' ',access_secret=' ')
      6 
----> 7 m3twitter.transform_jsonl(input_file="test.jsonl", output_file="test_result.jsonl")

~/opt/anaconda3/lib/python3.8/site-packages/m3inference/m3twitter.py in transform_jsonl(self, input_file, output_file, img_path_key, lang_key, resize_img, keep_full_size_img)
     48             with open(output_file, "w") as fhOut:
     49                 for line in fhIn:
---> 50                     m3vals = self.transform_jsonl_object(line, img_path_key=img_path_key, lang_key=lang_key,
     51                                                          resize_img=resize_img, keep_full_size_img=keep_full_size_img)
     52                     fhOut.write("{}\n".format(json.dumps(m3vals)))

~/opt/anaconda3/lib/python3.8/site-packages/m3inference/m3twitter.py in transform_jsonl_object(self, input, img_path_key, lang_key, resize_img, keep_full_size_img)
     80             else:
     81                 img_file_resize = img_path
---> 82         elif user["default_profile_image"]:
     83             # Default profile image
     84             img_file_resize = TW_DEFAULT_PROFILE_IMG

KeyError: 'default_profile_image'

I also ran the example data provided in m3inference/test/twitter_cache/, and the function runs perfectly.

Then I double-checked the jsonl file. It looks like the two versions of the Twitter API (v1 / v2) return slightly different jsonl files (I suppose the example data were produced with the v1 API). For details, please see: https://developer.twitter.com/en/docs/twitter-api/migrate/data-formats/standard-v1-1-to-v2

I'm not sure if my comment makes sense, maybe you could have a look?
Thanks in advance!
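
One possible workaround, sketched under assumptions: rename the v2 fields to the v1.1 names that appear in the traceback and logs on this page before calling transform_jsonl. `v2_user_to_v1` is a hypothetical helper, not part of m3inference. Whether these are all the keys M3Twitter reads is an assumption, and the default-avatar check is only a heuristic based on the image URL.

```python
def v2_user_to_v1(user_v2: dict) -> dict:
    """Map a Twitter API v2 user object onto v1.1-style field names.

    Assumption: the v2 object was requested with the `description` and
    `profile_image_url` user fields; missing fields fall back to empty values.
    """
    image_url = user_v2.get("profile_image_url", "")
    return {
        "id_str": str(user_v2.get("id", "")),
        "name": user_v2.get("name", ""),
        "screen_name": user_v2.get("username", ""),
        "description": user_v2.get("description", ""),
        "profile_image_url_https": image_url,
        # Heuristic (assumption): default avatars are served from a path
        # containing "default_profile_images"; no URL also counts as default.
        "default_profile_image": (not image_url)
        or "default_profile_images" in image_url,
    }
```

Each converted object can then be written as one line of a new jsonl file and passed as input_file.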

Segmentation fault for certain ids in Apple M1 computers - no problem in Apple Intel computers

There may be an incompatibility issue for those running M3Inference with Apple M1 computers.

I just switched to a newer Apple M1 laptop and tried running m3. For certain ids, there are no problems. However, for most ids, I get "segmentation fault" (see error Output for A below).

I tried running it on my old laptop (Apple Intel). There are no problems for any ids. It runs smoothly. Examples can be found below:

Both (A) and (B) run fine on my Apple Intel laptop:
(A)

python3 scripts/m3twitter.py --skip-cache --id 7259022 --auth scripts/auth.txt

Output for (A)

{'input': {'description': 'Techonomist who runs International Development '
                          'Projects and works on Technology Platforms in the '
                          'Philippines, specifically @gloryreborn & @symphco',
           'id': '7259022',
           'img_path': '/Users/szoriac/m3/cache/7259022_224x224.jpg',
           'lang': 'en',
           'name': 'Dave Overton',
           'screen_name': 'daveove'},
 'output': {'age': {'19-29': 0.0087,
                    '30-39': 0.8318,
                    '<=18': 0.0002,
                    '>=40': 0.1593},
            'gender': {'female': 0.0004, 'male': 0.9996},
            'org': {'is-org': 0.0001, 'non-org': 0.9999}}}

(B)

python3 scripts/m3twitter.py --skip-cache --id 373269437 --auth scripts/auth.txt

Output for (B)

{'input': {'description': '',
           'id': '373269437',
           'img_path': '/Users/szoriac/m3/cache/373269437_224x224.jpg',
           'lang': 'un',
           'name': 'BANISCH Dominique',
           'screen_name': 'Nasch57'},
 'output': {'age': {'19-29': 0.0013,
                    '30-39': 0.0003,
                    '<=18': 0.0052,
                    '>=40': 0.9932},
            'gender': {'female': 0.021, 'male': 0.979},
            'org': {'is-org': 0.1689, 'non-org': 0.8311}}}

But only (B) works for my Apple M1 laptop:

I get this error for (A)

11/20/2021 16:22:54 - INFO - m3inference.m3inference -   Version 1.1.5
11/20/2021 16:22:54 - INFO - m3inference.m3inference -   Running on cpu.
11/20/2021 16:22:54 - INFO - m3inference.m3inference -   Will use full M3 model.
11/20/2021 16:22:54 - INFO - m3inference.m3inference -   Model full_model exists at /Users/wdwg/m3/models/full_model.mdl.
11/20/2021 16:22:54 - INFO - m3inference.utils -   Checking MD5 for model full_model at /Users/wdwg/m3/models/full_model.mdl
11/20/2021 16:22:55 - INFO - m3inference.utils -   MD5s match.
11/20/2021 16:22:55 - INFO - m3inference.m3inference -   Loaded pretrained weight at /Users/wdwg/m3/models/full_model.mdl
11/20/2021 16:22:55 - INFO - m3inference.m3twitter -   skip_cache is True. Fetching data from Twitter for id 7259022.
11/20/2021 16:22:55 - INFO - m3inference.m3twitter -   GET /users/show.json?id=7259022
[1]    22412 segmentation fault  python3 scripts/m3twitter.py --skip-cache --id 7259022 --auth scripts/auth.tx

But not for (B)

{'input': {'description': '',
           'id': '373269437',
           'img_path': '/Users/wdwg/m3/cache/373269437_224x224.jpg',
           'lang': 'un',
           'name': 'BANISCH Dominique',
           'screen_name': 'Nasch57'},
 'output': {'age': {'19-29': 0.0013,
                    '30-39': 0.0003,
                    '<=18': 0.0052,
                    '>=40': 0.9932},
            'gender': {'female': 0.021, 'male': 0.979},
            'org': {'is-org': 0.1689, 'non-org': 0.8311}}}

Is anyone else encountering the same problem? Am I doing something wrong? Is there a way to fix this?

Possibly helpful information: I used m3inference = 1.1.5 on both laptops. The Python version for my Apple M1 is 3.9.7 while the Intel version runs 3.8.5. M1 does not support 3.8.5. It may be a version issue or not.

Potential m3twitter.infer_id bug

Hello, first time GitHub issuer here!

When I try to process certain user id_str's I get a FileNotFound error. Here is a user id_str I chose at random - '238173039'. When I run m3twitter.infer_id, I receive the following error:

Traceback (most recent call last):
  File "is_organization.py", line 24, in <module>
    org = m3twitter.infer_id(id_str)['output']['org']
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 208, in infer_id
    output=self._twitter_api(id=id)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 187, in _twitter_api
    return self.process_twitter(r.json())
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 245, in process_twitter
    download_resize_img(img, img_file_resize, img_file_full)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/preprocess.py", line 28, in download_resize_img
    with open(img_out_path_fullsize, "wb") as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ndbhagwa/m3/cache/TheWhaleShark.com/profile_images/2602602416/m8su11Vx_400x400'

I am not sure what the cause of this error might be. I originally thought it was a rate-limit error since it does not occur consistently, but for other rate-limit errors, I see warnings like this:

<dt> - INFO - m3inference.m3twitter -   Results not in cache. Fetching data from Twitter for id <#>
<dt> - INFO - m3inference.m3twitter -   GET /users/show.json?id=<#>
<dt> - WARNING - m3inference.m3twitter -   Could not retreive screen_name
<dt> - WARNING - m3inference.m3twitter -   Could not retreive id_str
<dt> - WARNING - m3inference.m3twitter -   Could not retreive description
<dt>  - WARNING - m3inference.m3twitter -   Could not retreive name
<dt> - WARNING - m3inference.m3twitter -   Could not retreive profile_image_url
<dt> - WARNING - m3inference.m3twitter -   Unable to extract image from Twitter. Using default image.
<dt> - INFO - m3inference.dataset -   1 data entries loaded

Query regarding 'id'

Hi,
Is it user_id or tweet_id that is used in the function infer()?

How to infer local Twitter JSON files?

I have a bunch of local Twitter JSON files. Since the free Twitter API has a quite limited quota, how can I run m3 on them locally?
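
A first step could be to merge the local files into a single JSON Lines file. This is a minimal sketch, assuming each file holds exactly one JSON object (for example one user object). `json_files_to_jsonl` is a hypothetical helper, not an m3inference function.

```python
import json
from pathlib import Path


def json_files_to_jsonl(input_dir: str, output_file: str) -> int:
    """Write every *.json file in input_dir as one line of output_file.

    Returns the number of objects written. Assumes each file holds a single
    JSON object.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        # Sorted so the output order is stable between runs.
        for path in sorted(Path(input_dir).glob("*.json")):
            with open(path, encoding="utf-8") as fh:
                obj = json.load(fh)
            out.write(json.dumps(obj) + "\n")
            count += 1
    return count
```

The resulting file has the shape that transform_jsonl(input_file=..., output_file=...) is called with elsewhere on this page. Whether the objects carry every field it expects depends on how the files were collected.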

infer_id and infer_screen_name do not work

I have a problem while running the program. My code is:

# The API first needs to validate your Twitter App's credentials
m3twitter.twitter_init_from_file('/content/drive/My Drive/Ibu Avi/Last/User/auth-sample.txt')

The output is: True

And

# sample run
pprint.pprint(m3twitter.infer_id("3138075595"))

The output is :

04/27/2022 09:40:21 - INFO - m3inference.m3twitter - Results not in cache. Fetching data from Twitter for id 3138075595.
04/27/2022 09:40:21 - INFO - m3inference.m3twitter - GET /users/show.json?id=3138075595
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Could not retreive screen_name
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Could not retreive id_str
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Could not retreive description
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Could not retreive name
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Could not retreive profile_image_url
04/27/2022 09:40:21 - WARNING - m3inference.m3twitter - Unable to extract image from Twitter. Using default image.
04/27/2022 09:40:21 - INFO - m3inference.dataset - 1 data entries loaded.
Predicting...: 100%|██████████| 1/1 [00:00<00:00, 1.39it/s]{'input': {'description': '',
'id': 'dummy',
'img_path': '/usr/local/lib/python3.7/dist-packages/m3inference/data/tw_default_profile.png',
'lang': 'un',
'name': '',
'screen_name': ''},
'output': {'age': {'19-29': 0.2393,
'30-39': 0.0793,
'<=18': 0.1746,
'>=40': 0.5067},
'gender': {'female': 0.2809, 'male': 0.7191},
'org': {'is-org': 0.0873, 'non-org': 0.9127}}}

It is always like that, even after changing to other active user ids and usernames. What should I do?

Error in infer_id()

The following code

from m3inference import M3Twitter
m3 = M3Twitter()
m3.infer_id(243344789)

Led to the following error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/m3twitter.py", line 179, in infer_id
    output = self.process_twitter(data, id=id)
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/m3twitter.py", line 229, in process_twitter
    pred = self.infer(data, batch_size=1, num_workers=1)
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/m3inference.py", line 125, in infer
    for batch in tqdm(dataloader, desc='Predicting...'):
  File "/home/winston/.local/lib/python3.7/site-packages/tqdm/std.py", line 1119, in __iter__
    for obj in iterable:
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/winston/.local/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/winston/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/dataset.py", line 37, in __getitem__
    return self._preprocess_data(data)
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/dataset.py", line 43, in _preprocess_data
    fig = self._image_loader(img_path)
  File "/home/winston/.local/lib/python3.7/site-packages/m3inference/dataset.py", line 91, in _image_loader
    image = Image.open(image_name)
  File "/home/winston/.local/lib/python3.7/site-packages/PIL/Image.py", line 2809, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/winston/m3/cache/indiealehouse_224x224.png'

Error fetching images will fail the infer method

I am trying to run transform_jsonl (to download images and prepare the m3 json file) and then the infer method right after. The issue occurs when transform_jsonl fails to fetch some images but still writes their paths to the m3 json file, causing infer to fail with:
FileNotFoundError: [Errno 2] No such file or directory
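
Until this is handled in the library, one workaround is to check the image paths before calling infer. The sketch below is an illustration under assumptions: `split_by_image` is a hypothetical helper, and `img_path` is the key shown in the outputs elsewhere on this page.

```python
import json
import os


def split_by_image(jsonl_path: str, img_path_key: str = "img_path"):
    """Separate entries whose image file exists from those whose file is missing."""
    present, missing = [], []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            entry = json.loads(line)
            path = entry.get(img_path_key)
            if path and os.path.isfile(path):
                present.append(entry)
            else:
                missing.append(entry)
    return present, missing
```

The `present` entries can be written back to a jsonl file for inference. The `missing` ones can be retried, or run without an image if that suits the project.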

Commercial use

Best regards. My compliments.
Is it possible to use the python code or the python library for a commercial project? What are the restrictions or requirements?
Thanks.

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 1 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:612

I have the following error when trying to predict the demographics of a list of twitter users.

Predicting...:   0%|          | 36/54307 [04:36<107:30:38,  7.13s/it]
  File ".../src/utils/demographic_detector.py", line 43, in infer
    predictions = self.m3twitter.infer(user_objs)
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/m3inference/m3inference.py", line 125, in infer
    for batch in tqdm(dataloader, desc='Predicting...'):
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/tqdm/std.py", line 1108, in __iter__
    for obj in iterable:
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in default_collate
    return [default_collate(samples) for samples in transposed]
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 79, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File ".../.conda/envs/twcovid/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 1 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:612

The list of users can be found here

Incompatibility with pytorch 1.8.0

Hi,
I've been enjoying this project a lot for my research, but recently I'm having issues using it in our machine that has Pytorch 1.8.0 installed on it. The error happens when I try to use any of the available models with GPU:

from m3inference import M3Inference
import pprint
m3 = M3Inference()  # see docstring for details
pred = m3.infer('./test/data_resized.jsonl')  # also see docstring for details
pprint.pprint(pred)

where it produces the following error:

pred = m3.infer('./test/data_resized.jsonl') # also see docstring for details
03/19/2021 12:13:13 - INFO - m3inference.dataset - 7 data entries loaded.
Predicting...: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "", line 1, in
File "/home/minje/libraries/m3inference/m3inference/m3inference.py", line 127, in infer
pred = self.model(batch)
File "/opt/anaconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/minje/libraries/m3inference/m3inference/full_model.py", line 99, in forward
username_pack, username_unsort = pack_wrapper(username_embed, username_len)
File "/home/minje/libraries/m3inference/m3inference/utils.py", line 47, in pack_wrapper
packed = pack_padded_sequence(sents_sorted, lengths_sorted, batch_first=True)
File "/opt/anaconda/lib/python3.8/site-packages/torch/nn/utils/rnn.py", line 245, in pack_padded_sequence
_VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

I think this is related to PyTorch's update to pack_padded_sequence, which now only accepts lengths on the CPU when they are passed as a tensor [link]. I would appreciate it a lot if you could look into this. Thanks!

Error:"profile_image_url_https"

I am getting the following error with m3twitter.transform_jsonl()
The data has been shared privately with Zijian Wang.

id in the results

Is the id in the results a user id or a tweet id?

"NameError: name 'torch' is not defined" but torch is installed and imported

After creating a virtual environment, I tried to install and import m3inference:

pip install m3inference
import m3inference

But I get the following error, how could I fix it?

NameError                                 Traceback (most recent call last)
<ipython-input-9-50ee37ff85fa> in <module>
----> 1 import m3inference

3 frames
/usr/local/lib/python3.8/dist-packages/m3inference/full_model.py in M3InferenceModel()
     10 
     11 class M3InferenceModel(nn.Module):
---> 12     def __init__(self, device='cuda' if torch.cuda.is_available() else 'cpu'):
     13         super(M3InferenceModel, self).__init__()
     14 

NameError: name 'torch' is not defined

I tried to install and import torch before doing the same with m3inference.

Thanks!

Segmentation Fault w/ transform_jsonl()

I believe I've installed m3-inference correctly, but running transform_jsonl() on a json lines file of tweets seems to fetch the first profile picture in the list and then terminate with a segmentation fault.

I believe the file is structured appropriately, in the format below:
{json object}\n
{json object}\n
...

Any idea what I might be running into?
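
Malformed input is unlikely to be the cause of a segmentation fault, but it is cheap to rule out. The sketch below checks that every line of a file parses as a JSON object. `find_bad_jsonl_lines` is a hypothetical helper, not part of m3inference.

```python
import json


def find_bad_jsonl_lines(path: str) -> list:
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        for number, line in enumerate(fh, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                value = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(value, dict):
                problems.append((number, "not a JSON object"))
    return problems
```

An empty list means the file has the one-object-per-line structure described above, so the fault most likely lies elsewhere.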

Problems with installation

Hello,

I am trying to install m3inference through "pip install m3inference", but I get an error (see below). I tried several things to fix this, but none resolved the issue. I believe it has something to do with "pycld2": when I tried to install it separately, that did not work either.

Thanks in advance:

(base) C:\Users\Rude>pip install m3inference
Collecting m3inference
Using cached m3inference-1.1.5-py3-none-any.whl (58 kB)
Requirement already satisfied: tqdm in c:\users\rude\appdata\local\continuum\ana
conda3\lib\site-packages (from m3inference) (4.28.1)
Collecting pycld2>=0.31
Using cached pycld2-0.41.tar.gz (41.4 MB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch>=1.0.0 in c:\users\rude\appdata\local\conti
nuum\anaconda3\lib\site-packages (from m3inference) (1.10.1)
Requirement already satisfied: Pillow in c:\users\rude\appdata\local\continuum\a
naconda3\lib\site-packages (from m3inference) (5.3.0)
Requirement already satisfied: pandas>=0.20 in c:\users\rude\appdata\local\conti
nuum\anaconda3\lib\site-packages (from m3inference) (1.3.4)
Requirement already satisfied: torchvision>=0.2.2 in c:\users\rude\appdata\local
\continuum\anaconda3\lib\site-packages (from m3inference) (0.11.2)
Requirement already satisfied: rauth in c:\users\rude\appdata\roaming\python\pyt
hon37\site-packages (from m3inference) (0.7.3)
Requirement already satisfied: requests in c:\users\rude\appdata\local\continuum
\anaconda3\lib\site-packages (from m3inference) (2.21.0)
Requirement already satisfied: numpy>=1.13 in c:\users\rude\appdata\roaming\pyth
on\python37\site-packages (from m3inference) (1.21.4)
Requirement already satisfied: pytz>=2017.3 in c:\users\rude\appdata\local\conti
nuum\anaconda3\lib\site-packages (from pandas>=0.20->m3inference) (2018.7)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\rude\appdata\l
ocal\continuum\anaconda3\lib\site-packages (from pandas>=0.20->m3inference) (2.7
.5)
Requirement already satisfied: typing-extensions in c:\users\rude\appdata\local
continuum\anaconda3\lib\site-packages (from torch>=1.0.0->m3inference) (4.0.1)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\rude\appdata\lo
cal\continuum\anaconda3\lib\site-packages (from requests->m3inference) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\rude\appdata\local\con
tinuum\anaconda3\lib\site-packages (from requests->m3inference) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\rude\appdata\local
\continuum\anaconda3\lib\site-packages (from requests->m3inference) (2021.5.30)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\rude\appdata\lo
cal\continuum\anaconda3\lib\site-packages (from requests->m3inference) (1.24.1)
Requirement already satisfied: six>=1.5 in c:\users\rude\appdata\local\continuum
\anaconda3\lib\site-packages (from python-dateutil>=2.7.3->pandas>=0.20->m3infer
ence) (1.12.0)
Building wheels for collected packages: pycld2
Building wheel for pycld2 (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\rude\appdata\local\continuum\anaconda3\python.exe' -u -c '
import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Rude\Ap
pData\Local\Temp\5\pip-install-3n3kiofw\pycld2_04bebb99f5e4481caa01025a1abb
1b1f\setup.py'"'"'; file='"'"'C:\Users\Rude\AppData\Local\Temp\5\pip
-install-3n3kiofw\pycld2_04bebb99f5e4481caa01025a1abb1b1f\setup.py'"'"';f = ge
tattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else
io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().re
place('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'
exec'"'"'))' bdist_wheel -d 'C:\Users\Rude\AppData\Local\Temp\5\pip-wheel-7as7f8
gd'
cwd: C:\Users\Rude\AppData\Local\Temp\5\pip-install-3n3kiofw\pycld2_04beb
b99f5e4481caa01025a1abb1b1f
Complete output (10 lines):
running bdist_wheel
The [wheel] section is deprecated. Use [bdist_wheel] instead.
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\pycld2
copying pycld2_init_.py -> build\lib.win-amd64-3.7\pycld2
running build_ext
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsof
t C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

ERROR: Failed building wheel for pycld2
Running setup.py clean for pycld2
Failed to build pycld2
Installing collected packages: pycld2, m3inference
Running setup.py install for pycld2 ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\rude\appdata\local\continuum\anaconda3\python.exe' -u -c
'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Rude\
AppData\Local\Temp\5\pip-install-3n3kiofw\pycld2_04bebb99f5e4481caa01025a1a
bb1b1f\setup.py'"'"'; file='"'"'C:\Users\Rude\AppData\Local\Temp\5\p
ip-install-3n3kiofw\pycld2_04bebb99f5e4481caa01025a1abb1b1f\setup.py'"'"';f =
getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) el
se io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().
replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'
"'exec'"'"'))' install --record 'C:\Users\Rude\AppData\Local\Temp\5\pip-record-3
t8c0_3x\install-record.txt' --single-version-externally-managed --compile --inst
all-headers 'c:\users\rude\appdata\local\continuum\anaconda3\Include\pycld2'
cwd: C:\Users\Rude\AppData\Local\Temp\5\pip-install-3n3kiofw\pycld2_04b
ebb99f5e4481caa01025a1abb1b1f
Complete output (11 lines):
running install
c:\users\rude\appdata\local\continuum\anaconda3\lib\site-packages\setuptools
\command\install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprec
ated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_py
creating build
creating build\lib.win-amd64-3.7
creating build\lib.win-amd64-3.7\pycld2
copying pycld2_init_.py -> build\lib.win-amd64-3.7\pycld2
running build_ext
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Micros
oft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

----------------------------------------

ERROR: Command errored out with exit status 1: 'c:\users\rude\appdata\local\cont
inuum\anaconda3\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys
.argv[0] = '"'"'C:\Users\Rude\AppData\Local\Temp\5\pip-install-3n3kiofw\
pycld2_04bebb99f5e4481caa01025a1abb1b1f\setup.py'"'"'; file='"'"'C:\Users
\Rude\AppData\Local\Temp\5\pip-install-3n3kiofw\pycld2_04bebb99f5e4481caa0
1025a1abb1b1f\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(_file
_) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setu
p; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close()
;exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Rude
AppData\Local\Temp\5\pip-record-3t8c0_3x\install-record.txt' --single-version-ex
ternally-managed --compile --install-headers 'c:\users\rude\appdata\local\contin
uum\anaconda3\Include\pycld2' Check the logs for full command output.

(base) C:\Users\Rude>

Error with using infer_id()

Hi! I'm using this code for a research project, thank you for providing it.

I am trying to make an inference based on infer_id, and I just replicated the example in the FAQ. Here's what my code looks like:

import os
import pprint
from dotenv import load_dotenv
from m3inference import M3Twitter
load_dotenv()
# authentication
twitter_app_auth = {'consumer_key': os.getenv('TWITTER_API_KEY'), 'consumer_secret': os.getenv('TWITTER_API_SECRET'),
                    'access_token': os.getenv('TWITTER_ACCESS_TOKEN'), 'access_token_secret': os.getenv('TWITTER_ACCESS_SECRET')}
# init the api
inferenceTwitter = M3Twitter()  # assumed: default constructor
inferenceTwitter.twitter_init(api_key=twitter_app_auth['consumer_key'], api_secret=twitter_app_auth['consumer_secret'],
                              access_token=twitter_app_auth['access_token'], access_secret=twitter_app_auth['access_token_secret'])
pprint.pprint(inferenceTwitter.infer_id("2631881902"))

The traceback that I received was pretty confusing

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

RuntimeError: DataLoader worker (pid(s) 57016) exited unexpectedly

I'm not sure where to find the freeze_support() function call and how to deal with using the fork() child processes.
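
The idiom quoted in the error message belongs in the script that is being run, not in the library. On platforms where worker processes are spawned rather than forked, the main module is imported again in each worker, so any top-level code that starts inference runs a second time. A minimal sketch of the pattern follows. `run_inference` and `fake_infer_id` are hypothetical stand-ins, and in a real script the callable would be the infer_id method from the snippet above.

```python
from multiprocessing import freeze_support


def run_inference(infer_fn, user_id):
    """Run one inference call.

    All work happens inside functions, so re-importing this module in a
    worker process starts nothing new.
    """
    return infer_fn(user_id)


def main():
    # Hypothetical stand-in for inferenceTwitter.infer_id.
    def fake_infer_id(user_id):
        return {"input": {"id": user_id}}

    print(run_inference(fake_infer_id, "2631881902"))


if __name__ == "__main__":
    freeze_support()  # per the message above, only needed for frozen executables
    main()
```

Moving the twitter_init and infer_id calls into `main()` and keeping the guard at the bottom is the change the error message asks for.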

ValueError: semaphore or lock released too many times

Hi, I am working with Professor Przemek and Mattia on a project. I am using m3 inference to run predictions. But I am encountering a value error which says "semaphore or lock released too many times" while running the infer method of M3Inference module. This has something to do with multiprocessing. But I am unable to fix this error. Attaching screenshot for your reference.

Improve requests speed

Hello
I have already used m3inference and it works well, but on large-scale data it is not fast enough. It takes 1.3 seconds per account on average while running on a Kaggle GPU.
Is there any advice or technique to speed it up?

Question about training procedure code

Hi team,
Thank you so much for your great work.
Could you please upload the training code? I read the uploaded code but couldn't find the multi-task classification training procedure. I only found the evaluation code.
Thanks in advance.

ModuleNotFoundError: No module named 'pycld2'

How can I resolve this error?

Add streaming option to infer

For inferring a large number of users, it would be fantastic if infer would have an option to stream the results to a file as it finishes, rather than returning the values. This behavior is particularly helpful for big inference jobs that need a few hours (days?) to finish and where intermediate results would be useful.
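
Until such an option exists, the behavior can be approximated outside the library by running inference chunk by chunk and appending each chunk's results to a file. The sketch below makes several assumptions: `infer_in_chunks` is a hypothetical helper, and `infer_fn` stands for any callable that takes a list of entries and returns a dict mapping ids to predictions (for example a thin wrapper around infer).

```python
import json


def infer_in_chunks(entries, infer_fn, output_file, chunk_size=1000):
    """Run infer_fn on successive chunks and append each result to output_file.

    infer_fn is a hypothetical callable: list of entries in, dict of
    {id: prediction} out. Returns the number of predictions written.
    """
    written = 0
    with open(output_file, "a", encoding="utf-8") as out:
        for start in range(0, len(entries), chunk_size):
            chunk = entries[start:start + chunk_size]
            for user_id, prediction in infer_fn(chunk).items():
                out.write(json.dumps({"id": user_id, "output": prediction}) + "\n")
                written += 1
            out.flush()  # make intermediate results visible on disk
    return written
```

Because the file is opened in append mode, an interrupted job keeps everything written so far. Skipping ids that are already in the file on restart is left out of this sketch.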

Output file options

I am using M3 for a research project and will be combining the output with other data from Twitter. Is it possible to output the results into something more manageable than the printed on-screen output after running the code? The readme does reference the output format, but I am not sure where to look next.
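
Since the predictions are nested dictionaries, they can be flattened into rows and saved as CSV with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the input is assumed to be a dict of the form {id: {'age': {...}, 'gender': {...}, 'org': {...}}}, matching the 'output' blocks printed elsewhere on this page.

```python
import csv


def flatten_prediction(user_id, output: dict) -> dict:
    """Turn one nested prediction into a flat row, e.g. gender -> gender_male."""
    row = {"id": user_id}
    for attribute, scores in output.items():
        for label, probability in scores.items():
            row[f"{attribute}_{label}"] = probability
    return row


def write_predictions_csv(predictions: dict, csv_path: str) -> None:
    """Write one CSV row per id; assumes every prediction has the same labels."""
    rows = [flatten_prediction(uid, out) for uid, out in predictions.items()]
    if not rows:
        return
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

The resulting file can be joined with other Twitter data on the id column.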

Support Different Languages Outside the EU?

Hey, thank you for making this project. What awesome and incredible research.
Does the project also support languages outside the EU? If not, which part of the project would need to change to support them? I am interested in researching this.

Question about training procedure

Hi,
First of all thank you for your great work.

I was wondering what you used as ground truth label for age and gender when user profiles are organizations. You wouldn't want the model to train to recognize any gender / age on an organization profile.
I believe this is not mentioned in the article, or maybe I misunderstood something about the training procedure?

Efficient collection of large list of screen-names/ids via Twitter API

Currently the infer_screen_name and infer_id methods in M3Twitter accept one screen-name/id and call the Twitter API to get information for that single user. This is inefficient since the endpoint can get up to 100 users at a time.

New methods should be included in the M3Twitter class to handle a long list of users. These methods should break the list into chunks of 100, respect the rate limit, and gracefully handle any API errors.

(This was previously not needed as the class was scraping profiles from HTML and was designed simply as a demonstration method rather than something to be used at scale. The change recently made to use the API opens up this opportunity, which would make the library even more user-friendly)
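
The chunking part of this proposal can be sketched as follows. The names are hypothetical: `fetch_fn` stands for whatever function would call the batch lookup endpoint for up to 100 users, and rate-limit handling is deliberately left out.

```python
def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_users(ids, fetch_fn, size=100):
    """Call fetch_fn once per chunk; collect results and failed chunks."""
    results, failed = [], []
    for chunk in chunked(ids, size):
        try:
            results.extend(fetch_fn(chunk))
        except Exception as exc:  # broad on purpose: this is only a sketch
            failed.append((chunk, str(exc)))
    return results, failed
```

Keeping the failed chunks separate lets a caller retry them later without repeating the successful requests.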
