hydrator's Introduction

Hydrator

Twitter's changes to their API, which greatly reduce the amount of read-only access, mean that the Hydrator is no longer a useful application. The application keys, which functioned for the last 7 years, have been rescinded by Twitter.


Hydrator is an Electron-based desktop application for hydrating Twitter ID datasets. Twitter's Terms of Service do not allow the full JSON for datasets of tweets to be distributed to third parties. However, they do allow datasets of tweet IDs to be shared. Hydrator helps you turn these tweet IDs back into JSON and also CSV from the comfort of your desktop.

If you are interested in learning more please join the DocNow community in Slack, or add an issue ticket here. If you would like to explore tweet identifier datasets please see the DocNow Catalog and GWU's TweetSets.

Install

Important!

It is easiest to download a pre-built version of the Hydrator instead of building it from source. Please see the list of available releases for OS X, Windows and Linux installers.

Note for OS X Users

Since the Hydrator has not been signed (which requires us to pay Apple in order to register as a developer), OS X will block it the first time you try to start it. You can convince OS X to open it anyway by locating the Hydrator app in your Applications folder, control-clicking on it, selecting Open from the menu, and then clicking Open in the dialog that appears. From this point on your Hydrator should start normally.

Develop

Get it:

git clone https://github.com/docnow/hydrator
cd hydrator

Configure:

In order to build the Hydrator you will need to get app keys from Twitter and put them in a .env file in your project directory. It should look something like this:

TWITTER_CONSUMER_KEY=CHANGEME
TWITTER_CONSUMER_SECRET=CHANGEMETOO

Next install the dependencies:

yarn install

Start a hot-swappable development server:

yarn run develop

Alternatively, create installers for OS X, Windows and Linux:

yarn run pack:mac
yarn run pack:win
yarn run pack:linux

Hydrator was created using electron-react-redux-boilerplate so check out that documentation for more information about commands that are available.

How to Cite

If you would like to cite this software please use something like the following:

Documenting the Now. (2020). Hydrator [Computer Software]. Retrieved from https://github.com/docnow/hydrator

hydrator's People

Contributors

akovalyov, akozhemiakin, amilajack, buckymaler, catalinmiron, chentsulin, davej, dependabot[bot], domasx2, dplusic, dustintownsend, edsu, epilande, g1ibby, greenkeeperio-bot, jefffriesen, jhen0409, kilian, knpwrs, kubijo, longlivechief, machawk1, talha131, trstringer, tsemerad, ttacon, wincent, xwartz, yeti-or, zeevl

hydrator's Issues

Link Twitter Account not working on Windows 10

Thanks to the contributors for sharing this app. I installed the Hydrator on Windows and ran it, but when I click 'Link Twitter Account' or any other button, the app does not respond and I can't do anything with it.
Is there some configuration I need to do on Windows to run it?

Update Menus

The app menu items should point to the Hydrator homepage on GitHub and display other relevant information about the app.

Network Connection Indicator

When the network connection goes away hydrator stops collecting tweets (for obvious reasons) but it does not start again automatically when a network connection is reestablished. This means the operator needs to stop and restart the dataset for collection to start again.

When the network is down the dataset should appear to be stopped and there should be a visual indicator that the network connection is down. It should check periodically for the network to come back, and when a connection is reestablished the indicator should go away.
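The periodic check described above could be sketched as a simple polling loop. This is only an illustration in Python, not Hydrator's code: `wait_for_network`, `is_online`, and the default interval are hypothetical names and values, and the connectivity probe is passed in by the caller because the source does not say how Hydrator detects a lost connection.

```python
import time
from typing import Callable, Optional


def wait_for_network(
    is_online: Callable[[], bool],
    poll_seconds: float = 5.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_online() until it reports a connection.

    Returns True as soon as the probe succeeds, or False if max_checks
    probes have failed. With max_checks=None it polls indefinitely.
    """
    checks = 0
    while True:
        if is_online():
            return True  # connection is back; the caller can resume hydration
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return False  # give up; the caller can keep showing the indicator
        sleep(poll_seconds)
```

A caller would show the "network down" indicator before calling this function and clear it when the function returns True.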

can't install using yarn

Hello, noob here.

Tried to git clone and yarn install the source file but I get a 00h00m00s 0/0: : ERROR: [Errno 2] No such file or directory: 'install' error...

I guess I should have docker or something similar run the app and then build the .yml somehow, but I'm a bit lost.

What should I do?

Thanks in advance!

51% hydration rate?

I'm getting a very unreasonably poor hydration rate for the ID's found in a COVID-19 dataset https://github.com/echen102/COVID-19-TweetIDs

Their readme states that @edsu was only missing 6% due to deletions. As the title implies, I'm getting a substantially poorer rate.

I passed a batch of ID's that didn't hydrate the first time back through again, and 12% were successfully hydrated on a second pass.

I'm not understanding where the 'leak' is.

I am assigning this task to non-technical students, and so using twarc is not an option for them.
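To check a hydration rate independently of the app, and to produce the list of IDs for a second pass, one could compare the ID file with the hydrated JSONL. The sketch below assumes each JSONL line is a tweet object carrying an `id_str` field, as in Twitter's v1.1 format; `hydration_report` is a hypothetical helper, not part of Hydrator.

```python
import json
from typing import Iterable, List, Set, Tuple


def hydrated_ids(jsonl_lines: Iterable[str]) -> Set[str]:
    """Collect the id_str of every tweet found in the JSONL lines."""
    ids = set()
    for line in jsonl_lines:
        line = line.strip()
        if line:
            ids.add(str(json.loads(line)["id_str"]))
    return ids


def hydration_report(
    requested: Iterable[str], jsonl_lines: Iterable[str]
) -> Tuple[float, List[str]]:
    """Return (fraction of requested IDs that were hydrated, missing IDs)."""
    wanted = [i.strip() for i in requested if i.strip()]
    got = hydrated_ids(jsonl_lines)
    missing = [i for i in wanted if i not in got]
    rate = (len(wanted) - len(missing)) / len(wanted) if wanted else 0.0
    return rate, missing
```

Writing the `missing` list back to a text file, one ID per line, gives an input for a second hydration pass.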

verification fail

If verification fails then the Verifying ... message never seems to go away, and it also puts the rest of the application into a mostly inoperable state.

Report Twitter authorization errors.

Hello,

Thank you for sharing this app. However, I noticed that none of the buttons (datasets/ add/ settings and especially link twitter account) respond when being clicked. Is there any way to fix this issue?

extended tweets

Via email:

I very much appreciate your Hydrator program! From what I can tell it requests the 140 character
text field of a ‘traditional’ tweet. Is there a way I can use Hydrator to recover the full 280 characters
of text in extended tweets? The tweet ID’s I have are from the November '17--February '18 period
and refer to a mixture of 140 and 280 character tweets.

Thank you for making Hydrator freely available!

Tweet deletion inconsistencies

Hello Mr. Summers, hope you're doing well. I did go through #31 and I agreed with what you said-

"51% hydration rate is very low indeed[....] That being said, I know that platforms are dealing with a huge
amount of COVID-19 related disinformation, so it is within the realm of possibility that this amount of
data is being deleted, and that content may be restored."

But I have a small question! I hydrated some text files towards the end of last month; I got them from https://github.com/echen102/COVID-19-TweetIDs/tree/master/2020-05, which were full of tweet IDs in many text files.

So for example, from the day of 05-11-2020, the deletion rate at the time of hydration was 54%. But, today we were double checking our data and re-hydrated some files, including the one from 05-11-2020, and the deletion rate of that text file changed to 10%.

How could that be, Mr. Summers? Could it be that Twitter may have been moderating tweets, so Hydrator was not able to retrieve those tweets? I can't seem to find an answer about that. The input file itself hasn't changed, but the deletion rate went from 54% -> 10%, even after some time.

Thank you. Let me know what I can provide you; I would be happy to help in any way. We have been using v0.0.7 since May, because it was approved by the IT department.

How to Save Files

Sorry in advance if this a basic question, but I am a textbook beginner when it comes to this sort of thing. I am able to read tweet IDs and hydrate them just fine. My confusion comes when I save the file of hydrated tweets. My goal is to open the hydrated tweets file in a Jupyter notebook (using python) and extract the tweet text to analyse it. What is the best way to save the hydrated tweets to be able to open them in this manner? This is for a school project, so any learning assistance would be appreciated.
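One way to approach this, sketched here under assumptions: save the hydrated tweets as JSON (one tweet object per line) and read that file line by line in Python. The field names `full_text` and `text` follow Twitter's v1.1 tweet format; `tweet_texts` and the file name `tweets.jsonl` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def tweet_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each tweet in a line-per-tweet JSON file."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Prefer the extended text when present, fall back to the short one.
        yield tweet.get("full_text") or tweet.get("text", "")


# In a notebook cell (file name is an example):
# with open("tweets.jsonl", encoding="utf-8") as handle:
#     texts = list(tweet_texts(handle))
```

Because the function reads one line at a time, it does not need to hold the whole file in memory before the text is extracted.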

hydrator keeps crashing

Ran into an error. Won't work. Uninstalled, reinstalled but still showing the error. Any solution?

Generated CSV file does not contain tweets

After my first use of Hydrator (hydrated one set of tweets successfully) any set of Tweet IDs uploaded to Hydrator produces a CSV file with absolutely nothing - size 0 bytes. This includes the first set that went through successfully.

Already tried reinstalling.
Running Windows 10.

Automated Data Processing At Scale

Does this work on multiple CSV files? I'm thinking of making an automated hydration job based on the contents of a single folder either locally or on a server. Is there a way to automate access to the application through a CLI or run it locally to parse through a directory?

Feature Request: 32-bit ARM Debian/Ubuntu release

Sorry if this isn't the right place for this, but I've spent most of the day today trying to get this running on a Raspberry Pi, and was hoping you might release a 32-bit ARM .deb release in the same manner as the others. I figure Raspberry Pis have to be a pretty popular IoT device for tasks like this, so I imagine it would be helpful to others too.

Regardless of whether you get around to it/ it's feasible, I'm grateful for this application- it's been incredibly helpful (the recovery feature is amazing! It really saved me when I lost power/internet, and when someone turned off the computer the process was running on). Thank you so much!

converting to csv before completion

Hey again!

The hydrator is working great in Linux. However, it would be great to have the option to convert the json files to csv even at any point in the hydration, not only when it's completed. I'm trying to hydrate 40+ M tweets and the .json file takes a lot of space that I'll send to the trash after completion.

Another option is to split the hydration into chunks of customizable size (or number of tweets) and being able to convert each chunk to csv instead of waiting for the whole thing to finish.
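Outside the app, a JSONL file can be converted to CSV one line at a time, so memory use stays small regardless of file size. The following is a minimal sketch, not Hydrator's exporter: the three columns are a hypothetical choice, and the field names assume Twitter's v1.1 tweet format. A file that is still being written may end in a partial line, which this sketch does not handle.

```python
import csv
import json
from typing import Iterable, Iterator, List

HEADER = ["id", "created_at", "text"]  # hypothetical column selection


def tweet_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield a header row, then one CSV row per tweet, without buffering."""
    yield list(HEADER)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        yield [
            tweet.get("id_str", ""),
            tweet.get("created_at", ""),
            tweet.get("full_text") or tweet.get("text", ""),
        ]


def convert(jsonl_path: str, csv_path: str) -> None:
    """Stream a JSONL file into a CSV file row by row."""
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(tweet_rows(src))
```

Splitting the ID file into smaller files beforehand and running `convert` on each resulting JSONL file would give the chunk-by-chunk workflow suggested above.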

Not sure if this is easy to do or not, let me know if I can help somehow.

Blas

CSV tweet text truncated

Retweets can truncate the original tweet text if it is over 140 characters. But the original text is available in the enclosed retweet: .retweeted_status.full_text. The CSV export should get the longest representation of the text. I think it may need to do the same thing for quotes.

See 1221739807711232000 for an example. But really any long tweet that has been retweeted should do.
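A possible selection rule for the text column is sketched below. It is an assumption about how an exporter could behave, not Hydrator's actual code: `best_text` is a hypothetical name, and for a retweet it returns the original tweet's text, which drops the "RT @user:" prefix. Quote tweets are left alone here because the quoting tweet has its own text.

```python
def best_text(tweet: dict) -> str:
    """Return the fullest available text for a tweet object."""
    retweet = tweet.get("retweeted_status")
    if retweet:
        # The enclosed original carries the untruncated text.
        return best_text(retweet)
    extended = tweet.get("extended_tweet") or {}
    return (
        extended.get("full_text")
        or tweet.get("full_text")
        or tweet.get("text")
        or ""
    )
```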

Dataset Description

It could be useful to add a description metadata field for a prose description of the dataset.

A typo in the generated CSV file: reweet_id

When generating the CSV file out of the JSONL hydration, there's a column which I use to check if a tweet is a retweet or original: reweet_id, and it should be retweet_id I assume.

Unable to build from source on macOS

npm run dev works well for testing the hot-swappable version but when I try to package the app using npm run package-all I receive the following error on macOS 10.12.2, node v7.2.1 and the packaging process fails to build the binary.

Error: Cannot check wine version: Error: Exit code: ENOENT. spawn wine ENOENT
    at /Users/machawk1/Downloads/hydrator/node_modules/electron-builder/src/packager.ts:314:13
    at Generator.throw (<anonymous>)
From previous event:
    at checkWineVersion (/Users/machawk1/Downloads/hydrator/node_modules/electron-builder/out/packager.js:57:22)
    at /Users/machawk1/Downloads/hydrator/node_modules/electron-builder/src/packager.ts:143:17
From previous event:
    at Packager.doBuild (/Users/machawk1/Downloads/hydrator/node_modules/electron-builder/out/packager.js:275:11)
    at /Users/machawk1/Downloads/hydrator/node_modules/electron-builder/src/packager.ts:114:38
    at Generator.next (<anonymous>)
    at runCallback (timers.js:649:20)
    at tryOnImmediate (timers.js:622:5)
    at processImmediate [as _immediateCallback] (timers.js:594:5)
From previous event:
    at Packager.build (/Users/machawk1/Downloads/hydrator/node_modules/electron-builder/out/packager.js:227:11)
    at /Users/machawk1/Downloads/hydrator/node_modules/electron-builder/src/builder.ts:249:40
    at Generator.next (<anonymous>)
From previous event:
    at build (/Users/machawk1/Downloads/hydrator/node_modules/electron-builder/out/builder.js:90:21)
    at Object.<anonymous> (/Users/machawk1/Downloads/hydrator/node_modules/electron-builder/out/cli/build-cli.js:68:41)
    at Module._compile (module.js:571:32)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:488:32)
    at tryModuleLoad (module.js:447:12)
    at Function.Module._load (module.js:439:3)
    at Module.runMain (module.js:605:10)
    at run (bootstrap_node.js:420:7)
    at startup (bootstrap_node.js:139:9)
    at bootstrap_node.js:535:3

npm ERR! Darwin 16.3.0
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "package-all"
npm ERR! node v7.2.1
npm ERR! npm  v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] package-all: `npm run build && build -mwl`
npm ERR! Exit status 255
npm ERR! 
npm ERR! Failed at the [email protected] package-all script 'npm run build && build -mwl'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the electron-react-boilerplate package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     npm run build && build -mwl
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs electron-react-boilerplate
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls electron-react-boilerplate
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /Users/machawk1/Downloads/hydrator/npm-debug.log

Authentication issue - TwitterPinAuth's requestAuthURL (through the "Link Twitter Account" button)

I'm attempting to run the program through yarn run develop.

Hitting "Link Twitter Account" errors out in the requestAuthURL call (provided by twitter-pin-auth). For the specific output, I get

getting Twitter auth url
error when getting Twitter auth url
{
statusCode: 401,
data: '{"errors":[{"code":32,"message":"Could not authenticate you."}]}'
}

I receive this error on both Ubuntu 20.04.1 LTS and on Raspbian, both running on a 32-bit ARM Raspberry Pi B.

Consistent Data Loss over Multiple Pulls

I ran 14 datasets through the tool and each returned a dataset with roughly 33% data loss. However, I've noticed that screenshots of other pulls have differing values. Is my consistent loss due to my Developer Account/App permissions or is it just chance and simply due to the dataset?

CSV option is greyed out

I just got done with a 100 GB JSONL file but the CSV option is greyed out. How can I solve this?

Nothing happens when I click the button "Link Twitter Account".

I installed the Hydrator on Windows (using Hydrator-Setup-0.0.12.exe). And then I clicked the button "Link Twitter Account" under settings, but there was no response. Is there anything else I need to do?
I got some Twitter ids from the public data set. And I want to hydrate Twitter ID datasets. In addition, I applied for Twitter developer account but failed. So I only have a regular Twitter account and do not have a Twitter developer account. Can the Hydrator work without a Twitter developer account? Hope to get a reply soon, thank you!

Hydrator not reading tweets

Hi
I have installed Hydrator on my Windows system and have also linked my Twitter account. But when I start hydrating, no tweet IDs are read.
It was working before, but my system shut down due to some power issues, and since restarting, no tweet IDs are being read.
Please help.

Hydrator slowing down

I noticed that Hydrator slowed down to a snail's pace when hydrating 18,000,000-plus tweet IDs. Hydration should have taken approximately 50 hours, but it has now been running for 49 hours and has barely passed the 7,000,000 tweet mark. It's currently churning out 1,400 tweets per minute, which translates to 84,000 per hour. Could you look into this?

A JavaScript error occurred in the main process

OS : Windows 10

Problem: Each time I try hydrating a big dataset, I face this error, though it does not show up for smaller datasets. Can you please look into this issue?
A screenshot of the error has been attached.

Time Estimate

Since Hydrator knows how many tweets there are and how long they are taking to hydrate (as well as the rate limits) it could estimate how long the dataset will take to hydrate.
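The simplest version of such an estimate extrapolates from the observed rate, which already reflects any rate limiting that occurred so far. The function below is a sketch with hypothetical names; it assumes the rate stays roughly constant for the rest of the run.

```python
from typing import Optional


def estimate_remaining_hours(
    total: int, done: int, elapsed_hours: float
) -> Optional[float]:
    """Estimate hours left, given progress so far.

    Returns None when there is not yet enough progress to compute a rate.
    """
    if done <= 0 or elapsed_hours <= 0:
        return None
    rate = done / elapsed_hours  # tweets per hour observed so far
    return max(total - done, 0) / rate
```

For example, a dataset of 1,000,000 IDs with 250,000 processed after 2 hours has an observed rate of 125,000 per hour, so the estimate for the remaining 750,000 is 6 hours.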

Limit of records

Hi, I want to know the limit on the number of records that Hydrator can hydrate. Will I be able to hydrate a total of 125,680,841 tweets through the Twitter API?

Estimate Completion Time

It might be useful for Hydrator to guess at how long hydration might take. I'm hydrating 13M tweets and it's clearly working, but it's not very clear how long it will likely take.

Notice of Authentication Errors

I've noticed that I need to reauthorize my Hydrator app after shutting down. It doesn't seem to happen all the time, but when it does the Hydrator just fails silently. It should instead report the problem visually and let the user relink their Twitter account.

Tweet id file error

While trying to start hydrating a txt file with only the tweet ids, I get the following message:
[screenshot of the error message, not preserved]

Active CSV Generation

When generating CSV data the CSV button is grayed out slightly. It would be better to have something that is more obvious, perhaps an animation of some kind?

Tweets are limited to 140 characters

Hi,
Currently the fetched tweets are limited to 140 characters. See below:
"display_text_range":[0,140].
Is there any option to fetch all 280 chars?
Thanks

Report JSON parsing errors.

Getting this error when I run a small sample of ~300 tweet IDs. None of the IDs are getting hydrated.
This is what the tool shows:
[screenshot of the error, not preserved]

Clean

When pointed at a JSONL file Hydrator could rehydrate or clean the dataset, filtering out any tweets that have been deleted. In addition, the Hydrator could allow the user to enter an optional list of accounts to exclude from the dataset. This would cover cases where someone has asked for their data to be removed, without forcing that person to delete or protect their tweets.
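The filtering step of this proposal might look like the sketch below. It assumes the set of still-available IDs has already been obtained by rehydrating, and that each line is a v1.1 tweet object with `id_str` and `user.screen_name`; `clean_dataset` and its parameters are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Set


def clean_dataset(
    lines: Iterable[str],
    still_available: Set[str],
    excluded_users: Set[str],
) -> Iterator[str]:
    """Yield only JSONL lines for tweets that still exist and whose
    author is not on the exclusion list."""
    excluded = {name.lower().lstrip("@") for name in excluded_users}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        author = (tweet.get("user") or {}).get("screen_name", "").lower()
        if tweet.get("id_str") in still_available and author not in excluded:
            yield line
```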

(Thanks to @bergisjules for the use case and the name).

File removal error?

The app should probably display some error message when the underlying tweet ID file has been removed. Right now if you remove a tweet id file during hydration the app just stops, and it's not clear what is wrong.

Memory Leak

If you let Hydrator chew on a large number of tweet identifiers for a while (a few hours) you will notice that there is a memory leak somewhere. This can result in a large memory footprint that can be problematic for machines with limited memory. In principle it should be able to work with a very small memory footprint.

CPU Usage

As Hydrator reads deeper into the tweet id file it consumes more of the CPU. After doing 3 million or so it uses 50% of the CPU on my MacBook Air (1.8 GHz Intel Core i5).

I think this is because for every API request readTweetIds reads the first n lines of the file to get the next 100. Ideally there would be an open filehandle that would be read from rather than having to constantly open/re-open the file.
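The open-filehandle idea can be illustrated with a generator that keeps its position in the file and hands out the next batch on demand, so each line is read exactly once. This is a Python sketch of the technique only (Hydrator itself is an Electron application); `id_batches` is a hypothetical name and the batch size of 100 comes from the description above.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def id_batches(handle: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of tweet IDs from an open file handle.

    The handle is consumed lazily, so the cost per batch does not grow
    with how far into the file the reader already is.
    """
    ids = (line.strip() for line in handle if line.strip())
    while True:
        batch = list(islice(ids, size))
        if not batch:
            return
        yield batch


# Example use with a file of IDs, one per line (file name is an example):
# with open("ids.txt", encoding="utf-8") as handle:
#     for batch in id_batches(handle):
#         ...  # send one API request per batch
```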

Partial CSV file with disabled CSV button when Hydrator is closed too soon

I was hydrating half a million tweets and, since it was my first time using Hydrator, I clicked the CSV button as soon as hydration finished, and once I saw the CSV file I innocently closed Hydrator. However, the CSV file was incomplete: it contained around 300k tweets rather than 500k. When I tried to re-save the CSV file from the Datasets tab, I couldn't, as the button was disabled.

I tried to convert the huge JSONL to CSV but my machine couldn't handle it (16GB RAM), so as a quick workaround I re-hydrated all the tweets again and saved the CSV file properly this time.

Just reporting what's happened here because I feel it's tantamount to a bug. I am using Windows 10 by the way.

Hydrator getting stuck on "add dataset" on Mac

I recently installed the latest release of hydrator and was able to open it up and link my twitter account, but when I try to add a dataset of tweets I get this error message. I might be missing something obvious here, but any help would be greatly appreciated!

[screenshot of the error message, taken 2020-04-21, not preserved]

Unexpected End of JSON input

I am getting this error upon trying to save as a CSV. It saves correctly as JSON and this only happens when I click the csv button. I've tried it on different sized data sets as well.
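One possible cause, which the report does not confirm, is a truncated or otherwise malformed line in the JSON file. A quick way to check is to parse the file line by line and record which lines fail instead of stopping at the first error. `parse_jsonl` is a hypothetical helper for that check.

```python
import json
from typing import Iterable, List, Tuple


def parse_jsonl(lines: Iterable[str]) -> Tuple[List[dict], List[Tuple[int, str]]]:
    """Parse line-delimited JSON, collecting errors instead of raising.

    Returns (parsed objects, list of (line number, error message)).
    """
    tweets: List[dict] = []
    errors: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return tweets, errors
```

If `errors` is non-empty, the reported line numbers show where the file is damaged, and the remaining tweets can still be exported.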
