massmove / attackvectors


A repository to monitor attack vectors from state-backed information operations

License: MIT License

HTML 82.56% Python 8.12% CSS 0.01% C# 9.31%
disinformation osint-reconnaissance transparency activism politics

attackvectors's Introduction

MassMove

A repository to organize group efforts to steer public opinion towards the interests of the masses

attackvectors's People

Contributors

axelstudios, bermos, ccarlton, codehtmai, darkmeat, heptoxide, iamcoder, jbc22, kamoh, karan, key-equivalent, kleprevost, lmoroney, mariotacke, mentor20, tonylizza, xboatvanx, zanetaylor

attackvectors's Issues

Aggregate other "publications"

Not sure if you're aware already, but each publication has what looks like a list of other publications in the same region at the bottom of the site. Shouldn't be too difficult to whip up a web crawler to aggregate a list of all of their sites to add here. I can take a crack at it this weekend.
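
A minimal sketch of the link-collection half of such a crawler, assuming the related publications appear as ordinary `<a href>` links in the page HTML. The names `LinkCollector` and `sibling_domains` are hypothetical, and the function collects every external hostname on the page rather than only the footer list, so results still need manual review.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def sibling_domains(html, own_domain):
    """Return the sorted external hostnames linked from the page."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for href in collector.hrefs:
        # Relative links have no hostname and are skipped.
        host = (urlparse(href).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host != own_domain:
            hosts.add(host)
    return sorted(hosts)
```

Feeding each newly found hostname back into the same function (with a visited set) would turn this into the crawl described above; fetching the pages is left out here.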

Is there a single source of truth for the fake local journals?

I'm working on a Firefox extension that uses the list of fake local journals, but I've also seen a few other lists/csv files around the repo and I just want to make sure that I'm pulling the list of confirmed fake local journals and that there isn't any other list in the repo that I should be using.

To make things a little clearer, I'm making a Firefox extension that injects a warning to sites that are known to be fake local journals.

Update based on fb:pages

Hey, folks! Love your work!

I've been looking at <meta name="fb:pages" content="(\d+)" /> tags, and although I've confirmed most of the Facebook pages reported in sites.csv, some are apparently pointing to different pages, while some are new (not tracked in sites.csv). What I did was take the FB page ID from the scraped sites and open facebook.com/:id. Following is a list of FB page IDs, urls, and domains listed in sites.csv that don't match what's currently in the repo, or are not filled in.

440850842779072 - https://www.facebook.com/dupagepolicyjournal/
annearundeltoday.com - new
baltimorecitywire.com - new
northbaltimorejournal.com - new

299588323424419 - https://www.facebook.com/legalnewsline/
cookcountyrecord.com - stored here as facebook/cookcountyrecord
socalrecord.com - new

573209409408570 - https://www.facebook.com/CookCountyRecord/
setexasrecord.com - stored here as facebook/SETexasRecord
stlrecord.com - stored here as facebook/stlrecord
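
The check described above can be sketched roughly as follows. This assumes the tag appears exactly in the `<meta name="fb:pages" content="...">` form quoted in this issue; `fb_page_ids` and `group_by_page_id` are hypothetical helper names, and fetching the pages is left out.

```python
import re
from collections import defaultdict

# Matches the tag form quoted in the issue; other attribute orders are not handled.
FB_PAGES_RE = re.compile(
    r'<meta\s+name="fb:pages"\s+content="(\d+)"\s*/?>', re.IGNORECASE
)


def fb_page_ids(html):
    """Return every fb:pages ID found in one page's HTML."""
    return FB_PAGES_RE.findall(html)


def group_by_page_id(pages):
    """Map each Facebook page ID to the sorted domains that declare it.

    `pages` is a dict of domain -> already-downloaded HTML.
    """
    groups = defaultdict(list)
    for domain, html in pages.items():
        for page_id in fb_page_ids(html):
            groups[page_id].append(domain)
    return {page_id: sorted(domains) for page_id, domains in groups.items()}
```

Domains that end up under the same ID are the ones worth comparing against the Facebook column in sites.csv.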

As an aside

What's interesting to me is the Legal Newsline connection. I initially started scraping for "GET THE APP" (before noticing there's already a column in the CSV for that), and was looking at the developer "The Record, Inc.". Their apps use the same orange shower-looking thing that's in the Legal Newsline logo. I did a reverse image search for that and found another developer that has since changed logos: Right Mobile Pty Ltd, from this search. They have a bunch of Republican/conservative apps, but I haven't looked into it beyond that. I guess it makes sense if they're looking to change laws.

Not sure how much help this is, but I had fun.

Keep up the good work!

Edit: I may have misinterpreted the reverse image search results. I think it's catching the related apps section.

investigate creation date of domain

source of new domains

By googling some of the domains currently listed, I stumbled upon this website:
https://domain-status.com/archives/2019-8-30/com/registered/221

It listed westtxnews.com, which is already in this repo, and by simply searching the page for "news" I found a few more suspicious websites, all showing the 404 message.

suspicious pages:

westcontracostanews.com
westdfwnews.com (already listed)
westeldoradonews.com
westhoustonnews.com (already listed)
westnovanews.com (already listed)
westrgvnews.com (already listed)
westsgvnews.com
westventuranews.com

All of the unlisted domains seem to be on the same AWS server. I've listed what I found below so it's recorded. I'm sure there are more on this server, but I don't know how to get all of them.

Namely:
reverse lookup on westcontracostanews.com (3.222.217.66)
alohastatenews.com
antelopevalleytoday.com
beaverstatenews.com
eastkingnews.com
evergreenreporter.com
kitsapreview.com
moseslaketoday.com
newashingtonnews.com
northkingnews.com
northsnohomishnews.com
nwwashingtonnews.com
olympictimes.com
piercetoday.com
seattlesounder.com
sewashingtonnews.com
southkingnews.com
southsnohomishnews.com
southsoundtimes.com
spokanecotimes.com
spokanestandard.com
tricitiesreporter.com
vancouverreporter.com
waislenews.com
wenatcheetimes.com
westcontracostanews.com
westeldoradonews.com
westventuranews.com
yakimatimes.com

None of the above are listed right now.
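
A true reverse lookup (every domain on an IP) needs a third-party dataset, but checking whether a set of candidate domains resolves to the same address can be sketched with the standard library. `group_by_ip` is a hypothetical helper; the `resolve` parameter exists only so the lookup can be swapped out for testing.

```python
import socket
from collections import defaultdict


def group_by_ip(domains, resolve=socket.gethostbyname):
    """Group domains by the IPv4 address they resolve to.

    Domains that fail to resolve are collected under the key None.
    """
    groups = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            ip = None
        groups[ip].append(domain)
    return dict(groups)
```

For example, `group_by_ip(["westcontracostanews.com", "yakimatimes.com"])` would show whether both still point at the same server; the addresses may have changed since this issue was written.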

Facebook Page Transparency details and new networks

While investigating tonight I noticed a new feature on Facebook Pages called Page Transparency. This has info about the page ownership that could be useful to connect the dots and identify other pages.

I have seen 2 different designations so far for page ownership.

  • Confirmed Page Owner
  • Page Owner

There are sites that have been identified as connected in this that can easily show other related sites with a quick Google site search.

site:facebook.com "Local Government Information Services"

Searching for the apparent owner of many of the initial sites from Illinois comes up with around 363 results. Most are these types of sites, but I did find 2 that do not fit the news mold, but are definitely pushing their core issues.

Some other owner names from other Facebook pages we are tracking in the CSV.

  • Local Government Information Services
  • The Record

Many others listed have unnamed owners, but show Metric Media banners across the top of the page.

Looking into the pages owned by The Record shows the following ownership information:

Page Owner: The Record
Rolling Meadows, IL 60008
United States of America
+13127362092

Additional investigation into Metric Media has uncovered some interesting information, and a new network of sites labeled "Product of LocalLabs" in the footer with the author "Metric Media News Service". See https://collegeparktoday.com/author/metric-media-news-service

A NY Times article about Metric Media and who runs it.
https://www.nytimes.com/2019/10/21/us/michigan-metric-media-news.html

His bio page https://www.situationmanagementgroup.com/brad-bio.

President of Metric Media LLC, a national media company that operates more than 1,100 community-based news sites.

Oddly enough LGIS and LocalLabs were both named in a complaint by the FEC for their original scheme in Illinois. https://www.fec.gov/files/legal/murs/7148/19044475209.pdf

Post authors are active on all sites

This is something I discovered while weeding out the automated GasBuddy.com, census.gov and FB page reposts submitted by the authors "Press release submission" and MMNS. Outside of those automated blog posts there are actual content writers who are submitting their own work (I didn't look hard for signs of plagiarism), but here's the catch...

These content writers are submitting content to sites in multiple states. They note "content writer" as occupation in their public LinkedIn profile, and probably make a legit living out of it by just picking up news from across the web and reporting on them on the appropriate local site. I guess that's a thing people do, so maybe nothing iffy about it. But then comes the kicker...

They apparently own accounts on ALL sites even if they don't post there. This leads me to think there's some central CnC admin that's managing all of these sites. I wouldn't be surprised if they don't even know where their blog post is ending up and it's someone else approving and pushing their content.

  1. Pick any site with the black top bar template (not the red ones - I haven't tried those)
  2. Go to a category and find a post written under any person's name. If you can't find any then try another site.
  3. Click the person's name, or if the name is not clickable convert it to a slug string and go to site-a.com/author/john-doe
  4. Copy "/author/john-doe" from the URL and append it to any other site that uses the same template: site-b.com/author/john-doe.

You will either see a list of their posts on the second site, or the message "No Results" which means they have an account there but haven't posted anything. If the account didn't exist the page would render latest news instead (try it with a gibberish author slug).
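
The steps above can be sketched as follows. Everything here is an assumption drawn from this description only: the slug rule (lowercase, hyphen-separated), the `/author/<slug>` path, and the three page outcomes. `author_slug`, `author_urls` and `classify_author_page` are hypothetical names, and the classifier is a rough heuristic over already-downloaded HTML.

```python
import re


def author_slug(name):
    """Convert a display name to the assumed slug form: 'John Doe' -> 'john-doe'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def author_urls(name, sites):
    """Build the author page URL to try on each site."""
    slug = author_slug(name)
    return [f"https://{site}/author/{slug}" for site in sites]


def classify_author_page(html, name):
    """Guess which of the three outcomes described above a page shows."""
    if "No Results" in html:
        return "account, no posts"
    if name.lower() in html.lower():
        return "account with posts"
    # Per the description, an unknown slug renders the latest news instead.
    return "no account (latest news shown)"
```

Running `author_urls` for one writer across every site with the same template, then classifying each response, would give the per-writer account map suggested at the end of this issue.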

I might dig some more into this, maybe do a list of all content writers and see where it goes from there. What do you think, any suggestions?

Found another publication: *businessdaily.com

awsOrigin: 34.234.42.207
domainName: ec2-34-234-42-207.compute-1.amazonaws.com

domains:

albusinessdaily.com
akbusinessdaily.com
azbusinessdaily.com
arbusinessdaily.com
cabusinessdaily.com
cobusinessdaily.com
ctbusinessdaily.com
debusinessdaily.com
flbusinessdaily.com
gabusinessdaily.com
hibusinessdaily.com
idbusinessdaily.com
ilbusinessdaily.com
indianabusinessdaily.com
iabusinessdaily.com
ksbusinessdaily.com
kybusinessdaily.com
keystonebusinessnews.com
labusinessdaily.com
mainebusinessdaily.com
mdbusinessdaily.com
mabusinessdaily.com
mibusinessdaily.com
mnbusinessdaily.com
msbusinessdaily.com
mobusinessdaily.com
mtbusinessdaily.com
nebraskabusinessdaily.com
nvbusinessdaily.com
nhbusinessdaily.com
njbusinessdaily.com
nmbusinessdaily.com
nybusinessdaily.com
ncbusinessdaily.com
ndbusinessdaily.com
ohbusinessdaily.com
okbusinessdaily.com
orbusinessdaily.com
palmettobusinessdaily.com
ribusinessdaily.com
sdbusinessdaily.com
tnbusinessdaily.com
txbusinessdaily.com
utbusinessdaily.com
vtbusinessdaily.com
vabusinessdaily.com
wabusinessdaily.com
dcbusinessdaily.com
wibusinessdaily.com
wybusinessdaily.com

I don't have time to crawl these for title, facebook url tonight, but feel free to add them to sites.csv once we merge #10

Bad Locations/GeoLoc's.

columbustandard.com is for Columbus, GA not Columbus, OH
bluestonenews.com is for somewhere in WV, not Columbus, OH

Domain naming patterns

While looking through the currently identified domains and finding some new ones, I noticed some patterns to the domain naming convention and started listing the familiar names used in news publications on the end of the domains.

The naming convention consists of two or three parts.

  • A cardinal direction (optional, but heavily used)
  • A geographical location, state, county, city, or town (required)
  • A familiar name used in existing news publications (required)

Cardinal Direction Examples

centraloctimes.com
northoctimes.com
southoctimes.com
westoctimes.com
eastoctimes.com is not registered currently.

Geographical Location, State, County, City, Town Examples

centralalamedanews.com
centraloregontimes.com
coachellatoday.com
eastsierranews.com
fresnoleader.com

Familiar Names Used in News Publications

news
times
reporter
sun
today
standard
leader
review
courant
sentinel
republic
wire
journal
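
As a rough illustration, the convention can be expressed as a regular expression. The direction list (including "central") and the restriction to .com are assumptions taken from the examples in this issue, and `split_domain` is a hypothetical helper; domains that don't fit the pattern simply return None.

```python
import re

DIRECTIONS = ("central", "north", "south", "east", "west")
NAMES = (
    "news", "times", "reporter", "sun", "today", "standard", "leader",
    "review", "courant", "sentinel", "republic", "wire", "journal",
)

# Optional direction, then the shortest location that leaves a known name.
PATTERN = re.compile(
    r"^(?P<direction>%s)?(?P<location>[a-z]+?)(?P<name>%s)\.com$"
    % ("|".join(DIRECTIONS), "|".join(NAMES))
)


def split_domain(domain):
    """Return (direction, location, name) or None if the domain doesn't fit."""
    match = PATTERN.match(domain.lower())
    if not match:
        return None
    return match.group("direction"), match.group("location"), match.group("name")
```

For example, `split_domain("centraloctimes.com")` gives `("central", "oc", "times")`, while `split_domain("fresnoleader.com")` gives `(None, "fresno", "leader")`.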

Blocklist for Pi-Hole?

Hi,

Could you also make a blocklist that works with Pi Hole? https://pi-hole.net/.

I think it's almost exactly the same as the uBlock one, just without the pipes at the beginning of each domain.
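
A minimal sketch of that conversion, assuming the uBlock list uses the common `||example.com^` form with `!` comment lines (the exact format of the list in this repo isn't shown here). `ublock_to_pihole` is a hypothetical helper name.

```python
def ublock_to_pihole(lines):
    """Convert uBlock-style filter lines to plain domains, one per entry."""
    domains = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and uBlock comments
        if line.startswith("||"):
            line = line[2:]  # drop the leading pipes
        line = line.rstrip("^")  # drop the trailing separator, if present
        domains.append(line)
    return domains
```

Writing the result with `"\n".join(...)` gives a plain one-domain-per-line file of the kind Pi-hole can consume.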

Thanks!!

Suggestion: A writer.

I can tell that the data here is an excellent tool to reveal the machine and its manipulation of the unaware and non-savvy (especially older people).

The problem is that, even for someone aware and savvy, it's difficult to use the data to spell out the meaning in terms that can be understood by those that need to understand.

You might try to get in touch with the EFF, or one of the many free-press organizations. They can put you in touch with a writer, editor, or someone with publication skills, to make this information presentable to people that have no idea how any of this works.

Add state to CSV

The CSV has lat and lng but no state. When I found this project, the first thing I wanted to do was check which news sites were local to me, but it's hard to do right now.

I used a spreadsheet formula and these bounding boxes to get my answer, but it should be easier.
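
The bounding-box approach can be sketched like this. The two boxes below are rough illustrative values only, not the data set referenced above, and `candidate_states` is a hypothetical helper. Because state bounding boxes overlap, it returns every matching state rather than a single answer.

```python
# (min_lng, min_lat, max_lng, max_lat); rough values for illustration only.
STATE_BOXES = {
    "IL": (-91.51, 36.97, -87.49, 42.51),
    "IN": (-88.10, 37.77, -84.78, 41.76),
}


def candidate_states(lat, lng, boxes=STATE_BOXES):
    """Return the sorted states whose bounding box contains the point."""
    return sorted(
        state
        for state, (min_lng, min_lat, max_lng, max_lat) in boxes.items()
        if min_lat <= lat <= max_lat and min_lng <= lng <= max_lng
    )
```

A point near a border can match more than one box, which is one reason a real state column in the CSV would be easier to use.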

Geotag domains

It would be great to find approximate coordinates for the cities and regions having dedicated domains. This would show us concentrations of misinformation. This information would be most useful when it is added to the sites.csv file for easy processing.

From what I can gather, the domains roughly follow this pattern:
<city|region>[reporter|times|leader|today|news|wire|standard|sun|review]

Stripping away the suffix and running the city/region through Google maps should give us a general idea in most cases.

https://metrobusinessnetwork.com/

Add domains

Found these via AddThisPubID tracking ID ra-572bcb5be71832bd with highereducationtribune.com being the initial pivot point:

hostname firstSeen lastSeen
megadealernews.com 2017-06-15 15:57:33 2019-12-01 00:43:23
highereducationtribune.com 2019-08-06 18:09:12 2019-10-15 12:44:39
americansecuritynews.com 2016-06-12 15:07:07 2019-09-16 15:49:55
hrdailywire.com 2017-05-30 23:13:24 2018-09-11 19:16:29
maghrebnewswire.com 2017-08-08 20:24:26 2017-10-19 15:01:57
dentalhealthwire.com 2016-10-12 18:30:31 2017-09-24 20:54:58
tinewsdaily.com 2016-06-02 16:55:22 2017-09-20 01:12:31
americanpharmacynews.com 2016-05-18 13:04:55 2017-09-15 21:27:05
gulfnewsjournal.com 2016-05-10 02:17:32 2017-09-10 13:23:48
patientdaily.com 2016-05-27 23:15:13 2017-08-26 06:13:05
balkanbusinesswire.com 2017-01-24 22:42:51 2017-08-14 18:52:28
latinbusinessdaily.com 2017-06-12 13:56:29 2017-06-12 13:56:29
cropprotectionnews.com 2016-05-22 22:01:23 2017-06-06 07:20:09
vaccinenewsdaily.com 2016-05-22 07:41:24 2017-05-24 22:01:50
farminsurancenews.com 2017-03-30 04:00:01 2017-03-30 04:00:01
aminewswire.com 2016-05-26 23:31:25 2017-03-12 11:04:32
flarecord.com 2016-06-03 17:51:43 2017-03-09 12:34:36
westafricawire.com 2016-05-24 01:14:45 2017-02-05 18:40:04
bioprepwatch.com 2016-11-09 05:38:46 2016-11-09 05:38:46
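
For anyone repeating this pivot on freshly downloaded pages, here is a small sketch that pulls AddThis publisher IDs out of page source. It assumes the ID appears as `pubid=ra-...` in the widget script URL; `addthis_pubids` and `sites_sharing` are hypothetical helper names.

```python
import re

PUBID_RE = re.compile(r"pubid=(ra-[0-9a-f]+)", re.IGNORECASE)


def addthis_pubids(html):
    """Return the sorted, de-duplicated AddThis publisher IDs in a page."""
    return sorted({pubid.lower() for pubid in PUBID_RE.findall(html)})


def sites_sharing(pubid, pages):
    """Return the sorted domains whose HTML contains the given publisher ID.

    `pages` is a dict of domain -> already-downloaded HTML.
    """
    wanted = pubid.lower()
    return sorted(
        domain for domain, html in pages.items() if wanted in addthis_pubids(html)
    )
```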

Code to add more domains

By including the code to append to the .csv files containing the domains, it would be much easier to create a pull request with more websites.
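
One possible shape for such a helper is sketched below. It assumes the CSV has a header row with a column named `domain` and that the file ends with a newline; both are guesses about sites.csv, and `append_domains` is a hypothetical name.

```python
import csv


def append_domains(csv_path, new_domains, column="domain"):
    """Append domains not already in the CSV; return the ones added."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        known = {row[column].strip().lower() for row in reader}

    added = []
    with open(csv_path, "a", newline="") as f:
        # restval leaves every other column empty for later filling in.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        for domain in new_domains:
            domain = domain.strip().lower()
            if domain and domain not in known:
                writer.writerow({column: domain})
                known.add(domain)
                added.append(domain)
    return added
```

Because duplicates are skipped, the same list can be passed in repeatedly without cluttering a pull request.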

newsbreak.com

This is not on the same network as the others. I'm not entirely sure it isn't legit, but the facebook page and outbound links are extremely suspicious including sites on this list. It could just be an innocent aggregator though. There's also newsbreak app. If this isn't legit, I don't think it's related to Timpone's network of sites.

Related entities CSV

Perhaps we need some sort of related entities CSV to track companies and other organizations involved.

I've identified several already with some sort of relation to this, and just now came across image links in the source code on some of these sites that point to https://jnswire.s3.amazonaws.com/jns-media URLs. I did a quick search of jns media and didn't find much, but when I searched for jnswire, I came up with a few things.

A Twitter page named Journatic News Wire started in... wait for it... 2012. https://twitter.com/jnswire

Another quick search, this time for Journatic because it sounded familiar in relation to all of this, came up with a bunch of articles from 2012 about Journatic publishing fake bylines...

Something that stood out to me in my Facebook investigating yesterday was that many of these pages, through the Facebook Page Transparency feature, show that someone is managing the page from the Philippines. This quote from the NPR article stands out because they said they were doing the fake bylines with people from the Philippines...

"I don't know those communities, and I have no stake in them. And so it didn't matter to me that I found out all the information and I got it right," Smith said. "There is just something inauthentic about the whole process. And the picking of fake names for these writers in the Philippines is just a symptom of that."

It looks like Journatic has become Locality Labs (LocalLabs is running some SSL certs as per #19) through Brian Timpone, as he is marked as CEO on their LinkedIn page.

The Tow Center for Digital Journalism at Columbia Journalism School has done a fair amount of investigative legwork on this identifying 450 sites, 12 state networks, and 5 separate corporate entities involved in the scheme in Dec. of 2019. https://www.cjr.org/tow_center_reports/hundreds-of-pink-slime-local-news-outlets-are-distributing-algorithmic-stories-conservative-talking-points.php

Of the 450 sites we discovered, at least 189 were set up as local news networks across ten states within the last twelve months by an organization called Metric Media.

That number seems to have grown exponentially in the last few months, as Metric Media's president's bio says they have over 1,100 sites. https://www.situationmanagementgroup.com/brad-bio

President of Metric Media LLC, a national media company that operates more than 1,100 community-based news sites.

Lafayette, LA vs. Lafayette, IN

Hello, just wanted to let you know that the site listed for Lafayette, LA is actually a site for Lafayette, IN. Thank you for putting this project together! People really need to know about this. The actual page is lafayettereporter.com

Any interest in a twitter bot (like the reddit one)?

Based on the local journals sites file, I whipped up https://twitter.com/TakeoverBot. It doesn't run all the time, and isn't even built to handle all errors properly - it's ~70% done. But getting to 90+% would be a few more hours of work.

The repo is private (hard-coded keys lol), but I can share the code if there is interest in maintaining and researching the findings. Any takers?

It's relatively simple Go code with many likely optimizations still to make:

  • Using an actual DB for tracking tweets sent (instead of a local csv file)
  • Smarter rate limit handling
  • Discarding old tweets (say older than a week)
  • Saving aggregate data for analysis
  • Linking to a form to collect feedback
  • Possibly running in multiple threads (go func())
  • Variations of messages posted
  • User opt-out
