massmove / attackvectors

A repository to monitor attack vectors from state-backed information operations

License: MIT License

Languages: HTML 82.56%, C# 9.31%, Python 8.12%, CSS 0.01%

disinformation osint-reconnaissance transparency activism politics

attackvectors's Introduction

MassMove

A repository to organize group efforts to steer public opinion towards the interests of the masses

attackvectors's Issues

Aggregate other "publications"

Not sure if you're aware already, but each publication has what looks like a list of other publications in the same region at the bottom of the site. Shouldn't be too difficult to whip up a web crawler to aggregate a list of all of their sites to add here. I can take a crack at it this weekend.
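
A minimal sketch of the link-aggregation step, assuming the sibling publications appear as ordinary `<a href>` links in the page HTML (the footer structure is not confirmed here). `LinkCollector` and `external_domains` are hypothetical names, and fetching the page is left out on purpose:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def external_domains(html, own_domain):
    """Return the sorted set of hostnames linked from html, excluding own_domain."""
    parser = LinkCollector()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        host = urlparse(href).hostname  # None for relative links such as "/about"
        if not host:
            continue
        host = host.lower()
        if host.startswith("www."):
            host = host[4:]
        if host != own_domain:
            found.add(host)
    return sorted(found)
```

Links to social networks and ad services would show up too, so the result still needs filtering before anything is added to the list.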

Is there a single source of truth for the fake local journals?

I'm working on a Firefox extension that uses the list of fake local journals, but I've also seen a few other lists/csv files around the repo and I just want to make sure that I'm pulling the list of confirmed fake local journals and that there isn't any other list in the repo that I should be using.

To make things a little clearer, I'm making a Firefox extension that injects a warning to sites that are known to be fake local journals.

Update based on fb:pages

Hey, folks! Love your work!

I've been looking at <meta name="fb:pages" content="(\d+)" /> tags, and although I've confirmed most of the Facebook pages reported in sites.csv, some are apparently pointing to different pages, while some are new (not tracked in sites.csv). What I did was take the FB page ID from the scraped sites and open facebook.com/:id. Following is a list of FB page IDs, urls, and domains listed in sites.csv that don't match what's currently in the repo, or are not filled in.

440850842779072 - https://www.facebook.com/dupagepolicyjournal/
annearundeltoday.com - new
baltimorecitywire.com - new
northbaltimorejournal.com - new

299588323424419 - https://www.facebook.com/legalnewsline/
cookcountyrecord.com - stored here as facebook/cookcountyrecord
socalrecord.com - new

573209409408570 - https://www.facebook.com/CookCountyRecord/
setexasrecord.com - stored here as facebook/SETexasRecord
stlrecord.com - stored here as facebook/stlrecord
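
The fb:pages check described above can be sketched roughly as follows. It uses the same pattern quoted in this issue and groups domains by the page ID they declare; the function names are hypothetical and the HTML is assumed to have been downloaded already:

```python
import re
from collections import defaultdict

# Same pattern as quoted above: <meta name="fb:pages" content="(\d+)" />
FB_PAGES_RE = re.compile(r'<meta\s+name="fb:pages"\s+content="(\d+)"', re.IGNORECASE)


def extract_fb_page_ids(html):
    """Return every Facebook page ID declared in the page's fb:pages meta tags."""
    return FB_PAGES_RE.findall(html)


def group_domains_by_page_id(pages):
    """pages maps domain -> HTML; the result maps page ID -> list of domains."""
    groups = defaultdict(list)
    for domain, html in pages.items():
        for page_id in extract_fb_page_ids(html):
            groups[page_id].append(domain)
    return dict(groups)
```

Any ID shared by several domains, like the ones listed above, is a candidate for a manual look at facebook.com/:id.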

As an aside

What's interesting to me is the Legal Newsline connection. I initially started scraping for "GET THE APP" (before noticing there's already a column in the CSV for that), and was looking at the developer "The Record, Inc." Their apps use the same orange shower-looking thing that's in the Legal Newsline logo. I did a reverse image search for that and found another developer that has since changed logos: Right Mobile Pty Ltd, from this search. They have a bunch of Republican/conservative apps, but I haven't looked into it beyond that. I guess it makes sense if they're looking to change laws.

Not sure how much help this is, but I had fun.

Keep up the good work!

Edit: I may have misinterpreted the reverse image search results. I think it's catching the related apps section.

Review voteref.com connection

Hmm: https://github.com/MassMove/AttackVectors/blob/master/LocalJournals/foia/D.-N.-M.-22-cv-00222-dckt-000013_000-filed-2022-04-14.pdf

investigate creation date of domain

source of new domains

By googling some of the domains currently listed, I stumbled upon this website:
https://domain-status.com/archives/2019-8-30/com/registered/221

It contained westtxnews.com, which is already listed on the GitHub, and by simply searching the page for "news" I found a few more suspicious websites showing the 404 message.

suspicious pages:

westcontracostanews.com
westdfwnews.com (already listed)
westeldoradonews.com
westhoustonnews.com (already listed)
westnovanews.com (already listed)
westrgvnews.com (already listed)
westsgvnews.com
westventuranews.com

All unlisted domains seem to be on the same AWS server. I've listed what I found below so it's recorded. I'm sure there are more on this server, but I don't know how to get all of them.

Namely:
reverse lookup on westcontracostanews.com (3.222.217.66)
alohastatenews.com
antelopevalleytoday.com
beaverstatenews.com
eastkingnews.com
evergreenreporter.com
kitsapreview.com
moseslaketoday.com
newashingtonnews.com
northkingnews.com
northsnohomishnews.com
nwwashingtonnews.com
olympictimes.com
piercetoday.com
seattlesounder.com
sewashingtonnews.com
southkingnews.com
southsnohomishnews.com
southsoundtimes.com
spokanecotimes.com
spokanestandard.com
tricitiesreporter.com
vancouverreporter.com
waislenews.com
wenatcheetimes.com
westcontracostanews.com
westeldoradonews.com
westventuranews.com
yakimatimes.com

None of the above are listed right now.
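
A true reverse lookup (IP to every hosted domain) needs a passive-DNS service, which is not sketched here. What can be done locally is the forward direction: resolve a list of candidate domains and group them by IP to see which ones share a server. A minimal sketch, with `group_by_ip` as a hypothetical name and the resolver injectable so it can be tested offline:

```python
import socket
from collections import defaultdict


def group_by_ip(domains, resolve=socket.gethostbyname):
    """Map each resolved IPv4 address to the domains pointing at it.

    Domains that fail to resolve are grouped under the key None.
    """
    by_ip = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            ip = None
        by_ip[ip].append(domain)
    return dict(by_ip)
```

Calling `group_by_ip(candidates)` with the default resolver performs live DNS queries, and the addresses may have changed since this issue was written.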

Facebook Page Transparency details and new networks

While investigating tonight I noticed a new feature on Facebook Pages called Page Transparency. This has info about the page ownership that could be useful to connect the dots and identify other pages.

I have seen 2 different designations so far for page ownership.

Confirmed Page Owner
Page Owner

Sites already identified as connected in this repo can easily lead to other related sites with a quick Google site search for their owner.

site:facebook.com "Local Government Information Services"

Searching for the apparent owner of many of the initial sites from Illinois comes up with around 363 results. Most are these types of sites, but I did find 2 that do not fit the news mold yet are definitely pushing their core issues.

defendhomeowners.com (inactive)
saveyourhomenow.org (active with Facebook page) https://www.facebook.com/pages/category/Interest/Save-Your-Home-Now-469980050187830/

Some other owner names from other Facebook pages we are tracking in the CSV.

Local Government Information Services
The Record

Many others listed have unnamed owners, but show Metric Media banners across the top of the page.

Looking into the pages owned by The Record shows the following ownership information:

Page Owner: The Record
Rolling Meadows, IL 60008
United States of America
+13127362092

Additional investigating into Metric Media has uncovered some interesting information, and a new network of sites labeled "Product of LocalLabs" in the footer, with the author "Metric Media News Service". See https://collegeparktoday.com/author/metric-media-news-service

A NY Times article about Metric Media and who runs it.
https://www.nytimes.com/2019/10/21/us/michigan-metric-media-news.html

His bio page https://www.situationmanagementgroup.com/brad-bio.

President of Metric Media LLC, a national media company that operates more than 1,100 community-based news sites.

Oddly enough LGIS and LocalLabs were both named in a complaint by the FEC for their original scheme in Illinois. https://www.fec.gov/files/legal/murs/7148/19044475209.pdf

RiskIQ analysis of Locality Labs/Metric Media domains

I downloaded your csv and found 1,158 domains listed after deduplication. I pasted them into my RiskIQ account and it's being fussy about taking more than 1,000 at a time, but I've started this project in public. You'd need a RiskIQ community account to see, but as I work through this there will be additional entities that show up - IPs, trackers, etc.

https://community.riskiq.com/projects/e0a44c82-d151-4434-b458-1926ebf44f06
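
For anyone hitting the same 1,000-item limit, the deduplicate-and-batch step might look like the sketch below. The helper names are hypothetical, and the batch size of 1,000 is taken from the description above rather than from any RiskIQ documentation:

```python
def dedupe(domains):
    """Lowercase, trim and deduplicate domains while keeping first-seen order."""
    seen = set()
    unique = []
    for domain in domains:
        key = domain.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def chunks(items, size=1000):
    """Split items into consecutive lists of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 1,158 unique domains this yields two batches, one of 1,000 and one of 158.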

Add domains from Google Analytics based site discovery

And check legitimacy: https://www.reddit.com/r/MassMove/comments/fc46eo/google_analytics_based_site_discovery/

Add domains from BBB

https://www.bbb.org/us/il/chicago/profile/home-sales/locality-labs-llc-0654-90019349 => https://desmoinesguide.com/ => http://spyonweb.com/ua-98899428:

tricountytoday.com

warrencountynews.com

Running a google search with random parts from their about pages keeps churning out more:

grimesjournal.com

urbandaletimes.com

Add facebookFollowers

Post authors are active on all sites

This is something I discovered while weeding out the automated GasBuddy.com, census.gov and FB page reposts submitted by authors "Press release submission" and MMNS. Outside of those automated blog posts there are actual content writers that are submitting their own work (I didn't insist on looking for signs of plagiarism), but here's a catch...

These content writers are submitting content to sites in multiple states. They note "content writer" as occupation in their public LinkedIn profile, and probably make a legit living out of it by just picking up news from across the web and reporting on them on the appropriate local site. I guess that's a thing people do, so maybe nothing iffy about it. But then comes the kicker...

They apparently own accounts on ALL sites even if they don't post there. This leads me to think there's some central CnC admin that's managing all of these sites. I wouldn't be surprised if they don't even know where their blog post is ending up and it's someone else approving and pushing their content.

1. Pick any site with the black top bar template (not the red ones - I haven't tried those).
2. Go to a category and find a post written under any person's name. If you can't find any then try another site.
3. Click the person's name, or if the name is not clickable convert it to a slug string and go to site-a.com/author/john-doe
4. Copy "/author/john-doe" from the URL and append it to any other site that uses the same template: site-b.com/author/john-doe.

You will either see a list of their posts on the second site, or the message "No Results" which means they have an account there but haven't posted anything. If the account didn't exist the page would render latest news instead (try it with a gibberish author slug).
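
The slug and URL steps above can be sketched as follows. The slug rule (lowercase, non-alphanumeric runs replaced by hyphens) is an assumption inferred from the john-doe example, and both function names are hypothetical:

```python
import re


def author_slug(name):
    """Turn a display name such as "John Doe" into a slug such as "john-doe"."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def author_urls(name, sites):
    """Build the /author/<slug> URL for the same writer on every site."""
    slug = author_slug(name)
    return [f"https://{site}/author/{slug}" for site in sites]
```

For example, `author_urls("John Doe", ["site-a.com", "site-b.com"])` gives the two URLs used in the steps above. Telling a real post list apart from the "No Results" page or the latest-news fallback still requires fetching and inspecting each page.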

I might dig some more into this, maybe do a list of all content writers and see where it goes from there. What do you think, any suggestions?

Found another publication: *businessdaily.com

awsOrigin: 34.234.42.207
domainName: ec2-34-234-42-207.compute-1.amazonaws.com

domains:

albusinessdaily.com
akbusinessdaily.com
azbusinessdaily.com
arbusinessdaily.com
cabusinessdaily.com
cobusinessdaily.com
ctbusinessdaily.com
debusinessdaily.com
flbusinessdaily.com
gabusinessdaily.com
hibusinessdaily.com
idbusinessdaily.com
ilbusinessdaily.com
indianabusinessdaily.com
iabusinessdaily.com
ksbusinessdaily.com
kybusinessdaily.com
keystonebusinessnews.com
labusinessdaily.com
mainebusinessdaily.com
mdbusinessdaily.com
mabusinessdaily.com
mibusinessdaily.com
mnbusinessdaily.com
msbusinessdaily.com
mobusinessdaily.com
mtbusinessdaily.com
nebraskabusinessdaily.com
nvbusinessdaily.com
nhbusinessdaily.com
njbusinessdaily.com
nmbusinessdaily.com
nybusinessdaily.com
ncbusinessdaily.com
ndbusinessdaily.com
ohbusinessdaily.com
okbusinessdaily.com
orbusinessdaily.com
palmettobusinessdaily.com
ribusinessdaily.com
sdbusinessdaily.com
tnbusinessdaily.com
txbusinessdaily.com
utbusinessdaily.com
vtbusinessdaily.com
vabusinessdaily.com
wabusinessdaily.com
dcbusinessdaily.com
wibusinessdaily.com
wybusinessdaily.com

I don't have time to crawl these for title, facebook url tonight, but feel free to add them to sites.csv once we merge #10

Bad Locations/GeoLoc's.

columbustandard.com is for Columbus, GA not Columbus, OH
bluestonenews.com is for somewhere in WV, not Columbus, OH

Domain naming patterns

While looking through the currently identified domains and finding some new ones, I noticed some patterns to the domain naming convention and started listing the familiar names used in news publications on the end of the domains.

The naming convention consists of two or three parts.

A cardinal direction (optional, but heavily used)
A geographical location, state, county, city, or town (required)
A familiar name used in existing news publications (required)

Cardinal Direction Examples

centraloctimes.com
northoctimes.com
southoctimes.com
westoctimes.com
eastoctimes.com is not registered currently.

Geographical Location, State, County, City, Town Examples

centralalamedanews.com
centraloregontimes.com
coachellatoday.com
eastsierranews.com
fresnoleader.com

Familiar Names Used in News Publications

news
times
reporter
sun
today
standard
leader
review
courant
sentinel
republic
wire
journal
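
A rough sketch of how the pattern above could be used, both to split a known domain into its parts and to generate unregistered candidates to check. The direction list is taken from the examples in this issue (including "central"), the suffix list is the one just above, and the function names are hypothetical:

```python
DIRECTIONS = ("central", "north", "south", "east", "west")
SUFFIXES = ("news", "times", "reporter", "sun", "today", "standard", "leader",
            "review", "courant", "sentinel", "republic", "wire", "journal")


def parse_domain(domain):
    """Split a domain into (direction, location, suffix), or None if no suffix matches."""
    name = domain.lower()
    if name.endswith(".com"):
        name = name[:-len(".com")]
    # Try longer suffixes first and require a non-empty location before the suffix.
    suffix = next((s for s in sorted(SUFFIXES, key=len, reverse=True)
                   if name.endswith(s) and len(name) > len(s)), None)
    if suffix is None:
        return None
    rest = name[:-len(suffix)]
    direction = next((d for d in DIRECTIONS
                      if rest.startswith(d) and len(rest) > len(d)), None)
    location = rest[len(direction):] if direction else rest
    return direction, location, suffix


def candidates(location, suffix):
    """Generate one candidate domain per direction for a location and suffix."""
    return [f"{d}{location}{suffix}.com" for d in DIRECTIONS]
```

This is a heuristic only: a place name that happens to start with a direction word (for example a town beginning with "north") would be split incorrectly, so results need a manual check.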

Blocklist for Pi-Hole?

Hi,

Could you also make a blocklist that works with Pi Hole? https://pi-hole.net/.

I think it's almost exactly the same as the uBlock one, just without the pipes at the beginning of each domain.

Thanks!!
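
A minimal sketch of that conversion, assuming the uBlock list uses lines of the form `||example.com^` with `!` for comments (the repo's actual file is not shown here, and `ublock_to_pihole` is a hypothetical name):

```python
def ublock_to_pihole(lines):
    """Convert uBlock-style filter lines into the plain domain list Pi-hole accepts."""
    domains = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and filter-list comments
            continue
        # Drop the leading pipes and the trailing separator marker, if present.
        domains.append(line.lstrip("|").rstrip("^"))
    return domains
```

Writing the result out one domain per line gives a file that can be added to Pi-hole as a custom list.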

Investigation Notes

https://aikentimes.com/ -> the ssl is for *.locallabs.com

Links of interest:

https://metricmedia-staging01.locallabs.com :: makes the connection to Metric Media
https://gcbot.locallabs.com :: U/P protected
https://roseland-staging.locallabs.com/login
https://sleipnir2.locallabs.com/profile
-- "In Norse mythology, Sleipnir (Old Norse "slippy" or "the slipper") is an eight-legged horse ridden by Odin."

Suggestion: A writer.

I can tell that the data here is an excellent tool to reveal the machine and its manipulation of the unaware and non-savvy (especially older people).

The problem is that, even for someone aware and savvy, it's difficult to use the data to spell out the meaning in terms that can be understood by those that need to understand.

You might try to get in touch with the EFF, or one of the many free-press organizations. They can put you in touch with a writer, editor, or someone with publication skills, to make this information presentable to people that have no idea how any of this works.

https://urbanreform.org/

A whole new undisclosed network of disinformation news sites. Metric Media is listed all over it: https://urbanreform.org/stories/525758614-the-atlantic-county-economic-alliance-receives-over-1-million-from-eda-headquarters-on-july-23-2019

Their agenda is pretty clear in the about us section: https://urbanreform.org/about_us

Add state to CSV

The CSV has lat and lng but no state. When I found this project, the first thing I wanted to do was check which news sites were local to me, but it's hard to do right now.

I used a spreadsheet formula and these bounding boxes to get my answer, but it should be easier.
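
The bounding-box lookup can be sketched like this. The two boxes in the usage note are approximate, illustrative values and not the dataset referenced above; `states_for_point` is a hypothetical name. Because state bounding boxes overlap, the function returns every match rather than a single state:

```python
def states_for_point(lat, lng, boxes):
    """Return the sorted codes of all states whose bounding box contains the point.

    boxes maps a state code to (min_lat, max_lat, min_lng, max_lng).
    """
    return sorted(
        state
        for state, (min_lat, max_lat, min_lng, max_lng) in boxes.items()
        if min_lat <= lat <= max_lat and min_lng <= lng <= max_lng
    )
```

For example, with `boxes = {"CO": (37.0, 41.0, -109.05, -102.04), "WY": (41.0, 45.0, -111.05, -104.05)}`, a point near Denver at (39.74, -104.99) falls only in the CO box. Points inside several overlapping boxes need a proper point-in-polygon check to settle.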

Geotag domains

It would be great to find approximate coordinates for the cities and regions having dedicated domains. This would show us concentrations of misinformation. This information would be most useful when it is added to the sites.csv file for easy processing.

Stripping away the suffix and running the city/region through Google maps should give us a general idea in most cases.

"Hundreds of ‘pink slime’ local news outlets are distributing algorithmic stories and conservative talking points"

https://www.cjr.org/tow_center_reports/hundreds-of-pink-slime-local-news-outlets-are-distributing-algorithmic-stories-conservative-talking-points.php

https://metrobusinessnetwork.com/

I saw that some of the related sites in the network have been documented already, but some were missed...

Here is a more complete list:

Possibly wrong location for 'kentcountytoday.com' on ArcGIS Map

'kentcountytoday.com' shows its location as Grand Rapids, Michigan, but the news articles look like they're focused on Kent County, Delaware.

Review FOIA requests

Initial findings https://github.com/MassMove/AttackVectors/pull/104/files

There are tons more to be found with basic Google-Fu.

https://fdahealthnews.com/

It gets juicier!

https://fdahealthnews.com/
https://www.facebook.com/FDAHealthNews/

Owned by Locality Labs: https://fdahealthnews.com/privacy

Search for exact text reveals similar domains

A simple DuckDuckGo search for exact text shows similar domains that aren't listed yet. Here are a few:
https://harfordnews.com/
https://montgomerymdnews.com/
https://howardconews.com/

https://americanpharmacynews.com/

Similar to #29 ... Opened a separate issue because I can't find any structured links between the websites.

https://cistranfinance.com/

Locality Labs: https://cistranfinance.com/privacy

https://hansondirectory.com/

Again, owned and operated by Locality Labs: https://hansondirectory.com/privacy

Add domains

Found these via AddThisPubID tracking ID ra-572bcb5be71832bd with highereducationtribune.com being the initial pivot point:

hostname	firstSeen	lastSeen
megadealernews.com	2017-06-15 15:57:33	2019-12-01 00:43:23
highereducationtribune.com	2019-08-06 18:09:12	2019-10-15 12:44:39
americansecuritynews.com	2016-06-12 15:07:07	2019-09-16 15:49:55
hrdailywire.com	2017-05-30 23:13:24	2018-09-11 19:16:29
maghrebnewswire.com	2017-08-08 20:24:26	2017-10-19 15:01:57
dentalhealthwire.com	2016-10-12 18:30:31	2017-09-24 20:54:58
tinewsdaily.com	2016-06-02 16:55:22	2017-09-20 01:12:31
americanpharmacynews.com	2016-05-18 13:04:55	2017-09-15 21:27:05
gulfnewsjournal.com	2016-05-10 02:17:32	2017-09-10 13:23:48
patientdaily.com	2016-05-27 23:15:13	2017-08-26 06:13:05
balkanbusinesswire.com	2017-01-24 22:42:51	2017-08-14 18:52:28
latinbusinessdaily.com	2017-06-12 13:56:29	2017-06-12 13:56:29
cropprotectionnews.com	2016-05-22 22:01:23	2017-06-06 07:20:09
vaccinenewsdaily.com	2016-05-22 07:41:24	2017-05-24 22:01:50
farminsurancenews.com	2017-03-30 04:00:01	2017-03-30 04:00:01
aminewswire.com	2016-05-26 23:31:25	2017-03-12 11:04:32
flarecord.com	2016-06-03 17:51:43	2017-03-09 12:34:36
westafricawire.com	2016-05-24 01:14:45	2017-02-05 18:40:04
bioprepwatch.com	2016-11-09 05:38:46	2016-11-09 05:38:46

Code to add more domains

By including the code to append to the .csv files containing the domains, it would be much easier to create a pull request with more websites.
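
As a starting point, a helper along these lines could append new domains while skipping ones already present. It assumes the CSV has a header row with a column named `domain`; the real column names in sites.csv may differ, and `append_domains` is a hypothetical name:

```python
import csv
import io


def append_domains(csv_path, new_domains, column="domain"):
    """Append domains not already in the CSV; other columns are left empty.

    Returns the list of domains that were actually added.
    """
    with open(csv_path, newline="") as f:
        text = f.read()
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames
    existing = {row[column].strip().lower() for row in reader}

    added = []
    with open(csv_path, "a", newline="") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # make sure new rows start on their own line
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="",
                                lineterminator="\n")
        for domain in new_domains:
            key = domain.strip().lower()
            if key and key not in existing:
                writer.writerow({column: key})
                existing.add(key)
                added.append(key)
    return added
```

The remaining columns (title, Facebook URL, coordinates and so on) would still have to be filled in before opening the pull request.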

newsbreak.com

This is not on the same network as the others. I'm not entirely sure it isn't legit, but the facebook page and outbound links are extremely suspicious including sites on this list. It could just be an innocent aggregator though. There's also newsbreak app. If this isn't legit, I don't think it's related to Timpone's network of sites.

https://collegeparktoday.com/

Again, Metric Media listed all over the site, and a whole new network of disinformation news sites with a completely different web design.

https://collegeparktoday.com/

Don't forget the ads

https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=ALL&impression_search_field=has_impressions_lifetime&view_all_page_id=105059500884218

Clicking on the ad in the library will give some info about the targeting

Related entities CSV

Perhaps we need some sort of related entities CSV to track companies and other organizations involved.

I've identified several already with some sort of relation to this, and just now came across image links in the source code on some of these sites that point to https://jnswire.s3.amazonaws.com/jns-media URLs. I did a quick search of jns media and didn't find much, but when I searched for jnswire, I came up with a few things.

A Twitter page named Journatic News Wire started in... wait for it... 2012. https://twitter.com/jnswire

Another quick search for Journatic, because it sounded familiar in relation to all of this, came up with a bunch of articles from 2012 about Journatic publishing fake bylines...

Something that stood out to me in my Facebook investigating yesterday was that many of these pages, through the Facebook Page Transparency feature, show that someone is managing the page from the Philippines. This quote from the NPR article stands out because they said they were doing the fake bylines with people from the Philippines...

"I don't know those communities, and I have no stake in them. And so it didn't matter to me that I found out all the information and I got it right," Smith said. "There is just something inauthentic about the whole process. And the picking of fake names for these writers in the Philippines is just a symptom of that."

It looks like Journatic has become Locality Labs (LocalLabs is running some SSL certs as per #19 ) through Brian Timpone as he is marked as CEO on their LinkedIn page.

The Tow Center for Digital Journalism at Columbia Journalism School has done a fair amount of investigative legwork on this identifying 450 sites, 12 state networks, and 5 separate corporate entities involved in the scheme in Dec. of 2019. https://www.cjr.org/tow_center_reports/hundreds-of-pink-slime-local-news-outlets-are-distributing-algorithmic-stories-conservative-talking-points.php

Of the 450 sites we discovered, at least 189 were set up as local news networks across ten states within the last twelve months by an organization called Metric Media.

That number seems to have grown exponentially in the last few months, as Metric Media's president's bio says they have over 1,100 sites. https://www.situationmanagementgroup.com/brad-bio

President of Metric Media LLC, a national media company that operates more than 1,100 community-based news sites.

Lafayette, LA vs. Lafayette, IN

Hello, just wanted to let you know that the site listed for Lafayette, LA is actually a site for Lafayette, IN. Thank you for putting this project together! People really need to know about this. The actual page is lafayettereporter.com

https://highereducationtribune.com/

https://highereducationtribune.com/
https://twitter.com/higheredtribune

Operated by Locality Labs: https://highereducationtribune.com/privacy

Possible partner: NewsGuard

This company offers services built around vetting news sites:

https://www.newsguardtech.com/

They also have browser extensions/apps:
Chrome: https://chrome.google.com/webstore/detail/newsguard/hcgajcpgaalgpeholhdooeddllhedegi

Edge:
https://www.microsoft.com/en-us/p/newsguard/9nwp4lmmkfkt

Firefox:
https://addons.mozilla.org/en-US/firefox/addon/newsguard/

Mac:
https://apps.apple.com/us/app/newsguard/id1438657064?mt=12

Any interest in a twitter bot (like the reddit one)?

Based on local journals sites file, I whipped up https://twitter.com/TakeoverBot. It doesn't run all the time, and isn't even built to handle all errors properly - it's ~70% done. But getting to 90+% would be a few more hours of work.

The repo is private (hard-coded keys lol), but I can share the code if there is interest in maintaining and researching the findings. Any takers?

It's relatively simple Go code, with many optimizations likely possible:

Using an actual DB for tracking tweets sent (instead of a local csv file)
Smarter rate limit handling
Discarding old tweets (say older than a week)
Saving aggregate data for analysis
Linking to a form to collect feedback
Possibly running in multiple threads (go func())
Variations of messages posted
User opt-out