ibm / amlsim

The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms. We welcome you to enhance this effort since the data set related to money laundering is critical to advance detection capabilities of money laundering activities.

License: Apache License 2.0

Shell 1.16% Python 62.46% Groovy 4.46% Java 31.93%
graph network-visualization network-science finance-application finance fraud-detection

amlsim's Introduction

Please cite the following papers if you use the data set in your publications.

BibTeX

@misc{AMLSim,
  author = {Toyotaro Suzumura and Hiroki Kanezashi},
  title = {{Anti-Money Laundering Datasets}: {InPlusLab} Anti-Money Laundering Datasets},
  howpublished = {\url{http://github.com/IBM/AMLSim/}},
  year = 2021
}

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs https://arxiv.org/abs/1902.10191

Scalable Graph Learning for Anti-Money Laundering: A First Look https://arxiv.org/abs/1812.00076

Important: Please use the "master" branch for practical use and testing. Other branches such as "new-schema" are outdated and unstable. The Wiki pages are still under construction and some of them have not caught up with the latest implementation. Please refer to this README.md instead.

AMLSim

This project aims to build a multi-agent simulator for anti-money laundering (AML) and to share synthetically generated data, so that researchers can design and implement new algorithms over unified data.

Dependencies

Directory Structure

See Wiki page Directory Structure for details.
NOTE: (October 2021): bin/ folder has been renamed to target/classes/

Introduction for Running AMLSim

See Wiki page Quick Introduction to AMLSim for details.

1. Generate transaction CSV files from parameter files (Python)

Before running the Python script, please check and edit the configuration file conf.json.

{
//...
  "input": {
    "directory": "paramFiles/1K",  // Parameter directory
    "schema": "schema.json",  // Configuration file of output CSV schema
    "accounts": "accounts.csv",  // Account list parameter file
    "alert_patterns": "alertPatterns.csv",  // Alert list parameter file
    "degree": "degree.csv",  // Degree sequence parameter file
    "transaction_type": "transactionType.csv",  // Transaction type list file
    "is_aggregated_accounts": true  // Whether the account list represents aggregated (true) or raw (false) accounts
  },
//...
}
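Before launching the generator, it can help to confirm that every file named in the "input" section actually exists. The following is a minimal sketch and not part of AMLSim: the helper names missing_input_files and load_conf are hypothetical, and the sketch assumes that every string value other than "directory" names a file inside that directory.

```python
import json
import os


def missing_input_files(conf, root="."):
    """Return the input files named in conf["input"] that do not exist.

    Assumption: every string value except "directory" is a file name
    relative to conf["input"]["directory"].
    """
    section = conf["input"]
    base = os.path.join(root, section["directory"])
    missing = []
    for key, value in section.items():
        if key == "directory" or not isinstance(value, str):
            continue  # skip the directory itself and non-file flags
        path = os.path.join(base, value)
        if not os.path.isfile(path):
            missing.append(path)
    return missing


def load_conf(path):
    """Load conf.json with the standard json module.

    json.load rejects // comments, so this assumes the real file
    contains none (the comments shown above are annotations).
    """
    with open(path) as f:
        return json.load(f)
```

A call such as missing_input_files(load_conf("conf.json")) returns an empty list when all parameter files are in place.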

Then, please run the transaction graph generator script.

cd /path/to/AMLSim
python3 scripts/transaction_graph_generator.py conf.json

2. Build and launch the transaction simulator (Java)

Parameters for the simulator are defined in the "general" section of conf.json.

{
  "general": {
      "random_seed": 0,  // Seed of random number
      "simulation_name": "sample",  // Simulation name (identifier)
      "total_steps": 720,  // Total simulation steps
      "base_date": "2017-01-01"  // The date corresponds to the step 0 (the beginning date of this simulation)
  },
//...
}
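The relationship between "base_date" and a simulation step can be sketched as below. This is an illustration only: the function name step_to_date is hypothetical, and the sketch assumes that one step corresponds to one calendar day, which this README does not state explicitly.

```python
from datetime import date, timedelta


def step_to_date(base_date, step):
    """Map a simulation step to a calendar date.

    Assumption: one step corresponds to one day, with step 0
    falling on base_date.
    """
    return date.fromisoformat(base_date) + timedelta(days=step)


first = step_to_date("2017-01-01", 0)    # step 0 is the base date itself
last = step_to_date("2017-01-01", 719)   # last step when total_steps is 720
print(first, last)
```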

Please compile the Java files (if not yet compiled) and launch the simulator.

sh scripts/build_AMLSim.sh
sh scripts/run_AMLSim.sh conf.json

2.b. Optional: Install and Use Maven as build system.

On Mac:

brew install maven

If you already have Java installed, you can then run

brew uninstall --ignore-dependencies openjdk

because brew installs openjdk along with Maven as a dependency.

If you do not use Maven, you will have to place all the dependency jar files listed above as dependencies in the jars/ folder.
If you use Maven, you only need to fetch one jar file (MASON) manually, place it in your jars/ folder, and install it to your local Maven repository with the following command.

mvn install:install-file \
-Dfile=jars/mason.20.jar \
-DgroupId=mason \
-DartifactId=mason \
-Dversion=20 \
-Dpackaging=jar \
-DgeneratePom=true

Please compile the Java files (if not yet compiled; the build script will detect and use Maven) and launch the simulator.

sh scripts/build_AMLSim.sh
sh scripts/run_AMLSim.sh conf.json

3. Convert the raw transaction log file

The file names of the output data are defined in the "output" section of conf.json.

{
//...
"output": {
    "directory": "outputs",  // Output directory
    "accounts": "accounts.csv",  // Account list CSV
    "transactions": "transactions.csv",  // All transaction list CSV
    "cash_transactions": "cash_tx.csv",  // Cash transaction list CSV
    "alert_members": "alert_accounts.csv",  // Alerted account list CSV
    "alert_transactions": "alert_transactions.csv",  // Alerted transaction list CSV
    "sar_accounts": "sar_accounts.csv",    // SAR account list CSV
    "party_individuals": "individuals-bulkload.csv",
    "party_organizations": "organizations-bulkload.csv",
    "account_mapping": "accountMapping.csv",
    "resolved_entities": "resolvedentities.csv",
    "transaction_log": "tx_log.csv",
    "counter_log": "tx_count.csv",
    "diameter_log": "diameter.csv"
  },
//...
}

python3 scripts/convert_logs.py conf.json

4. Export statistical information of the output data to image files (optional)

python3 scripts/visualize/plot_distributions.py conf.json

5. Validate alert transaction subgraphs by comparison with the parameter file (optional)

python3 scripts/validation/validate_alerts.py conf.json

6. Remove all log and generated image files from the outputs directory and the temporary directory

sh scripts/clean_logs.sh
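The required steps above (1 to 3) run one after another, each consuming the output of the previous one. A small driver can make that order explicit. This is only a sketch: the function names are hypothetical, the commands are the ones listed in this README, and the optional steps 4 to 6 are left out.

```python
import subprocess


def pipeline_commands(conf_path="conf.json"):
    """Return the documented commands in the order the steps list them."""
    return [
        ["python3", "scripts/transaction_graph_generator.py", conf_path],
        ["sh", "scripts/build_AMLSim.sh"],
        ["sh", "scripts/run_AMLSim.sh", conf_path],
        ["python3", "scripts/convert_logs.py", conf_path],
    ]


def run_pipeline(conf_path="conf.json", dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in pipeline_commands(conf_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)
```

Calling run_pipeline() only prints the commands; run_pipeline(dry_run=False) executes them from the AMLSim root directory.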

amlsim's People

Contributors

hkanezashi, nelsonjd, stevemart, suzumura, zygm0nt

amlsim's Issues

Timesteps

Hi there,

Thanks for providing this benchmark for this field.
I am wondering about the ideal timesteps with regard to the number of accounts that would resemble real-world data.
I used the suggested 150 with 1K nodes, and I got an almost fully connected network. In an unlabelled dataset I have, by contrast, there are many small connected components, with a clustering coefficient that is nearly zero.
I don't know if you can give some insight on how to set this parameter correctly.
Cheers

About the documentation of the code

I need to generate a synthetic money transaction network.
I have been working on running the code you shared in your GitHub repository, but the documentation is not sufficiently explicit for a beginner user.
I am using Jupyter Notebook and Python 3 for the Python code and the Eclipse environment for Java. I couldn't figure out which Java version I should use to run the transaction simulator.
Also, I did not understand whether the Python and Java steps you briefly explained in "Quick Introduction to AMLSim" do the same job in different languages, or whether I should run both of them in sequence.
Are you going to publish documentation or standardized code for this soon? Or may I kindly ask for your assistance?

Transaction_Graph_Generator should generate an adjacency list.

An adjacency list should be generated by the transaction graph generator. The transactions csv file is currently acting as a lengthy adjacency list already and the concept of "transactions" should only exist in the Java (business logic) and the Python exists solely for building graphs.

java.lang.NoSuchMethodError when running the simulator

I am trying to run the simulator on macOS:
sh scripts/run_AMLSim.sh conf.json

The error log is as below.

Dec 01, 2021 11:34:48 AM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
Dec 01, 2021 11:34:48 AM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
Exception in thread "main" java.lang.NoSuchMethodError: sim.engine.Schedule.scheduleRepeating(Lsim/engine/Steppable;)Lsim/engine/IterativeRepeat;
	at amlsim.AMLSim.loadAccountFile(AMLSim.java:269)
	at amlsim.AMLSim.initSimulation(AMLSim.java:135)
	at paysim.PaySim.start(PaySim.java:115)
	at amlsim.AMLSim.executeSimulation(AMLSim.java:433)
	at amlsim.AMLSim.runSimulation(AMLSim.java:108)
	at amlsim.AMLSim.main(AMLSim.java:550)
(base) anand@my AMLSim % javac -version
javac 1.8.0_292

Error in alert config example description?

The description here contains an example of the alert parameter CSV file; the description of the 2nd row 20,fan_in,0,4,6,2700,3000,10,30,,True reads:
"All alerts are SAR (is_sar = True) and all involved accounts are not flagged as SAR."
Should this actually be "... and all involved accounts are flagged as SAR."?

Further, the description of the 3rd row reads:
"All alerts are false-alert (is_sar = False) and all involved accounts are also flagged as SAR"
Should this actually be "... and all involved accounts are not flagged as SAR"?
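To see which value in such a row is the is_sar flag, the row can be paired with column names. The sketch below is illustrative: the function name parse_alert_row is hypothetical, and the header is an assumption copied from the alertPatterns.csv sample that appears in another issue on this page, so it may not match the version of the Wiki being discussed.

```python
import csv
import io

# Assumed header, copied from an alertPatterns.csv sample elsewhere on this page
HEADER = ("count,type,schedule_id,min_accounts,max_accounts,"
          "min_amount,max_amount,min_period,max_period,bank_id,is_sar")


def parse_alert_row(line):
    """Pair each value of one alertPatterns.csv row with its column name."""
    names = HEADER.split(",")
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(names):
        raise ValueError("expected %d fields, got %d" % (len(names), len(values)))
    row = dict(zip(names, values))
    # Convert the textual flag to a boolean
    row["is_sar"] = row["is_sar"].strip().lower() == "true"
    return row


row = parse_alert_row("20,fan_in,0,4,6,2700,3000,10,30,,True")
print(row["type"], row["min_accounts"], row["max_accounts"], row["is_sar"])
```

Under that assumed header, the last field of the 2nd row is is_sar = True and bank_id is empty.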

Account 8841 does not exist

When I try to generate the transaction network for more than 1K accounts, I get "Account 8841 does not exist"; for 1M it's a slightly different account. What is the solution for this?

Smaller dataset

I tried to generate one for 100 nodes but got problems with the in-degree and out-degree, even though they seemed equal in the param file I tinkered with. I didn't find the wiki pages very helpful on this.
Could you provide a param file for a really small dataset, e.g. 30 nodes, to be used for visualisation purposes?
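One consistency check that applies to any directed graph is that the total in-degree equals the total out-degree, because every edge contributes one of each. The sketch below sums both sides of a degree file. It is not taken from AMLSim: the function name degree_totals is hypothetical, and it assumes the Count,In-degree,Out-degree column layout seen in a degree.csv sample elsewhere on this page.

```python
import csv
import io


def degree_totals(csv_text):
    """Sum accounts, in-degrees and out-degrees over a degree.csv body.

    Assumption: columns are Count, In-degree, Out-degree, and each row
    describes Count accounts sharing that degree pair.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    accounts = total_in = total_out = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        count, in_deg, out_deg = (int(x) for x in row[:3])
        accounts += count
        total_in += count * in_deg
        total_out += count * out_deg
    return accounts, total_in, total_out


sample = "Count,In-degree,Out-degree\n4,1,0\n1,0,4\n0,3,0\n"
accounts, total_in, total_out = degree_totals(sample)
print(accounts, total_in, total_out)
```

If total_in and total_out differ, no directed graph can realize the sequence, whatever the generator does with it. The account total can also be compared by hand with the account counts in accounts.csv.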

Generate paramFiles/.../accounts.csv

I would like to change the distribution of the init_balance aka initial deposit of the accounts in output/accounts.csv but in the code the values of min_balance and max_balance are taken from the file in paramFiles/.../accounts.csv instead of the conf.json file. Do you know how to generate a new accounts.csv file with the parameters in the conf.json file?

Origin and Destination balance set to 0

Hi,

When I try to generate the sample_log.csv file, I end up with many transactions whose origin and destination balances are equal to 0.0.

I do this

sh scripts/build_AMLSim.sh

sh scripts/clean_logs.sh

python scripts/transaction_graph_generator.py prop.ini paramFiles/1K/accounts.csv paramFiles/1K/degree.csv paramFiles/1K/transactionType.csv

sh scripts/run_AMLSim.sh sample 5

What I get in the output folder is an account.csv that looks correct, since all the accounts have an init_balance that is not 0, but in the output/sample/sample_log.csv file the majority of my rows look like this:

step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,alertID
0,TRANSFER,22.68,60,0.0,0.0,99,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,46,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,8,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,91,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,60,0.0,22.68,0,-1
0,TRANSFER,22.68,60,22.68,0.0,93,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,84,0.0,22.68,0,-1
0,TRANSFER,22.68,60,0.0,0.0,86,0.0,22.68,0,-1

Download example data link fail

The link on the Wiki page to download the example data seems to be broken ("shared file or folder link has been removed or is unavailable to you").
Can this be fixed?
Thanks

transaction time stamp

Hi Authors,

I was wondering if the transaction time stamp is an important feature in the AML studies that you've done. I was a bit surprised that AMLSim doesn't seem to generate different timestamps for the different transactions. Do you have any pointers or advice on this topic?

Output CSV file labels not synchronized

The output files explained in wiki/Input-and-Output-Files are not aligned with the files actually generated. The page does not say anything about the LOG.csv generated by the run_AMLSim.sh script.

The LOG.csv file generated by run_AMLSim.sh is quite unclear when we try to match it with accounts.csv.

Can you help me understand the output log file, and tell me which file we can use to train an ML algorithm for prediction?

LOG.csv

step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,alertID
1,CASH-OUT,70.86,0,0.0,0.0,1,0.0,70.86,0,-1
1,CASH-OUT,30.19,0,0.0,0.0,14,0.0,30.19,0,-1
2,CASH-OUT,54.1,0,0.0,0.0,0,0.0,54.1,0,-1

Cannot generate data with provided default settings

I got the below error when I tried to execute without modification of the downloaded settings.

AMLSim-master>python scripts\transaction_graph_generator.py conf.json
INFO:main:Random seed: 0
INFO:main:Simulation name: sample
INFO:main:Generated 1000 accounts.
INFO:main:Add 9900 base transactions
Traceback (most recent call last):
  File "scripts\transaction_graph_generator.py", line 1154, in <module>
    txg.generate_normal_transactions() # Load a parameter CSV file for the base transaction types
  File "scripts\transaction_graph_generator.py", line 536, in generate_normal_transactions
    self.add_transaction(src, dst) # Add edges to transaction graph
  File "scripts\transaction_graph_generator.py", line 580, in add_transaction
    self.check_account_exist(orig) # Ensure the originator and beneficiary accounts exist
  File "scripts\transaction_graph_generator.py", line 314, in check_account_exist
    raise KeyError("Account %s does not exist" % str(aid))
KeyError: "Account {'label': 'account', 'init_balance': 92221.09257625241, 'start': -1, 'end': -1, 'country': 'US', 'business': 'I', 'is_sar': False, 'model_id': 1, 'bank_id': 'bank'} does not exist"

Unable to visualize the transaction simulation

I expected to visualize the generated network and the transaction simulation with some alerted groups. However, even though the simulation results appear on the terminal, the transaction simulation is not visualized.
I am using Jupyter Notebook.

After running the command
!sh scripts/run_AMLSim.sh conf.json

General transaction interval: 7
Base transaction amount: Normal = 100.000000, Suspicious= 1000.000000
Random seed: 0
Simulation name: sample
Working directory: tmp/sample/
Dec 03, 2020 4:19:02 PM amlsim.AMLSim parseArgs
INFO: PaySim Properties File: paramFiles/paysim.properties
Dec 03, 2020 4:19:02 PM amlsim.AMLSim parseArgs
INFO: PaySim Properties File: paramFiles/paysim.properties
Dec 03, 2020 4:19:02 PM amlsim.AMLSim parseArgs
INFO: Simulation Steps: 720
Dec 03, 2020 4:19:02 PM amlsim.AMLSim parseArgs
INFO: Simulation Steps: 720
Norm: 100 Case: 50
Dec 03, 2020 4:19:02 PM amlsim.AMLSim initSimulatorName
INFO: Simulator Name: sample
Dec 03, 2020 4:19:02 PM amlsim.AMLSim initSimulatorName
INFO: Simulator Name: sample
Dec 03, 2020 4:19:02 PM amlsim.AMLSim initSimulatorName
WARNING: Output log directory already exists: tmp/sample/
Dec 03, 2020 4:19:02 PM amlsim.AMLSim initSimulatorName
WARNING: Output log directory already exists: tmp/sample/
Dec 03, 2020 4:19:02 PM amlsim.AMLSim executeSimulation
INFO: Transaction log file: tmp/sample/tx_log.csv
Dec 03, 2020 4:19:02 PM amlsim.AMLSim executeSimulation
INFO: Transaction log file: tmp/sample/tx_log.csv
PAYSIM: Financial Simulator v1.0

Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAccountFile
INFO: Number of total accounts: 1000
Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAccountFile
INFO: Number of total accounts: 1000
Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAlertMemberFile
INFO: Load alert member list from:tmp/sample/alert_members.csv
Dec 03, 2020 4:19:02 PM amlsim.AMLSim loadAlertMemberFile
INFO: Load alert member list from:tmp/sample/alert_members.csv
InInit TagName: 1
NrOfMerchants: 34749
Seed: 1607012342331
parameterFilePath /Users/annalisa/GitHub/AMLSim//paramFiles//AggregateTransaction.csv

Inputting this paramfile:

NrOfMerchants: 1737.45

NrOfFraudsters: 50.0

Starting PaySim Running for 720 steps. Current loop:0
****************************************************************************************************Time Step 100, 0 [s]
***************************************************************************************************Time Step 200, 0 [s]
***************************************************************************************************Time Step 300, 0 [s]
***************************************************************************************************Time Step 400, 0 [s]
***************************************************************************************************Time Step 500, 0 [s]
***************************************************************************************************Time Step 600, 1 [s]
***************************************************************************************************Time Step 700, 1 [s]
******************* - Finished running 720 steps

| Indicator | Orig | Synth | Error Rate |

| NR OF TRANS | | | |
| CASH_IN 0.0 0.0 �
| CASH_OT 0.0 0.0 �
| TRANS | 0.0 0.0 � |
| PAYM | 0.0 | 0 | � |
| DEB | 0.0 | 0.0 | � |

| AVG TRANS SIZE | | | |
| CASH_IN | � | � | � |
| CASH_OT | � | � | � |
| TRANS | � | � | � |
| PAYM | � | � | � |
| DEB | � | � | � |

| TOT ERR RATE | | | � |

NrOfFailed 0
NrOfTrueClients: 0.0
NrOfDaysParticipated 0

It took: 1.89 seconds to execute the simulation

Simulation name: sample

Error run_AMLSim.sh

I am following the documentation step by step. However, I receive an error at the following part:

Build and launch AMLSim
Please run the first script to compile Java files if you did not yet.

sh scripts/build_AMLSim.sh # Compile AMLSim Java files
sh scripts/run_AMLSim.sh conf.json # Launch an AMLSim Java application

running the command sh scripts/build_AMLSim.sh results in:

C:\Path\to\AMLSim-master>sh scripts/build_AMLSim.sh

C:\Path\to\AMLSim-master>

I suppose this is correct?

Then, when I try to run the command (sh scripts/run_AMLSim.sh conf.json), I get the following error:

C:\Path\to\AMLSim-master>sh scripts/run_AMLSim.sh conf.json
Error: Could not find or load main class amlsim.AMLSim

Could you help me with this issue?

I cannot find what the problem is, thanks in advance

Normal Transaction Models has a has-one relationship to Account when it should be has many.

The Typologies have many accounts in each typology which makes sense. A typology contains many accounts. The normal transaction models however only contain one account each. I want the transaction models to contain many accounts too, just like the typologies. Because an account can be involved in more than 1 transaction model. For example an account that is a destination of a fan out can also be an origin of a single transaction. So that account is involved in 2 normal transaction models. The app would then contain those two typologies. I'm thinking about adding normalPatterns.csv and removing the "model" column in accounts.csv because the "model" column implies a 1 to 1 relationship.

Building AMLSim project shows Compilation Errors

Steps:
Generated paysim.jar using steps mentioned here : https://github.com/IBM/AMLSim/wiki/Quick-Introduddction-to-AMLSim#dependencies. Attached file https://ibm.box.com/s/o9agh0uqm439a77d1guhzg6ws339vlag ( paySim.jar)

I also imported the Maven project from https://github.com/EdgarLopezPhD/PaySim and generated a jar.
Attached file https://ibm.box.com/s/o9agh0uqm439a77d1guhzg6ws339vlag ( PaySim-Snaphsot.jar)

In both cases, I get these compilation errors when building AMLSim.

Attached logs : https://zenhub.ibm.com/app/files/293714/adb983cc-7917-4f23-8df6-24616d11c67f/download

non-fraud amount calculation

I've noticed that non-fraud transaction amounts are currently being calculated in a purely deterministic way: amount(t) = c * balance(t), where c is a constant determined by parameters set in conf.json. This makes the fraud transactions easy to detect as their amounts are independent of balance.
Will there be a future update to getTransactionAmount() to improve how amounts get calculated?
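The observation can be illustrated with a few lines of Python. This sketch is not a description of getTransactionAmount(), whose code is not shown here: deterministic_amount mirrors only the formula stated in this issue, while jittered_amount, the spread parameter, and the value of c are hypothetical choices made for illustration.

```python
import random


def deterministic_amount(balance, c):
    """The behaviour described in the issue: amount(t) = c * balance(t)."""
    return c * balance


def jittered_amount(balance, c, rng, spread=0.2):
    """Hypothetical alternative: scale the amount by a random factor
    drawn uniformly from [1 - spread, 1 + spread]."""
    return c * balance * rng.uniform(1.0 - spread, 1.0 + spread)


rng = random.Random(0)
balances = [50000.0, 75000.0, 100000.0]
c = 0.001  # illustrative constant, not taken from conf.json

# With the deterministic rule, amount / balance is the same for every row
ratios_fixed = [deterministic_amount(b, c) / b for b in balances]
# With jitter, the ratio varies from one transaction to the next
ratios_noisy = [jittered_amount(b, c, rng) / b for b in balances]
print(ratios_fixed)
print(ratios_noisy)
```

A constant amount-to-balance ratio across all normal transactions is the signal that makes balance-independent amounts stand out.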

AccountGroups should replace Alerts

Also Account Groups have a dependency on an AbstractTransactionModel.
This dependency should be removed. Account Groups are just groups of accounts (mini graphs) and should not know they are being used in models.

Duplicate accounts and also missed accounts in the final output file "./outputs/simulationName/sar_accounts.csv"

  1. I found duplicate accounts in the output file "./outputs/simulationName/sar_accounts.csv".
    This happens because the program goes through every alert transaction and writes its sender to that file without checking whether the sender has already been written ("./scripts/convert_logs.py", Line 883).
    E.g. in a fan_out pattern, a single account transfers money to multiple accounts, which causes multiple alert transactions with the same sender.

  2. It also misses some accounts that are flagged as SAR.
    There are two reasons that cause this.

    1. The first one is the same as above: the program writes only senders and misses recipients.

    2. The second one is that when writing SAR accounts to the file, the program uses the combination of the sender's ID and the date as the key to distinguish transactions. However, some transactions initiated by the same sender happen at the same time (to different recipients), so only one of them is added to the candidate set.
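One way to avoid both problems is to collect accounts in a set while visiting the sender and the recipient of every alert transaction. The following is a minimal sketch under stated assumptions, not the code in convert_logs.py: the function name and the input shapes (a list of sender/recipient pairs and a dict of SAR flags) are hypothetical.

```python
def collect_sar_accounts(alert_transactions, sar_flags):
    """Return each SAR-flagged account once, in first-seen order.

    alert_transactions: iterable of (sender, recipient) pairs.
    sar_flags: dict mapping account id to True when flagged as SAR.
    Both shapes are assumptions made for this sketch.
    """
    seen = set()
    ordered = []
    for sender, recipient in alert_transactions:
        for account in (sender, recipient):  # recipients count too
            if sar_flags.get(account) and account not in seen:
                seen.add(account)
                ordered.append(account)
    return ordered


# A fan_out: account 1 sends to 2, 3 and 4 at the same time.
txs = [(1, 2), (1, 3), (1, 4)]
flags = {1: True, 2: True, 3: False, 4: True}
print(collect_sar_accounts(txs, flags))  # [1, 2, 4]
```

Because membership is keyed on the account ID alone, simultaneous transactions from the same sender no longer collide.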

SingleTransactionModel does not generate normal transactions

According to Wiki:

an account sends a transaction to another account in his/her neighborhood that is currently selected in a random way

However with account's model set to 0 (SINGLE) in accounts.csv the simulator does not generate any normal transactions between the accounts at all.

This can be reproduced with the following test config:

conf.json

{
  "general": {
    "random_seed": 0,
    "simulation_name": "sample",
    "total_steps": 720,
    "base_date": "2020-01-08"
  },
  "default": {
    "min_amount": 100,
    "max_amount": 1000,
    "min_balance": 50000,
    "max_balance": 100000,
    "start_step": -1,
    "end_step": -1,
    "start_range": -1,
    "end_range": -1,
    "transaction_model": 1,
    "margin_ratio": 0.1,
    "bank_id": "default",

    "cash_in": {
      "normal_interval": 100,
      "fraud_interval": 50,
      "normal_min_amount": 50,
      "normal_max_amount": 100,
      "fraud_min_amount": 500,
      "fraud_max_amount": 1000
    },
    "cash_out": {
      "normal_interval": 10,
      "fraud_interval": 100,
      "normal_min_amount": 10,
      "normal_max_amount": 100,
      "fraud_min_amount": 1000,
      "fraud_max_amount": 2000
    }
  },
  ...

accounts.csv

count,min_balance,max_balance,country,business_type,model,bank_id
5,50000,100000,AUS,I,0, CBA
5,50000,100000,AUS,I,0, NAB
2,50000,100000,AUS,I,0, ING

degree.csv

Count,In-degree,Out-degree
4,1,0
1,0,4
0,3,0

alertPatterns.csv

count,type,schedule_id,min_accounts,max_accounts,min_amount,max_amount,min_period,max_period,bank_id,is_sar
2,fan_in,1,3,3,100.0,200.0,1,5,,True
1,fan_out,1,2,2,100.0,200.0,1,5,,False

This seems to happen due to this condition in SingleTransactionModel when endStep = -1 and step goes from 0 to 720.

Use MASON library directly

Because AMLSim makes little use of the PaySim library, we should use the MASON APIs directly, without PaySim.

Networkx MultiDiGraph attribute not found

I am using networkx version 1.1 as stated in the README, and there is no edge_subgraph method in its documentation. I tried g.subgraph() as well, but the result is still a blank graph.

python scripts/visualize/plot_transaction_graph.py outputs/sampleaml/sampleaml_log.csv 2
Traceback (most recent call last):
  File "scripts/visualize/plot_transaction_graph.py", line 81, in <module>
    g_ = get_alert_graph(g, alertID)
  File "scripts/visualize/plot_transaction_graph.py", line 55, in get_alert_graph
    g_ = g.edge_subgraph(edges)
AttributeError: 'MultiDiGraph' object has no attribute 'edge_subgraph'

alertPatterns parameters

Hi,
Please could you clarify the parameters individual_amount and aggregated_amount in the alertPatterns.csv parameters file? For example, in the add_alert_pattern function in transaction_graph_generator.py:
min_amount=individual_amount
max_amount=2*individual_amount
Why a factor of 2 here?
Does aggregated_amount refer to the total amounts associated with each account in the alert pattern?
Thanks

Normal transaction value variation

In all the simulations I have run, the normal transaction values are between 90 and 110. I cannot find a way to change them, so I was wondering whether it is possible to tweak the values of the normal transactions.

thanks in advance,

Floris

UI for generating data

Need a UI to allow users to generate synthetic data from an easy-to-use interface such as a web interface or Jupyter notebook.

How to generate graph with fewer edges?

I downloaded the sample dataset with 1K nodes, but I need more fraud activity. I tried to increase the number of alerts, but that way there are too many fraud nodes, so I want to reduce the number of edges in the dataset instead. I changed degree.csv (reduced all numbers in the "count" column to under 10), but the total number of transactions is still over 100K. Is there any way to reduce the number of edges?

Customized patterns

We need a mechanism for specifying patterns in an external file, rather than implementing them inside the simulation model, so that users can easily add new models.

tx_log.csv being created in wrong directory?

Playing with AMLSim for the first time for a dissertation project. Getting an error on this step - python3 scripts/convert_logs.py conf.json

Simulation name: sample
Load alert groups: alert_members.csv
Convert transaction list from tmp/sample/tx_log.csv to transactions.csv, cash_tx.csv and alert_transactions.csv
Traceback (most recent call last):
  File "scripts/convert_logs.py", line 912, in <module>
    converter.convert_acct_tx()
  File "scripts/convert_logs.py", line 632, in convert_acct_tx
    in_tx_f = open(self.log_file, "r") # Transaction log file from the Java simulator
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/sample/tx_log.csv'
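The traceback shows that the converter reads the raw log written by the Java simulator, so the file is missing whenever that step has not run or has failed. A small pre-check can make this explicit. The sketch below is an assumption-laden illustration: the function name expected_tx_log is hypothetical, and the tmp/<simulation_name>/<transaction_log> layout is inferred from the path in the traceback rather than read from the AMLSim code.

```python
import os


def expected_tx_log(conf, tmp_dir="tmp"):
    """Build the path where the converter looks for the raw log.

    Assumption: the log lives at <tmp_dir>/<simulation_name>/<transaction_log>,
    matching the tmp/sample/tx_log.csv path in the traceback above.
    """
    name = conf["general"]["simulation_name"]
    log_file = conf["output"]["transaction_log"]
    return os.path.join(tmp_dir, name, log_file)


conf = {
    "general": {"simulation_name": "sample"},
    "output": {"transaction_log": "tx_log.csv"},
}
path = expected_tx_log(conf)
if not os.path.isfile(path):
    print("Missing %s: run scripts/run_AMLSim.sh before convert_logs.py" % path)
```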

Fan-In & Fan-Out SAR transactions all have the same amount

I noticed that all SAR transactions created with a FanOutTopology and a FanInTopology have exactly the same transaction amount for a given originator (resp. beneficiary). From looking at the code, I think this is not intended. From an AML perspective I would also say this is unwanted. Do you know what is going on?

See below a screenshot of what the data looks like:
image

Setup docker container

Install all SDKs (Python and Java), libraries and a Jupyter Notebook server to the container to enable users to run various environments (including servers)

GatherScatterModel bug?

Hi,
Thanks for adding the new AML typologies!
In GatherScatterModel.java line 91 reads:
sendTransaction(step, minAmount, orig, bene, isSar, alertID);
Is this a bug? Should minAmount be replaced with scatterAmount?
In any case line 85 always seems to result in scatterAmount being set to minAmount, so there is never an event when the scatter account gets the 10% margin. Is this the right behavior?
Thanks

Build error - package paysim does not exist

I did a mvn install of the paysim project to create the jar file (paysim-2.0-SNAPSHOT.jar) which I copied into AMLSim's jars folder.
Then when I run the shell script build_AMLSim.sh I get the error: package paysim does not exist

Quality default settings simulation

Hi all,

I am doing research on detecting money laundering in financial transactions. I have run quite a few simulations and studied the datasets they generate. However, I am wondering how representative the simulation output is of the real world when using the default settings, for example the 10K parameter files. Or is it necessary to acquire money laundering input parameters yourself?

thanks in advance.

Feature vector

I want to generate feature vectors for the nodes (accounts), which I also want to use in a GCN, but I do not understand how to do this for this dataset. From what I understand, a per-account feature vector alone is not enough to identify a fraudulent transaction; a set of transactions has to be observed to find fraudulent activity.

Can you guide me?
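As a starting point, simple per-account features can be aggregated from the transaction log. This is only a sketch: the function name account_features is hypothetical, the column names are assumed to match the raw log header shown in sample logs elsewhere on this page, and nothing here claims that these four features are sufficient for a GCN.

```python
import csv
import io
from collections import defaultdict


def account_features(log_text):
    """Aggregate per-account features from a raw transaction log.

    Assumption: the log uses the header shown in the sample logs on this
    page (step,type,amount,nameOrig,...,nameDest,...,isFraud,alertID).
    Returns {account: [out_count, in_count, total_sent, total_received]}.
    """
    feats = defaultdict(lambda: [0, 0, 0.0, 0.0])
    for row in csv.DictReader(io.StringIO(log_text)):
        amount = float(row["amount"])
        orig, dest = row["nameOrig"], row["nameDest"]
        feats[orig][0] += 1        # one more outgoing transaction
        feats[orig][2] += amount   # amount sent
        feats[dest][1] += 1        # one more incoming transaction
        feats[dest][3] += amount   # amount received
    return dict(feats)


log = (
    "step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,"
    "nameDest,oldbalanceDest,newbalanceDest,isFraud,alertID\n"
    "0,TRANSFER,22.68,60,0.0,0.0,99,0.0,22.68,0,-1\n"
    "0,TRANSFER,22.68,60,0.0,0.0,46,0.0,22.68,0,-1\n"
)
features = account_features(log)
print(features["60"])
```

Each list can serve as one row of a node feature matrix; the edges themselves (nameOrig to nameDest) would supply the graph structure.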

Add Maven to the project

I'm currently working on adding Maven to this project. The old scheme will still work for those that don't want to install maven. They will just have to go out and find all the jars.

The best way to install maven on Mac is

brew install maven

I'm thinking adding Maven is good because it's much easier to manage dependencies this way. If custom jars are still needed you can install them to your local repository.

I only needed to install 2 jars to my local repository. The other ones were easily found on the Maven remote repositories.

mvn install:install-file \
-Dfile=jars/mason.18.jar \
-DgroupId=mason \
-DartifactId=mason \
-Dversion=18 \
-Dpackaging=jar \
-DgeneratePom=true
mvn install:install-file \
-Dfile=jars/paysim.jar \
-DgroupId=paysim \
-DartifactId=paysim \
-Dversion=1.0.0 \
-Dpackaging=jar \
-DgeneratePom=true

The build and run scripts work exactly the same as before. The only change is I'm getting rid of bin/ directory and now we are using target/classes/ directory as the bin directory. So I will update the README. We will need to update the wiki. @hkanezashi

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 7 during run_AMLSim.sh conf.json

Thanks for making the changes, the first part of sh scripts/run_AMLSim.sh conf.json works I suppose. However I get another exception error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 7
at amlsim.AMLSim.loadAccountFile(AMLSim.java:214)
at amlsim.AMLSim.initSimulation(AMLSim.java:99)
at paysim.PaySim.start(PaySim.java:115)
at amlsim.AMLSim.executeSimulation(AMLSim.java:403)
at amlsim.AMLSim.runSimulation(AMLSim.java:84)
at amlsim.AMLSim.main(AMLSim.java:496)
The full command prompt is:

Microsoft Windows [Version 10.0.17763.864]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Path\To\AMLSim-master>

C:\Path\To\AMLSim-master> python scripts/transaction_graph_generator.py conf.json
Random seed: 0
Simulation name: sample
Generated 10000 accounts.
Add 99900 base transactions
Exported 10000 accounts to tmp\sample\accounts.csv
Exported 100632 transactions to tmp\sample\transactions.csv
Output alert member list to: tmp\sample\alert_members.csv
Exported 792 members for 100 AML typologies to tmp\sample\alert_members.csv

C:\Path\To\AMLSim-master> sh scripts/build_AMLSim.sh

C:\Path\To\AMLSim-master>sh scripts/run_AMLSim.sh conf.json
General transaction interval: 7
Base transaction amount: Normal = 100.000000, Suspicious= 1000.000000
Random seed: 0
Simulation name: sample
Working directory: tmp\sample
dec 05, 2019 12:53:12 PM amlsim.AMLSim parseArgs
INFO: PaySim Properties File: paramFiles/paysim.properties
dec 05, 2019 12:53:12 PM amlsim.AMLSim parseArgs
INFO: PaySim Properties File: paramFiles/paysim.properties
dec 05, 2019 12:53:12 PM amlsim.AMLSim parseArgs
INFO: Simulation Steps: 720
dec 05, 2019 12:53:12 PM amlsim.AMLSim parseArgs
INFO: Simulation Steps: 720
Norm: 100 Case: 50
dec 05, 2019 12:53:12 PM amlsim.AMLSim initSimulatorName
INFO: Simulator Name: sample
dec 05, 2019 12:53:12 PM amlsim.AMLSim initSimulatorName
INFO: Simulator Name: sample
dec 05, 2019 12:53:12 PM amlsim.AMLSim initSimulatorName
WARNING: Output log directory already exists: tmp\sample
dec 05, 2019 12:53:12 PM amlsim.AMLSim initSimulatorName
WARNING: Output log directory already exists: tmp\sample
dec 05, 2019 12:53:12 PM amlsim.AMLSim executeSimulation
INFO: Transaction log file: tmp\sample\tx_log.csv
dec 05, 2019 12:53:12 PM amlsim.AMLSim executeSimulation
INFO: Transaction log file: tmp\sample\tx_log.csv
PAYSIM: Financial Simulator v1.0

dec 05, 2019 12:53:12 PM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
dec 05, 2019 12:53:12 PM amlsim.AMLSim loadAccountFile
INFO: Account CSV header: ACCOUNT_ID,CUSTOMER_ID,INIT_BALANCE,START_DATE,END_DATE,COUNTRY,ACCOUNT_TYPE,IS_SAR,TX_BEHAVIOR_ID,BANK_ID
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 7
at amlsim.AMLSim.loadAccountFile(AMLSim.java:214)
at amlsim.AMLSim.initSimulation(AMLSim.java:99)
at paysim.PaySim.start(PaySim.java:115)
at amlsim.AMLSim.executeSimulation(AMLSim.java:403)
at amlsim.AMLSim.runSimulation(AMLSim.java:84)
at amlsim.AMLSim.main(AMLSim.java:496)

C:\Path\To\AMLSim-master>

Add options how to choose alert accounts

Function set_subject_candidates in transaction_graph_generator.py chooses alert (fraud) accounts from hub vertices with many neighbors. The policy to choose alert accounts should be more flexible.
