
ascad's People

Contributors

junwei-wang, prouff, rben-dev

ascad's Issues

Cannot break other subkeys.

It seems that I cannot break other subkeys, even after retraining new models for each subkey :(
But I'm not sure whether the problem is on my side.

Can't Reproduce Rank Performance Results

Hi,

I tried both the MLP and CNN models, but only the pre-trained CNN model (cnn_best_ascad_desync0_epochs75_classes256_batchsize200.h5) achieves the good rank performance shown in the paper. Using the pre-trained MLP model (mlp_best_ascad_desync0_node200_layernb6_epochs200_classes256_batchsize100.h5), or the MLP and CNN models trained with the training script, does not produce reasonable results. Could you please tell me what I might be missing here?

I have downloaded the ATM_AES_v1_fixed_key dataset and the corresponding pre-training models. I didn't modify the generating, training or testing scripts. The training parameter I am using is shown below:

{
"ascad_database" : "ATMEGA_AES_v1/ATM_AES_v1_fixed_key/ASCAD_data/ASCAD_databases/ASCAD_fixed.h5",
"training_model": "./trained_cnn.h5",
"network_type": "cnn",
"epochs": 100,
"batch_size": 100
}

and the testing parameters are shown below:

{
"ascad_database" : "ATMEGA_AES_v1/ATM_AES_v1_fixed_key/ASCAD_data/ASCAD_databases/ASCAD_fixed.h5",
"model_file": "./trained_cnn.h5",
"num_traces": 1000,
"save_file": "test_cnn.png"
}


Am I missing anything here?

default value in the ASCAD_generate.py script

Hi,

I have a remark concerning the ASCAD_generate.py script.

I have noticed that the default behavior of the script is to generate a set of Profiling traces without de-synchronization for ASCAD_desync50.h5 and ASCAD_desync100.h5 while the Attack traces are de-synchronized.

The following lines are from the script:

extract_traces(original_raw_traces_file, ascad_databases_folder + "ASCAD.h5")
extract_traces(original_raw_traces_file, ascad_databases_folder + "ASCAD_desync50.h5", attack_desync = 50)
extract_traces(original_raw_traces_file, ascad_databases_folder + "ASCAD_desync100.h5", attack_desync = 100)

For the last two lines, the parameter profiling_desync is left at its default value, i.e. 0.

Is this normal behavior?
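A minimal standalone sketch of that asymmetry, assuming extract_traces draws one random window offset per trace (this is an illustration, not the repository's actual implementation):

```python
import random

def desync_offsets(num_traces, max_desync, seed=0):
    # One random shift of the extraction window per trace;
    # max_desync == 0 (the profiling_desync default) means no shift.
    rng = random.Random(seed)
    return [rng.randrange(max_desync) if max_desync > 0 else 0
            for _ in range(num_traces)]

# Mirrors the defaults quoted above: profiling traces are left
# synchronized while attack traces are shifted by up to 50 samples.
profiling_offsets = desync_offsets(5, 0)   # [0, 0, 0, 0, 0]
attack_offsets = desync_offsets(5, 50)     # five values in [0, 50)
```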

Regarding metadata of ASCAD v2 dataset

Regarding this, I have some doubts:

1. In the v2 dataset we have a different random key for each pair of plaintext, masks, and trace. Please clarify: is there a way that all the different keys given in the metadata produce the same master key through the masking of the key bytes, or is it vice versa, i.e. the keys provided in the dataset are already masked? In that case, how can a common key be derived from them?

I am trying to understand how, during the attack step, we can recover the key bytes when each trace has a different key. Thank you. @rstrullu @keykeykeykeyk

Originally posted by @Vaibhav-Sharma010204 in #18 (comment)

Wrong number of traces in new dataset for multi-key

Hi All,

I tried the small dataset (281 MB), but the number of traces does not match what is documented.
Please check the code below: there are 200000 profiling traces and 1000 attack traces, while I was expecting 100000 profiling traces and 50000 attack traces.

It would be overkill for me to download the 71 GB data file and generate the dataset myself ;)

Below is the code for reference, any help is much appreciated.

import h5py
import numpy as np

# -----------------------------------------------------------
# read single and multi key groups from h5_file
h5_file = h5py.File("ASCAD.h5", "r")
h5_file_sk = h5_file["Attack_traces"]
h5_file_mk = h5_file["Profiling_traces"]

# -----------------------------------------------------------
# read all fields for single key
sk_traces = h5_file_sk["traces"][()]
sk_labels = h5_file_sk["labels"][()].astype(dtype=np.uint8)
sk_key = h5_file_sk["metadata"]["key"]
sk_plaintext = h5_file_sk["metadata"]["plaintext"]
sk_masks = h5_file_sk["metadata"]["masks"]

# -----------------------------------------------------------
# read all fields for multi key
mk_traces = h5_file_mk["traces"][()]
mk_labels = h5_file_mk["labels"][()].astype(dtype=np.uint8)
mk_key = h5_file_mk["metadata"]["key"]
mk_plaintext = h5_file_mk["metadata"]["plaintext"]
mk_masks = h5_file_mk["metadata"]["masks"]

# -----------------------------------------------------------
print(sk_traces.shape)  # outputs: (1000, 1400)
print(mk_traces.shape)  # outputs: (200000, 1400)

I was expecting

print(sk_traces.shape)  # outputs: (50000, 1400)
print(mk_traces.shape)  # outputs: (100000, 1400)

Clock frequency of the target

Could you give any information on the clock frequency of the target implementation? From the documentation, I was only able to figure out that you are using an external clock, but not at which frequency you run it. Thank you very much!

Why do I need X_profiling and Y_profiling for testing?

Hi, thanks a lot for sharing this work!

I don't understand why the test script requires X_profiling and Y_profiling. I thought the profiling is done in the training phase and the test phase is just for attacking!

Thank you in advance.

ASCAD v2 dataset metadata minor issue

Please check the eight full raw trace files, each 100 GB in size.
The metadata of the last examples for the PTX, CTX, KEY, and MASKS fields is all set to zeros.

This ends up putting wrong metadata into the smaller extracted file at intervals of 100000 when it is generated using ASCAD_generate.py.

The training result of VGG-16 is not as good as your paper.

hi @prouff , I trained a VGG-16 model with the hyper-parameters you provided: batch size = 200, RMSprop optimizer with an initial learning rate of 0.00001, desynchronization = 0, as already provided in your training script.
Actually, I only modified the number of Conv1D kernels in Blocks 1-5 (Block 1: 64 kernels; Block 2: 128 kernels; ...) to fit the VGG-16 prototype, and started training with different epoch counts (75, 150, 300, 1200), but the training results are not as good as in your paper (page 39).
Can you give me some advice on getting better results? Thank you for your attention to this matter.

Regarding metadata of ASCAD v2 dataset

I have a doubt regarding the metadata given in the ASCAD v2 extracted dataset (7 GB).

1. The metadata for the attack traces consists of a different key and a different pair of plaintext and masks (38 bytes) for each trace, so how are we going to use it for SCA with deep learning? I have referred to an implementation based on the paper "Side Channel Analysis against the ANSSI's protected AES implementation on ARM" (https://eprint.iacr.org/2021/592) and the ASCAD-v2-extracted dataset.
2. Please clarify whether all the different keys that are given produce the same master key through masking, or, vice versa, whether the given keys are already masked when provided in the dataset; in that case, how can a common key be derived from them?

Referenced dataset image: https://github.com/ANSSI-FR/ASCAD/assets/114134627/001c16aa-576e-4b90-90ab-6f2097210549

Kindly look into this.

Thank you.

Yours sincerely,
Vaibhav

Trace Points

target_points=[n for n in range(45400, 46100)]

How are the trace points selected? And how must the trace points be selected and extracted for training and for retrieving bytes other than the 3rd byte of the first-round Sbox (e.g. from other Sbox rounds)?

Difference of Datasets: Sampling Frequency / EM & Power?

First of all, thank you for providing the additional data set with the random keys. Great work! :-)

When looking at the two different data sets, I was a little puzzled about the difference regarding sampling frequencies and lengths. From the information available, the following sampling rates and trace lengths can be concluded:

  • Fixed key version: 2GS/s (paper & github), 100,000 samples
  • Random key version: 500MS/s (info in HDF5), 250,000 samples

However, assuming the clock frequency of 4 MHz is used in both cases and the traces cover the same time window, either the random-key version was sampled at 5 GS/s or the fixed-key version at 200 MS/s. To conclude: how many samples are contained in one clock cycle in each of the sets?
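For reference, the samples-per-cycle arithmetic behind this question, using only the numbers quoted above:

```python
# Samples per clock cycle implied by each documented sampling rate,
# assuming the same 4 MHz external clock in both campaigns.
clock_hz = 4e6
fixed_key_rate_hz = 2e9     # 2 GS/s (paper & GitHub)
random_key_rate_hz = 500e6  # 500 MS/s (HDF5 info field)

print(fixed_key_rate_hz / clock_hz)   # 500.0 samples per cycle
print(random_key_rate_hz / clock_hz)  # 125.0 samples per cycle
```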

Also, regarding the qualitative behavior of the data sets, there are some notable differences. Were the two campaigns both measured with EM (and if so, with the same probe at the same position), or is the fixed-key version measured over a shunt resistor? (On GitHub only "Power consumptions measurements" is stated, while the paper says "EM".)

Could you shed some light onto these observations?

Dataset

From the white paper, it is found that the trace points are selected from the interval [45400...46100] out of 100,000 points. So how is it possible to determine that these 700 points correspond exactly to the output of the third Sbox operation in the first round of the encryption?

how to implement SNR analysis?

In your paper, Figures 2 and 3 show the various intermediate values related to the processing of sbox(p[3] xor k[3]) in the interval [45400..46100], which was brilliant. Would you mind explaining how to perform the SNR calculation between sbox(p[3] xor k[3]) and the time samples? I am going to reproduce this result for a further study. Thank you.
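For what it's worth, a first-order SNR is commonly computed by partitioning the traces by the value of the targeted intermediate and taking, per time sample, the variance of the per-class means over the mean of the per-class variances. The sketch below uses synthetic traces with a Hamming-weight leak injected at sample 50; the leakage model and all numbers are illustrative assumptions, not the paper's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 5000 traces, 100 samples each; sample 50 leaks the
# Hamming weight of a hypothetical intermediate z = sbox(p[3] ^ k[3]).
z = rng.integers(0, 256, size=5000)
hw = np.array([bin(v).count("1") for v in z])
traces = rng.normal(0.0, 1.0, size=(5000, 100))
traces[:, 50] += 0.3 * hw  # injected leakage, for illustration only

# SNR(t) = Var_c(mean of class c at t) / Mean_c(variance of class c at t)
classes = np.unique(hw)
class_means = np.array([traces[hw == c].mean(axis=0) for c in classes])
class_vars = np.array([traces[hw == c].var(axis=0) for c in classes])
snr = class_means.var(axis=0) / class_vars.mean(axis=0)

print(int(snr.argmax()))  # the leaking sample stands out: 50
```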

one_hot probability rank problem

When the classifier predicts a one-hot probability vector, the rank is always 0.

When calculating key_bytes_proba with proba == 0, key_bytes_proba is computed from the square of the second-smallest element in the array. This is problematic when that second-smallest element == 1, which is exactly what happens with one-hot vectors.

This occurs in the rank function in ASCAD_test_models.py.
Suggestion: in ASCAD_test_models.py (function rank), line 70, replace

key_bytes_proba[i] += np.log(min_proba**2)

with

key_bytes_proba[i] += np.log(min_proba/2) # or divide by a larger value

Or add another if case.
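The failure mode is easy to reproduce in isolation. The two helpers below are a hypothetical standalone reconstruction of the zero-probability handling, not the repository code itself:

```python
import math

def zero_proba_penalty_original(probas):
    # Reconstruction of the current handling: when a predicted
    # probability is exactly 0, substitute log(min_proba**2), where
    # min_proba is the smallest non-zero probability in the vector.
    min_proba = min(p for p in probas if p > 0)
    return math.log(min_proba ** 2)

def zero_proba_penalty_suggested(probas):
    # Suggested fix: log(min_proba / 2) is negative even when
    # min_proba == 1, so zero-probability keys are still penalized.
    min_proba = min(p for p in probas if p > 0)
    return math.log(min_proba / 2)

one_hot = [0.0] * 255 + [1.0]  # fully confident classifier output
print(zero_proba_penalty_original(one_hot))   # 0.0: zero-prob keys score like certainties
print(zero_proba_penalty_suggested(one_hot))  # about -0.693: properly penalized
```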

`ASCAD_generate.py` script for new dataset

Hey, good to see that the dataset is now available with random keys.
But I do not see an updated ASCAD_generate.py script.
It still contains the old information, and the link to the new script given in the documentation here is broken.

profiling_index = [n for n in range(0, 50000)]
attack_index = [n for n in range(50000, 60000)]
target_points=[n for n in range(45400, 46100)]

Shouldn't that be

profiling_index = [n for n in range(0, 100000)]
attack_index = [n for n in range(100000, 150000)]
target_points=[n for n in range(45400, 46100)]

Also, can you point me to a publication or repo where I can check the accuracy numbers?
I can use the trained models you provide, but it would be helpful to put the numbers in the readme.md.

Why are the loss and accuracy better with desynchronization?

Hi,
Does this mean that the mean rank has nothing to do with the loss and accuracy during training?
You said that the desynchronization is only applied to the test data, not the profiling data; I wonder why the loss and accuracy values still differ.

Error in creating labels for ASCADv2 dataset

I wonder what the label is for y = to_categorical(Y_profiling, num_classes=256), since Y contains multiple tensors within one sample. Unsurprisingly, this error occurs:
TypeError: Cannot cast array data from dtype([('alpha_mask', 'u1', (1,)), ('beta_mask', 'u1', (1,)), ('sbox_masked', 'u1', (16,)), ('sbox_masked_with_perm', 'u1', (16,)), ('perm_index', 'u1', (16,))]) to dtype('int64')
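One workaround (the right field and byte index depend on the attack, so the choice below is purely illustrative) is to select a single field and byte from the structured labels before one-hot encoding; the dtype here is reconstructed from the TypeError above:

```python
import numpy as np

# Structured label dtype, reconstructed from the error message above.
label_dtype = np.dtype([("alpha_mask", "u1", (1,)),
                        ("beta_mask", "u1", (1,)),
                        ("sbox_masked", "u1", (16,)),
                        ("sbox_masked_with_perm", "u1", (16,)),
                        ("perm_index", "u1", (16,))])
Y_profiling = np.zeros(4, dtype=label_dtype)  # dummy stand-in data

# Pick one scalar label per trace, e.g. the masked Sbox output of
# byte 0 (illustrative choice), then cast to a plain integer array
# that to_categorical can accept.
y = Y_profiling["sbox_masked"][:, 0].astype(np.int64)
print(y.shape)  # (4,)
```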

Regarding first-order leakage in the database

Hi,

We found that the variable-key database has a first-order leakage and was broken by a first-order CPA, although it is stated that the traces in the database should have no first-order leakage.
For the fixed-key database, such a first-order leakage was not clearly confirmed by our experiments, but the CPA results suggest that we could find such a leakage and retrieve the secret key by first-order CPA given a sufficient number of traces.
Please find attached a PDF file with our experimental results.
Could you please confirm and comment on the attached results?
ASCAD_issue_slide.pdf
If necessary, please also refer to the following link to check the source code of the experiment.
https://github.com/skotskt/ASCAD_First_Order_CPA

Best regards,

I am puzzled by the results presented in your paper.

In your paper, you mention that the mean rank of the model is close to 0, but the average accuracy of the model is only about 0.004.
I would think that a mean rank close to 0 should mean that the accuracy is close to 100%. Why is this not the case?
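A toy simulation illustrates why the two numbers are compatible: the rank is computed from log-probabilities summed over many traces, so a per-trace bias far too small to win single-trace classification still makes the correct key's accumulated score the largest. All numbers below are made up for illustration:

```python
import math
import random

rng = random.Random(0)
true_key, n_keys, n_traces = 42, 256, 2000

scores = [0.0] * n_keys
top1_hits = 0
for _ in range(n_traces):
    # Near-uniform per-trace posteriors with a tiny bias toward the
    # true key: single-trace accuracy stays low.
    raw = [1.0 + rng.random() * 0.5 + (0.03 if k == true_key else 0.0)
           for k in range(n_keys)]
    total = sum(raw)
    probas = [p / total for p in raw]
    top1_hits += probas[true_key] == max(probas)
    for k in range(n_keys):
        scores[k] += math.log(probas[k])

rank = sorted(range(n_keys), key=lambda k: scores[k], reverse=True).index(true_key)
print(top1_hits / n_traces)  # low single-trace accuracy
print(rank)                  # yet the accumulated rank reaches 0
```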
