OWASP Machine Learning Security Top 10 Project
Home Page: http://owasp.org/www-project-machine-learning-security-top-10/
License: Other
Video will be uploaded to OWASP Youtube Channel
General Feedback
I would like to report the following issue/feedback
Originally posted by ankitloud October 6, 2023
Originally posted by giscus[bot] September 20, 2023
https://mltop10.info/ML01_2023-Input_Manipulation_Attack.html
Website Issue Report
I would like to report the following issue/feedback
Documentation Issue Report
There is a comprehensive existing body of work at: https://ethical.institute
The intent would be to review the current Top 10 list in this project and:
Website Issue Report
This applies to both the website and documentation content.
Suggestions for Improvement
Re-thinking and re-writing ML06 - corrupted packages
The description of ML05 is quite limited given how complicated the software supply chains are, especially those related to ML-using software.
In the summary of the vulnerability it is written: "This type of attack can be particularly dangerous as it can go unnoticed for a long time, since the victim may not realize that the package they are using has been compromised. The attacker's malicious code could be used to steal sensitive information, modify results, or even cause the machine learning model to fail." Meanwhile, the Detectability section under Risk Factors says that this kind of vulnerability is easy to detect.
What is more, nothing is said about countermeasures such as SBOM/MLBOM in the description of this vulnerability. In my opinion they should be included.
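As a purely illustrative sketch of one such countermeasure, a consumer of an ML package or model artifact could verify a pinned SHA-256 digest (recorded, for example, in an SBOM/MLBOM entry or a lock file) before loading it. The function name and workflow below are hypothetical:

```python
import hashlib

def verify_artifact(path, expected_sha256):
    """Compare a downloaded artifact's SHA-256 digest against a pinned value.

    Returns True only if the file on disk matches the digest recorded in,
    e.g., an SBOM/MLBOM entry or a lock file (hypothetical workflow).
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

A mismatch would mean the artifact differs from what the SBOM/MLBOM declared, which is exactly the silent-compromise scenario the summary describes.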
There are plenty of resources that should be analyzed and used for the description of this specific vulnerability:
Create a set of guidelines for how to consume the information presented in the Top 10 based on roles
Example roles:
.. etc
as per feedback from #87
Glossary page to standardise on terminologies and definitions.
Suggestions for Improvement
The term adversarial attack usually has a broader definition than the intention of ML01. For example it usually includes data poisoning.
The intention seems to refer to what is more often called an 'evasion attack'. The problem with that term is that it usually means small changes to the input. This is why in the AI guide we used the term 'input manipulation', which is clearer.
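To make the "small changes to the input" point concrete, here is a minimal input-manipulation (evasion) sketch against a hypothetical logistic-regression model, using the FGSM idea of stepping each feature in the direction that increases the loss. All function names, weights, and inputs below are made up for illustration:

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))

def fgsm_perturb(w, b, x, y_true, eps=0.1):
    """FGSM-style evasion: the gradient of the cross-entropy loss with
    respect to the input is (p - y) * w, so step eps in its sign direction."""
    p = predict_proba(w, b, x)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign((p - y_true) * wi) for xi, wi in zip(x, w)]
```

A small eps keeps the change to the input hard to notice while still lowering the model's confidence in the true class, which is exactly why 'input manipulation' describes the attack better than the broader 'adversarial'.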
General Feedback
The LLM Top 10 mentions excessive agency, because it is important to limit privileges/autonomy and to have oversight over LLMs. This is a general AI problem.
One could argue whether this is a security risk, and I would argue that it is, because just as AI models are unpredictable, they may also have been manipulated.
I believe the ML top 10 also needs Excessive agency.
Suggestions for Improvement
I believe 'Packages' to be a too specific term for the problem of supply chain attacks. Calling it 'supply chain attacks' will make the reader aware of the risk that any external component in the AI pipeline can be manipulated.
Also, add 'data' as a potential supply chain risk, and refer to 'data poisoning' for that, and also add 'model', referring to the transfer learning attack.
[Async] Meeting -> Jul 25 - Jul 30 2023
Join us for this async meeting running from 25th July to 30th July 2023, by participating in the Slack thread or by commenting on this issue in GitHub
The Top 10 list should have the ability to export the Markdown files to PDF. This could be done, for example, via a GitHub Action.
As an example, _includes contains information from the standard Top 10 directory, which shows error pages such as: https://owasp.org/www-project-machine-learning-security-top-10/2023/Acknowledgements.html
mentioned in: #44
Documentation Issue Report
Hi team,
I would like to focus on the missing information related to the Risk Ranking number of the Top 10 at the start of the page, i.e. the table mentioned in every vulnerability.
It would be a great resource if added, so that people can relate to the risk associated with the vulnerability.
All Contributors has a specification for recognising contributions that are not just code.
e.g.
The full list is shown here: https://allcontributors.org/docs/en/emoji-key
--
Implementation would involve either using a bot: https://allcontributors.org/docs/en/bot/overview or manually via cli: https://allcontributors.org/docs/en/cli/overview
2023-Jul-20 06:00 UTC (11:30 Hyderabad, 16:00 Melbourne)
Attendees:
The Top 10 list is being rendered using Markdown at https://mltop10.info
The site is being rendered using Quarto and the files from https://github.com/OWASP/www-project-machine-learning-security-top-10/tree/master/docs are mirrored to https://github.com/mltop10-info/mltop10.info
Currently a manual process is run locally for https://github.com/mltop10-info/mltop10.info to render the HTML and PDF outputs, which are stored in https://github.com/mltop10-info/mltop10.info/tree/main/docs and used by GitHub Pages.
The rendering for PDF is currently using the default method of LaTeX - example at: https://github.com/mltop10-info/mltop10.info/blob/main/docs/OWASP-Machine-Learning-Security-Top-10.pdf
Quarto has a lot of formatting options for generating PDF and this needs to be explored to make the PDF and ePUB formats look more presentable.
Video will be uploaded to OWASP Youtube Channel
Thinking of ML10:2023 Model Poisoning, we can create two scripts that, although carrying out the same operation (perhaps classification), provide different outcomes.
This way, we can showcase model poisoning in action along with the corresponding theory.
Please share your ideas with me on this!
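As one idea for what those two scripts could look like: a deliberately tiny nearest-centroid classifier whose "parameters" are just the class means, so an attacker tampering with the stored parameters is easy to demonstrate. Everything here (data, model choice) is a hypothetical toy for illustration:

```python
def train_centroids(X, y):
    """Nearest-centroid classifier: the 'model' is just each class's mean point."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(model, x):
    sq_dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, x))
    return min(model, key=lambda label: sq_dist(model[label]))

X = [[0.0], [0.2], [1.0], [1.2]]
clean_model = train_centroids(X, [0, 0, 1, 1])

# Model poisoning: the attacker tampers with the stored parameters directly,
# here by swapping the class centroids of an otherwise identical model.
poisoned_model = {0: clean_model[1], 1: clean_model[0]}
```

Running both "scripts" on the same input then shows the clean and poisoned models performing the same classification operation but returning opposite answers.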
Construct a workflow to clone the Top 10 attacks' MD files to the repo https://github.com/mltop10-info/mltop10.info,
so that all changes to attack scenarios are pushed by the workflow rather than by human interaction.
Suggestions for Improvement
[FEEDBACK]: Model skewing requires altering training data, making it a form of data poisoning. Therefore it is probably better to integrate the two threats.
the following is an initial review taken from Slack logs: https://owasp.slack.com/archives/C04PESBUWRZ/p1677192099712519
Dear all,
I did a first scan through the list to mainly look at taxonomy. Here are my remarks.
1.
ML01
In the literature the term 'adversarial' is often used for input manipulation attacks, but also for data poisoning, model extraction, etc. Therefore, in order to avoid confusion, it is probably better to rename the ML01 adversarial attack entry to input manipulation?
2.
It is worth considering adding 'model evasion', aka black-box input manipulation, to your top 10. Or do you prefer to have one entry for input manipulation altogether?
3.
ML03
It is not clear to me how scenarios 1 and 2 work. I must be missing something. Usually model inversion is explained by manipulating synthesized faces until the algorithm behaves like it recognizes the face.
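For reference, the face-synthesis explanation boils down to confidence maximization, which can be sketched against a hypothetical linear model: start from a blank input and follow the gradient of the target class's confidence until the model behaves like it recognizes the reconstructed input. The model and all values are illustrative assumptions:

```python
import math

def invert(w, b, steps=100, lr=0.5):
    """Model inversion by confidence maximization (gradient ascent).

    For p = sigmoid(w . x + b), the gradient d p / d x_i = p * (1 - p) * w_i,
    so the reconstructed input drifts toward the pattern the model
    associates with the target class."""
    x = [0.0] * len(w)
    for _ in range(steps):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-score))
        x = [xi + lr * p * (1 - p) * wi for xi, wi in zip(x, w)]
    return x
```

The recovered x is not a training sample, but it reveals what the model considers representative of the class, which is the privacy leak model inversion is about.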
4
ML04
It is not clear to me how scenario 1 works.
Standard methods against overtraining are missing from the 'how to prevent' part. Instead the advice is to reduce the training set size, which typically increases the overfitting problem.
5
ML05
Model stealing describes a scenario where an attacker steals model parameters, but generally this attack takes place in a black-box way: gathering input-output pairs and training a new model on them.
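The black-box variant can be sketched in a few lines: treat the victim as a query oracle, collect input-output pairs, and fit a surrogate on them. The one-dimensional threshold "victim" below is a deliberately trivial, hypothetical stand-in for a real model:

```python
def victim(x):
    """Black-box model we can only query (its internals are hidden;
    this hypothetical victim is a simple threshold classifier)."""
    return 1 if x >= 0.37 else 0

def steal(oracle, n_queries=101):
    """Model extraction: probe the oracle on a grid of inputs, then build
    a surrogate from the smallest input the oracle labels as class 1."""
    pairs = [(i / (n_queries - 1), oracle(i / (n_queries - 1)))
             for i in range(n_queries)]
    boundary = min(x for x, y in pairs if y == 1)
    return lambda x: 1 if x >= boundary else 0

surrogate = steal(victim)
```

No parameters are ever read directly; with enough queries the surrogate reproduces the victim's decision boundary, which is why query-based extraction belongs in the ML05 description.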
6
ML07
I don’t understand exactly how the presented scenario should work. I do know about the scenario where a pre-trained model was obtained that has been altered by an attacker. This matches the description.
7
ML08
Isn’t model skewing the same as data poisoning? If there’s a difference, to me they are not apparent from the scenario and description.
8
ML10 is called Neural net reprogramming but I guess the attack of changing parameters will work on any type of algorithm - not just neural networks. The description also mentions changing the training data, but perhaps that is better left out to avoid confusion with data poisoning?
Each of the Top 10 items are scored according to OWASP's Risk Rating Methodology. There should be a page defining how to use the ratings to provide a severity score. This will assist practitioners in knowing 'what to fix' and 'when'.
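As a sketch of what such a page could define, the snippet below bands the averaged 0-9 factor scores and looks up the overall severity matrix from OWASP's Risk Rating Methodology (the factor values passed in are illustrative):

```python
def level(score):
    """OWASP Risk Rating bands: 0 to <3 LOW, 3 to <6 MEDIUM, 6 to 9 HIGH."""
    return "LOW" if score < 3 else "MEDIUM" if score < 6 else "HIGH"

# Overall severity matrix keyed by (likelihood level, impact level),
# as given in the OWASP Risk Rating Methodology.
SEVERITY = {
    ("LOW", "LOW"): "Note",       ("LOW", "MEDIUM"): "Low",       ("LOW", "HIGH"): "Medium",
    ("MEDIUM", "LOW"): "Low",     ("MEDIUM", "MEDIUM"): "Medium", ("MEDIUM", "HIGH"): "High",
    ("HIGH", "LOW"): "Medium",    ("HIGH", "MEDIUM"): "High",     ("HIGH", "HIGH"): "Critical",
}

def overall_severity(likelihood_factors, impact_factors):
    """Average each group of 0-9 factor scores, band them, look up the matrix."""
    likelihood = sum(likelihood_factors) / len(likelihood_factors)
    impact = sum(impact_factors) / len(impact_factors)
    return SEVERITY[level(likelihood), level(impact)]
```

A practitioner could then sort findings by the resulting Note/Low/Medium/High/Critical label to decide 'what to fix' and 'when'.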
as per feedback in #85
Video will be uploaded to OWASP Youtube Channel
General Feedback
Corrupting/manipulating model parameters is a general threat, referred to as model poisoning, and is not restricted to neural networks.
Reference https://github.com/OWASP/www-project-machine-learning-security-top-10/blob/master/GUIDELINES.md#ciso
The current model stealing only describes the model being stolen through parameters, but the model can also be stolen by presenting inputs, capturing the output and using those combinations to train your own model. See AI guide
as per feedback in #84
General Feedback
The risk of leaking training data or other confidentiality issues of the AI pipeline (code, model parameters) are missing.