eric-mitchell / detect-gpt

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

License: MIT License

Python 63.32% Shell 36.68%

detect-gpt's Issues

Are there trained model weights to use?

Dear developer, I am interested in your work. Could you upload your trained weights?

How can I run this code on a Windows system?

Sorry to bother you. I am very interested in your DetectGPT code, but my computer runs Windows, so I want to know whether this code can run on Windows.

About the normalization of perturbation discrepancy

Hi, thank you for your exemplary contribution with both the paper and the implementation code.

I have a question because the formula in the paper differs from the implementation in the code.

In line 6 of Algorithm 1 of the paper, the perturbation discrepancy $\mathbf{d}$ is normalized by dividing it by the square root of the standard deviation $\sigma$. In the code (run.py line 462-462), where this step is implemented, only the standard deviation is calculated; I can't find where the square root is taken. Did you intend a plain z-score standardization, i.e. simply dividing by the standard deviation $\sigma$?

Sincerely,
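A minimal sketch of the two readings being compared, assuming the log-likelihood of the original text and of each perturbed text have already been computed. The function perturbation_discrepancy and its normalize option are hypothetical and not taken from run.py. Note that if the $\sigma$ in Algorithm 1 denotes the sample variance of the perturbed log-likelihoods, dividing by its square root is the same as the z-score ("std" below); the two readings only differ if $\sigma$ is already a standard deviation.

```python
import math
import statistics


def perturbation_discrepancy(orig_ll, perturbed_lls, normalize="std"):
    """Hypothetical helper: log p(x) minus the mean log-likelihood of the
    perturbed texts, optionally normalized.

    normalize="std":      divide by the sample standard deviation (z-score).
    normalize="sqrt_std": divide by the square root of the standard deviation
                          (the literal reading described in the issue).
    normalize=None:       return the unnormalized discrepancy.
    """
    mu = statistics.mean(perturbed_lls)
    d = orig_ll - mu
    if normalize is None:
        return d
    # Sample standard deviation, i.e. the sum of squares divided by k - 1.
    sigma = statistics.stdev(perturbed_lls)
    if normalize == "std":
        return d / sigma
    if normalize == "sqrt_std":
        return d / math.sqrt(sigma)
    raise ValueError(f"unknown normalize option: {normalize!r}")


perturbed = [-2.5, -3.0, -3.5]  # made-up log-likelihoods for illustration
print(perturbation_discrepancy(-2.0, perturbed, normalize=None))        # 1.0
print(perturbation_discrepancy(-2.0, perturbed))                        # 2.0
print(perturbation_discrepancy(-2.0, perturbed, normalize="sqrt_std"))  # about 1.414
```

With these illustrative numbers the sample standard deviation is 0.5, so the two normalizations give clearly different scores for the same text.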

Getting a "Permission denied" error

I am trying to execute the following command:
python run.py --output_name n_perturb --base_model_name gpt2-medium --mask_filling_model_name t5-small --n_perturbation_list 1,10,100,1000 --n_samples 100 --pct_words_masked 0.3 --span_length 2

Getting the following error:
with open(file_path, encoding="utf-8") as f:
PermissionError: [Errno 13] Permission denied: 'C:\Users\SOUVIK\.cache\huggingface\datasets\downloads\extracted\6c23c0182c80c071cd57f17c1b534be8fa7ea6cb31f30c8dd7a95bf1ed1b0d33.py'

How can I run the interactive demo locally?

Models not set to eval mode

Why are the models not set to eval() mode before running inference? If my understanding is correct, this means that dropout and similar training-only behavior will be applied when the perturbations are generated and when the log-likelihoods are estimated.
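To illustrate what the question is about, here is a pure-Python sketch of inverted dropout, the behavior PyTorch's nn.Dropout follows: in training mode elements are randomly zeroed and the survivors rescaled, while after model.eval() the layer passes its input through unchanged. The dropout function here is a stand-in written for illustration, not PyTorch code. Also relevant: the Hugging Face transformers documentation states that from_pretrained returns models in evaluation mode by default, so whether dropout is actually active depends on how the models are loaded and whether train() is called later.

```python
import random


def dropout(values, p, training, rng=None):
    """Illustrative inverted dropout on a list of floats.

    training=True:  zero each element with probability p and scale the
                    survivors by 1 / (1 - p), so the expected value is kept.
    training=False: identity, which is what eval() mode gives you.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, 0.1, training=False))  # [1.0, 2.0, 3.0, 4.0]
# In training mode the output changes from call to call (seeded here),
# which would add noise to both the perturbations and the log-likelihoods.
print(dropout(hidden, 0.1, training=True, rng=random.Random(0)))
```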

How to test?

Could you provide the version of each package?

I installed Python 3.8, but it doesn't work correctly. Thank you.

How to get started quickly

How can I quickly use it to tell whether a piece of text is machine-generated?

Evaluate/fine-tune model on custom data

Is it possible to fine-tune or evaluate the model on custom data?

Are prompts included in the perturbation / likelihood computation?

Hi @eric-mitchell,
Thanks for setting up this clean implementation! I had a quick question about the method.

Do you include the prompts as a prefix to the samples while you perform perturbations / calculate likelihood?

From here and here it seems like prompts are included.

I was wondering whether very long prompts (relative to the generation length) increase the false-positive rate.

Thank you!
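The concern can be made concrete with a small sketch. Assume you already have per-token log-probabilities for the full sequence (prompt followed by generation) from a causal language model; the helper mean_log_likelihood and its prompt_len argument are hypothetical, not part of run.py. Averaging over every token lets a long prompt dominate the score, while skipping the prompt tokens scores only the generated continuation.

```python
def mean_log_likelihood(token_logprobs, prompt_len=0):
    """Hypothetical helper: average per-token log-likelihood.

    token_logprobs[i] is assumed to be log p(token_i | preceding tokens) for
    the full prompt + generation sequence. prompt_len=0 includes the prompt
    in the score; prompt_len=<number of prompt tokens> scores only the
    generated continuation.
    """
    scored = token_logprobs[prompt_len:]
    if not scored:
        raise ValueError("no tokens left to score")
    return sum(scored) / len(scored)


# Made-up numbers: two less-likely prompt tokens, two likely generated tokens.
logprobs = [-5.0, -4.0, -1.0, -1.0]
print(mean_log_likelihood(logprobs))                # -2.75 (prompt included)
print(mean_log_likelihood(logprobs, prompt_len=2))  # -1.0 (generation only)
```

The longer the prompt is relative to the generation, the more the first number is driven by the prompt rather than by the text being tested.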

The type and number of GPUs

Hello, I would like to know the type and number of GPUs used in your experiments. Thanks!

Evaluation dataset for GPT-3 generations

Hi, I'm wondering if you could release your evaluation dataset for GPT-3 generations, including PubMedQA, XSum, and WritingP (150 samples each). Given the randomness of OpenAI's services, a shared evaluation dataset would make follow-up work much easier. Thanks!

Getting stuck when applying extracted fills

I have encountered the following issue when I am processing my own text:

WARNING: 1 texts have no fills. Trying again [attempt 1].
WARNING: 1 texts have no fills. Trying again [attempt 2].
WARNING: 1 texts have no fills. Trying again [attempt 3].
WARNING: 1 texts have no fills. Trying again [attempt 4].
WARNING: 1 texts have no fills. Trying again [attempt 5].
WARNING: 1 texts have no fills. Trying again [attempt 6].
WARNING: 1 texts have no fills. Trying again [attempt 7].
WARNING: 1 texts have no fills. Trying again [attempt 8].
WARNING: 1 texts have no fills. Trying again [attempt 9].
...

After setting breakpoints and inspecting the intermediate variables, I found that the if len(fills) < n branch in the function apply_extracted_fills is taken, which makes the function output an empty list. What might be the problem?

The text that I am dealing with is:

A Ponzi scheme is a type of investment scam where earlier investors are paid with the money of newer investors, rather than with actual profits earned. It's called a Ponzi scheme because it was named after Charles Ponzi, who became famous for using this technique in the early 1900s. Here's an example of how a Ponzi scheme might work: Imagine there are three people: Alice, Bob, and Carol. Alice is the person running the Ponzi scheme. Bob and Carol are the investors. Alice tells Bob and Carol that she has a special investment opportunity where they can earn a lot of money very quickly. Bob and Carol are excited and give Alice some of their money to invest. Alice takes the money from Bob and Carol and doesn't actually invest it anywhere. Instead, she uses some of the money to pay herself and keep some for herself. Then, she uses the rest of the money to pay Bob and Carol a small amount of money, pretending that it's the profits they've earned from the investment. Bob and Carol are happy because they're getting paid, so they tell their friends Dave and Emily about the investment opportunity. Dave and Emily also give Alice some of their money to invest. Alice uses the same process with Dave and Emily's money. She pays herself and keeps some for herself, and then uses the rest of the money to pay Bob, Carol, Dave, and Emily a little more money, pretending that it's the profits they've earned. This process continues, with Alice getting more and more money from new investors and using it to pay the earlier investors, who are happy because they think they're making a lot of money. However, the whole thing is a lie. Alice is not actually investing the money at all. She's just using the money from new investors to pay the earlier investors, and keeping some for herself. Eventually, the Ponzi scheme will collapse because there won't be enough new investors to pay all of the earlier investors, and people will start to realize that they're not actually making any real profits. 
A pyramid scheme is similar to a Ponzi scheme in that it's a type of investment scam. However, in a pyramid scheme, the people running the scam make their money by recruiting new members, rather than by investing the money of the members. Like a pyramid, the scheme relies on having a large number of people at the bottom to support the people at the top. Like a Ponzi scheme, a pyramid scheme will eventually collapse because there aren't enough new members to support the people at the top.

Thank you!
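For context, here is a simplified sketch of the kind of check the issue describes. It assumes T5-style sentinel tokens (<extra_id_0>, <extra_id_1>, ...) mark the masked spans, which is how T5 mask filling works; the function apply_fills is hypothetical and simplified, not the repository's apply_extracted_fills. If the mask-filling model returns fewer fills than there are sentinels (one possible cause, not confirmed here, is its output being cut off for a long text with many masks), the text is discarded and the caller retries, which would produce repeated "no fills" warnings.

```python
import re

# T5 sentinel tokens look like <extra_id_0>, <extra_id_1>, ...
SENTINEL = re.compile(r"<extra_id_\d+>")


def apply_fills(masked_text, fills):
    """Hypothetical, simplified fill step.

    Replaces <extra_id_i> with fills[i]. Returns "" when there are fewer
    fills than sentinels, mirroring the len(fills) < n check in the issue.
    """
    tokens = masked_text.split(" ")
    n = sum(1 for t in tokens if SENTINEL.fullmatch(t))
    if len(fills) < n:
        return ""  # not enough fills: the caller would retry
    for i in range(n):
        tokens[tokens.index(f"<extra_id_{i}>")] = fills[i]
    return " ".join(tokens)


masked = "A <extra_id_0> scheme is a <extra_id_1> scam"
print(apply_fills(masked, ["Ponzi", "type of investment"]))
# A Ponzi scheme is a type of investment scam
print(repr(apply_fills(masked, ["Ponzi"])))  # '' (one fill for two masks)
```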

XSum Fake Samples

Hi
Could I get access to the real/fake XSum and PubMedQA datasets you used, so that I can evaluate my model on them?

I want the same fake samples you generated from the original XSum and PubMedQA datasets.

Thank you!

Something goes wrong when downloading the XSum dataset

The program gets stuck at line 627, data = datasets.load_dataset(dataset, split='train', cache_dir=cache_dir)[key]. I tried running this line of code separately in another project and then moving the downloaded xsum files to ~/.cache, but the program still got stuck here every time. I also changed xsum to "english" so that custom_datasets.py would be used, but the data downloaded only partially and then stopped, and the program got stuck at data = custom_datasets.load(dataset, cache_dir).
When the program got stuck, I checked my CPU and GPU, and neither was overloaded. My network connection was also good, so I don't know what happened.

The demo link seems to be broken.

Changing the buffer size to avoid "no fills" warnings during perturbation

Hi,
I'm encountering lots of "no fills" warnings and getting stuck during perturbation. Can we change the buffer size from 1 to 2 or something similar?
What does the buffer size mean?
Thanks.
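One plausible reading of a buffer size in span masking, offered as an assumption rather than a description of run.py: it is the number of unmasked words required on each side of a masked span, so that two masks never touch, and it may also enter the formula for how many spans to mask. The function mask_spans, the placeholder string, and the n_spans formula below are all hypothetical.

```python
import random


def mask_spans(text, span_length, pct, buffer_size=1, seed=0):
    """Hypothetical span masking with a buffer between masked spans.

    Repeatedly picks a random span of span_length words and replaces it with
    one placeholder, but only if the span widened by buffer_size words on
    each side contains no placeholder yet. Placeholders are then renamed to
    T5-style sentinels <extra_id_0>, <extra_id_1>, ...
    """
    rng = random.Random(seed)
    tokens = text.split(" ")
    placeholder = "<<<mask>>>"
    # Assumed formula: a bigger buffer means fewer spans for the same pct.
    n_spans = int(pct * len(tokens) / (span_length + 2 * buffer_size))
    n_masks, attempts = 0, 0
    while n_masks < n_spans and attempts < 10_000:
        attempts += 1
        if len(tokens) < span_length:
            break
        start = rng.randint(0, len(tokens) - span_length)
        end = start + span_length
        lo, hi = max(0, start - buffer_size), min(len(tokens), end + buffer_size)
        if placeholder not in tokens[lo:hi]:
            tokens[start:end] = [placeholder]
            n_masks += 1
    # Number the placeholders left to right as sentinels.
    idx = 0
    for i, tok in enumerate(tokens):
        if tok == placeholder:
            tokens[i] = f"<extra_id_{idx}>"
            idx += 1
    return " ".join(tokens)


words = " ".join(f"w{i}" for i in range(40))
print(mask_spans(words, span_length=2, pct=0.3, buffer_size=1))
print(mask_spans(words, span_length=2, pct=0.3, buffer_size=2))
```

Under these assumptions, raising buffer_size from 1 to 2 yields fewer masked spans that sit further apart, which leaves the mask-filling model fewer spans to fill; whether that removes the warnings in the actual code is not established here.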