
test-mass-forker-org-1 / recursiveextractor


This project forked from microsoft/recursiveextractor


RecursiveExtractor is a .NET Standard 2.0 archive extraction library and command line tool that can process 7zip, ar, bzip2, deb, gzip, iso, rar, tar, vhd, vhdx, vmdk, wim, xzip, and zip archives, and any nested combination of the supported formats.

License: MIT License

JavaScript 0.95% C# 91.41% CSS 2.58% HTML 5.06%


About


Recursive Extractor is a Cross-Platform .NET Standard 2.0 Library and Command Line Program for parsing archive files and disk images, including nested archives and disk images.

Supported File Types

7zip+   ar      bzip2
deb     gzip    iso
rar^    tar     vhd
vhdx    vmdk    wim*
xzip    zip+

Legend
* Windows only
+ Encryption supported
^ RAR version 4 encryption supported
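Several of the formats above can be recognized from their leading "magic" bytes. The sketch below is a hypothetical illustration of that idea and is not RecursiveExtractor's own detector; `MagicSketch` and `Detect` are made-up names. It only covers signatures that sit at offset 0. Formats such as tar and iso keep their markers further into the file, and deb packages share the ar signature, so telling them apart needs more than this prefix check.

```csharp
using System.Linq;

// Hypothetical sketch: identify a subset of the supported formats by the
// signature bytes at the start of the file. Not the library's implementation.
public static class MagicSketch
{
    // Well-known signatures at offset 0.
    private static readonly (string Name, byte[] Signature)[] Signatures =
    {
        ("zip",   new byte[] { 0x50, 0x4B, 0x03, 0x04 }),                         // "PK\x03\x04"
        ("gzip",  new byte[] { 0x1F, 0x8B }),
        ("bzip2", new byte[] { 0x42, 0x5A, 0x68 }),                               // "BZh"
        ("7zip",  new byte[] { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C }),
        ("xzip",  new byte[] { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 }),             // "\xFD7zXZ\0"
        ("rar",   new byte[] { 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07 }),             // "Rar!\x1A\x07"
        ("ar",    new byte[] { 0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A }), // "!<arch>\n" (deb too)
    };

    // Returns the first format whose signature prefixes `header`, or "unknown".
    public static string Detect(byte[] header)
    {
        foreach (var (name, signature) in Signatures)
        {
            if (header.Length >= signature.Length &&
                header.Take(signature.Length).SequenceEqual(signature))
            {
                return name;
            }
        }
        return "unknown";
    }
}
```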

Variants

Command Line

Installing

  1. Ensure you have the latest .NET SDK.
  2. Run `dotnet tool install -g Microsoft.CST.RecursiveExtractor.Cli`.

This adds RecursiveExtractor to your path so you can run it directly from the shell.

Running

Basic usage is: RecursiveExtractor --input archive.ext --output outputDirectory

Detailed Usage
  • input: The path to the archive to extract.
  • output: The path to a directory to extract into.
  • passwords: A comma-separated list of passwords to use for archives.
  • allow-globs: A comma-separated list of glob patterns that each extracted file must match.
  • deny-globs: A comma-separated list of glob patterns that each extracted file must not match.
  • raw-extensions: A comma-separated list of file extensions not to recurse into.
  • no-recursion: Don't recurse into sub-archives.
  • single-thread: Don't attempt to parallelize extraction.
  • printnames: Print the name of each extracted file.
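The `no-recursion` and `raw-extensions` options together decide whether a nested archive is opened. The following is a hypothetical sketch of that gate, not the tool's actual code; `RecursionSketch`, `ParseList`, and `ShouldRecurse` are illustrative names. It assumes extensions may be given with or without a leading dot and are compared case-insensitively, and it only covers the gating decision: whether a file is an archive at all would still have to be detected separately.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch of how --no-recursion and --raw-extensions could gate
// whether an extracted file is opened as a nested archive.
public static class RecursionSketch
{
    // Parses a comma-separated flag value such as "jar, .nupkg" into a
    // case-insensitive set of dotted extensions (".jar", ".nupkg").
    public static HashSet<string> ParseList(string value) =>
        new HashSet<string>(
            value.Split(',', StringSplitOptions.RemoveEmptyEntries)
                 .Select(s => s.Trim())
                 .Where(s => s.Length > 0)
                 .Select(s => s.StartsWith(".", StringComparison.Ordinal) ? s : "." + s),
            StringComparer.OrdinalIgnoreCase);

    // Returns true if the file may be treated as an archive and recursed into.
    public static bool ShouldRecurse(string fileName, bool noRecursion, HashSet<string> rawExtensions)
    {
        if (noRecursion)
        {
            return false; // --no-recursion: never open sub-archives
        }
        // Path.GetExtension includes the leading dot, e.g. ".jar".
        return !rawExtensions.Contains(Path.GetExtension(fileName));
    }
}
```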

For example, to extract only ".cs" files:

RecursiveExtractor --input archive.ext --output outputDirectory --allow-globs "**/*.cs"

Run RecursiveExtractor --help for more details.
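To make the allow/deny glob semantics concrete, here is a hypothetical sketch that translates globs into regular expressions and filters paths. RecursiveExtractor may use a different glob implementation with different edge-case behavior; `GlobFilterSketch` and its methods are illustrative names. The sketch assumes forward-slash paths and that a file passes when it matches at least one allow glob (or no allow globs are given) and no deny glob.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch of allow/deny glob filtering over forward-slash paths.
public static class GlobFilterSketch
{
    // Translates a glob into an anchored Regex: "**/" spans zero or more
    // directories, "**" matches anything, "*" stays within one path segment,
    // and "?" matches a single non-separator character.
    public static Regex ToRegex(string glob)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            if (c == '*' && i + 1 < glob.Length && glob[i + 1] == '*')
            {
                if (i + 2 < glob.Length && glob[i + 2] == '/')
                {
                    sb.Append("(?:.*/)?"); // "**/": any number of directories
                    i += 2;
                }
                else
                {
                    sb.Append(".*");       // "**" elsewhere: anything
                    i += 1;
                }
            }
            else if (c == '*') sb.Append("[^/]*");
            else if (c == '?') sb.Append("[^/]");
            else sb.Append(Regex.Escape(c.ToString()));
        }
        return new Regex(sb.Append('$').ToString());
    }

    // A path passes if it matches at least one allow glob (when any are given)
    // and matches no deny glob. Both lists arrive as comma-separated strings.
    public static bool IsAllowed(string path, string allowGlobs, string denyGlobs)
    {
        var allow = Split(allowGlobs).Select(ToRegex).ToList();
        var deny = Split(denyGlobs).Select(ToRegex).ToList();
        bool allowed = allow.Count == 0 || allow.Any(r => r.IsMatch(path));
        return allowed && !deny.Any(r => r.IsMatch(path));
    }

    private static IEnumerable<string> Split(string globs) =>
        globs.Split(',', StringSplitOptions.RemoveEmptyEntries)
             .Select(g => g.Trim())
             .Where(g => g.Length > 0);
}
```

With this translation, `**/*.cs` becomes `^(?:.*/)?[^/]*\.cs$`, so it matches `.cs` files at any depth, including the top level.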

.NET Standard Library

Recursive Extractor is available on NuGet as Microsoft.CST.RecursiveExtractor. Recursive Extractor targets netstandard2.0+ and the latest .NET, currently .NET 6.0.

Usage

The most basic usage is to enumerate through all the files in the archive provided and do something with their contents as a Stream.

using System;
using Microsoft.CST.RecursiveExtractor;

var path = "path/to/file";
var extractor = new Extractor();
foreach (var file in extractor.Extract(path))
{
    // file.Content is a Stream holding the file's contents; replace this with your own processing.
    Console.WriteLine($"{file.FullPath}: {file.Content.Length} bytes");
}
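As one example of "doing something" with each `Content` stream, the hypothetical helper below hashes a stream with SHA-256. `ContentSketch` and `Sha256Hex` are illustrative names, not part of the library. The sketch rewinds seekable streams first, on the assumption that another consumer may already have read part of the content.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hashes a content stream with SHA-256 and returns
// the lowercase hex digest.
public static class ContentSketch
{
    public static string Sha256Hex(Stream content)
    {
        // Start from the beginning when the stream allows it, in case an
        // earlier consumer already advanced its position.
        if (content.CanSeek)
        {
            content.Position = 0;
        }
        using var sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(content);
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    }
}
```

Inside the loop above, this could be called as `ContentSketch.Sha256Hex(file.Content)` to print a digest next to each `FullPath`.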
Extracting to Disk
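This code, adapted from the CLI, extracts the contents of the archive at `input` to the directory at `output` (the CLI reads these from `options.Input` and `options.Output`). With `ExtractSelfOnFail` enabled, an archive that fails to extract is written out as itself.
using Microsoft.CST.RecursiveExtractor;

var input = "path/to/archive.ext";       // options.Input in the CLI
var output = "path/to/outputDirectory";  // options.Output in the CLI
var extractor = new Extractor();
var extractorOptions = new ExtractorOptions()
{
    // Write archives that fail to extract to disk as-is instead of skipping them.
    ExtractSelfOnFail = true,
};
extractor.ExtractToDirectory(output, input, extractorOptions);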
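Writing entries to a directory is where ZipSlip matters: an entry named something like `../../etc/passwd` must not escape the output directory. The sketch below shows one common way to perform that check. It is a hypothetical illustration, not RecursiveExtractor's code, and `SafePathSketch.Resolve` is a made-up name.

```csharp
using System;
using System.IO;

// Hypothetical ZipSlip guard: resolve an entry path under the output
// directory and reject it if the result lands anywhere else.
public static class SafePathSketch
{
    // Returns the full destination path, or throws if it would escape outputDirectory.
    public static string Resolve(string outputDirectory, string entryPath)
    {
        string root = Path.GetFullPath(outputDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            // Trailing separator so "/out" does not also accept "/out-evil/...".
            root += Path.DirectorySeparatorChar;
        }
        // GetFullPath collapses "..", and Combine returns entryPath unchanged
        // if it is rooted, so both traversal and absolute paths fail the check below.
        string destination = Path.GetFullPath(Path.Combine(root, entryPath));
        if (!destination.StartsWith(root, StringComparison.Ordinal))
        {
            throw new IOException($"Entry '{entryPath}' resolves outside '{root}'.");
        }
        return destination;
    }
}
```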
Async Usage
This example uses the async API to print the full path of every file found in the archive at `path`.
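using System;
using System.Collections.Generic;
using Microsoft.CST.RecursiveExtractor;

var path = "/Path/To/Your/Archive";
var extractor = new Extractor();
try
{
    // The async API yields an IAsyncEnumerable, which is what await foreach requires.
    IAsyncEnumerable<FileEntry> results = extractor.ExtractAsync(path);
    await foreach (var found in results)
    {
        Console.WriteLine(found.FullPath);
    }
}
catch (OverflowException)
{
    // Recursive Extractor has detected a quine or zip bomb.
}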
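`await foreach` works over any `IAsyncEnumerable<T>`, so the consuming pattern can be exercised without an archive. In the hypothetical sketch below, `FakeEntries` stands in for the extractor's async results, and `TakeAsync` stops after a fixed number of items. Both names are illustrative. Breaking out of `await foreach` disposes the underlying async enumerator, so stopping early does not leave the sequence half-open.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch of consuming an async sequence with an early stop.
public static class AsyncSketch
{
    // Stand-in async iterator producing file paths with a simulated I/O delay.
    public static async IAsyncEnumerable<string> FakeEntries()
    {
        foreach (var path in new[] { "a.zip/one.txt", "a.zip/inner.tar/two.txt" })
        {
            await Task.Delay(1); // simulate asynchronous I/O
            yield return path;
        }
    }

    // Collects at most `limit` items; leaving the loop early disposes the enumerator.
    public static async Task<List<T>> TakeAsync<T>(IAsyncEnumerable<T> source, int limit)
    {
        var results = new List<T>();
        if (limit <= 0)
        {
            return results;
        }
        await foreach (var item in source)
        {
            results.Add(item);
            if (results.Count >= limit)
            {
                break;
            }
        }
        return results;
    }
}
```

Assuming `ExtractAsync` returns `IAsyncEnumerable<FileEntry>` as in the example above, the same helper could cap output with `await AsyncSketch.TakeAsync(extractor.ExtractAsync(path), 100)`.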
The FileEntry Object
The Extractor returns `FileEntry` objects. Each one exposes the file's contents as the `Content` stream, along with the following properties:
public Stream Content { get; }
public string FullPath { get; }
public string Name { get; }
public FileEntry? Parent { get; }
public string? ParentPath { get; }
public DateTime CreateTime { get; }
public DateTime ModifyTime { get; }
public DateTime AccessTime { get; }
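The `Parent` property links each entry to the entry it was extracted from, which records where a file sits inside nested archives. The sketch below uses a hypothetical `EntrySketch` type that keeps only `Name` and `Parent` in order to show how walking that chain recovers the nesting. It assumes `Parent` is null for top-level entries.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical minimal stand-in for FileEntry, keeping only Name and Parent.
public sealed class EntrySketch
{
    public string Name { get; }
    public EntrySketch? Parent { get; }

    public EntrySketch(string name, EntrySketch? parent = null)
    {
        Name = name;
        Parent = parent;
    }

    // Walks Parent links up to the root and returns names outermost-first.
    public List<string> Lineage()
    {
        var names = new List<string>();
        for (EntrySketch? current = this; current != null; current = current.Parent)
        {
            names.Add(current.Name);
        }
        names.Reverse();
        return names;
    }

    // Number of enclosing archives (0 for a top-level entry).
    public int Depth => Lineage().Count - 1;
}
```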
Extracting Encrypted Archives
You can provide passwords for decrypting archives. Each list of passwords is keyed by a Regex that is matched against the archive's Name.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Microsoft.CST.RecursiveExtractor;

var path = "/Path/To/Your/Archive";
var extractor = new Extractor();
try
{
    IEnumerable<FileEntry> results = extractor.Extract(path, new ExtractorOptions()
    {
        // Verbatim strings (@"...") keep the regex escape "\." intact.
        Passwords = new Dictionary<Regex, List<string>>()
        {
            { new Regex(@"\.zip"), new List<string>() { "PasswordForZipFiles" } },
            { new Regex(@"\.7z"), new List<string>() { "PasswordFor7zFiles" } },
            { new Regex(".*"), new List<string>() { "PasswordForAllFiles" } },
        }
    });
    foreach (var found in results)
    {
        Console.WriteLine(found.FullPath);
    }
}
catch (OverflowException)
{
    // Recursive Extractor has detected a quine or zip bomb.
}
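The hypothetical sketch below shows how a `Dictionary<Regex, List<string>>` like the one above can be resolved into the candidate passwords for a given archive name. It is not the library's internal logic, and `PasswordSketch.Candidates` is an illustrative name. A catch-all pattern such as `.*` matches every name, so its passwords are always included. Because `Dictionary` enumeration order is not guaranteed, the sketch treats the result as an unordered set of candidates.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: collect every password whose regex matches the archive name.
public static class PasswordSketch
{
    public static List<string> Candidates(Dictionary<Regex, List<string>> passwords, string archiveName) =>
        passwords
            .Where(pair => pair.Key.IsMatch(archiveName))
            .SelectMany(pair => pair.Value)
            .Distinct() // the same password may be listed under several patterns
            .ToList();
}
```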

Exceptions

RecursiveExtractor protects against ZipSlip, quines, and zip bombs. Calls to Extract throw an OverflowException when a quine or zip bomb is detected.

Otherwise, invalid files found while crawling are logged and skipped. RecursiveExtractor uses NLog for logging.
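One way to detect a zip bomb is to track decompressed output against both a hard byte cap and the compressed input size. The sketch below is a hypothetical illustration of that idea. Its thresholds are arbitrary, and RecursiveExtractor's actual limits and heuristics may differ. `ExtractionBudget` is an illustrative name. It throws `OverflowException` to match the exception type documented above.

```csharp
using System;

// Hypothetical zip-bomb guard: cap total extracted bytes and the expansion
// ratio relative to the compressed input. Thresholds are illustrative only.
public sealed class ExtractionBudget
{
    private readonly long _compressedBytes;
    private readonly long _maxBytes;
    private readonly double _maxRatio;
    private long _extractedBytes;

    public ExtractionBudget(long compressedBytes, long maxBytes, double maxRatio)
    {
        _compressedBytes = Math.Max(1, compressedBytes); // avoid division by zero
        _maxBytes = maxBytes;
        _maxRatio = maxRatio;
    }

    // Records a chunk of decompressed output; throws OverflowException once
    // either limit is exceeded (checked arithmetic also throws it on overflow).
    public void Add(long bytes)
    {
        _extractedBytes = checked(_extractedBytes + bytes);
        if (_extractedBytes > _maxBytes)
        {
            throw new OverflowException($"Extracted {_extractedBytes} bytes, above the {_maxBytes}-byte cap.");
        }
        double ratio = (double)_extractedBytes / _compressedBytes;
        if (ratio > _maxRatio)
        {
            throw new OverflowException($"Expansion ratio {ratio:F0}:1 exceeds {_maxRatio}:1.");
        }
    }
}
```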

Feedback

If you have any issues or feature requests (for example, supporting other formats) you can open a new Issue.

If you are having trouble parsing a specific archive, it helps to include an archive that demonstrates the issue.

Dependencies

Recursive Extractor uses a number of libraries to parse archives.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Contributors

gfs, daalcant, microsoftopensource, scovetta, guyacosta, jhoak
