arimus / jmimemagic

jMimeMagic is a Java library for determining the MIME type of files or streams.

Home Page: http://sourceforge.net/projects/jmimemagic/

License: Apache License 2.0

Java 88.58% Shell 0.12% Perl 0.06% Python 11.03% HTML 0.21%

jmimemagic's Introduction

jMimeMagic (TM) v0.1.5
Copyright (C) 2003-2017, David Castro
Contact:  David Castro <[email protected]>

jMimeMagic is a Java library for determining the MIME type of files or streams.

Please see LICENSE in this directory for jMimeMagic licensing information.  See
LICENSE_log4j, LICENSE_oro, LICENSE_xerces, LICENSE_junit respectively for
Log4j, ORO, Xerces2 and JUnit licensing information.  Log4j, ORO, Xerces2 and
JUnit are bundled with jMimeMagic for convenience.

** NOTE **
This API absolutely will change until there is a stable release! Relying on it
to not change is probably NOT a safe bet.  It is an initial release, given as a
(hopefully) better than nothing option.  The plan is for this library to become
much cleaner and well-architected, but only time will tell.  The more you show
interest in this library/nudge me, the more likely that will be the case.
Comments and feedback greatly welcome.


Requirements:
  Java 2 SDK 1.3+
  Apache Maven 1.0.2+
  JUnit 3.8.x
  Jakarta ORO 2.0.x
  Commons Logging 1.0.x
  Log4j 1.2.x
  Xerces 2.4.0 (optional)

Building:
  type 'mvn clean jar:jar'

  the jar file should then be in ./target

Testing:
  Log4j setting can be modified in resources/log4j.properties

  Run all unit tests
  ------------------
  edit build.properties and make sure it contains the line 'maven.test.skip=false'
  type 'maven clean test'
  
  Run test against a particular file
  ----------------------------------
  type 'maven clean run -Dclass=net.sf.jmimemagic.Magic -Dargs=<file to test>'
  - or - simply ./test <file to test> in a unix shell
  (this is similar to the 'file' command in *nix)

Maven:

  To add jMimeMagic as a dependency in a Maven project, you can use the
  following in the dependencies section of your pom.xml.

  <dependency>
      <groupId>net.sf.jmimemagic</groupId>
      <artifactId>jmimemagic</artifactId>
      <version>0.1.3</version>
  </dependency>

Contributions:
  Thanks to the MMBase team (http://www.mmbase.org/) for doing the work of
  creating the original basis for the XML version of the magic file.

  To contribute code or other help, send an email to [email protected]
  or submit patches/bug reports/etc on the jMimeMagic project page:

      http://sf.net/projects/jmimemagic/

Notes:
  Remember that you will need the proper libraries (XML Parser/Xerces2, Log4j,
  Commons Logging, and ORO) in the classpath for any applications that use
  jMimeMagic.  If you want to run any of the jUnit tests, then you will also
  need JUnit in the classpath.

Developers:
  David Castro <[email protected]>
  Nate Jones <[email protected]>

Problems/questions/suggestions:
  David Castro <[email protected]>

jmimemagic's People

Contributors

arimus, joshuapinter, qnerd, tuxburner

jmimemagic's Issues

LGPL

is better than GPL but still is not suitable for Apache projects or many other entities.
A more liberal license would be nice.

Licensing information is inconsistent

Hey,

Nifty code and very useful. However, while you state Apache License 2.0 in your readme / license.txt file, the source code headers still reference the LGPL.

Remove Jackson (and other?) dependencies

Hi,
I wondered why those two jackson dependencies are needed, as the project does not make use of any org.codehaus.jackson-classes.
If still needed, an upgrade to 2.8.x would be useful:

<dependency>
	<groupId>com.fasterxml.jackson.jaxrs</groupId>
	<artifactId>jackson-jaxrs-json-provider</artifactId>
	<version>2.8.6</version>
</dependency>
<dependency>
	<groupId>com.fasterxml.jackson.module</groupId>
	<artifactId>jackson-module-jaxb-annotations</artifactId>
	<version>2.8.6</version>
</dependency>

See also #26
Regards

Matthias

Error with Unicode files

We got the following exception when trying to upload a Unicode UTF-8 file. I saved the same file as ANSI and ASCII UTF-8, and the file uploaded fine, and MagicMatch found the MIME type. But it fails on all Unicode files, no matter the encoding, size, or simplicity of the character strings in the file. To reproduce, open Word, type in a simple sentence, use Save As to write a Unicode UTF-8 file, and test it.

net.sf.jmimemagic.MagicMatchNotFoundException
at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:222)

Add JAR file

I used an older version from SourceForge and this lib appears great. Could you please publish the JAR file while the lib is still in development?

tiff isn't recognized

running with tiff file from https://github.com/arimus/jmimemagic/tree/master/test_docs :

import java.io.File;
import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;
public class TiffCheck {
   public static void main(String[] args) {
      // image/tiff
      // TIFF image data, little-endian
      checkJmimemagic("test_nocompress.tif");
   }
   // net.sf.jmimemagic
   public static void checkJmimemagic(final String file) {
      String mimeType = null;
      try {
         File currentFile = new File(file);
         MagicMatch match = Magic.getMagicMatch(currentFile, true, false);
         mimeType = match.getMimeType();
         System.out.println(mimeType);
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

delivers:

net.sf.jmimemagic.MagicMatchNotFoundException
    at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:368)
    at TiffCheck.checkJmimemagic(TiffCheck.java:22)
    at TiffCheck.main(TiffCheck.java:13)

Can you please confirm or tell me what's wrong with my test.

Thanks
Cheers

Heinrich

Text file with byte order mark not detected

A text file with a BOM (byte order mark) is not detected correctly. In TextFileDetector, the method process(byte[] data, int offset, int length, long bitmask, char comparator, String mimeType, Map params) should contain something like:

log.debug("processing stream data");

Perl5Util util = new Perl5Util();

try {
    String s = new String(data, "UTF-8");

    if (!util.match("/[^[:ascii:][:space:]]/", s)) {
        return new String[] { "text/plain" };
    }

    // check whether a BOM is present
    if (bomPresent(ByteOrderMark.UTF_8, data) ||
            bomPresent(ByteOrderMark.UTF_16LE, data) ||
            bomPresent(ByteOrderMark.UTF_16BE, data)) {
        return new String[] { "text/plain" };
    }

    return null;
} catch (UnsupportedEncodingException e) {
    log.error("TextFileDetector: failed to process data");

    return null;
}

The bomPresent method could look like this:

private boolean bomPresent(ByteOrderMark byteOrderMark, byte[] data) {
    int bomLength = byteOrderMark.length();
    byte[] startingBytes = Arrays.copyOf(data, bomLength);

    return Arrays.equals(startingBytes, byteOrderMark.getBytes());
}
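
The same check can be sketched without the commons-io ByteOrderMark class. The class and method names below (BomSniffer, startsWith, detect) are hypothetical and not part of jMimeMagic; the BOM byte values are the standard Unicode ones. The sketch adds an explicit length guard, so that input shorter than the BOM is rejected directly instead of relying on the zero padding of Arrays.copyOf.

```java
import java.util.Arrays;

/** Sketch only: detect a leading byte order mark using just the JDK. */
final class BomSniffer {
    // Standard Unicode BOM byte sequences.
    private static final byte[] UTF_8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] UTF_16BE = {(byte) 0xFE, (byte) 0xFF};
    private static final byte[] UTF_16LE = {(byte) 0xFF, (byte) 0xFE};

    /** True only if data is at least as long as the BOM and begins with it. */
    static boolean startsWith(byte[] data, byte[] bom) {
        if (data == null || data.length < bom.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, bom.length), bom);
    }

    /** Returns the charset name implied by the BOM, or null if no BOM is found. */
    static String detect(byte[] data) {
        if (startsWith(data, UTF_8)) return "UTF-8";
        if (startsWith(data, UTF_16BE)) return "UTF-16BE";
        // Note: a UTF-32LE BOM (FF FE 00 00) also matches here; not handled in this sketch.
        if (startsWith(data, UTF_16LE)) return "UTF-16LE";
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(sample));
    }
}
```

A detector could treat a non-null result as a strong hint for text/plain and decode the remainder with the reported charset.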

Attached example file.

(copied from SF ticket - http://sourceforge.net/tracker/?func=detail&aid=3462414&group_id=94418&atid=607846)

BMP files are not detected

Add the following test to MagicTest and it will fail:

public void testBMP() {
	System.out.print("\ntesting BMP image...");
	try {
		MagicMatch match = Magic.getMagicMatch(new File("test_docs/test.bmp"), true, false);
		if (match != null) {
			assertEquals("image/bmp", match.getMimeType());
		} else {
			System.out.print("failed");
			fail("no match in testBMP()");
		}
		System.out.print("ok");
	} catch (Exception e) {
		e.printStackTrace();
		fail("exception in testBMP(). message: " + e);
	} catch (Error e) {
		e.printStackTrace();
		fail("error in testBMP(). message: " + e.getMessage());
	}
}

I noticed there's no <match> for image/bmp in magic.xml

Html Sgml confusion

I tried to match an HTML file, but the detected MIME type was SGML.
Both start with 'doctype', but the HTML file continues with 'html'.
Maybe the matchers need to be ordered so that the more specific one is tried first.
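
A minimal sketch of the ordering idea, assuming a first-match-wins rule list. OrderedPrefixMatcher and its two rules are hypothetical illustrations; jMimeMagic's real matchers are defined in magic.xml and are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch only: first-match-wins prefix rules, most specific rule first. */
final class OrderedPrefixMatcher {
    // LinkedHashMap keeps insertion order, which is used as match order.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("<!doctype html", "text/html"); // specific rule must come first
        RULES.put("<!doctype", "text/sgml");      // generic fallback
    }

    /** Returns the MIME type of the first rule whose prefix matches, or null. */
    static String match(String head) {
        String normalized = head.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (normalized.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(match("<!DOCTYPE html><html></html>"));
        System.out.println(match("<!DOCTYPE book SYSTEM \"book.dtd\">"));
    }
}
```

If the two rules were registered in the opposite order, every HTML document would be caught by the generic doctype rule, which is the behaviour described above.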

svg mime type not found

<dependency>
    <groupId>jmimemagic</groupId>
    <artifactId>jmimemagic</artifactId>
    <version>0.1.2</version>
</dependency>

public String getMimeType(byte[] file) throws Exception {
    MagicMatch match = Magic.getMagicMatch(file);
    return match.getMimeType();
}

public void testContentTypeFind() throws Exception {
    TestCase.assertEquals(
            "image/svg+xml",
            getMimeType(filenameToByteArray("public/assets/svg/avatar-1.svg"))
    );
}

Expected :image/svg+xml
Actual :text/plain
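
SVG is XML text, so a byte-signature matcher will tend to fall back to text/plain. One possible workaround, sketched here with the JDK's StAX parser and not taken from jMimeMagic, is to look at the root element. SvgSniffer and isSvg are hypothetical names, and the sketch assumes the file declares the standard SVG namespace.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch only: treat data as SVG if its XML root element is svg in the SVG namespace. */
final class SvgSniffer {
    private static final String SVG_NS = "http://www.w3.org/2000/svg";

    static boolean isSvg(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or fetch external entities while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(data));
            try {
                while (reader.hasNext()) {
                    // Only the first start element (the root) matters.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return "svg".equals(reader.getLocalName())
                                && SVG_NS.equals(reader.getNamespaceURI());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return false; // not well-formed XML, so not SVG
        }
        return false;
    }
}
```

A caller could run this check only after the generic detection has returned text/plain or an XML type, and report image/svg+xml when it succeeds.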

.MagicParseException

I get a MagicParseException when I use Magic.getMagicMatch(new File(Environment.getExternalStorageDirectory().getAbsolutePath() + File.separator + "1" + File.separator + "jweixin-1.0.0.js")).

DTD is illegal

The DTD file does not match the magic.xml definition?

changes to magic.xml shouldn't have to be bundled in the jar

Obviously. A deficiency that has long existed and just hasn't been remedied.

From sf.net...

I'd like to be able to configure the magic XML file so that I could use an external file and not the resource file that is directly bundled with the jmimemagic.jar. So it is possible to modify the XML at runtime and reinitialize JMimeMagic with the new file.


Example:

File file = new File("/path/to/mymagic.xml");

Magic.initialize(file);

To reinitialize JMimeMagic, a mechanism like the following comes to mind:

Magic.reset();

Magic.initialize(file);

Maven Central

Would be great to have this library in Maven Central

Xerces and XercesImpl

There are two dependencies, xerces and xercesImpl, which are actually the same artifact: the xerces dependency is relocated to xercesImpl.
This leads to two versions of xercesImpl on the classpath, and some Maven plugins cannot handle this situation correctly.

e.g. compare dependency:tree and dependency:list

a solution would be to remove xerces:xercesImpl:jar:2.4.0:runtime

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xerces</artifactId>
  <version>2.4.0</version>
  <scope>runtime</scope>
</dependency>

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.7.1</version>
  <scope>runtime</scope>
</dependency>

Plain text with diacritic is not recognized

Hey there!

When I have a plain text file with only unaccented characters, it is recognized as text/plain without problems. But when I add some diacritics (ěščřžýáí), the file is not recognized and a MagicMatchNotFoundException is thrown.

Take care and thanks for the great library!
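
One plausible cause, given the ASCII-only regular expression quoted in the byte order mark issue, is that any non-ASCII character disqualifies the data as text. A more tolerant check, sketched below with only the JDK, strictly decodes the bytes as UTF-8 and rejects control characters. Utf8TextCheck and looksLikeUtf8Text are hypothetical names, not part of jMimeMagic.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Sketch only: accept data as text if it is valid UTF-8 without control characters. */
final class Utf8TextCheck {
    static boolean looksLikeUtf8Text(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        final String text;
        try {
            text = decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            return false; // not valid UTF-8
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Allow common whitespace controls, reject all other control characters.
            boolean allowed = c == '\n' || c == '\r' || c == '\t' || c == '\f';
            if (Character.isISOControl(c) && !allowed) {
                return false;
            }
        }
        return true;
    }
}
```

One caveat: if only the first N bytes of a file are sampled, the sample may end in the middle of a multi-byte character and be rejected, so a caller would need to trim the sample to a character boundary first.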

BMP error : net.sf.jmimemagic.MagicMatchNotFoundException: null

I fetch a byte array from a URL and then test it with the Magic class. It throws a MagicMatchNotFoundException.

Code:

public static String getMimeType(byte[] bytes) throws MagicParseException, MagicMatchNotFoundException, MagicException {
    // getMagicMatch is static, so no Magic instance is needed
    MagicMatch match = Magic.getMagicMatch(bytes);
    return match.getMimeType();
}

Test code:

String bmpUrl = "https://raw.githubusercontent.com/arimus/jmimemagic/master/test_docs/test.bmp";

byte[] bmpFIle = IOUtils.toByteArray((new URL(bmpUrl)).openStream());
assertThat(FileHelper.getMimeType(bmpFIle)).isEqualTo("image/bmp");

Code review: change the object you act on to some nio buffer? Interface with JAVA 1.7 FileTypeDetector

I am reviewing how you do this.

Ideally you would be drop in ready for java.nio.file.spi.FileTypeDetector then you would just be used with

Files.probeContentType(Path path)

But from reviewing your code, it seems you cycle through a linear list of matcher objects and call test() on each with either a byte array or a File. So you have a fork in your code, which leads you to write a lot of code implementing the same logic for these two completely different objects.

In case of File, you seem to open the RandomAccessFile every time you test. That could be 100s of times, no? It would seem to be better to just use the same RandomAccessFile.

I would think it's best to do that with one of those new Buffer objects, then you can map that to either an array to deal with the byte array case and to a file to deal with the file case, but in either case you won't re-open the file all the time.
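
A rough sketch of what this suggestion could look like: a java.nio.file.spi.FileTypeDetector subclass that reads the head of the file once and runs every check against that single buffer. MagicBytesDetector and its three signatures are illustrative assumptions, not jMimeMagic code. The BMP ("BM"), TIFF and zip local-header signatures are the standard ones.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

/**
 * Sketch only. To be picked up by Files.probeContentType, a real provider
 * would have to be public and listed in
 * META-INF/services/java.nio.file.spi.FileTypeDetector.
 */
final class MagicBytesDetector extends FileTypeDetector {
    private static final int HEAD_SIZE = 8;

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[HEAD_SIZE];
        int n = 0;
        // Open the file once and read the head; all matchers share this buffer.
        try (InputStream in = Files.newInputStream(path)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        return classify(head, n);
    }

    /** Same logic for the byte-array case: no separate File code path needed. */
    static String classify(byte[] head, int length) {
        if (startsWith(head, length, 0x42, 0x4D)) return "image/bmp";           // "BM"
        if (startsWith(head, length, 0x49, 0x49, 0x2A, 0x00)                    // little-endian TIFF
                || startsWith(head, length, 0x4D, 0x4D, 0x00, 0x2A)) return "image/tiff"; // big-endian TIFF
        if (startsWith(head, length, 0x50, 0x4B, 0x03, 0x04)) return "application/zip";   // PK\003\004
        return null; // unknown: lets other detectors have a try
    }

    private static boolean startsWith(byte[] data, int length, int... magic) {
        if (length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

Two-byte signatures such as "BM" are weak and can produce false positives, so a production matcher would likely validate more of the header.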

The ms office open xml formats in the stock magic.xml broke the zip detections.

The detection for MS Office Open XML formats (commit 8ea79ca) broke zip detection when matching on data rather than on a file. With data-based detection there is no file extension to check, so the pattern that matches zip files, PK\003\004, is reported as an Office file, which is wrong.

My suggestion is to move the ms office stuff under the zip detection.

java.util.ConcurrentModificationException

Hi there!

We're having an issue with random concurrent modification exceptions in our application after switching from Java 1.6 -> Java 1.8. I'm thinking this has something to do with the changes Java made after jdk1.8u20 (see https://medium.com/@edouard.kaiser/collections-sort-the-java8u20-modification-d6a9acf96861) regarding Collections.sort(). Any thoughts on this?
We're using version 0.0.4a of your lib. Regards, David

2019-09-26 06:17:18,926 ERROR [jetspeed] DynamicPortlet.getContent(): null
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
at java.util.ArrayList$Itr.next(ArrayList.java:859)
at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:139)...
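
The Java behaviour referred to above can be reproduced in isolation. Since Java 8u20, Collections.sort on an ArrayList delegates to ArrayList.sort, which counts as a modification for any iterator already open on that list. The sketch below is not jMimeMagic code: SortWhileIterating and both methods are hypothetical, and whether Magic.java sorts a shared list in this way is an assumption based only on the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

/** Sketch only: sorting a shared ArrayList invalidates iterators opened before the sort. */
final class SortWhileIterating {
    /** Returns true if the iterator fails after the list is sorted (Java 8u20 and later). */
    static boolean sortBreaksIterator() {
        List<Integer> shared = new ArrayList<>(Arrays.asList(3, 1, 2));
        Iterator<Integer> it = shared.iterator();
        Collections.sort(shared); // bumps the list's modification count
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** One way to avoid the problem: sort a private copy once and share a read-only view. */
    static List<Integer> sortedSnapshot(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        Collections.sort(copy);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        System.out.println("iterator broken by sort: " + sortBreaksIterator());
        System.out.println(sortedSnapshot(Arrays.asList(3, 1, 2)));
    }
}
```

In a multi-threaded server, the same failure appears when one request sorts the shared list while another request is iterating over it, which would explain why the exception looks random.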

OpenOffice .ods file recognized as application/vnd.openxmlformats-officedocument.wordprocessingml.document

Using the StockChart.ods file provided at https://wiki.openoffice.org/wiki/File:StockChart.ods the following will fail:

File stockCharts = ...
assertEquals("application/vnd.oasis.opendocument.spreadsheet", Magic.getMagicMatch(stockCharts, true, false).getMimeType())

It results in:

org.junit.ComparisonFailure: 
Expected :application/vnd.oasis.opendocument.spreadsheet
Actual   :application/vnd.openxmlformats-officedocument.wordprocessingml.document

So it looks like *.ods files are recognized as MS Word *.docx documents.
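
Both OpenDocument and Office Open XML files are zip containers, so the PK\003\004 signature alone cannot tell them apart. A sketch of one way to distinguish them from raw bytes, using only java.util.zip: OpenDocument packages normally store their MIME type in a first entry named "mimetype", while Office Open XML packages contain "[Content_Types].xml" plus a word/, xl/ or ppt/ folder. ZipContainerSniffer, classify and zipOf are hypothetical names, not jMimeMagic API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch only: classify a zip container by looking at its entries. */
final class ZipContainerSniffer {
    private static final String OOXML = "application/vnd.openxmlformats-officedocument.";

    static String classify(byte[] data) {
        boolean first = true;
        boolean hasContentTypes = false;
        String ooxmlType = null;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (first && name.equals("mimetype")) {
                    // OpenDocument: the first entry holds the MIME type as plain text.
                    return new String(readAll(zip), StandardCharsets.US_ASCII).trim();
                }
                first = false;
                if (name.equals("[Content_Types].xml")) hasContentTypes = true;
                else if (name.startsWith("word/")) ooxmlType = OOXML + "wordprocessingml.document";
                else if (name.startsWith("xl/")) ooxmlType = OOXML + "spreadsheetml.sheet";
                else if (name.startsWith("ppt/")) ooxmlType = OOXML + "presentationml.presentation";
            }
        } catch (IOException e) {
            return null; // unreadable archive: treat as unknown
        }
        if (first) return null; // no entries were found: not a zip archive
        return (hasContentTypes && ooxmlType != null) ? ooxmlType : "application/zip";
    }

    private static byte[] readAll(ZipInputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
        return out.toByteArray();
    }

    /** Demo helper: builds an in-memory zip from (name, content) pairs. */
    static byte[] zipOf(String... nameThenContent) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (int i = 0; i + 1 < nameThenContent.length; i += 2) {
                zip.putNextEntry(new ZipEntry(nameThenContent[i]));
                zip.write(nameThenContent[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With this approach a plain zip archive still falls through to application/zip, and the generic zip rule only needs to be refined when the container contents call for it.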

extremely poor javadocs

From the latest version, just about every usable method has a DOCUMENT ME in it. Very frustrating and somewhat worthless. Time to roll my own. Sigh.
