arimus / jmimemagic

jMimeMagic is a Java library for determining the MIME type of files or streams.

Home Page: http://sourceforge.net/projects/jmimemagic/

License: Apache License 2.0

Java 88.58% Shell 0.12% Perl 0.06% Python 11.03% HTML 0.21%

jmimemagic's Introduction

jMimeMagic (TM) v0.1.5
Copyright (C) 2003-2017, David Castro
Contact:  David Castro <[email protected]>

jMimeMagic is a Java library for determining the MIME type of files or streams.

Please see LICENSE in this directory for jMimeMagic licensing information.  See
LICENSE_log4j, LICENSE_oro, LICENSE_xerces, LICENSE_junit respectively for
Log4j, ORO, Xerces2 and JUnit licensing information.  Log4j, ORO, Xerces2 and
JUnit are bundled with jMimeMagic for convenience.

** NOTE **
This API absolutely will change until there is a stable release! Relying on it
to not change is probably NOT a safe bet.  It is an initial release, given as a
(hopefully) better than nothing option.  The plan is for this library to become
much cleaner and well-architected, but only time will tell.  The more you show
interest in this library/nudge me, the more likely that will be the case.
Comments and feedback greatly welcome.


Requirements:
  Java 2 SDK 1.3+
  Apache Maven 1.0.2+
  JUnit 3.8.x
  Jakarta ORO 2.0.x
  Commons Logging 1.0.x
  Log4j 1.2.x
  Xerces 2.4.0 (optional)

Building:
  type 'mvn clean jar:jar'

  The jar file should then be in ./target

Testing:
  Log4j settings can be modified in resources/log4j.properties

  Run all unit tests
  ------------------
  edit build.properties and make sure it contains the line 'maven.test.skip=false'
  type 'maven clean test'
  
  Run test against a particular file
  ----------------------------------
  type 'maven clean run -Dclass=net.sf.jmimemagic.Magic -Dargs=<file to test>'
  - or - simply ./test <file to test> in a unix shell
  (this is similar to the 'file' command in *nix)

Maven:

  To add jMimeMagic as a dependency in a Maven project, you can use the
  following in the dependencies section of your pom.xml.

  <dependency>
      <groupId>net.sf.jmimemagic</groupId>
      <artifactId>jmimemagic</artifactId>
      <version>0.1.3</version>
  </dependency>

Contributions:
  Thanks to the MMBase team (http://www.mmbase.org/) for doing the work of
  creating the original basis for the XML version of the magic file.

  To contribute code or other help, send an email to [email protected]
  or submit patches/bug reports/etc on the jMimeMagic project page:

      http://sf.net/projects/jmimemagic/

Notes:
  Remember that you will need the proper libraries (XML Parser/Xerces2, Log4j,
  Commons Logging, and ORO) in the classpath for any applications that use
  jMimeMagic.  If you want to run any of the jUnit tests, then you will also
  need JUnit in the classpath.

Developers:
  David Castro <[email protected]>
  Nate Jones <[email protected]>

Problems/questions/suggestions:
  David Castro <[email protected]>

jmimemagic's Issues

Depends on ORO, a project that Apache retired in 2010

Would you consider switching to the regex support in the JDK (java.util.regex)? This looks cool, but I don't want to bake in a dependency on something dead.
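
As a rough sketch of what such a switch could look like: the TextFileDetector snippet quoted in a later issue passes the ORO expression /[^[:ascii:][:space:]]/ to Perl5Util. Assuming that is the only thing the expression needs to do, a JDK-only counterpart might be the following. The class and method names are made up for illustration and are not part of jMimeMagic.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of jMimeMagic.
final class AsciiTextCheck {
    // JDK counterpart of the ORO expression /[^[:ascii:][:space:]]/:
    // matches any character that is neither ASCII nor whitespace.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}\\s]");

    // True when the string contains only ASCII characters and whitespace.
    static boolean isAsciiText(String s) {
        return !NON_ASCII.matcher(s).find();
    }
}
```

Compiling the pattern once into a static field avoids re-parsing it on every call, which Perl5Util handles through its own internal cache.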

LGPL

The LGPL is better than the GPL but is still not suitable for Apache projects or many other entities.
A more liberal license would be nice.

should not return application/zip for docx & co.

E.g. for docx the correct MIME type would be application/vnd.openxmlformats-officedocument.wordprocessingml.document
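
One way to tell these formats apart from a plain ZIP is to look inside the archive. Office Open XML packages contain a [Content_Types].xml entry plus a top-level word/, xl/ or ppt/ folder. The sketch below works on a byte array and falls back to application/zip when nothing more specific is found. OoxmlSniffer and refineZipType are hypothetical names, not jMimeMagic API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of jMimeMagic.
final class OoxmlSniffer {
    private static final String PREFIX = "application/vnd.openxmlformats-officedocument.";

    // Refines "application/zip" into an OOXML type when the archive layout says so.
    static String refineZipType(byte[] data) throws IOException {
        boolean hasContentTypes = false;
        String part = null;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("[Content_Types].xml")) {
                    hasContentTypes = true;
                } else if (part == null && name.startsWith("word/")) {
                    part = "wordprocessingml.document";
                } else if (part == null && name.startsWith("xl/")) {
                    part = "spreadsheetml.sheet";
                } else if (part == null && name.startsWith("ppt/")) {
                    part = "presentationml.presentation";
                }
            }
        }
        return (hasContentTypes && part != null) ? PREFIX + part : "application/zip";
    }
}
```

Because the ZIP check comes first and the Office check only refines it, an ordinary archive still comes back as application/zip.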

Licensing information is inconsistent

Hey,

Nifty code and very useful. However, while you state Apache License 2.0 in your readme / license.txt file, the source code headers still include the LGPL.

MP3 file with an ID3v2 container

MP3 files with an ID3v2 container are parsed as text/plain

Remove Jackson (and other?) dependencies

Hi,
I wondered why those two jackson dependencies are needed, as the project does not make use of any org.codehaus.jackson-classes.
If still needed, an upgrade to 2.8.x would be useful:

<dependency>
	<groupId>com.fasterxml.jackson.jaxrs</groupId>
	<artifactId>jackson-jaxrs-json-provider</artifactId>
	<version>2.8.6</version>
</dependency>
<dependency>
	<groupId>com.fasterxml.jackson.module</groupId>
	<artifactId>jackson-module-jaxb-annotations</artifactId>
	<version>2.8.6</version>
</dependency>

We got the following exception when trying to upload a Unicode UTF-8 file. I saved the same file as ANSI and ASCII UTF-8, and the file uploaded fine, and MagicMatch found the MIME type. But it fails on all Unicode files, no matter the encoding, size, or simplicity of the character strings in the file. To reproduce, open Word, type in a simple sentence, save it as a Unicode UTF-8 file, and test it.

net.sf.jmimemagic.MagicMatchNotFoundException
at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:222)

xls spreadsheet recognized as application/msword

The correct MIME type for xls spreadsheets according to http://blogs.msdn.com/b/vsofficedeveloper/archive/2008/05/08/office-2007-open-xml-mime-types.aspx is application/vnd.ms-excel. However, jMimeMagic currently recognizes xls spreadsheets as application/msword

Add JAR file

I used an older version from SourceForge and this library looks great. Could you publish a JAR file while the library is still in development?

tiff isn't recognized

running with tiff file from https://github.com/arimus/jmimemagic/tree/master/test_docs :

import java.io.File;
import net.sf.jmimemagic.Magic;
import net.sf.jmimemagic.MagicMatch;
public class TiffCheck {
   public static void main(String[] args) {
      // image/tiff
      // TIFF image data, little-endian
      checkJmimemagic("test_nocompress.tif");
   }
   // net.sf.jmimemagic
   public static void checkJmimemagic(final String file) {
      String mimeType = null;
      try {
         File currentFile = new File(file);
         MagicMatch match = Magic.getMagicMatch(currentFile, true, false);
         mimeType = match.getMimeType();
         System.out.println(mimeType);
      } catch (Exception e) {
         e.printStackTrace();
      }
   }
}

delivers:

net.sf.jmimemagic.MagicMatchNotFoundException
    at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:368)
    at TiffCheck.checkJmimemagic(TiffCheck.java:22)
    at TiffCheck.main(TiffCheck.java:13)

Can you please confirm, or tell me what's wrong with my test?

Thanks
Cheers

Heinrich
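
For reference, a TIFF file starts with one of two fixed 4-byte signatures: "II" followed by 2A 00 (little-endian) or "MM" followed by 00 2A (big-endian). A minimal stand-alone check, independent of jMimeMagic and its magic.xml rules, could look like this (TiffMagic is a made-up name):

```java
// Hypothetical helper, not part of jMimeMagic.
final class TiffMagic {
    // True when the data starts with a little-endian or big-endian TIFF signature.
    static boolean isTiff(byte[] d) {
        if (d.length < 4) {
            return false;
        }
        boolean littleEndian = d[0] == 'I' && d[1] == 'I' && d[2] == 0x2A && d[3] == 0x00;
        boolean bigEndian = d[0] == 'M' && d[1] == 'M' && d[2] == 0x00 && d[3] == 0x2A;
        return littleEndian || bigEndian;
    }
}
```

Running such a check on the first bytes of test_nocompress.tif would show whether the file itself carries the expected signature or whether the problem is on the rule side.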

Big sized file throws OutOfMemoryError

The heap is configured at 4 GB for my application.
Calling getMagicMatch() on a large file (around 5 GB) throws an OutOfMemoryError.
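
A possible workaround on the caller's side, sketched under the assumption that the rules you care about only look at the start of the file: read a bounded prefix and hand that to the byte[] overload of getMagicMatch that appears in other issues. HeaderReader and the choice of prefix size are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of jMimeMagic.
final class HeaderReader {
    // Reads at most maxBytes from the start of the file, never the whole file.
    static byte[] readHead(Path file, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < maxBytes && (r = in.read(buf, n, maxBytes - n)) != -1) {
                n += r;
            }
        }
        // Trim the buffer when the file is shorter than maxBytes.
        return Arrays.copyOf(buf, n);
    }
}
```

Memory use is then bounded by maxBytes regardless of file size, at the cost of missing any rule that needs data further into the file.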

Get a much larger set of out-of-the-box supported mime types

Need to finish up the converter for magic files.

Text file with byte order mark not detected

A text file with a BOM (byte order mark) is not detected correctly. In TextFileDetector, the method process(byte[] data, int offset, int length, long bitmask, char comparator, String mimeType, Map params) should contain something like:

log.debug("processing stream data");

Perl5Util util = new Perl5Util();

try {
    String s = new String(data, "UTF-8");

    if (!util.match("/[^[:ascii:][:space:]]/", s)) {
        return new String[] { "text/plain" };
    }

    // check whether a BOM is present
    if (bomPresent(ByteOrderMark.UTF_8, data) ||
            bomPresent(ByteOrderMark.UTF_16LE, data) ||
            bomPresent(ByteOrderMark.UTF_16BE, data)) {
        return new String[] { "text/plain" };
    }

    return null;
} catch (UnsupportedEncodingException e) {
    log.error("TextFileDetector: failed to process data");

    return null;
}

The bomPresent method could look like this:

private boolean bomPresent(ByteOrderMark byteOrderMark, byte[] data) {
    int bomLength = byteOrderMark.length();
    byte[] startingBytes = Arrays.copyOf(data, bomLength);

    return Arrays.equals(startingBytes, byteOrderMark.getBytes());
}

Attached example file.

(copied from SF ticket - http://sourceforge.net/tracker/?func=detail&aid=3462414&group_id=94418&atid=607846)
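
The snippet above relies on a ByteOrderMark class (as found in Apache Commons IO). If adding that dependency is not wanted, the same check can be sketched with plain JDK code. BomSniffer is a hypothetical name; the byte sequences are the standard BOMs for UTF-8 (EF BB BF), UTF-16LE (FF FE) and UTF-16BE (FE FF).

```java
// Hypothetical helper, not part of jMimeMagic.
final class BomSniffer {
    // Returns the charset name implied by a leading BOM, or null when there is none.
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare unsigned byte values.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that a UTF-32LE BOM (FF FE 00 00) also begins with the UTF-16LE bytes, so this sketch would report such a file as UTF-16LE.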

BMP files are not detected

Add the following test to MagicTest and it will fail:

public void testBMP() {
	System.out.print("\ntesting BMP image...");
	try {
		MagicMatch match = Magic.getMagicMatch(new File("test_docs/test.bmp"), true, false);
		if (match != null) {
			assertEquals("image/bmp", match.getMimeType());
		} else {
			System.out.print("failed");
			fail("no match in testBMP()");
		}
		System.out.print("ok");
	} catch (Exception e) {
		e.printStackTrace();
		fail("exception in testBMP(). message: " + e);
	} catch (Error e) {
		e.printStackTrace();
		fail("error in testBMP(). message: " + e.getMessage());
	}
}

I noticed there's no <match> for image/bmp in magic.xml

Html Sgml confusion

I tried to match an HTML file, but the detected MIME type was SGML.
Both start with 'doctype', but the HTML file continues with 'html'.
Maybe the matchers need to be ordered.
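
To illustrate the ordering idea: if rules are tried in a fixed sequence with the most specific pattern first, an HTML doctype is claimed before the generic doctype rule gets a chance. The sketch below is stand-alone and does not use jMimeMagic's matcher classes; the rule list, the class name and the text/sgml fallback type are assumptions for illustration. Matching is case-insensitive, since doctype declarations appear in both cases in the wild.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jMimeMagic.
final class DoctypeRules {
    private static final class Rule {
        final Pattern pattern;
        final String mimeType;

        Rule(String regex, String mimeType) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.mimeType = mimeType;
        }
    }

    // Order matters: the specific HTML rule precedes the generic doctype rule.
    private static final List<Rule> RULES = Arrays.asList(
            new Rule("^\\s*<!doctype\\s+html\\b", "text/html"),
            new Rule("^\\s*<!doctype\\b", "text/sgml"));

    // Returns the MIME type of the first rule that matches, or null.
    static String match(String head) {
        for (Rule rule : RULES) {
            if (rule.pattern.matcher(head).find()) {
                return rule.mimeType;
            }
        }
        return null;
    }
}
```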

.odt files are recognized as .zip files.

The jmimemagic lib needs a fix to recognize .odt files.
The MIME type is application/vnd.oasis.opendocument.text
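
OpenDocument packages make this fairly easy to check: the first entry in the ZIP is named mimetype and contains the MIME type as plain text. A sketch of reading it with the JDK's zip classes follows. OdfSniffer is a hypothetical name, and the code is independent of jMimeMagic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of jMimeMagic.
final class OdfSniffer {
    private static final String ODF_PREFIX = "application/vnd.oasis.opendocument.";

    // Returns the MIME type declared in the package's first entry, or null.
    static String odfMimeType(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry first = zip.getNextEntry();
            if (first == null || !first.getName().equals("mimetype")) {
                return null;
            }
            // The declared type is short, so a small fixed buffer is enough.
            byte[] buf = new byte[128];
            int n = 0;
            int r;
            while (n < buf.length && (r = zip.read(buf, n, buf.length - n)) > 0) {
                n += r;
            }
            String declared = new String(buf, 0, n, StandardCharsets.US_ASCII).trim();
            return declared.startsWith(ODF_PREFIX) ? declared : null;
        }
    }
}
```

The same check would cover the other OpenDocument types, since each package declares its own type in that entry.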

svg mime type not found

<dependency>
    <groupId>jmimemagic</groupId>
    <artifactId>jmimemagic</artifactId>
    <version>0.1.2</version>
</dependency>

public String getMimeType(byte[] file) throws Exception {
        MagicMatch match = Magic.getMagicMatch(file);
        return match.getMimeType();
}

public void testContentTypeFind() throws Exception {
        TestCase.assertEquals(
                "image/svg+xml",
                getMimeType(filenameToByteArray("public/assets/svg/avatar-1.svg"))
        );
}

Expected :image/svg+xml
Actual :text/plain
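
SVG has no fixed byte signature, so a magic-number rule alone cannot identify it. One possible approach, sketched here with the JDK's StAX parser and not taken from jMimeMagic, is to parse up to the root element and check that it is svg in the SVG namespace. SvgSniffer is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of jMimeMagic.
final class SvgSniffer {
    private static final String SVG_NS = "http://www.w3.org/2000/svg";

    // True when the document's root element is <svg> in the SVG namespace.
    static boolean isSvg(byte[] data) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Avoid resolving DTDs or external entities while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(data));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return "svg".equals(reader.getLocalName())
                            && SVG_NS.equals(reader.getNamespaceURI());
                }
            }
            return false;
        } catch (XMLStreamException e) {
            // Not well-formed XML, so not SVG.
            return false;
        }
    }
}
```

Only the prolog and the first start tag are parsed, so the cost stays small even for large drawings.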

Case insensitivity for some of the matchers by default

Moved over from sf.net:
Peter G B Whitham ( pgbw )

Running JMimeMagic on some publicly available web sites
yielded the following doctype declarations:

.MagicParseException

When I use Magic.getMagicMatch( new File(Environment.getExternalStorageDirectory().getAbsolutePath() + File.separator + "1"+File.separator+"jweixin-1.0.0.js").

DTD is illegal

The DTD file does not match the magic.xml definition?

changes to magic.xml shouldn't have to be bundled in the jar

Obviously. A deficiency that has long existed and just hasn't been remedied.

From sf.net...

I'd like to be able to configure the magic XML file so that I could use an external file and not the resource file that is directly bundled with the jmimemagic.jar. That would make it possible to modify the XML at runtime and reinitialize JMimeMagic with the new file.

Example:

File file = new File("/path/to/mymagic.xml");

Magic.initialize(file);

In order to reinitialize JMimeMagic, a mechanism such as the following comes to mind:

Magic.reset();

Magic.initialize(file);
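
Whatever the final API looks like, the lookup behind it could be as simple as "external file first, bundled resource second". The sketch below shows that lookup on its own. The system property name, the /magic.xml resource location and the class name are all assumptions for illustration; none of them exist in jMimeMagic today.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of jMimeMagic.
final class MagicXmlLocator {
    // Hypothetical property name; jMimeMagic does not read it today.
    static final String PROPERTY = "jmimemagic.magic.xml";

    // Opens the external magic file when configured and readable,
    // otherwise falls back to a resource on the classpath.
    static InputStream open() throws IOException {
        String external = System.getProperty(PROPERTY);
        if (external != null) {
            Path path = Paths.get(external);
            if (Files.isReadable(path)) {
                return Files.newInputStream(path);
            }
        }
        // Assumed resource location for the bundled default.
        InputStream bundled = MagicXmlLocator.class.getResourceAsStream("/magic.xml");
        if (bundled == null) {
            throw new IOException("no magic.xml found");
        }
        return bundled;
    }
}
```

Because the file is opened on each call, a reset-and-initialize cycle like the one proposed above would pick up edits made to the external file at runtime.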

Maven Central

Would be great to have this library in Maven Central

Xerces and XercesImpl

There are two dependencies, xerces and xercesImpl, which are actually the same: the xerces dependency is relocated to xercesImpl.
This leads to two versions of xercesImpl. Some Maven plugins cannot handle this situation correctly.

For example, compare the output of dependency:tree and dependency:list.

A solution would be to remove xerces:xercesImpl:jar:2.4.0:runtime

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xerces</artifactId>
  <version>2.4.0</version>
  <scope>runtime</scope>
</dependency>

<dependency>
  <groupId>xerces</groupId>
  <artifactId>xercesImpl</artifactId>
  <version>2.7.1</version>
  <scope>runtime</scope>
</dependency>

For rtf files method getMimeType() should return "application/rtf" instead of "text/rtf"

Moved from sf.net
Hedek ( hedek )

MP4 files cause heap space issues

Moved from sf.net
Rara (raprami)

When analysing the "Video029.mp4" file (you can get it here: http://raphael.ramirez.free.fr/video029.mp4 ) with a custom magic.xml file (see attachment),
MagicMatch match = Magic.getMagicMatch(file, true) breaks with a "java.lang.OutOfMemoryError: Java heap space".

Plain text with diacritic is not recognized

Hey there!

When I have a plain text file with only unaccented characters, it is recognized as text/plain without a problem. But when I add some diacritics (ěščřžýáí), the plain text file is not recognized and a MagicMatchNotFoundException is thrown.

Take care and thanks for great library!
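
A possible direction, sketched without reference to how TextFileDetector is actually implemented: instead of requiring ASCII, treat the data as text when it decodes as strict UTF-8 and contains no control characters other than common whitespace. Utf8TextCheck is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jMimeMagic.
final class Utf8TextCheck {
    // True when the bytes are valid UTF-8 without unexpected control characters.
    static boolean looksLikeUtf8Text(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharSequence text = decoder.decode(ByteBuffer.wrap(data));
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f') {
                    return false;
                }
            }
            return true;
        } catch (CharacterCodingException e) {
            // Malformed byte sequence, so not UTF-8 text.
            return false;
        }
    }
}
```

One caveat: if only a prefix of the file is checked, the cut may fall in the middle of a multi-byte character and cause a false negative, so the last few bytes of a truncated buffer need lenient handling.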

Host javadocs online

Would be nice to link to an online version of the javadocs from the README.

BMP error : net.sf.jmimemagic.MagicMatchNotFoundException: null

I get a byte array from a URL, then test it with the Magic class, and it throws a MagicMatchNotFoundException.

Code:

public static String getMimeType(byte[] bytes) throws MagicParseException, MagicMatchNotFoundException, MagicException {
    Magic parser = new Magic();
    MagicMatch match = parser.getMagicMatch(bytes);
    return match.getMimeType();
}

Test code:

String bmpUrl = "https://raw.githubusercontent.com/arimus/jmimemagic/master/test_docs/test.bmp";

byte[] bmpFIle = IOUtils.toByteArray((new URL(bmpUrl)).openStream());
assertThat(FileHelper.getMimeType(bmpFIle)).isEqualTo("image/bmp");

Code review: change the object you act on to some nio buffer? Interface with JAVA 1.7 FileTypeDetector

I am reviewing how you do this.

Ideally you would be a drop-in implementation of java.nio.file.spi.FileTypeDetector; then you would just be used via

Files.probeContentType(Path path)

But from reviewing your code, it seems you cycle through a linear list of matcher objects, calling test() on each with either a byte array or a File. So there is a fork in your code, which leads you to write a lot of code twice for these two completely different objects.

In case of File, you seem to open the RandomAccessFile every time you test. That could be 100s of times, no? It would seem to be better to just use the same RandomAccessFile.

I would think it's best to do that with one of those new Buffer objects, then you can map that to either an array to deal with the byte array case and to a file to deal with the file case, but in either case you won't re-open the file all the time.
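
A minimal sketch of that shape, with made-up names and a single BMP rule ("BM" at offset 0) standing in for the real matcher list: the file is opened once, a small prefix is read into a ByteBuffer, and the same sniff method serves both the file case and the byte array case. Nothing here is existing jMimeMagic code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Hypothetical detector. To register it through
// META-INF/services/java.nio.file.spi.FileTypeDetector it would have
// to be declared public in its own source file.
class MagicFileTypeDetector extends FileTypeDetector {
    private static final int HEAD_SIZE = 16;

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] buf = new byte[HEAD_SIZE];
        int n = 0;
        // Open the file exactly once and read a small prefix.
        try (InputStream in = Files.newInputStream(path)) {
            int r;
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) != -1) {
                n += r;
            }
        }
        return sniff(ByteBuffer.wrap(buf, 0, n));
    }

    // Shared by the file path above and by callers holding a byte array.
    static String sniff(ByteBuffer head) {
        if (head.remaining() >= 2 && head.get(0) == 'B' && head.get(1) == 'M') {
            return "image/bmp";
        }
        return null; // unknown: lets other installed detectors try
    }
}
```

A caller with a byte array would use MagicFileTypeDetector.sniff(ByteBuffer.wrap(bytes)), so every rule is written once against the buffer.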

The ms office open xml formats in the stock magic.xml broke the zip detections.

The detection for MS Office Open XML formats (commit 8ea79ca) broke ZIP detection when matching on data rather than on a file. With data-based detection there is no file extension to check, so the pattern that matches ZIP files, PK\003\004, is detected as an Office file, which is wrong.

My suggestion is to move the MS Office rules under the ZIP detection.

java.util.ConcurrentModificationException

Hi there!

We're having an issue with random concurrent modification exceptions in our application after switching from Java 1.6 -> Java 1.8. I'm thinking this has something to do with the changes Java made after jdk1.8u20 (see https://medium.com/@edouard.kaiser/collections-sort-the-java8u20-modification-d6a9acf96861) regarding Collections.sort(). Any thoughts on this?
We're using version 0.0.4a of your lib. Regards, David

2019-09-26 06:17:18,926 ERROR [jetspeed] DynamicPortlet.getContent(): null
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
at java.util.ArrayList$Itr.next(ArrayList.java:859)
at net.sf.jmimemagic.Magic.getMagicMatch(Magic.java:139)...
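
Without knowing what line 139 of Magic.java does in 0.0.4a, one common cause of this trace is a shared ArrayList that one thread sorts or modifies while another iterates it. If that is the case here, a general remedy is to sort once and publish an unmodifiable snapshot that all threads only read. The sketch uses strings as stand-ins for matcher objects, and MatcherRegistry is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder, not part of jMimeMagic.
final class MatcherRegistry {
    private final List<String> sorted;

    // Copies and sorts the matchers once; the result is never modified again.
    MatcherRegistry(List<String> matchers, Comparator<String> order) {
        List<String> copy = new ArrayList<>(matchers);
        copy.sort(order);
        this.sorted = Collections.unmodifiableList(copy);
    }

    // Safe to iterate from any number of threads because nothing mutates it.
    List<String> matchers() {
        return sorted;
    }
}
```

Since Java 8u20, Collections.sort always counts as a structural modification of an ArrayList, so a sort inside the lookup path that went unnoticed on Java 6 can start failing under concurrent use.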

OpenOffice .ods file recognized as application/vnd.openxmlformats-officedocument.wordprocessingml.document

Using the StockChart.ods file provided at https://wiki.openoffice.org/wiki/File:StockChart.ods the following will fail:

File stockCharts = ...
assertEquals("application/vnd.oasis.opendocument.spreadsheet", Magic.getMagicMatch(stockCharts, true, false).getMimeType());

It results in:

org.junit.ComparisonFailure: 
Expected :application/vnd.oasis.opendocument.spreadsheet
Actual   :application/vnd.openxmlformats-officedocument.wordprocessingml.document

So it looks like *.ods files are recognized as MS Word *.docx documents.

extremely poor javadocs

From the latest version, just about every usable method has a DOCUMENT ME in it. Very frustrating and somewhat worthless. Time to roll my own. Sigh.
