museumofmodernart / collection

The Museum of Modern Art (MoMA) collection data

License: Creative Commons Zero v1.0 Universal

collection's People

straup sbavier melissaagill foeromeo tfmorris namminammi fionaromeo rossgoodwin polosecki coleww opensensorsdata edsu jqnatividad mzeinstra emijrp iamwfx megyoung frankieroberto descartez leenbean dannguyen rodneyramsey kippjohnson linux-devil vonguard jiandeyu-zju datajensen miguelbermudez annemiaoli andrewjtimmons merrem therockstardba mikepqr cgtian lukezhangma lyzerana hoonida lindamood spencerkaplan chrisswk jasonparker amashigeseiji bishopchui collbb mclawges22 sirenita667 cathryng reecekol samarkandiy rasmusruhnau dieface kalinn dfmooreqqq tonyfast gurlinthewurld alanschrank keithompson didiercabrera pantsateria amirosimani ainsleyoc rlehnhof namimi928 nalsi loretoparisi xander119 dsjacq o-c-r pathadley amnakamura thocell johanndiedrick yikesi susuxianxian dignes serviciosculturales sjoerdapp lars-olsen aronlindhagen yangyang-huang augustanage mogon-flow slmerritt acupoftee armxsys hilanguyen malavikasrinivasan samdc915 joetm jeisc iglee jingruli11 robfirth peoplemakeculture aycasarez miro-ka eanakim inventium penhchet arsen3d

collection's Issues

"Title" header in CSV file contains 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF)

For example:

{
"Medium": "Lift ground aquatint, aquatint, and soft ground etching, printed in black",
"Dimensions": "plate 4 5/8 x 5 1/2\" (11.8 x 13.9 cm)",
"Classification": "Illustrated Book",
"Artist": "Bill Jensen",
"URL": "",
"CuratorApproved": "N",
"CreditLine": "Gift of Emily Fisher Landau",
"Date": "1989-1994",
"Department": "Prints & Illustrated Books",
"MoMANumber": "595.1994.5",
"ArtistBio": "(American, born 1945)",
"DateAcquired": "1994-11-08",
"\ufeffTitle": "Headpiece (folio 18) from POSTCARDS FROM TRAKL",
"ObjectID": "19407"
}
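For anyone hitting the same thing, here is a minimal sketch of a workaround on the reading side, assuming the file is UTF-8 with a leading byte order mark. The sample bytes and the `read_rows` helper are made up for illustration; only the `utf-8-sig` codec is standard Python.

```python
import csv
import io


def read_rows(text_stream):
    """Return all CSV rows as dicts keyed by header name."""
    return list(csv.DictReader(text_stream))


# Simulated file contents that start with a UTF-8 byte order mark (assumed sample data).
raw = "\ufeffTitle,ObjectID\r\nHeadpiece,19407\r\n".encode("utf-8")

# Decoding as plain utf-8 leaves U+FEFF attached to the first header name.
naive = read_rows(io.StringIO(raw.decode("utf-8")))

# Decoding as utf-8-sig drops a leading BOM when one is present.
clean = read_rows(io.StringIO(raw.decode("utf-8-sig")))

print(list(naive[0].keys()))
print(list(clean[0].keys()))
```

With a real file the same idea would be `open(path, encoding="utf-8-sig", newline="")`, which also works unchanged if the BOM is later removed upstream.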

Contribution policies

I've raised this issue over at the Carnegie Museum's collection repo: what is the contributor policy for this repo, i.e. if you had a CONTRIBUTOR page, what would it say?

Since the data here are (presumably) generated from your internal collections management system, the usual pull request system may not work, as you would want to effect changes to either the content or presentation of these data in the upstream CMS and/or scripts. Should all suggested changes go through issues? And what would your process be for addressing them?

I understand this would likely involve part of a larger internal discussion by the maintainers - but it'd be great to have some process documentation.

HTTPS thumbnail URLs

It looks like all thumbnail URLs are HTTP which then get redirected to HTTPS. Could they be updated to be HTTPS in JSON already?
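Until that happens, a consumer-side sketch for upgrading the scheme might look like the following. The `force_https` helper is hypothetical, and it assumes every HTTP URL in the data has a working HTTPS equivalent, which the redirects suggest but do not guarantee.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme switched to https; other values pass through."""
    if not url:
        # Empty strings and None appear for works without a URL; leave them alone.
        return url
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(force_https("http://www.moma.org/collection/works/133106"))
```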

Thumbnails are for old versions of images

Some works have thumbnails which are for old versions of digitized images of works. Example:

https://www.moma.org/collection/works/2

Which in the dataset has thumbnail:
(https://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167)

But a comparable thumbnail size of the new version of a digitized image looks like:
(https://www.moma.org/media/W1siZiIsIjUyNzc3MCJdLFsicCIsImNvbnZlcnQiLCItcXVhbGl0eSA5MCAtcmVzaXplIDI3MngxNjhcdTAwM2UiXV0.jpg?sha=de1bbae3ef278e8f)

Could those thumbnails be updated?

include artwork permissions in the dataset

Because some of the artworks in the collection change status on whether they can be included in this dataset (or our permissions for the image)

From #29

I would love to have a license code (e.g. cc-0) in the Artwork data, so if I'm working on something with images I can filter out anything that doesn't have the right license.

The documentation just says that images aren't included, but this comment makes it seem like some images might be available already.

Metadata about artworks belonging to series is missing

On the MoMA website I see that artworks can belong to a series, but this metadata is not available in the data dump here. Could it be added?

ULAN listing for artists

Several additions would vastly improve the value of this database, beginning with a ULAN column for artist authority in the artworks CSV file. Splitting the acquisition date into separate cells for day, month, and year would also be very valuable and save researchers quite a bit of work.
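In the meantime, the date split can be done on the consumer side. A small sketch, assuming the value is already in YYYY-MM-DD form (the `split_date` name is just for illustration):

```python
from datetime import date


def split_date(iso_text):
    """Split a YYYY-MM-DD string into (year, month, day) integers."""
    d = date.fromisoformat(iso_text)
    return d.year, d.month, d.day


print(split_date("1994-11-08"))
```

`date.fromisoformat` raises `ValueError` on anything that is not a valid ISO date, so malformed values surface instead of being silently split.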

invalid JSON

It looks like there is an invalid comma at the end of Artworks.json:

tail Artworks.json
  "CreditLine": "Mies van der Rohe Archive, gift of the architect\r\n", 
  "MoMANumber": "MR2.336",  
  "Classification": "A&D Mies van der Rohe Archive",    
  "Department": "Architecture & Design",    
  "DateAcquired": null, 
  "CuratorApproved": "N",   
  "ObjectID": 199449,   
  "URL": null   
},
]

This makes it impossible to parse with a tool like jq
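As a stopgap, the file can be repaired in memory before parsing. This is only a sketch: it assumes the single problem is one stray comma right before the final closing bracket, and `parse_with_trailing_comma` is a hypothetical helper, not part of any library.

```python
import json
import re


def parse_with_trailing_comma(text):
    """Parse a JSON array whose last element is followed by a stray comma."""
    # Drop a comma that sits directly before the final closing bracket of the text.
    repaired = re.sub(r",\s*\]\s*$", "]", text)
    return json.loads(repaired)


# Shortened stand-in for the tail of Artworks.json (assumed sample data).
sample = '[\n{"ObjectID": 199449, "URL": null},\n]\n'
records = parse_with_trailing_comma(sample)
print(records)
```

Text that is already valid passes through unchanged, so the helper is safe to leave in place after the file is fixed.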

Row count vs info in README

The README indicates that the dataset has more than 120,000 records, but the row count of the CSV is 65,500. Is something missing from my clone, or am I overlooking something else?

Thanks. I always appreciate an interesting new GLAM dataset!

All artworks with thumbnails do not really have an image on the website

When I link to a website I would expect that artworks with thumbnails have an image on the website. But this does not seem to be so. There are artworks which have thumbnails but do not have images. Examples (I can provide full list if needed):

Could this be brought in sync?

That also raises the opposite question: are there artworks which do have images but do not have thumbnails in the dataset?

Classification of the work

On the website, I see that works have a classification (e.g., Furniture and interiors), but that is not available here. Could it be added?

License Clarification: Is this really a public domain work?

Howdy!

This is a superb dataset and it's super exciting to see the Museum of Modern Art share it with the world.

I was wondering if you could help clarify the way your usage guidelines on the readme concerning derivative works and the license of this work interact. IANAL and it's confusing to me. 😦

Background

In particular: [Emphasis mine]

"Do not misrepresent the dataset
Do not mislead others or misrepresent the dataset or its source. ... Whenever you transform, translate or otherwise modify the dataset, you must make it clear that the resulting information has been modified by you. If you enrich or otherwise modify the dataset, consider publishing the derived dataset without reuse restrictions."

And the license: [Emphasis mine]

"To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver")."

I was considering playing with this dataset, but it seems like a plausible interpretation that:

Visualizing the data or reporting regression results is a translation or transformation of the data
I could optionally follow the guideline that mandates that "you must make it clear that the resulting information has been modified by you."; thus I would need to modify my derivative work to include proper attribution to MoMA and myself. This at least constrains me from releasing any derivative work anonymously.
I could optionally follow the guideline: "If you enrich or otherwise modify the dataset, consider publishing the derived dataset without reuse restrictions." This probably means I can't consider a CC BY SA NC license. 👎 😿 (It's one of my favorites)

Neither of these guidelines makes sense for a work that has been given to the public domain.

The first clause seems to address citation/plagiarism, which is a very important concern. Attribution of the original work is key for avoiding plagiarism.
Plagiarism.org and many university citation guides, like this one, explain this further.

However, beyond the academic context, there's no requirement that a derivative work contain the author's original name. In fact, some authors may choose to avoid the use of their real name on the internet.

The second clause encouraging me to avoid one of my favorite creative commons licenses seems misplaced. If I make a derivative work from a public domain work, the derivative work (as independent from the dataset itself) can be copyrighted however I please. The dataset remains in the public domain.

"The copyright in a derivative work covers only the additions,
changes, or other new material appearing for the first time
in the work. Protection does not extend to any preexisting
material, that is, previously published or previously registered
works or works in the public domain or owned by a
third party." - copyright.gov

Coda

"The public domain comprises a body of knowledge and innovation over which no person or other legal entity can assert proprietary rights" - https://www1.villanova.edu/villanova/generalcounsel/copyright/edumaterial/plagiarism.html

Proposal

MoMA should figure out if they intend to assert any proprietary rights over this data set. I contend this includes, at a minimum, picking a stance on attribution rights and the license of derivative works.
If MoMA wishes to assert no proprietary rights of the work, there should be no normative statements about derivative works in the readme. I can submit a pull request to this repository that removes those if you'd like.
If MoMA wishes to assert proprietary rights over the dataset, they should select an appropriate license. Creative Commons licenses are free culture licenses, so they're a great place for museums to start looking. http://creativecommons.org/choose/ I can submit a pull request that adds an appropriate free culture license if you'd like.

Thank you so much for your time and thank you for sharing your data!

Download not working

When clicking on the download button for this file the data is presented in the browser rather than downloading as a .csv file.

Inconsistent acquired dates

Out of 123,919 records, all but five of the acquired dates of the artworks read in a standardized YYYY-MM-DD format. The following four were in MM-DD-YYYY format, and I think it would be good to change them from 11-17-2009 to 2009-11-17.

Row 110,555:

Untitled #136,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.7 × 5 cm); sheet: 10 15/16 × 7 9/16"" (27.8 × 19.2 cm)",Gift of the artist through the Latin American and Caribbean Fund,1528.2009,Print,Prints & Illustrated Books,11-17-2009,N,133104,

Row 110,556:

Untitled #137,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.8 × 4.9 cm); sheet: 11 × 7 1/2"" (28 × 19.1 cm)",Gift of the artist through the Latin American and Caribbean Fund,1529.2009,Print,Prints & Illustrated Books,11-17-2009,N,133105,

Row 110,557:

Untitled #138,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 7/8"" (14.7 × 4.7 cm); sheet: 11 × 7 1/2"" (28 × 19.1 cm)",Gift of the artist through the Latin American and Caribbean Fund,1530.2009,Print,Prints & Illustrated Books,11-17-2009,Y,133106,http://www.moma.org/collection/works/133106

Row 110,558:

Untitled #139,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.7 × 4.9 cm); sheet: 10 7/8 × 7 9/16"" (27.7 × 19.2 cm)",Gift of the artist through the Latin American and Caribbean Fund,1531.2009,Print,Prints & Illustrated Books,11-17-2009,Y,133107,http://www.moma.org/collection/works/133107

The last inconsistent artwork simply had the year 1941 without a month or day. While I understand that dates can be fuzzy in the art world (the date header in the CSV file testifies to that), every other record is very consistent. Personally, I made a change from 1941 to 1941-01-01 and I'll likely attach a note describing a possible missing date. Can we get the true date acquired?

Row 132,209:

Two Figures Seated Beside a Corpse,Cândido Portinari,"(Brazilian, 1903–1962)",1939,Lithograph,"Composition: 5 9/16 × 7 1/8"" (14.2 × 18.1 cm)",Gift of the Artist,352.1941,Print,Prints & Illustrated Books,1941,N,179107,
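For reference, a sketch of how the three shapes seen above could be handled when loading the CSV. The `normalize_acquired` function and its note strings are hypothetical. Unlike the 1941-01-01 substitution described above, this version does not invent a month and day for a year-only value; it reports the year in the note instead.

```python
import re
from datetime import datetime


def normalize_acquired(value):
    """Return (iso_date_or_None, note) for a DateAcquired string."""
    if not value:
        return None, "missing"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return value, "ok"
    if re.fullmatch(r"\d{2}-\d{2}-\d{4}", value):
        # Assumes month-first order, as in the 11-17-2009 rows.
        parsed = datetime.strptime(value, "%m-%d-%Y")
        return parsed.strftime("%Y-%m-%d"), "reordered"
    if re.fullmatch(r"\d{4}", value):
        # Year only: keep the year in the note rather than guessing a full date.
        return None, "year-only:" + value
    return None, "unrecognized"


for raw in ["1996-04-09", "11-17-2009", "1941"]:
    print(raw, normalize_acquired(raw))
```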

l instead of 1

On three works by Pierre Petit the date listed is l860s ?

(moma numbers 367.1981, 368.1981, and 369.1981)

Seeing how he was alive 1832-1909 I assume the date is supposed to be 1860s ?

Correct? I'll make this a pull request if that is the case.
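A quick way to scan for other occurrences of the same slip, sketched with a regular expression. The pattern and the `fix_l_for_one` name are my own, and the approach assumes a lowercase "l" at a word boundary followed by three digits is always a mistyped year.

```python
import re

# A lowercase "l" at a word boundary, directly followed by three digits.
SUSPECT_YEAR = re.compile(r"\bl(\d{3})")


def fix_l_for_one(date_text):
    """Replace a leading lowercase l with the digit 1 in year-like tokens."""
    return SUSPECT_YEAR.sub(r"1\g<1>", date_text)


print(fix_l_for_one("l860s"))
```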

ArtistId

Would you mind adding the artistId to allow linking to the artist-page like
http://www.moma.org/collection/artists/7056?

Add all fields to JSON

Record 77, id 102, in the JSON doesn't have the dimensions.

For consistency, can you add them, so all records have the same number of fields in the same order? Of course, in JSON you don't need to do that, as you inherently would in CSV.

The JSON data is more complete, but I'm pretty much treating it the same as the CSV, and am thrown off when some fields are missing.

No height and width, like all the preceding records have.

Thanks!
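Until the export is made uniform, records can be padded on the consumer side. A minimal sketch, assuming absent fields should simply become `None`; `fill_missing` is a hypothetical helper, and the sample records are made up.

```python
def fill_missing(records, default=None):
    """Return copies of records that all share the same keys in the same order."""
    # Collect every key that appears anywhere, in first-seen order.
    keys = []
    for record in records:
        for key in record:
            if key not in keys:
                keys.append(key)
    # Rebuild each record with the full key list, filling gaps with the default.
    return [{key: record.get(key, default) for key in keys} for record in records]


sample = [
    {"ObjectID": 2, "Dimensions": "19 1/8 x 66 1/2\" (48.6 x 168.9 cm)"},
    {"ObjectID": 102},
]
print(fill_missing(sample))
```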

Duplicate artists on artworks

It seems there is at least one artwork (object ID 334) which has duplicate artists. This is visible by observing the ConstituentID array which has 8158 twice.
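A sketch of how to find other records like this one. The `duplicate_ids` helper is hypothetical; it only reports repeats and does not decide whether a repeat is an error or the same constituent listed in two roles, since the dump does not say.

```python
def duplicate_ids(record):
    """Return ConstituentID values that appear more than once, in first-seen order."""
    seen = set()
    dupes = []
    for cid in record.get("ConstituentID") or []:
        if cid in seen and cid not in dupes:
            dupes.append(cid)
        seen.add(cid)
    return dupes


print(duplicate_ids({"ObjectID": 334, "ConstituentID": [8158, 8158]}))
```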

Numerical date (year) of the work

On the website (in its search engine) I see that works have a numerical date (year) you can filter on. But in the data here, the date is an arbitrary string. So is there already a cleaned version? Could it be added?
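As a rough stand-in, the first four-digit year can be pulled out of the string. This sketch is not the website's logic, which I have not seen; the pattern only accepts years from 1000 to 2099, and `first_year` is a made-up name.

```python
import re

# A standalone four-digit run starting with 1 or 20, not embedded in a longer number.
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def first_year(date_text):
    """Return the first year found in a free-text date, or None."""
    match = YEAR.search(date_text or "")
    return int(match.group(1)) if match else None


for text in ["1896", "1989-1994", "1860s", "n.d."]:
    print(text, first_year(text))
```

Ranges such as "1989-1994" collapse to their start year here, which loses information; a fuller version would return both ends.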

The DOI badge/reference is broken

The one I found is MOMA Collection of Artists. The README references something else that no longer exists.

clarify trademark restrictions

I'm translating the moma collection and want to put it on a website. Most of the restrictions are easy to understand and follow, but

You must not use MoMA’s trademarks

Can I use moma in the subdomain? If not, I'll simply use "artworks" or something generic. But since the data is from MoMA (and of course all the attribution and disclaimers will be properly displayed), I'd like to use moma in the subdomain for clarity and credit (not in the domain, which would be confusing).

Flickr does not allow its name to be used in a subdomain, so I called my project "glimmer" (get it, a "flicker" synonym?). "moma" is clearer. The site is primarily a demo of how to use the translation tool, but the MoMA data is interesting, so I thought I'd try to provide a demo of some value.

Curator Approved Artworks

Hi there,
Just want to double check: what is the relevant column for curator-approved artworks? Is it the column "Cataloged" (values Y and N)?
Could you please publish clear documentation for this dataset that includes a full explanation of each column and an indexed set of values?

Many thanks!

Data Quota Limit? Error fetching LFS

Hi,
Just for your information, we're unable to fetch your data due to github data quota limit.

Are there other mirrors for this repository?

Downloading Artists.csv (1.0 MB)
Error downloading object: Artists.csv (bb2d7a6):
Smudge error: Error downloading Artists.csv (bb2d7a697dac8cf19a38a0675aec400ec5c840862c44ee6398b0863f1f0a0f6b):
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Thanks

Glossary of terms

Hi,

Thanks for sharing this valuable database for easy use. I'm wondering if you're willing to share the glossary of terms too (https://www.moma.org/learn/moma_learning/glossary). I know it's not that hard to extract JSON from that page myself, but an official release from the Museum is more valuable, and I could rely on updates, etc.

provide schema, refactor for version 2

{
  "Title": "Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)",
  "Artist": [
    "Otto Wagner"
  ],
  "ConstituentID": [
    6210
  ],
  "ArtistBio": [
    "Austrian, 1841–1918"
  ],
  "Nationality": [
    "Austrian"
  ],
  "BeginDate": [
    1841
  ],
  "EndDate": [
    1918
  ],
  "Gender": [
    "male"
  ],
  "Date": "1896",
  "Medium": "Ink and cut-and-pasted painted pages on paper",
  "Dimensions": "19 1/8 x 66 1/2\" (48.6 x 168.9 cm)",
  "CreditLine": "Fractional and promised gift of Jo Carole and Ronald S. Lauder",
  "AccessionNumber": "885.1996",
  "Classification": "Architecture",
  "Department": "Architecture & Design",
  "DateAcquired": "1996-04-09",
  "Cataloged": "Y",
  "ObjectID": 2,
  "URL": "https://www.moma.org/collection/works/2",
  "ImageURL": "https://www.moma.org/media/W1siZiIsIjUyNzc3MCJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDEwMjR4MTAyNFx1MDAzZSJdXQ.jpg?sha=712ac0fd74ea5bd5",
  "OnView": "",
  "Height (cm)": 48.6,
  "Width (cm)": 168.9
}

It'd be great if all these fields were documented.

In the process, I think it might prompt a conversation about the structure, like why is there a "Gender" field on an artwork? And "BeginDate" and "EndDate" on an artwork sound like when the artwork was created, not the birth/death dates of the artist. If BeginDate were moved inside the artist, it could also be called birthYear and be an integer rather than an array.

Tiny issues. I know how hard it is to manage the schema for this kind of data, and I appreciate it very much that it is even available as is! Thanks for making it available!

Inconsistencies in array data

There are obviously 2 artists here, but the bio and some other fields are inconsistent in the number of records.

From Artworks.json

The csv is even more difficult to parse when there are arrays.

More generally, it feels like the artist fields shouldn't be repeated in the Artwork, but rather embedded (in JSON for the CSV)

{ 
   "Title":"(title)",
   "Artist": [
        {"id": 123, "bio": "artist bio"},
        {"id": 234, "bio": "artist bio"}
   ]
}
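To make the idea concrete, here is a sketch that zips the parallel artist arrays into embedded objects and refuses records whose arrays disagree in length. The field list is taken from the Artworks.json records in this dataset; the `embed_artists` function and the output key `Artists` are hypothetical.

```python
ARTIST_FIELDS = [
    "Artist", "ConstituentID", "ArtistBio",
    "Nationality", "BeginDate", "EndDate", "Gender",
]


def embed_artists(record):
    """Return a copy of record with the parallel artist arrays merged into one list."""
    lengths = {f: len(record[f]) for f in ARTIST_FIELDS if f in record}
    if len(set(lengths.values())) > 1:
        # Arrays of different lengths cannot be paired up reliably.
        raise ValueError(f"unequal artist arrays: {lengths}")
    present = list(lengths)
    columns = [record[f] for f in present]
    artists = [dict(zip(present, values)) for values in zip(*columns)]
    out = {k: v for k, v in record.items() if k not in ARTIST_FIELDS}
    out["Artists"] = artists
    return out


sample = {
    "Title": "(title)",
    "Artist": ["Otto Wagner"],
    "ConstituentID": [6210],
    "ArtistBio": ["Austrian, 1841–1918"],
}
print(embed_artists(sample))
```

Raising on a mismatch rather than padding keeps the inconsistent records visible, which seems preferable while the cause is unknown.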

Update schedule

I understand from the README that the goal is to update this data extract regularly to reflect changes in the upstream CMS. Has there been any movement towards formalizing this update process?

gender field

A single artist, 67622 Feliza Bursztyn, has the gender "female" not "Female", which is bothersome when trying to do data mining. I would recommend changing all to consistent caps.
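A consumer-side sketch for normalizing the value in the meantime. The `normalize_gender` name is made up, and the choice of an initial capital follows the "Female" spelling mentioned above.

```python
def normalize_gender(value):
    """Return the gender string trimmed and with a single leading capital."""
    if value is None:
        return None
    cleaned = value.strip()
    # str.capitalize lowercases everything after the first character.
    return cleaned.capitalize() if cleaned else cleaned


print(normalize_gender("female"), normalize_gender("Female"))
```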