museumofmodernart / collection
The Museum of Modern Art (MoMA) collection data
License: Creative Commons Zero v1.0 Universal
For example:
{
"Medium": "Lift ground aquatint, aquatint, and soft ground etching, printed in black",
"Dimensions": "plate 4 5/8 x 5 1/2\" (11.8 x 13.9 cm)",
"Classification": "Illustrated Book",
"Artist": "Bill Jensen",
"URL": "",
"CuratorApproved": "N",
"CreditLine": "Gift of Emily Fisher Landau",
"Date": "1989-1994",
"Department": "Prints & Illustrated Books",
"MoMANumber": "595.1994.5",
"ArtistBio": "(American, born 1945)",
"DateAcquired": "1994-11-08",
"\ufeffTitle": "Headpiece (folio 18) from POSTCARDS FROM TRAKL",
"ObjectID": "19407"
}
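The "\ufeffTitle" key in the record above is what typically appears when a file that begins with a UTF-8 byte order mark is decoded as plain UTF-8: the mark stays attached to the first header name. A minimal reader-side sketch, assuming the data is loaded with Python's standard library; the sample bytes and the read_rows helper are made up for illustration:

```python
import csv
import io

# Hypothetical sample: a CSV whose first bytes are a UTF-8 byte order mark.
raw = b"\xef\xbb\xbfTitle,ObjectID\r\nHeadpiece,19407\r\n"

def read_rows(data: bytes, encoding: str) -> list[dict]:
    """Decode CSV bytes with the given encoding and return the rows as dicts."""
    text = data.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))

# Plain "utf-8" keeps the mark, so the first column name is "\ufeffTitle".
rows_with_bom = read_rows(raw, "utf-8")
# "utf-8-sig" strips a leading byte order mark if one is present.
rows_clean = read_rows(raw, "utf-8-sig")
```

Decoding with "utf-8-sig" is harmless for files without the mark, so it can be used as the default when reading this kind of export.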
I've raised this issue over at the Carnegie Museum's collection repo: what is the contributor policy for this repo, i.e. if you had a CONTRIBUTOR page, what would it say?
Since the data here are (presumably) generated from your internal collections management system, the usual pull request system may not work, as you would want to effect changes to either the content or presentation of these data in the upstream CMS and/or scripts. Should all suggested changes go through issues? And what would your process be for addressing them?
I understand this would likely involve part of a larger internal discussion by the maintainers - but it'd be great to have some process documentation.
It looks like all thumbnail URLs are HTTP and then get redirected to HTTPS. Could they be updated to HTTPS directly in the JSON?
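Until the dataset itself changes, consumers can upgrade the scheme on their side. A minimal sketch, assuming the server accepts the same path over HTTPS (which the redirect suggests); to_https is a made-up helper name:

```python
from urllib.parse import urlsplit, urlunsplit

def to_https(url: str) -> str:
    """Return url with an http scheme upgraded to https; other URLs pass through."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)

upgraded = to_https("http://www.moma.org/collection/works/133106")
```

Parsing the URL instead of doing a plain string replace avoids touching an "http://" that happens to appear later in a query string.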
Some works have thumbnails which are for old versions of digitized images of works. Example:
https://www.moma.org/collection/works/2
Which in the dataset has thumbnail:
(https://www.moma.org/media/W1siZiIsIjU5NDA1Il0sWyJwIiwiY29udmVydCIsIi1yZXNpemUgMzAweDMwMFx1MDAzZSJdXQ.jpg?sha=137b8455b1ec6167)
But a comparable thumbnail size of the new version of a digitized image looks like:
(https://www.moma.org/media/W1siZiIsIjUyNzc3MCJdLFsicCIsImNvbnZlcnQiLCItcXVhbGl0eSA5MCAtcmVzaXplIDI3MngxNjhcdTAwM2UiXV0.jpg?sha=de1bbae3ef278e8f)
Could those thumbnails be updated?
Because some of the artworks in the collection change status on whether they can be included in this dataset (or our permissions for the image)
From #29
I would love to have a license code (e.g. cc-0) in the Artwork data, so if I'm working on something with images I can filter out anything that doesn't have the right license.
The documentation just says that images aren't included, but this comment makes it seem like some images might be available already.
On MoMA website I see that artworks can belong to a series. But this metadata is not available in the data dump here. Could it be added?
There are several items that would vastly improve the value of this database, beginning with a ULAN column for artist authority in the artworks CSV file. Splitting the acquisition date into separate cells for day, month, and year would also be very valuable and save researchers quite a bit of work.
It looks like there is an invalid comma at the end of Artworks.json:
tail Artworks.json
"CreditLine": "Mies van der Rohe Archive, gift of the architect\r\n",
"MoMANumber": "MR2.336",
"Classification": "A&D Mies van der Rohe Archive",
"Department": "Architecture & Design",
"DateAcquired": null,
"CuratorApproved": "N",
"ObjectID": 199449,
"URL": null
},
]
This makes it impossible to parse with a tool like jq
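As a stopgap on the reader's side, the stray comma can be removed before parsing. A minimal sketch, assuming the only defect is a single comma before the closing bracket of the top-level array; the sample string is made up, and for the real file you would first read Artworks.json into a string:

```python
import json
import re

def load_with_trailing_comma(text: str):
    """Parse JSON whose top-level array ends with a stray comma before ']'."""
    # Only touch a comma that sits right before the final closing bracket,
    # so commas inside string values elsewhere in the file are left alone.
    repaired = re.sub(r",\s*\]\s*$", "]", text)
    return json.loads(repaired)

# Hypothetical sample mirroring the tail of the file shown above.
sample = '[\n{"ObjectID": 199449, "URL": null\n},\n]\n'
records = load_with_trailing_comma(sample)
```

Valid input is returned unchanged, so the helper is safe to keep in place after the file is fixed upstream.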
The README indicates that the dataset has more than 120,000 records, but the row count of the CSV is 65,500. Is there something missing in my clone, or something else I'm missing?
Thanks. I always appreciate an interesting new GLAM dataset!
When I link to a website I would expect that artworks with thumbnails have an image on the website. But this does not seem to be so. There are artworks which have thumbnails but do not have images. Examples (I can provide full list if needed):
Could this be brought in sync?
And that opens also the opposite question: are there artworks which do have images but do not have thumbnails in the dataset?
On the website, I see that works have classification (e.g., Furniture and interiors), but that is not available here. Could it be added?
Howdy!
This is a superb dataset and it's super exciting to see the Museum of Modern Art share it with the world.
I was wondering if you could help clarify the way your usage guidelines on the readme concerning derivative works and the license of this work interact. IANAL and it's confusing to me. 😦
In particular: [Emphasis mine]
"Do not misrepresent the dataset
Do not mislead others or misrepresent the dataset or its source. ... Whenever you transform, translate or otherwise modify the dataset, you must make it clear that the resulting information has been modified by you. If you enrich or otherwise modify the dataset, consider publishing the derived dataset without reuse restrictions."
And the license: [Emphasis mine]
"To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver")."
I was considering playing with this dataset, but it seems like a plausible interpretation that:
Neither of these guidelines makes sense for a work that has been given to the public domain.
The first clause seems to address citation/plagiarism, which is a very important concern. Attribution of the original work is key for avoiding plagiarism.
Plagiarism.org or many university citation guides, like this one, explain this further.
However, beyond the academic context, there's no requirement that a derivative work contain the author's original name. In fact, some authors may choose to avoid the use of their real name on the internet.
The second clause encouraging me to avoid one of my favorite creative commons licenses seems misplaced. If I make a derivative work from a public domain work, the derivative work (as independent from the dataset itself) can be copyrighted however I please. The dataset remains in the public domain.
"The copyright in a derivative work covers only the additions, changes, or other new material appearing for the first time in the work. Protection does not extend to any preexisting material, that is, previously published or previously registered works or works in the public domain or owned by a third party." - copyright.gov
"The public domain comprises a body of knowledge and innovation over which no person or other legal entity can assert proprietary rights" - https://www1.villanova.edu/villanova/generalcounsel/copyright/edumaterial/plagiarism.html
Thank you so much for your time and thank you for sharing your data!
When clicking on the download button for this file the data is presented in the browser rather than downloading as a .csv file.
Out of 123,919 records, all but five of the acquired dates of the artworks read in a standardized YYYY-MM-DD format. The following four were in MM-DD-YYYY format, and I think it would be good to change them from 11-17-2009 to 2009-11-17.
Row 110,555:
Untitled #136,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.7 × 5 cm); sheet: 10 15/16 × 7 9/16"" (27.8 × 19.2 cm)",Gift of the artist through the Latin American and Caribbean Fund,1528.2009,Print,Prints & Illustrated Books,11-17-2009,N,133104,
Row 110,556:
Untitled #137,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.8 × 4.9 cm); sheet: 11 × 7 1/2"" (28 × 19.1 cm)",Gift of the artist through the Latin American and Caribbean Fund,1529.2009,Print,Prints & Illustrated Books,11-17-2009,N,133105,
Row 110,557:
Untitled #138,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 7/8"" (14.7 × 4.7 cm); sheet: 11 × 7 1/2"" (28 × 19.1 cm)",Gift of the artist through the Latin American and Caribbean Fund,1530.2009,Print,Prints & Illustrated Books,11-17-2009,Y,133106,http://www.moma.org/collection/works/133106
Row 110,558:
Untitled #139,José Antonio Suárez Londoño,"(Colombian, born 1955)",1997,Etching,"plate: 5 13/16 × 1 15/16"" (14.7 × 4.9 cm); sheet: 10 7/8 × 7 9/16"" (27.7 × 19.2 cm)",Gift of the artist through the Latin American and Caribbean Fund,1531.2009,Print,Prints & Illustrated Books,11-17-2009,Y,133107,http://www.moma.org/collection/works/133107
The last inconsistent artwork simply had the year 1941 without a month or day. While I understand that dates can be fuzzy in the art world (the date header in the CSV file testifies to that), every other record is very consistent. Personally, I made a change from 1941 to 1941-01-01 and I'll likely attach a note describing a possible missing date. Can we get the true date acquired?
Row 132,209:
Two Figures Seated Beside a Corpse,Cândido Portinari,"(Brazilian, 1903–1962)",1939,Lithograph,"Composition: 5 9/16 × 7 1/8"" (14.2 × 18.1 cm)",Gift of the Artist,352.1941,Print,Prints & Illustrated Books,1941,N,179107,
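For anyone cleaning these rows locally, the inconsistent shapes can be handled mechanically. A sketch, assuming only the three formats mentioned above occur (YYYY-MM-DD, MM-DD-YYYY, and a bare year); normalize_date_acquired is a hypothetical helper. Unlike the 1941-01-01 workaround, it returns a bare year unchanged so that no month or day is invented:

```python
import re
from datetime import datetime

def normalize_date_acquired(value: str) -> str:
    """Return value as YYYY-MM-DD where possible; leave bare years untouched."""
    value = value.strip()
    if re.fullmatch(r"\d{4}", value):
        return value  # year only: the month and day are unknown
    for fmt in ("%Y-%m-%d", "%m-%d-%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")
```

Raising on anything unrecognized, rather than passing it through, makes new inconsistencies visible the next time the dataset is refreshed.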
On three works by Pierre Petit the date listed is "l860s ?" (MoMA numbers 367.1981, 368.1981, and 369.1981). Seeing how he was alive 1832-1909, I assume the date is supposed to be "1860s ?", with the digit 1 rather than a lowercase letter l. Correct? I'll make this a pull request if that is the case.
Would you mind adding the artistId to allow linking to the artist page (for example http://www.moma.org/collection/artists/7056)?
Record 77, id 102, in the JSON doesn't have the dimensions.
For consistency, can you add them, so all records have the same number of fields in the same order? Of course, in JSON you don't need to do that, as you inherently would in CSV.
The JSON data is more complete, but I'm pretty much treating it the same as the CSV, and am thrown off when some fields are missing.
No height and width, like all the preceding records have.
Thanks!
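Until every record carries the same keys, a consumer can pad the gaps on load. A sketch, assuming missing fields should become None; the field list is a short illustrative subset, not the full schema, and pad_record is a made-up name:

```python
EXPECTED_FIELDS = ["Title", "Dimensions", "ObjectID"]  # illustrative subset

def pad_record(record: dict, fields=EXPECTED_FIELDS) -> dict:
    """Return a copy with every expected field present, in a fixed order."""
    padded = {name: record.get(name) for name in fields}
    # Keep any extra keys the record has, after the expected ones.
    for key, value in record.items():
        padded.setdefault(key, value)
    return padded

# Hypothetical record that lacks "Dimensions".
fixed = pad_record({"ObjectID": 102, "Title": "Untitled"})
```

With every record padded this way, the JSON can be treated like the CSV: same fields, same order.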
It seems there is at least one artwork (object ID 334) which has duplicate artists. This is visible in the ConstituentID array, which has 8158 twice.
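A quick way to look for other records with the same problem is to count the IDs per artwork. A sketch, assuming each record stores ConstituentID as a list of integers; the example record is hypothetical and only mimics the one described above:

```python
from collections import Counter

def duplicated_ids(record: dict) -> list:
    """Return the ConstituentID values that appear more than once in a record."""
    counts = Counter(record.get("ConstituentID") or [])
    return [cid for cid, n in counts.items() if n > 1]

# Hypothetical record shaped like the one reported in this issue.
example = {"ObjectID": 334, "ConstituentID": [8158, 8158]}
repeats = duplicated_ids(example)
```

Running this over every record in Artworks.json and keeping the non-empty results would give a full list of affected object IDs.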
On the website (in its search engine) I see that works have a numerical date (year) you can filter on. But in the data here the date is an arbitrary string. So is there already a cleaned version? Could it be added?
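In the meantime, a rough year can be pulled from the free-text field. A sketch, assuming a standalone four-digit year between 1000 and 2999 appears somewhere in the string; first_year is a made-up name, and this is not a claim about how the website derives its own value:

```python
import re
from typing import Optional

# A four-digit year that is not part of a longer run of digits.
YEAR = re.compile(r"(?<!\d)([12]\d{3})(?!\d)")

def first_year(date_text: Optional[str]) -> Optional[int]:
    """Return the first standalone four-digit year in date_text, or None."""
    if not date_text:
        return None
    match = YEAR.search(date_text)
    return int(match.group(1)) if match else None
```

For a range such as 1989-1994 this returns the start year only; a cleaned upstream field would still be preferable, since it could also carry the end year and a certainty flag.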
The one I found is MOMA Collection of Artists. The README references something else that no longer exists.
I'm translating the moma collection and want to put it on a website. Most of the restrictions are easy to understand and follow, but
You must not use MoMA’s trademarks
Can I use moma in the subdomain? If not, I'll simply use "artworks" or something generic, but as the data is from MOMA (and of course all the attribution and disclaimers will be properly displayed), for clarity and credit I'd like to use moma in the subdomain (not the domain, which would be confusing).
Flickr does not allow that term to be used in a subdomain, so I called my project "glimmer" (get it, "flicker" synonym?). Moma is clearer. The site is primarily a demo of how to use the translation tool, but the MOMA data is interesting, so I thought I'd try to provide a demo of some value.
Hi there,
Just want to double check - what is the relevant column for curator approved artworks? Is it the column "Cataloged"? (values Y and N).
Could you please publish clear documentation for this dataset that includes a full explanation of each column and an indexed set of values?
Many thanks!
Hi,
Just for your information, we're unable to fetch your data due to github data quota limit.
Are there other mirrors for this repository?
Downloading Artists.csv (1.0 MB)
Error downloading object: Artists.csv (bb2d7a6):
Smudge error: Error downloading Artists.csv (bb2d7a697dac8cf19a38a0675aec400ec5c840862c44ee6398b0863f1f0a0f6b):
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
Thanks
Hi,
Thanks for sharing this valuable database for easy use. I'm wondering if you're willing to share the glossary of terms too (https://www.moma.org/learn/moma_learning/glossary). I know it's not that hard to extract JSON from that page myself, but an official release from the Museum is more valuable, and I could rely on updates, etc.
{
"Title": "Ferdinandsbrücke Project, Vienna, Austria (Elevation, preliminary version)",
"Artist": [
"Otto Wagner"
],
"ConstituentID": [
6210
],
"ArtistBio": [
"Austrian, 1841–1918"
],
"Nationality": [
"Austrian"
],
"BeginDate": [
1841
],
"EndDate": [
1918
],
"Gender": [
"male"
],
"Date": "1896",
"Medium": "Ink and cut-and-pasted painted pages on paper",
"Dimensions": "19 1/8 x 66 1/2\" (48.6 x 168.9 cm)",
"CreditLine": "Fractional and promised gift of Jo Carole and Ronald S. Lauder",
"AccessionNumber": "885.1996",
"Classification": "Architecture",
"Department": "Architecture & Design",
"DateAcquired": "1996-04-09",
"Cataloged": "Y",
"ObjectID": 2,
"URL": "https://www.moma.org/collection/works/2",
"ImageURL": "https://www.moma.org/media/W1siZiIsIjUyNzc3MCJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDEwMjR4MTAyNFx1MDAzZSJdXQ.jpg?sha=712ac0fd74ea5bd5",
"OnView": "",
"Height (cm)": 48.6,
"Width (cm)": 168.9
}
It'd be great if all these fields were documented.
In the process, I think it might prompt a conversation about the structure, like why is there a "Gender" field on an artwork? And "BeginDate" and "EndDate" on an artwork sound like when the artwork was created, not the birth/death dates of the artist. If BeginDate were moved inside the artist, it could also be called birthYear and be an integer rather than an array.
Tiny issues. I know how hard it is to manage the schema for this kind of data, and I appreciate it very much that it is even available as is! Thanks for making it available!
There are obviously 2 artists here, but the bio and some other fields are inconsistent in the number of records.
From Artworks.json
The csv is even more difficult to parse when there are arrays.
More generally, it feels like the artist fields shouldn't be repeated in the Artwork, but rather embedded (in JSON for the CSV)
{
"Title":"(title)",
"Artist": [
{"id": 123, "bio": "artist bio"},
{"id": 234, "bio": "artist bio"}
]
}
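The nested shape proposed above can be derived from the current parallel arrays. A sketch, assuming the arrays are aligned by position; when their lengths differ (the inconsistency described above) the gaps are filled with None rather than guessed, and positional alignment may then be wrong. The "Artists" key and the nest_artists helper are made up:

```python
from itertools import zip_longest

ARTIST_FIELDS = ["ConstituentID", "Artist", "ArtistBio", "Nationality",
                 "BeginDate", "EndDate", "Gender"]

def nest_artists(artwork: dict) -> dict:
    """Return a copy of artwork with per-artist arrays folded into 'Artists'."""
    columns = [artwork.get(name) or [] for name in ARTIST_FIELDS]
    artists = [dict(zip(ARTIST_FIELDS, values))
               for values in zip_longest(*columns, fillvalue=None)]
    rest = {k: v for k, v in artwork.items() if k not in ARTIST_FIELDS}
    rest["Artists"] = artists
    return rest
```

Each entry in "Artists" then carries its own ID, bio, and dates, so fields such as BeginDate no longer read as properties of the artwork.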
I understand from the README that the goal is to update this data extract regularly to reflect changes in the upstream CMS. Has there been any movement towards formalizing this update process?
A single artist, 67622 Feliza Bursztyn, has the gender "female" not "Female", which is bothersome when trying to do data mining. I would recommend changing all to consistent caps.
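Until the value is fixed upstream, the casing can be normalized when loading. A minimal sketch, assuming capitalized labels are the preferred form; normalize_gender is a made-up helper:

```python
def normalize_gender(value):
    """Capitalize a gender label so 'female' and 'Female' compare equal."""
    if value is None:
        return None
    cleaned = value.strip()
    return cleaned.capitalize() if cleaned else None
```

Empty strings are mapped to None so that missing values are represented one way only.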