Science and Social Science Data


Data to support current teaching and research across the sciences and social sciences is collected primarily on a per-request basis. Yale Library has a long history of collecting numeric data in the social sciences, an effort supported by the Social Science Data Archive (SSDA) since 1972. Collection of numeric data in the sciences as part of the broader data collection program is a more recent practice and is carried out in conjunction with the subject specialists.

The social science numeric and humanities data resources collected by the Marx Science and Social Science Library include surveys of public opinion, economic behavior, and electoral behavior; census and demographic data; economic and social indicators; trade data; elections data; and geospatial data. Science data currently collected by Marx Library includes remote sensing data and ecological indicators but could expand to other science datasets in response to developing curricula and changing researcher needs.

All decisions regarding the collection of data are made in collaboration with the selectors in the subject areas for which the data will be most relevant. Please refer to the collection development statements for a specific discipline for more information about the scope of materials collected within that subject area.

Data that is collected by the Marx Library is hosted on a local server and is accessible to current Yale affiliates.

Departments/disciplines/programs/subject areas supported

Data is collected in all subject areas supported by the Marx Library.


Several current service subscriptions fall within the realm of data, including the Bloomberg and Datastream terminals and online services such as Social Explorer and SimplyMap. Databases of this type will also be purchased at the discretion of selectors in individual fields.

Selection of new data resources at Marx Library is done primarily in response to direct researcher request. Decisions about acquisition are based on the following:

  • Cost
  • Terms of use/license agreement. The ability to make the resource available to the Yale community through networked access is preferred. Datasets that require a confidentiality agreement or individual registration, or that are otherwise restricted, are excluded.
  • Applicability to a wide set of research and teaching interests
  • Relevance of subject coverage
  • Quality of data and documentation
  • Format. Software-independent formats are preferred, but other formats will be considered.

Formats collected

Formats and materials acquired generally:

Numeric data is made available in several different ways. The term dataset is used here to describe a distinct, stand-alone body of data bundled in one or more files, such as all the responses collected for a particular survey.

The Marx Library acquires datasets for mounting on a local server, from which they can be retrieved after authentication through the currently deployed Yale authorization/authentication system (example: Latin American Public Opinion Project).

The Marx Library acquires datasets on media such as CD-ROM or DVD-ROM (example: CPS utilities - education & school enrollment) that can either be checked out from the circulating collection or used on designated workstations within the library, depending on the license agreement and other factors. In addition to data and documentation, datasets on CD or DVD typically include proprietary software to select, analyze and display data.

Numeric data may also be found within online databases licensed by the library, from which the end user selects datasets and data elements according to their needs (example: ASEP/JDS data bank). These databases may provide for online statistical analyses of data, downloading of data for offline statistical analysis, or both.

By institutional subscription or membership, Yale also provides its researchers access to large online repositories of numeric data in the social sciences, such as the Roper Center Public Opinion Archives and ICPSR.

Languages collected

Numeric data resources licensed or purchased by Marx Library will generally be usable by English-language readers. Some data resources from studies conducted in non-English speaking countries may contain non-translated components; for example, while the statistical software data or setup file for a survey conducted outside the USA will provide English-language labels, the instructions to interviewers or the survey instrument may only be available in the language(s) of the country/countries where the survey was conducted.

Chronological and geographical focus

Due to the nature of numeric datasets, the focus is on collecting material generated in current formats. Material produced in the past, such as data from previous censuses, is also collected.

There is no specific geographical focus of the data collection, and material is collected for all regions relevant to Yale research interests. Material for a specific region is collected in consultation with the International Collections librarians at Sterling Memorial Library.

Collaborations within Yale

Data is collected in collaboration with the subject selectors in the Marx Library, the Medical School Library, the Law Library and Sterling Memorial Library.

Subject Librarian

Barbara Esty
Data Librarian
Marx Science and Social Science Library
+1 (203) 432-4587