The American Society for Indexing held its 2023 Virtual Conference, “The Eyes Have It: The Indexer Perspective–Past, Present & Future,” on Friday, April 28, 2023 and Saturday, April 29, 2023. Four sessions were held virtually on Zoom each day.
The first session on Friday, “Metadata at The New York Times: Organizing and Leveraging News Content from 1851-Today,” was presented by Senior Taxonomist Jennifer Parrucci. She showed how The New York Times has long been ahead of the curve on metadata and how that legacy has turned into a thorough process for content classification. She also outlined how The Times leverages this metadata to push content to consumers, pull analytics, and keep archive content easy to find.
She explained that clippings of newspaper articles and photos were kept in the Morgue, in the basement of the newspaper building, where they were pasted onto cards and filed in a card catalog for searching. The New York Times Index, a red reference book, was published from 1913 through 2017 and covers articles from the first issue in 1851 onward. Later, the newspaper could be searched on microfilm or microfiche. Today, Times Tags are used on The Times website.
Times Tags
- Controlled vocabulary based on the original index terms.
- Named entities (over a million): people, places, organizations, and titles.
- Subjects (~5,000), with semantic relations (broader, narrower, and related terms), scope notes, and news events.
- Assigned to all published assets.
- Rule-based software
  - Entity extraction: normalization, disambiguation
    - Normalization example: choosing just one spelling, such as Qaddafi, Muammar el-
    - Disambiguation example: distinguishing between different people who share a name, such as Brown, Michael DeWayne (1954-), by adding certain subject words
  - Categorization: frequency, proximity, placement
- Do not tag peripheral mentions, figurative language, historical references, spokespersons, people quoted
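To make the rule-based steps above more concrete, here is a minimal Python sketch of normalization, disambiguation, and a frequency/proximity/placement score. It is only an illustration and not The Times's actual software: the variant spellings, the context words, the second "placeholder" Brown entry, the function names, and the score weights are all assumptions made up for this example.

```python
import re

# Hypothetical variant table: every spelling maps to one preferred index term.
VARIANT_TO_PREFERRED = {
    "qaddafi": "Qaddafi, Muammar el-",
    "gaddafi": "Qaddafi, Muammar el-",
    "kadafi": "Qaddafi, Muammar el-",
}

# Hypothetical disambiguation rules: people who share a name, each paired
# with illustrative subject words expected to appear in articles about them.
CANDIDATES = {
    "michael brown": [
        ("Brown, Michael DeWayne (1954-)", {"fema", "hurricane"}),
        ("Brown, Michael (placeholder entry)", {"football", "coach"}),
    ],
}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize(surname):
    """Normalization: return the single preferred term for a spelling, or None."""
    return VARIANT_TO_PREFERRED.get(surname.lower())


def disambiguate(name, text):
    """Disambiguation: pick the candidate whose subject words match the text most."""
    words = set(tokens(text))
    best_label, best_hits = None, 0
    for label, context in CANDIDATES.get(name.lower(), []):
        hits = len(context & words)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label  # None when no subject word supports any candidate


def relevance(term, headline, body, context=frozenset(), window=10, lead=50):
    """Categorization score for a single-word term (weights are arbitrary)."""
    term = term.lower()
    body_tokens = tokens(body)
    positions = [i for i, tok in enumerate(body_tokens) if tok == term]
    frequency = len(positions)                      # how often it appears
    in_headline = term in tokens(headline)          # placement: headline
    in_lead = any(i < lead for i in positions)      # placement: early in body
    near_context = any(                             # proximity to subject words
        tok in context
        for i in positions
        for tok in body_tokens[max(0, i - window): i + window + 1]
    )
    return frequency + 2 * in_headline + in_lead + near_context
```

In a sketch like this, a peripheral mention (one late occurrence, not in the headline, with no subject words nearby) gets a low score, so a tagging rule could skip it by requiring a minimum score before a tag is assigned.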
Tags are used today for Collections/Topics, Daily Email Recommendations for readers, Advertising, Search, and Audience Analytics.
The Archives go back to 1851. The Times Machine is a tool for subscribers that works like searchable digital microfiche and gives a link to buy reprints. She highlighted some challenges with the Archives:
- Ambiguous names and clashes in archives.
- Offensive and outdated index terms in archives. These need to be updated without sanitizing history.
- Problems digitizing a list that uses the label name as the identifier.
- Indexing knowledge was passed down through apprenticeship, so without scope notes it is hard to know what something was called. For example, Watergate.
In the next blog posting, I will discuss the next Friday session of the ASI 2023 Virtual Conference. For more information about the services provided by the author of this blog, see the Stellar Searches LLC website, http://www.stellarsearches.com.
Tags: American Society for Indexing, archives, metadata, newspaper indexing, The New York Times, Virtual Conference