Ingest data from Programming Historian
Programming Historian offers a number of relevant training materials. The lessons are written in Markdown and are available on GitHub.
Each lesson begins with a structured YAML header, which should be the primary source for mapping to the SSHOCMP data model. For example:
---
title: "Analyzing Documents with TF-IDF"
collection: lessons
layout: lesson
slug: analyzing-documents-with-tfidf
date: 2019-05-13
authors:
- Matthew J. Lavin
reviewers:
- Quinn Dombrowski
- Catherine Nygren
editors:
- Zoe LeBlanc
review-ticket: https://github.com/programminghistorian/ph-submissions/issues/206
difficulty: 2
activity: analyzing
topics: [distant-reading]
abstract: This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
mathjax: true
---
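
A minimal sketch of how such a header could be read and turned into a flat record for ingestion. It is an illustration under stated assumptions, not a definitive implementation: the parser handles only the YAML subset seen in the sample above (scalars, `- item` lists, and inline `[a, b]` lists), so a real ingest would more likely use a full YAML library. The function names and the target field names (`label`, `description`, `contributors`, and so on) are hypothetical placeholders and are not taken from the SSHOCMP data model.

```python
# Shortened copy of the sample header (abstract cut to its first sentence).
SAMPLE = """\
---
title: "Analyzing Documents with TF-IDF"
slug: analyzing-documents-with-tfidf
date: 2019-05-13
authors:
- Matthew J. Lavin
reviewers:
- Quinn Dombrowski
- Catherine Nygren
editors:
- Zoe LeBlanc
review-ticket: https://github.com/programminghistorian/ph-submissions/issues/206
difficulty: 2
activity: analyzing
topics: [distant-reading]
abstract: This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf).
---

Lesson body goes here.
"""


def split_front_matter(text):
    """Split a lesson file into (header_text, body_text) at the '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a front matter block")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("front matter block is not closed")


def parse_header(header):
    """Parse the small YAML subset used in the sample header into a dict."""
    data = {}
    current_list_key = None  # key of the block list currently being filled
    for raw in header.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current_list_key is None:
                raise ValueError("list item without a preceding key: " + line)
            data[current_list_key].append(line[2:].strip())
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":
            data[key] = []  # a block list follows on the next lines
            current_list_key = key
        elif value.startswith("[") and value.endswith("]"):
            data[key] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            current_list_key = None
        else:
            data[key] = value.strip('"')
            current_list_key = None
    return data


def to_record(meta):
    """Map header fields to a hypothetical target record (field names are assumptions)."""
    roles = (("authors", "author"), ("reviewers", "reviewer"), ("editors", "editor"))
    contributors = [
        {"name": name, "role": role}
        for key, role in roles
        for name in meta.get(key, [])
    ]
    return {
        "label": meta.get("title"),
        "description": meta.get("abstract"),
        "date": meta.get("date"),
        "contributors": contributors,
        "keywords": list(meta.get("topics", [])),
        "activity": meta.get("activity"),
        "difficulty": int(meta["difficulty"]) if "difficulty" in meta else None,
        "source_id": meta.get("slug"),
    }


if __name__ == "__main__":
    header_text, _body = split_front_matter(SAMPLE)
    print(to_record(parse_header(header_text)))
```

Keeping parsing (`parse_header`) separate from mapping (`to_record`) means the mapping can be revised once the target fields are agreed, without touching the code that reads the lesson files.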