R-Packages

R-Package: ReinforcementLearning

This package performs model-free reinforcement learning in R. The implementation enables the learning of an optimal policy based on sample sequences consisting of states, actions and rewards. In addition, it supplies multiple predefined reinforcement learning algorithms, such as experience replay.
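
To make the idea of learning a policy from sample sequences concrete, the following is a minimal base-R sketch of tabular Q-learning over stored (state, action, reward, next state) tuples. It is not the package's implementation or API: the function name q_learning, the column names, the toy two-state data, and the parameter values are all assumptions chosen for illustration.

```r
# Hypothetical toy experience: staying in "s2" pays a reward of 1.
samples <- data.frame(
  State     = c("s1", "s1", "s2", "s2"),
  Action    = c("stay", "move", "stay", "move"),
  Reward    = c(0, 0, 1, 0),
  NextState = c("s1", "s2", "s2", "s1"),
  stringsAsFactors = FALSE
)

# Tabular Q-learning that sweeps repeatedly over the stored samples.
# alpha = learning rate, gamma = discount factor (illustrative values).
q_learning <- function(samples, alpha = 0.5, gamma = 0.9, iter = 200) {
  states  <- sort(unique(c(samples$State, samples$NextState)))
  actions <- sort(unique(samples$Action))
  Q <- matrix(0, nrow = length(states), ncol = length(actions),
              dimnames = list(states, actions))
  for (k in seq_len(iter)) {
    for (i in seq_len(nrow(samples))) {
      s <- samples$State[i]
      a <- samples$Action[i]
      # Bootstrapped target: reward plus discounted best next-state value.
      target <- samples$Reward[i] + gamma * max(Q[samples$NextState[i], ])
      Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
    }
  }
  Q
}

Q <- q_learning(samples)

# Greedy policy: the action with the highest Q-value in each state.
policy <- colnames(Q)[apply(Q, 1, which.max)]
names(policy) <- rownames(Q)
print(policy)
```

Because the learner only needs the recorded tuples and never a model of the environment, the same loop works for any logged interaction data that has these four fields.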

R-Package: SentimentAnalysis

This package performs sentiment analysis of textual content in R. The implementation draws on various existing dictionaries, such as the Harvard IV dictionary and finance-specific dictionaries. It can also create customized dictionaries: this feature uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
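
The term-selection idea behind dictionary generation can be illustrated with a small LASSO fit. The sketch below is a plain coordinate-descent solver written in base R, not the routine the package uses. The helper names (soft_threshold, lasso_cd), the toy term counts, the response values, and the penalty lambda = 0.1 are all assumptions made up for this example.

```r
# Soft-thresholding operator used in the LASSO coordinate update.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1.
# No intercept and no standardization, to keep the sketch short.
lasso_cd <- function(X, y, lambda, iter = 100) {
  n <- nrow(X)
  beta <- setNames(rep(0, ncol(X)), colnames(X))
  for (k in seq_len(iter)) {
    for (j in seq_len(ncol(X))) {
      # Residual with the contribution of term j removed.
      partial <- y - drop(X[, -j, drop = FALSE] %*% beta[-j])
      rho <- sum(X[, j] * partial) / n
      beta[j] <- soft_threshold(rho, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

# Hypothetical document-term counts (rows = documents) and response.
X <- cbind(
  good = c(1, 0, 2, 0, 1, 0),
  bad  = c(0, 1, 0, 2, 0, 1),
  film = c(1, 1, 0, 0, 1, 1)
)
y <- c(1, -1, 2, -2, 1, -1)

beta <- lasso_cd(X, y, lambda = 0.1)

# Terms with a non-zero coefficient form the dictionary.
selected <- names(beta)[beta != 0]
print(beta)
print(selected)
```

In this toy data the response depends only on "good" and "bad", so the penalty drives the coefficient of the uninformative term "film" to exactly zero, which is the selection effect the LASSO is used for.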

Datasets

SentimentDictionaries

This library provides domain-specific dictionaries for sentiment analysis. Each dictionary consists of words that statistically feature a positive or negative polarity in movie reviews or financial filings. The dictionaries are extracted from two different corpora, namely IMDb movie reviews and U.S. regulated Form 8-K filings. Details are available from the following reference.

    • Pröllochs N, Feuerriegel S, Neumann D (2017): Statistical Inferences for Polarity Identification in Natural Language, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details

This library contains the following dictionary resources in CSV format.

    • Movie reviews dictionary: This dictionary contains words that feature a positive or negative connotation in IMDb movie reviews (DictionaryIMDB.csv).
    • Financial filings dictionary: This dictionary contains words that feature a positive or negative connotation in U.S. regulated 8-K filings (Dictionary8K.csv).

The individual columns of each dictionary are as follows:

    • Words: This column lists the individual dictionary entries. We provide stems instead of complete words as stemming is part of the document preprocessing.
    • Scores: This column denotes the polarity score of each entry.
    • Idf: This column denotes the inverse document frequency (idf) of each entry.
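
As an illustration of how the three columns could be combined, the sketch below scores an already stemmed document as the sum of term frequency times Idf times Scores over all matching entries. This weighting is an assumption made for the example and may differ from the scheme described in the reference. The dictionary rows, their values, and the function name score_document are hypothetical; real entries would be read from DictionaryIMDB.csv or Dictionary8K.csv.

```r
# Hypothetical dictionary rows using the documented column names.
dict <- data.frame(
  Words  = c("great", "bore", "wast"),
  Scores = c(0.05, -0.04, -0.06),
  Idf    = c(1.5, 2.0, 2.0),
  stringsAsFactors = FALSE
)

# Score a document given as a character vector of stems.
# Assumed weighting: term frequency * idf * polarity score.
score_document <- function(stems, dict) {
  counts <- table(stems)
  hit <- dict$Words %in% names(counts)
  tf <- as.numeric(counts[dict$Words[hit]])
  sum(tf * dict$Idf[hit] * dict$Scores[hit])
}

# Toy document, already tokenized and stemmed.
stems <- c("a", "great", "film", "but", "a", "wast", "of", "great", "actor")
score <- score_document(stems, dict)
print(score)
```

Since the dictionaries store stems rather than full words, the input text has to pass through the same stemming step before the lookup, otherwise most entries will not match.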

Usage in R

We also provide both dictionaries in the form of a package for the statistical software R. You can install SentimentDictionaries from GitHub with:

# install.packages("devtools")
devtools::install_github("nproellochs/SentimentDictionaries", subdir = "R-package")

Both dictionaries can be easily used in combination with the SentimentAnalysis R package.

SentimentDictionaries on GitHub: https://github.com/nproellochs/SentimentDictionaries

NegatedSentences

This repository provides annotations of negation scopes for 500 sentences from IMDb movie reviews. The dataset was labeled manually by two external annotators (Annotator A and Annotator B). Each sentence contains at least one explicit negation phrase from the list of Jia et al. (2009). The labeled sentences can, for example, be used in machine learning models that aim at learning accurate negation scopes for sentiment analysis. Details are available from the following reference.

    • Pröllochs, Feuerriegel and Neumann (2017): Understanding Negations in Information Processing: Learning from Replicating Human Behavior, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details

This repository contains the following resources in CSV format.

    • Negation Labels Annotator A: This file contains the annotations from Annotator A (sentences_annotator_a.csv).
    • Negation Labels Annotator B: This file contains the annotations from Annotator B (sentences_annotator_b.csv).

The individual columns of each resource are as follows:

    • Id: This column assigns a unique Id to each sentence.
    • Sentence: This column contains the sentences that are labeled by the two human annotators.
    • IsNegated: This column contains the negation pattern for each sentence. The value T denotes that a word is marked as negated by the human annotator, whereas F denotes that the word is marked as not negated.
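
One possible use of the two annotation files is to measure how often the annotators agree on a word level. The sketch below assumes that IsNegated stores one T or F per word, separated by single spaces. That encoding is an assumption and should be checked against the actual files. The two example sentences and their labels are made up, and parse_labels is a hypothetical helper.

```r
# Made-up rows mimicking sentences_annotator_a.csv and sentences_annotator_b.csv.
annot_a <- data.frame(
  Id = 1:2,
  Sentence  = c("I do not like this film", "It was not bad at all"),
  IsNegated = c("F F F T T T", "F F F T T T"),
  stringsAsFactors = FALSE
)
annot_b <- data.frame(
  Id = 1:2,
  Sentence  = c("I do not like this film", "It was not bad at all"),
  IsNegated = c("F F F T F F", "F F F T T T"),
  stringsAsFactors = FALSE
)

# Turn an assumed space-separated T/F string into a logical vector.
parse_labels <- function(x) strsplit(x, " ", fixed = TRUE)[[1]] == "T"

# Align both annotators on the sentence Id.
merged <- merge(annot_a, annot_b, by = "Id", suffixes = c("_A", "_B"))

# Share of words per sentence on which both annotators agree.
agreement <- unname(mapply(
  function(a, b) mean(parse_labels(a) == parse_labels(b)),
  merged$IsNegated_A, merged$IsNegated_B
))
print(agreement)
```

Sentences with low agreement mark negation scopes that are ambiguous even for human readers, which is useful context when evaluating a learned model against either annotator.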

NegatedSentences on GitHub: https://github.com/nproellochs/NegationDataset

NYSE CIK Ticker Symbol Master List

This file in CSV format links EDGAR CIK numbers to stock ticker symbols. The list includes all companies listed on the New York Stock Exchange (NYSE) as of February 18, 2018. The file also includes additional columns such as company name and market capitalization.
The file can be downloaded here.