Digital DNA Compression to Detect Social Media Bots

Digital DNA is a way to encode social media activity as a string. How well that string compresses serves as an entropy measure for detecting social media bots. This repository documents the code I use in my Bachelor's thesis to replicate the Digital DNA compression approach of Pasricha and Hayes (2019). After the Digital DNA string is generated, it is compressed, and the result is used for supervised learning to tell bots and genuine users apart. Standard metrics are used for evaluation.

First, I replicate their approach exactly, using the original code and the MIB dataset, which includes genuine Twitter users as well as bots. Second, I replicate their approach conceptually on the Twibot-20 dataset in four test cases. (A more detailed description of what Digital DNA does, and of what I replicate and how, can be found in my thesis. Read the section "Research Motivation and Procedure" in Chapter 1 to get a first idea.)

My findings are that the results of Pasricha and Hayes (2019) can be fully replicated and confirmed. Applying their approach to the newer Twibot-20 dataset does not provide satisfactory results. This suggests that the Digital DNA compression approach tested is not a suitable detection method for advanced bots.
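To make the encode-then-compress idea concrete, here is a minimal sketch in Python. It is not the code from this repository or from Pasricha and Hayes (2019). The action-to-letter mapping, the function names `encode_dna` and `compression_ratio`, and the choice of `zlib` as the compressor are all assumptions made for illustration. The intuition it shows: a highly repetitive activity sequence compresses much better than a varied one, so the compression ratio can serve as a feature for a classifier.

```python
import random
import zlib

# Assumed alphabet for illustration: one letter per action type.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_dna(actions):
    """Turn a chronological list of action types into a Digital DNA string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def compression_ratio(dna):
    """Return uncompressed size divided by zlib-compressed size (0.0 if empty)."""
    raw = dna.encode("ascii")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw))


# A hypothetical account that only retweets: very low entropy.
repetitive = encode_dna(["retweet"] * 200)

# A hypothetical account with mixed behavior (seeded for reproducibility).
rng = random.Random(0)
mixed = encode_dna([rng.choice(list(ACTION_TO_BASE)) for _ in range(200)])

print("repetitive:", round(compression_ratio(repetitive), 2))
print("mixed:     ", round(compression_ratio(mixed), 2))
```

In a supervised setup, values like the compression ratio and the length of the original string would be computed per account and passed to a classifier as features. Which features and which classifier are used in the replication is described in the thesis, not in this sketch.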