Final Fantasy XIV Network

A Social Graphs and Interactions Project

2 - Dataset

We analyzed two datasets for Final Fantasy XIV:

  • The fan wiki of Final Fantasy XIV

The Final Fantasy XIV wiki was used to download the descriptions and attributes of all characters.

The data was extracted through the Fandom wiki’s API and stored as JSON. Regular expressions were used to extract each character's description along with the attributes of interest: race and affiliation. Other attributes were also extracted: gender, age, and occupation.
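As an illustration of this kind of regex-based extraction, here is a minimal sketch. The infobox layout in `SAMPLE_WIKITEXT`, the `ATTRIBUTES` tuple, and the `extract_attributes` helper are assumptions made for the example. The actual wiki markup and the patterns used in the project may differ.

```python
import re

# Hypothetical infobox markup, made up for illustration only.
SAMPLE_WIKITEXT = """{{Character
| name = Alphinaud Leveilleur
| race = [[Elezen]]
| affiliation = [[Scions of the Seventh Dawn]]
| gender = Male
}}
Alphinaud is a character in Final Fantasy XIV."""

# Attributes named in the text above.
ATTRIBUTES = ("race", "affiliation", "gender", "age", "occupation")


def extract_attributes(wikitext):
    """Return a dict mapping each attribute to its value, or None if absent."""
    result = {}
    for attr in ATTRIBUTES:
        # Match infobox lines such as "| race = [[Elezen]]".
        match = re.search(
            rf"^\|\s*{attr}\s*=\s*(.+)$",
            wikitext,
            re.IGNORECASE | re.MULTILINE,
        )
        if match is None:
            result[attr] = None
            continue
        # Unwrap wiki links: [[Target|Label]] -> Label, [[Target]] -> Target.
        value = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", match.group(1))
        result[attr] = value.strip()
    return result


info = extract_attributes(SAMPLE_WIKITEXT)
print(info["race"], "|", info["affiliation"])
```

Fields missing from a page come back as `None`, which makes it easy to list the pages that still need a manual check.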

If you want to download the files containing the character data, click here. (You will have access to a zip file containing a .txt file for each character of the wiki.)

  • The game’s dialogue

All dialogue between characters was obtained for text and sentiment analysis, to explore the world of the game’s characters through their text interactions.

2.1 - Data Preprocessing

Our strategy was to use regular expressions that have more false positives than false negatives and then clean the edge cases manually based on our knowledge of the dataset. The following data preprocessing was performed:

  • Duplicates Removal: Some characters are referred to by different names in the wiki. These duplicates were detected and removed so that each character appears only once.
  • Redirects Handling: The redirects were detected using a regular expression, and the correct character pages were then hard-coded manually so that each redirect points to the right page.
  • Class Re-Assignment: Some classes have synonyms. For instance, the race “hume” is the same as “hyur.” This was fixed by assigning such synonyms to a single category.
  • Regex Fixing: Edge cases in which our regular expressions produced false positives or false negatives were fixed manually by replacing the incorrect values with the correct ones.
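The redirect, duplicate, and synonym steps above could look roughly like the sketch below. It relies on the standard MediaWiki convention that a redirect page starts with `#REDIRECT [[Target]]`. The helper names and the `RACE_SYNONYMS` table are hypothetical, and only the hume/hyur pair comes from the text. Note that the project hard-coded redirect targets manually, whereas this sketch reads them from the page text.

```python
import re

# MediaWiki redirect pages start with "#REDIRECT [[Target]]".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)

# Hypothetical synonym table; only hume -> hyur is taken from the text.
RACE_SYNONYMS = {"hume": "hyur"}


def redirect_target(page_text):
    """Return the redirect target of a page, or None for a normal page."""
    match = REDIRECT_RE.match(page_text)
    return match.group(1).strip() if match else None


def normalize_race(race):
    """Lowercase a race label and map known synonyms to one category."""
    key = race.strip().lower()
    return RACE_SYNONYMS.get(key, key)


def unique_characters(pages):
    """pages maps page name -> raw text; return the sorted canonical names."""
    canonical = set()
    for name, text in pages.items():
        target = redirect_target(text)
        # A redirect page counts as an alias of its target, not a new character.
        canonical.add(target if target is not None else name)
    return sorted(canonical)


pages = {
    "Alphinaud": "#REDIRECT [[Alphinaud Leveilleur]]",
    "Alphinaud Leveilleur": "Alphinaud is a character in Final Fantasy XIV.",
}
print(unique_characters(pages))
print(normalize_race("Hume"))
```

Keeping the synonym table as plain data means manually discovered edge cases can be added without touching the logic.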

For text analysis, the following was done:

  • Removal of stop words and punctuation: Common stop words and punctuation were deleted so that they do not influence the text analysis.
  • Tokenization and lemmatization: The dialogue and the text for each character were tokenized and lemmatized.
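The text-cleaning steps above can be sketched with the standard library alone. The stop-word list and the `LEMMAS` lookup table below are toy stand-ins, assumed for illustration. A real pipeline would more likely use a full stop-word list and a proper lemmatizer from a library such as NLTK or spaCy, and the text does not say which tools were used.

```python
import string

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were",
              "of", "to", "for", "and", "in", "it"}

# Toy lemma lookup standing in for a real lemmatizer.
LEMMAS = {"crystals": "crystal", "scions": "scion",
          "fought": "fight", "fighting": "fight"}

# Translation table that deletes ASCII punctuation only;
# typographic quotes would need to be handled separately.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, tokenize, drop stop words, lemmatize."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    tokens = cleaned.split()  # whitespace tokenization
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]


print(preprocess("The Scions fought for the crystals."))
```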

2.2 - Data Statistics

Number of characters: 385

Total Data size: 3.21 MB

Number of links (Directed): 1477

Number of links (Undirected): 1131

Most connected character (in-degree): Alphinaud Leveilleur, 56

Most connected character (out-degree): Alphinaud Leveilleur, 33
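Statistics of this kind can be computed from a list of directed links, as in the minimal sketch below. The `graph_stats` helper and the toy edge list are hypothetical. The sketch also assumes that duplicate links and self-loops are ignored and that a reciprocated pair (A links to B and B links to A) counts as one undirected link, which may not match how the figures above were produced. Under that assumption, the gap between the two link counts (1477 - 1131 = 346) would equal the number of reciprocated pairs.

```python
from collections import Counter


def graph_stats(edges):
    """edges: iterable of (source, target) directed links between characters."""
    # Drop duplicate links and self-loops (an assumption of this sketch).
    directed = {(s, t) for s, t in edges if s != t}
    # A reciprocated pair collapses into a single undirected link.
    undirected = {frozenset(edge) for edge in directed}
    in_deg = Counter(t for _, t in directed)
    out_deg = Counter(s for s, _ in directed)
    return {
        "directed_links": len(directed),
        "undirected_links": len(undirected),
        # (character, degree) pairs, or None for an empty graph.
        "top_in": in_deg.most_common(1)[0] if in_deg else None,
        "top_out": out_deg.most_common(1)[0] if out_deg else None,
    }


# Toy edge list for illustration; not the project's data.
toy_edges = [("A", "B"), ("B", "A"), ("C", "A"), ("A", "D"), ("D", "A")]
print(graph_stats(toy_edges))
```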