Final Fantasy XIV Network

A Social Graphs and Interactions Project

1 - Introduction

The Final Fantasy series has had an incredible effect on the gaming world. The first game, developed by Square (now Square Enix) and released for the Nintendo Entertainment System (NES) in 1987, was considered an outlier at the time, when most games had simplistic plots and little to no character development. Reportedly, the game was named “Final Fantasy” because its director expected such an unusual game to fail and end his career. History turned out differently. Known for its rich lore, deep interactions between characters, and outstanding music, the series changed the role-playing genre forever and became an international hit, selling more than 164 million units worldwide as of October 2021.

Out of all the 95 Final Fantasy games, we chose Final Fantasy XIV for the following reasons:

  • It is the latest massively multiplayer online role-playing game in the series, with a relatively large number of characters, clans, and races.
  • Its wiki is well documented and detailed.
  • It has a non-linear storyline.

The final goal of this project is to understand the game’s characters and world better through network and text analysis, comprehend how the characters are related to each other, and see the importance of the characters in their respective communities.

If you are interested in how we reached our goal, or you want to see our code, you can follow this link to our Explainer Notebook.

2 - Dataset

We analyzed two datasets for Final Fantasy XIV:

  • The fan wiki of Final Fantasy XIV

The Final Fantasy XIV wiki was used to download the descriptions and attributes of all characters.

The data was extracted using the fandom wiki’s API and stored in JSON. Regular expressions were utilized to extract the description of each character along with the attributes of interest: race and affiliation. Other attributes were also extracted: gender, age, and occupation.

If you want to download the files containing the character data, click here. (You will get a zip file containing one .txt file for each character in the wiki.)

  • The game’s dialogue

All the dialogue between characters was collected for text and sentiment analysis, in order to explore the world of the game’s characters through their text interactions.

2.1 - Data Preprocessing

Our strategy was to use regular expressions that have more false positives than false negatives and then clean the edge cases manually based on our knowledge of the dataset. The following data preprocessing was performed:

  • Duplicates Removal: Some characters are referred to using different names in the wiki. This was detected and fixed by keeping only the unique characters.
  • Redirects Handling: The redirects were detected using a regular expression, then the correct character pages were manually hard-coded to point to the right link.
  • Class Re-Assignment: Some classes have different synonyms. For instance, the race “hume” is the same as “hyur.” This was fixed by re-assigning the same categories in such cases.
  • Regex Fixing: Edge cases in which our regular expressions obtained false positives and negatives were fixed manually by assigning the false values to the correct ones.
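The preprocessing steps above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the infobox layout (`| race = ...`), the `#REDIRECT [[...]]` check, the `RACE_SYNONYMS` table, and all function names are hypothetical. Only the hume/hyur pair comes from the text.

```python
import re

# Hypothetical synonym table; only the hume -> hyur pair comes from the text.
RACE_SYNONYMS = {"hume": "hyur"}

# Deliberately permissive: accepts any "| key = value" line, so it tends to
# over-match (false positives) rather than miss fields (false negatives).
FIELD_RE = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)
REDIRECT_RE = re.compile(r"^#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def redirect_target(wikitext):
    """Return the page a redirect points to, or None for a regular page."""
    match = REDIRECT_RE.match(wikitext.strip())
    return match.group(1).strip() if match else None


def extract_attributes(wikitext, wanted=("race", "affiliation")):
    """Pull the wanted infobox fields out of raw wikitext."""
    attrs = {}
    for key, value in FIELD_RE.findall(wikitext):
        key = key.lower()
        if key in wanted:
            # Unwrap [[Target|label]] links down to their target text.
            value = LINK_RE.sub(lambda m: m.group(1), value).strip().lower()
            if key == "race":
                # Class re-assignment: map synonyms to one canonical name.
                value = RACE_SYNONYMS.get(value, value)
            attrs[key] = value
    return attrs


sample = "{{Infobox\n| race = [[Hume]]\n| affiliation = [[Garlean Empire]]\n| age = 32\n}}"
attributes = extract_attributes(sample)
target = redirect_target("#REDIRECT [[Alphinaud Leveilleur]]")
```

Keeping the pattern loose and normalising afterwards mirrors the strategy described above: it is easier to discard a wrongly captured field by hand than to discover one that was never captured.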

For text analysis, the following was done:

  • Removal of stop words and punctuation: Common stop words and punctuation were deleted so that they do not influence the text analysis.
  • Tokenization and lemmatization: The text was tokenized and lemmatized, both for the dialogue and for each character description.
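A minimal sketch of this text cleaning is given below. The stop-word list is a tiny illustrative stand-in, and `lemmatize` is a placeholder hook (identity by default) where a library lemmatizer could be plugged in. Neither reflects the exact tools used in the project.

```python
import re
import string

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "was", "by"}


def preprocess(text, lemmatize=lambda token: token):
    """Lowercase, tokenize, drop stop words and punctuation, then lemmatize.

    `lemmatize` is a placeholder hook (identity by default); a library
    lemmatizer can be passed in instead.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    cleaned = []
    for token in tokens:
        token = token.strip(string.punctuation)
        if token and token not in STOP_WORDS:
            cleaned.append(lemmatize(token))
    return cleaned


tokens = preprocess("Alphinaud is a Scion of the Seventh Dawn, and Alisaie's twin.")
```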

2.2 - Data Statistics

Number of characters: 385

Total Data size: 3.21 MB

Number of links (Directed): 1477

Number of links (Undirected): 1131

Most connected character (in-degree): Alphinaud Leveilleur, 56

Most connected character (out-degree): Alphinaud Leveilleur, 33

3 - Network

The network initially contains 385 nodes, each representing a character, and 1472 edges. First, we discarded the isolated nodes, which left 335 nodes. Afterwards, we extracted the giant component, obtaining the final network of 320 nodes and 1449 links.
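The two filtering steps can be sketched with the standard library alone. The function name and the toy edge list are assumptions for illustration. A graph library such as NetworkX offers equivalent operations (isolates and weakly connected components), and the project may well have used one.

```python
from collections import defaultdict, deque


def giant_component(nodes, edges):
    """Return the node set of the largest weakly connected component.

    `edges` is an iterable of directed (source, target) pairs; direction is
    ignored when deciding connectivity.
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:
            neighbours[source].add(target)
            neighbours[target].add(source)

    # Isolated nodes have no neighbours, so they are dropped here.
    connected = [n for n in nodes if neighbours.get(n)]

    seen, best = set(), set()
    for start in connected:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]  # F is isolated
core = giant_component(nodes, edges)
```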

[Figure: FFXIV (Undirected) Network]

While there are significant hubs, the network is not dominated by a single entity. This is expected: in an online RPG, the protagonist is less critical to the plot’s development than in single-player games.

3.1 - Network Analysis

Studying the in-degree, we found that the 5 most connected characters are:

  1. Alphinaud Leveilleur (57)
  2. Gaius van Baelsar (37)
  3. Y’shtola Rhul (35)
  4. Zenos yae Galvus (35)
  5. Thancred Waters (34)

Analyzing the out-degree, we can see that the 5 most connected characters are:

  1. Alphinaud Leveilleur (33)
  2. Lyse Hext (30)
  3. Y’shtola Rhul (26)
  4. Elidibus (25)
  5. Thancred Waters (22)

Alphinaud Leveilleur is the most connected character in both the in-degree and out-degree rankings. He is a significant character in the game. It might be surprising that the main character is not the most connected one, but it is sensible in the context of the dataset: the protagonist of the game is silent. Typically in such plots, a companion character is talkative and well connected, which keeps the player engaged and breaks the monotony of the protagonist’s silence.
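Rankings like the two above can be derived directly from a directed edge list. The sketch below assumes that an edge (source, target) means the source character's page links to the target's page. The short names and links are toy data, not taken from the dataset.

```python
from collections import Counter


def top_by_degree(edges, k=5):
    """Return the k highest in-degree and out-degree nodes with their counts."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1  # link leaving the source page
        in_degree[target] += 1   # link arriving at the target page
    return in_degree.most_common(k), out_degree.most_common(k)


# Toy links for illustration only.
edges = [
    ("Lyse", "Alphinaud"),
    ("Thancred", "Alphinaud"),
    ("Alphinaud", "Lyse"),
    ("Y'shtola", "Alphinaud"),
    ("Lyse", "Thancred"),
]
top_in, top_out = top_by_degree(edges, k=2)
```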

The plots of the degree distributions are reported below.

[Figures: In-Degree Distribution; Out-Degree Distribution]

The in-degree distribution shows that only a few characters have a degree higher than 25, while the great majority have a degree between 0 and 10.

The out-degree distribution shows similar behaviour, with the majority of the nodes below a degree of 10.

Graphically, the in-degree distribution appears to follow a power law, which indicates a scale-free network, as is typical of real networks. The out-degree distribution also seems to follow a power law, but it is less skewed to the right and more difficult to interpret.

We use the powerlaw function below to fit a power-law distribution and obtain the exponent (γ) values for further analysis.

[Figures: In-Degree Powerlaw; Out-Degree Powerlaw]

For the in-degree distribution, the exponent (γ = 2.33) indicates that the network is in the scale-free regime (ultra-small world). For the out-degree distribution, the exponent is almost equal to the critical value of 3 (γ = 3.01). While this makes the out-degree distribution more random-like, it is still not the same as a random network: at the critical value, a double logarithmic correction (ln ln N) shrinks the distances in the network relative to a random network.
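To make the exponent estimate concrete, here is a rough sketch based on the well-known approximate maximum-likelihood estimator for discrete power laws, γ ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)). It is not the fitting routine used above: `k_min` is fixed by hand here, and the approximation is coarse for small `k_min`. The `regime` helper and its labels are assumptions that paraphrase the interpretation given in the text.

```python
import math


def estimate_gamma(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of the power-law exponent.

    Uses gamma ~= 1 + n / sum(ln(k_i / (k_min - 0.5))) over all degrees
    k_i >= k_min, with k_min >= 1.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


def regime(gamma):
    """Coarse interpretation of the exponent, following the text above."""
    if gamma <= 2:
        return "anomalous"
    if gamma < 3:
        return "scale-free, ultra-small world"
    return "small world, random-like"


# Toy degree sequence for illustration only.
gamma = estimate_gamma([1, 1, 2, 10], k_min=1)
```

A heavier tail (more high-degree nodes) lowers the estimate, which is why the hub-rich in-degree distribution yields a smaller γ than the out-degree distribution.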

For further confirmation, we compare our network below to a random network with the same connection probability.

[Figure: Random Network Degree Distribution of Wiki]

As seen above, the network’s degree distribution deviates significantly from that of the expected random network. Our graphical inspection, power-law fit, and random-network comparison are thus in agreement.
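The reasoning behind this comparison can be sketched numerically. A random (Erdős–Rényi) network with N nodes and L undirected links has connection probability p = L / (N(N − 1)/2), and its degrees are approximately Poisson distributed around the mean degree. The figures below are toy values chosen for illustration, not the project's.

```python
import math


def connection_probability(n_nodes, n_links):
    """p of a random network with the same size: p = L / (N (N - 1) / 2)."""
    return n_links / (n_nodes * (n_nodes - 1) / 2)


def poisson_degree_pmf(k, mean_degree):
    """Poisson approximation of the random network's degree distribution."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)


# Toy figures for illustration only (not the project's values).
n_nodes, n_links = 100, 200
p = connection_probability(n_nodes, n_links)
mean_degree = p * (n_nodes - 1)  # equals 2 L / N

# Expected number of nodes with degree >= 25 in the random network.
expected_hubs = n_nodes * sum(
    poisson_degree_pmf(k, mean_degree) for k in range(25, n_nodes)
)
```

Under the Poisson approximation, high-degree nodes are vanishingly rare, so the presence of clear hubs in the observed network is what sets it apart from its random counterpart.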

4 - Text Analysis

To perform the text analysis, we prepared the data by applying tokenization and lemmatization to the character descriptions obtained from the wiki pages. Words related to the structure of the wiki rather than to a specific character of the story were also removed from the text.

The frequency distribution of the 75 most common tokens is reported below.

[Figure: Frequency Distribution]

The frequency plot of the most common words is as expected: the two most common words are “final” and “fantasy,” followed by “warrior” and “light.” It is worth mentioning that the main character of the game is named “The Warrior of Light.” Therefore, the high frequency of these two words is unsurprising.

4.1 - Word Clouds

Word clouds are a visual representation of text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. In this case, the tags are single words, and the importance of each tag is shown with font size: bigger term means greater weight.

Goal: Analyze the word cloud for some groups of characters

Strategy:

  • Obtain the 3 largest affiliations and the characters who belong to them.
  • Calculate the TF and TF-IDF values for each affiliation.
  • Analyze the word cloud built from the text obtained in the previous steps.

The largest affiliations are:

Garlean Empire 16

High Houses of Ishgard 10

Doma 9

The word clouds we obtained are shown below.

[Figure: Wordcloud]

5 - Finding Communities

As we are interested in analyzing the relations amongst the characters, we want to try to detect communities and study their characteristics. In order to perform this analysis, we are going to apply the Louvain Algorithm for community detection to the undirected network created previously.
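Louvain works by greedily moving nodes between communities, and then merging communities, so as to increase the modularity of the partition. The sketch below does not implement Louvain itself. It only computes the modularity score that Louvain optimizes, for an undirected, unweighted network, so that two candidate partitions can be compared. The function name and toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted network.

    Q = sum over communities c of [ L_c / L - (k_c / (2 L)) ** 2 ],
    where L_c counts links inside c and k_c is the total degree of c.
    """
    total_links = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            inside[community_of[u]] += 1
    return sum(
        inside[c] / total_links - (degree_sum[c] / (2 * total_links)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single link (toy example).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community, which is the kind of improvement Louvain searches for at each step.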

[Figure: Community Distribution]

The algorithm detects 6 small communities with fewer than 10 characters each, and 8 bigger communities with between 20 and 50 characters each.

In the following plot, the network is drawn with a different color for each community.

[Figure: Community Distribution 2]

The characters belonging to the 5 most populous communities are reported below.

Community 1 Community 2 Community 3 Community 4 Community 5
Alphinaud Leveilleur Y’shtola Rhul Estinien Wyrmblood G’raha Tia Noah van Gabranth
Alisaie Leveilleur Thancred Waters Regula van Hydrus Tataru Taru Gerolt Blackthorn
Y’shtola Rhul Urianger Augurelt Buscarron Stacks Cid Garlond Lina Mewrilah
G’raha Tia Krile Mayer Baldesion Foulques Biggs and Wedge Adalberta Sterne
Estinien Wyrmblood Minfilia Warde Ywain Deepwell Gaius van Baelsar Aldis
Krile Mayer Baldesion Louisoix Leveilleur Zhai’a Nelhah Nael van Darnus Deep Canyon
Tataru Taru F’lhaminn Qesh Lalai Lai Livia sas Junius Leavold
Unukalhai Unukalhai Waldeve Midas nan Garlond Mylla Swordsong
Biggs and Wedge Emet-Selch Ysayle Dangoulain Nero tol Scaeva Wide Gulley
Igeyorhm Elidibus Thordan VII Rhitahtyn sas Arvina Brithael Spade
Fordola rem Lupis Lahabrea Alberic Bale Vitus quo Messalla Khloe Aliapoh
Regula van Hydrus Igeyorhm Haldrath Eline Roaille T’kebbe Morh
Buscarron Stacks Nabriales Heustienne de Vimaroix Bahamut Zhloe Aliapoh
Foulques Loghrif Lucia Junius Tiamat Ejika Tsunjika
Ywain Deepwell Mitron Rasequin Mide Hotgo Mikoto Jinba
Pipin Tarupin Niellefresne Thaudour Thordan I Matoya Ramza Beoulve
Ysayle Dangoulain Severian Lyctor Hraesvelgr Seiryu Alma Beoulve
Thordan VII Midnight Dew Nidhogg Genbu Fran Eruyt
Alberic Bale Fourchenault Leveilleur Ratatoskr Sophie Ashelia B’nargin Dalmasca
Haldrath Ardbert Faunehm Soroban Rasler B’nargin Dalmasca
Heustienne de Vimaroix Branden Orn Khai Feo Ul Ba’Gamnan
Lucia Junius Cylva Vedrfolnir An Lad Eureka(primal)
Rasequin Lamitt Vidofnir Ezel II Mutamix Bubblypots
Thordan I Nyelbert The Steps of Faith Titania Alma bas Lexentale
Hraesvelgr Renda-Rae Midgardsormr Tyr Beq Jenomis cen Lexentale
Nidhogg Ryne Shiva Doga Ramza bas Lexentale
Ratatoskr Beq Lugg Ravana Unei Drake Rhodes
Tiamat Lue-Reeq Knights of the Round Ultima Weapon Jalzahn Daemir
Faunehm Gaia Sephirot Alexander Rowena
Orn Khai Giott Sophia Quickthinx Allthoughts Bajsaljen Ulgasch
Vedrfolnir Granson Zurvan Cloud of Darkness F’hobhas
The Steps of Faith Ran’jit Chieftain Moglin Radovan Jihli Aliapoh
Mide Hotgo Lanbyrd Kazagg Chah Omega
Midnight Dew Olvara Lightning
Fourchenault Leveilleur Seto Noctis Lucis Caelum
Matoya Sul Oul Shantotto
Seiryu The Twelve Garuda
Genbu Hydaelyn
Soroban Zodiark
Beq Lugg Final Coil of Bahamut
Lyna Bismarck
Tesleen Hythlodaeus
Halric Sauldia
Chai-Nuzz Tadric
Dulia-Chai
Tristol
Feo Ul
Shiva
Bismarck
Ravana
Sephirot
Sophia
Zurvan
Alexander
Susano
Lakshmi
Brayflox Alltalks
Chieftain Moglin
Ga Bu
Quickthinx Allthoughts

Characters belonging to the same community are expected to be more connected to each other in the game than characters belonging to different communities. Proving this point more rigorously would require analyzing the game in more depth and gathering information about the actual story behind each character and their true relations with each other. We will not pursue this analysis. Instead, we analyze which words are most representative of each community and then study the average sentiment of these communities, to understand whether there is a common positive or negative feeling among the characters of a group.

5.1 - Common words in the communities

We identified the most descriptive words for each community. The objective is to understand whether different communities have different associated words. We used two methods: TF and TF-IDF. The first takes into account how often a specific word appears in a community’s text, while the second also considers how often that word appears across the whole dataset, giving more weight to words that are characteristic of a specific community and therefore more relevant.
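The difference between the two measures can be sketched as follows. Each community's text is treated as one document. The exact TF and IDF variants used in the project are not stated, so the plain relative frequency and log(N / df) below are assumptions, as are the function names and the toy tokens.

```python
import math
from collections import Counter


def tf(tokens):
    """Relative frequency of each word within one community's text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def tf_idf(documents):
    """`documents` maps a community name to its list of tokens."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each word once per community
    return {
        name: {
            word: weight * math.log(n_docs / doc_freq[word])
            for word, weight in tf(tokens).items()
        }
        for name, tokens in documents.items()
    }


def top_words(weights, k=5):
    """Highest-weighted words, ties broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Toy token lists for illustration only.
documents = {
    "A": ["final", "final", "fantasy", "garuda"],
    "B": ["final", "final", "fantasy", "yoshida"],
}
scores = tf_idf(documents)
top_tf_a = top_words(tf(documents["A"]), k=1)
top_tfidf_a = top_words(scores["A"], k=1)
```

In the toy data, "final" dominates the TF ranking of both communities but gets a TF-IDF weight of zero because it appears in every community, so the community-specific word rises to the top.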

The most common words using TF for the top 5 communities are given below:

Alisaie Leveilleur’s community | Estinien Wyrmblood’s community | Warrior of Light’s community | Alphinaud Leveilleur’s community | Edmont de Fortemps’s community
man | garuda | yoshida | ga | marcelloix
woman | final | naoki | bu | character
player | fantasy | final | alisaie | final
hyuran | messenger | fantasy | kobold | fantasy
imp | xv | april | final | ehll

The most common words using TF-IDF for the top 5 communities are given below:

Alisaie Leveilleur’s community | Estinien Wyrmblood’s community | Warrior of Light’s community | Alphinaud Leveilleur’s community | Edmont de Fortemps’s community
woman | garuda | yoshida | alisaie | marcelloix
hyuran | messenger | naoki | kobold | ehll
imp | wind | april | titan | francel
unsavory | xv | fool | warrior | family
nero | statue | director | bu | craftsman

The TF-IDF analysis clearly gives more interesting results in identifying the most relevant words of a community: it eliminates recurring words such as ‘fantasy’, ‘player’ and ‘final’, which relate to the game itself and are thus very common in all communities. The results also show that TF-IDF never returns the same word for different communities, as happens in the TF analysis, which supports the view that TF-IDF is better at detecting the words relevant to a community.

6 - Sentiment Analysis

We now perform a sentiment analysis of the communities, after gathering suitable text related to the story of the characters.

We perform three analyses. First, we use the LabMT dataset to calculate the sentiment of each token obtained from the textual descriptions of the characters on the wiki pages, after tokenization and lemmatization. Next, we repeat the same process using the text of the written dialogues extracted from the game. This second analysis is expected to give a better understanding of the characters’ sentiment, since the dialogue carries more information about their feelings during the game.

Lastly, we apply the VADER sentiment analysis tool to the dialogue text. This method also takes into account other aspects of speech, such as punctuation, capital letters and specific expressions.
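The LabMT-based scoring can be sketched as an average over the tokens found in the word list. LabMT assigns each word an average happiness score on a scale from 1 (sad) to 9 (happy). The scores in `toy_lexicon` are made up for illustration and are not real LabMT entries, and the project may have applied additional filtering that is not shown here.

```python
def labmt_sentiment(tokens, happiness):
    """Frequency-weighted mean happiness of the tokens found in the lexicon.

    `happiness` maps a word to its score on the 1 (sad) to 9 (happy) scale;
    tokens missing from the lexicon are skipped. Returns None if no token
    is covered.
    """
    scores = [happiness[token] for token in tokens if token in happiness]
    return sum(scores) / len(scores) if scores else None


# Made-up scores for illustration only, not real LabMT entries.
toy_lexicon = {"friend": 7.5, "light": 6.5, "war": 2.0, "day": 6.0}
score = labmt_sentiment(["friend", "light", "war", "garlean", "day"], toy_lexicon)
```

Because the score is an average over many mostly neutral words, it tends to stay close to the middle of the scale, which is worth keeping in mind when reading the results below.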

6.1 - Sentiment Analysis 1

Data: Text description from the Wikipages

Method: LabMT

[Figure: Sentiment Distribution]

All values are just above 5, the neutral midpoint of the LabMT scale, showing a slightly positive sentiment. This result is expected: the text used for this analysis was collected from the character descriptions, which carry little information about the characters’ feelings. We now repeat the analysis with a more relevant dataset: written dialogues from the video game.

6.2 - Sentiment Analysis 2

Data: Dialogues from the Videogame

Method: LabMT

[Figure: Average Sentiment]

All values lie between 5 and 6, meaning there is a slightly positive average sentiment in all communities. The error bars show little variance, meaning that the characters within each community have similar values. We now proceed with the VADER analysis, using the dialogue dataset.
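The per-community averages and error bars can be computed as sketched below. It is an assumption that the error bars represent the sample standard deviation of the per-character scores. The character names, scores, and community labels are hypothetical.

```python
from statistics import mean, stdev


def community_summary(sentiment_of, community_of):
    """Mean and sample standard deviation of sentiment per community."""
    grouped = {}
    for character, score in sentiment_of.items():
        grouped.setdefault(community_of[character], []).append(score)
    return {
        community: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for community, scores in grouped.items()
    }


# Hypothetical per-character scores, for illustration only.
sentiment_of = {"char_a": 5.2, "char_b": 5.4, "char_c": 5.9}
community_of = {"char_a": 1, "char_b": 1, "char_c": 2}
summary = community_summary(sentiment_of, community_of)
```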

6.3 - Sentiment Analysis 3

Data: Dialogues from the Videogame

Method: VADER

[Figure: Average Sentiment Vader]

The VADER analysis gives more interesting results. All communities scored above zero, meaning the average sentiment is positive. In particular, the community of the Warrior of Light (Final Fantasy XIV) scored 0.869, which indicates strongly positive text. The error bars also show much more variance among the characters of a community. For example, Lyse Hext’s community scored a positive 0.636, even though its error bars suggest that at least one character received a negative score.
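Error bars alone cannot confirm that a particular character scored negatively, so a direct check on the per-character scores is useful. The sketch below assumes each character has a VADER compound score in [-1, 1] and uses the conventional ±0.05 cut-offs for labelling. The names and scores are hypothetical.

```python
def vader_label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def negative_members(compound_of, community_of, community):
    """Characters of one community whose own score is labelled negative."""
    return sorted(
        name
        for name, score in compound_of.items()
        if community_of[name] == community and vader_label(score) == "negative"
    )


# Hypothetical per-character compound scores, for illustration only.
compound_of = {"char_a": 0.92, "char_b": 0.81, "char_c": -0.40}
community_of = {"char_a": "x", "char_b": "x", "char_c": "x"}
average = sum(compound_of.values()) / len(compound_of)
outliers = negative_members(compound_of, community_of, "x")
```

In the toy data the community average is positive even though one member is clearly negative, which is the situation described above.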

7 - Conclusion

In this project, we analyzed Final Fantasy XIV’s character descriptions and dialogue. After data extraction and preprocessing using the Final Fantasy Fandom Wiki’s API and regular expressions, a directed network was built and found to behave as a sparse, scale-free network. The most linked characters met our expectations based on knowledge of the game, as they are central characters with a significant influence on the plot and the outcomes of the game.

The communities found through the Louvain Algorithm have different relevant words, calculated using TF-IDF, which could be caused by differences in the plot of the story for those groups of characters. The subsequent sentiment analysis enabled us to study the positivity of the communities and the differences in sentiment between the text descriptions of the characters and the actual dialogues from the game. Using the LabMT dataset, the sentiment analyses resulted in a mildly positive sentiment for both the text descriptions and the dialogues, meaning either that the words collected from the texts could not convey the real feelings or that the characters truly had a rather neutral sentiment. More interesting results were achieved with VADER, which, by taking punctuation into account and studying the meaning of whole sentences rather than single words, produced a more realistic description of the feeling in the communities, although with higher variance among the characters.

The analyses could be improved by adding more information for every character, for example more dialogue or more backstory. This would allow us to analyze the sentiment of each character better and to create more accurate communities, which could be based on the characters’ attributes instead of their connections. For the purpose of this project, however, we are satisfied with the knowledge we gathered by studying the information we had about the game, its characters and their relations.