Analysis Methods

The purpose of this investigation is to determine whether it is possible to measure the quality of a story in an unbiased and reproducible way.

Rubric analysis is a well-known evaluation method for large bodies of text. You define separate categories, themes, or topics on which to evaluate the text (the so-called rubric axes) and score the text on each of them. To guide the scoring process, each quantitative score is accompanied by a description of what earns that score (the so-called descriptors). If you are interested, you can find more about rubric scoring methods here:

The scope of our analysis is serious Hermione/Viktor Harry Potter fanfiction (so-called vikmiones), meaning:

You can find the rubric that was used for this narrative scope here. The rubric has been tuned to the scope, but its criteria have been deliberately worded to judge only the quality of execution of the work, not to punish any creative choices on plot, characterizations, worldbuilding, etc. As such, the rubric aims to be discrimination-free within the scope.

Next, a suitable set of vikmiones has to be selected, and the rubric has to be applied to each of them. This is done with generative AI (we used ChatGPT, with GPT-5.3). It is by no means certain that an AI reader can judge a story better than a human reader. AI evaluation was chosen for its consistency and reproducibility, rather than for superior interpretative ability. Finally, the results were interpreted using statistical analysis. Details on each of these steps are given below.

Dataset selection

Vikmione stories were selected from AO3, one of the largest fanfiction hosting sites in the world. It also has a detailed tag system for relationships, characters, and other story elements that allows us to sweep the site content easily and effectively, and a standardised PDF export. The PDF export is an important feature, as AI readers are heavily influenced by the format and structure of the documents offered. By using a standardised PDF export for each investigated story, this type of bias can be effectively eliminated from the evaluation.

You can use the AO3 tag system to search for vikmione stories here. In this analysis, we did not consider vikmiones from any source other than AO3 (because those document formats are different). We sampled the site on the 9th of April 2026.

We used the following criteria to reduce the set of all vikmiones on AO3 to a workable scope of 'serious' stories:

On the 9th of April 2026, the total collection of Hermione/Viktor stories was 1716 works. The above selection steps reduced this to 122 works. See the definition of the vikmione scope above. Usually, when Hermione/Viktor is combined in a story with another Hermione pairing, the Hermione/Viktor pairing is not endgame. This is not a hard rule, but a very strong pattern. Hence, together with the wordcount filter (longer fics with meaningful narrative content), the crossover filter (our scope is pure Harry Potter), and the language filter, this is a reasonable attempt to identify the scope of the analysis.
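The four filters named above can be sketched as a simple predicate over work metadata. This is only an illustration under assumptions: the record layout (the keys `words`, `language`, `fandoms`, `relationships`), the exact tag spellings, and the threshold values in the example are hypothetical and are not the values used in this analysis.

```python
# Hypothetical record layout; these field names are assumptions for illustration.
TARGET_PAIRING = "Hermione Granger/Viktor Krum"


def in_scope(work, min_words, language):
    """Return True if a work passes all four illustrative filters."""
    # Romantic pairings on AO3 are written with "/", so collect every
    # relationship tag that pairs Hermione with someone.
    hermione_pairings = [
        tag for tag in work["relationships"]
        if "/" in tag and "Hermione Granger" in tag
    ]
    only_viktor = hermione_pairings == [TARGET_PAIRING]
    long_enough = work["words"] >= min_words
    no_crossover = len(work["fandoms"]) == 1  # assumes one fandom tag means no crossover
    right_language = work["language"] == language
    return only_viktor and long_enough and no_crossover and right_language


# Made-up example records; the threshold and language below are placeholders.
works = [
    {"title": "A", "words": 52000, "language": "English",
     "fandoms": ["Harry Potter"], "relationships": [TARGET_PAIRING]},
    {"title": "B", "words": 80000, "language": "English",
     "fandoms": ["Harry Potter"],
     "relationships": [TARGET_PAIRING, "Hermione Granger/Ron Weasley"]},
    {"title": "C", "words": 900, "language": "English",
     "fandoms": ["Harry Potter"], "relationships": [TARGET_PAIRING]},
]
kept = [w["title"] for w in works if in_scope(w, min_words=10000, language="English")]
print(kept)
```

In this sketch, work B is dropped because it carries a second Hermione pairing and work C because it is too short. As noted in the text, such filters are a first approximation of the scope rather than a definition of it.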

This reduced the dataset further from 122 stories to 30 stories. The list of these stories can be found here. These are the stories that comprise the full dataset of our analysis. Note that some stories in this dataset do not pass the wordcount filter. However, they were included because it was known that they match the scope. As such, the above criteria should not be viewed as absolute or fully justified, but as a first attempt to identify a suitable dataset on AO3 matching our scope.

Hence, if there are any other stories that you feel should be included in this analysis (because they match the above scope), you are welcome to contribute. You can create a pull request here and suggest additional stories. Provide a clear argument as to why the story belongs in the scope, and note that the story must be available on AO3 (this is a hard requirement), because biases from different document formats cannot be accepted.

Evaluation Pipeline

To apply the rubric to the Vikmione stories in our dataset, the following procedure was used:

Score all supplied PDFs against the above evaluation model. Briefly argue each score using the source material. Rely solely on the descriptors for calibration of the scoring scale. Do not compare between the stories at all. We want independent and non-relative scores. NB: use only the narrative body text. Do NOT use or infer information from: Tags, Author Notes, Summary, Chapter Titles, or other metadata of any kind. If such elements are present in the text, ignore them entirely.

Irregularities:

NB: Do not take into account whether plot circumstances look like canon, or are extreme. What matters is, would the characters act the same as their canonical counterparts do, under those new (and possible more extreme) circumstances. Use causal reasoning. We are solely interested in the characters, do not weight the circumstances. Rescore all documents. Note that this is a less strict rule then what you previously applied.

But note that this is no guarantee. One must still carefully read the justifications and scores and determine whether they sound reasonable. As such, Canon-Consistency is simply less reproducible with LLMs than the other axes.

Statistical Analysis

All stories in the dataset were evaluated twice (independently) for each rubric axis using the above procedure. Afterwards, a total score was assigned for each evaluation separately (the procedure is described here). Next, the data was combined with various metadata fields from AO3, such as completion status, wordcount, number of hits/kudos/comments, etc. Generative AI was also used to provide content overviews such as a plot summary, the strongest and weakest aspect of the story, and Viktor Krum's role in the second wizarding war. The resulting data table can be viewed here.

The results were further analysed using the following statistical methods:

Cronbach's alpha measures the internal consistency of the rubric, Spearman correlation measures the relative (ordinal) stability of the rubric, and MRD measures the absolute stability of the rubric. While this is all very valuable information, it does not yet state whether the rubric indeed measures story quality. As such, the correlation between the total rubric score and the popularity was investigated for all 30 stories in our dataset.
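As a minimal sketch of two of these measures, the snippet below computes Cronbach's alpha over a stories-by-axes score table and the Spearman correlation between two score lists (for example, the totals of the first and second evaluation run). It uses the textbook formulas and only the Python standard library; it is not necessarily the exact tooling used for this analysis, the function names are our own, and the numbers are made up. MRD is left out because its definition is not given in this section.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """scores: one row per story, one column per rubric axis."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up totals for two independent evaluation runs of five stories.
run_1 = [62, 71, 55, 80, 68]
run_2 = [60, 74, 57, 79, 66]
print(spearman(run_1, run_2))
```

A Spearman value near 1 means both runs order the stories almost identically, even if the absolute scores differ. That is why an absolute-stability measure is reported alongside it.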

This brings us first to the question of how to measure popularity. AO3 tracks various types of metadata to measure this:

We chose to base Popularity on the number of Hits, because the other data is highly sensitive to human behaviour. Readers may have read and appreciated a story without contributing Kudos, Comments, or Bookmarks. There are various explanations for this, such as not having the time to click buttons, using a device (such as a smartphone) on which typing comments is less easy, or a story being set to refuse kudos from guests. Hits, by contrast, does not depend on such reader actions. It simply counts the number of times a story is clicked or loaded. Note that this is still fairly different from whether a story is actually read and/or appreciated, let alone from its quality. But it does seem to be the best metadata to use.

AO3 counts hits cumulatively, meaning that it shows the total number of hits generated since the first part of the story was published. All else being equal, a story published two years ago will therefore likely have double the hits of a story published one year ago. As such, we approximate popularity as Hits/Time.

However, Hits/Time is not yet a very good measure of Popularity, as new stories are shown at the top of the page on AO3 by default. Many stories therefore collect more hits at the beginning of their life and fewer as they get older. So pure Hits/Time would be unfair to older stories that have actually accumulated a large number of hits. Therefore, we asked ChatGPT to construct correction factors for this problem based on typical Harry Potter fandom behaviour or well-known stories. In our analysis, we have used:

As such, we define a story's Popularity score as Correction*Hits/Time, with Time measured in days between the first publication date of the story and the 9th of April 2026 (our sampling date). This is a measure of how often the story is read, picked, chosen, etc. by people. Our suspicion is that popularity is notably, but not perfectly, correlated with the rubric score, as people choose their stories based on a combination of quality, marketing (how visible the story is on social media), and personal preferences.
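The Popularity formula above can be written out as a short sketch. The function name, its arguments, and the example numbers are hypothetical; in particular, the correction factor is passed in as a plain number, because the actual correction values are not reproduced here.

```python
from datetime import date

SAMPLING_DATE = date(2026, 4, 9)  # sampling date stated in the text


def popularity(hits, first_published, correction=1.0, sampled_on=SAMPLING_DATE):
    """Popularity = Correction * Hits / Time, with Time in days."""
    days = (sampled_on - first_published).days
    if days <= 0:
        raise ValueError("story must be published before the sampling date")
    return correction * hits / days


# Made-up example: 3650 hits, published exactly 365 days before sampling.
base = popularity(3650, date(2025, 4, 9))
# Same story with an illustrative (not the real) age correction of 1.5.
corrected = popularity(3650, date(2025, 4, 9), correction=1.5)
print(base, corrected)
```

Keeping the correction as an explicit argument makes it easy to recompute the scores if a different set of age corrections is preferred.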

While the rubric aims to measure quality independent of creative choices such as plot content, characterizations, worldbuilding choices, etc., personal preferences are usually highly dependent on such choices. Therefore, we expect to find a significant but far from perfect correlation between Popularity and Rubric score, where the correlated component is the 'independent story quality' we hope to identify.

The Popularity scores, as well as the rubric total scores, can be found in the Results. All basic metadata can be found in the Raw Data. See the Discussion for the outcome of these calculations and its interpretation.